
LLM Speed Benchmark (LLMSB)

Project’s GitHub Repo


LLM Speed Benchmark (LLMSB) is a fully open-source benchmarking tool for assessing Large Language Models’ (LLMs) performance across different hardware platforms. Its ultimate goal is to compile a comprehensive dataset detailing LLM models’ performance on various systems, enabling users to more effectively choose the right LLM model(s) for their projects.

This tool was built by Mehmet Yilmaz during his time as a part-time Engineering Intern at Anarchy (YC W23). It uses HuggingFace’s transformers library to load and run an LLM model. When run, it gathers a set of performance metrics for that model.
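The core of such a benchmark is a timing loop around the model’s generation call. The sketch below illustrates the general idea with the model call stubbed out; the function names and the exact metrics here are illustrative assumptions, not LLMSB’s actual implementation (a real run would invoke a transformers model’s `generate` method in place of the stub):

```python
import time

def benchmark_generation(generate_fn, prompt, runs=3):
    """Time a text-generation callable and report tokens per second.

    `generate_fn` stands in for a model's generate call (e.g. a
    HuggingFace transformers model); we only assume it returns the
    generated token IDs.
    """
    results = []
    for _ in range(runs):
        start = time.perf_counter()
        tokens = generate_fn(prompt)
        elapsed = time.perf_counter() - start
        results.append({
            "tokens": len(tokens),
            "seconds": elapsed,
            "tokens_per_second": len(tokens) / elapsed,
        })
    return results

# Dummy generator standing in for model.generate(); a real benchmark
# would load a model with transformers and call it here.
def fake_generate(prompt):
    return list(range(50))  # pretend 50 tokens were produced

stats = benchmark_generation(fake_generate, "Hello")
```

Averaging tokens-per-second over several runs, as above, smooths out warm-up effects such as CUDA kernel compilation on the first call.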

You can see an example output from a benchmark run HERE, for the codellama-13b-oasst-sft-v10 model running on an H100. I personally own an Nvidia RTX 2070 Ti, which has 8 GB of VRAM. Sadly, 8 GB of VRAM is not enough to run most modern LLM models. Because of this, unless you already have access to powerful GPU(s), it’s highly recommended to use cloud services like RunPod to “rent” GPU(s) for running these benchmarks.

Learn more about the LLM Speed Benchmark (LLMSB) and view the code on GitHub. If that link doesn’t work, you can access the code via Mehmet’s fork on GitHub HERE.