[demos] Add benchmark example for llm demo
parent 06fd64e74c
commit 48b9c077ed
@@ -61,6 +61,33 @@ HF_DATASETS_CACHE=/root/cache \
For both examples, refer to the BigDL-LLM [chatglm2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2) example for more information about the arguments.

## LLM Inference Benchmark
Based on the [benchmark](https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark) demo from BigDL, a simple [benchmark](./benchmarks/) is provided to measure LLM inference performance both on the host and inside a TEE.
The output looks like:
```
=========First token cost xx.xxxxs=========
=========Last token cost average xx.xxxxs (xx tokens in all)=========
```
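
For a quick sanity check of these numbers, you can also time `generate()` directly. The sketch below is only an illustration under our own assumptions (the helper name `rough_latency` and the two-call approach are ours, not the measurement logic used by `BenchmarkWrapper`):

```python
import time

import torch

def rough_latency(model, tokenizer, prompt, max_new_tokens=32):
    # Estimate first-token and average next-token cost from two greedy runs:
    # a 1-token run isolates the prefill (first token), and a longer run
    # amortizes the decode cost over the remaining tokens.
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    with torch.inference_mode():
        start = time.perf_counter()
        model.generate(input_ids, do_sample=False, max_new_tokens=1)
        first = time.perf_counter() - start

        start = time.perf_counter()
        model.generate(input_ids, do_sample=False, max_new_tokens=max_new_tokens)
        total = time.perf_counter() - start

    rest_avg = (total - first) / (max_new_tokens - 1)
    print(f"=========First token cost {first:.4f}s=========")
    print(f"=========Last token cost average {rest_avg:.4f}s ({max_new_tokens} tokens in all)=========")
```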
In the commands below, **model_path** can be the path to a chatglm2-6b or Qwen-7B-Chat checkpoint.
**OMP_NUM_THREADS** sets the number of OpenMP threads.
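
As a quick check (our own illustration, not part of the demo), you can confirm from Python how many intra-op threads PyTorch picked up:

```python
# Hypothetical sanity check: torch should report the thread count
# derived from OMP_NUM_THREADS when the variable is set.
import torch

print("torch intra-op threads:", torch.get_num_threads())
```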
### Benchmark on Host
```bash
OMP_NUM_THREADS=16 ./python-occlum/bin/python \
./benchmarks/bench.py --repo-id-or-model-path <model_path>
```
### Benchmark in TEE
```bash
cd occlum_instance
OMP_NUM_THREADS=16 occlum run /bin/python3 \
/benchmarks/bench.py --repo-id-or-model-path <model_path>
```
According to our benchmark results on an Intel Ice Lake server, LLM inference in a TEE is roughly 30% slower than on the host.
## Do inference with webui
demos/bigdl-llm/benchmarks/bench.py (new file, 22 lines)
@@ -0,0 +1,22 @@
import argparse
import torch
from bigdl.llm.transformers import AutoModel, AutoModelForCausalLM
from transformers import AutoTokenizer
from benchmark_util import BenchmarkWrapper

parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for ChatGLM2 model')
parser.add_argument('--repo-id-or-model-path', type=str, default="THUDM/chatglm2-6b",
                    help='The huggingface repo id for the ChatGLM2 model to be downloaded'
                         ', or the path to the huggingface checkpoint folder')

args = parser.parse_args()
model_path = args.repo_id_or_model_path
# Load the model in 4-bit precision via BigDL-LLM, then wrap it so that
# generate() prints the first-token and average next-token latencies.
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True)
model = BenchmarkWrapper(model, do_print=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
prompt = "今天睡不着怎么办"  # "What should I do if I can't fall asleep?"

with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, do_sample=False, max_new_tokens=512)
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
demos/bigdl-llm/benchmarks/benchmark_util.py (new file, 4741 lines)
File diff suppressed because it is too large
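
Since the wrapper's diff is suppressed, here is a minimal sketch of one way such per-token timing can be implemented with the `streamer` hook of `generate()`. This is an assumed stand-in for illustration only, not the actual `BenchmarkWrapper` code:

```python
import time

class TimingStreamer:
    # generate() calls put() once with the prompt ids and then once per new
    # token, so gaps between timestamps approximate per-token latency.
    def __init__(self):
        self.times = []

    def put(self, value):
        self.times.append(time.perf_counter())

    def end(self):
        self.times.append(time.perf_counter())

# Usage sketch (names are ours):
#   streamer = TimingStreamer()
#   model.generate(input_ids, do_sample=False, max_new_tokens=512, streamer=streamer)
#   first = streamer.times[1] - streamer.times[0]          # prefill / first token
#   rest = [b - a for a, b in zip(streamer.times[1:], streamer.times[2:])]
#   avg = sum(rest) / len(rest)                            # average next-token cost
```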
@@ -12,3 +12,4 @@ $script_dir/miniconda/bin/conda create \
# Install BigDL LLM
$script_dir/python-occlum/bin/pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cpu
$script_dir/python-occlum/bin/pip install --pre --upgrade bigdl-llm[all] bigdl-llm[serving]
$script_dir/python-occlum/bin/pip install transformers_stream_generator einops
@@ -18,6 +18,7 @@ targets:
    copy:
      - dirs:
        - ../chatglm2
        - ../benchmarks
  - target: /opt/occlum/glibc/lib
    copy:
      - files: