[demos] Add benchmark example for llm demo
parent 06fd64e74c
commit 48b9c077ed

@@ -61,6 +61,33 @@ HF_DATASETS_CACHE=/root/cache \

For both examples, more information about the arguments can be found in the BigDL-LLM [chatglm2](https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2) example.

## LLM Inference Benchmark

Based on the [benchmark](https://github.com/intel-analytics/BigDL/tree/main/python/llm/dev/benchmark) demo from BigDL, a simple [benchmark](./benchmarks/) is provided to measure the performance of LLM inference both on the host and in a TEE.

The output looks like:
```
=========First token cost xx.xxxxs=========
=========Last token cost average xx.xxxxs (xx tokens in all)=========
```
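
The split reported above (time to the first token vs. the average for each subsequent token) is what BigDL's `BenchmarkWrapper` prints. As a rough illustration only, a similar measurement can be made with the standard `transformers` streaming API; the sketch below is not BigDL's implementation, and the model path is just an assumption:

```python
# Sketch: time the first generated token separately from the rest using
# transformers' TextIteratorStreamer (transformers >= 4.28). BenchmarkWrapper
# hooks the model itself; this only approximates the same measurement.
import time
from threading import Thread

from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer

model_path = "THUDM/chatglm2-6b"  # assumption: any HF causal-LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What should I do if I cannot sleep at night?", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_prompt=True)

# generate() blocks until done, so run it on a thread and consume the stream here.
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer,
                            do_sample=False, max_new_tokens=32))
start = time.perf_counter()
thread.start()

first_token_time = None
chunks = 0
for _ in streamer:  # each item is a newly decoded text chunk (roughly one token)
    if first_token_time is None:
        first_token_time = time.perf_counter() - start
    chunks += 1
total = time.perf_counter() - start
thread.join()

print(f"=========First token cost {first_token_time:.4f}s=========")
if chunks > 1:
    avg = (total - first_token_time) / (chunks - 1)
    print(f"=========Last token cost average {avg:.4f}s ({chunks} tokens in all)=========")
```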

In the commands below, **model_path** can be the path to chatglm2-6b or Qwen-7B-Chat.
**OMP_NUM_THREADS** sets the number of threads used by OpenMP.

### Benchmark on the Host
```bash
OMP_NUM_THREADS=16 ./python-occlum/bin/python \
    ./benchmarks/bench.py --repo-id-or-model-path <model_path>
```

### Benchmark in TEE
```bash
cd occlum_instance
OMP_NUM_THREADS=16 occlum run /bin/python3 \
    /benchmarks/bench.py --repo-id-or-model-path <model_path>
```

In our benchmarks on an Intel Ice Lake server, LLM inference in a TEE is approximately 30% slower than on the host.

## Do inference with webui

demos/bigdl-llm/benchmarks/bench.py (new file, 22 lines)

@@ -0,0 +1,22 @@
import argparse

import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

from benchmark_util import BenchmarkWrapper

parser = argparse.ArgumentParser(description='Predict Tokens using `generate()` API for ChatGLM2 model')
parser.add_argument('--repo-id-or-model-path', type=str, default="THUDM/chatglm2-6b",
                    help='The huggingface repo id for the ChatGLM2 model to be downloaded'
                         ', or the path to the huggingface checkpoint folder')

args = parser.parse_args()
model_path = args.repo_id_or_model_path

# Load the model with 4-bit quantization and wrap it so that generate()
# reports first-token and per-token latencies.
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True, load_in_4bit=True)
model = BenchmarkWrapper(model, do_print=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
prompt = "今天睡不着怎么办"  # "What should I do if I can't sleep tonight?"

with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output = model.generate(input_ids, do_sample=False, max_new_tokens=512)
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)

demos/bigdl-llm/benchmarks/benchmark_util.py (new file, 4741 lines)

(File diff suppressed because it is too large)

@@ -12,3 +12,4 @@ $script_dir/miniconda/bin/conda create \
# Install BigDL LLM
$script_dir/python-occlum/bin/pip install torch==2.1.0 --index-url https://download.pytorch.org/whl/cpu
$script_dir/python-occlum/bin/pip install --pre --upgrade bigdl-llm[all] bigdl-llm[serving]
$script_dir/python-occlum/bin/pip install transformers_stream_generator einops

@@ -18,6 +18,7 @@ targets:
    copy:
      - dirs:
          - ../chatglm2
          - ../benchmarks
  - target: /opt/occlum/glibc/lib
    copy:
      - files: