diff --git a/docs/readthedocs/docs/source/images/occlum-llm.png b/docs/readthedocs/docs/source/images/occlum-llm.png
new file mode 100644
index 00000000..f7a7c29d
Binary files /dev/null and b/docs/readthedocs/docs/source/images/occlum-llm.png differ
diff --git a/docs/readthedocs/docs/source/index.rst b/docs/readthedocs/docs/source/index.rst
index 8531865f..e1d430de 100644
--- a/docs/readthedocs/docs/source/index.rst
+++ b/docs/readthedocs/docs/source/index.rst
@@ -30,6 +30,7 @@ Table of Contents
    tutorials/gen_occlum_instance.md
    tutorials/distributed_pytorch.md
    tutorials/occlum_ppml.md
+   tutorials/LLM_inference.md
 
 .. toctree::
    :maxdepth: 2
diff --git a/docs/readthedocs/docs/source/tutorials/LLM_inference.md b/docs/readthedocs/docs/source/tutorials/LLM_inference.md
new file mode 100644
index 00000000..a769882d
--- /dev/null
+++ b/docs/readthedocs/docs/source/tutorials/LLM_inference.md
@@ -0,0 +1,43 @@
+# LLM Inference in TEE
+
+LLM (Large Language Model) inference in a TEE can protect the model, the input prompts, and the outputs. The key challenges are:
+
+1. The performance of LLM inference in a TEE (CPU).
+2. Whether LLM inference can run in a TEE at all.
+
+With the significant LLM inference speed-up brought by [BigDL-LLM](https://github.com/intel-analytics/BigDL/tree/main/python/llm) and the Occlum LibOS, high-performance and efficient LLM inference in a TEE can now be realized.
+
+## Overview
+
+![LLM inference](../images/occlum-llm.png)
+
+The chart above shows the overall architecture and workflow.
+
+For step 3, users can use the Occlum [init-ra AECS](https://occlum.readthedocs.io/en/latest/remote_attestation.html#init-ra-solution) solution, which requires no changes to the application.
+
+For more details, please refer to the [LLM demo](https://github.com/occlum/occlum/tree/master/demos/bigdl-llm).
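+
+## Example: loading a model with BigDL-LLM (sketch)
+
+As a minimal sketch of what the application inside the Occlum instance might run, the snippet below loads a model with BigDL-LLM low-bit (INT4) optimizations for CPU inference. The model path and prompt are placeholders; refer to the [LLM demo](https://github.com/occlum/occlum/tree/master/demos/bigdl-llm) for the complete, tested workflow.
+
+```python
+# Minimal BigDL-LLM CPU inference sketch; model path and prompt are placeholders.
+from bigdl.llm.transformers import AutoModelForCausalLM
+from transformers import AutoTokenizer
+
+model_path = "/models/llama-2-7b-chat-hf"  # hypothetical path inside the Occlum image
+
+# Load the model with INT4 optimizations to speed up CPU inference.
+model = AutoModelForCausalLM.from_pretrained(model_path,
+                                             load_in_4bit=True,
+                                             trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+prompt = "What is a Trusted Execution Environment?"
+inputs = tokenizer(prompt, return_tensors="pt")
+
+# Generate a short completion; adjust max_new_tokens for your workload.
+output = model.generate(**inputs, max_new_tokens=32)
+print(tokenizer.decode(output[0], skip_special_tokens=True))
+```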