diff --git a/demos/pytorch/distributed/README.md b/demos/pytorch/distributed/README.md
index 2ca35909..7d4abb96 100644
--- a/demos/pytorch/distributed/README.md
+++ b/demos/pytorch/distributed/README.md
@@ -9,6 +9,7 @@ There are a few environment variables that are related to distributed PyTorch tr
 2. MASTER_PORT
 3. WORLD_SIZE
 4. RANK
+5. OMP_NUM_THREADS
 
 `MASTER_ADDR` and `MASTER_PORT` specifies a rendezvous point where all the training processes will connect to.
 
@@ -18,6 +19,8 @@ There are a few environment variables that are related to distributed PyTorch tr
 
 The `MASTER_ADDR`, `MASTER_PORT` and `WORLD_SIZE` should be identical for all the participants while the `RANK` should be unique.
 
+`OMP_NUM_THREADS` can generally be set to the number of physical CPU cores. In Occlum, however, a larger `OMP_NUM_THREADS` value requires more TCS and memory.
+
 **Note that in most cases PyTorch only use multi-threads. If you find a process fork, please set `num_workers=1` env.**
 
 ### TLS related environment variables
@@ -75,7 +78,7 @@ bash ./build_pytorch_occlum_instance.sh
 Step 4 (in the Occlum container): Run node one PyTorch instance
 ```bash
 cd /root/demos/pytorch/distributed/occlum_instance
-WORLD_SIZE=2 RANK=0 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
+WORLD_SIZE=2 RANK=0 OMP_NUM_THREADS=16 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
 ```
 
 If successful, it will wait for the node two to join.
@@ -86,7 +89,7 @@ Using distributed PyTorch with gloo backend
 Step 5 (in the Occlum container): Run node two PyTorch instance
 ```bash
 cd /root/demos/pytorch/distributed/occlum_instance
-WORLD_SIZE=2 RANK=1 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
+WORLD_SIZE=2 RANK=1 OMP_NUM_THREADS=16 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
 ```
 
 If everything goes well, node one and two has similar logs as below.
diff --git a/demos/pytorch/distributed/build_pytorch_occlum_instance.sh b/demos/pytorch/distributed/build_pytorch_occlum_instance.sh
index caed2684..b78427b2 100755
--- a/demos/pytorch/distributed/build_pytorch_occlum_instance.sh
+++ b/demos/pytorch/distributed/build_pytorch_occlum_instance.sh
@@ -36,7 +36,7 @@ function build_instance()
     new_json="$(jq '.resource_limits.user_space_size = "4000MB" |
                     .resource_limits.kernel_space_heap_size = "256MB" |
                     .resource_limits.max_num_of_threads = 64 |
-                    .env.untrusted += [ "MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK", "TORCH_CPP_LOG_LEVEL" ] |
+                    .env.untrusted += [ "MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK", "OMP_NUM_THREADS", "HOME" ] |
                     .env.default += ["GLOO_DEVICE_TRANSPORT=TCP_TLS"] |
                     .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY=/ppml/certs/test.key"] |
                     .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT=/ppml/certs/test.crt"] |