[demos] Add distributed pytorch demo
parent a5cdcc8045
commit 47bd1fd7af

.github/workflows/demo_test.yml | 27
							| @ -276,11 +276,34 @@ jobs: | ||||
|         build-envs: 'OCCLUM_RELEASE_BUILD=1' | ||||
| 
 | ||||
|     - name: Build python and pytorch | ||||
|       run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch; ./install_python_with_conda.sh" | ||||
|       run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/standalone; ./install_python_with_conda.sh" | ||||
| 
 | ||||
|     - name: Run pytorch test | ||||
|       run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch; SGX_MODE=SIM ./run_pytorch_on_occlum.sh" | ||||
|       run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/standalone; SGX_MODE=SIM ./run_pytorch_on_occlum.sh" | ||||
| 
 | ||||
|   Distributed_Pytorch_test: | ||||
|     runs-on: ubuntu-20.04 | ||||
|     steps: | ||||
|     - uses: actions/checkout@v1 | ||||
|       with: | ||||
|         submodules: true | ||||
| 
 | ||||
|     - uses: ./.github/workflows/composite_action/sim | ||||
|       with: | ||||
|         container-name: ${{ github.job }} | ||||
|         build-envs: 'OCCLUM_RELEASE_BUILD=1' | ||||
| 
 | ||||
|     - name: Build python and pytorch | ||||
|       run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed; ./install_python_with_conda.sh" | ||||
| 
 | ||||
|     - name: Build pytorch Occlum instance | ||||
|       run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed; SGX_MODE=SIM ./build_pytorch_occlum_instance.sh" | ||||
| 
 | ||||
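|     # Node one is started in the background ("&") so that node two can be launched and join the rendezvous | ||||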
|     - name: Start pytorch Occlum instance node one | ||||
|       run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed/occlum_instance; WORLD_SIZE=2 RANK=0 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model &" | ||||
| 
 | ||||
|     - name: Start pytorch Occlum instance node two | ||||
|       run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed/occlum_instance_2; WORLD_SIZE=2 RANK=1 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model" | ||||
| 
 | ||||
|   Tensorflow_test: | ||||
|     runs-on: ubuntu-20.04 | ||||
|  | ||||
| @ -22,7 +22,7 @@ This set of demos shows how real-world apps can be easily run inside SGX enclave | ||||
| * [grpc](grpc/): A client and server communicating through [gRPC](https://grpc.io), containing [glibc-supported demo](grpc/grpc_glibc) and [musl-supported demo](grpc/grpc_musl). | ||||
| * [https_server](https_server/): A HTTPS file server based on [Mongoose Embedded Web Server Library](https://github.com/cesanta/mongoose). | ||||
| * [openvino](openvino/) A benchmark of [OpenVINO Inference Engine](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_inference_engine_intro.html). | ||||
| * [pytorch](pytorch/): A demo of [PyTorch](https://pytorch.org/). | ||||
| * [pytorch](pytorch/): Demos of standalone and distributed [PyTorch](https://pytorch.org/). | ||||
| * [redis](redis/): A demo of [Redis](https://redis.io). | ||||
| * [sofaboot](sofaboot/): A demo of [SOFABoot](https://github.com/sofastack/sofa-boot), an open source Java development framework based on Spring Boot. | ||||
| * [sqlite](sqlite/) A demo of [SQLite](https://www.sqlite.org) SQL database engine. | ||||

demos/pytorch/distributed/README.md | 107 (new file)
							| @ -0,0 +1,107 @@ | ||||
| # Distributed PyTorch Demo | ||||
| 
 | ||||
| This project demonstrates how Occlum enables _unmodified_ distributed [PyTorch](https://pytorch.org/) programs to run inside SGX enclaves, on the basis of _unmodified_ [Python](https://www.python.org). | ||||
| 
 | ||||
| ## Environment variables for Distributed PyTorch model | ||||
| A few environment variables are related to distributed PyTorch training: | ||||
| 
 | ||||
| 1. MASTER_ADDR | ||||
| 2. MASTER_PORT | ||||
| 3. WORLD_SIZE | ||||
| 4. RANK | ||||
| 
 | ||||
| `MASTER_ADDR` and `MASTER_PORT` specify the rendezvous point that all training processes connect to. | ||||
|  | ||||
| `WORLD_SIZE` specifies how many processes participate in the training. | ||||
|  | ||||
| `RANK` is the unique identifier of each training process. | ||||
|  | ||||
| `MASTER_ADDR`, `MASTER_PORT` and `WORLD_SIZE` must be identical for all participants, while `RANK` must be unique to each process. | ||||
| 
 | ||||
| **Note that in most cases PyTorch only uses multiple threads. If you observe a process fork, please set `num_workers=1`.** | ||||
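|  | ||||
| As a reference, the minimal sketch below (not part of the demo) shows how these variables feed PyTorch's default `env://` rendezvous, which is the same mechanism `mnist.py` relies on when it calls `dist.init_process_group`. It assumes `WORLD_SIZE` and `RANK` are already exported, e.g. `WORLD_SIZE=2 RANK=0`. | ||||
|  | ||||
| ```python | ||||
| import os | ||||
| import torch.distributed as dist | ||||
|  | ||||
| # MASTER_ADDR/MASTER_PORT tell every process where rank 0 listens; | ||||
| # they must be identical on all participants. | ||||
| os.environ.setdefault("MASTER_ADDR", "127.0.0.1") | ||||
| os.environ.setdefault("MASTER_PORT", "29500") | ||||
|  | ||||
| # With no explicit init_method, PyTorch uses env:// and reads | ||||
| # MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK from the environment. | ||||
| dist.init_process_group(backend="gloo") | ||||
| print("rank", dist.get_rank(), "of", dist.get_world_size(), "is ready") | ||||
| dist.destroy_process_group() | ||||
| ``` | ||||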
| 
 | ||||
| ### TLS related environment variables | ||||
| There is an environment variable called `GLOO_DEVICE_TRANSPORT` that can be used to specify the Gloo transport. | ||||
| 
 | ||||
| The default value is TCP. If TLS is required to satisfy your security requirements, please also set the following environment variables: | ||||
| 
 | ||||
| 1. GLOO_DEVICE_TRANSPORT=TCP_TLS | ||||
| 2. GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY | ||||
| 3. GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT | ||||
| 4. GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE | ||||
| 
 | ||||
| These environment variables are set as below in our demo. | ||||
| ```json | ||||
|   "env": { | ||||
|     "default": [ | ||||
|       "GLOO_DEVICE_TRANSPORT=TCP_TLS", | ||||
|       "GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY=/ppml/certs/test.key", | ||||
|       "GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT=/ppml/certs/test.crt", | ||||
|       "GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE=/ppml/certs/myCA.pem", | ||||
| ``` | ||||
| 
 | ||||
| The CA and certificate files above are generated with openssl. For details, please refer to the function **generate_ca_files** in the script [`build_pytorch_occlum_instance.sh`](./build_pytorch_occlum_instance.sh). | ||||
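|  | ||||
| For reference, the essential openssl commands in that function look like the following (file names and subject fields follow the script's defaults): | ||||
|  | ||||
| ```bash | ||||
| # Create a self-signed CA (private key + certificate) | ||||
| openssl req -x509 -nodes -days 1825 -newkey rsa:2048 -keyout myCA.key -out myCA.pem -subj "/CN=localhost" | ||||
| # Create the test private key and a certificate signing request | ||||
| openssl genrsa -out test.key 2048 | ||||
| openssl req -new -key test.key -out test.csr -subj "/C=CN/ST=Shanghai/L=Shanghai/O=Ant/CN=localhost" | ||||
| # Sign the CSR with the CA to produce the certificate used by the Gloo TLS transport | ||||
| openssl x509 -req -in test.csr -CA myCA.pem -CAkey myCA.key -CAcreateserial -out test.crt -days 825 -sha256 | ||||
| ``` | ||||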
| 
 | ||||
| ## How to Run | ||||
| 
 | ||||
| This tutorial is written under the assumption that you have Docker installed and use Occlum in a Docker container. | ||||
| 
 | ||||
| Occlum is compatible with glibc-supported Python, so we use Miniconda as the Python installation tool and install the PyTorch packages with conda. Miniconda is installed automatically by the `install_python_with_conda.sh` script, which also installs the Python and PyTorch packages required for this project. Here, we take `occlum/occlum:0.29.3-ubuntu20.04` as an example. | ||||
| 
 | ||||
| In the following example, we will run distributed PyTorch training on the `Fashion-MNIST` dataset with 2 processes (Occlum instances). | ||||
| 
 | ||||
| Thus, we set `WORLD_SIZE` to 2. | ||||
| 
 | ||||
| Generally, `MASTER_ADDR` is set to the IP address of the process with `RANK` 0. In our case, both processes run in the same container, so `MASTER_ADDR` can simply be set to `localhost`. | ||||
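|  | ||||
| Since the build script adds `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE` and `RANK` to `env.untrusted` in `Occlum.json`, they can also be overridden at launch time. A hypothetical two-machine run (192.168.1.10 is a made-up address for the rank-0 host; the TLS certificates would need to be regenerated with the proper host name via `generate_ca_files`) could look like: | ||||
|  | ||||
| ```bash | ||||
| # On the rank-0 machine | ||||
| MASTER_ADDR=192.168.1.10 MASTER_PORT=29500 WORLD_SIZE=2 RANK=0 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model | ||||
| # On the rank-1 machine | ||||
| MASTER_ADDR=192.168.1.10 MASTER_PORT=29500 WORLD_SIZE=2 RANK=1 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model | ||||
| ``` | ||||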
| 
 | ||||
| Step 1 (on the host): Start an Occlum container | ||||
| ```bash | ||||
| docker pull occlum/occlum:0.29.3-ubuntu20.04 | ||||
| docker run -it --name=pythonDemo --device /dev/sgx/enclave occlum/occlum:0.29.3-ubuntu20.04 bash | ||||
| ``` | ||||
| 
 | ||||
| Step 2 (in the Occlum container): Download Miniconda and install Python to the prefix position. | ||||
| ```bash | ||||
| cd /root/demos/pytorch/distributed | ||||
| bash ./install_python_with_conda.sh | ||||
| ``` | ||||
| 
 | ||||
| Step 3 (in the Occlum container): Build the Distributed PyTorch Occlum instances | ||||
| ```bash | ||||
| cd /root/demos/pytorch/distributed | ||||
| bash ./build_pytorch_occlum_instance.sh | ||||
| ``` | ||||
| 
 | ||||
| Step 4 (in the Occlum container): Run node one PyTorch instance | ||||
| ```bash | ||||
| cd /root/demos/pytorch/distributed/occlum_instance | ||||
| WORLD_SIZE=2 RANK=0 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model | ||||
| ``` | ||||
| 
 | ||||
| If successful, it will wait for node two to join. | ||||
| ```log | ||||
| Using distributed PyTorch with gloo backend | ||||
| ``` | ||||
| 
 | ||||
| Step 5 (in the Occlum container): Run node two PyTorch instance | ||||
| ```bash | ||||
| cd /root/demos/pytorch/distributed/occlum_instance | ||||
| WORLD_SIZE=2 RANK=1 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model | ||||
| ``` | ||||
| 
 | ||||
| If everything goes well, nodes one and two produce logs similar to the ones below. | ||||
| ```log | ||||
| After downloading data | ||||
| 2022-12-05T09:40:05Z INFO     Train Epoch: 1 [0/469 (0%)]       loss=2.3037 | ||||
| 2022-12-05T09:40:05Z INFO     Reducer buckets have been rebuilt in this iteration. | ||||
| 2022-12-05T09:40:06Z INFO     Train Epoch: 1 [10/469 (2%)]      loss=2.3117 | ||||
| 2022-12-05T09:40:06Z INFO     Train Epoch: 1 [20/469 (4%)]      loss=2.2826 | ||||
| 2022-12-05T09:40:06Z INFO     Train Epoch: 1 [30/469 (6%)]      loss=2.2904 | ||||
| 2022-12-05T09:40:07Z INFO     Train Epoch: 1 [40/469 (9%)]      loss=2.2860 | ||||
| 2022-12-05T09:40:07Z INFO     Train Epoch: 1 [50/469 (11%)]     loss=2.2784 | ||||
| 2022-12-05T09:40:08Z INFO     Train Epoch: 1 [60/469 (13%)]     loss=2.2779 | ||||
| 2022-12-05T09:40:08Z INFO     Train Epoch: 1 [70/469 (15%)]     loss=2.2689 | ||||
| 2022-12-05T09:40:08Z INFO     Train Epoch: 1 [80/469 (17%)]     loss=2.2513 | ||||
| 2022-12-05T09:40:09Z INFO     Train Epoch: 1 [90/469 (19%)]     loss=2.2536 | ||||
| ... | ||||
| ``` | ||||

demos/pytorch/distributed/build_pytorch_occlum_instance.sh | 55 (new file)
							| @ -0,0 +1,55 @@ | ||||
| #!/bin/bash | ||||
| set -e | ||||
| 
 | ||||
| BLUE='\033[1;34m' | ||||
| NC='\033[0m' | ||||
| 
 | ||||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}"  )" >/dev/null 2>&1 && pwd )" | ||||
| python_dir="$script_dir/occlum_instance/image/opt/python-occlum" | ||||
| 
 | ||||
| 
 | ||||
| function generate_ca_files() | ||||
| { | ||||
|     cn_name=${1:-"localhost"} | ||||
|     # Generate CA files | ||||
|     openssl req -x509 -nodes -days 1825 -newkey rsa:2048 -keyout myCA.key -out myCA.pem -subj "/CN=${cn_name}" | ||||
|     # Prepare test private key | ||||
|     openssl genrsa -out test.key 2048 | ||||
|     # Use private key to generate a Certificate Sign Request | ||||
|     openssl req -new -key test.key -out test.csr -subj "/C=CN/ST=Shanghai/L=Shanghai/O=Ant/CN=${cn_name}" | ||||
|     # Use CA private key and CA file to sign test CSR | ||||
|     openssl x509 -req -in test.csr -CA myCA.pem -CAkey myCA.key -CAcreateserial -out test.crt -days 825 -sha256 | ||||
| } | ||||
| 
 | ||||
| function build_instance() | ||||
| { | ||||
|     rm -rf occlum_instance* && occlum new occlum_instance | ||||
|     pushd occlum_instance | ||||
|     rm -rf image | ||||
|     copy_bom -f ../pytorch.yaml --root image --include-dir /opt/occlum/etc/template | ||||
| 
 | ||||
|     if [ ! -d $python_dir ];then | ||||
|         echo "Error: cannot stat '$python_dir' directory" | ||||
|         exit 1 | ||||
|     fi | ||||
| 
 | ||||
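|     # Enlarge the resource limits and add the distributed-training (untrusted) and TLS (default) env vars to Occlum.json | ||||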
|     new_json="$(jq '.resource_limits.user_space_size = "4000MB" | | ||||
|                     .resource_limits.kernel_space_heap_size = "256MB" | | ||||
|                     .resource_limits.max_num_of_threads = 64 | | ||||
|                     .env.untrusted += [ "MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK", "TORCH_CPP_LOG_LEVEL" ] | | ||||
|                     .env.default += ["GLOO_DEVICE_TRANSPORT=TCP_TLS"] | | ||||
|                     .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY=/ppml/certs/test.key"] | | ||||
|                     .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT=/ppml/certs/test.crt"] | | ||||
|                     .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE=/ppml/certs/myCA.pem"] | | ||||
|                     .env.default += ["PYTHONHOME=/opt/python-occlum"] | | ||||
|                     .env.default += [ "MASTER_ADDR=127.0.0.1", "MASTER_PORT=29500" ] ' Occlum.json)" && \ | ||||
|     echo "${new_json}" > Occlum.json | ||||
|     occlum build | ||||
|     popd | ||||
| } | ||||
| 
 | ||||
| generate_ca_files | ||||
| build_instance | ||||
| 
 | ||||
| # Test instance for 2 nodes distributed pytorch training | ||||
| cp -r occlum_instance occlum_instance_2 | ||||

demos/pytorch/distributed/install_python_with_conda.sh | 10 (new file)
							| @ -0,0 +1,10 @@ | ||||
| #!/bin/bash | ||||
| set -e | ||||
| script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}"  )" >/dev/null 2>&1 && pwd )" | ||||
| 
 | ||||
| # Install python and dependencies to specified position | ||||
| [ -f Miniconda3-latest-Linux-x86_64.sh ] || wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh | ||||
| [ -d miniconda ] || bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $script_dir/miniconda | ||||
| $script_dir/miniconda/bin/conda create --prefix $script_dir/python-occlum -y \ | ||||
|     python=3.8.10 numpy=1.21.5 scipy=1.7.3 scikit-learn=1.0 pandas=1.3 \ | ||||
|     Cython pytorch torchvision -c pytorch | ||||

demos/pytorch/distributed/mnist.py | 210 (new file)
							| @ -0,0 +1,210 @@ | ||||
| from __future__ import print_function | ||||
| 
 | ||||
| import argparse | ||||
| import logging | ||||
| import os | ||||
| import time | ||||
| 
 | ||||
| from torchvision import datasets, transforms | ||||
| from torch.utils.data.distributed import DistributedSampler | ||||
| import torch | ||||
| import torch.distributed as dist | ||||
| import torch.nn as nn | ||||
| import torch.nn.functional as F | ||||
| import torch.optim as optim | ||||
| 
 | ||||
| WORLD_SIZE = int(os.environ.get("WORLD_SIZE", 1)) | ||||
| 
 | ||||
| RANK = int(os.environ.get("RANK", 0)) | ||||
| 
 | ||||
| class Net(nn.Module): | ||||
|     def __init__(self): | ||||
|         super(Net, self).__init__() | ||||
|         self.conv1 = nn.Conv2d(1, 20, 5, 1) | ||||
|         self.conv2 = nn.Conv2d(20, 50, 5, 1) | ||||
|         self.fc1 = nn.Linear(4*4*50, 500) | ||||
|         self.fc2 = nn.Linear(500, 10) | ||||
| 
 | ||||
|     def forward(self, x): | ||||
|         x = F.relu(self.conv1(x)) | ||||
|         x = F.max_pool2d(x, 2, 2) | ||||
|         x = F.relu(self.conv2(x)) | ||||
|         x = F.max_pool2d(x, 2, 2) | ||||
|         x = x.view(-1, 4*4*50) | ||||
|         x = F.relu(self.fc1(x)) | ||||
|         x = self.fc2(x) | ||||
|         return F.log_softmax(x, dim=1) | ||||
| 
 | ||||
| 
 | ||||
| def train(args, model, device, train_loader, optimizer, epoch): | ||||
|     model.train() | ||||
|     for batch_idx, (data, target) in enumerate(train_loader): | ||||
|         data, target = data.to(device), target.to(device) | ||||
|         optimizer.zero_grad() | ||||
|         output = model(data) | ||||
|         loss = F.nll_loss(output, target) | ||||
|         loss.backward() | ||||
|         optimizer.step() | ||||
|         if batch_idx % args.log_interval == 0: | ||||
|             msg = "Train Epoch: {} [{}/{} ({:.0f}%)]\tloss={:.4f}".format( | ||||
|                 epoch, batch_idx, len(train_loader), | ||||
|                 100. * batch_idx / len(train_loader), loss.item()) | ||||
|             logging.info(msg) | ||||
|             niter = epoch * len(train_loader) + batch_idx | ||||
| 
 | ||||
| 
 | ||||
| def test(args, model, device, test_loader, epoch): | ||||
|     model.eval() | ||||
|     test_loss = 0 | ||||
|     correct = 0 | ||||
|     with torch.no_grad(): | ||||
|         for data, target in test_loader: | ||||
|             data, target = data.to(device), target.to(device) | ||||
|             output = model(data) | ||||
|             # sum up batch loss | ||||
|             test_loss += F.nll_loss(output, target, reduction="sum").item() | ||||
|             # get the index of the max log-probability | ||||
|             pred = output.max(1, keepdim=True)[1] | ||||
|             correct += pred.eq(target.view_as(pred)).sum().item() | ||||
| 
 | ||||
|     test_loss /= len(test_loader.dataset) | ||||
|     logging.info("{{metricName: accuracy, metricValue: {:.4f}}};{{metricName: loss, metricValue: {:.4f}}}\n".format( | ||||
|         float(correct) / (len(test_loader.dataset) / WORLD_SIZE), test_loss)) | ||||
| 
 | ||||
| 
 | ||||
| def should_distribute(): | ||||
|     return dist.is_available() and WORLD_SIZE > 1 | ||||
| 
 | ||||
| 
 | ||||
| def is_distributed(): | ||||
|     return dist.is_available() and dist.is_initialized() | ||||
| 
 | ||||
| 
 | ||||
| def main(): | ||||
|     # Training settings | ||||
|     parser = argparse.ArgumentParser(description="PyTorch MNIST Example") | ||||
|     parser.add_argument("--batch-size", type=int, default=64, metavar="N", | ||||
|                         help="input batch size for training (default: 64)") | ||||
|     parser.add_argument("--test-batch-size", type=int, default=1000, metavar="N", | ||||
|                         help="input batch size for testing (default: 1000)") | ||||
|     parser.add_argument("--epochs", type=int, default=10, metavar="N", | ||||
|                         help="number of epochs to train (default: 10)") | ||||
|     parser.add_argument("--lr", type=float, default=0.01, metavar="LR", | ||||
|                         help="learning rate (default: 0.01)") | ||||
|     parser.add_argument("--momentum", type=float, default=0.5, metavar="M", | ||||
|                         help="SGD momentum (default: 0.5)") | ||||
|     parser.add_argument("--no-cuda", action="store_true", default=False, | ||||
|                         help="disables CUDA training") | ||||
|     parser.add_argument("--seed", type=int, default=1, metavar="S", | ||||
|                         help="random seed (default: 1)") | ||||
|     parser.add_argument("--log-interval", type=int, default=10, metavar="N", | ||||
|                         help="how many batches to wait before logging training status") | ||||
|     parser.add_argument("--log-path", type=str, default="", | ||||
|                         help="Path to save logs. Print to StdOut if log-path is not set") | ||||
|     parser.add_argument("--save-model", action="store_true", default=False, | ||||
|                         help="For Saving the current Model") | ||||
| 
 | ||||
|     if dist.is_available(): | ||||
|         parser.add_argument("--backend", type=str, help="Distributed backend", | ||||
|                             choices=[dist.Backend.GLOO, | ||||
|                                      dist.Backend.NCCL, dist.Backend.MPI], | ||||
|                             default=dist.Backend.GLOO) | ||||
|     args = parser.parse_args() | ||||
| 
 | ||||
|     # Use this format (%Y-%m-%dT%H:%M:%SZ) to record timestamp of the metrics. | ||||
|     # If log_path is empty print log to StdOut, otherwise print log to the file. | ||||
|     if args.log_path == "": | ||||
|         logging.basicConfig( | ||||
|             format="%(asctime)s %(levelname)-8s %(message)s", | ||||
|             datefmt="%Y-%m-%dT%H:%M:%SZ", | ||||
|             level=logging.DEBUG) | ||||
|     else: | ||||
|         logging.basicConfig( | ||||
|             format="%(asctime)s %(levelname)-8s %(message)s", | ||||
|             datefmt="%Y-%m-%dT%H:%M:%SZ", | ||||
|             level=logging.DEBUG, | ||||
|             filename=args.log_path) | ||||
| 
 | ||||
|     use_cuda = not args.no_cuda and torch.cuda.is_available() | ||||
|     if use_cuda: | ||||
|         print("Using CUDA") | ||||
| 
 | ||||
|     torch.manual_seed(args.seed) | ||||
| 
 | ||||
|     device = torch.device("cuda" if use_cuda else "cpu") | ||||
| 
 | ||||
|     if should_distribute(): | ||||
|         print("Using distributed PyTorch with {} backend".format( | ||||
|             args.backend), flush=True) | ||||
|         dist.init_process_group(backend=args.backend) | ||||
| 
 | ||||
|     kwargs = {"num_workers": 1, "pin_memory": True} if use_cuda else {} | ||||
| 
 | ||||
|     print("Before downloading data", flush=True) | ||||
|     train_data = datasets.FashionMNIST("./data", | ||||
|                             train=True, | ||||
|                             download=True, | ||||
|                             transform=transforms.Compose([ | ||||
|                             transforms.ToTensor() | ||||
|                             ])) | ||||
| 
 | ||||
| 
 | ||||
|     test_data = datasets.FashionMNIST("./data", | ||||
|                             train=True, | ||||
|                             download=True, | ||||
|                             transform=transforms.Compose([ | ||||
|                             transforms.ToTensor() | ||||
|                             ])) | ||||
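|     # When running distributed, shard the training data across ranks with DistributedSampler | ||||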
|     if is_distributed(): | ||||
|         train_sampler = DistributedSampler(train_data, num_replicas=WORLD_SIZE, rank=RANK, shuffle=True, drop_last=False, seed=args.seed) | ||||
|         test_sampler = DistributedSampler(test_data, num_replicas=WORLD_SIZE, rank=RANK, shuffle=True, drop_last=False, seed=args.seed) | ||||
|         train_loader = torch.utils.data.DataLoader(train_data, batch_size=args.batch_size,sampler=train_sampler, **kwargs) | ||||
|         test_loader = torch.utils.data.DataLoader(test_data, batch_size=args.test_batch_size, shuffle=False, **kwargs) | ||||
|     else: | ||||
|         train_loader = torch.utils.data.DataLoader( | ||||
|             train_data, | ||||
|             batch_size=args.batch_size, shuffle=True, **kwargs) | ||||
|         test_loader = torch.utils.data.DataLoader(test_data, | ||||
|         batch_size=args.test_batch_size, shuffle=False, **kwargs) | ||||
| 
 | ||||
|     print("After downloading data", flush=True) | ||||
| 
 | ||||
|     test_loader = torch.utils.data.DataLoader( | ||||
|         datasets.FashionMNIST("./data", | ||||
|                               train=False, | ||||
|                               transform=transforms.Compose([ | ||||
|                                   transforms.ToTensor() | ||||
|                               ])), | ||||
|         batch_size=args.test_batch_size, shuffle=False, **kwargs) | ||||
| 
 | ||||
|     model = Net().to(device) | ||||
| 
 | ||||
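|     # In distributed mode, wrap the model in DistributedDataParallel so gradients are synchronized across ranks | ||||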
|     if is_distributed(): | ||||
|         Distributor = nn.parallel.DistributedDataParallel | ||||
|         model = Distributor(model) | ||||
| 
 | ||||
|     optimizer = optim.SGD(model.parameters(), lr=args.lr, | ||||
|                           momentum=args.momentum) | ||||
| 
 | ||||
| 
 | ||||
|     start = time.perf_counter() | ||||
|     cpu_start = time.process_time() | ||||
| 
 | ||||
|     for epoch in range(1, args.epochs + 1): | ||||
|         train(args, model, device, train_loader, optimizer, epoch) | ||||
|         test(args, model, device, test_loader, epoch) | ||||
| 
 | ||||
|     cpu_end = time.process_time() | ||||
|     end = time.perf_counter() | ||||
|     print("CPU Elapsed time:", cpu_end - cpu_start) | ||||
|     print("Elapsed time:", end - start) | ||||
| 
 | ||||
|     if (args.save_model): | ||||
|         torch.save(model.state_dict(), "mnist_cnn.pt") | ||||
| 
 | ||||
|     if is_distributed(): | ||||
|         dist.destroy_process_group() | ||||
| 
 | ||||
| 
 | ||||
| if __name__ == "__main__": | ||||
|     main() | ||||

demos/pytorch/distributed/pytorch.yaml | 39 (new file)
							| @ -0,0 +1,39 @@ | ||||
| includes: | ||||
|   - base.yaml | ||||
| targets: | ||||
|   - target: /bin | ||||
|     createlinks: | ||||
|       - src: /opt/python-occlum/bin/python3 | ||||
|         linkname: python3 | ||||
|     copy: | ||||
|       - files: | ||||
|           - /opt/occlum/toolchains/busybox/glibc/busybox | ||||
|   # python packages | ||||
|   - target: /opt | ||||
|     copy:  | ||||
|       - dirs: | ||||
|           - ../python-occlum | ||||
|   # python code | ||||
|   - target: / | ||||
|     copy: | ||||
|       - files:  | ||||
|           - ../mnist.py | ||||
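|   # name service libraries needed for hostname/DNS resolution inside the enclave | ||||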
|   - target: /opt/occlum/glibc/lib | ||||
|     copy: | ||||
|       - files: | ||||
|           - /lib/x86_64-linux-gnu/libnss_dns.so.2 | ||||
|           - /lib/x86_64-linux-gnu/libnss_files.so.2 | ||||
|   # etc files | ||||
|   - target: /etc | ||||
|     copy: | ||||
|       - dirs: | ||||
|           - /etc/ssl | ||||
|       - files: | ||||
|           - /etc/nsswitch.conf | ||||
|   # CA files | ||||
|   - target: /ppml/certs/ | ||||
|     copy: | ||||
|       - files: | ||||
|           - ../myCA.pem | ||||
|           - ../test.key | ||||
|           - ../test.crt | ||||

demos/pytorch/standalone/.gitignore | 3 (new file)
							| @ -0,0 +1,3 @@ | ||||
| occlum_instance/ | ||||
| miniconda/ | ||||
| Miniconda3* | ||||
| @ -10,22 +10,22 @@ Use the nn package to define our model as a sequence of layers. nn.Sequential is | ||||
| 
 | ||||
| This tutorial is written under the assumption that you have Docker installed and use Occlum in a Docker container. | ||||
| 
 | ||||
| Occlum is compatible with glibc-supported Python, we employ miniconda as python installation tool. You can import PyTorch packages using conda. Here, miniconda is automatically installed by install_python_with_conda.sh script, the required python and PyTorch packages for this project are also loaded by this script. Here, we take occlum/occlum:0.23.0-ubuntu18.04 as example. | ||||
| Occlum is compatible with glibc-supported Python, so we use Miniconda as the Python installation tool and install the PyTorch packages with conda. Miniconda is installed automatically by the `install_python_with_conda.sh` script, which also installs the Python and PyTorch packages required for this project. Here, we take `occlum/occlum:0.29.3-ubuntu20.04` as an example. | ||||
| 
 | ||||
| Step 1 (on the host): Start an Occlum container | ||||
| ``` | ||||
| docker pull occlum/occlum:0.23.0-ubuntu18.04 | ||||
| docker run -it --name=pythonDemo --device /dev/sgx/enclave occlum/occlum:0.23.0-ubuntu18.04 bash | ||||
| docker pull occlum/occlum:0.29.3-ubuntu20.04 | ||||
| docker run -it --name=pythonDemo --device /dev/sgx/enclave occlum/occlum:0.29.3-ubuntu20.04 bash | ||||
| ``` | ||||
| 
 | ||||
| Step 2 (in the Occlum container): Download miniconda and install python to prefix position. | ||||
| ``` | ||||
| cd /root/demos/pytorch | ||||
| cd /root/demos/pytorch/standalone | ||||
| bash ./install_python_with_conda.sh | ||||
| ``` | ||||
| 
 | ||||
| Step 3 (in the Occlum container): Run the sample code on Occlum | ||||
| ``` | ||||
| cd /root/demos/pytorch | ||||
| cd /root/demos/pytorch/standalone | ||||
| bash ./run_pytorch_on_occlum.sh | ||||
| ``` | ||||