diff --git a/.github/workflows/demo_test.yml b/.github/workflows/demo_test.yml index 5d1a4e96..d375669a 100644 --- a/.github/workflows/demo_test.yml +++ b/.github/workflows/demo_test.yml @@ -276,11 +276,34 @@ jobs: build-envs: 'OCCLUM_RELEASE_BUILD=1' - name: Build python and pytorch - run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch; ./install_python_with_conda.sh" + run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/standalone; ./install_python_with_conda.sh" - name: Run pytorch test - run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch; SGX_MODE=SIM ./run_pytorch_on_occlum.sh" + run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/standalone; SGX_MODE=SIM ./run_pytorch_on_occlum.sh" + Distributed_Pytorch_test: + runs-on: ubuntu-20.04 + steps: + - uses: actions/checkout@v1 + with: + submodules: true + + - uses: ./.github/workflows/composite_action/sim + with: + container-name: ${{ github.job }} + build-envs: 'OCCLUM_RELEASE_BUILD=1' + + - name: Build python and pytorch + run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed; ./install_python_with_conda.sh" + + - name: Build pytorch Occlum instance + run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed; SGX_MODE=SIM ./build_pytorch_occlum_instance.sh" + + - name: Start pytorch Occlum instance node one + run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed/occlum_instance; WORLD_SIZE=2 RANK=0 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model &" + + - name: Start pytorch Occlum instance node two + run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed/occlum_instance_2; WORLD_SIZE=2 RANK=1 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model" Tensorflow_test: runs-on: ubuntu-20.04 diff --git a/demos/README.md b/demos/README.md index 9c1e4671..76f240e6 100644 --- a/demos/README.md +++ b/demos/README.md @@ -22,7 +22,7 @@ This set of demos shows how real-world apps can be easily run inside SGX enclave * [grpc](grpc/): A client and server communicating through [gRPC](https://grpc.io), containing [glibc-supported demo](grpc/grpc_glibc) and [musl-supported demo](grpc/grpc_musl). * [https_server](https_server/): A HTTPS file server based on [Mongoose Embedded Web Server Library](https://github.com/cesanta/mongoose). * [openvino](openvino/) A benchmark of [OpenVINO Inference Engine](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_inference_engine_intro.html). -* [pytorch](pytorch/): A demo of [PyTorch](https://pytorch.org/). +* [pytorch](pytorch/): Demos of standalone and distributed [PyTorch](https://pytorch.org/). * [redis](redis/): A demo of [Redis](https://redis.io). * [sofaboot](sofaboot/): A demo of [SOFABoot](https://github.com/sofastack/sofa-boot), an open source Java development framework based on Spring Boot. * [sqlite](sqlite/) A demo of [SQLite](https://www.sqlite.org) SQL database engine. 
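Editor's note (illustration only, not part of this change): the `Distributed_Pytorch_test` job above coordinates the two ranks purely through the `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE` and `RANK` environment variables that `torch.distributed` reads for its default `env://` rendezvous. A minimal sketch of that mechanism is shown below; the default values are hypothetical placeholders mirroring the ones this demo's Occlum.json later provides.

```python
import os

import torch
import torch.distributed as dist

# torch.distributed's default "env://" init method reads these four variables.
# The CI job sets WORLD_SIZE/RANK per node; MASTER_ADDR/MASTER_PORT come from
# the instance configuration. The setdefault calls only allow running this
# sketch standalone as a single rank.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("WORLD_SIZE", "1")
os.environ.setdefault("RANK", "0")

dist.init_process_group(backend="gloo")  # blocks until WORLD_SIZE ranks have joined

# Tiny sanity check: every rank contributes its rank id; after all_reduce the
# sum is identical on all ranks.
t = torch.tensor([float(dist.get_rank())])
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {dist.get_rank()}/{dist.get_world_size()} -> sum of ranks = {t.item()}")

dist.destroy_process_group()
```

Launching such a script twice with `WORLD_SIZE=2` and `RANK=0`/`RANK=1` (as the workflow does with mnist.py) makes rank 0 wait at `init_process_group` until rank 1 joins.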
diff --git a/demos/pytorch/.gitignore b/demos/pytorch/distributed/.gitignore
similarity index 100%
rename from demos/pytorch/.gitignore
rename to demos/pytorch/distributed/.gitignore
diff --git a/demos/pytorch/distributed/README.md b/demos/pytorch/distributed/README.md
new file mode 100644
index 00000000..2ca35909
--- /dev/null
+++ b/demos/pytorch/distributed/README.md
@@ -0,0 +1,107 @@
+# Distributed PyTorch Demo
+
+This project demonstrates how Occlum enables _unmodified_ distributed [PyTorch](https://pytorch.org/) programs to run in SGX enclaves, on the basis of _unmodified_ [Python](https://www.python.org).
+
+## Environment variables for distributed PyTorch training
+A few environment variables are related to distributed PyTorch training:
+
+1. MASTER_ADDR
+2. MASTER_PORT
+3. WORLD_SIZE
+4. RANK
+
+`MASTER_ADDR` and `MASTER_PORT` specify a rendezvous point that all the training processes connect to.
+
+`WORLD_SIZE` specifies how many training processes participate in the training.
+
+`RANK` is the unique identifier of each training process.
+
+`MASTER_ADDR`, `MASTER_PORT` and `WORLD_SIZE` must be identical for all participants, while `RANK` must be unique to each one.
+
+**Note that in most cases PyTorch only uses multiple threads. If you observe worker processes being forked, please set `num_workers=1`.**
+
+### TLS related environment variables
+There is an environment variable called `GLOO_DEVICE_TRANSPORT` that can be used to specify the transport.
+
+The default value is TCP. If TLS is required to meet your security requirements, please also set the following environment variables:
+
+1. GLOO_DEVICE_TRANSPORT=TCP_TLS
+2. GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY
+3. GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT
+4. GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE
+
+These environment variables are set as below in our demo.
+```json
+    "env": {
+        "default": [
+            "GLOO_DEVICE_TRANSPORT=TCP_TLS",
+            "GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY=/ppml/certs/test.key",
+            "GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT=/ppml/certs/test.crt",
+            "GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE=/ppml/certs/myCA.pem",
+```
+
+The CA files above are generated by openssl. For details, please refer to the function **generate_ca_files** in the script [`build_pytorch_occlum_instance.sh`](./build_pytorch_occlum_instance.sh).
+
+## How to Run
+
+This tutorial is written under the assumption that you have Docker installed and use Occlum in a Docker container.
+
+Occlum is compatible with glibc-supported Python, so we use miniconda as the Python installation tool. You can import PyTorch packages using conda. Here, miniconda is automatically installed by the install_python_with_conda.sh script, which also installs the Python and PyTorch packages required by this project. We take occlum/occlum:0.29.3-ubuntu20.04 as an example.
+
+In the following example, we run distributed PyTorch training on the `Fashion-MNIST` dataset with 2 processes (Occlum instances).
+
+Thus, we set `WORLD_SIZE` to 2.
+
+Generally, `MASTER_ADDR` can be set to the IP address of the process with RANK 0. In our case, both processes run in the same container, so `MASTER_ADDR` can simply be set to `localhost`.
+
+Step 1 (on the host): Start an Occlum container
+```bash
+docker pull occlum/occlum:0.29.3-ubuntu20.04
+docker run -it --name=pythonDemo --device /dev/sgx/enclave occlum/occlum:0.29.3-ubuntu20.04 bash
+```
+
+Step 2 (in the Occlum container): Download miniconda and install Python to the prefix position.
+```bash
+cd /root/demos/pytorch/distributed
+bash ./install_python_with_conda.sh
+```
+
+Step 3 (in the Occlum container): Build the distributed PyTorch Occlum instances
+```bash
+cd /root/demos/pytorch/distributed
+bash ./build_pytorch_occlum_instance.sh
+```
+
+Step 4 (in the Occlum container): Run the node one PyTorch instance
+```bash
+cd /root/demos/pytorch/distributed/occlum_instance
+WORLD_SIZE=2 RANK=0 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
+```
+
+If successful, node one waits for node two to join.
+```log
+Using distributed PyTorch with gloo backend
+```
+
+Step 5 (in the Occlum container): Run the node two PyTorch instance
+```bash
+cd /root/demos/pytorch/distributed/occlum_instance_2
+WORLD_SIZE=2 RANK=1 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
+```
+
+If everything goes well, nodes one and two produce logs similar to the ones below.
+```log
+After downloading data
+2022-12-05T09:40:05Z INFO Train Epoch: 1 [0/469 (0%)] loss=2.3037
+2022-12-05T09:40:05Z INFO Reducer buckets have been rebuilt in this iteration.
+2022-12-05T09:40:06Z INFO Train Epoch: 1 [10/469 (2%)] loss=2.3117
+2022-12-05T09:40:06Z INFO Train Epoch: 1 [20/469 (4%)] loss=2.2826
+2022-12-05T09:40:06Z INFO Train Epoch: 1 [30/469 (6%)] loss=2.2904
+2022-12-05T09:40:07Z INFO Train Epoch: 1 [40/469 (9%)] loss=2.2860
+2022-12-05T09:40:07Z INFO Train Epoch: 1 [50/469 (11%)] loss=2.2784
+2022-12-05T09:40:08Z INFO Train Epoch: 1 [60/469 (13%)] loss=2.2779
+2022-12-05T09:40:08Z INFO Train Epoch: 1 [70/469 (15%)] loss=2.2689
+2022-12-05T09:40:08Z INFO Train Epoch: 1 [80/469 (17%)] loss=2.2513
+2022-12-05T09:40:09Z INFO Train Epoch: 1 [90/469 (19%)] loss=2.2536
+...
+```
diff --git a/demos/pytorch/distributed/build_pytorch_occlum_instance.sh b/demos/pytorch/distributed/build_pytorch_occlum_instance.sh
new file mode 100755
index 00000000..caed2684
--- /dev/null
+++ b/demos/pytorch/distributed/build_pytorch_occlum_instance.sh
@@ -0,0 +1,55 @@
+#!/bin/bash
+set -e
+
+BLUE='\033[1;34m'
+NC='\033[0m'
+
+script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
+python_dir="$script_dir/occlum_instance/image/opt/python-occlum"
+
+
+function generate_ca_files()
+{
+    cn_name=${1:-"localhost"}
+    # Generate CA files
+    openssl req -x509 -nodes -days 1825 -newkey rsa:2048 -keyout myCA.key -out myCA.pem -subj "/CN=${cn_name}"
+    # Prepare test private key
+    openssl genrsa -out test.key 2048
+    # Use private key to generate a Certificate Sign Request
+    openssl req -new -key test.key -out test.csr -subj "/C=CN/ST=Shanghai/L=Shanghai/O=Ant/CN=${cn_name}"
+    # Use CA private key and CA file to sign test CSR
+    openssl x509 -req -in test.csr -CA myCA.pem -CAkey myCA.key -CAcreateserial -out test.crt -days 825 -sha256
+}
+
+function build_instance()
+{
+    rm -rf occlum_instance* && occlum new occlum_instance
+    pushd occlum_instance
+    rm -rf image
+    copy_bom -f ../pytorch.yaml --root image --include-dir /opt/occlum/etc/template
+
+    if [ !
-d $python_dir ];then + echo "Error: cannot stat '$python_dir' directory" + exit 1 + fi + + new_json="$(jq '.resource_limits.user_space_size = "4000MB" | + .resource_limits.kernel_space_heap_size = "256MB" | + .resource_limits.max_num_of_threads = 64 | + .env.untrusted += [ "MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK", "TORCH_CPP_LOG_LEVEL" ] | + .env.default += ["GLOO_DEVICE_TRANSPORT=TCP_TLS"] | + .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY=/ppml/certs/test.key"] | + .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT=/ppml/certs/test.crt"] | + .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE=/ppml/certs/myCA.pem"] | + .env.default += ["PYTHONHOME=/opt/python-occlum"] | + .env.default += [ "MASTER_ADDR=127.0.0.1", "MASTER_PORT=29500" ] ' Occlum.json)" && \ + echo "${new_json}" > Occlum.json + occlum build + popd +} + +generate_ca_files +build_instance + +# Test instance for 2 nodes distributed pytorch training +cp -r occlum_instance occlum_instance_2 diff --git a/demos/pytorch/distributed/install_python_with_conda.sh b/demos/pytorch/distributed/install_python_with_conda.sh new file mode 100755 index 00000000..1e910e54 --- /dev/null +++ b/demos/pytorch/distributed/install_python_with_conda.sh @@ -0,0 +1,10 @@ +#!/bin/bash +set -e +script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" + +# Install python and dependencies to specified position +[ -f Miniconda3-latest-Linux-x86_64.sh ] || wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh +[ -d miniconda ] || bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $script_dir/miniconda +$script_dir/miniconda/bin/conda create --prefix $script_dir/python-occlum -y \ + python=3.8.10 numpy=1.21.5 scipy=1.7.3 scikit-learn=1.0 pandas=1.3 \ + Cython pytorch torchvision -c pytorch \ No newline at end of file diff --git a/demos/pytorch/distributed/mnist.py b/demos/pytorch/distributed/mnist.py new file mode 100644 index 00000000..30a543f0 --- /dev/null +++ b/demos/pytorch/distributed/mnist.py @@ -0,0 +1,210 @@ +from __future__ import print_function + +import argparse +import logging +import os +import time + +from torchvision import datasets, transforms +from torch.utils.data.distributed import DistributedSampler +import torch +import torch.distributed as dist +import torch.nn as nn +import torch.nn.functional as F +import torch.optim as optim + +WORLD_SIZE = int(os.environ.get("WORLD_SIZE", 1)) + +RANK = int(os.environ.get("RANK", 0)) + +class Net(nn.Module): + def __init__(self): + super(Net, self).__init__() + self.conv1 = nn.Conv2d(1, 20, 5, 1) + self.conv2 = nn.Conv2d(20, 50, 5, 1) + self.fc1 = nn.Linear(4*4*50, 500) + self.fc2 = nn.Linear(500, 10) + + def forward(self, x): + x = F.relu(self.conv1(x)) + x = F.max_pool2d(x, 2, 2) + x = F.relu(self.conv2(x)) + x = F.max_pool2d(x, 2, 2) + x = x.view(-1, 4*4*50) + x = F.relu(self.fc1(x)) + x = self.fc2(x) + return F.log_softmax(x, dim=1) + + +def train(args, model, device, train_loader, optimizer, epoch): + model.train() + for batch_idx, (data, target) in enumerate(train_loader): + data, target = data.to(device), target.to(device) + optimizer.zero_grad() + output = model(data) + loss = F.nll_loss(output, target) + loss.backward() + optimizer.step() + if batch_idx % args.log_interval == 0: + msg = "Train Epoch: {} [{}/{} ({:.0f}%)]\tloss={:.4f}".format( + epoch, batch_idx, len(train_loader), + 100. 
* batch_idx / len(train_loader), loss.item()) + logging.info(msg) + niter = epoch * len(train_loader) + batch_idx + + +def test(args, model, device, test_loader, epoch): + model.eval() + test_loss = 0 + correct = 0 + with torch.no_grad(): + for data, target in test_loader: + data, target = data.to(device), target.to(device) + output = model(data) + # sum up batch loss + test_loss += F.nll_loss(output, target, reduction="sum").item() + # get the index of the max log-probability + pred = output.max(1, keepdim=True)[1] + correct += pred.eq(target.view_as(pred)).sum().item() + + test_loss /= len(test_loader.dataset) + logging.info("{{metricName: accuracy, metricValue: {:.4f}}};{{metricName: loss, metricValue: {:.4f}}}\n".format( + float(correct) / (len(test_loader.dataset) / WORLD_SIZE), test_loss)) + + +def should_distribute(): + return dist.is_available() and WORLD_SIZE > 1 + + +def is_distributed(): + return dist.is_available() and dist.is_initialized() + + +def main(): + # Training settings + parser = argparse.ArgumentParser(description="PyTorch MNIST Example") + parser.add_argument("--batch-size", type=int, default=64, metavar="N", + help="input batch size for training (default: 64)") + parser.add_argument("--test-batch-size", type=int, default=1000, metavar="N", + help="input batch size for testing (default: 1000)") + parser.add_argument("--epochs", type=int, default=10, metavar="N", + help="number of epochs to train (default: 10)") + parser.add_argument("--lr", type=float, default=0.01, metavar="LR", + help="learning rate (default: 0.01)") + parser.add_argument("--momentum", type=float, default=0.5, metavar="M", + help="SGD momentum (default: 0.5)") + parser.add_argument("--no-cuda", action="store_true", default=False, + help="disables CUDA training") + parser.add_argument("--seed", type=int, default=1, metavar="S", + help="random seed (default: 1)") + parser.add_argument("--log-interval", type=int, default=10, metavar="N", + help="how many batches to wait before logging training status") + parser.add_argument("--log-path", type=str, default="", + help="Path to save logs. Print to StdOut if log-path is not set") + parser.add_argument("--save-model", action="store_true", default=False, + help="For Saving the current Model") + + if dist.is_available(): + parser.add_argument("--backend", type=str, help="Distributed backend", + choices=[dist.Backend.GLOO, + dist.Backend.NCCL, dist.Backend.MPI], + default=dist.Backend.GLOO) + args = parser.parse_args() + + # Use this format (%Y-%m-%dT%H:%M:%SZ) to record timestamp of the metrics. + # If log_path is empty print log to StdOut, otherwise print log to the file. 
+ if args.log_path == "": + logging.basicConfig( + format="%(asctime)s %(levelname)-8s %(message)s", + datefmt="%Y-%m-%dT%H:%M:%SZ", + level=logging.DEBUG) + else: + logging.basicConfig( + format="%(asctime)s %(levelname)-8s %(message)s", + datefmt="%Y-%m-%dT%H:%M:%SZ", + level=logging.DEBUG, + filename=args.log_path) + + use_cuda = not args.no_cuda and torch.cuda.is_available() + if use_cuda: + print("Using CUDA") + + torch.manual_seed(args.seed) + + device = torch.device("cuda" if use_cuda else "cpu") + + if should_distribute(): + print("Using distributed PyTorch with {} backend".format( + args.backend), flush=True) + dist.init_process_group(backend=args.backend) + + kwargs = {"num_workers": 1, "pin_memory": True} if use_cuda else {} + + print("Before downloading data", flush=True) + train_data = datasets.FashionMNIST("./data", + train=True, + download=True, + transform=transforms.Compose([ + transforms.ToTensor() + ])) + + + test_data = datasets.FashionMNIST("./data", + train=True, + download=True, + transform=transforms.Compose([ + transforms.ToTensor() + ])) + if is_distributed(): + train_sampler = DistributedSampler(train_data, num_replicas=WORLD_SIZE, rank=RANK, shuffle=True, drop_last=False, seed=args.seed) + test_sampler = DistributedSampler(test_data, num_replicas=WORLD_SIZE, rank=RANK, shuffle=True, drop_last=False, seed=args.seed) + train_loader = torch.utils.data.DataLoader(train_data, batch_size=args.batch_size,sampler=train_sampler, **kwargs) + test_loader = torch.utils.data.DataLoader(test_data, batch_size=args.test_batch_size, shuffle=False, **kwargs) + else: + train_loader = torch.utils.data.DataLoader( + train_data, + batch_size=args.batch_size, shuffle=True, **kwargs) + test_loader = torch.utils.data.DataLoader(test_data, + batch_size=args.test_batch_size, shuffle=False, **kwargs) + + print("After downloading data", flush=True) + + test_loader = torch.utils.data.DataLoader( + datasets.FashionMNIST("./data", + train=False, + transform=transforms.Compose([ + transforms.ToTensor() + ])), + batch_size=args.test_batch_size, shuffle=False, **kwargs) + + model = Net().to(device) + + if is_distributed(): + Distributor = nn.parallel.DistributedDataParallel + model = Distributor(model) + + optimizer = optim.SGD(model.parameters(), lr=args.lr, + momentum=args.momentum) + + + start = time.perf_counter() + cpu_start = time.process_time() + + for epoch in range(1, args.epochs + 1): + train(args, model, device, train_loader, optimizer, epoch) + test(args, model, device, test_loader, epoch) + + cpu_end = time.process_time() + end = time.perf_counter() + print("CPU Elapsed time:", cpu_end - cpu_start) + print("Elapsed time:", end - start) + + if (args.save_model): + torch.save(model.state_dict(), "mnist_cnn.pt") + + if is_distributed(): + dist.destroy_process_group() + + +if __name__ == "__main__": + main() diff --git a/demos/pytorch/distributed/pytorch.yaml b/demos/pytorch/distributed/pytorch.yaml new file mode 100644 index 00000000..7c99be2e --- /dev/null +++ b/demos/pytorch/distributed/pytorch.yaml @@ -0,0 +1,39 @@ +includes: + - base.yaml +targets: + - target: /bin + createlinks: + - src: /opt/python-occlum/bin/python3 + linkname: python3 + copy: + - files: + - /opt/occlum/toolchains/busybox/glibc/busybox + # python packages + - target: /opt + copy: + - dirs: + - ../python-occlum + # python code + - target: / + copy: + - files: + - ../mnist.py + - target: /opt/occlum/glibc/lib + copy: + - files: + - /lib/x86_64-linux-gnu/libnss_dns.so.2 + - /lib/x86_64-linux-gnu/libnss_files.so.2 + 
# etc files
+  - target: /etc
+    copy:
+      - dirs:
+          - /etc/ssl
+      - files:
+          - /etc/nsswitch.conf
+  # CA files
+  - target: /ppml/certs/
+    copy:
+      - files:
+          - ../myCA.pem
+          - ../test.key
+          - ../test.crt
diff --git a/demos/pytorch/standalone/.gitignore b/demos/pytorch/standalone/.gitignore
new file mode 100644
index 00000000..4ba7c712
--- /dev/null
+++ b/demos/pytorch/standalone/.gitignore
@@ -0,0 +1,3 @@
+occlum_instance/
+miniconda/
+Miniconda3*
diff --git a/demos/pytorch/README.md b/demos/pytorch/standalone/README.md
similarity index 87%
rename from demos/pytorch/README.md
rename to demos/pytorch/standalone/README.md
index b05eb2d6..8c3c535d 100644
--- a/demos/pytorch/README.md
+++ b/demos/pytorch/standalone/README.md
@@ -10,22 +10,22 @@ Use the nn package to define our model as a sequence of layers. nn.Sequential is
 
 This tutorial is written under the assumption that you have Docker installed and use Occlum in a Docker container.
 
-Occlum is compatible with glibc-supported Python, we employ miniconda as python installation tool. You can import PyTorch packages using conda. Here, miniconda is automatically installed by install_python_with_conda.sh script, the required python and PyTorch packages for this project are also loaded by this script. Here, we take occlum/occlum:0.23.0-ubuntu18.04 as example.
+Occlum is compatible with glibc-supported Python, we employ miniconda as python installation tool. You can import PyTorch packages using conda. Here, miniconda is automatically installed by install_python_with_conda.sh script, the required python and PyTorch packages for this project are also loaded by this script. Here, we take occlum/occlum:0.29.3-ubuntu20.04 as example.
 
 Step 1 (on the host): Start an Occlum container
 ```
-docker pull occlum/occlum:0.23.0-ubuntu18.04
-docker run -it --name=pythonDemo --device /dev/sgx/enclave occlum/occlum:0.23.0-ubuntu18.04 bash
+docker pull occlum/occlum:0.29.3-ubuntu20.04
+docker run -it --name=pythonDemo --device /dev/sgx/enclave occlum/occlum:0.29.3-ubuntu20.04 bash
 ```
 
 Step 2 (in the Occlum container): Download miniconda and install python to prefix position.
 ```
-cd /root/demos/pytorch
+cd /root/demos/pytorch/standalone
 bash ./install_python_with_conda.sh
 ```
 
 Step 3 (in the Occlum container): Run the sample code on Occlum
 ```
-cd /root/demos/pytorch
+cd /root/demos/pytorch/standalone
 bash ./run_pytorch_on_occlum.sh
 ```
diff --git a/demos/pytorch/demo.py b/demos/pytorch/standalone/demo.py
similarity index 100%
rename from demos/pytorch/demo.py
rename to demos/pytorch/standalone/demo.py
diff --git a/demos/pytorch/install_python_with_conda.sh b/demos/pytorch/standalone/install_python_with_conda.sh
similarity index 100%
rename from demos/pytorch/install_python_with_conda.sh
rename to demos/pytorch/standalone/install_python_with_conda.sh
diff --git a/demos/pytorch/pytorch.yaml b/demos/pytorch/standalone/pytorch.yaml
similarity index 100%
rename from demos/pytorch/pytorch.yaml
rename to demos/pytorch/standalone/pytorch.yaml
diff --git a/demos/pytorch/run_pytorch_on_occlum.sh b/demos/pytorch/standalone/run_pytorch_on_occlum.sh
similarity index 100%
rename from demos/pytorch/run_pytorch_on_occlum.sh
rename to demos/pytorch/standalone/run_pytorch_on_occlum.sh
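Editor's note (illustration only, not part of this change): mnist.py above relies on `DistributedSampler` to give each of the two Occlum instances a disjoint shard of Fashion-MNIST, which is why each rank's log shows 469 training batches instead of 938. The following standalone sketch uses a ten-sample `TensorDataset` as a stand-in for the real dataset to make the sharding visible; the sampler arguments mirror the ones mnist.py passes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Stand-in dataset: ten samples whose values equal their indices,
# so the per-rank shards are easy to read off.
data = TensorDataset(torch.arange(10))

world_size = 2
for rank in range(world_size):
    # Same options mnist.py uses: shuffle with a fixed seed, no drop_last.
    sampler = DistributedSampler(data, num_replicas=world_size, rank=rank,
                                 shuffle=True, drop_last=False, seed=42)
    loader = DataLoader(data, batch_size=5, sampler=sampler)
    seen = sorted(int(v) for (batch,) in loader for v in batch)
    print(f"rank {rank} trains on samples {seen}")
```

Because both ranks shuffle with the same seed, the two shards never overlap and their union is the whole dataset, so the two instances together cover every training sample exactly once per epoch.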