[demos] Add distributed pytorch demo

Zheng, Qi 2022-12-02 18:24:25 +08:00 committed by volcano
parent a5cdcc8045
commit 47bd1fd7af
14 changed files with 455 additions and 8 deletions

@ -276,11 +276,34 @@ jobs:
        build-envs: 'OCCLUM_RELEASE_BUILD=1'
    - name: Build python and pytorch
      run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/standalone; ./install_python_with_conda.sh"
    - name: Run pytorch test
      run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/standalone; SGX_MODE=SIM ./run_pytorch_on_occlum.sh"

  Distributed_Pytorch_test:
    runs-on: ubuntu-20.04
    steps:
    - uses: actions/checkout@v1
      with:
        submodules: true
    - uses: ./.github/workflows/composite_action/sim
      with:
        container-name: ${{ github.job }}
        build-envs: 'OCCLUM_RELEASE_BUILD=1'
    - name: Build python and pytorch
      run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed; ./install_python_with_conda.sh"
    - name: Build pytorch Occlum instance
      run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed; SGX_MODE=SIM ./build_pytorch_occlum_instance.sh"
    - name: Start pytorch Occlum instance node one
      run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed/occlum_instance; WORLD_SIZE=2 RANK=0 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model &"
    - name: Start pytorch Occlum instance node two
      run: docker exec ${{ github.job }} bash -c "cd /root/occlum/demos/pytorch/distributed/occlum_instance_2; WORLD_SIZE=2 RANK=1 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model"

  Tensorflow_test:
    runs-on: ubuntu-20.04

@ -22,7 +22,7 @@ This set of demos shows how real-world apps can be easily run inside SGX enclave
* [grpc](grpc/): A client and server communicating through [gRPC](https://grpc.io), containing [glibc-supported demo](grpc/grpc_glibc) and [musl-supported demo](grpc/grpc_musl).
* [https_server](https_server/): A HTTPS file server based on [Mongoose Embedded Web Server Library](https://github.com/cesanta/mongoose).
* [openvino](openvino/) A benchmark of [OpenVINO Inference Engine](https://docs.openvinotoolkit.org/2019_R3/_docs_IE_DG_inference_engine_intro.html).
* [pytorch](pytorch/): Demos of standalone and distributed [PyTorch](https://pytorch.org/).
* [redis](redis/): A demo of [Redis](https://redis.io).
* [sofaboot](sofaboot/): A demo of [SOFABoot](https://github.com/sofastack/sofa-boot), an open source Java development framework based on Spring Boot.
* [sqlite](sqlite/) A demo of [SQLite](https://www.sqlite.org) SQL database engine.

@ -0,0 +1,107 @@
# Distributed PyTorch Demo
This project demonstrates how Occlum enables _unmodified_ distributed [PyTorch](https://pytorch.org/) programs to run in SGX enclaves, on top of an _unmodified_ [Python](https://www.python.org).

## Environment variables for distributed PyTorch training

A few environment variables control distributed PyTorch training:

1. MASTER_ADDR
2. MASTER_PORT
3. WORLD_SIZE
4. RANK

`MASTER_ADDR` and `MASTER_PORT` specify the rendezvous point that all training processes connect to.
`WORLD_SIZE` specifies how many training processes participate in the training.
`RANK` is the unique identifier of each training process.

`MASTER_ADDR`, `MASTER_PORT` and `WORLD_SIZE` must be identical for all participants, while `RANK` must be unique to each process.

**Note that in most cases PyTorch only uses multiple threads. If you observe worker processes being forked, please set `num_workers=1` for the data loader.**
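
For illustration only, the minimal sketch below (not part of the demo; `mnist.py` does the equivalent) shows how a process joins the training group once these variables are exported, relying on PyTorch's default `env://` initialization method:

```python
import torch.distributed as dist

# MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK are read from the environment
# by the default "env://" init method.
dist.init_process_group(backend="gloo")
print("Joined as rank {} of {}".format(dist.get_rank(), dist.get_world_size()))
dist.destroy_process_group()
```
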
### TLS-related environment variables

The environment variable `GLOO_DEVICE_TRANSPORT` can be used to specify the Gloo transport.
Its default value is TCP. If TLS is required to meet your security requirements, please also set the following environment variables:

1. GLOO_DEVICE_TRANSPORT=TCP_TLS
2. GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY
3. GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT
4. GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE

In our demo, these environment variables are set in `Occlum.json` as below.
```json
"env": {
    "default": [
        "GLOO_DEVICE_TRANSPORT=TCP_TLS",
        "GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY=/ppml/certs/test.key",
        "GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT=/ppml/certs/test.crt",
        "GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE=/ppml/certs/myCA.pem",
        ...
    ]
}
```
The certificate and key files above are generated with openssl. For details, please refer to the function **generate_ca_files** in the script [`build_pytorch_occlum_instance.sh`](./build_pytorch_occlum_instance.sh).
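
As an optional sanity check (assuming the files were generated in the demo directory), the certificate can be verified against the demo CA with openssl:

```bash
# Check that test.crt is signed by myCA.pem and inspect its subject and validity period
openssl verify -CAfile myCA.pem test.crt
openssl x509 -in test.crt -noout -subject -dates
```
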
## How to Run

This tutorial is written under the assumption that you have Docker installed and use Occlum in a Docker container.

Occlum is compatible with glibc-supported Python, so we use Miniconda to install Python and install the PyTorch packages with conda. Miniconda is installed automatically by the `install_python_with_conda.sh` script, which also installs the Python and PyTorch packages required for this project. We take occlum/occlum:0.29.3-ubuntu20.04 as the example image.

In the following example, we run a distributed PyTorch training on the `Fashion-MNIST` dataset with 2 processes (Occlum instances), so `WORLD_SIZE` is set to 2.

Generally, `MASTER_ADDR` is set to the IP address of the process with `RANK` 0. In our case, both processes run in the same container, so `MASTER_ADDR` can simply be set to `localhost`.
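
For a hypothetical run across two hosts (not covered by this demo), the rendezvous variables could instead be overridden at launch time, since `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE` and `RANK` are whitelisted in `env.untrusted` by the build script; `<rank-0-host-ip>` below is a placeholder:

```bash
# Hypothetical two-host launch for the RANK-1 node; the RANK-0 host must be reachable from both nodes.
MASTER_ADDR=<rank-0-host-ip> MASTER_PORT=29500 WORLD_SIZE=2 RANK=1 \
    occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
```
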
Step 1 (on the host): Start an Occlum container
```bash
docker pull occlum/occlum:0.29.3-ubuntu20.04
docker run -it --name=pythonDemo --device /dev/sgx/enclave occlum/occlum:0.29.3-ubuntu20.04 bash
```
Step 2 (in the Occlum container): Download Miniconda and install Python to the prefix position.
```bash
cd /root/demos/pytorch/distributed
bash ./install_python_with_conda.sh
```
Step 3 (in the Occlum container): Build the Distributed PyTorch Occlum instances
```bash
cd /root/demos/pytorch/distributed
bash ./build_pytorch_occlum_instance.sh
```
Step 4 (in the Occlum container): Run node one PyTorch instance
```bash
cd /root/demos/pytorch/distributed/occlum_instance
WORLD_SIZE=2 RANK=0 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
```
If successful, node one waits for node two to join and prints:
```log
Using distributed PyTorch with gloo backend
```
Step 5 (in the Occlum container): Run node two PyTorch instance
```bash
cd /root/demos/pytorch/distributed/occlum_instance_2
WORLD_SIZE=2 RANK=1 occlum run /bin/python3 mnist.py --epoch 3 --no-cuda --seed 42 --save-model
```
If everything goes well, both nodes print logs similar to the ones below.
```log
After downloading data
2022-12-05T09:40:05Z INFO Train Epoch: 1 [0/469 (0%)] loss=2.3037
2022-12-05T09:40:05Z INFO Reducer buckets have been rebuilt in this iteration.
2022-12-05T09:40:06Z INFO Train Epoch: 1 [10/469 (2%)] loss=2.3117
2022-12-05T09:40:06Z INFO Train Epoch: 1 [20/469 (4%)] loss=2.2826
2022-12-05T09:40:06Z INFO Train Epoch: 1 [30/469 (6%)] loss=2.2904
2022-12-05T09:40:07Z INFO Train Epoch: 1 [40/469 (9%)] loss=2.2860
2022-12-05T09:40:07Z INFO Train Epoch: 1 [50/469 (11%)] loss=2.2784
2022-12-05T09:40:08Z INFO Train Epoch: 1 [60/469 (13%)] loss=2.2779
2022-12-05T09:40:08Z INFO Train Epoch: 1 [70/469 (15%)] loss=2.2689
2022-12-05T09:40:08Z INFO Train Epoch: 1 [80/469 (17%)] loss=2.2513
2022-12-05T09:40:09Z INFO Train Epoch: 1 [90/469 (19%)] loss=2.2536
...
```

@ -0,0 +1,55 @@
#!/bin/bash
set -e

BLUE='\033[1;34m'
NC='\033[0m'

script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
python_dir="$script_dir/occlum_instance/image/opt/python-occlum"

function generate_ca_files()
{
    cn_name=${1:-"localhost"}
    # Generate CA files
    openssl req -x509 -nodes -days 1825 -newkey rsa:2048 -keyout myCA.key -out myCA.pem -subj "/CN=${cn_name}"
    # Prepare test private key
    openssl genrsa -out test.key 2048
    # Use private key to generate a Certificate Sign Request
    openssl req -new -key test.key -out test.csr -subj "/C=CN/ST=Shanghai/L=Shanghai/O=Ant/CN=${cn_name}"
    # Use CA private key and CA file to sign test CSR
    openssl x509 -req -in test.csr -CA myCA.pem -CAkey myCA.key -CAcreateserial -out test.crt -days 825 -sha256
}

function build_instance()
{
    rm -rf occlum_instance* && occlum new occlum_instance
    pushd occlum_instance
    rm -rf image
    copy_bom -f ../pytorch.yaml --root image --include-dir /opt/occlum/etc/template

    if [ ! -d $python_dir ];then
        echo "Error: cannot stat '$python_dir' directory"
        exit 1
    fi

    new_json="$(jq '.resource_limits.user_space_size = "4000MB" |
        .resource_limits.kernel_space_heap_size = "256MB" |
        .resource_limits.max_num_of_threads = 64 |
        .env.untrusted += [ "MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK", "TORCH_CPP_LOG_LEVEL" ] |
        .env.default += ["GLOO_DEVICE_TRANSPORT=TCP_TLS"] |
        .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_PKEY=/ppml/certs/test.key"] |
        .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_CERT=/ppml/certs/test.crt"] |
        .env.default += ["GLOO_DEVICE_TRANSPORT_TCP_TLS_CA_FILE=/ppml/certs/myCA.pem"] |
        .env.default += ["PYTHONHOME=/opt/python-occlum"] |
        .env.default += [ "MASTER_ADDR=127.0.0.1", "MASTER_PORT=29500" ] ' Occlum.json)" && \
    echo "${new_json}" > Occlum.json

    occlum build
    popd
}

generate_ca_files
build_instance

# Test instance for 2 nodes distributed pytorch training
cp -r occlum_instance occlum_instance_2

@ -0,0 +1,10 @@
#!/bin/bash
set -e
script_dir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# Install python and dependencies to specified position
[ -f Miniconda3-latest-Linux-x86_64.sh ] || wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
[ -d miniconda ] || bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $script_dir/miniconda
$script_dir/miniconda/bin/conda create --prefix $script_dir/python-occlum -y \
    python=3.8.10 numpy=1.21.5 scipy=1.7.3 scikit-learn=1.0 pandas=1.3 \
    Cython pytorch torchvision -c pytorch

@ -0,0 +1,210 @@
from __future__ import print_function

import argparse
import logging
import os
import time

from torchvision import datasets, transforms
from torch.utils.data.distributed import DistributedSampler
import torch
import torch.distributed as dist
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

WORLD_SIZE = int(os.environ.get("WORLD_SIZE", 1))
RANK = int(os.environ.get("RANK", 0))


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)


def train(args, model, device, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % args.log_interval == 0:
            msg = "Train Epoch: {} [{}/{} ({:.0f}%)]\tloss={:.4f}".format(
                epoch, batch_idx, len(train_loader),
                100. * batch_idx / len(train_loader), loss.item())
            logging.info(msg)
            niter = epoch * len(train_loader) + batch_idx


def test(args, model, device, test_loader, epoch):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            # sum up batch loss
            test_loss += F.nll_loss(output, target, reduction="sum").item()
            # get the index of the max log-probability
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    logging.info("{{metricName: accuracy, metricValue: {:.4f}}};{{metricName: loss, metricValue: {:.4f}}}\n".format(
        float(correct) / (len(test_loader.dataset) / WORLD_SIZE), test_loss))


def should_distribute():
    return dist.is_available() and WORLD_SIZE > 1


def is_distributed():
    return dist.is_available() and dist.is_initialized()


def main():
    # Training settings
    parser = argparse.ArgumentParser(description="PyTorch MNIST Example")
    parser.add_argument("--batch-size", type=int, default=64, metavar="N",
                        help="input batch size for training (default: 64)")
    parser.add_argument("--test-batch-size", type=int, default=1000, metavar="N",
                        help="input batch size for testing (default: 1000)")
    parser.add_argument("--epochs", type=int, default=10, metavar="N",
                        help="number of epochs to train (default: 10)")
    parser.add_argument("--lr", type=float, default=0.01, metavar="LR",
                        help="learning rate (default: 0.01)")
    parser.add_argument("--momentum", type=float, default=0.5, metavar="M",
                        help="SGD momentum (default: 0.5)")
    parser.add_argument("--no-cuda", action="store_true", default=False,
                        help="disables CUDA training")
    parser.add_argument("--seed", type=int, default=1, metavar="S",
                        help="random seed (default: 1)")
    parser.add_argument("--log-interval", type=int, default=10, metavar="N",
                        help="how many batches to wait before logging training status")
    parser.add_argument("--log-path", type=str, default="",
                        help="Path to save logs. Print to StdOut if log-path is not set")
    parser.add_argument("--save-model", action="store_true", default=False,
                        help="For Saving the current Model")

    if dist.is_available():
        parser.add_argument("--backend", type=str, help="Distributed backend",
                            choices=[dist.Backend.GLOO,
                                     dist.Backend.NCCL, dist.Backend.MPI],
                            default=dist.Backend.GLOO)
    args = parser.parse_args()

    # Use this format (%Y-%m-%dT%H:%M:%SZ) to record timestamp of the metrics.
    # If log_path is empty print log to StdOut, otherwise print log to the file.
    if args.log_path == "":
        logging.basicConfig(
            format="%(asctime)s %(levelname)-8s %(message)s",
            datefmt="%Y-%m-%dT%H:%M:%SZ",
            level=logging.DEBUG)
    else:
        logging.basicConfig(
            format="%(asctime)s %(levelname)-8s %(message)s",
            datefmt="%Y-%m-%dT%H:%M:%SZ",
            level=logging.DEBUG,
            filename=args.log_path)

    use_cuda = not args.no_cuda and torch.cuda.is_available()
    if use_cuda:
        print("Using CUDA")

    torch.manual_seed(args.seed)

    device = torch.device("cuda" if use_cuda else "cpu")

    if should_distribute():
        print("Using distributed PyTorch with {} backend".format(
            args.backend), flush=True)
        dist.init_process_group(backend=args.backend)

    kwargs = {"num_workers": 1, "pin_memory": True} if use_cuda else {}

    print("Before downloading data", flush=True)
    train_data = datasets.FashionMNIST("./data",
                                       train=True,
                                       download=True,
                                       transform=transforms.Compose([
                                           transforms.ToTensor()
                                       ]))
    test_data = datasets.FashionMNIST("./data",
                                      train=True,
                                      download=True,
                                      transform=transforms.Compose([
                                          transforms.ToTensor()
                                      ]))

    if is_distributed():
        train_sampler = DistributedSampler(train_data, num_replicas=WORLD_SIZE, rank=RANK, shuffle=True, drop_last=False, seed=args.seed)
        test_sampler = DistributedSampler(test_data, num_replicas=WORLD_SIZE, rank=RANK, shuffle=True, drop_last=False, seed=args.seed)
        train_loader = torch.utils.data.DataLoader(train_data, batch_size=args.batch_size, sampler=train_sampler, **kwargs)
        test_loader = torch.utils.data.DataLoader(test_data, batch_size=args.test_batch_size, shuffle=False, **kwargs)
    else:
        train_loader = torch.utils.data.DataLoader(
            train_data,
            batch_size=args.batch_size, shuffle=True, **kwargs)
        test_loader = torch.utils.data.DataLoader(test_data,
            batch_size=args.test_batch_size, shuffle=False, **kwargs)
    print("After downloading data", flush=True)

    test_loader = torch.utils.data.DataLoader(
        datasets.FashionMNIST("./data",
                              train=False,
                              transform=transforms.Compose([
                                  transforms.ToTensor()
                              ])),
        batch_size=args.test_batch_size, shuffle=False, **kwargs)

    model = Net().to(device)

    if is_distributed():
        Distributor = nn.parallel.DistributedDataParallel
        model = Distributor(model)

    optimizer = optim.SGD(model.parameters(), lr=args.lr,
                          momentum=args.momentum)

    start = time.perf_counter()
    cpu_start = time.process_time()
    for epoch in range(1, args.epochs + 1):
        train(args, model, device, train_loader, optimizer, epoch)
        test(args, model, device, test_loader, epoch)
    cpu_end = time.process_time()
    end = time.perf_counter()
    print("CPU Elapsed time:", cpu_end - cpu_start)
    print("Elapsed time:", end - start)

    if (args.save_model):
        torch.save(model.state_dict(), "mnist_cnn.pt")

    if is_distributed():
        dist.destroy_process_group()


if __name__ == "__main__":
    main()

@ -0,0 +1,39 @@
includes:
  - base.yaml
targets:
  - target: /bin
    createlinks:
      - src: /opt/python-occlum/bin/python3
        linkname: python3
    copy:
      - files:
          - /opt/occlum/toolchains/busybox/glibc/busybox
  # python packages
  - target: /opt
    copy:
      - dirs:
          - ../python-occlum
  # python code
  - target: /
    copy:
      - files:
          - ../mnist.py
  - target: /opt/occlum/glibc/lib
    copy:
      - files:
          - /lib/x86_64-linux-gnu/libnss_dns.so.2
          - /lib/x86_64-linux-gnu/libnss_files.so.2
  # etc files
  - target: /etc
    copy:
      - dirs:
          - /etc/ssl
      - files:
          - /etc/nsswitch.conf
  # CA files
  - target: /ppml/certs/
    copy:
      - files:
          - ../myCA.pem
          - ../test.key
          - ../test.crt

demos/pytorch/standalone/.gitignore

@ -0,0 +1,3 @@
occlum_instance/
miniconda/
Miniconda3*

@ -10,22 +10,22 @@ Use the nn package to define our model as a sequence of layers. nn.Sequential is
This tutorial is written under the assumption that you have Docker installed and use Occlum in a Docker container.

Occlum is compatible with glibc-supported Python, we employ miniconda as python installation tool. You can import PyTorch packages using conda. Here, miniconda is automatically installed by install_python_with_conda.sh script, the required python and PyTorch packages for this project are also loaded by this script. Here, we take occlum/occlum:0.29.3-ubuntu20.04 as example.

Step 1 (on the host): Start an Occlum container
```
docker pull occlum/occlum:0.29.3-ubuntu20.04
docker run -it --name=pythonDemo --device /dev/sgx/enclave occlum/occlum:0.29.3-ubuntu20.04 bash
```

Step 2 (in the Occlum container): Download miniconda and install python to prefix position.
```
cd /root/demos/pytorch/standalone
bash ./install_python_with_conda.sh
```

Step 3 (in the Occlum container): Run the sample code on Occlum
```
cd /root/demos/pytorch/standalone
bash ./run_pytorch_on_occlum.sh
```