BACKGROUND
The exit_group syscall, which is implicitly called by libc after the main function
returns, kills all threads in a thread group, even if these threads are
running, sleeping, or waiting on a futex.
PROBLEM
In normal use cases, exit_group does nothing since a well-written program
should terminate all threads before the main function returns. But when this is
not the case, exit_group can clean up the mess.
Currently, Occlum does not implement exit_group. And the Occlum PAL process
waits for all tasks (i.e., SGX threads) to finish before exiting. So without
exit_group implemented, some tasks may be still running if after the main task
exits. And this causes the Occlum PAL process to wait---forever.
WORKAROUND
To implement a real exit_group, we need signals to kill threads. But we do not
have signals, yet. So we come up with a workaround: instead of waiting all
tasks to finish in PAL, we just wait for the main task. As soon as the main
task exits, the PAL process terminates, killing the remaining tasks.
The original implementation of program loader is written under the assumption
that there are only two loadable segments per ELF, one is code, and the other
is data. But this assumption is unnecessary and proves to be wrong for an ELF
on Alpine Linux, which has two extra read-only, loadable segments for security
hardening. This commit clears the obstacle towards running unmodified
executables from Alpine Linux.
In addition to getting rid of the false assumption of two fixed loadable segments,
this commit improves the quality of the code related to program loading and
process initialization.
* 'occlum init' does not copy signing key file any more.
* 'occlum build' supports to set signing key and signing tool in args.
* 'occlum run' supports to run enclave in sgx release mode.
1. Change the port for server_poll to listen to avoid "address in use" conflict
between test/server and test/server_epoll, and add port as an argument for
test/client to send message
2. As posix-spwan may fail, change the fixed number of processes to spawn to
the number of processes successfully spawned in server_epoll
1. Now we support set App's env in Occlum.json, for example:
"env": [
"OCCLUM=yes",
"TEST=true"
]
2. Rewrite env test cases
3. Update Dockerfile to install "jq" tool
1. All generated, build files are now in a separate build directory;
2. The CLI tool supports three sub-commands: init, build, and run;
3. Refactor tests to use the new tool.
In addition, to ensure that all future Rust code complies with
`cargo fmt`, we add a Git post-commit hook that generates warnings
if the commited code is not formated consistently.
* Add patch to Rust SGX SDK to enable integrity-only SgxFile
* Upgrade to the new SEFS extended with the integrity-only mode
* Use integrity-only SEFS for /bin and /lib in test
* Add the MAC of integrity-only SEFS to Occlum.json in test
* Mount multiple FS according to Occlum.json
* Check the MACs of integrity-only SEFS images
The old system call mechanism works by relocating the symbol __occlum_syscall
provided by libocclum_stub.so to the real entry point of the LibOS. This symbol
relocation is done by the program loader. Now, the new system call mechanism is
based on passing the entry point via the auxiliary vector. This new mechanism
is simpler and is more compatible with the upcoming support for ld.so.
Changes:
1. Fix a bug in serializing auxiliary vector in the stack of a user program;
2. Passing syscall entry via auxiliary vector;
3. Remove relocating for the __occlum_syscall symbol;
4. Remove the dependency on libocclum_stub.so in tests.
There are two types of stacks: the kernel ones and the user ones. The kernel
stacks are used by Occlum and managed by Intel SGX SDK itself, while the user
stacks are used by the threads created and managed by Occlum. These user stacks
are transparent to Intel SGX SDK so far.
The problem is that Intel SGX SDK needs to be aware of the user stacks.
SGX exception handlers will check whether the rsp value---when the exception
happened---is within the stack of the current SGX thread. If the check fails,
the registered exception handler will not be triggered. But when exceptions are
triggered by the threads running upon Occlum, the rsp value points to the user
stacks, which Intel SGX SDK are completely unware of. So the check always
fails.
Therefore, we extend Intel SGX SDK with two new APIs:
int sgx_enable_user_stack(size_t stack_base, size_t stack_limit);
void sgx_disable_user_stack(void);
And this commit uses the two APIs to inform Intel SGX SDK about the
Occlum-managed stacks. And the rsp checks in SGX exception handlers will
check whether rsp is within the user stacks.