It turns out taking a lock in every system call is a significant
performance bottleneck. In light of this finding, we replace a mutex in
a critical path of system call with an atomic boolean.
This rewrite serves three purposes:
1. Fix some subtle bugs in the old implementation;
2. Implement mremap using mmap and munmap so that mremap can automatically
enjoy new features (e.g., mprotect and memory permissions) once mmap and
munmap support the feature.
3. Write down the invariants hold by VMManager explictly so that the correctness
of the new implementation can be reason more easily.
Update the occlum.json to align with the gen_enclave_conf design.
Below is the two updated structures:
"metadata": {
"product_id": 0,
"version_number": 0,
"debuggable": true
},
"resource_limits": {
"max_num_of_threads": 32,
"kernel_space_heap_size": "32MB",
"kernel_space_stack_size": "1MB",
"user_space_size": "256MB"
}
On lightweight Linux distribution, like alpine, getpwuid()
returns NULL, and errno is ENOENT, this patch fix crash
caused by this situation.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Add "untrusted" sections for environment variables defined in Occlum.json. Environment
variable defined in "default" will be shown in libos directly. Environment variable
defined in "untrusted" can be passed from occlum run or PAL layer and can override
the value in "default" and thus is considered "untrusted".
Fix std::alloc::Alloc not found
The lastest Rust changes the trait to std::alloc::AllocRef.
Update the docker files to support sgx 2.9.1
Remove the compilerRT dependency for rust sdk update
Before this commit, the three ECalls of the LibOS enclave do not give
the exact reason on error. In this commit, we modify the enclave entry code
to return the errno and list all possible values of errno in Enclave.edl.
In this commit, we add eight signal-related syscalls
* kill
* tkill
* tgkill
* rt_sigaction
* rt_sigreturn
* rt_sigprocmask
* rt_sigpending
* exit_group
We implement the following major features for signals:
* Generate, mask, and deliver signals
* Support user-defined signal handlers
* Support nested invocation of signal handlers
* Support passing arguments: signum, sigaction, and ucontext
* Support both process-directed and thread-directed signals
* Capture hardware exceptions and convert them to signals
* Deliver fatal signals (like SIGKILL) to kill processes gracefully
But we still have gaps, including but not limited to the points below:
* Convert #PF (page fault) and #GP (general protection) exceptions to signals
* Force delivery of signals via interrupt
* Support simulation mode
When a unix socket only calls function listen, its object is not created
but its status becomes listening. At this time closing the socket would
cause a panic before this commit.
The next generation of Intel CPUs does not support Intel MPX. Enabling MPX
by default crashes the LibOS on startup. So we disable MPX by default. The
long term plan is to turn on/off MPX via compiling options.
This commits improves both readability and correctness of the scheduling-related
system calls. In terms of readability, it extracts all scheduling-related code
ouf of the process/ directory and put it in a sched/ directory. In terms
of correctness, the new scheduling subsystem introduces CpuSet and SchedAgent
types to maintain and manipulate CPU scheduler settings in a secure and robust way.
As a major rewrite to the process/thread subsystem, this commits:
1. Implements threads as a first-class object, which represents a group of OS resources
and a thread of execution;
2. Implements processes as a first-class object that manages threads and maintains
the parent-child relationship between processes;
3. Refactors the code in process subsystem to follow the improved coding style and
conventions emerged in recent commits;
4. Refactors the code in other subsystems to use the new process/thread subsystem.
SEFS depends on version 0.9 of bitvec crate, which has been yanked on crates.io
by the crate author for some reasons. To fix this, we upgrade to the latest
version of bitvec crate.
This commit introduces a unified logging strategy, summarized as below:
1. Use `error!` to mark errors or unexpected conditions, e.g., a
`Result::Err` returned from a system call.
2. Use `warn!` to warn about potentially problematic issues, e.g.,
executing a workaround or fake implementation.
3. Use `info!` to show important events (from users' perspective) in
normal execution, e.g., creating/exiting a process/thread.
4. Use `debug!` to track major events in normal execution, e.g., the
high-level arguments of a system call.
5. Use `trace!` to record the most detailed info, e.g., when a system
call enters and exits the LibOS.
Now one can specify the log level of the LibOS by setting `OCCLUM_LOG_LEVEL`
environment variable. The possible values are "off", "error", "warn",
"info", and "trace".
However, for the sake of security, the log level of a release enclave
(DisableDebug = 1 in Enclave.xml) is always "off" (i.e., no log) regardless of
the log level specified by the untrusted environment.
This commit introduces a system call table, which brings several benefits:
1. The table is a centralized info hub that one can find an answer for every
question about system calls, e.g., what is the number and arguments of a
system call, is it implemented or supported, and if so, what is the
function that actual implements it.
2. System call-related code can be automatically derived from the system call
table through a clever use of macros. In this way, the code avoids repeating
itself.
Before this commit, there are two strange bugs:
1. No backtraces are displayed on panic by Rust; and,
2. Thread local storage in Rust sometimes causes panics.
It turns out that the the root cause of the two bugs are the same: Occlum's
patch to Intel SGX SDK that informs SDK about the stack range of the currnet
LibOS user-level thread. The problem about this patch is that it modifies some
fundamental data structures and Rust SGX SDK does not know the modification.
This causes Rust SGX SDK to panic in certain conditions.
To resolve the conflict for good, this commit gets rid of the patch to Intel
SGX SDK by updating SDK's stack ranges upon user/kernel switch.
1. Use arch_prctl to replace RDFSBASE/WRFSBASE
Ptrace can't get right value if WRFSBASE is called which
will make debugger fail in simulation mode. Use arch_prctl
to replace these instructions in simulation mode.
2. Disable the busy thread in exit_group test
exit_group doesn't have a real implementation yet but test
under SGX simulation mode give core dump for exit_group test.
Disable the busy loop thread and the core dump disappear.
3. Add SDK lib path to LD_LIBRARY_PATH
Linker sometims can't find urts_sim and uae_service_sim when
running. Explicitly add path to LD_LIBRARY_PATH when running
occlum command.
Signed-off-by: sanqian.hcy <sanqian.hcy@antfin.com>
This commits is a dummy implementation of file advisory locks.
Specifically, for regular files, fcntl `F_SETLK` (i.e., acquiring
or releasing locks) always succeeds and fcntl `F_GETLK` (i.e., testing locks)
always returns no locks.
1. Move the system call handling functions into the "syscalls.rs"
2. Split syscall memory safe implementations into small sub-modules
3. Move the unix_socket and io_multiplexing into "net"
4. Remove some unnecessary code
It is slow to allocate big buffers using SGX SDK's malloc. Even worse, it
consumes a large amount of precious trusted memory inside enclaves. This
commit avoids using trusted buffers and allocates untrusted buffers for
sendmsg/recvmsg directly via OCall, thus improving the performance of
sendmsg/recvmsg. Note that this optimization does not affect the security of
network data as it has to be sent/received via OCalls.
Before this commit, using custom C types in ECalls/OCalls defined in Occlum's
EDL is cumbersme. Now this issue is resolved by providing `occlum_edl_types.h`
header file. There are two versions of this file: one is under
`src/libos/include/edl/` for LibOS, the other is under
`src/pal/include/edl/` for PAL. So now to define a new custom C type, just
edit the two versions of `occlum_edl_types.h` to define the type.
SGX SDK's sgx_init_quote may return SGX_ERROR_BUSY, which is previously not
handled. The implementation of ioctl for /dev/sgx is now fixed to handle this
error.
By providing Occlum PAL as a shared library, it is now possible to embed and
use Occlum in an user-controled process (instead of an Occlum-controlled one).
The APIs of Occlum PAL can be found in `src/pal/include/occlum_pal_api.h`. The
Occlum PAL library, namely `libocclum-pal.so`, can be found in `.occlum/build/lib`.
To use the library, check out the source code of `occlum-run` (under
`src/run`), which can be seen as a sample code for using the Occlum PAL
library.
* Fix readlink from `/proc/self/exe` to get absolute path of the executable file
* Add readlink from`/proc/self/fd/<fd>` to get the file's real path
Note that for now we only support read links _statically_, meaning that even
if the file or any of its ancestors is moved after the file is opened, the
absolute paths obtained from the API does not change.
The output buffer given to getdents may not be large enough for the next directory
entry. If no directory entries has been loaded into the buffer, just return
EINVAL. Otherwise, return the total length of the directory entries already
loaded in the buffer
1. Add a separate net/ directory for the network subsystem;
2. Move some existing socket code to net/;
3. Implement sendmsg/recvmsg with OCalls;
4. Extend client/server test cases.
1. Introduce the new infrastructure for ioctl support
2. Refactor the old ioctls to use the new infrastructure
3. Implement builtin ioctls (e.g., TIOCGWINSZ and TIOCSWINSZ for stdout)
4. Implement non-builtin, driver-specific ioctls (e.g., ioctls for /dev/sgx)
1. Use epoll_wait to support epoll_pwait as there is no signal mechanism
2. The timeout is fixed to zero for not waiting for any signal to come
to speed up
3. Change the test case of server_epoll to use epoll_pwait
BACKGROUND
The exit_group syscall, which is implicitly called by libc after the main function
returns, kills all threads in a thread group, even if these threads are
running, sleeping, or waiting on a futex.
PROBLEM
In normal use cases, exit_group does nothing since a well-written program
should terminate all threads before the main function returns. But when this is
not the case, exit_group can clean up the mess.
Currently, Occlum does not implement exit_group. And the Occlum PAL process
waits for all tasks (i.e., SGX threads) to finish before exiting. So without
exit_group implemented, some tasks may be still running if after the main task
exits. And this causes the Occlum PAL process to wait---forever.
WORKAROUND
To implement a real exit_group, we need signals to kill threads. But we do not
have signals, yet. So we come up with a workaround: instead of waiting all
tasks to finish in PAL, we just wait for the main task. As soon as the main
task exits, the PAL process terminates, killing the remaining tasks.
The original implementation of program loader is written under the assumption
that there are only two loadable segments per ELF, one is code, and the other
is data. But this assumption is unnecessary and proves to be wrong for an ELF
on Alpine Linux, which has two extra read-only, loadable segments for security
hardening. This commit clears the obstacle towards running unmodified
executables from Alpine Linux.
In addition to getting rid of the false assumption of two fixed loadable segments,
this commit improves the quality of the code related to program loading and
process initialization.
* 'occlum init' does not copy signing key file any more.
* 'occlum build' supports to set signing key and signing tool in args.
* 'occlum run' supports to run enclave in sgx release mode.
1. Now we support set App's env in Occlum.json, for example:
"env": [
"OCCLUM=yes",
"TEST=true"
]
2. Rewrite env test cases
3. Update Dockerfile to install "jq" tool
1. All generated, build files are now in a separate build directory;
2. The CLI tool supports three sub-commands: init, build, and run;
3. Refactor tests to use the new tool.
In addition, to ensure that all future Rust code complies with
`cargo fmt`, we add a Git post-commit hook that generates warnings
if the commited code is not formated consistently.
* Add patch to Rust SGX SDK to enable integrity-only SgxFile
* Upgrade to the new SEFS extended with the integrity-only mode
* Use integrity-only SEFS for /bin and /lib in test
* Add the MAC of integrity-only SEFS to Occlum.json in test
* Mount multiple FS according to Occlum.json
* Check the MACs of integrity-only SEFS images
The old system call mechanism works by relocating the symbol __occlum_syscall
provided by libocclum_stub.so to the real entry point of the LibOS. This symbol
relocation is done by the program loader. Now, the new system call mechanism is
based on passing the entry point via the auxiliary vector. This new mechanism
is simpler and is more compatible with the upcoming support for ld.so.
Changes:
1. Fix a bug in serializing auxiliary vector in the stack of a user program;
2. Passing syscall entry via auxiliary vector;
3. Remove relocating for the __occlum_syscall symbol;
4. Remove the dependency on libocclum_stub.so in tests.
There are two types of stacks: the kernel ones and the user ones. The kernel
stacks are used by Occlum and managed by Intel SGX SDK itself, while the user
stacks are used by the threads created and managed by Occlum. These user stacks
are transparent to Intel SGX SDK so far.
The problem is that Intel SGX SDK needs to be aware of the user stacks.
SGX exception handlers will check whether the rsp value---when the exception
happened---is within the stack of the current SGX thread. If the check fails,
the registered exception handler will not be triggered. But when exceptions are
triggered by the threads running upon Occlum, the rsp value points to the user
stacks, which Intel SGX SDK are completely unware of. So the check always
fails.
Therefore, we extend Intel SGX SDK with two new APIs:
int sgx_enable_user_stack(size_t stack_base, size_t stack_limit);
void sgx_disable_user_stack(void);
And this commit uses the two APIs to inform Intel SGX SDK about the
Occlum-managed stacks. And the rsp checks in SGX exception handlers will
check whether rsp is within the user stacks.
Init support for cpuid and rdtsc instruction handling in occlum.
This patch includes:
1. cpuid exception handler for all information leaves;
2. rdtsc exception handler;
3. handler registration;
4. cpuid test;
5. rdtsc test.
Signed-off-by: 散樗 <kailun.qkl@antfin.com>
The first bug is that a VMRange may not be allocated to a 4KB-aligned address.
The second bug is that a VMRange may not be deallocated by its parent VMRange.