The sigaction struct has a field named sa_mask, which specifies the
signals that are blocked while the signal handler executes. Previously,
this field was not supported. This commit adds the missing feature.
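
As a rough illustration of the sa_mask semantics (not the LibOS code
itself), the sketch below models a signal set as a 64-bit mask and
computes the mask in effect while a handler runs. The names `SigSet` and
`mask_during_handler` are hypothetical. Blocking the delivered signal
itself follows the POSIX default (no SA_NODEFER) and is an assumption
about what the LibOS does.

```rust
/// Hypothetical 64-bit signal set; bit (signum - 1) stands for signal `signum`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct SigSet(u64);

impl SigSet {
    pub fn empty() -> Self {
        SigSet(0)
    }

    /// Returns a copy of the set with `signum` (1..=64) added.
    pub fn with(self, signum: u8) -> Self {
        assert!((1..=64).contains(&signum));
        SigSet(self.0 | (1u64 << (signum - 1)))
    }

    /// Tests whether `signum` (1..=64) is in the set.
    pub fn contains(self, signum: u8) -> bool {
        assert!((1..=64).contains(&signum));
        self.0 & (1u64 << (signum - 1)) != 0
    }

    pub fn union(self, other: SigSet) -> Self {
        SigSet(self.0 | other.0)
    }
}

/// Mask in effect while the handler for `signum` runs: the thread's
/// current mask, plus the handler's sa_mask, plus the delivered signal
/// (assumed POSIX default, i.e., SA_NODEFER is not set).
pub fn mask_during_handler(current: SigSet, sa_mask: SigSet, signum: u8) -> SigSet {
    current.union(sa_mask).with(signum)
}
```

When the handler returns, the thread's previous mask (`current` here) is
restored.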
There are scenarios where the number of available CPUs is smaller than
the total number of CPUs on the machine. Therefore,
sched_getaffinity/sched_setaffinity should be allowed when the input
buffer is large enough for the available CPUs, even if it is too small
for all the CPUs.
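
A minimal sketch of the relaxed size check, assuming the buffer is a CPU
bitmask (one bit per CPU) whose size is given in bytes. The function
and error names are hypothetical and do not come from the LibOS.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum AffinityError {
    /// The user buffer cannot hold one bit per available CPU.
    BufferTooSmall,
}

/// Accepts a user buffer of `buf_size` bytes as long as it has at least
/// one bit for every *available* CPU. The total number of CPUs on the
/// machine is deliberately not part of the check.
pub fn check_cpuset_size(buf_size: usize, available_cpus: usize) -> Result<(), AffinityError> {
    let bits = buf_size.saturating_mul(8);
    if bits >= available_cpus {
        Ok(())
    } else {
        Err(AffinityError::BufferTooSmall)
    }
}
```

For example, on a machine with 64 CPUs of which 8 are available, a
1-byte buffer passes this check even though it cannot describe all 64
CPUs.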
This reverts commit 1e456f025d6b4e34a726180e7a27a04424fe79d1.
The reverted commit causes a segmentation fault when the application
munmaps its own stack. It should be reapplied after the dependency of
sysret on the user-space stack is removed.
The new interrupt subsystem breaks the simulation mode in two ways:
1. Signal 64 is not handled by Intel SGX SDK in simulation mode. An
unhandled real-time signal crashes the process.
2. The newly-enabled test case exit_group depends on interrupts. But
enclave interrupts, like enclave exceptions, are not supported in
simulation mode.
This commit ensures that signal 64 is ignored by default and that the
exit_group test case is not enabled in simulation mode.
Before this commit, events like signals and exit_group were handled by
LibOS threads in a cooperative fashion: if the user code executed by a
LibOS thread did not invoke system calls (e.g., a busy loop), then the
LibOS had no opportunity to take control and handle events.
With the help of the POSIX signal-based interrupt mechanism of
Occlum's version of Intel SGX SDK, the LibOS can now interrupt the
execution of arbitrary user code in a LibOS thread by sending real-time
POSIX signals (the signal number is 64) to it. These signals are sent by
a helper thread spawned by Occlum PAL. The helper thread periodically
enters the enclave to check whether any LibOS threads have pending
events. If so, the helper thread sends POSIX signals to them. When
interrupted by a signal, the receiver LibOS thread may be in one of the
two previously problematic states in terms of event handling:
1. Executing non-cooperative user code (e.g., a busy loop). In this
case, the signal will trigger an interrupt handler inside the enclave,
which can then enter the LibOS kernel to deal with any pending events.
2. Executing an OCall that invokes blocking system calls (e.g., futex,
nanosleep, or blocking I/O). In this case, the signal will interrupt the
blocking system call so that the OCall can return to the enclave.
Thanks to the new interrupt subsystem, some event-based system calls
are made robust. One such example is exit_group. We can now guarantee
that exit_group can force any thread in a process to exit.
The first bug is a race condition when acquiring the lock of a process's
parent. Code that exhibits the race looks like this:
```rust
let process: ProcessRef = current!().process();
let parent: ProcessRef = process.parent();
let parent_guard: SgxMutexGuard<ProcessInner> = parent.inner();
// This assertion may fail because the process's parent may change to another
// process before the lock is acquired
assert!(parent.pid() == process.parent().pid());
```
The second bug is that when a process exits, its child processes are
not transferred to the idle process correctly.
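
One common way to avoid the first race is to lock the parent and then
re-check that it is still the parent, retrying otherwise. The sketch
below illustrates that pattern with std types. It is not necessarily
how this commit fixes the bug: `Process`, `ProcessInner`, `set_parent`,
and `with_locked_parent` are hypothetical stand-ins, and the re-check is
only sound under the assumption that a process is reparented while the
old parent's lock is held.

```rust
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical stand-ins for the LibOS process types.
pub struct ProcessInner {
    pub children: Vec<u32>,
}

pub struct Process {
    pid: u32,
    parent: RwLock<Option<Arc<Process>>>,
    inner: Mutex<ProcessInner>,
}

impl Process {
    pub fn new(pid: u32, parent: Option<Arc<Process>>) -> Arc<Process> {
        Arc::new(Process {
            pid,
            parent: RwLock::new(parent),
            inner: Mutex::new(ProcessInner { children: Vec::new() }),
        })
    }

    pub fn pid(&self) -> u32 {
        self.pid
    }

    /// Snapshot of the current parent; it may be stale once returned.
    pub fn parent(&self) -> Option<Arc<Process>> {
        self.parent.read().unwrap().clone()
    }

    pub fn set_parent(&self, new_parent: Option<Arc<Process>>) {
        *self.parent.write().unwrap() = new_parent;
    }
}

/// Runs `f` with the parent's inner state locked, after confirming that
/// the locked process is still the parent. Returns None if there is no parent.
pub fn with_locked_parent<R>(
    process: &Process,
    f: impl FnOnce(&Process, &mut ProcessInner) -> R,
) -> Option<R> {
    loop {
        let parent = process.parent()?;
        let mut guard = parent.inner.lock().unwrap();
        // Re-check under the lock: the parent may have changed between
        // taking the snapshot and acquiring the lock.
        let still_parent = match process.parent() {
            Some(p) => Arc::ptr_eq(&p, &parent),
            None => false,
        };
        if still_parent {
            return Some(f(&*parent, &mut *guard));
        }
        // The guard is dropped here; retry with a fresh snapshot.
    }
}
```

Passing a closure instead of returning the guard keeps the guard's
lifetime tied to the `Arc` it borrows from.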
1. Move the memory zeroization from mmap to munmap to improve mmap
performance
2. Do memory zeroization during the drop of VMManager to guarantee that
all allocated memory is zeroized before the next allocation
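
The two points above can be sketched with a toy page allocator. This is
an illustration under stated assumptions, not the real VMManager: the
type `VmManager`, its one-page-at-a-time `mmap`/`munmap`, and the
requirement that the backing memory starts out zeroed are all
hypothetical simplifications.

```rust
pub const PAGE_SIZE: usize = 4096;

/// Toy manager over a borrowed memory range, handing out single pages.
pub struct VmManager<'a> {
    mem: &'a mut [u8],
    used: Vec<bool>,
}

impl<'a> VmManager<'a> {
    /// Assumes `mem` is already zeroed when the manager is created.
    pub fn new(mem: &'a mut [u8]) -> Self {
        let pages = mem.len() / PAGE_SIZE;
        VmManager { mem, used: vec![false; pages] }
    }

    /// Fast path: no zeroization here, because every free page was
    /// zeroed either initially or by a previous munmap.
    pub fn mmap(&mut self) -> Option<usize> {
        let idx = self.used.iter().position(|u| !*u)?;
        self.used[idx] = true;
        Some(idx)
    }

    /// Zeroizes the page before returning it to the free pool.
    pub fn munmap(&mut self, idx: usize) {
        if self.used.get(idx).copied().unwrap_or(false) {
            self.page_mut(idx).fill(0);
            self.used[idx] = false;
        }
    }

    pub fn page_mut(&mut self, idx: usize) -> &mut [u8] {
        &mut self.mem[idx * PAGE_SIZE..(idx + 1) * PAGE_SIZE]
    }
}

impl<'a> Drop for VmManager<'a> {
    /// Pages that were never munmapped are zeroized here, so the next
    /// user of the memory range starts from zeroed memory.
    fn drop(&mut self) {
        self.mem.fill(0);
    }
}
```

The design moves the cost of clearing memory off the allocation path:
mmap only flips a flag, while munmap and drop pay for the zeroization.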
This commit consists of three major changes:
1. Support a new interface to get the base64 quote only.
This is useful when the application sends the quote to the
service provider's server and the final IAS report is obtained
there. The application itself doesn't depend on IAS in this case.
2. Improve the C++ programming style. Now, we only provide
C++ classes and limited C APIs (for configuration and the SGX device).
3. Use more general keywords as name prefixes.
Signed-off-by: Junxian Xiao <junxian.xjx@antfin.com>
It turns out that taking a lock in every system call is a significant
performance bottleneck. In light of this finding, we replace a mutex on
the critical path of system calls with an atomic boolean.
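
The commit message does not say which state the mutex protected. As a
hedged sketch of the general technique, the example below assumes a
per-thread flag that every system call must check; `ThreadState` and
its methods are hypothetical names.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical per-thread state consulted on every system call.
pub struct ThreadState {
    // Previously this kind of state might have been a Mutex<bool>; an
    // AtomicBool lets the hot path read it without taking a lock.
    forced_to_exit: AtomicBool,
}

impl ThreadState {
    pub const fn new() -> Self {
        ThreadState { forced_to_exit: AtomicBool::new(false) }
    }

    /// Slow path: set by another thread (e.g., on exit_group).
    pub fn force_exit(&self) {
        self.forced_to_exit.store(true, Ordering::Release);
    }

    /// Hot path: a single atomic load per system call.
    pub fn is_forced_to_exit(&self) -> bool {
        self.forced_to_exit.load(Ordering::Acquire)
    }
}
```

This only works when the protected state fits in a single atomic value;
state spanning several fields would still need a lock or a different
design.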
This rewrite serves three purposes:
1. Fix some subtle bugs in the old implementation;
2. Implement mremap using mmap and munmap so that mremap automatically
benefits from new features (e.g., mprotect and memory permissions) once
mmap and munmap support them.
3. Write down the invariants held by VMManager explicitly so that the
correctness of the new implementation can be reasoned about more easily.
Not all config entries are created equal: some are more likely to be
customized by users than others. This commit reorders the config
entries in descending order of expected popularity.
Update the occlum.json to align with the gen_enclave_conf design.
Below are the two updated structures:
"metadata": {
"product_id": 0,
"version_number": 0,
"debuggable": true
},
"resource_limits": {
"max_num_of_threads": 32,
"kernel_space_heap_size": "32MB",
"kernel_space_stack_size": "1MB",
"user_space_size": "256MB"
}