A relay notifier that observes the underlying endpoint is added as the
notifier for the socket. It broadcasts to its observers when either end
of the channel has IoEvents.
Read, write, connect and accept have both blocking and nonblocking mode.
It may block after the status lock is acquired resulting in potential
deadlock. This commit resolve the deadlock issue.
1. Add OCCLUM_COV to conditionally enable gcov profiling for libos Rust
code;
2. Add a makefile target to locally generate the coverage report in html
format.
1. Five new ioctl commands of /dev/sgx are added for occlum
applications to securely get and verify DCAP quote;
2. Not all the functions of the intel DCAP package are open to
developers to simplify the DCAP usage;
3. The test may only run on the platform with DCAP driver installed;
4. A macro OCCLUM_DISABLE_DCAP is used to separate the DCAP code from
the other code.
5. Skip DCAP test when DCAP driver is not detected or in simulation mode
1. Implement type-safe functions;
2. Improve the correctness of nearly all the functions;
3. Improve the readability by introducing Listener and Endpoint for StreamUnix;
4. Substitue RingBuf with Channel in Unix socket.
This bugfix ensures that when an object of Producer/Consumer for
channels is dropped, its shutdown method is called automatically. This ensures
that the peer of a Producer/Consumer gets notified and won't wait indefinitely.
Usually, files are unregistered from an epoll file via the EPOLL_CTL_DEL command
explicitly. But for the sake of users' convenience, Linux supports
unregistering a file automatically from the epoll files that monitor the file
when the file is closed. This commit adds this capability.
When using the optimized string lib in Occlum, the memset function would
use xmm0 register, as the result, the FP area initialization code would
modify the FP area before saving it. So just ignor the FP area
initialization code.
1. >> has higher precedence than &. Use parentheses to conduct & first;
2. In the latest Intel software developer's manual, cpuid leaf 06H EDX
is related to the logical processor.
Before this commit, the epoll implementation works by simply delegating to the
host OS through OCall. One major problem with this implementation is
that it can only handle files that are backed by a file of the host OS
(e.g., sockets), but not those are are mainly implemented by the LibOS
(e.g., pipes). Therefore, a new epoll implementation that can handle all
kinds of files is needed.
This commit completely rewrites the epoll implementation by leveraging
the new event subsystem. Now the new epoll can handle all file types:
1. Host files, e.g., sockets, eventfd;
2. LibOS files, e.g., pipes;
3. Hybrid files, e.g., epoll files.
For a new file type to support epoll, it only neends to implement no
more than four methods of the File trait:
* poll (required for all file types);
* notifier (required for all file files);
* host_fd (only required for host files);
* recv_host_events (only required for host files).
1. Introduce channels, which provide an efficient means for IPC;
2. Leverage channels to rewrite pipe, improving the performance (3X),
robustness, and readability.
This pipe rewrite is not done: some more commits will be added to
implement poll and epoll for pipe.
An event can be anything ranging from the exit of a process (interesting
to `wait4`) to the arrival of a blocked signal (interesting to
`sigwaitinfo`), from the completion of a file operation (interesting to
`epoll`) to the change of a file status (interesting to `inotify`).
To meet the event-related demands from various subsystems, this event
subsystem is designed to provide a set of general-purpose primitives:
* `Waiter`, `Waker`, and `WaiterQueue` are primitives to put threads
to sleep and later wake them up.
* `Event`, `Observer`, and `Notifier` are primitives to handle and
broadcast events.
* `WaiterQueueObserver` implements the common pattern of waking up
threads once some interesting events happen.
Socket-related ocalls, e.g, sendto, sendmsg and write, may cause SIGPIPE
in host. Since the ocall is called by libos, this kind of signal should
be handled in libos. We ignore SIGPIPE in host and raise the same signal
in libos if the return value of the above ocalls is EPIPE. In this way
the signal is handled by libos.
Rlimit are now on the same page of memory space limits defined in Occlum.json. Specific
memory size configuration can be set to child process with `prlimit` syscall or using `ulimit`
command in shell script.
Struct sigaction has a field named sa_mask, which specifies the blocked
signals while executing the signal handler. Previously, this field is not
supported. This commit adds this missing feature.
There are scenarios where the available CPUs are less than all the CPUs
on the machine. Therefore, sched_get/setaffinity should be allowed when
the input buffer size is no less than the available CPUs but less than
all the CPUs.
This reverts commit 1e456f025d6b4e34a726180e7a27a04424fe79d1.
This commit results in segmentation fault when the application munmaps
its own stack. Should be committed back after removing the dependency of
sysret on the user space stack.
Before this commit, events like signals and exit_group are handled by
LibOS threads in a cooperative fashion: if the user code executed by a
LibOS thread does not invoke system calls (e.g., a busy loop), then the LibOS
won't have any opportunity to take control and handle events.
With the help from the POSIX signal-based interrupt mechanism of
Occlum's version of Intel SGX SDK, the LibOS can now interrupt the
execution of arbitrary user code in a LibOS thread by sending real-time
POSIX signals (the signal number is 64) to it. These signals are sent by
a helper thread spawn by Occlum PAL. The helper thread periodically
enters into the enclave to check if there are any LibOS threads with
pending events. If any, the helper thread broadcast POSIX signals to
them. When interrupted by a signal, the receiver LibOS thread may be in
one of the two previously problematic states in terms of event handling:
1. Executing non-cooperative user code (e.g., a busy loop). In this
case, the signal will trigger an interrupt handler inside the enclave,
which can then enter the LibOS kernel to deal with any pending events.
2. Executing an OCall that invokes blocking system calls (e.g., futex,
nanosleep, or blocking I/O). In this case, the signal will interrupt the
blocking system call so that the OCall can return back to the enclave.
Thanks to the new interrupt subsystem, some event-based system calls
are made robust. One such example is exit_group. We can now guarantee
that exit_group can force any thread in a process to exit.
The first bug is a race condition when acquiring the lock of a process's
parent. An example code with race condition looks like below:
```rust
let process : ProessRef = current!().process();
let parent : ProcessRef = process.parent();
let parent_guard : SgxMutexGuard<ProesssInner> = parent.inner();
// This assertion may fail because the process's parent may change to another
// process before the lock is acquired
assert!(parent.pid() == process.parent().pid());
```
The second bug is that when a process exits, its children processes are
not transfered to the idle process correctly.
1. Move the memory zeroization of mmap to munmap to increase mmap
performance
2. Do memory zeroizaiton during the drop of VMManager to guarentee all
allocated memory is zeroized before the next allocation
It turns out taking a lock in every system call is a significant
performance bottleneck. In light of this finding, we replace a mutex in
a critical path of system call with an atomic boolean.
This rewrite serves three purposes:
1. Fix some subtle bugs in the old implementation;
2. Implement mremap using mmap and munmap so that mremap can automatically
enjoy new features (e.g., mprotect and memory permissions) once mmap and
munmap support the feature.
3. Write down the invariants hold by VMManager explictly so that the correctness
of the new implementation can be reason more easily.
Update the occlum.json to align with the gen_enclave_conf design.
Below is the two updated structures:
"metadata": {
"product_id": 0,
"version_number": 0,
"debuggable": true
},
"resource_limits": {
"max_num_of_threads": 32,
"kernel_space_heap_size": "32MB",
"kernel_space_stack_size": "1MB",
"user_space_size": "256MB"
}