Occlum is a single-address-space library OS. Previously, userspace memory are divided for each process.
And all the memory are allocated when the process is created, which leads to a lot of wasted space and
complicated configuration.
In the current implementation, the whole userspace is managed as a memory pool that consists of chunks. There
are two kinds of chunks:
(1) Single VMA chunk: a chunk with only one VMA. Should be owned by exactly one process.
(2) Multi VMA chunk: a chunk with default chunk size and there could be a lot of VMAs in this chunk. Can be used
by different processes.
This design can help to achieve mainly two goals:
(1) Simplify the configuration: Users don't need to configure the process.default_mmap_size anymore. And multiple processes
running in the same Occlum instance can use dramatically different sizes of memory.
(2) Gain better performance: Two-level management(chunks & VMAs) reduces the time for finding, inserting, deleting, and iterating.
A relay notifier that observes the underlying endpoint is added as the
notifier for the socket. It broadcasts to its observers when either end
of the channel has IoEvents.
Read, write, connect and accept have both blocking and nonblocking mode.
It may block after the status lock is acquired resulting in potential
deadlock. This commit resolve the deadlock issue.
1. Add OCCLUM_COV to conditionally enable gcov profiling for libos Rust
code;
2. Add a makefile target to locally generate the coverage report in html
format.
1. Five new ioctl commands of /dev/sgx are added for occlum
applications to securely get and verify DCAP quote;
2. Not all the functions of the intel DCAP package are open to
developers to simplify the DCAP usage;
3. The test may only run on the platform with DCAP driver installed;
4. A macro OCCLUM_DISABLE_DCAP is used to separate the DCAP code from
the other code.
5. Skip DCAP test when DCAP driver is not detected or in simulation mode
1. Implement type-safe functions;
2. Improve the correctness of nearly all the functions;
3. Improve the readability by introducing Listener and Endpoint for StreamUnix;
4. Substitue RingBuf with Channel in Unix socket.
This bugfix ensures that when an object of Producer/Consumer for
channels is dropped, its shutdown method is called automatically. This ensures
that the peer of a Producer/Consumer gets notified and won't wait indefinitely.
The current Tcmalloc has memory leak issue. So change it as optional. By
default, dlmalloc is used. Enable tcmalloc with below command:
make TCMALLOC=Y
Usually, files are unregistered from an epoll file via the EPOLL_CTL_DEL command
explicitly. But for the sake of users' convenience, Linux supports
unregistering a file automatically from the epoll files that monitor the file
when the file is closed. This commit adds this capability.
When using the optimized string lib in Occlum, the memset function would
use xmm0 register, as the result, the FP area initialization code would
modify the FP area before saving it. So just ignor the FP area
initialization code.
1. >> has higher precedence than &. Use parentheses to conduct & first;
2. In the latest Intel software developer's manual, cpuid leaf 06H EDX
is related to the logical processor.
Before this commit, the epoll implementation works by simply delegating to the
host OS through OCall. One major problem with this implementation is
that it can only handle files that are backed by a file of the host OS
(e.g., sockets), but not those are are mainly implemented by the LibOS
(e.g., pipes). Therefore, a new epoll implementation that can handle all
kinds of files is needed.
This commit completely rewrites the epoll implementation by leveraging
the new event subsystem. Now the new epoll can handle all file types:
1. Host files, e.g., sockets, eventfd;
2. LibOS files, e.g., pipes;
3. Hybrid files, e.g., epoll files.
For a new file type to support epoll, it only neends to implement no
more than four methods of the File trait:
* poll (required for all file types);
* notifier (required for all file files);
* host_fd (only required for host files);
* recv_host_events (only required for host files).
1. Introduce channels, which provide an efficient means for IPC;
2. Leverage channels to rewrite pipe, improving the performance (3X),
robustness, and readability.
This pipe rewrite is not done: some more commits will be added to
implement poll and epoll for pipe.
An event can be anything ranging from the exit of a process (interesting
to `wait4`) to the arrival of a blocked signal (interesting to
`sigwaitinfo`), from the completion of a file operation (interesting to
`epoll`) to the change of a file status (interesting to `inotify`).
To meet the event-related demands from various subsystems, this event
subsystem is designed to provide a set of general-purpose primitives:
* `Waiter`, `Waker`, and `WaiterQueue` are primitives to put threads
to sleep and later wake them up.
* `Event`, `Observer`, and `Notifier` are primitives to handle and
broadcast events.
* `WaiterQueueObserver` implements the common pattern of waking up
threads once some interesting events happen.
Socket-related ocalls, e.g, sendto, sendmsg and write, may cause SIGPIPE
in host. Since the ocall is called by libos, this kind of signal should
be handled in libos. We ignore SIGPIPE in host and raise the same signal
in libos if the return value of the above ocalls is EPIPE. In this way
the signal is handled by libos.
This commit mainly accomplish two things:
1. Use makefile to manage dependencies for `occlum build`, which can save lots of time
2. Take dirs `build`, `run` outside from `.occlum`. Remove env var "OCCLUM_INSTANCE_DIR"
Rlimit are now on the same page of memory space limits defined in Occlum.json. Specific
memory size configuration can be set to child process with `prlimit` syscall or using `ulimit`
command in shell script.
Struct sigaction has a field named sa_mask, which specifies the blocked
signals while executing the signal handler. Previously, this field is not
supported. This commit adds this missing feature.
There are scenarios where the available CPUs are less than all the CPUs
on the machine. Therefore, sched_get/setaffinity should be allowed when
the input buffer size is no less than the available CPUs but less than
all the CPUs.
This reverts commit 1e456f025d6b4e34a726180e7a27a04424fe79d1.
This commit results in segmentation fault when the application munmaps
its own stack. Should be committed back after removing the dependency of
sysret on the user space stack.
The new interrupt subsystem breaks the simulation mode in two ways:
1. The signal 64 is not handled by Intel SGX SDK in simulation mode. A
handled real-time signal crashes the process.
2. The newly-enabled test case exit_group depends on interrupts. But
enclave interrupts, like enclave exceptions, are not supported in
simulation mode.
This commit ensures signal 64 is ignored by default and exit_group test
case is not enabled in simulation mode.