An event can be anything ranging from the exit of a process (interesting
to `wait4`) to the arrival of a blocked signal (interesting to
`sigwaitinfo`), from the completion of a file operation (interesting to
`epoll`) to the change of a file status (interesting to `inotify`).
To meet the event-related demands from various subsystems, this event
subsystem is designed to provide a set of general-purpose primitives:
* `Waiter`, `Waker`, and `WaiterQueue` are primitives to put threads
to sleep and later wake them up.
* `Event`, `Observer`, and `Notifier` are primitives to handle and
broadcast events.
* `WaiterQueueObserver` implements the common pattern of waking up
threads once some interesting events happen.
Socket-related ocalls, e.g, sendto, sendmsg and write, may cause SIGPIPE
in host. Since the ocall is called by libos, this kind of signal should
be handled in libos. We ignore SIGPIPE in host and raise the same signal
in libos if the return value of the above ocalls is EPIPE. In this way
the signal is handled by libos.
This commit mainly accomplish two things:
1. Use makefile to manage dependencies for `occlum build`, which can save lots of time
2. Take dirs `build`, `run` outside from `.occlum`. Remove env var "OCCLUM_INSTANCE_DIR"
The new interrupt subsystem breaks the simulation mode in two ways:
1. The signal 64 is not handled by Intel SGX SDK in simulation mode. A
handled real-time signal crashes the process.
2. The newly-enabled test case exit_group depends on interrupts. But
enclave interrupts, like enclave exceptions, are not supported in
simulation mode.
This commit ensures signal 64 is ignored by default and exit_group test
case is not enabled in simulation mode.
Before this commit, events like signals and exit_group are handled by
LibOS threads in a cooperative fashion: if the user code executed by a
LibOS thread does not invoke system calls (e.g., a busy loop), then the LibOS
won't have any opportunity to take control and handle events.
With the help from the POSIX signal-based interrupt mechanism of
Occlum's version of Intel SGX SDK, the LibOS can now interrupt the
execution of arbitrary user code in a LibOS thread by sending real-time
POSIX signals (the signal number is 64) to it. These signals are sent by
a helper thread spawn by Occlum PAL. The helper thread periodically
enters into the enclave to check if there are any LibOS threads with
pending events. If any, the helper thread broadcast POSIX signals to
them. When interrupted by a signal, the receiver LibOS thread may be in
one of the two previously problematic states in terms of event handling:
1. Executing non-cooperative user code (e.g., a busy loop). In this
case, the signal will trigger an interrupt handler inside the enclave,
which can then enter the LibOS kernel to deal with any pending events.
2. Executing an OCall that invokes blocking system calls (e.g., futex,
nanosleep, or blocking I/O). In this case, the signal will interrupt the
blocking system call so that the OCall can return back to the enclave.
Thanks to the new interrupt subsystem, some event-based system calls
are made robust. One such example is exit_group. We can now guarantee
that exit_group can force any thread in a process to exit.
On lightweight Linux distribution, like alpine, getpwuid()
returns NULL, and errno is ENOENT, this patch fix crash
caused by this situation.
Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com>
Add "untrusted" sections for environment variables defined in Occlum.json. Environment
variable defined in "default" will be shown in libos directly. Environment variable
defined in "untrusted" can be passed from occlum run or PAL layer and can override
the value in "default" and thus is considered "untrusted".
Before this commit, the three ECalls of the LibOS enclave do not give
the exact reason on error. In this commit, we modify the enclave entry code
to return the errno and list all possible values of errno in Enclave.edl.
In this commit, we add eight signal-related syscalls
* kill
* tkill
* tgkill
* rt_sigaction
* rt_sigreturn
* rt_sigprocmask
* rt_sigpending
* exit_group
We implement the following major features for signals:
* Generate, mask, and deliver signals
* Support user-defined signal handlers
* Support nested invocation of signal handlers
* Support passing arguments: signum, sigaction, and ucontext
* Support both process-directed and thread-directed signals
* Capture hardware exceptions and convert them to signals
* Deliver fatal signals (like SIGKILL) to kill processes gracefully
But we still have gaps, including but not limited to the points below:
* Convert #PF (page fault) and #GP (general protection) exceptions to signals
* Force delivery of signals via interrupt
* Support simulation mode
This commits improves both readability and correctness of the scheduling-related
system calls. In terms of readability, it extracts all scheduling-related code
ouf of the process/ directory and put it in a sched/ directory. In terms
of correctness, the new scheduling subsystem introduces CpuSet and SchedAgent
types to maintain and manipulate CPU scheduler settings in a secure and robust way.
Now one can specify the log level of the LibOS by setting `OCCLUM_LOG_LEVEL`
environment variable. The possible values are "off", "error", "warn",
"info", and "trace".
However, for the sake of security, the log level of a release enclave
(DisableDebug = 1 in Enclave.xml) is always "off" (i.e., no log) regardless of
the log level specified by the untrusted environment.