In this commit, we add eight signal-related syscalls
* kill
* tkill
* tgkill
* rt_sigaction
* rt_sigreturn
* rt_sigprocmask
* rt_sigpending
* exit_group
We implement the following major features for signals:
* Generate, mask, and deliver signals
* Support user-defined signal handlers
* Support nested invocation of signal handlers
* Support passing arguments: signum, sigaction, and ucontext
* Support both process-directed and thread-directed signals
* Capture hardware exceptions and convert them to signals
* Deliver fatal signals (like SIGKILL) to kill processes gracefully
But we still have gaps, including but not limited to the points below:
* Convert #PF (page fault) and #GP (general protection) exceptions to signals
* Force delivery of signals via interrupt
* Support simulation mode
This commits improves both readability and correctness of the scheduling-related
system calls. In terms of readability, it extracts all scheduling-related code
ouf of the process/ directory and put it in a sched/ directory. In terms
of correctness, the new scheduling subsystem introduces CpuSet and SchedAgent
types to maintain and manipulate CPU scheduler settings in a secure and robust way.
As a major rewrite to the process/thread subsystem, this commits:
1. Implements threads as a first-class object, which represents a group of OS resources
and a thread of execution;
2. Implements processes as a first-class object that manages threads and maintains
the parent-child relationship between processes;
3. Refactors the code in process subsystem to follow the improved coding style and
conventions emerged in recent commits;
4. Refactors the code in other subsystems to use the new process/thread subsystem.
Now one can specify the log level of the LibOS by setting `OCCLUM_LOG_LEVEL`
environment variable. The possible values are "off", "error", "warn",
"info", and "trace".
However, for the sake of security, the log level of a release enclave
(DisableDebug = 1 in Enclave.xml) is always "off" (i.e., no log) regardless of
the log level specified by the untrusted environment.
1. Use arch_prctl to replace RDFSBASE/WRFSBASE
Ptrace can't get right value if WRFSBASE is called which
will make debugger fail in simulation mode. Use arch_prctl
to replace these instructions in simulation mode.
2. Disable the busy thread in exit_group test
exit_group doesn't have a real implementation yet but test
under SGX simulation mode give core dump for exit_group test.
Disable the busy loop thread and the core dump disappear.
3. Add SDK lib path to LD_LIBRARY_PATH
Linker sometims can't find urts_sim and uae_service_sim when
running. Explicitly add path to LD_LIBRARY_PATH when running
occlum command.
Signed-off-by: sanqian.hcy <sanqian.hcy@antfin.com>
This commits is a dummy implementation of file advisory locks.
Specifically, for regular files, fcntl `F_SETLK` (i.e., acquiring
or releasing locks) always succeeds and fcntl `F_GETLK` (i.e., testing locks)
always returns no locks.
It is slow to allocate big buffers using SGX SDK's malloc. Even worse, it
consumes a large amount of precious trusted memory inside enclaves. This
commit avoids using trusted buffers and allocates untrusted buffers for
sendmsg/recvmsg directly via OCall, thus improving the performance of
sendmsg/recvmsg. Note that this optimization does not affect the security of
network data as it has to be sent/received via OCalls.
SGX SDK's sgx_init_quote may return SGX_ERROR_BUSY, which is previously not
handled. The implementation of ioctl for /dev/sgx is now fixed to handle this
error.
* Fix readlink from `/proc/self/exe` to get absolute path of the executable file
* Add readlink from`/proc/self/fd/<fd>` to get the file's real path
Note that for now we only support read links _statically_, meaning that even
if the file or any of its ancestors is moved after the file is opened, the
absolute paths obtained from the API does not change.
The output buffer given to getdents may not be large enough for the next directory
entry. If no directory entries has been loaded into the buffer, just return
EINVAL. Otherwise, return the total length of the directory entries already
loaded in the buffer
1. Add a separate net/ directory for the network subsystem;
2. Move some existing socket code to net/;
3. Implement sendmsg/recvmsg with OCalls;
4. Extend client/server test cases.
1. Introduce the new infrastructure for ioctl support
2. Refactor the old ioctls to use the new infrastructure
3. Implement builtin ioctls (e.g., TIOCGWINSZ and TIOCSWINSZ for stdout)
4. Implement non-builtin, driver-specific ioctls (e.g., ioctls for /dev/sgx)
1. Use epoll_wait to support epoll_pwait as there is no signal mechanism
2. The timeout is fixed to zero for not waiting for any signal to come
to speed up
3. Change the test case of server_epoll to use epoll_pwait
BACKGROUND
The exit_group syscall, which is implicitly called by libc after the main function
returns, kills all threads in a thread group, even if these threads are
running, sleeping, or waiting on a futex.
PROBLEM
In normal use cases, exit_group does nothing since a well-written program
should terminate all threads before the main function returns. But when this is
not the case, exit_group can clean up the mess.
Currently, Occlum does not implement exit_group. And the Occlum PAL process
waits for all tasks (i.e., SGX threads) to finish before exiting. So without
exit_group implemented, some tasks may be still running if after the main task
exits. And this causes the Occlum PAL process to wait---forever.
WORKAROUND
To implement a real exit_group, we need signals to kill threads. But we do not
have signals, yet. So we come up with a workaround: instead of waiting all
tasks to finish in PAL, we just wait for the main task. As soon as the main
task exits, the PAL process terminates, killing the remaining tasks.
The original implementation of program loader is written under the assumption
that there are only two loadable segments per ELF, one is code, and the other
is data. But this assumption is unnecessary and proves to be wrong for an ELF
on Alpine Linux, which has two extra read-only, loadable segments for security
hardening. This commit clears the obstacle towards running unmodified
executables from Alpine Linux.
In addition to getting rid of the false assumption of two fixed loadable segments,
this commit improves the quality of the code related to program loading and
process initialization.