Before this commit, the epoll implementation works by simply delegating to the
host OS through OCall. One major problem with this implementation is
that it can only handle files that are backed by a file of the host OS
(e.g., sockets), but not those that are mainly implemented by the LibOS
(e.g., pipes). Therefore, a new epoll implementation that can handle all
kinds of files is needed.
This commit completely rewrites the epoll implementation by leveraging
the new event subsystem. Now the new epoll can handle all file types:
1. Host files, e.g., sockets, eventfd;
2. LibOS files, e.g., pipes;
3. Hybrid files, e.g., epoll files.
To support epoll, a new file type needs to implement at most four
methods of the File trait:
* poll (required for all file types);
* notifier (required for all file types);
* host_fd (only required for host files);
* recv_host_events (only required for host files).
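
As a rough illustration of the four methods listed above, a LibOS-only
file might look like the sketch below. Everything in it is an assumption
made for the example: the signatures, `IoEvents`, `IoNotifier`, and
`ToyPipeReader` are hypothetical and are not copied from the actual File
trait.

```rust
// Illustrative sketch only: every type name and signature is an assumption.
use std::sync::atomic::{AtomicUsize, Ordering};

/// Readiness bitmask (hypothetical stand-in for the LibOS's I/O event type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct IoEvents(pub u32);

impl IoEvents {
    pub const EMPTY: IoEvents = IoEvents(0);
    pub const IN: IoEvents = IoEvents(0x1); // same bit as EPOLLIN on Linux

    pub fn contains(self, other: IoEvents) -> bool {
        self.0 & other.0 == other.0
    }
}

/// Placeholder for the notifier that broadcasts a file's I/O events.
pub struct IoNotifier;

pub trait File {
    /// Report current readiness; required for all file types.
    fn poll(&self) -> IoEvents;
    /// Expose the notifier that epoll observes; required for all file types.
    fn notifier(&self) -> Option<&IoNotifier>;
    /// Host file descriptor; only host files override the default.
    fn host_fd(&self) -> Option<i32> {
        None
    }
    /// Accept events reported by the host; only host files override this.
    fn recv_host_events(&self, _events: IoEvents) {}
}

/// A toy LibOS file: readable whenever some bytes are buffered.
pub struct ToyPipeReader {
    notifier: IoNotifier,
    buffered: AtomicUsize,
}

impl ToyPipeReader {
    pub fn new(buffered: usize) -> Self {
        Self { notifier: IoNotifier, buffered: AtomicUsize::new(buffered) }
    }

    pub fn set_buffered(&self, n: usize) {
        self.buffered.store(n, Ordering::SeqCst);
    }
}

impl File for ToyPipeReader {
    fn poll(&self) -> IoEvents {
        if self.buffered.load(Ordering::SeqCst) > 0 {
            IoEvents::IN
        } else {
            IoEvents::EMPTY
        }
    }

    fn notifier(&self) -> Option<&IoNotifier> {
        Some(&self.notifier)
    }
}
```

Because the toy file is implemented entirely in the LibOS, it keeps the
default `host_fd` and `recv_host_events`; a host file would override both.
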
1. Introduce channels, which provide an efficient means for IPC;
2. Leverage channels to rewrite pipe, improving the performance (3X),
robustness, and readability.
This pipe rewrite is not complete: further commits will add poll and
epoll support for pipes.
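
To illustrate the idea of building a pipe on top of a channel, here is a
minimal sketch. It uses `std::sync::mpsc` from the Rust standard library
as a stand-in for the channel introduced by this commit, so it says
nothing about the real channel's API or performance. `pipe`,
`PipeReader`, and `PipeWriter` are hypothetical names.

```rust
// Sketch under assumptions: std::sync::mpsc replaces the LibOS channel.
use std::io;
use std::sync::mpsc::{channel, Receiver, Sender};

pub struct PipeWriter {
    tx: Sender<Vec<u8>>,
}

pub struct PipeReader {
    rx: Receiver<Vec<u8>>,
    pending: Vec<u8>, // bytes received but not yet consumed by read()
}

/// Create a connected reader/writer pair.
pub fn pipe() -> (PipeReader, PipeWriter) {
    let (tx, rx) = channel();
    (PipeReader { rx, pending: Vec::new() }, PipeWriter { tx })
}

impl PipeWriter {
    /// Send a copy of `buf`; fails with BrokenPipe once the reader is gone.
    pub fn write(&self, buf: &[u8]) -> io::Result<usize> {
        if buf.is_empty() {
            return Ok(0);
        }
        self.tx
            .send(buf.to_vec())
            .map_err(|_| io::Error::from(io::ErrorKind::BrokenPipe))?;
        Ok(buf.len())
    }
}

impl PipeReader {
    /// Block until data arrives; return 0 once all writers are dropped.
    pub fn read(&mut self, buf: &mut [u8]) -> usize {
        if self.pending.is_empty() {
            match self.rx.recv() {
                Ok(chunk) => self.pending = chunk,
                Err(_) => return 0, // end of file
            }
        }
        let n = buf.len().min(self.pending.len());
        buf[..n].copy_from_slice(&self.pending[..n]);
        self.pending.drain(..n);
        n
    }
}
```

The reader keeps leftover bytes in `pending` so that a short read does
not lose data, which mirrors the byte-stream semantics of a pipe.
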
An event can be anything ranging from the exit of a process (interesting
to `wait4`) to the arrival of a blocked signal (interesting to
`sigwaitinfo`), from the completion of a file operation (interesting to
`epoll`) to the change of a file status (interesting to `inotify`).
To meet the event-related demands from various subsystems, this event
subsystem is designed to provide a set of general-purpose primitives:
* `Waiter`, `Waker`, and `WaiterQueue` are primitives to put threads
to sleep and later wake them up.
* `Event`, `Observer`, and `Notifier` are primitives to handle and
broadcast events.
* `WaiterQueueObserver` implements the common pattern of waking up
threads once some interesting events happen.
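
The following sketch shows one way these primitives could fit together.
It borrows the names from the list above, but all signatures and
internals are assumptions (built on `Mutex` and `Condvar` from the
standard library), and `ProcessExited` is a hypothetical event type.

```rust
// Sketch under assumptions: names follow the commit message, internals do not.
use std::sync::{Arc, Condvar, Mutex};

pub trait Event: Send + Sync + 'static {}

pub trait Observer<E: Event>: Send + Sync {
    fn on_event(&self, event: &E);
}

/// Broadcasts events to all registered observers.
pub struct Notifier<E: Event> {
    observers: Mutex<Vec<Arc<dyn Observer<E>>>>,
}

impl<E: Event> Notifier<E> {
    pub fn new() -> Self {
        Self { observers: Mutex::new(Vec::new()) }
    }
    pub fn register(&self, observer: Arc<dyn Observer<E>>) {
        self.observers.lock().unwrap().push(observer);
    }
    pub fn broadcast(&self, event: &E) {
        // Clone the list so the lock is not held while observers run.
        let observers = self.observers.lock().unwrap().clone();
        for observer in observers {
            observer.on_event(event);
        }
    }
}

type WakeFlag = Arc<(Mutex<bool>, Condvar)>;

/// Puts the calling thread to sleep until its waker fires.
pub struct Waiter { flag: WakeFlag }
/// Handle that wakes the corresponding waiter.
pub struct Waker { flag: WakeFlag }

impl Waiter {
    pub fn new() -> Self {
        Self { flag: Arc::new((Mutex::new(false), Condvar::new())) }
    }
    pub fn waker(&self) -> Waker {
        Waker { flag: self.flag.clone() }
    }
    pub fn wait(&self) {
        let (lock, cvar) = &*self.flag;
        let mut woken = lock.lock().unwrap();
        while !*woken {
            woken = cvar.wait(woken).unwrap();
        }
        *woken = false; // reset so the waiter can be reused
    }
}

impl Waker {
    pub fn wake(&self) {
        let (lock, cvar) = &*self.flag;
        *lock.lock().unwrap() = true;
        cvar.notify_one();
    }
}

pub struct WaiterQueue { wakers: Mutex<Vec<Waker>> }

impl WaiterQueue {
    pub fn new() -> Self {
        Self { wakers: Mutex::new(Vec::new()) }
    }
    pub fn enqueue(&self, waiter: &Waiter) {
        self.wakers.lock().unwrap().push(waiter.waker());
    }
    /// Wake and dequeue every waiter; return how many were woken.
    pub fn wake_all(&self) -> usize {
        let wakers = std::mem::take(&mut *self.wakers.lock().unwrap());
        wakers.iter().for_each(Waker::wake);
        wakers.len()
    }
}

/// Observer that wakes all queued waiters whenever an event arrives.
pub struct WaiterQueueObserver { queue: WaiterQueue }

impl WaiterQueueObserver {
    pub fn new() -> Self {
        Self { queue: WaiterQueue::new() }
    }
    pub fn waiter_queue(&self) -> &WaiterQueue {
        &self.queue
    }
}

impl<E: Event> Observer<E> for WaiterQueueObserver {
    fn on_event(&self, _event: &E) {
        self.queue.wake_all();
    }
}

/// Hypothetical event: a process has exited.
pub struct ProcessExited { pub pid: u32 }
impl Event for ProcessExited {}
```

In this sketch, a thread would enqueue its `Waiter` on the observer's
queue, register the observer with a `Notifier<ProcessExited>`, and call
`wait()`; a later `broadcast` wakes it. The wake flag is sticky, so a
wake-up that arrives before `wait()` is not lost.
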
Socket-related ocalls, e.g., sendto, sendmsg and write, may cause SIGPIPE
on the host. Since these ocalls are issued by the libos, this signal should
be handled in the libos. We ignore SIGPIPE on the host and raise the same
signal in the libos if one of the above ocalls fails with EPIPE. In this
way, the signal is handled by the libos.
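
The EPIPE-to-SIGPIPE translation can be sketched as follows. This is a
simplified illustration, not the actual code: `SigQueue` and
`handle_ocall_ret` are hypothetical, and the convention that a failed
ocall returns a negative errno is an assumption. The numeric values of
EPIPE and SIGPIPE are the Linux ones.

```rust
// Sketch under assumptions: helper names and the return convention are made up.
const EPIPE: i32 = 32; // Linux errno for a broken pipe
const SIGPIPE: i32 = 13; // Linux signal number for SIGPIPE

/// Hypothetical queue of signals pending for the current LibOS thread.
#[derive(Default)]
pub struct SigQueue {
    pending: Vec<i32>,
}

impl SigQueue {
    pub fn raise(&mut self, signum: i32) {
        self.pending.push(signum);
    }
    pub fn pending(&self) -> &[i32] {
        &self.pending
    }
}

/// Post-process the result of a socket-related ocall.
/// Assumes `ret` is a byte count on success and a negative errno on failure.
pub fn handle_ocall_ret(ret: isize, sigs: &mut SigQueue) -> Result<usize, i32> {
    if ret >= 0 {
        return Ok(ret as usize);
    }
    let errno = (-ret) as i32;
    if errno == EPIPE {
        // The host ignored SIGPIPE, so raise it inside the LibOS instead.
        sigs.raise(SIGPIPE);
    }
    Err(errno)
}
```

The error is still returned to the caller after the signal is queued, so
an application that ignores SIGPIPE would still observe EPIPE.
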
Rlimits are now consistent with the memory space limits defined in Occlum.json. Specific
memory size limits can be set for a child process with the `prlimit` syscall or with the
`ulimit` command in a shell script.