Linux provides certain security mechanisms that are used by containers such as Docker, LXD and systemd-nspawn. We can use the same mechanisms to sandbox systemd services shipped by the distribution or the ones we write ourselves. The purpose is to protect the system even if the service is compromised.
Arch Linux maintainers use several of these options in the system unit files that they ship, while Debian and Ubuntu maintainers generally only use the options that the upstream developer has used, if any. For examples, have a look at the systemd unit files for memcached and mariadb in
These options generally fall into the following categories:
- Filesystem namespace
- Other namespaces such as user
- seccomp and system call filter
Some of these options have a performance cost, in particular seccomp:
- https://firstname.lastname@example.org/T/, see Lennart Poettering’s comments
One could simply enable every available option and accept the performance hit when security matters more than anything else. In other cases it makes sense to balance security and performance, carefully picking the options that provide the most security benefit for an acceptable performance cost.
Get a report on a service’s security score
The first step is to generate a report on the service:
sudo systemd-analyze security mydaemon.service --no-pager
This will give us a total score. Notice that each available option has a weight, according to its estimated impact on security.
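When run without a unit name, systemd-analyze security prints an overview table of all loaded services, which helps decide where to start. The following is a minimal sketch for narrowing that table down to the most exposed units. It assumes the first two columns of the table are the unit name and a numeric exposure score; filter_exposed is a hypothetical helper name.

```shell
# filter_exposed THRESHOLD
# Read the overview table from `systemd-analyze security --no-pager` on stdin
# and print "score unit" for every unit at or above THRESHOLD, highest first.
# Assumption: column 1 is the unit name, column 2 the numeric exposure score.
# Lines whose second column is not a number (such as the header) are skipped.
filter_exposed() {
    awk -v min="$1" '$2 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 + 0 >= min { print $2, $1 }' |
        LC_ALL=C sort -rn
}

# Usage:
#   systemd-analyze security --no-pager | filter_exposed 9
```

The threshold is arbitrary; the point is only to get a short list of services worth looking at first.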
Then we can add a drop-in snippet with extra options using:
sudo systemctl edit mydaemon.service
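For illustration, here is roughly what such a snippet looks like. This is a minimal sketch that writes the file into a scratch directory so it can be reviewed before touching the live unit. The scratch location and the choice of these four options are assumptions for the example, not a recommendation for every service.

```shell
# Sketch only: build a drop-in in a scratch directory (hypothetical location)
# instead of editing the live unit.
scratch="$(mktemp -d)"
dropin="$scratch/mydaemon.service.d/override.conf"
mkdir -p "$(dirname "$dropin")"

# Sandboxing options must live under [Service]. These four come from the
# cheap filesystem and capability groups.
cat > "$dropin" <<'EOF'
[Service]
ProtectSystem=strict
ProtectHome=true
PrivateTmp=true
NoNewPrivileges=true
EOF

# Print the result for review.
cat "$dropin"
```

Pasting the same [Service] section into the editor opened by systemctl edit has the same effect. The running service only picks up the new sandbox after a restart.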
List of options
I had a look at the source of systemd v248.3
Filesystem namespace: low cost, high impact
Capabilities: low cost, high impact (man prctl)
Other options to consider for a chroot:
These are also namespace-based, thus low cost.
Seccomp (high cost, varying impact)
#if HAVE_SECCOMP ... #endif
I think it makes sense to use most of the filesystem sandboxing options and capabilities, and then pick those seccomp options that will have the most impact on the security of the system, using the report of systemd-analyze security as a guide.
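To check that the chosen options actually reached the running process, the kernel reports the relevant state in /proc/PID/status. The following is a minimal sketch, assuming a reasonably recent Linux kernel (older kernels may not print the NoNewPrivs field); show_sandbox_status is a hypothetical helper name.

```shell
# show_sandbox_status PID
# Print the sandbox-related fields the kernel reports for a process:
#   NoNewPrivs  1 when the no_new_privs flag is set
#   Seccomp     0 = disabled, 1 = strict mode, 2 = filter mode
#   CapBnd      capability bounding set as a hex mask
show_sandbox_status() {
    grep -E '^(NoNewPrivs|Seccomp|CapBnd):' "/proc/$1/status"
}

# Demo on the current shell. For a service, pass its main PID instead:
#   show_sandbox_status "$(systemctl show -p MainPID --value mydaemon.service)"
show_sandbox_status "$$"
```

If libcap's capsh is installed, capsh --decode=MASK turns the hex mask from CapBnd into capability names.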
See systemd service configuration options for a full list of currently available options. Some options may not be supported by older systemd versions. See analyze/analyze-security.c for the weight of options or the report of
Example systemd unit
# Tuned after:
# sudo systemd-analyze security caddy.service --no-pager

## Filesystem namespace options (cheap)
# Mount most things read-only and set read-write paths
ProtectSystem=strict
ReadWritePaths=/var/lib/caddy /var/log/caddy
InaccessiblePaths=...
ProtectHome=true
PrivateTmp=true
ProtectProc=invisible
ProtectKernelTunables=true
ProtectControlGroups=true

## Capabilities (man prctl) (cheap)
NoNewPrivileges=true
#CapabilityBoundingSet=
CapabilityBoundingSet=CAP_NET_BIND_SERVICE

## Seccomp (expensive)
# High impact
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=true
# Misc recommended
PrivateDevices=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true
# Other
#RestrictSUIDSGID=true
#RestrictRealtime=true
#MemoryDenyWriteExecute=true
#LockPersonality=true
#SystemCallArchitectures=native
#SystemCallFilter=@system-service
#SystemCallErrorNumber=EPERM