Linux provides certain security mechanisms that are used by containers such as Docker, LXD and systemd-nspawn. We can use the same mechanisms to sandbox systemd services shipped by the distribution or the ones we write ourselves. The purpose is to protect the system even if the service is compromised.

Arch Linux maintainers use several of these options in the system unit files that they ship, while Debian and Ubuntu maintainers generally only use the options that the upstream developer has used, if any. For examples, have a look at the systemd unit files for memcached and mariadb in /usr/lib/systemd/system.

These options generally fall into the following categories:

  • Filesystem namespace
  • Other namespaces, such as the user namespace
  • Capabilities
  • seccomp and system call filters

Some of these options have a performance cost, in particular seccomp.

When security matters more than anything else, we could simply enable every available option and accept the performance hit. In other cases it makes sense to balance security against performance and carefully pick the options that provide the most security benefit for an acceptable performance cost.

Get a report on a service’s security score

The first step is to generate a report on the service:

sudo systemd-analyze security mydaemon.service --no-pager

This will give us a total score. Note that each available option has a weight that reflects its estimated impact on security.
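When comparing runs before and after a change, it helps to pull just the total out of the report. This is a minimal sketch, assuming the report ends with a summary line of the form "→ Overall exposure level for mydaemon.service: 9.6 UNSAFE" (the exact wording may differ between systemd versions). overall_score is a hypothetical helper name, not part of systemd.

```shell
# Hypothetical helper: print the numeric total from a
# `systemd-analyze security <unit>` report read on stdin.
# Assumes the summary line contains "Overall exposure level" and a
# decimal number; adjust the pattern if your systemd prints it differently.
overall_score() {
    awk '/Overall exposure level/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+\.[0-9]+$/) { print $i; exit }
    }'
}

# Usage on a real system (not run here):
#   systemd-analyze security mydaemon.service --no-pager | overall_score
```

Saving the number before and after each change makes it easy to see whether an option actually moved the score.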

Then we can add a snippet with extra options using:

sudo systemctl edit mydaemon.service
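systemctl edit opens an editor and saves the result as a drop-in file. For scripted setups the same file can be written directly. The following is a sketch, assuming the usual drop-in location /etc/systemd/system/mydaemon.service.d/override.conf. write_dropin is a hypothetical helper, and the three options are only a starting point.

```shell
# Hypothetical helper: write a minimal hardening drop-in.
# $1: drop-in directory, e.g. /etc/systemd/system/mydaemon.service.d
write_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # Options must live under a [Service] section to take effect.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ProtectSystem=strict
ProtectHome=true
NoNewPrivileges=true
EOF
}
```

On a real system this would be run as root against the service's drop-in directory, followed by systemctl daemon-reload and a restart of the service.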

List of options

I had a look at the source of systemd v248.3 to compile the lists below.

Filesystem namespace: low cost, high impact

src/core/namespace.h

  • ProtectHome
  • ProtectSystem
  • ProtectProc
  • ProcSubset
  • ProtectKernelTunables
  • ProtectKernelModules
  • BindPaths, BindReadOnlyPaths (BindMount in the source)
  • MountImages (MountImage in the source)

Capabilities: low cost, high impact (man prctl)

  • CapabilityBoundingSet
  • AmbientCapabilities
  • NoNewPrivileges
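
To check what CapabilityBoundingSet actually leaves a running service with, the kernel exposes the bounding set as a hex mask in the CapBnd line of /proc/<pid>/status. Below is a small sketch that tests one bit of such a mask. has_cap is a hypothetical helper, and bit 10 is CAP_NET_BIND_SERVICE in linux/capability.h.

```shell
# Hypothetical helper: succeed if a capability bit is set in a hex mask.
# $1: hex mask without the 0x prefix (as printed in /proc/<pid>/status)
# $2: capability bit number (10 = CAP_NET_BIND_SERVICE)
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Usage on a real system (not run here), with $pid set to the service's
# main PID:
#   mask=$(awk '/^CapBnd:/ { print $2 }' "/proc/$pid/status")
#   has_cap "$mask" 10 && echo "can bind privileged ports"
```

The main PID can be obtained with systemctl show -p MainPID --value mydaemon.service.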

Other options to consider for a chroot:

These are also namespace-based, thus low cost.

  • RootDirectory
  • MountAPIVFS
  • PrivateUsers
  • DynamicUser

Seccomp: high cost, varying impact

src/core/execute.c #if HAVE_SECCOMP ... #endif

  • SystemCallFilter=
  • SystemCallLog=
  • SystemCallArchitectures=
  • RestrictAddressFamilies=
  • MemoryDenyWriteExecute=
  • RestrictRealtime=
  • RestrictSUIDSGID=
  • ProtectKernelTunables=
  • ProtectKernelModules=
  • ProtectKernelLogs=
  • ProtectClock=
  • PrivateDevices=
  • RestrictNamespaces=
  • LockPersonality=
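
Whether any of these options ended up installing a seccomp filter can be read from the Seccomp line of /proc/<pid>/status, which holds a numeric mode. Here is a sketch that turns the number into a name. seccomp_mode_name is a hypothetical helper.

```shell
# Hypothetical helper: map the numeric Seccomp mode from
# /proc/<pid>/status to a readable name.
seccomp_mode_name() {
    case "$1" in
        0) echo disabled ;;   # no seccomp in effect
        1) echo strict ;;     # SECCOMP_MODE_STRICT
        2) echo filter ;;     # SECCOMP_MODE_FILTER, what systemd installs
        *) echo unknown; return 1 ;;
    esac
}

# Usage on a real system (not run here), with $pid set to the service's
# main PID:
#   seccomp_mode_name "$(awk '/^Seccomp:/ { print $2 }' "/proc/$pid/status")"
```

A service that still reports "disabled" after a restart is not running with any of the seccomp-based options applied.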

I think it makes sense to use most of the filesystem sandboxing options and capabilities, and then pick the seccomp options that have the most impact on the security of the system, using the systemd-analyze security report as a guide.

See the systemd service configuration options (man systemd.exec) for a full list of currently available options. Some options may not be supported by older systemd versions. For the weight of each option, see src/analyze/analyze-security.c or the report of systemd-analyze security.

Example systemd unit

[Service]
# Tuned after:
# sudo systemd-analyze security caddy.service --no-pager

## Filesystem namespace options (cheap)

# Mount most things read-only and set read-write paths
ProtectSystem=strict
ReadWritePaths=/var/lib/caddy /var/log/caddy
InaccessiblePaths=...
ProtectHome=true
PrivateTmp=true
ProtectProc=invisible
ProtectKernelTunables=true
ProtectControlGroups=true

## Capabilities (man prctl) (cheap)
NoNewPrivileges=true
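# An empty CapabilityBoundingSet= drops all capabilities.
#CapabilityBoundingSet=
# Keep only what is needed to bind ports below 1024.
CapabilityBoundingSet=CAP_NET_BIND_SERVICE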

## Seccomp (expensive)

# High impact
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
RestrictNamespaces=true

# Misc recommended
PrivateDevices=true
ProtectKernelModules=true
ProtectKernelLogs=true
ProtectClock=true

# Other
#RestrictSUIDSGID=true
#RestrictRealtime=true
#MemoryDenyWriteExecute=true
#LockPersonality=true
#SystemCallArchitectures=native
#SystemCallFilter=@system-service
#SystemCallErrorNumber=EPERM
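
The commented-out lines under "Other" are candidates to enable one at a time, re-running systemd-analyze security and testing the service after each change. This sketch lists which options in a drop-in are still commented out. pending_options is a hypothetical helper, and the file path is whichever drop-in you are editing.

```shell
# Hypothetical helper: list option names that are still commented out.
# $1: path to a unit file or drop-in
# Matches lines of the form "#Name=value"; ordinary comments such as
# "# Other" or "## Seccomp (expensive)" are ignored.
pending_options() {
    sed -n 's/^#\([A-Za-z][A-Za-z]*\)=.*/\1/p' "$1"
}
```

Note that it also reports commented-out alternatives such as #CapabilityBoundingSet=, so the output is a list to review rather than a list to enable blindly.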