about summary refs log tree commit diff
path: root/nixos
diff options
context:
space:
mode:
authoraszlig <aszlig@nix.build>2019-03-10 12:21:55 +0100
committeraszlig <aszlig@nix.build>2019-03-14 19:14:01 +0100
commitac64ce994509aaad8c5b55254595a5f989ba24e9 (patch)
tree93dedf4a452ccec6050dd73227b3109c5eba49f8 /nixos
parentb703c4d998ff9d07f902276a2f357f883292cf44 (diff)
downloadnixlib-ac64ce994509aaad8c5b55254595a5f989ba24e9.tar
nixlib-ac64ce994509aaad8c5b55254595a5f989ba24e9.tar.gz
nixlib-ac64ce994509aaad8c5b55254595a5f989ba24e9.tar.bz2
nixlib-ac64ce994509aaad8c5b55254595a5f989ba24e9.tar.lz
nixlib-ac64ce994509aaad8c5b55254595a5f989ba24e9.tar.xz
nixlib-ac64ce994509aaad8c5b55254595a5f989ba24e9.tar.zst
nixlib-ac64ce994509aaad8c5b55254595a5f989ba24e9.zip
nixos: Add 'chroot' options to systemd.services
Currently, if you want to properly chroot a systemd service, you could
do it using BindReadOnlyPaths=/nix/store (which is not what I'd call
"properly", because the whole store is still accessible) or use a
separate derivation that gathers the runtime closure of the service you
want to chroot. The former is the easier method and there is also a
method directly offered by systemd, called ProtectSystem, which still
leaves the whole store accessible. The latter however is a bit more
involved, because you need to bind-mount each store path of the runtime
closure of the service you want to chroot.

This can be achieved using pkgs.closureInfo and a small derivation that
packs everything into a systemd unit, which later can be added to
systemd.packages. That's also what I did several times[1][2] in the
past.

However, this process got a bit tedious, so I decided that it would be
generally useful for NixOS, so this very implementation was born.

Now if you want to chroot a systemd service, all you need to do is:

  {
    systemd.services.yourservice = {
      description = "My Shiny Service";
      wantedBy = [ "multi-user.target" ];

      chroot.enable = true;
      serviceConfig.ExecStart = "${pkgs.myservice}/bin/myservice";
    };
  }

If more than the dependencies for the ExecStart* and ExecStop* (which
btw. also includes "script" and {pre,post}Start) need to be in the
chroot, it can be specified using the chroot.packages option. By
default (which uses the "full-apivfs"[3] confinement mode), a user
namespace is set up as well and /proc, /sys and /dev are mounted
appropriately.

In addition - and by default - a /bin/sh executable is provided as well,
which is useful for most programs that use the system() C library call
to execute commands via shell. The shell providing /bin/sh is dash
instead of the default in NixOS (which is bash), because it's way more
lightweight and after all we're chrooting because we want to lower the
attack surface and it should be only used for "/bin/sh -c something".

Prior to submitting this here, I did a first implementation of this
outside[4] of nixpkgs, which duplicated the "pathSafeName" functionality
from systemd-lib.nix, just because it's only a single line.

However, I decided to just re-use the one from systemd here and
subsequently made it available when importing systemd-lib.nix, so that
the systemd-chroot implementation also benefits from fixes to that
functionality (which is now a proper function).

Unfortunately, we do have a few limitations as well. The first being
that DynamicUser doesn't work in conjunction with tmpfs, because it
already sets up a tmpfs in a different path and simply ignores the one
we define. We could probably solve this by detecting it and try to
bind-mount our paths to that different path whenever DynamicUser is
enabled.

The second limitation/issue is that RootDirectoryStartOnly doesn't work
right now, because it only affects the RootDirectory option and not the
individual bind mounts or our tmpfs. It would be helpful if systemd
would have a way to disable specific bind mounts as well or at least
have some way to ignore failures for the bind mounts/tmpfs setup.

Another quirk we do have right now is that systemd tries to create a
/usr directory within the chroot, which subsequently fails. Fortunately,
this is just an ugly error and not a hard failure.

[1]: https://github.com/headcounter/shabitica/blob/3bb01728a0237ad5e7/default.nix#L43-L62
[2]: https://github.com/aszlig/avonc/blob/dedf29e092481a33dc/nextcloud.nix#L103-L124
[3]: The reason this is called "full-apivfs" instead of just "full" is
     to make room for a *real* "full" confinement mode, which is more
     restrictive even.
[4]: https://github.com/aszlig/avonc/blob/92a20bece4df54625e/systemd-chroot.nix

Signed-off-by: aszlig <aszlig@nix.build>
Diffstat (limited to 'nixos')
-rw-r--r--nixos/modules/module-list.nix1
-rw-r--r--nixos/modules/security/systemd-chroot.nix160
-rw-r--r--nixos/modules/system/boot/systemd-lib.nix9
-rw-r--r--nixos/tests/all-tests.nix1
-rw-r--r--nixos/tests/systemd-chroot.nix129
5 files changed, 295 insertions, 5 deletions
diff --git a/nixos/modules/module-list.nix b/nixos/modules/module-list.nix
index 2c7b15e65c49..768bc40d1796 100644
--- a/nixos/modules/module-list.nix
+++ b/nixos/modules/module-list.nix
@@ -170,6 +170,7 @@
   ./security/rtkit.nix
   ./security/wrappers/default.nix
   ./security/sudo.nix
+  ./security/systemd-chroot.nix
   ./services/admin/oxidized.nix
   ./services/admin/salt/master.nix
   ./services/admin/salt/minion.nix
diff --git a/nixos/modules/security/systemd-chroot.nix b/nixos/modules/security/systemd-chroot.nix
new file mode 100644
index 000000000000..befe2d3418c9
--- /dev/null
+++ b/nixos/modules/security/systemd-chroot.nix
@@ -0,0 +1,160 @@
+{ config, pkgs, lib, ... }:
+
+let
+  inherit (lib) types;
+  inherit (import ../system/boot/systemd-lib.nix {
+    inherit config pkgs lib;
+  }) mkPathSafeName;
+in {
+  options.systemd.services = lib.mkOption {
+    type = types.attrsOf (types.submodule ({ name, config, ... }: {
+      options.chroot.enable = lib.mkOption {
+        type = types.bool;
+        default = false;
+        description = ''
+          If set, all the required runtime store paths for this service are
+          bind-mounted into a <literal>tmpfs</literal>-based <citerefentry>
+            <refentrytitle>chroot</refentrytitle>
+            <manvolnum>2</manvolnum>
+          </citerefentry>.
+        '';
+      };
+
+      options.chroot.packages = lib.mkOption {
+        type = types.listOf (types.either types.str types.package);
+        default = [];
+        description = let
+          mkScOption = optName: "<option>serviceConfig.${optName}</option>";
+        in ''
+          Additional packages or strings with context to add to the closure of
+          the chroot. By default, this includes all the packages from the
+          ${lib.concatMapStringsSep ", " mkScOption [
+            "ExecReload" "ExecStartPost" "ExecStartPre" "ExecStop"
+            "ExecStopPost"
+          ]} and ${mkScOption "ExecStart"} options.
+
+          <note><para><emphasis role="strong">Only</emphasis> the latter
+          (${mkScOption "ExecStart"}) will be used if
+          ${mkScOption "RootDirectoryStartOnly"} is enabled.</para></note>
+
+          <note><para>Also, the store paths listed in <option>path</option> are
+          <emphasis role="strong">not</emphasis> included in the closure as
+          well as paths from other options except those listed
+          above.</para></note>
+        '';
+      };
+
+      options.chroot.withBinSh = lib.mkOption {
+        type = types.bool;
+        default = true;
+        description = ''
+          Whether to symlink <command>dash</command> as
+          <filename>/bin/sh</filename> to the chroot.
+
+          This is useful for some applications, which for example use the
+          <citerefentry>
+            <refentrytitle>system</refentrytitle>
+            <manvolnum>3</manvolnum>
+          </citerefentry> library function to execute commands.
+        '';
+      };
+
+      options.chroot.confinement = lib.mkOption {
+        type = types.enum [ "full-apivfs" "chroot-only" ];
+        default = "full-apivfs";
+        description = ''
+          The value <literal>full-apivfs</literal> (the default) sets up
+          private <filename class="directory">/dev</filename>, <filename
+          class="directory">/proc</filename>, <filename
+          class="directory">/sys</filename> and <filename
+          class="directory">/tmp</filename> file systems in a separate user
+          name space.
+
+          If this is set to <literal>chroot-only</literal>, only the file
+          system name space is set up along with the call to <citerefentry>
+            <refentrytitle>chroot</refentrytitle>
+            <manvolnum>2</manvolnum>
+          </citerefentry>.
+
+          <note><para>This doesn't cover network namespaces and is solely for
+          file system level isolation.</para></note>
+        '';
+      };
+
+      config = lib.mkIf config.chroot.enable {
+        serviceConfig = let
+          rootName = "${mkPathSafeName name}-chroot";
+        in {
+          RootDirectory = pkgs.runCommand rootName {} "mkdir \"$out\"";
+          TemporaryFileSystem = "/";
+          MountFlags = lib.mkDefault "private";
+        } // lib.optionalAttrs config.chroot.withBinSh {
+          BindReadOnlyPaths = [ "${pkgs.dash}/bin/dash:/bin/sh" ];
+        } // lib.optionalAttrs (config.chroot.confinement == "full-apivfs") {
+          MountAPIVFS = true;
+          PrivateDevices = true;
+          PrivateTmp = true;
+          PrivateUsers = true;
+          ProtectControlGroups = true;
+          ProtectKernelModules = true;
+          ProtectKernelTunables = true;
+        };
+        chroot.packages = let
+          startOnly = config.serviceConfig.RootDirectoryStartOnly or false;
+          execOpts = if startOnly then [ "ExecStart" ] else [
+            "ExecReload" "ExecStart" "ExecStartPost" "ExecStartPre" "ExecStop"
+            "ExecStopPost"
+          ];
+          execPkgs = lib.concatMap (opt: let
+            isSet = config.serviceConfig ? ${opt};
+          in lib.optional isSet config.serviceConfig.${opt}) execOpts;
+        in execPkgs ++ lib.optional config.chroot.withBinSh pkgs.dash;
+      };
+    }));
+  };
+
+  config.assertions = lib.concatLists (lib.mapAttrsToList (name: cfg: let
+    whatOpt = optName: "The 'serviceConfig' option '${optName}' for"
+                    + " service '${name}' is enabled in conjunction with"
+                    + " 'chroot.enable'";
+  in lib.optionals cfg.chroot.enable [
+    { assertion = !cfg.serviceConfig.RootDirectoryStartOnly or false;
+      message = "${whatOpt "RootDirectoryStartOnly"}, but right now systemd"
+              + " doesn't support restricting bind-mounts to 'ExecStart'."
+              + " Please either define a separate service or find a way to run"
+              + " commands other than ExecStart within the chroot.";
+    }
+    { assertion = !cfg.serviceConfig.DynamicUser or false;
+      message = "${whatOpt "DynamicUser"}. Please create a dedicated user via"
+              + " the 'users.users' option instead as this combination is"
+              + " currently not supported.";
+    }
+  ]) config.systemd.services);
+
+  config.systemd.packages = lib.concatLists (lib.mapAttrsToList (name: cfg: let
+    rootPaths = let
+      contents = lib.concatStringsSep "\n" cfg.chroot.packages;
+    in pkgs.writeText "${mkPathSafeName name}-string-contexts.txt" contents;
+
+    chrootPaths = pkgs.runCommand "${mkPathSafeName name}-chroot-paths" {
+      closureInfo = pkgs.closureInfo { inherit rootPaths; };
+      serviceName = "${name}.service";
+      excludedPath = rootPaths;
+    } ''
+      mkdir -p "$out/lib/systemd/system"
+      serviceFile="$out/lib/systemd/system/$serviceName"
+
+      echo '[Service]' > "$serviceFile"
+
+      while read storePath; do
+        if [ -L "$storePath" ]; then
+          # Currently, systemd can't cope with symlinks in Bind(ReadOnly)Paths,
+          # so let's just bind-mount the target to that location.
+          echo "BindReadOnlyPaths=$(readlink -e "$storePath"):$storePath"
+        elif [ "$storePath" != "$excludedPath" ]; then
+          echo "BindReadOnlyPaths=$storePath"
+        fi
+      done < "$closureInfo/store-paths" >> "$serviceFile"
+    '';
+  in lib.optional cfg.chroot.enable chrootPaths) config.systemd.services);
+}
diff --git a/nixos/modules/system/boot/systemd-lib.nix b/nixos/modules/system/boot/systemd-lib.nix
index 68a40377ee13..28ad4f121bbe 100644
--- a/nixos/modules/system/boot/systemd-lib.nix
+++ b/nixos/modules/system/boot/systemd-lib.nix
@@ -9,12 +9,11 @@ in rec {
 
   shellEscape = s: (replaceChars [ "\\" ] [ "\\\\" ] s);
 
+  mkPathSafeName = lib.replaceChars ["@" ":" "\\" "[" "]"] ["-" "-" "-" "" ""];
+
   makeUnit = name: unit:
-    let
-      pathSafeName = lib.replaceChars ["@" ":" "\\" "[" "]"] ["-" "-" "-" "" ""] name;
-    in
     if unit.enable then
-      pkgs.runCommand "unit-${pathSafeName}"
+      pkgs.runCommand "unit-${mkPathSafeName name}"
         { preferLocalBuild = true;
           allowSubstitutes = false;
           inherit (unit) text;
@@ -24,7 +23,7 @@ in rec {
           echo -n "$text" > $out/${shellEscape name}
         ''
     else
-      pkgs.runCommand "unit-${pathSafeName}-disabled"
+      pkgs.runCommand "unit-${mkPathSafeName name}-disabled"
         { preferLocalBuild = true;
           allowSubstitutes = false;
         }
diff --git a/nixos/tests/all-tests.nix b/nixos/tests/all-tests.nix
index 2ddb54bcc3d7..fe67e2453505 100644
--- a/nixos/tests/all-tests.nix
+++ b/nixos/tests/all-tests.nix
@@ -216,6 +216,7 @@ in
   switchTest = handleTest ./switch-test.nix {};
   syncthing-relay = handleTest ./syncthing-relay.nix {};
   systemd = handleTest ./systemd.nix {};
+  systemd-chroot = handleTest ./systemd-chroot.nix {};
   taskserver = handleTest ./taskserver.nix {};
   telegraf = handleTest ./telegraf.nix {};
   tomcat = handleTest ./tomcat.nix {};
diff --git a/nixos/tests/systemd-chroot.nix b/nixos/tests/systemd-chroot.nix
new file mode 100644
index 000000000000..523e1ad9f4df
--- /dev/null
+++ b/nixos/tests/systemd-chroot.nix
@@ -0,0 +1,129 @@
+import ./make-test.nix {
+  name = "systemd-chroot";
+
+  machine = { pkgs, lib, ... }: let
+    testServer = pkgs.writeScript "testserver.sh" ''
+      #!${pkgs.stdenv.shell}
+      export PATH=${lib.escapeShellArg "${pkgs.coreutils}/bin"}
+      ${lib.escapeShellArg pkgs.stdenv.shell} 2>&1
+      echo "exit-status:$?"
+    '';
+
+    testClient = pkgs.writeScriptBin "chroot-exec" ''
+      #!${pkgs.stdenv.shell} -e
+      output="$(echo "$@" | nc -NU "/run/test$(< /teststep).sock")"
+      ret="$(echo "$output" | sed -nre '$s/^exit-status:([0-9]+)$/\1/p')"
+      echo "$output" | head -n -1
+      exit "''${ret:-1}"
+    '';
+
+    mkTestStep = num: { description, config ? {}, testScript }: {
+      systemd.sockets."test${toString num}" = {
+        description = "Socket for Test Service ${toString num}";
+        wantedBy = [ "sockets.target" ];
+        socketConfig.ListenStream = "/run/test${toString num}.sock";
+        socketConfig.Accept = true;
+      };
+
+      systemd.services."test${toString num}@" = {
+        description = "Chrooted Test Service ${toString num}";
+        chroot = (config.chroot or {}) // { enable = true; };
+        serviceConfig = (config.serviceConfig or {}) // {
+          ExecStart = testServer;
+          StandardInput = "socket";
+        };
+      } // removeAttrs config [ "chroot" "serviceConfig" ];
+
+      __testSteps = lib.mkOrder num ''
+        subtest '${lib.escape ["\\" "'"] description}', sub {
+          $machine->succeed('echo ${toString num} > /teststep');
+          ${testScript}
+        };
+      '';
+    };
+
+  in {
+    imports = lib.imap1 mkTestStep [
+      { description = "chroot-only confinement";
+        config.chroot.confinement = "chroot-only";
+        testScript = ''
+          $machine->succeed(
+            'test "$(chroot-exec ls -1 / | paste -sd,)" = bin,nix',
+            'test "$(chroot-exec id -u)" = 0',
+            'chroot-exec chown 65534 /bin',
+          );
+        '';
+      }
+      { description = "full confinement with APIVFS";
+        testScript = ''
+          $machine->fail(
+            'chroot-exec ls -l /etc',
+            'chroot-exec ls -l /run',
+            'chroot-exec chown 65534 /bin',
+          );
+          $machine->succeed(
+            'test "$(chroot-exec id -u)" = 0',
+            'chroot-exec chown 0 /bin',
+          );
+        '';
+      }
+      { description = "check existence of bind-mounted /etc";
+        config.serviceConfig.BindReadOnlyPaths = [ "/etc" ];
+        testScript = ''
+          $machine->succeed('test -n "$(chroot-exec cat /etc/passwd)"');
+        '';
+      }
+      { description = "check if User/Group really runs as non-root";
+        config.serviceConfig.User = "chroot-testuser";
+        config.serviceConfig.Group = "chroot-testgroup";
+        testScript = ''
+          $machine->succeed('chroot-exec ls -l /dev');
+          $machine->succeed('test "$(chroot-exec id -u)" != 0');
+          $machine->fail('chroot-exec touch /bin/test');
+        '';
+      }
+      (let
+        symlink = pkgs.runCommand "symlink" {
+          target = pkgs.writeText "symlink-target" "got me\n";
+        } "ln -s \"$target\" \"$out\"";
+      in {
+        description = "check if symlinks are properly bind-mounted";
+        config.chroot.packages = lib.singleton symlink;
+        testScript = ''
+          $machine->fail('chroot-exec test -e /etc');
+          $machine->succeed('chroot-exec cat ${symlink} >&2');
+          $machine->succeed('test "$(chroot-exec cat ${symlink})" = "got me"');
+        '';
+      })
+      { description = "check if StateDirectory works";
+        config.serviceConfig.User = "chroot-testuser";
+        config.serviceConfig.Group = "chroot-testgroup";
+        config.serviceConfig.StateDirectory = "testme";
+        testScript = ''
+          $machine->succeed('chroot-exec touch /tmp/canary');
+          $machine->succeed('chroot-exec "echo works > /var/lib/testme/foo"');
+          $machine->succeed('test "$(< /var/lib/testme/foo)" = works');
+          $machine->succeed('test ! -e /tmp/canary');
+        '';
+      }
+    ];
+
+    options.__testSteps = lib.mkOption {
+      type = lib.types.lines;
+      description = "All of the test steps combined as a single script.";
+    };
+
+    config.environment.systemPackages = lib.singleton testClient;
+
+    config.users.groups.chroot-testgroup = {};
+    config.users.users.chroot-testuser = {
+      description = "Chroot Test User";
+      group = "chroot-testgroup";
+    };
+  };
+
+  testScript = { nodes, ... }: ''
+    $machine->waitForUnit('multi-user.target');
+    ${nodes.machine.config.__testSteps}
+  '';
+}