about summary refs log tree commit diff
path: root/nixpkgs/lib/fileset/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'nixpkgs/lib/fileset/README.md')
-rw-r--r--nixpkgs/lib/fileset/README.md267
1 files changed, 267 insertions, 0 deletions
diff --git a/nixpkgs/lib/fileset/README.md b/nixpkgs/lib/fileset/README.md
new file mode 100644
index 000000000000..93e0199c32a4
--- /dev/null
+++ b/nixpkgs/lib/fileset/README.md
@@ -0,0 +1,267 @@
+# File set library
+
+This is the internal contributor documentation.
+The user documentation is [in the Nixpkgs manual](https://nixos.org/manual/nixpkgs/unstable/#sec-fileset).
+
+## Goals
+
+The main goal of the file set library is to be able to select local files that should be added to the Nix store.
+It should have the following properties:
+- Easy:
+  The functions should have obvious semantics, be low in number and be composable.
+- Safe:
+  Throw early and helpful errors when mistakes are detected.
+- Lazy:
+  Only compute values when necessary.
+
+Non-goals are:
+- Efficient:
+  If the abstraction proves itself worthwhile but too slow, it can be still be optimized further.
+
+## Tests
+
+Tests are declared in [`tests.sh`](./tests.sh) and can be run using
+```
+./tests.sh
+```
+
+## Benchmark
+
+A simple benchmark against the HEAD commit can be run using
+```
+./benchmark.sh HEAD
+```
+
+This is intended to be run manually and is not checked by CI.
+
+## Internal representation
+
+The internal representation is versioned in order to allow file sets from different Nixpkgs versions to be composed with each other, see [`internal.nix`](./internal.nix) for the versions and conversions between them.
+This section describes only the current representation, but past versions will have to be supported by the code.
+
+### `fileset`
+
+An attribute set with these values:
+
+- `_type` (constant string `"fileset"`):
+  Tag to indicate this value is a file set.
+
+- `_internalVersion` (constant `3`, the current version):
+  Version of the representation.
+
+- `_internalIsEmptyWithoutBase` (bool):
+  Whether this file set is the empty file set without a base path.
+  If `true`, `_internalBase*` and `_internalTree` are not set.
+  This is the only way to represent an empty file set without needing a base path.
+
+  Such a value can be used as the identity element for `union` and the return value of `unions []` and co.
+
+- `_internalBase` (path):
+  Any files outside of this path cannot influence the set of files.
+  This is always a directory and should be as long as possible.
+  This is used by `lib.fileset.toSource` to check that all files are under the `root` argument
+
+- `_internalBaseRoot` (path):
+  The filesystem root of `_internalBase`, same as `(lib.path.splitRoot _internalBase).root`.
+  This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.
+
+- `_internalBaseComponents` (list of strings):
+  The path components of `_internalBase`, same as `lib.path.subpath.components (lib.path.splitRoot _internalBase).subpath`.
+  This is here because this needs to be computed anyway, and this computation shouldn't be duplicated.
+
+- `_internalTree` ([filesetTree](#filesettree)):
+  A tree representation of all included files under `_internalBase`.
+
+- `__noEval` (error):
+  An error indicating that directly evaluating file sets is not supported.
+
+## `filesetTree`
+
+One of the following:
+
+- `{ <name> = filesetTree; }`:
+  A directory with a nested `filesetTree` value for directory entries.
+  Entries not included may either be omitted or set to `null`, as necessary to improve efficiency or laziness.
+
+- `"directory"`:
+  A directory with all its files included recursively, allowing early cutoff for some operations.
+  This specific string is chosen to be compatible with `builtins.readDir` for a simpler implementation.
+
+- `"regular"`, `"symlink"`, `"unknown"` or any other non-`"directory"` string:
+  A nested file with its file type.
+  These specific strings are chosen to be compatible with `builtins.readDir` for a simpler implementation.
+  Distinguishing between different file types is not strictly necessary for the functionality this library,
+  but it does allow nicer printing of file sets.
+
+- `null`:
+  A file or directory that is excluded from the tree.
+  It may still exist on the file system.
+
+## API design decisions
+
+This section justifies API design decisions.
+
+### Internal structure
+
+The representation of the file set data type is internal and can be changed over time.
+
+Arguments:
+- (+) The point of this library is to provide high-level functions, users don't need to be concerned with how it's implemented
+- (+) It allows adjustments to the representation, which is especially useful in the early days of the library.
+- (+) It still allows the representation to be stabilized later if necessary and if it has proven itself
+
+### Influence tracking
+
+File set operations internally track the top-most directory that could influence the exact contents of a file set.
+Specifically, `toSource` requires that the given `fileset` is completely determined by files within the directory specified by the `root` argument.
+For example, even with `dir/file.txt` being the only file in `./.`, `toSource { root = ./dir; fileset = ./.; }` gives an error.
+This is because `fileset` may as well be the result of filtering `./.` in a way that excludes `dir`.
+
+Arguments:
+- (+) This gives us the guarantee that adding new files to a project never breaks a file set expression.
+  This is also true in a lesser form for removed files:
+  only removing files explicitly referenced by paths can break a file set expression.
+- (+) This can be removed later, if we discover it's too restrictive
+- (-) It leads to errors when a sensible result could sometimes be returned, such as in the above example.
+
+### Empty file set without a base
+
+There is a special representation for an empty file set without a base path.
+This is used for return values that should be empty but when there's no base path that would makes sense.
+
+Arguments:
+- Alternative: This could also be represented using `_internalBase = /.` and `_internalTree = null`.
+  - (+) Removes the need for a special representation.
+  - (-) Due to [influence tracking](#influence-tracking),
+    `union empty ./.` would have `/.` as the base path,
+    which would then prevent `toSource { root = ./.; fileset = union empty ./.; }` from working,
+    which is not as one would expect.
+  - (-) With the assumption that there can be multiple filesystem roots (as established with the [path library](../path/README.md)),
+    this would have to cause an error with `union empty pathWithAnotherFilesystemRoot`,
+    which is not as one would expect.
+- Alternative: Do not have such a value and error when it would be needed as a return value
+  - (+) Removes the need for a special representation.
+  - (-) Leaves us with no identity element for `union` and no reasonable return value for `unions []`.
+    From a set theory perspective, which has a well-known notion of empty sets, this is unintuitive.
+
+### No intersection for lists
+
+While there is `intersection a b`, there is no function `intersections [ a b c ]`.
+
+Arguments:
+- (+) There is no known use case for such a function, it can be added later if a use case arises
+- (+) There is no suitable return value for `intersections [ ]`, see also "Nullary intersections" [here](https://en.wikipedia.org/w/index.php?title=List_of_set_identities_and_relations&oldid=1177174035#Definitions)
+  - (-) Could throw an error for that case
+  - (-) Create a special value to represent "all the files" and return that
+    - (+) Such a value could then not be used with `fileFilter` unless the internal representation is changed considerably
+  - (-) Could return the empty file set
+    - (+) This would be wrong in set theory
+- (-) Inconsistent with `union` and `unions`
+
+### Intersection base path
+
+The base path of the result of an `intersection` is the longest base path of the arguments.
+E.g. the base path of `intersection ./foo ./foo/bar` is `./foo/bar`.
+Meanwhile `intersection ./foo ./bar` returns the empty file set without a base path.
+
+Arguments:
+- Alternative: Use the common prefix of all base paths as the resulting base path
+  - (-) This is unnecessarily strict, because the purpose of the base path is to track the directory under which files _could_ be in the file set. It should be as long as possible.
+    All files contained in `intersection ./foo ./foo/bar` will be under `./foo/bar` (never just under `./foo`), and `intersection ./foo ./bar` will never contain any files (never under `./.`).
+    This would lead to `toSource` having to unexpectedly throw errors for cases such as `toSource { root = ./foo; fileset = intersect ./foo base; }`, where `base` may be `./bar` or `./.`.
+  - (-) There is no benefit to the user, since base path is not directly exposed in the interface
+
+### Empty directories
+
+File sets can only represent a _set_ of local files.
+Directories on their own are not representable.
+
+Arguments:
+- (+) There does not seem to be a sensible set of combinators when directories can be represented on their own.
+  Here's some possibilities:
+  - `./.` represents the files in `./.` _and_ the directory itself including its subdirectories, meaning that even if there's no files, the entire structure of `./.` is preserved
+
+    In that case, what should `fileFilter (file: false) ./.` return?
+    It could return the entire directory structure unchanged, but with all files removed, which would not be what one would expect.
+
+    Trying to have a filter function that also supports directories will lead to the question of:
+    What should the behavior be if `./foo` itself is excluded but all of its contents are included?
+    It leads to having to define when directories are recursed into, but then we're effectively back at how the `builtins.path`-based filters work.
+
+  - `./.` represents all files in `./.` _and_ the directory itself, but not its subdirectories, meaning that at least `./.` will be preserved even if it's empty.
+
+    In that case, `intersection ./. ./foo` should only include files and no directories themselves, since `./.` includes only `./.` as a directory, and same for `./foo`, so there's no overlap in directories.
+    But intuitively this operation should result in the same as `./foo` – everything else is just confusing.
+- (+) This matches how Git only supports files, so developers should already be used to it.
+- (-) Empty directories (even if they contain nested directories) are neither representable nor preserved when coercing from paths.
+  - (+) It is very rare that empty directories are necessary.
+  - (+) We can implement a workaround, allowing `toSource` to take an extra argument for ensuring certain extra directories exist in the result.
+- (-) It slows down store imports, since the evaluator needs to traverse the entire tree to remove any empty directories
+  - (+) This can still be optimized by introducing more Nix builtins if necessary
+
+### String paths
+
+File sets do not support Nix store paths in strings such as `"/nix/store/...-source"`.
+
+Arguments:
+- (+) Such paths are usually produced by derivations, which means `toSource` would either:
+  - Require [Import From Derivation](https://nixos.org/manual/nix/unstable/language/import-from-derivation) (IFD) if `builtins.path` is used as the underlying primitive
+  - Require importing the entire `root` into the store such that derivations can be used to do the filtering
+- (+) The convenient path coercion like `union ./foo ./bar` wouldn't work for absolute paths, requiring more verbose alternate interfaces:
+  - `let root = "/nix/store/...-source"; in union "${root}/foo" "${root}/bar"`
+
+    Verbose and dangerous because if `root` was a path, the entire path would get imported into the store.
+
+  - `toSource { root = "/nix/store/...-source"; fileset = union "./foo" "./bar"; }`
+
+    Does not allow debug printing intermediate file set contents, since we don't know the paths contents before having a `root`.
+
+  - `let fs = lib.fileset.withRoot "/nix/store/...-source"; in fs.union "./foo" "./bar"`
+
+    Makes library functions impure since they depend on the contextual root path, questionable composability.
+
+- (+) The point of the file set abstraction is to specify which files should get imported into the store.
+
+  This use case makes little sense for files that are already in the store.
+  This should be a separate abstraction as e.g. `pkgs.drvLayout` instead, which could have a similar interface but be specific to derivations.
+  Additional capabilities could be supported that can't be done at evaluation time, such as renaming files, creating new directories, setting executable bits, etc.
+- (+) An API for filtering/transforming Nix store paths could be much more powerful,
+  because it's not limited to just what is possible at evaluation time with `builtins.path`.
+  Operations such as moving and adding files would be supported.
+
+### Single files
+
+File sets cannot add single files to the store, they can only import files under directories.
+
+Arguments:
+- (+) There's no point in using this library for a single file, since you can't do anything other than add it to the store or not.
+  And it would be unclear how the library should behave if the one file wouldn't be added to the store:
+  `toSource { root = ./file.nix; fileset = <empty>; }` has no reasonable result because returing an empty store path wouldn't match the file type, and there's no way to have an empty file store path, whatever that would mean.
+
+### `fileFilter` takes a path
+
+The `fileFilter` function takes a path, and not a file set, as its second argument.
+
+- (-) Makes it harder to compose functions, since the file set type, the return value, can't be passed to the function itself like `fileFilter predicate fileset`
+  - (+) It's still possible to use `intersection` to filter on file sets: `intersection fileset (fileFilter predicate ./.)`
+    - (-) This does need an extra `./.` argument that's not obvious
+      - (+) This could always be `/.` or the project directory, `intersection` will make it lazy
+- (+) In the future this will allow `fileFilter` to support a predicate property like `subpath` and/or `components` in a reproducible way.
+  This wouldn't be possible if it took a file set, because file sets don't have a predictable absolute path.
+  - (-) What about the base path?
+    - (+) That can change depending on which files are included, so if it's used for `fileFilter`
+      it would change the `subpath`/`components` value depending on which files are included.
+- (+) If necessary, this restriction can be relaxed later, the opposite wouldn't be possible
+
+### Strict path existence checking
+
+Coercing paths that don't exist to file sets always gives an error.
+
+- (-) Sometimes you want to remove a file that may not always exist using `difference ./. ./does-not-exist`,
+  but this does not work because coercion of `./does-not-exist` fails,
+  even though its existence would have no influence on the result.
+  - (+) This is dangerous, because you wouldn't be protected against typos anymore.
+    E.g. when trying to prevent `./secret` from being imported, a typo like `difference ./. ./sercet` would import it regardless.
+  - (+) `difference ./. (maybeMissing ./does-not-exist)` can be used to do this more explicitly.
+  - (+) `difference ./. (difference ./foo ./foo/bar)` should report an error when `./foo/bar` does not exist ("double negation"). Unfortunately, the current internal representation does not lend itself to a behavior where both `difference x ./does-not-exists` and double negation are handled and checked correctly.
+    This could be fixed, but would require significant changes to the internal representation that are not worth the effort and the risk of introducing implicit behavior.