Drop dataset components

§1 The drop command is the antagonist of get. Whatever drop can do should be undoable by a subsequent get (given unchanged remote availability).

§2 Like get, drop primarily operates on a mandatory path specification (to discover relevant files and subdatasets to operate on).

§3 drop has a --what parameter that serves as an extensible “mode-switch” to cover all relevant scenarios, like ‘drop all file content in the work-tree’ (e.g. --what files, default, #5858), ‘drop all keys from any branch’ (i.e. --what allkeys, #2328), but also ‘“drop” AKA uninstall entire subdataset hierarchies’ (e.g. --what all), or drop preferred content (--what preferred-content, #3122).

§4 drop prevents data loss by default (#4750). Like get, it features a --reckless “mode-switch” to disable some or all potentially slow safety mechanisms, i.e. ‘key available in sufficient number of other remotes’, ‘main or all branches pushed to remote(s)’ (#1142), ‘only check availability of keys associated with the worktree, but not other branches’. “Reckless operation” can be automatic, when following a reckless get (#4744).
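The first safety mechanism named in §4 (‘key available in sufficient number of other remotes’) can be illustrated with a minimal sketch. It is not the DataLad implementation: KeyStatus, keys_blocking_drop, min_copies and skip_availability_check are hypothetical names chosen for illustration, and the counts of remote copies are assumed to have been determined elsewhere.

```python
from dataclasses import dataclass


@dataclass
class KeyStatus:
    """Hypothetical record of what is known about one annex key."""
    name: str
    other_copies: int  # copies believed to exist in other remotes


def keys_blocking_drop(keys, min_copies=1, skip_availability_check=False):
    """Return the names of keys whose content must not be dropped.

    A key blocks the drop when fewer than `min_copies` other copies are
    known. `skip_availability_check` stands in for a reckless mode that
    disables this particular check (an assumption, not a real flag).
    """
    if skip_availability_check:
        return []
    return [k.name for k in keys if k.other_copies < min_copies]
```

With this model, a default drop is refused as long as the returned list is non-empty, while a reckless invocation skips the (potentially slow) availability lookup altogether.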

§5 drop properly manages annex lifetime information, e.g. by announcing an annex as dead on removal of a repository (#3887).

§6 Like get, drop supports parallelization (#1953).

§7 datalad drop is not intended to be a comprehensive frontend to git annex drop (e.g. there is only limited support for #1482 outside standard use cases like #2328).

Note

It is understood that the current uninstall command is largely or completely made obsolete by this drop concept.

§8 Given the development in #5842 towards the complete obsolescence of remove, it becomes necessary to import one of its proposed features:

§9 drop should be able to recognize a botched attempt to delete a dataset with a plain rm -rf, and act on it in a meaningful way, even if it is just hinting at chmod + rm -rf.
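One way the recognition in §9 could work is sketched below. The heuristic is an assumption, not something the design specifies: git-annex write-protects its object directories, so a plain rm -rf typically removes ordinary files such as .git/HEAD but leaves .git/annex/objects behind. The function names are hypothetical.

```python
import shlex
from pathlib import Path


def looks_like_botched_removal(path):
    """Guess whether `path` is the leftover of a failed `rm -rf`.

    Assumed heuristic: annex objects are still present, but the
    repository's HEAD file is already gone.
    """
    git_dir = Path(path) / ".git"
    objects = git_dir / "annex" / "objects"
    return objects.is_dir() and not (git_dir / "HEAD").exists()


def removal_hint(path):
    """Return a shell command that finishes the removal, or None."""
    if not looks_like_botched_removal(path):
        return None
    quoted = shlex.quote(str(path))
    # Restore write permission first, then remove the remains.
    return f"chmod -R u+w {quoted} && rm -rf {quoted}"
```

The sketch only produces the hint; whether drop should go further and clean up by itself is left open, as in §9.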

Use cases

The following use cases operate in the dataset hierarchy depicted below:

super
├── dir
│   ├── fileD1
│   └── fileD2
├── fileS1
├── fileS2
├── subA
│   ├── fileA
│   ├── subsubC
│   │   └── fileC
│   └── subsubD
└── subB
    └── fileB

Unless explicitly stated, all commands are assumed to be executed in the root of super.

U1: datalad drop fileS1

Drops the file content of fileS1 (as currently done by drop).
U2: datalad drop dir

Drop all file content in the directory (fileD{1,2}; as currently done by drop).
U3: datalad drop subB

Drop all file content from the entire subB (fileB)
U4: datalad drop subB --what all

Same as above (default --what files), because it is not operating in the context of a superdataset (no automatic upward lookups). Possibly hint at the next usage pattern.
U5: datalad drop -d . subB --what all

Drop everything the superdataset holds under this path, i.e. drop all content from the subdataset and drop the subdataset itself (AKA uninstall).
U6: datalad drop subA --what all

Error: “subA contains subdatasets, forgot --recursive?”
U7: datalad drop -d . subA -r --what all

Drop all content from the subdataset (fileA) and its subdatasets (fileC), uninstall the subdataset (subA) and its subdatasets (subsubC, subsubD)
U8: datalad drop subA -r --what all

Same as above, but keep subA installed
U9: datalad drop subA -r

Drop all content from the subdataset and its subdatasets (fileA, fileC)
U10: datalad drop . -r --what all

Drops all file content and subdatasets, but leaves the superdataset repository behind
U11: datalad drop -d . subB

Does nothing and hints at alternative usage, see https://github.com/datalad/datalad/issues/5832#issuecomment-889656335
U12: cd .. && datalad drop super/dir

Like get, this errors because the execution is not associated with a dataset. This avoids complexities when the given paths point to multiple (disjoint) datasets. It is understood that it could be done, but it is intentionally not done. datalad -C super drop dir or datalad drop -d super super/dir would work.
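The interplay of -d, -r and --what in U3 to U9 and U11 can be summarized in a small decision model. This is a sketch for reasoning about the use cases, not DataLad code: plan_drop and the keys of the returned dict are hypothetical, only the files and all modes are covered, and the path is assumed to point at an installed subdataset.

```python
def plan_drop(what="files", dataset_given=False, recursive=False,
              has_subdatasets=False):
    """Model the outcome of dropping a subdataset path (U3-U9, U11)."""
    if what not in ("files", "all"):
        raise NotImplementedError("only 'files' and 'all' are modeled")
    if what == "files":
        if dataset_given:
            if recursive:
                # Not covered by any listed use case.
                raise NotImplementedError("combination not specified")
            # U11: nothing is dropped, a hint at alternative usage is given.
            return {"drop_content": False, "uninstall_children": False,
                    "uninstall_target": False, "hint": True}
        # U3, U9: file content only; -r merely widens the scope.
        return {"drop_content": True, "uninstall_children": False,
                "uninstall_target": False, "hint": False}
    if has_subdatasets and not recursive:
        # U6: refuse instead of silently skipping nested subdatasets.
        raise ValueError("contains subdatasets, forgot --recursive?")
    # U4, U8: without -d the target stays installed.
    # U5, U7: with -d the target itself is uninstalled as well.
    return {"drop_content": True, "uninstall_children": has_subdatasets,
            "uninstall_target": dataset_given, "hint": False}
```

For example, plan_drop(what="all") models U4 and yields the same result as the default plan_drop() of U3, while adding dataset_given=True (U5) additionally marks the subdataset itself for uninstallation.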