What is DataLad?
DataLad is a free and open source distributed data management system that keeps track of your data, creates structure, ensures reproducibility, supports collaboration, and integrates with widely used data infrastructure.
Install DataLad
Install DataLad and its dependencies, Git and git-annex, on all major operating systems using Python and the datalad-installer:
$ datalad-installer git-annex -m datalad/packages
$ pip install datalad
Depending on your operating system, other installation options are also possible. For detailed instructions on all installation procedures and further configuration, please visit the DataLad Handbook.
DataLad is part of the Debian and Ubuntu operating systems and is available on CentOS, Red Hat, Fedora, and similar systems. DataLad can be installed or upgraded via conda or apt:
Using conda:
Using apt:
Find out more about Linux installation in the DataLad Handbook
DataLad is available via Homebrew, the macOS package manager, or alternatively via conda:
Using conda:
Using homebrew:
Find out more about macOS installation in the DataLad Handbook
On a Windows machine with Python, the best route for installing DataLad is to install its dependencies with the datalad-installer and then follow up with pip:
$ datalad-installer git-annex -m datalad/packages
$ pip install datalad
Find out more about Windows installation in the DataLad Handbook
Keep Track
Building on top of Git and git-annex, DataLad allows you to version control arbitrarily large files in datasets, without the need for custom data structures, central infrastructure, or third party services.
- Track changes to your data
- Revert to previous versions
- Capture full provenance records
- Ensure complete reproducibility
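To make the provenance point concrete, here is a minimal sketch that assembles a call to the datalad run command, which records a command together with its declared inputs and outputs in the dataset history. The commit message, script name, and file paths are hypothetical placeholders, and the sketch only builds and prints the command line; it does not execute DataLad.

```python
import shlex


def provenance_run(message, command, inputs=(), outputs=()):
    """Build a `datalad run` call that records a command with its inputs and outputs."""
    argv = ["datalad", "run", "-m", message]
    for path in inputs:
        argv += ["--input", path]
    for path in outputs:
        argv += ["--output", path]
    # The command to be recorded goes last, passed as one shell string.
    argv.append(command)
    return argv


if __name__ == "__main__":
    argv = provenance_run(
        "Compute summary statistics",
        "python code/summarize.py",        # hypothetical script
        inputs=["data/raw.csv"],           # hypothetical input file
        outputs=["results/summary.csv"],   # hypothetical output file
    )
    # Print a copy-pasteable shell command instead of running it.
    print(shlex.join(argv))
```

Run inside a dataset, the printed command would execute the script and save the result along with a machine-readable record of how it was produced.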
Create Structure
A DataLad dataset is a directory with files, managed by DataLad. You can link other datasets, known as subdatasets, and perform commands recursively across an arbitrarily deep hierarchy of datasets. This helps you to create structure while maintaining advanced provenance capture abilities, versioning, and actionable file retrieval.
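The nesting described above can be sketched as a short plan of command-line calls. The dataset name my_project and the subdataset path inputs/raw are hypothetical, and the helper defaults to a dry run that only returns the command lines. Executing it for real assumes DataLad is installed and on the PATH.

```python
import shlex
import shutil
import subprocess


def nested_dataset_plan(root="my_project", sub="inputs/raw"):
    """Return (working_directory, argv) pairs that build a dataset with one subdataset."""
    return [
        # Create the top-level dataset.
        (".", ["datalad", "create", root]),
        # -d . registers the new dataset as a subdataset of the dataset in the cwd.
        (root, ["datalad", "create", "-d", ".", sub]),
        # List the subdatasets registered in the top-level dataset.
        (root, ["datalad", "subdatasets"]),
        # -r saves changes recursively across the whole hierarchy.
        (root, ["datalad", "save", "-r", "-m", "Save state across the hierarchy"]),
    ]


def execute(plan, dry_run=True):
    """Return printable command lines; run the commands only when dry_run is False."""
    if not dry_run and shutil.which("datalad") is None:
        raise RuntimeError("datalad was not found on PATH")
    lines = []
    for cwd, argv in plan:
        lines.append(f"(in {cwd}) $ {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)
    return lines


if __name__ == "__main__":
    for line in execute(nested_dataset_plan()):
        print(line)
```

Keeping the working directory next to each command makes it explicit which dataset a call operates on, which matters once datasets are nested.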
Use DataLad
DataLad is a free and open source Python-based tool that is compatible with all major operating systems. It can be used via its Graphical User Interface or via the command line to:
- create new datasets locally
- clone other datasets
- get content on-demand
- save changes to datasets
- drop content as needed
- push changes to a remote location
... and much more!
Try out DataLad
datalad create my_dataset
datalad save -m "hello world"
datalad push --to location
datalad clone location
datalad get example.txt
datalad drop example.txt
Collaborate
DataLad lets you consume datasets provided by others, and collaborate with them. You can install existing datasets and update them from their sources, or create sibling datasets that you can publish updates to and pull updates from. The collaborative power of Git, for your data.
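A collaboration round trip might look like the following sketch, which prints one command line per step: clone a dataset, merge updates from its source, register a sibling, and push to it. All URLs and names are hypothetical. The --how merge option assumes a reasonably recent DataLad release, and siblings add only registers a location that already exists. Nothing is executed here.

```python
import shlex


def collaboration_commands(source, clone_dir, sibling, sibling_url):
    """Return shell-ready command lines for a clone / update / publish round trip."""
    steps = [
        # Obtain a copy of someone else's dataset.
        ["datalad", "clone", source, clone_dir],
        # Fetch and merge updates from the location the clone came from.
        ["datalad", "update", "-d", clone_dir, "--how", "merge"],
        # Register an additional sibling to publish to.
        ["datalad", "siblings", "add", "-d", clone_dir, "-s", sibling, "--url", sibling_url],
        # Publish local changes to that sibling.
        ["datalad", "push", "-d", clone_dir, "--to", sibling],
    ]
    return [shlex.join(step) for step in steps]


if __name__ == "__main__":
    commands = collaboration_commands(
        source="https://example.com/shared-dataset.git",   # hypothetical source
        clone_dir="shared-dataset",
        sibling="mirror",                                   # hypothetical sibling name
        sibling_url="ssh://example.com/data/mirror",        # hypothetical URL
    )
    for command in commands:
        print("$", command)
```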
DataLad in the Wild
DataLad is integrated with a variety of hosting services and data management platforms, and extended and used by a diverse community. Export datasets to third party services such as GitHub or Figshare with built-in commands. Extend DataLad to be compatible with your preferred data supplier or workflow. Or use a multitude of other DataLad-compatible services such as Dropbox or Amazon S3. Search through all integrations, extensions, and use cases to find the right fit for your data!
Browse use cases
Learn More
DataLad is not solely a data management system, but also an open source community of users, developers, and researchers all contributing to its growth. To support this community, DataLad maintains several important resources:
- Install DataLad: Install DataLad and its dependencies on Linux, macOS, or Windows
- DataLad Handbook: Become an expert DataLad user with this rich educational resource
- DataLad on GitHub: Contribute via GitHub by creating issues or sending a pull request
- Developer Docs: Dive into the DataLad API with the developer documentation
- DataLad Tutorials: Hands-on tutorials and videos to help you on your DataLad journey
- DataLad Course: A course on Research Data Management with DataLad
Get Support
For tougher challenges during your data management journey, there are a number of ways that you can get in touch with the DataLad community, its experts, and core developers. Head over to Matrix to chat, join us in a weekly Office Hour call, or create an issue on GitHub!
- Community Chat: Join the community on Matrix, say hi, and ask questions
- Office Hour: Get real-time help from DataLad experts to solve your challenges
- File an issue: File an issue to let the developers know about a bug or a feature request
Supporting DataLad
DataLad development is funded as a US-German project on collaborative research, with primary funding from the US National Science Foundation (NSF 1912266, NSF 1429999) and the German Federal Ministry of Education and Research (BMBF 01GQ1905, BMBF 01GQ1411). Additional support has been provided by the US National Institute of Biomedical Imaging and Bioengineering (NIH 1P41EB019936-01A1) via ReproNim, the European Union’s Horizon 2020 research and innovation programme under grant agreements 945539 and 826421, the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, SFB1451-INF), and the German federal state of Saxony-Anhalt and the European Regional Development Fund.
Citing DataLad
Please cite the following article when referring to DataLad in publications:
Yaroslav O. Halchenko, Kyle Meyer, Benjamin Poldrack, Debanjum Singh Solanky, Adina S. Wagner, Jason Gors, Dave MacFarlane, Dorian Pustina, Vanessa Sochat, Satrajit S. Ghosh, Christian Mönch, Christopher J. Markiewicz, Laura Waite, Ilya Shlyakhter, Alejandro de la Vega, Soichi Hayashi, Christian Olaf Häusler, Jean-Baptiste Poline, Tobias Kadelka, Kusti Skytén, Dorota Jarecka, David Kennedy, Ted Strauss, Matt Cieslak, Peter Vavra, Horea-Ioan Ioanas, Robin Schneider, Mika Pflüger, James V. Haxby, Simon B. Eickhoff, and Michael Hanke (2021). DataLad: distributed system for joint management of code, data, and their relationship. Journal of Open Source Software, 6(63), 3262. doi:10.21105/joss.03262