Version Control Systems

Version control is a widely adopted best practice among programmers, and researchers are strongly encouraged to use it whether they are working with large codebases or small Jupyter Notebooks. It is especially helpful when collaborating with other researchers or developers, because it allows changes to be tracked coherently and new code to be explored separately from a stable main codebase.

Code

Git

Git is by far the most popular version control system. It comes with a powerful command-line interface and integrates with code sharing and development platforms such as GitHub and GitLab.
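To make the idea of exploring new code apart from a stable codebase concrete, here is a minimal sketch that drives the Git command line from Python. It assumes `git` is installed and on the PATH. The file name `analysis.py`, the branch name `experiment`, and the placeholder identity are hypothetical choices made for illustration only.

```python
import subprocess
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run one git command inside `repo` and return its trimmed stdout."""
    # Placeholder identity so commits work without any global Git config.
    cmd = [
        "git",
        "-c", "user.name=Example Researcher",
        "-c", "user.email=researcher@example.org",
        "-c", "commit.gpgsign=false",
        *args,
    ]
    done = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return done.stdout.strip()


def demo_branch_workflow(repo: Path):
    """Commit a script, change it on a branch, and return to the stable branch.

    Returns the script text on the stable branch and the list of branch names.
    """
    script = repo / "analysis.py"  # hypothetical file name

    git(repo, "init")
    script.write_text("print('load parcels')\n")
    git(repo, "add", "analysis.py")
    git(repo, "commit", "-m", "Add analysis script")

    # Remember whatever the default branch is called (often main or master).
    stable = git(repo, "rev-parse", "--abbrev-ref", "HEAD")

    # Explore a change on its own branch without touching the stable code.
    git(repo, "checkout", "-b", "experiment")
    script.write_text("print('load parcels')\nprint('buffer parcels')\n")
    git(repo, "commit", "-am", "Try buffering parcels")

    # Switching back restores the stable version of the file.
    git(repo, "checkout", stable)
    branches = git(repo, "branch", "--format=%(refname:short)").splitlines()
    return script.read_text(), branches
```

Run against an empty directory, the function leaves two branches behind: the default branch holding the original script, and `experiment` holding the modified one. In everyday use the same commands would be typed directly in a terminal.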

UBC Library Research Commons frequently runs introductory workshops on Git and GitHub, which are listed here. The materials for these workshops are available below:

Additionally, SFU’s Research Computing Group provides a full-day workshop on Bash during its annual Summer School in early June, with some of the materials for that course available below:

Additional Resources:

GitHub

GitHub is a popular platform for sharing source code and Jupyter Notebooks as Git repositories. Repositories can be either public, so that anyone can copy (clone) them, or private, with access restricted to invited users. UBC LT provides access to GitHub Enterprise, which can securely share repositories among students within a single course or with other UBC collaborators.

GitHub Desktop provides a graphical user interface that helps with managing local Git repositories and pushing changes to remote repositories on GitHub.

An additional benefit of storing your code in a public repository on GitHub is that it can easily be archived in Zenodo, where it will receive a digital object identifier (DOI). This makes it much easier for other researchers to cite and access your code. If you later revise or improve your code, you can publish a new release, and Zenodo will automatically archive it and assign a new DOI.

While GitHub excels at displaying code on the web, it also supports other features relevant to geospatial computing. For example, if you don’t mind storing your tabular data in CSV or GeoCSV, GitHub will render the data as an interactive table, which can be easily searched and edited.
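As a small illustration of preparing tabular data for this kind of display, the sketch below writes a few records to CSV text with Python's standard `csv` module. The site names, column names, and coordinates are made-up sample values, not data from any real project.

```python
import csv
import io

# Hypothetical sample sites; names and coordinates are illustrative only.
SITES = [
    {"site": "Site A", "latitude": 49.2606, "longitude": -123.2460},
    {"site": "Site B", "latitude": 49.2827, "longitude": -123.1207},
]


def sites_to_csv(rows) -> str:
    """Serialize a list of dict records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["site", "latitude", "longitude"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = sites_to_csv(SITES)
```

Saving `csv_text` to a file with a `.csv` extension and committing it to a repository is all that is needed on the data side; the rendering itself is done by GitHub.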

You can also quickly generate basic web maps directly within your repository by storing your vector data in the GeoJSON or TopoJSON format. GitHub will render the vector data with Azure Maps and Leaflet.js. The generated web map can then be embedded on different sites with a simple snippet of JavaScript.
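The following sketch shows what a minimal GeoJSON file and the matching embed snippet might look like, built with Python's standard `json` module. GeoJSON stores coordinates in longitude, latitude order. The user, repository, branch, and path below are placeholders, and the embed URL follows the pattern given in GitHub's documentation at the time of writing, so it should be checked against the current docs before use.

```python
import json


def point_feature(name: str, lon: float, lat: float) -> dict:
    """Build one GeoJSON Feature; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {"name": name},
    }


def feature_collection(features) -> dict:
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


def embed_snippet(user: str, repo: str, ref: str, path: str) -> str:
    """Return a script tag for embedding a repository's GeoJSON map.

    The URL pattern is an assumption based on GitHub's documentation.
    """
    url = f"https://embed.github.com/view/geojson/{user}/{repo}/{ref}/{path}"
    return f'<script src="{url}"></script>'


# Hypothetical sample point and repository details.
collection = feature_collection([point_feature("Site A", -123.2460, 49.2606)])
geojson_text = json.dumps(collection, indent=2)
snippet = embed_snippet("example-user", "example-repo", "main", "data/sites.geojson")
```

Writing `geojson_text` to a file ending in `.geojson` and pushing it to the repository gives GitHub something to render.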

Integrations

JupyterLab

RStudio

VS Code

Data

Version control systems are most commonly applied to code, but to further ensure the reproducibility of their work, researchers have been developing and improving systems that also work well with data. These systems are often built to work on top of or alongside Git, adding functionality to manage and store large datasets via Git LFS or object storage.
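The general idea can be illustrated with the pointer file that Git LFS uses: the repository keeps a small text file recording a hash and a size, while the actual bytes live in separate storage. The sketch below builds such a pointer following the layout described in the Git LFS specification. It is an illustration only and not a replacement for the `git lfs` client; the function name is hypothetical.

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Return Git LFS style pointer text for the given file contents."""
    oid = hashlib.sha256(data).hexdigest()  # content hash identifies the object
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


# Stand-in bytes for a large raster; a real file would be read from disk.
pointer_text = lfs_pointer(b"example raster bytes")
```

Because the pointer is derived from the file contents, any change to the dataset produces a different pointer, which Git can then track like an ordinary small text file.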

Kart

Kart is built on top of Git and extends its functionality to work with vector datasets, while also integrating with Git LFS (Large File Storage) to handle large raster datasets. The Kart developers also provide a QGIS plugin for easy integration.

DataLad

DataLad provides a general-purpose data version control system that supports a broad set of storage options including Microsoft OneDrive, DRA’s Arbutus object storage (OpenStack Swift), and a range of other S3-compatible object storage providers.

DVC

While applicable to a range of other data-intensive tasks, DVC is a data version control system that specializes in machine learning. It supports fewer storage options than DataLad, but can provide a smoother integration and setup experience in certain applications.