Research Data Management
Security & Privacy
Ensuring research data security and privacy is a critical part of planning any new research project. A growing number of grants now require researchers to identify the risks associated with their research data and to describe the procedures and safeguards that will mitigate and address those risks. To take a proactive approach, researchers must ensure that cybersecurity best practices are strictly followed wherever data is collected, stored, accessed, and processed.
UBC researchers who plan to collect and store research information should consider contacting UBC Advanced Research Computing (ARC) for guidance and support to ensure data is properly protected. Furthermore, if the research encompasses any data that could be considered sensitive, contacting ARC is STRONGLY recommended.
UBC Advanced Research Computing - Research Cybersecurity and Compliance Services
UBC Advanced Research Computing - Information Security Compliance Checklist
Government of Canada - Science and Innovation - Safeguarding Your Research
Access
UBC Geog IT provides direct support to faculty and research assistants to ensure that local hardware is compliant with UBC’s standards for securely accessing research data. This includes ensuring laptops and desktops have full disk encryption, access control for system administration, and installation of UBC IT-provided malware protection. Researchers with machines connected to Geography’s local network also benefit from protection through multiple firewalls.
Maintaining secure computing environments is a collaborative process, though, and ultimately it is the responsibility of the researcher to ensure that any machines used for accessing their project’s research data, including those used by their research assistants, are effectively secured. Part of this includes ensuring machines are protected with a password or passphrase that complies with ISS U2, have been set up to automatically lock after, at most, 30 minutes of inactivity, and receive routine updates to the operating system and any installed software [1]. Depending on the risk levels associated with a researcher’s data, the restrictions noted above may need to be more stringent.
Before sharing research data with a recently hired research assistant, ensure that they have completed the following UBC training modules.
Also consider reviewing and assigning the following modules:
Collection & Storage
Even if a project does not include Personally Identifiable Information (PII) or other sensitive data, classifying the research information that will be collected, stored, and analyzed through the lifetime of a research project is an important step in identifying and assessing the risk levels that may be associated with the project.
Both UBC IT and UBC ARC provide tools and services, like REDCap for data collection and OneDrive for data storage, which are thoroughly reviewed to ensure compliance when working with sensitive data. Any tools that rely on external service providers will need to go through a Security Threat Risk Assessment (STRA) to ensure that the tool complies with privacy legislation and effectively mitigates known security threats.
Privacy Matters @ UBC - PIA & STRA - Compliance Review & Comprehensive Evaluation Framework
WestDRI & UBC ARC - Compliance or Chaos: A Research Assessment Survival Guide - Webinar
UBC ARC’s REDCap Security and Privacy provides a good example of the kind of information that would be gathered and documented through an STRA.
Data Collection Systems
For survey or form-based data collection, UBC researchers can access either of two FIPPA-compliant systems: REDCap, which is managed by UBC’s Advanced Research Computing (ARC) team, or Qualtrics, a vendor-supported cloud service for which UBC IT manages a subscription.
REDCap
REDCap is ideal for academic research, particularly when working with sensitive data and/or running a long-term research project. Many academic institutions support their own REDCap instances, and it’s often used for medical research. For offline data collection, REDCap provides a mobile app.
UBC Survey Tool (Qualtrics)
Qualtrics works best for quickly developing surveys for assessments or short-term research projects. It provides a wide range of features alongside a modern and user-friendly interface. A mobile app is also available for offline data entry.
LimeSurvey
LimeSurvey is a free and open-source alternative to REDCap and Qualtrics. While UBC researchers do not have direct access to LimeSurvey, an instance can be easily deployed via a VM on the Alliance Cloud infrastructure. Generally, REDCap and Qualtrics are better options, but LimeSurvey can be a helpful tool to be aware of when collaborating with small organizations that need to retain ownership of, and access to, collected research data.
Recommended GIS and Geospatial File Formats
GIS software and other geospatial computing software and packages can support a wide array of file formats for vector and raster data thanks in large part to the Geospatial Data Abstraction Library (GDAL). When creating, enhancing, and/or storing geospatial data, it is important to carefully select a format that best meets the needs of your project. The formats listed below focus on cases that may require broad usability among various GIS platforms and software, preservation, and performance.
Relevant resources:
Vector
OGC GeoPackage
Developed and maintained as an OGC standard, GeoPackage has become a broadly supported format for storing and transferring GIS data. In addition to vector data, it can also store raster data. This is the recommended and default format for vector data in QGIS.
ESRI File Geodatabase (FileGDB)
Created by ESRI, the File Geodatabase format has been developed as an alternative to Shapefile with the intention to overcome some of its shortcomings and act as a possible successor.
ESRI Shapefile
By far the most popular vector format, Shapefile was developed by ESRI in the 1990s and has continued to be maintained by them. While the format is not fully open, it has nevertheless found an extraordinary level of support among GIS and other geospatial software.
GeoJSON
GeoJSON provides a lightweight format that can be easily read and written via JavaScript. This format is particularly well-suited for web mapping and easily integrates with web mapping libraries, like Leaflet and OpenLayers.
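As a small illustration of how lightweight the format is, the sketch below builds a GeoJSON FeatureCollection using only Python's standard library. The helper function names, the coordinates, and the property name are made-up placeholders rather than real project data. Note that GeoJSON orders coordinates as longitude, then latitude.

```python
import json


def make_point_feature(lon, lat, properties):
    """Return a GeoJSON Feature dict for a single point (longitude first)."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def make_feature_collection(features):
    """Wrap a list of Feature dicts in a FeatureCollection."""
    return {"type": "FeatureCollection", "features": list(features)}


# Placeholder point (illustrative values only).
sample = make_feature_collection(
    [make_point_feature(-123.25, 49.26, {"name": "sample site"})]
)

# Serialize to text; this string can be saved with a .geojson extension.
geojson_text = json.dumps(sample, indent=2)
print(geojson_text)
```

The resulting text can be loaded directly by web mapping libraries or opened in desktop GIS software.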
FlatGeobuf
A relatively new format that has shown significant performance improvements compared to the formats listed above. FlatGeobuf currently lacks the backing of standardization, but it has found broad support in geospatial packages and software. It is also currently under review as a proposed OGC Community Standard.
GeoParquet
Similar to FlatGeobuf, GeoParquet has recently seen a stable release with significant performance improvements compared to GeoPackage and Shapefile, and it is quickly finding support among GIS and other geospatial computing software. This format is open-source, but has yet to reach standardization. Its developers intend to propose it for adoption as an OGC standard.
Raster
GeoTIFF and Cloud Optimized GeoTIFF (COG)
GeoTIFF has become the dominant format for raster data used in GIS and other geospatial computing, due in large part to its foundation on open standards. It is also often the default and recommended format in libraries like GDAL and GIS software like QGIS. The format has been further extended by Cloud Optimized GeoTIFF (COG), which enhances the capacity of GeoTIFF for cloud computing and access via the web. COG files can be stored and easily accessed via S3 object stores, like those supported by UBC ARC Chinook and the DRA’s Arbutus Object Storage.
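Web access to COGs relies on HTTP range requests, which let a client ask for only part of a file. As a rough sketch, the snippet below prepares (but does not send) a request for the first 16 KiB of a file. The URL is a hypothetical placeholder, the helper name is invented for this example, and the 16 KiB figure is an assumption chosen for illustration rather than a value taken from the COG specification.

```python
import urllib.request

# Hypothetical object-store URL; replace with a real COG location.
COG_URL = "https://example.org/bucket/elevation_cog.tif"


def build_range_request(url, start, length):
    """Prepare an HTTP GET that asks for `length` bytes starting at `start`."""
    end = start + length - 1  # HTTP byte ranges are inclusive
    request = urllib.request.Request(url)
    request.add_header("Range", f"bytes={start}-{end}")
    return request


header_request = build_range_request(COG_URL, 0, 16 * 1024)
print(header_request.get_header("Range"))  # prints: bytes=0-16383
```

Passing the prepared request to urllib.request.urlopen would send it. A server that supports range requests typically answers with status 206 and only the requested bytes.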
Built atop the TIFF file format, GeoTIFFs also support multiple compression algorithms that can significantly reduce their overall file size. Understanding and using these algorithms effectively can maximize computing resources while also mitigating any unnecessary data loss.
Uncompressed vs Lossless vs Lossy
GeoTIFFs can often be distributed without any compression at all. While the clear drawback to this approach is extraordinarily large file sizes, there can be benefits in ensuring that files do not need to be either encoded or decoded, which requires varying levels of computing time and in some cases libraries that are not supported by commonly used operating systems or software.
On the other hand, GeoTIFFs can be compressed using lossy algorithms, with the most common being JPEG. Applying a lossy algorithm can significantly reduce overall file sizes, but it comes with a large drawback: varying levels of data loss. A GeoTIFF that is compressed using JPEG will always lose some data, even if the compression quality is set to 100.
Lossless compression provides a valuable middle ground between lossy compression and using no compression at all. A GeoTIFF that uses lossless compression can see varying levels of reduction in file size depending on the raster data and the compression algorithm used. Again, one drawback is that encoding and decoding a large lossless file can require varying amounts of computing time. Encoding files on powerful machines can reduce encoding times, while serving files as COGs can reduce decoding time because only the needed portions of the file are downloaded and decoded by end users.
The Translate tool (gdal_translate) found in the GDAL library is one of the most commonly used tools for creating GeoTIFFs and COGs, and it supports a broad set of compression algorithms. The following algorithms are important to note:
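The trade-offs described in this section can be illustrated without any geospatial libraries. The sketch below uses Python's built-in zlib module (DEFLATE) as a stand-in for a lossless codec and simple value rounding as a stand-in for a lossy step. Neither is how GDAL encodes a GeoTIFF internally, and the two synthetic "rasters" and helper names are invented for the example.

```python
import random
import zlib

WIDTH, HEIGHT = 256, 256

# Synthetic 8-bit rasters: a smooth horizontal gradient and random noise.
smooth = bytes(x for _ in range(HEIGHT) for x in range(WIDTH))
rng = random.Random(42)  # fixed seed so the example is repeatable
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def lossless_size(raw):
    """Compress with DEFLATE and confirm the round trip is bit-for-bit exact."""
    packed = zlib.compress(raw, 6)
    assert zlib.decompress(packed) == raw
    return len(packed)


def quantize(raw, step):
    """Lossy stand-in: snap each value down to a multiple of `step`."""
    return bytes((value // step) * step for value in raw)


coarse = quantize(noisy, 16)
changed = sum(a != b for a, b in zip(noisy, coarse))

print("uncompressed bytes:", len(smooth))
print("smooth, lossless:", lossless_size(smooth))
print("noisy, lossless:", lossless_size(noisy))
print("noisy, quantized then compressed:", lossless_size(coarse))
print("pixel values altered by quantizing:", changed)
```

The smooth raster shrinks dramatically while random noise barely compresses, which is why lossless savings depend on the data. The quantized version compresses better only because information has been discarded and cannot be recovered.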
LZW - This is a lossless algorithm with broad support, and it is the default compression used by GDAL's COG driver in recent releases (the plain GeoTIFF driver applies no compression by default). While it can be relatively fast, this algorithm is not optimized for raster data, so reductions in file size may be minimal. Nevertheless, when in doubt, this is the algorithm to choose.
LERC - Developed and maintained by Esri, this algorithm has been optimized for raster data and can support either lossy or lossless compression. While lacking the level of support provided by LZW or JPEG, this algorithm is relatively fast and provides a valuable middle ground between the two.
JPEG - Well supported and backed by solid, open standards, this has been the de facto lossy algorithm for raster data over the past 30 years. This is a solid choice if you are looking to display a raster on the web and require broad support across web browsers and other software.
WEBP - While it is well supported by modern web browsers, this algorithm is not developed on open-source standards. It supports both lossy and lossless compression. It also provides improved compression ratios over JPEG and currently provides the best option for displaying a raster on the web, but it lacks broader support in other software and will likely be replaced by JXL in the future.
JXL - This is still a relatively new standard that has been developed as a successor to JPEG and JPEG2000. Similar to LERC and WEBP, it supports both lossy and lossless compression, but it is capable of reaching significantly better compression ratios at the expense of slower encoding and decoding times. It has yet to reach the same level of support as JPEG or LZW, and not all distributions of GDAL include the library needed to use this algorithm, but it is well worth monitoring as its support grows.
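Compression settings like these are passed to gdal_translate as creation options (-co NAME=VALUE). The helper below is a hypothetical convenience function, not part of GDAL, that assembles a command line for GDAL's COG driver. The file names are placeholders, and the option names shown (COMPRESS, QUALITY) should be checked against the documentation for the installed GDAL version, since the available codecs vary by build.

```python
import shlex


def build_cog_command(src, dst, compress="LZW", **creation_options):
    """Assemble a gdal_translate command that writes a COG.

    Extra keyword arguments are passed through as -co NAME=VALUE pairs,
    for example quality=85 for JPEG compression.
    """
    command = ["gdal_translate", "-of", "COG", "-co", f"COMPRESS={compress}"]
    for name, value in sorted(creation_options.items()):
        command += ["-co", f"{name.upper()}={value}"]
    command += [src, dst]
    return command


# Placeholder file names, for illustration only.
lossless = build_cog_command("input.tif", "output_lzw.tif")
lossy = build_cog_command(
    "input.tif", "output_jpeg.tif", compress="JPEG", quality=85
)

print(shlex.join(lossless))
print(shlex.join(lossy))
```

The resulting list can be handed to subprocess.run on a machine where GDAL is installed. Building the command as a list rather than a single string avoids quoting problems with file names that contain spaces.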
Footnotes
[1] UBC ARC (2025, March 13). UBC Research Security Compliance Checklist. https://arc.ubc.ca/media/document/arc-securitycompliancechecklist-fillablepdf