Bringing Scientific Data Management Systems Into the Cloud
Vir Biotechnology is a commercial-stage immunology company focused on combining immunologic insights with cutting-edge technologies to treat and prevent serious infectious diseases.
Alex Bangs, CIO at Vir Biotechnology, describes his journey toward a cloud-native Scientific Data Management System (SDMS) that stores, tags, manages, and integrates data generated by scientists and instruments.
This will be so helpful to future Virites in 2031, to have links between ELN, data, and results—rather than having to try to piece the info together from whatever clues they’re able to find. —Vir Scientist
Highlights from the talk include the following:
- A survey of products and technologies to consider in the SDMS space (and where Quilt fits in that landscape)
- Ensuring that data are FAIR company-wide: findable, accessible, interoperable, and reusable
- Collections (Quilt packages) as a set of files with an immutable version history and compact URL that can be linked to ELNs like Benchling and IDBS
- Tagging, labeling, documenting, and organizing datasets so that they can be queried longitudinally and used by data science teams for modeling and analysis
- Strategies for data to be durable and FAIR for the next 10–20 years
- Ensuring that data management systems are accessible and usable by scientists and non-developers
- Keeping data in the cloud when local network bandwidth is limited (and moving compute to data)
- Creating a shared and single source of truth (SSOT) in spite of varying skills and motivations on cross-functional teams of scientists, data scientists, IT, and leadership
- Transitioning workflows to the cloud to leverage AWS services like Amazon S3
- A three-phase data lifecycle that tracks data from its birth on instruments, to analysis, and finally to production
- Moving data from Amazon S3 to a local machine and back to the cloud in order to run local applications like FlowJo
- Storing and tagging large sets of files from flow cytometry, next-generation sequencing, or any other type of instrument
- Integrating data, metadata, and documentation into a single, reproducible package
- Single sign-on with systems like Okta
- Automatically syncing file servers and NAS devices with the cloud
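
To make the idea of a single, reproducible package with an immutable version history more concrete, here is a minimal Python sketch. It is not Quilt's actual manifest format or API. It is an illustration, using only the standard library, of how a set of files plus metadata might be reduced to one content-derived identifier. The function names (`sha256_of`, `build_manifest`, `manifest_id`) and the manifest layout are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, metadata: dict) -> dict:
    """Describe every file under root, plus package-level metadata.

    The layout here is an assumption made for illustration only.
    """
    entries = {
        path.relative_to(root).as_posix(): {
            "sha256": sha256_of(path),
            "size": path.stat().st_size,
        }
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
    return {"metadata": metadata, "entries": entries}


def manifest_id(manifest: dict) -> str:
    """Hash the canonical JSON form of the manifest.

    Identical files and metadata always yield the same ID, and any
    change to either yields a different one.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the identifier is derived from the content itself, a link stored in an ELN entry that includes it would keep pointing at exactly the same files and metadata, even after newer versions of the dataset are created.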

