Quilt

Quilt is a data mesh for cross-functional teams

Bringing Scientific Data Management Systems Into the Cloud

Vir Biotechnology’s CIO, Alex Bangs, speaks at Bio IT World in Boston

Vir Biotechnology is a commercial-stage immunology company focused on combining immunologic insights with cutting-edge technologies to treat and prevent serious infectious diseases.

Alex Bangs, CIO at Vir Biotechnology, describes his journey toward a cloud-native Scientific Data Management System (SDMS) that stores, tags, manages, and integrates data generated by scientists and instruments.

This will be so helpful to future Virites in 2031, to have links between ELN, data, and results—rather than having to try to piece the info together from whatever clues they’re able to find. —Vir Scientist

Highlights from the talk include the following:

  • A survey of products and technologies to consider in the SDMS space (and where Quilt fits in that landscape)
  • Ensuring that data are FAIR company-wide: findable, accessible, interoperable, and reusable
  • Collections (Quilt packages) as a set of files with an immutable version history and compact URL that can be linked to ELNs like Benchling and IDBS
  • Tagging, labeling, documenting, and organizing datasets so that they can be queried longitudinally and used by data science teams for modeling and analysis
  • Strategies for data to be durable and FAIR for the next 10–20 years
  • Ensuring that data management systems are accessible and usable by scientists and non-developers
  • Keeping data in the cloud when local network bandwidth is limited (and moving compute to data)
  • Creating a shared and single source of truth (SSOT) in spite of varying skills and motivations on cross-functional teams of scientists, data scientists, IT, and leadership
  • Transitioning workflows to the cloud to leverage AWS services like Amazon S3
  • A three-phase data lifecycle that tracks data from its birth on instruments, to analysis, and finally to production
  • Moving data from Amazon S3 to a local machine and back to the cloud in order to run local applications like FlowJo
  • Storing and tagging large sets of files from flow cytometry, next-generation sequencing, or any other type of instrument
  • Integrating data, metadata, and documentation into a single, reproducible package
  • Single sign-on with systems like Okta
  • Automatically syncing file servers and NAS devices with the cloud

Data lifecycle (in Amazon S3) from instrument to ELN

Cloud SDMS requirements
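
To make the idea of a collection with an immutable version history a little more concrete, here is a minimal sketch in plain Python. It is not Quilt's API or implementation. The `PackageHistory` class, its method names, the package name, and the `EXP-001` ELN identifier are all hypothetical, chosen only for illustration. The sketch assumes that each revision is identified by a hash of its manifest (file hashes plus metadata), so a revision's identifier changes whenever its contents or tags change, and that metadata tags can later be used to query revisions over time.

```python
import hashlib
import json
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class PackageHistory:
    """Hypothetical append-only history of package revisions (not Quilt's API)."""
    name: str
    revisions: dict = field(default_factory=dict)  # revision ID -> manifest
    order: list = field(default_factory=list)      # revision IDs, oldest first

    def commit(self, files: dict, metadata: dict, message: str) -> str:
        """Record a revision and return its content-derived identifier."""
        manifest = {
            "message": message,
            "metadata": metadata,
            "entries": {path: sha256_hex(body) for path, body in sorted(files.items())},
        }
        # Hash a canonical JSON form so an identical manifest gives an identical ID.
        canonical = json.dumps(manifest, sort_keys=True).encode("utf-8")
        revision_id = sha256_hex(canonical)
        if revision_id not in self.revisions:
            self.revisions[revision_id] = manifest
            self.order.append(revision_id)
        return revision_id

    def find(self, key: str, value) -> list:
        """Return IDs of revisions whose metadata has key == value, oldest first."""
        return [r for r in self.order if self.revisions[r]["metadata"].get(key) == value]


# Usage: two revisions of one (made-up) flow cytometry dataset, tagged with an ELN entry.
history = PackageHistory("immunology/flow-run")
tags = {"assay": "flow", "eln_entry": "EXP-001"}  # hypothetical tags
v1 = history.commit({"README.md": b"Flow run", "run.fcs": b"raw-1"}, tags, "Initial upload")
v2 = history.commit({"README.md": b"Flow run", "run.fcs": b"raw-2"}, tags, "Re-export")
flow_revisions = history.find("assay", "flow")  # both revisions, oldest first
```

Because each identifier is derived from the manifest, an ELN entry that links to `v1` keeps pointing at exactly the files and metadata recorded at that time, even after `v2` is added.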

Published in Quilt

Written by Aneesh Karve

Data, visualization, machine learning, and abstract algebra. CTO and co-founder @QuiltData. Recent talks https://goo.gl/U9VYr5.
