Visualize and share IGV genome tracks with quilt3 and JSON (and zero backend code)
Next-generation sequencing data are among the most voluminous and critical data in biopharma, and easy visualization of genome tracks is pivotal to understanding potential drugs and therapies. The tricky part of visualizing genome tracks is not the visualization itself but the sharing, which typically requires you to stand up and maintain your own visualization service so that scientists can consume the visualizations without coding. This is one of the most challenging tasks in data science: leaving native tools like Jupyter and pandas just to share results with others, either in dead-end file formats like PowerPoint or through complex web services written in Flask.
This article demonstrates how you can generate an IGV options file in JSON and quilt3 push the result to share genome tracks with your colleagues.
Visualizing genome tracks
IGV visualizations are driven by JSON files known as IGV browser configurations that specify the reference genome, tracks, and view locus, as well as the user interface.
IGV browser configurations
Below are two simple IGV browser configuration files, igv-options-interact.json and igv-options.json. Copy both files to a clean directory on your machine. Note that each options file references publicly available genome sequence data from the Broad Institute, hosted in Amazon S3.
Quilt supports the following three conventions to reference URLs in an IGV options file:
- HTTPS links (https://)
- S3 links (s3://)
- Package-relative links (./your/path/to/data/)
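As a rough illustration of the shape of such a file, the sketch below builds a small IGV browser configuration in Python and writes it out as JSON. The keys (genome, locus, tracks, and the per-track name, type, format, url, and indexURL) follow the igv.js browser options. The locus, the track names, and every URL are hypothetical placeholders chosen to show one link convention each; they are not the Broad Institute files referenced above.

```python
import json
from pathlib import Path


def build_igv_options():
    """Return a minimal IGV browser configuration as a plain dict.

    All URLs below are hypothetical placeholders, one per link convention.
    """
    return {
        "genome": "hg38",
        "locus": "chr8:127,736,588-127,739,371",
        "tracks": [
            {
                # HTTPS link (placeholder URL)
                "name": "Variants",
                "type": "variant",
                "format": "vcf",
                "url": "https://example.com/data/sample.vcf.gz",
                "indexURL": "https://example.com/data/sample.vcf.gz.tbi",
            },
            {
                # S3 link (placeholder bucket)
                "name": "Alignments",
                "type": "alignment",
                "format": "bam",
                "url": "s3://example-bucket/data/sample.bam",
                "indexURL": "s3://example-bucket/data/sample.bam.bai",
            },
            {
                # Package-relative link (placeholder path)
                "name": "Annotations",
                "type": "annotation",
                "format": "bed",
                "url": "./data/genes.bed",
            },
        ],
    }


def write_igv_options(path="igv-options.json"):
    """Serialize the configuration to a JSON file and return the dict."""
    options = build_igv_options()
    Path(path).write_text(json.dumps(options, indent=2))
    return options
```

Calling write_igv_options() from a notebook produces an igv-options.json in the working directory, which keeps the whole workflow inside Python.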
quilt_summarize.json
Now for the major time-saving step. Create a third file on your machine, quilt_summarize.json, with the following contents:
quilt_summarize.json specifies how the contents of your package display in the Quilt web catalog. Notice the "types": ["igv"] line, which tells the Quilt catalog to render the corresponding JSON files as genome tracks.
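A minimal sketch of such a file, written from Python so that it can be generated alongside the options files. The "types": ["igv"] entry comes from the text above. The "path" and "title" keys and the one-entry-per-file layout are assumptions about the quilt_summarize.json format that are worth checking against the Quilt documentation, and the titles are made up for illustration.

```python
import json
from pathlib import Path

# The two IGV options files named earlier in this article.
IGV_FILES = ("igv-options.json", "igv-options-interact.json")


def build_summary(igv_files=IGV_FILES):
    """Return one summary entry per IGV options file.

    Assumption: each entry is an object with "path", "title", and "types".
    """
    return [
        {"path": name, "title": f"Genome tracks: {name}", "types": ["igv"]}
        for name in igv_files
    ]


def write_summary(directory="."):
    """Write quilt_summarize.json into the given directory and return its path."""
    target = Path(directory) / "quilt_summarize.json"
    target.write_text(json.dumps(build_summary(), indent=2))
    return target
```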
We now have the three files above in a directory on our local machine:
You can now quilt3 push your working directory to S3 as follows:
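One way to do this from Python is sketched below using the quilt3 package API (Package.set_dir and Package.push). The package name and bucket are hypothetical placeholders, and missing_files is an illustrative helper rather than part of quilt3; it only checks that the three files described above are present before anything is uploaded.

```python
from pathlib import Path

# The three files this article asks you to place in the working directory.
EXPECTED_FILES = (
    "igv-options.json",
    "igv-options-interact.json",
    "quilt_summarize.json",
)


def missing_files(directory="."):
    """Return the expected file names that are absent from the directory."""
    root = Path(directory)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def push_directory(
    directory=".",
    name="example-user/igv-tracks",   # hypothetical package name
    registry="s3://example-bucket",   # hypothetical bucket
):
    """Package the directory contents and push them to the S3 registry."""
    missing = missing_files(directory)
    if missing:
        raise FileNotFoundError(f"Missing files: {missing}")

    import quilt3  # imported here so the check above works without quilt3

    pkg = quilt3.Package()
    pkg.set_dir(".", str(directory))  # add every file at the package root
    pkg.push(name, registry=registry, message="Add IGV genome tracks")
    return pkg
```

Calling push_directory() with your own package name and bucket requires AWS credentials with write access to that bucket.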
If your company runs its own Quilt web catalog, you will see IGV visualizations that resemble this example of the above package with genome tracks on open.quiltdata.com. (If you’re not yet running the Quilt web catalog, we plan to support IGV in future versions of the open source quilt3 catalog that you can use to browse and preview data in S3.)
You can explore genome tracks and the JSON files as shown below. The first screencast demonstrates three sequence tracks (variant, alignment, and annotation):
You can use the “View as” menu in the upper right to switch between viewing the data as JSON and as an IGV visualization.
The final screencast demonstrates multiple loci displayed side-by-side in split panes within the IGV visualization plugin:
Learn more
Post questions in the comments or visit https://quiltdata.com/ to learn more about integrating wet and dry science datasets with Quilt data packages.
Appendix: Visualization in the web catalog
The Quilt Data web catalog facilitates dynamic reporting, currently with native support for the following libraries for visualization, exploration, and presentation:
- Vega
- Vega-Lite
- ECharts
- Voilà (Developer preview)
- Perspective
Previewing genome sequence files
The Quilt web catalog currently supports visual preview of large FASTQ, SAM, and VCF files.