Graphical (GUI) and interactive exploration tool for large genotype matrices like 1KG or gnomAD

WimS Member ✭✭
edited May 20 in Ask the GATK team

Dear GATK development team and GATK users,

What is currently the best visual (GUI) and interactive exploration tool (a browser) for large genotype matrices, say the 1000 Genomes VCF?
Or for something between the 1000 Genomes VCF and the gnomAD (15K genomes) VCF? The full VCF, including the genotypes, should be visualized and explorable, not just the variant sites.

That means 100M+ variants, 1000+ samples, and a raw uncompressed VCF of 1 TB or more.
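As a back-of-the-envelope check on that scale, assuming roughly 10 bytes of uncompressed text per genotype field (an assumption; the real figure depends on which FORMAT fields are present):

```latex
\begin{aligned}
N_{\text{calls}} &= N_{\text{variants}} \times N_{\text{samples}} = 10^{8} \times 10^{3} = 10^{11} \\
\text{size} &\approx 10^{11} \times 10\ \text{bytes} = 10^{12}\ \text{bytes} \approx 1\ \text{TB}
\end{aligned}
```

So the genotype columns alone account for about a terabyte, in line with the 1 TB+ estimate.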

One requirement is that it should support all the kinds of filtering that 'bcftools view' or GATK 'VariantFiltration' can do:
http://www.htslib.org/doc/bcftools.html#view
https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_hellbender_tools_walkers_filters_VariantFiltration.php#--filter-expression

But in an interactive and visual way (a graphical user interface): queries return within seconds, and a (paged) variant and genotype table is shown, perhaps even with summary statistics for the current selection.
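To make the requested behavior concrete, here is a minimal Python sketch, assuming a plain-text VCF read line by line. It keeps sites that pass a QUAL and INFO/DP threshold (roughly the kind of expression 'bcftools view -i' accepts, but nowhere near its full expression language), returns one page of the variant and genotype table, and counts non-reference genotypes across the whole selection as a simple summary statistic. The function names and thresholds are hypothetical, not taken from any existing tool.

```python
"""Minimal sketch of filter -> page -> summarize over a plain-text VCF.

Hypothetical helper names; only QUAL and INFO/DP filters are supported.
"""
from typing import Dict, Iterable, List


def parse_info(info: str) -> Dict[str, str]:
    """Split an INFO column like 'DP=14;DB' into {'DP': '14', 'DB': ''}."""
    if info == ".":
        return {}
    out: Dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def passes(fields: List[str], min_qual: float, min_dp: int) -> bool:
    """Roughly 'QUAL>=min_qual && INFO/DP>=min_dp'; missing values fail."""
    qual, info = fields[5], parse_info(fields[7])
    if qual == "." or "DP" not in info:
        return False
    return float(qual) >= min_qual and int(info["DP"]) >= min_dp


def is_non_ref(gt: str) -> bool:
    """True if any called allele in a GT like '0/1', '1|1' or '1' is non-zero."""
    return any(a not in ("0", ".") for a in gt.replace("|", "/").split("/"))


def query_page(lines: Iterable[str], min_qual: float, min_dp: int,
               page: int, page_size: int) -> dict:
    """Return one page of passing sites plus stats over the whole selection."""
    samples: List[str] = []
    rows: List[dict] = []
    n_sites = 0    # passing sites seen so far (across all pages)
    n_non_ref = 0  # non-reference genotypes across all passing sites
    start = page * page_size
    for line in lines:
        if line.startswith("##"):
            continue  # meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names follow the FORMAT column
            continue
        if not passes(fields, min_qual, min_dp):
            continue
        # Assumes GT is present; the VCF spec puts it first in FORMAT.
        gts = [call.split(":")[0] for call in fields[9:]]
        n_non_ref += sum(is_non_ref(gt) for gt in gts)
        if start <= n_sites < start + page_size:
            rows.append({"CHROM": fields[0], "POS": int(fields[1]),
                         "REF": fields[3], "ALT": fields[4],
                         "GT": dict(zip(samples, gts))})
        n_sites += 1
    return {"rows": rows, "n_sites": n_sites, "n_non_ref_genotypes": n_non_ref}
```

For example, `query_page(open("cohort.vcf"), min_qual=30, min_dp=10, page=0, page_size=50)` would return the first 50 passing sites (the file name is hypothetical). Because every query rescans the whole file, this approach cannot meet the "within seconds" requirement for a 1 TB VCF; that would need an indexed, columnar, or distributed backend of the kind options 1 and 3 below mention.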

Does something like this already exist? If so, which tools? Or is someone building it? If not, why not?

My preference would be:
1. An open-source solution that builds on bcftools or GATK, or on the HTSJDK or HTSlib libraries, possibly combined with an open-source big data backend.
2. A standard commercial front-end/analytics tool (e.g. Spotfire/Tableau) that takes in a tab-delimited file created by 'bcftools query' or GATK 'VariantsToTable'. The downside, of course, is that Spotfire/Tableau have no genomics/genetics domain logic that can be used for filtering the table. Would it also need a machine with a very large amount of memory, since all the data is loaded into memory? Has anyone tried this?
3. A standard commercial front-end/analytics tool (e.g. Spotfire/Tableau) that somehow uses the domain logic of bcftools/GATK/HTSJDK/HTSlib in the backend, perhaps with a distributed or in-memory 'big data' backend such as Apache Spark? Is this possible?
4. A custom commercial front-end tool that builds on top of the functionality/results of GATK GenotypeGVCFs, or maybe even Intel GenomicsDB.
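Option 2 depends on flattening the VCF into a table that a BI tool can load. 'bcftools query' with a '[\t%GT]' format string and GATK 'VariantsToTable' with genotype fields both produce wide tables with one genotype column per sample, while tools like Spotfire/Tableau are often easier to filter with a long layout (one row per variant per sample). The Python sketch below writes that long layout; the function names are hypothetical, and only GT is kept.

```python
"""Minimal sketch: flatten a plain-text VCF into a long-format TSV
(one row per variant per sample), keeping only GT. Hypothetical names."""
import csv
from typing import Iterable, Iterator, List, TextIO


def long_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield a header row, then one [CHROM, POS, REF, ALT, SAMPLE, GT] row per call."""
    yield ["CHROM", "POS", "REF", "ALT", "SAMPLE", "GT"]
    samples: List[str] = []
    for line in lines:
        if line.startswith("##"):
            continue  # skip meta-information lines
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the FORMAT column
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        # Assumes GT is present; the VCF spec puts it first in FORMAT.
        for sample, call in zip(samples, fields[9:]):
            yield [chrom, pos, ref, alt, sample, call.split(":")[0]]


def write_tsv(rows: Iterable[List[str]], out: TextIO) -> int:
    """Write rows as tab-separated text; return the number of data rows."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    n = -1  # the first row written is the header
    for row in rows:
        writer.writerow(row)
        n += 1
    return max(n, 0)
```

For example, `with open("cohort.vcf") as vcf, open("cohort_long.tsv", "w", newline="") as out: write_tsv(long_rows(vcf), out)` (hypothetical file names). The long layout's row count is the number of variants times the number of samples, which bears directly on the memory question in option 2.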

Thank you.
