Graphical (GUI) and interactive exploration tool for large genotype matrices like 1KG or gnomAD
Dear GATK development team and GATK users,
What is currently the best visual (GUI) and interactive genotype matrix exploration tool (a browser) for large genotype matrices, say the 1000 Genomes VCF?
Or something between the 1000 Genomes VCF and the gnomAD (15K genomes) VCF? The full VCF, including the genotypes, should be visualized and explorable, not just the variant sites.
So that means 100M+ variants, 1000+ samples, and a raw uncompressed VCF file size of 1 TB+.
One requirement is that it should support all the kinds of filtering that 'bcftools view' or 'GATK VariantFiltration' offer, but in an interactive and visual way (graphical user interface).
Queries should return within seconds, and a (paged) variant and genotype table should be shown, maybe even with summary stats for the current selection.
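To make the requirement a bit more concrete, here is a rough sketch of the query behaviour I have in mind: apply a filter, return one page of the variant/genotype table, and report summary stats for the whole selection. Everything in it (the `Variant` record, `alt_allele_frequency`, `query`) is a hypothetical name made up for illustration, and it works on a tiny in-memory list; a real tool would need an indexed or distributed backend to do this at 100M+ variants.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class Variant:
    """Hypothetical, simplified record: one variant site plus per-sample GT strings."""
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float
    genotypes: List[str]  # e.g. "0/0", "0/1", "1|1", "./."


def alt_allele_frequency(v: Variant) -> float:
    """Fraction of called alleles that are non-reference (missing calls skipped)."""
    alleles = [a for gt in v.genotypes
               for a in gt.replace("|", "/").split("/") if a != "."]
    if not alleles:
        return 0.0
    return sum(a != "0" for a in alleles) / len(alleles)


def query(variants: Sequence[Variant],
          predicate: Callable[[Variant], bool],
          page: int,
          page_size: int) -> Tuple[List[Variant], Dict[str, float]]:
    """Filter, then return one page of rows plus stats over the full selection."""
    selected = [v for v in variants if predicate(v)]
    start = page * page_size
    rows = selected[start:start + page_size]
    mean_af = (sum(alt_allele_frequency(v) for v in selected) / len(selected)
               if selected else 0.0)
    summary = {"n_selected": len(selected), "mean_alt_af": mean_af}
    return rows, summary


# Toy data, invented for this example only.
variants = [
    Variant("1", 100, "A", "G", 50.0, ["0/0", "0/1", "1/1"]),
    Variant("1", 200, "C", "T", 10.0, ["0/0", "0/0", "0/1"]),
    Variant("2", 300, "G", "A", 99.0, ["0|1", "./.", "0/0"]),
]

# Roughly the kind of selection a QUAL >= 30 filter would make.
rows, summary = query(variants, lambda v: v.qual >= 30, page=0, page_size=1)
```

Here `rows` holds only the first page of the selection, while `summary` describes all selected variants, which is the split I would expect a GUI to show.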
Does something like this already exist? If so, which tools? Or is it being built by someone? If not, why not?
My preference would be:
1. An open source solution that builds on bcftools or GATK, or the HTS-JDK or HTSLib libraries. Maybe in combination with an open source big data backend.
2. A standard commercial front end/analytical tool (e.g. SpotFire/Tableau) that takes in a tab-delimited file created by bcftools query or GATK VariantsToTable. The downside is of course that SpotFire/Tableau don't have any genomics/genetics domain logic that can be used for filtering the table. And a very large-memory machine would be needed, since all data is loaded into memory? Has anyone tried this?
3. A standard commercial front end/analytical tool (e.g. SpotFire/Tableau) that somehow works with the domain logic of bcftools/GATK/HTS-JDK/HTSlib in the backend? Maybe with a 'big data' distributed or in-memory database backend, e.g. Apache Spark? Is this possible?
4. A custom commercial software front end tool that builds on top of the functionality/results of GATK GenotypeGVCFs or maybe even Intel GenomicsDB.
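Regarding the memory concern in option 2, here is a minimal sketch of reading such a tab file as a stream instead of loading it whole. It assumes the table was produced with something like the `bcftools query` command in the comment (columns CHROM, POS, REF, ALT, then one GT per sample, no header line); the function names `stream_genotype_table` and `non_ref_counts` and the sample names are hypothetical. This only shows that simple per-sample stats need constant memory; it does not solve interactive random access.

```python
import csv
import io
from typing import Dict, Iterator, List, TextIO

# Assumed way of producing the input (file names are placeholders):
#   bcftools query -f '%CHROM\t%POS\t%REF\t%ALT[\t%GT]\n' input.vcf.gz > genotypes.tsv
# Sample names, in column order, could come from:
#   bcftools query -l input.vcf.gz


def stream_genotype_table(handle: TextIO,
                          sample_names: List[str]) -> Iterator[Dict[str, object]]:
    """Yield one row at a time so memory use does not grow with file size."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        chrom, pos, ref, alt, *gts = fields
        if len(gts) != len(sample_names):
            raise ValueError(f"expected {len(sample_names)} genotypes at {chrom}:{pos}")
        yield {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
               "genotypes": gts}


def non_ref_counts(handle: TextIO, sample_names: List[str]) -> Dict[str, int]:
    """Count, per sample, the sites where the genotype carries a non-reference allele."""
    counts = {name: 0 for name in sample_names}
    for row in stream_genotype_table(handle, sample_names):
        for name, gt in zip(sample_names, row["genotypes"]):
            alleles = gt.replace("|", "/").split("/")
            if any(a not in ("0", ".") for a in alleles):
                counts[name] += 1
    return counts


# Toy input standing in for genotypes.tsv; the values are invented.
example_tsv = (
    "1\t100\tA\tG\t0/0\t0/1\n"
    "1\t200\tC\tT\t1|1\t./.\n"
    "2\t300\tG\tA\t0/1\t0/0\n"
)
counts = non_ref_counts(io.StringIO(example_tsv), ["S1", "S2"])
```

With a real file you would pass `open("genotypes.tsv", newline="")` instead of the `io.StringIO` object.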