We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office for a Broad Institute event from Dec 10th to Dec 11th 2019. We will be back to monitor the GATK forum on Dec 12th 2019. In the meantime we encourage you to help out other community members with their queries.
Thank you for your patience!
Calling all variants in amplicon library with read counts
I am trying to determine the composition of a single gene library. My amplicon is 750 bp long and was sequenced with Illumina 2x75 bp paired-end kit. Because the amplicon library was constructed to contain 1 mutation / gene, the vast majority of my reads are wild type and reads with mutations are present at ~1-3% at each position. Sequencing being done over a short region, DepthOfCoverage gives average depth around 250,000 per position (though I was seeing about 100K when done with bowtie2+samtools).
I want to identify all mutations in my sample and count how many reads support each mutation. I don't want to call consensus genotypes or only see the most likely mutations, which is what seems to happen with HaplotypeCaller or MuTect2, respectively. Increasing the ploidy for HC or reducing the prior probabilities for MuTect2 changes the mutations I see, but I'm still not seeing nowhere near as many mutations as I expect to.
For example, I see some mutations with something like:
java -jar $GATK -T MuTect2 --heterozygosity 0.00001 --heterozygosity_stdev 0.001 --indel_heterozygosity 0.00001 --maxReadsInRegionPerSample 350000 -R ../reference.fa -I:tumor libraryN.sorted.bam -o fileN.raw2.vcf
Changing heterozygosity parameters gives me more mutations, but I suspect this is not a good way of going about this.
So, how could I get all mutations in the reads (with minimal filtering, base quality and good mapping only) and the counts for them? At this point I am more concerned with sensitivity than sequencing errors. Many mutations are deletions anyway, for which sequencing errors are unlikely.
I'm running GATK version 3.7-0-gcfedb67 with java version 1.2.0_121.
Thank you for your time,