Calling all variants in amplicon library with read counts
I am trying to determine the composition of a single-gene library. My amplicon is 750 bp long and was sequenced with an Illumina 2x75 bp paired-end kit. Because the library was constructed to carry one mutation per gene, the vast majority of my reads are wild type, and mutant reads make up only ~1-3% at each position. Since sequencing covers such a short region, DepthOfCoverage reports an average depth of around 250,000 per position (though I saw about 100K when processing with bowtie2 + samtools).
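As a rough sanity check on what should be detectable, the expected number of mutant reads at a position is depth × allele fraction, and a Phred base quality Q corresponds to an error probability of 10^(-Q/10). The minimal sketch below plugs in the figures quoted above (250,000x depth, 1-3% mutant reads). The Q30 cutoff and the function names are hypothetical, chosen only for illustration.

```python
def expected_alt_reads(depth: int, allele_fraction: float) -> float:
    """Expected number of reads carrying a mutation present at allele_fraction."""
    return depth * allele_fraction


def phred_to_error(q: float) -> float:
    """Convert a Phred-scaled base quality to an error probability: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def expected_error_reads(depth: int, min_base_q: float) -> float:
    """Upper bound on the expected number of miscalled bases at one position,
    assuming every retained base has quality >= min_base_q."""
    return depth * phred_to_error(min_base_q)


if __name__ == "__main__":
    depth = 250_000  # average DepthOfCoverage value quoted in the post
    for af in (0.01, 0.03):
        print(f"AF {af:.0%}: ~{expected_alt_reads(depth, af):,.0f} mutant reads")
    # Hypothetical Q30 cutoff; these errors are spread over three possible wrong bases.
    print(f"Q30 errors: at most ~{expected_error_reads(depth, 30):,.0f} reads")
```

At this depth, even a 1% mutation should be supported by thousands of reads, while the expected number of substitution errors at a Q30 cutoff is an order of magnitude smaller.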
I want to identify all mutations in my sample and count how many reads support each one. I don't want to call consensus genotypes (which is what HaplotypeCaller seems to do) or see only the most likely mutations (which is what MuTect2 seems to do). Increasing the ploidy for HaplotypeCaller or lowering the prior probabilities for MuTect2 changes which mutations I see, but I'm still seeing nowhere near as many mutations as I expect.
For example, I get some mutations with a command like this:
java -jar $GATK -T MuTect2 --heterozygosity 0.00001 --heterozygosity_stdev 0.001 --indel_heterozygosity 0.00001 --maxReadsInRegionPerSample 350000 -R ../reference.fa -I:tumor libraryN.sorted.bam -o fileN.raw2.vcf
Changing the heterozygosity parameters gives me more mutations, but I suspect this is not the right way to go about it.
So how can I get all mutations present in the reads, with minimal filtering (base quality and mapping quality only), along with the read count for each? At this point I am more concerned with sensitivity than with sequencing errors. Many of the mutations are deletions anyway, which are unlikely to arise from sequencing errors.
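One way to get per-allele read counts with only base-quality and mapping-quality filtering is to tally the read-bases column of a samtools text pileup yourself. This is an assumed workaround, not a GATK feature. The pileup could be produced with something like `samtools mpileup -f ../reference.fa -q 20 -Q 0 -d 1000000 libraryN.sorted.bam`, where `-q` sets the minimum mapping quality, `-Q 0` lets every base reach the pileup so the filter below is the only base-quality cut, and `-d` raises the depth cap, whose default is far below 250,000x. The threshold values, the function name and the sample row are hypothetical. The sketch follows the mpileup read-bases encoding: `.`/`,` match the reference, `ACGTN` in either case are mismatches, `*` marks a base deleted by an upstream deletion, `+N`/`-N` followed by N bases is an indel after the current base, and `^` plus one mapping-quality character and `$` mark read starts and ends. Each base symbol consumes one character of the quality column, while indel sequences and start/end markers do not.

```python
from collections import Counter


def count_pileup_alleles(read_bases: str, quals: str,
                         min_base_q: int = 20, qual_offset: int = 33) -> Counter:
    """Tally alleles from one samtools mpileup row (read-bases and quality columns).

    Keys: 'REF' for a reference match, a base letter such as 'T' for a
    substitution, 'INS:<seq>' / 'DEL:<seq>' for an indel that follows this
    position, and '*' for a read whose base here is deleted by an upstream
    deletion. The base-quality filter applies only to REF and substitution
    calls; indels and '*' are always counted (a hypothetical design choice).
    """
    counts = Counter()
    i = 0      # cursor in read_bases
    q_idx = 0  # cursor in quals: one character per base symbol
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":        # read start: skip '^' and its mapping-quality char
            i += 2
        elif c == "$":      # read end marker, no quality char
            i += 1
        elif c in "+-":     # indel: sign, length digits, then the bases
            j = i + 1
            while read_bases[j].isdigit():
                j += 1
            n = int(read_bases[i + 1:j])
            kind = "INS:" if c == "+" else "DEL:"
            counts[kind + read_bases[j:j + n].upper()] += 1
            i = j + n
        else:
            # Every other symbol is one base with exactly one quality char.
            q = ord(quals[q_idx]) - qual_offset
            q_idx += 1
            if c == "*":
                counts["*"] += 1
            elif q >= min_base_q:
                if c in ".,":
                    counts["REF"] += 1
                elif c.upper() in "ACGTN":
                    counts[c.upper()] += 1
                # '>' and '<' (reference skips) are not counted
            i += 1
    return counts


if __name__ == "__main__":
    # Hypothetical pileup row: bases column and Phred+33 quality column.
    print(count_pileup_alleles(".,T^I.-2CG$,t*", "IIII#II"))
```

Note that mpileup reports a deletion twice: as `-N<seq>` on the base just before it and as `*` on each deleted position. Counting the `DEL:` keys at the anchor position therefore gives one count per supporting read.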
I'm running GATK version 3.7-0-gcfedb67 with Java version 1.8.0_121.
Thank you for your time,