Calling all variants in amplicon library with read counts
I am trying to determine the composition of a single-gene library. My amplicon is 750 bp long and was sequenced with an Illumina 2x75 bp paired-end kit. Because the amplicon library was constructed to contain 1 mutation per gene, the vast majority of my reads are wild type, and reads with mutations are present at ~1-3% at each position. Because sequencing covers such a short region, DepthOfCoverage gives an average depth of around 250,000 per position (though I was seeing about 100K when I used bowtie2+samtools).
I want to identify all mutations in my sample and count how many reads support each mutation. I don't want to call consensus genotypes or only see the most likely mutations, which is what seems to happen with HaplotypeCaller and MuTect2, respectively. Increasing the ploidy for HC or reducing the prior probabilities for MuTect2 changes the mutations I see, but I'm still seeing nowhere near as many mutations as I expect.
For example, I see some mutations with something like:
java -jar $GATK -T MuTect2 --heterozygosity 0.00001 --heterozygosity_stdev 0.001 --indel_heterozygosity 0.00001 --maxReadsInRegionPerSample 350000 -R ../reference.fa -I:tumor libraryN.sorted.bam -o fileN.raw2.vcf
Changing heterozygosity parameters gives me more mutations, but I suspect this is not a good way of going about this.
So, how could I get all mutations in the reads, filtered only on base quality and mapping quality, along with the read counts for each? At this point I am more concerned with sensitivity than with excluding sequencing errors. Many of the mutations are deletions anyway, for which sequencing errors are unlikely.
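To make the goal concrete, here is a minimal sketch of the kind of per-read tally I have in mind, written in plain Python. It is not a GATK feature. The function name `count_variants` and the `(pos, cigar, seq)` input tuples are hypothetical; in practice those fields would have to come from a BAM/SAM parser, and base-quality and mapping-quality filtering (not shown here) would be applied before counting.

```python
import re
from collections import Counter

# Splits a CIGAR string such as "3M2D3M" into (length, operation) pairs.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def count_variants(reference, reads):
    """Tally every substitution, insertion and deletion seen in aligned reads.

    reference: reference sequence as a string.
    reads: iterable of (pos, cigar, seq), where pos is the 0-based leftmost
           reference position of the alignment (hypothetical input format).
    Returns (variants, depth): variants maps (pos, ref, alt) to a read count,
    with indels anchored on the preceding reference base (VCF-style);
    depth maps each reference position to the number of aligned bases.
    """
    variants = Counter()
    depth = Counter()
    for pos, cigar, seq in reads:
        ref_i, read_i = pos, 0
        for length, op in CIGAR_RE.findall(cigar):
            n = int(length)
            if op in "M=X":  # aligned bases: compare each one to the reference
                for k in range(n):
                    ref_base = reference[ref_i + k]
                    read_base = seq[read_i + k]
                    depth[ref_i + k] += 1
                    if read_base != ref_base:
                        variants[(ref_i + k, ref_base, read_base)] += 1
                ref_i += n
                read_i += n
            elif op == "I":  # insertion: consumes read bases only
                anchor = ref_i - 1
                if anchor >= 0:
                    inserted = seq[read_i:read_i + n]
                    key = (anchor, reference[anchor], reference[anchor] + inserted)
                    variants[key] += 1
                read_i += n
            elif op == "D":  # deletion: consumes reference bases only
                anchor = ref_i - 1
                if anchor >= 0:
                    key = (anchor, reference[anchor:ref_i + n], reference[anchor])
                    variants[key] += 1
                ref_i += n
            elif op == "N":  # skipped reference region
                ref_i += n
            elif op == "S":  # soft clip: bases present in seq but not aligned
                read_i += n
            # H and P consume neither read nor reference bases
    return variants, depth

# Toy example: one wild-type read, one substitution, one 2 bp deletion.
reference = "ACGTACGTAC"
reads = [
    (0, "10M", "ACGTACGTAC"),
    (0, "10M", "ACGAACGTAC"),
    (2, "3M2D3M", "GTATAC"),
]
variants, depth = count_variants(reference, reads)
for (pos, ref, alt), n in sorted(variants.items()):
    print(pos + 1, ref, alt, n, depth[pos])  # 1-based position, as in a VCF
```

Dividing each count by the depth at that position would give the per-position mutation frequency, which is the number I would compare against the expected ~1-3%.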
I'm running GATK version 3.7-0-gcfedb67 with Java version 1.8.0_121.
Thank you for your time,