If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!
Test-drive the GATK tools and Best Practices pipelines on Terra
Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Haplotype depends on kmer size - how to get just 2 haplos from haplotypeCaller
I am using HaplotypeCaller in order to get haplotype sequences from individual samples (several samples per species) for gene tree/species tree analysis. The reads are from an exome capture experiment. Because I am running individual samples I have limited the max # of haplotypes to 2. However, the default behavior of using two kmer size (10 and 25) results in up to four haplotypes per exon (interval) in the bamout file. I have found that if I supply a kmerSize parameter I get only 2 haplotypes but these differ depending on the kmer I supply. The difference is not only subsetting of the snps found with multiple kmer sizes but distinct snps called with different kmer sizes as well. I would like to run the analysis with multiple kmerSizes specified and have the caller only output the two most likely haplotypes. Is this possible and, if so, how can I do it? Or, am I misunderstanding how the caller works?
I think I understand why different kmer sizes would result in different snps called but if anyone could explain it to me I'd love confirmation.
Here is my original command line before experimenting with kmer sizes:
java -jar /opt/local/NGS/GenomeAnalysisTK-3.4-46/GenomeAnalysisTK.jar -T HaplotypeCaller -R /Users/bdorsey/Documents/Dioon/Capture_seqs_assembly/captured_seqs_uniq.fa -I /Volumes/HD2/Capture_assembly/Dioon1/contigs/Dioon1_m1n350r.10x.sp5.bam -L /Volumes/HD2/Capture_assembly/Dioon1/exonsCov10sp5.list --activeRegionIn /Volumes/HD2/Capture_assembly/Dioon1/exonsCov10sp5.list --maxNumHaplotypesInPopulation 2 --minReadsPerAlignmentStart 5 -out_mode EMIT_ALL_SITES -ERC BP_RESOLUTION --forceActive --dontTrimActiveRegions --activeRegionMaxSize 10000 -bamWriterType CALLED_HAPLOTYPES --disableOptimizations -bamout /Volumes/HD2/Capture_assembly/Dioon1/haplo/Dioon1.haplos.bam -o /Volumes/HD2/Capture_assembly/Dioon1/haplo/Dioon1.haplos.g.vcf
Thanks very much for any help.