Variant called if HC is run on individual bam but variant not called on multiple input bam

ZakiZaki Member
edited February 2014 in Ask the GATK team

Dear Team,

I have encountered the following situation:

If I run HaplotypeCaller on a single .bam file, it calls the mutation (Sanger validated):

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -nct 15 -minPruning 2 \
-R genome.fa \
-I sample_1.bam \
-L TP53.bed \
-U ALLOW_N_CIGAR_READS \
-o sample_1.vcf

But if I run HaplotypeCaller with multiple .bam files as input, it fails to call the mutation in the sample mentioned above:

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -nct 15 -minPruning 2 \
-R genome.fa \
-I sample_1.bam -I sample_2.bam -I sample_3.bam ..... -I sample_19.bam \
-L TP53.bed \
-U ALLOW_N_CIGAR_READS \
-o multisample.vcf

I used the --bamOutput option to view the reads as HC sees them when making the call. Please find the IGV screenshot below.

java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -nct 15 -minPruning 2 \
-R genome.fa \
-I sample_1.bam -I sample_2.bam -I sample_3.bam ..... -I sample_19.bam \
-L TP53.bed \
-U ALLOW_N_CIGAR_READS \
--bamOutput multiple_all.bam --bamWriterType ALL_POSSIBLE_HAPLOTYPES \
-o multisample.vcf

1st row: HC run on sample_1.bam only
2nd row: HC run on multiple .bam inputs, using the --bamOutput option alone
3rd row: HC run on multiple .bam inputs, using --bamOutput together with --bamWriterType ALL_POSSIBLE_HAPLOTYPES

The coverage on the 1st row (IGV) for sample_1 is 262 (Ref (C): 3, Mut (A): 259).
The coverage on the 3rd row (IGV) at the position of interest is 3627 (Ref (C): 3411, Mut (A): 215, G: 1, DEL: 13).
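
To put those counts side by side, here is a minimal Python sketch that turns them into alternate-allele fractions. The helper name `allele_fraction` is only for illustration, and the numbers are simply the IGV counts quoted above. The 3627 total equals 3411 + 215 + 1, so the 13 deletion reads appear to be counted separately and are left out here. Since the 3rd row comes from the multi-sample run, its fraction is presumably a mix across input samples rather than a per-sample value.

```python
def allele_fraction(alt_reads: int, total_reads: int) -> float:
    """Return the fraction of reads that support the alternate allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


# Counts read off IGV at the position of interest (from the post above).
single = allele_fraction(259, 262)    # 1st row: sample_1.bam only
multi = allele_fraction(215, 3627)    # 3rd row: multi-sample bamOutput

print(f"sample_1 only:          {single:.3f}")
print(f"multi-sample bamOutput: {multi:.3f}")
```

So the mutant allele dominates the reads in the single-sample view but is a small minority of the reads in the multi-sample view.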

Could HC be failing to call the mutation (when multiple .bam files are used as input) because there is too much noise ~3 bases upstream of the position of interest? (shown in the screenshot below)

Any help understanding why multi-sample .bam input would fail to call the variant would be most helpful.

Thanks in advance
