HaplotypeCaller does not output reference haplotype in bamout

seruseru BergenMember ✭✭
edited August 2014 in Ask the GATK team

Hi,

I am using HC in GATK v3.2-2-gec30cee and trying to visualize the locally reassembled reads in IGV. For one sample, which is hetRA, the bam file produced by -bamout shows the reads and the variant as expected. For another which is called as homRR in this position, I don't see any reads at all, although MIN_DP in this block is 26 according to GVCF. Also viewing the input bam to HC shows a bunch of reads supporting the reference allele.

I tried using -AR and -forceActive to ensure the locus is processed, and different options for bamWriteType and output_mode. Still no success in getting the assembled reads in the bamout file. When running with the -debug flag, I can see that the site is processed and called homRR ("Found only the reference haplotype in the assembly graph"), but no reads are printed to the bamout. Is this the intended behaviour? Is there a way of getting these reads out into the bamout file?

Thank you,
Pawel

Tagged:

Best Answer

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @seru‌

    Hello Pawel,

    The bamout file only contains the output of active regions. So, if a region is not found to have the potential for variation, it will have no output in the bamout file. Please read about how active regions are determined here: http://www.broadinstitute.org/gatk/guide/article?id=4147

    Even forcing a region to be active will not work if there is no evidence for variation present.

    If you would like to see how many reads have the reference allele and how many do not at the homRR sites, you can use BP_RESOLUTION mode when calling variants. BP_RESOLUTION mode outputs the AD which gives you the number of reference reads and the number of non-reference reads. Unfortunately, it is not possible to actually see the reads in igv for a non-active region.

    -Sheila

  • KurtKurt Member ✭✭✭
    edited August 2014

    @sero, @Sheila‌,

    I think he can use;

    --bamWriterType ALL_POSSIBLE_HAPLOTYPES

    There are some caveats with this;

    See;

    http://gatkforums.broadinstitute.org/discussion/comment/11932#Comment_11932

  • KurtKurt Member ✭✭✭

    Sorry, I'm an idiot. Please disregard.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Kurt‌

    Yes, I had to think for a bit there as well. Your suggestion works well when you want to see every possible haplotype that was considered, not just the most likely ones.

    Thanks,
    Sheila

  • seruseru BergenMember ✭✭

    Thank you for your answers. So, if I understand correctly your answer @Sheila, there is currently no way of getting the locally assembled reads out, if no variant is detected by HC. Even when active regions are specified (-AR) and forced (-forceActive), and the --bamWriterType is set to spit out all the haplotypes (I had checked it @Kurt). In my opinion this is very counterintuitive. And more importanly, it makes impossible to visually inspect the regions processed by HC, if no variation is present.

    We have just moved from genotyping using UG to HC and are trying to get an overview of the differences on data from previous projects. In trio analysis, we choose to visually inspect all samples where putative de-novo variant is detected. With the HC we can see the nicely assembled reads in the proband. But for parents we can either look at pre-HC bam files which contain data differing from these reported in the (G)VCF files (because one is pre- and the other post-assembly), or bamOut files with no reads at all. Would you consider implementing in future versions of GATK an option to output also pure reference haplotypes in bamOut files? Or change the current (counterintuitive) behaviour of already present settings?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @seru There's a small misunderstanding here. When no evidence of variation is found, there is no difference between the pre-assembly and post-reassembly reads, so you can use the original bam to check the region in IGV. The HC's bam output is only useful in regions where evidence of variation relative to the reference has been detected. That is why bam output is disabled for regions without variation. In our view, producing an output containing pure reference haplotypes would be a waste of time and space since that would not contain any information that isn't already available in the original bam and the reference sequence.

  • seruseru BergenMember ✭✭

    Thank you for your reply Geraldine. I believe there can be a difference between pre- and post-assembly reads in spite of no variation being detected. This happens for instance when a mismatch in pre-HC bam (detected by UG; please see the attached IGV screenshot*) is removed by the local assembly in haplotype calling. Reads from this region are not pushed to the bamOut file because there is no variant found by HC. I have several such examples (mostly of lower quality variants must admit), but for sanity checking I would like to be able to inspect them. With the current settings I can only inspect regions where variants are called by the HC. And sometimes those that are not called can be equally important.

    And there is also a small practical difference: viewing one vs two bam files (per sample) in the IGV. Especially when looking at trios:)

    *) the screenshot shows input to HC (gatk.bam) and bamOutput file (gatk.hc.bam). The vcf tracks from the top are: 1) UG variants not called by HC, 2) UG variants, 3) HC variants

  • seruseru BergenMember ✭✭

    This would be excellent! Thank you.

  • seruseru BergenMember ✭✭

    Hi again,
    I was wondering if this issue has been considered the GATK developers, and if there is any plan to address it?
    Thank you,

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi @seru,

    Yes, we have a solution for this in development. I think the new code still has to go through code review, but I expect it will get done this week, so the new functionality should be available soon.

  • seruseru BergenMember ✭✭

    Fantastic! Thank you for the answer and your efforts.

Sign In or Register to comment.