Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office on October 14, 2019, due to the U.S. holiday. We will return to monitoring the forum on October 15.

Prevent haplotypecaller from using soft-clipped proportion of the read

RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

Dear GATK team,
From what I read on the forum, it seems that HaplotypeCaller call variants (and perhaps calculates RGQ) using soft-clipped parts of the reads? Is there a parameter to suppress this behavior, making it throw out the soft-clipped proportions before calling?
Best regards,
Ray

Answers

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember
  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember
    edited August 2015

    Hello again,
    not sure if it's related. But in the vcf file, the marked snps have DP=1 while there seems to be 2 reads (both mapping Q > 30, both have soft clipped bases at the ends). The weird thing is that the bases around the 3 snps all have DP=2. This is the vcf from GenotypeGVCF.

    GapFilledScaffold_1 23117 . T G 18.59 LowQual AC=2;AF=1.00;AN=2;DP=1;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=18.59;SOR=1.609 GT: AD:DP:GQ:PGT:PID:PL 1/1:0,1:1:3:1|1:23117_T_G:45,3,0 GapFilledScaffold_1 23118 . C . . . AN=2;DP=2 GT:AD:DP:RGQ 0/0:2:2:6 GapFilledScaffold_1 23119 . C . . . AN=2;DP=2 GT:AD:DP:RGQ 0/0:2:2:6 GapFilledScaffold_1 23120 . A G 18.59 LowQual AC=2;AF=1.00;AN=2;DP=1;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=18.59;SOR=1.609 GT: AD:DP:GQ:PGT:PID:PL 1/1:0,1:1:3:1|1:23117_T_G:45,3,0 GapFilledScaffold_1 23121 . T . . . AN=2;DP=2 GT:AD:DP:RGQ 0/0:2:2:6 GapFilledScaffold_1 23122 . C . . . AN=2;DP=2 GT:AD:DP:RGQ 0/0:2:2:6 GapFilledScaffold_1 23123 . G A 18.59 LowQual AC=2;AF=1.00;AN=2;DP=1;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;QD=18.59;SOR=1.609 GT: AD:DP:GQ:PGT:PID:PL 1/1:0,1:1:3:1|1:23117_T_G:45,3,0

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    @Raycui
    Hi Ray,

    Can you check the base qualities as well? I suspect one of the reads has a base that has a low base quality score.

    -Sheila

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember
    edited August 2015

    @Sheila said:
    Raycui
    Hi Ray,

    Can you check the base qualities as well? I suspect one of the reads has a base that has a low base quality score.

    -Sheila

    Hi Sheila,
    Thank you for the suggestion. However it seems that both base qualities are good, for example, the first snp "G" have 40 in one read and 37 in the other (I trimmed the data with quality > 25). One thing I notice is that the first read's mate is not mapped, does that make GATK to throw out both mates? But then it doesn't explain why in the non-SNP positions the depth is correctly given as 2.
    `
    Read name = HWI-D00427:53:C6B9TACXX:7:2309:12284:32865
    Location = GapFilledScaffold_1:23,117
    Alignment start = 23,045 (-)
    Cigar = 7S90M1I3M
    Mapped = yes
    Mapping quality = 35
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no
    Base = G
    Base phred quality = 40
    Mate is mapped = no
    First in pair
    MD = 6T4G14A0A4A3T5T29T2A2G14
    NM = 11
    AS = 40
    XS = 18
    Mate sequence: AAAGGTGCGACCTGTTCAGCGAACACGTCTACTACTGAACACGCCTGTTTCTTATGCACCAGAGCATTTTCAAAGGAATGTTTTATGAAAATTTAAAAGAT
    Alignment start position = GapFilledScaffold_1:23045
    GATGACTGACCACAGCAGACTTCTCTCTGTGAGTGCTGGCGCTCCGGCCAGACTGCTGCTTAGCCGTTTTACACGGCGTGCCGTCAGAGTCCTGGAATGAA

    Read name = HWI-D00427:53:C6B9TACXX:7:1113:2851:85111
    Location = GapFilledScaffold_1:23,117
    Alignment start = 23,082 (-)
    Cigar = 53M48S
    Mapped = yes
    Mapping quality = 60
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no
    Base = G
    Base phred quality = 37
    Mate is mapped = yes
    Mate start = GapFilledScaffold_1:22979 (+)
    Insert size = -155
    First in pair
    Pair orientation = F2R1
    MD = 5T29T2A2G11
    NM = 4
    AS = 33
    XS = 18
    Alignment start position = GapFilledScaffold_1:23082
    CGGCCAGACTGCTGCTTAGCCGTTTTACACGGCGTGCCGTCAGAGTCCTGGAATGAAACAACAACAGAAACATGGAGGTTATATTGAAATGGTTTTTTTCT
    `

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    @Raycui
    Hi Ray,

    Are you looking at the original bam file or the bamout file? Haplotype Caller does a reassembly step that may change the positions of reads. Can you post the bamout file here? https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--bamOutput

    Thanks,
    Sheila

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    Hi Sheila, should I do CALLED_HAPLOTYPES or ALL_POSSIBLE_HAPLOTYPES?

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    @Sheila said:
    Raycui
    Hi Ray,

    Are you looking at the original bam file or the bamout file? Haplotype Caller does a reassembly step that may change the positions of reads. Can you post the bamout file here? https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--bamOutput

    Thanks,
    Sheila

    Ok, I'm running this now. But one note is that bam emission is not supported if I use -nct option, which wasn't what I used before, so not sure if results will be comparable.

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember
    edited August 2015

    @Sheila said:
    Raycui
    Hi Ray,

    Are you looking at the original bam file or the bamout file? Haplotype Caller does a reassembly step that may change the positions of reads. Can you post the bamout file here? https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php#--bamOutput

    Thanks,
    Sheila

    Hi Sheila,
    now that I'm looking at the bamout file, it doesn't seem to make much sense because HaplotypeCaller seemed to have reconstructed several artificial haplotypes in this region (5 in total), but somehow the output is still DP=2 for non invariant sites and DP=1 for the snps.
    Now I also notice that both of these two reads in the input bam have PG=MarkDuplicates. But Duplicate=no so probably that doesn't mean they were marked by Picard as PCR duplicates? I'm confident that PCR duplicate marking shouldn't be the problem because 1) the reads look quite different 2) This is a PCR-free library so by definition there shouldn't be any PCR duplicates 3) even if one read was thrown out due to PCR duplicate, the depth on non variant positions should also be 1, but currently it's 2.

    Direct GVCF output from HaplotypeCaller:

    GapFilledScaffold_1 23117 . T G,<NON_REF> 20 . DP=1;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00 GT :AD:DP:GQ:PGT:PID:PL:SB 1/1:0,1,0:1:3:0|1:23117_T_G:45,3,0,45,3,45:0,0,0,1 GapFilledScaffold_1 23118 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2,0:2:6:0,6,87 GapFilledScaffold_1 23119 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2,0:2:6:0,6,90 GapFilledScaffold_1 23120 . A G,<NON_REF> 20 . DP=1;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00 GT:AD:DP:GQ:PGT:PID:PL:SB 1/1:0,1,0:1:3:0|1:23117_T_G:45,3,0,45,3,45:0,0,0,1 GapFilledScaffold_1 23121 . T <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2,0:2:6:0,6,87 GapFilledScaffold_1 23122 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:2,0:2:6:0,6,87 GapFilledScaffold_1 23123 . G A,<NON_REF> 20 . DP=1;MLEAC=1,0;MLEAF=0.500,0.00;MQ=60.00 GT:AD:DP:GQ:PGT:PID:PL:SB 1/1:0,1,0:1:3:0|1:23117_T_G:45,3,0,45,3,45:0,0,0,1

    GenotypeGVCF output:

    GapFilledScaffold_1 23117 . T G 18.59 LowQual AC=2;AF=1.00;AN=2;DP=1;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.0 0;QD=18.59;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,1:1:3:1|1:23117_T_G:45,3,0 GapFilledScaffold_1 23118 . C . . . AN=2;DP=2 GT:AD:DP:RGQ 0/0:2:2:6 GapFilledScaffold_1 23119 . C . . . AN=2;DP=2 GT:AD:DP:RGQ 0/0:2:2:6 GapFilledScaffold_1 23120 . A G 18.59 LowQual AC=2;AF=1.00;AN=2;DP=1;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.0 0;QD=18.59;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,1:1:3:1|1:23117_T_G:45,3,0 GapFilledScaffold_1 23121 . T . . . AN=2;DP=2 GT:AD:DP:RGQ 0/0:2:2:6 GapFilledScaffold_1 23122 . C . . . AN=2;DP=2 GT:AD:DP:RGQ 0/0:2:2:6 GapFilledScaffold_1 23123 . G A 18.59 LowQual AC=2;AF=1.00;AN=2;DP=1;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.0 0;QD=18.59;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 1/1:0,1:1:3:1|1:23117_T_G:45,3,0

    Read details from top to bottom in the input bam (after markduplicates and indel realignment)

    Read name = HWI-D00427:53:C6B9TACXX:7:2309:12284:32865
    Sample = 11CTD
    Read group = 11CTD
    Location = GapFilledScaffold_1:23,106
    Alignment start = 23,045 (-)
    Cigar = 7S90M1I3M
    Mapped = yes
    Mapping quality = 35
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no
    Base = T
    Base phred quality = 40
    Mate is mapped = no
    First in pair
    MD = 6T4G14A0A4A3T5T29T2A2G14
    PG = MarkDuplicates
    RG = 11CTD
    NM = 11
    AS = 40
    XS = 18
    Mate sequence: AAAGGTGCGACCTGTTCAGCGAACACGTCTACTACTGAACACGCCTGTTTCTTATGCACCAGAGCATTTTCAAAGGAATGTTTTATGAAAATTTAAAAGAT
    Alignment start position = GapFilledScaffold_1:23045
    GATGACTGACCACAGCAGACTTCTCTCTGTGAGTGCTGGCGCTCCGGCCAGACTGCTGCTTAGCCGTTTTACACGGCGTGCCGTCAGAGTCCTGGAATGAA

    Read name = HWI-D00427:53:C6B9TACXX:7:1113:2851:85111
    Sample = 11CTD
    Read group = 11CTD
    Location = GapFilledScaffold_1:23,098
    Alignment start = 23,082 (-)
    Cigar = 53M48S
    Mapped = yes
    Mapping quality = 60
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no
    Base = T
    Base phred quality = 33
    Mate is mapped = yes
    Mate start = GapFilledScaffold_1:22979 (+)
    Insert size = -155
    First in pair
    Pair orientation = F2R1
    MC = 47S50M4S
    MD = 5T29T2A2G11
    PG = MarkDuplicates
    RG = 11CTD
    NM = 4
    MQ = 40
    AS = 33
    XS = 18
    Alignment start position = GapFilledScaffold_1:23082
    CGGCCAGACTGCTGCTTAGCCGTTTTACACGGCGTGCCGTCAGAGTCCTGGAATGAAACAACAACAGAAACATGGAGGTTATATTGAAATGGTTTTTTTCT

    Read details from top to bottom in debug bamout:

    Read name = HC236
    Sample = HC
    Read group = ArtificialHaplotype

    Location = GapFilledScaffold_1:23,117
    Alignment start = 23,067 (-)
    Cigar = 77M
    Mapped = yes
    Mapping quality = 60
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no

    Base = G
    Base phred quality = 33

    HC = -832289720
    RG = ArtificialHaplotype

    Alignment start position = GapFilledScaffold_1:23067
    TGAGAACTGGAGCTTCGGCCAGACTGCTGCTTAGCCGTTTTACACGGCGTGCCGTCAGAGTCCTGGAAGAAGGCAAC

    Read name = HC239
    Sample = HC
    Read group = ArtificialHaplotype

    Location = GapFilledScaffold_1:23,117
    Alignment start = 23,067 (-)
    Cigar = 77M
    Mapped = yes
    Mapping quality = 60
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no

    Base = G
    Base phred quality = 33

    HC = -1738477733
    RG = ArtificialHaplotype

    Alignment start position = GapFilledScaffold_1:23067
    TGAGAACTGGAGCTTCGGCCTGACTGCTGCTTAGCCGTTTTACACGGCGTGCCGTCAGAGTCCTGGAAGAAGGCAAC

    Read name = HC238
    Sample = HC
    Read group = ArtificialHaplotype

    Location = GapFilledScaffold_1:23,117
    Alignment start = 23,067 (-)
    Cigar = 77M
    Mapped = yes
    Mapping quality = 60
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no

    Base = T
    Base phred quality = 33

    HC = 453936609
    RG = ArtificialHaplotype

    Alignment start position = GapFilledScaffold_1:23067
    TGAGAACTGGAGCTTCGGCCAGACTGCTGCTTAGCCGTTTTACACGGCGTTCCATCGGAGTCCTGGAAGAAGGCAAC

    Read name = HC237
    Sample = HC
    Read group = ArtificialHaplotype

    Location = GapFilledScaffold_1:23,117
    Alignment start = 23,067 (-)
    Cigar = 77M
    Mapped = yes
    Mapping quality = 60
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no

    Base = T
    Base phred quality = 33

    HC = -452251404
    RG = ArtificialHaplotype

    Alignment start position = GapFilledScaffold_1:23067
    TGAGAACTGGAGCTTCGGCCTGACTGCTGCTTAGCCGTTTTACACGGCGTTCCATCGGAGTCCTGGAAGAAGGCAAC

    Read name = HWI-D00427:53:C6B9TACXX:7:1113:2851:85111
    Sample = 11CTD
    Read group = 11CTD

    Location = GapFilledScaffold_1:23,117
    Alignment start = 23,082 (-)
    Cigar = 53M
    Mapped = yes
    Mapping quality = 60
    Secondary = no
    Supplementary = no
    Duplicate = no
    Failed QC = no

    Base = G
    Base phred quality = 37

    Mate is mapped = yes
    Mate start = GapFilledScaffold_1:22979 (+)
    Insert size = -155
    First in pair
    Pair orientation = F2R1

    HC = -832289720
    MC = 47S50M4S
    MD = 5T29T2A2G11
    PG = MarkDuplicates
    RG = 11CTD
    NM = 4
    MQ = 40
    AS = 33
    XS = 18

    Alignment start position = GapFilledScaffold_1:23082
    CGGCCAGACTGCTGCTTAGCCGTTTTACACGGCGTGCCGTCAGAGTCCTGGAA

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    Screen shot from top to bottom: 1) raw bam output from BWA-mem, 2) after indel realignment 3) bamout debug from HaplotypeCaller

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    @Raycui
    Hi Ray,

    I think the 1 read is not being used because its mate is not mapped. Haplotype Caller applies read filters, and reads that do not pass are not included in DP for variant sites.

    -Sheila

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    @Sheila said:
    Raycui
    Hi Ray,

    I think the 1 read is not being used because its mate is not mapped. Haplotype Caller applies read filters, and reads that do not pass are not included in DP for variant sites.

    -Sheila

    Thanks Sheila. But there doesn't seem to be any filter applied in HaplotypeCaller that deals with unmapped mates according to the docs:

    Read filters
    These Read Filters are automatically applied to the data by the Engine before processing by HaplotypeCaller.

    NotPrimaryAlignmentFilter
    FailsVendorQualityCheckFilter
    DuplicateReadFilter
    UnmappedReadFilter
    MappingQualityUnavailableFilter
    HCMappingQualityFilter
    MalformedReadFilter
    BadCigarFilter

    **This also doesn't explain why Depth is inconsistent between adjacent invariant and SNP sites if one read is filtered. Please see DP values in the gVCF **

  • Geraldine_VdAuweraGeraldine_VdAuwera admin Cambridge, MAMember, Administrator, Broadie admin
    edited August 2015

    There are a few minor differences in how DP is calculated at variant vs nonvariant sites. And there are some reasons why individual reads will occasionally be discounted within HaplotypeCaller. Ultimately it is not always possible to have perfect accounting for every single read without delving deep into the HC's internal operations. This is typically not a problem when you gave sufficient coverage to make accurate calls. In this case you simply do not have enough coverage in any case, and I'm afraid we cannot devote any further resources to examining a call that would only be very uncertain at best given such low coverage.

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    @Geraldine_VdAuwera said:
    There are a few minor differences in how DP is calculated at variant vs nonvariant sites. And there are some reasons why individual reads will occasionally be discounted within HaplotypeCaller. Ultimately it is not always possible to have perfect accounting for every single read without delving deep into the HC's internal operations. This is typically not a problem when you gave sufficient coverage to make accurate calls. In this case you simply do not have enough coverage in any case, and I'm afraid we cannot devote any further resources to examining a call that would only be very uncertain at best given such low coverage.

    Dear Geraldine, thank you for the explanation.
    It's not uncommon for resequencing projects to use low coverage data (e.g. 1000 human genome, 3x-10x) in exchange for more samples, and depending on the type of analyses and methods involved, errors in the genotype calling can be tolerated as long as they do not introduce bias. Sometimes the benefit of increasing sample size outweights the disadvantage of errors in the call.
    On the other hand, please understand that I am not concerned about this particular call, but using this simple case to suggest inconsistencies in the behavior of GATK's HaplotypeCaller in the DP calculation. What I'm worried about is such behavior would also carry over to regions of higher coverage, without a well explained reason.
    I understand that resources are limited on your part to further investigate, but if you have more information on the HC operations I would be happy to take a look.

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    Dear Sheila, thank you for your help, I will try this again tomorrow when I'm back in the office.
    Best Regards,
    Ray

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember
    edited August 2015

    Dear Sheila,
    Unfortunately adding these two parameters still gave the same DP calculations in the vcf. I can also post the bamout if you want.
    Here is my complete command line:
    java -Xmx2g -jar ${GATKPATHNEW}/GenomeAnalysisTK.jar -T HaplotypeCaller -minPruning 1 -minDanglingBranchLength 1 -dontUseSoftClippedBases --bamOutput ${SAMPLENAME}.haplotypecaller.debug2.bam -R $REF_GENOME -I ${SAMPLENAME}.realigned.sorted.bam --emitRefConfidence BP_RESOLUTION --variant_index_type LINEAR --variant_index_parameter 128000 -o ${SAMPLENAME}.snps.raw2.vcf --min_mapping_quality_score 20

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    @Raycui
    Hi Ray,

    Yes, can you post the bamout? Please also color the reads by sample. It looks like in your original post, there is only 1 read. The other 4 are artificial haplotypes constructed by Haplotype Caller.

    -Sheila

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    @Sheila said:
    Raycui
    Hi Ray,

    Yes, can you post the bamout? Please also color the reads by sample. It looks like in your original post, there is only 1 read. The other 4 are artificial haplotypes constructed by Haplotype Caller.

    -Sheila

    Hi Sheila, now I notice that due to some reason HaplotypeCaller is not writing to the bamout any more. It creates the bam and the bai files, appending to the header, but doesn't write anything to them although there's vcf output. Perhaps bamout doesn't work with these two parameters?

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    @Raycui
    No.The bamout should work with those parameters.

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    @Sheila said:
    Raycui
    No.The bamout should work with those parameters.

    Em... now even if I remove the two parameters the bamout doesn't work any more... so far only seeing the header being written out, but nothing else. Do I need to wait till the end of the run to see things in the bamout? I remember that last time I ran it I saw output immediately... This is very strange.

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    @Raycui
    Hi Ray,

    Yes, you should wait until the end of the run.

    -Sheila

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    @Sheila said:
    Raycui
    Hi Ray,

    Yes, you should wait until the end of the run.

    -Sheila

    Hi Sheila, attached is the new bamout (first track), the old bamout without the two added parameters (second track) and the original bam. So it does seem to have only 1 read in each case?

  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin

    @Raycui
    Hi Ray,

    Yes, the bamout only contains one read in the region. The other "reads" are the artificial haplotypes constructed by Haplotype Caller. I should have realized earlier what was going on! Sorry for dragging you through all this.

    -Sheila

  • RaycuiRaycui Max-Planck Institute for the Biology of AgeingMember

    @Sheila said:
    Raycui
    Hi Ray,

    Yes, the bamout only contains one read in the region. The other "reads" are the artificial haplotypes constructed by Haplotype Caller. I should have realized earlier what was going on! Sorry for dragging you through all this.

    -Sheila

    Hi Sheila, yes - that part makes sense. But still I'm not certain why one of the reads was dropped since it appears to pass all the quality filters. It's also strange about the different DP's for invariant and variant sites. Since if only one "real" read is included then intuitively all sites should have DP=1.
    For now I have decided to proceed with the samtools protocol for SNP calling after indel realignment instead of haplotype callers, since at least it seems to produce DP estimates more faithful to the input data. Thank you very much for all the help.

    Issue · Github
    by Sheila

    Issue Number
    1143
    State
    closed
    Last Updated
    Milestone
    Array
    Closed By
    chandrans
  • SheilaSheila admin Broad InstituteMember, Broadie, Moderator admin
    edited August 2015

    @Raycui
    Hi Ray,

    I just got some clarity from Laura, a developer here. I completely forgot that the GVCF contains the reference sites in blocks. The DP is usually not as accurate for reference sites, because there is only one DP for the entire block (median DP for all sites in block), which may contain many sites. The variant sites are accurate, because they contain single record for each of them in the GVCF.

    I hope this makes sense. Ultimately, it is up to you to decide whether you want exact read depths for all sites or accurate indel calls.

    -Sheila

Sign In or Register to comment.