GATK4: RMSMappingQuality results differ between v4.0.0.0 and v4.1.1.0

dbailleul France Member
Good morning everybody, and thanks in advance for your advice and help.
I searched for this problem before submitting this question. I hope it is not a duplicate.

We are working on whole-genome sequencing and SNP identification with the GATK Best Practices workflow.
We got strange results after a change of version (we usually work with GATK v4.0.0.0, but we recently switched to v4.1.1.0), so we ran a small test.

We took 40 BAM files, produced with GATK v4.0.0.0 and we followed the workflow twice to obtain VCF files, one with GATK v4.0.0.0 and the other with GATK v4.1.1.0.

Then we followed the advice in "Hard-filtering germline short variants" (id 11069 in the GATK documentation).
So we filtered out variants with MQ < 40, SOR > 3 or FS > 60, as recommended, and we saw a dramatic decrease in the number of SNPs between versions:
3.14M SNPs with v4.0 versus 615k SNPs with v4.1.
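
As a rough illustration of the hard filter described above, the following Python sketch checks the three thresholds against a record's INFO annotations. The helper names and the two INFO strings are made up for illustration; in a real workflow this step would normally be done with GATK's own filtering tools rather than custom code.

```python
def parse_info(info):
    """Turn a VCF INFO string such as 'FS=2.7;MQ=60.0' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def failed_filters(info, mq_min=40.0, sor_max=3.0, fs_max=60.0):
    """List the annotations that would fail: MQ < 40, SOR > 3, FS > 60."""
    ann = parse_info(info)
    failed = []
    if "MQ" in ann and float(ann["MQ"]) < mq_min:
        failed.append("MQ")
    if "SOR" in ann and float(ann["SOR"]) > sor_max:
        failed.append("SOR")
    if "FS" in ann and float(ann["FS"]) > fs_max:
        failed.append("FS")
    return failed


# Hypothetical INFO strings that differ only in MQ.
print(failed_filters("FS=2.7;MQ=68.6;SOR=1.6"))  # []
print(failed_filters("FS=2.7;MQ=16.6;SOR=1.6"))  # ['MQ']
```

With FS and SOR unchanged, a drop in MQ alone is enough to move a site from passing to failing, which is the pattern reported here.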

We plotted the FS, SOR and MQ values obtained with the two workflows.

FS v4.0 against v4.1 is a straight line, i.e. the FS values did not change between versions.
This is the kind of result we want.

(see the first figure, FS, top left; sorry, my account is not old enough to include links)

The SOR results are a little messier, but they are acceptable and do not explain our strange SNP filtering results.

(see the second figure, SOR, top right)

The MQ results seem to be the problem.

(see the third figure, MQ, in the middle of the page)

The MQ values are systematically lower with v4.1 than with v4.0.
You can see on the individual graphs that most of the v4.1 MQ values FAIL because they are below 40.

(see the fourth and fifth figures at the bottom of the page)

I don't know how to explain these results.

I went to GitHub and found the "Improve MQ calculation accuracy" change (#4969).
Apparently the MQ calculation was improved, and other tests were added.
But I would expect MQ values not to change between versions.

Are the MQ tests the same in v4.0 and v4.1?
Is there another test that should be used instead of MQ?
Did I miss anything obvious?
Thanks for reading!

Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @dbailleul

    Can you please post the exact commands you are using for variant calling with the two different versions?

  • dbailleul France Member

    Good evening @bhanuGandham.
    The commands are the same for each version (v4.0 and v4.1.1); only the module load line, which selects the GATK version, changes.

    So, for variant calling, we used:

    gatk --java-options "-Xmx15g" HaplotypeCaller  \
       -R ${REF} \
       -I ${IN} \
       -O ${OUT}/${SAMPLE}.g.vcf.gz \
       -ERC GVCF \
       -ploidy ${PLOIDY_N}
    

    Then we worked per chromosome:

    while getopts ":c:" opt; do
      case $opt in
        c) CHR=${OPTARG};;
      esac
    done
    
    gatk --java-options "-Xmx100g" CombineGVCFs \
            -R ${REF} \
            -L ${CHR} \
            --variant ${PATHTOGVCF}/SAMPLE01.g.vcf.gz \
            --variant ${PATHTOGVCF}/SAMPLE02.g.vcf.gz \
            [...]
            --variant ${PATHTOGVCF}/SAMPLE40.g.vcf.gz \
            -O ${OUT}/MetaGenotypes${CHR}.g.vcf.gz
    

    Then we genotyped:

    while getopts ":c:" opt; do
      case $opt in
        c) CHR=${OPTARG};;
      esac
    done
    
    gatk --java-options "-Xmx80g" GenotypeGVCFs  \
        -R ${REF} \
        -V ${PATHTOCHROMO}/MetaGenotypes${CHR}.g.vcf.gz \
        --use-new-qual-calculator \
        -O ${OUT}/MetaGenotypesCalled${CHR}.vcf.gz
    

    And at the end, we concatenated into one vcf:

    bcftools concat -f vcfs.list -o MetaGenotypesCalledDiplo.vcf.gz -O z
    tabix MetaGenotypesCalledDiplo.vcf.gz
    

    Thanks in advance for your help!

  • emeryj Member, Broadie

    Hello @dbailleul

    To give you some background on the MQ changes you pointed to: we were previously systematically over-counting RMSMappingQuality because reference depth was not being accounted for correctly. This would lead to some sites with an MQ of 120+, which was meaningless. We have changed the calculation to keep the correct depth information based on the reads that were actually tallied. We only expect this to make a difference in a relatively small number of cases.

    I have a few questions about your run. Is this human data? What value are you using for ploidy? Do your input bams have a significant quantity of unusual mapping quality reads like MQ 255 or MQ 0 reads? How is your bam aligned?

    Would it be possible to get a bam snippet for one of the sites where the mapping quality dropped significantly?

  • dbailleul France Member

    Good afternoon @emeryj, and thanks for taking the time to help us.

    @emeryj said:
    Is this human data? What value are you using for ploidy? Do your input bams have a significant quantity of unusual mapping quality reads like MQ 255 or MQ 0 reads? How is your bam aligned?

    Our data are from bees.

    We are using PLOIDY = 2.

    I've checked the 40 individuals.
    The large majority of MQ values in our BAM files are equal to 60, with few 0 values.
    I've produced some plots, and they all have the same shape.
    I've only attached one plot here.
    It shows the distribution of MQ values for one chromosome (16) of one individual (number 24).

    Our BAM files were aligned with this command:

    bwa mem -M -R @RG(...)| samtools view -bh -o ${SAMPLE}_aligned.bam -

    @emeryj said:
    Would it be possible to get a bam snippet for one of the sites where the mapping quality dropped significantly?

    Of course!
    I've uploaded GATK_v_change_MQ.zip to ftp.broadinstitute.org.
    The SNP in question is on chromosome 16, position 6210534.
    With GATK v4.0, its MQ value is 68.61.
    With GATK v4.1.1, its MQ value becomes 26.12.
    I've extracted a 1000 bp window around this SNP.

  • emeryj Member, Broadie

    Thank you for answering our questions. We will try rerunning the data you provided to figure out what happened. In the meantime, it may be helpful to see the line in the combined GVCF for that site, as that might clue us in on where to look.

  • emeryj Member, Broadie

    Also, what is the correct reference to use for these files?

  • dbailleul France Member

    @emeryj said:
    Also, what is the correct reference to use for these files?

    The reference is Apis mellifera HAv3: https://www.ncbi.nlm.nih.gov/assembly/GCA_003254395.1/

    I'm still working on your first question!
    Thanks again.

  • dbailleul France Member

    @emeryj said:
    in the meantime however it may be helpful to see the line in the combined gvcf for that site as that might clue us in to where to look.

    I hope this is what you asked for:
    For GATK v4.0.0
    CM009936.1 6210534 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL ./.:9:27:9:0,27,299 ./.:8:21:7:0,21,235 ./.:6:15:5:0,15,162 ./.:10:24:9:0,24,360 ./.:7:21:7:0,21,229 ./.:8:21:7:0,21,240 ./.:5:15:5:0,15,142 ./.:19:51:19:0,51,765 ./.:7:21:7:0,21,221 ./.:6:15:5:0,15,170 ./.:10:27:10:0,27,405 ./.:10:21:9:0,21,315 ./.:9:21:8:0,21,315 ./.:8:21:7:0,21,233 ./.:4:9:3:0,9,103 ./.:7:18:6:0,18,193 ./.:4:12:4:0,12,132 ./.:6:18:6:0,18,204 ./.:1:3:1:0,3,31 ./.:9:21:8:0,21,315 ./.:6:18:6:0,18,194 ./.:12:33:12:0,33,495 ./.:19:51:17:0,51,536 ./.:9:27:9:0,27,283 ./.:14:39:13:0,39,408 ./.:11:30:10:0,30,315 ./.:5:15:5:0,15,163 ./.:7:21:7:0,21,224 ./.:9:21:8:0,21,315 ./.:10:30:10:0,30,332 ./.:10:30:10:0,30,330 ./.:11:30:10:0,30,313 ./.:5:12:4:0,12,138 ./.:13:36:12:0,36,373 ./.:12:36:12:0,36,368 ./.:8:18:6:0,18,212 ./.:1:3:1:0,3,32 ./.:7:21:7:0,21,232 ./.:10:30:10:0,30,315 ./.:8:21:8:0,21,315

    For GATK v4.1.1
    CM009936.1 6210534 . C <NON_REF> . . . GT:DP:GQ:MIN_DP:PL ./.:9:27:9:0,27,299 ./.:8:21:7:0,21,235 ./.:6:15:5:0,15,162 ./.:10:24:9:0,24,360 ./.:7:21:7:0,21,229 ./.:8:21:7:0,21,240 ./.:5:15:5:0,15,142 ./.:19:51:19:0,51,765 ./.:7:21:7:0,21,221 ./.:6:15:5:0,15,170 ./.:10:27:10:0,27,405 ./.:10:21:9:0,21,315 ./.:9:21:8:0,21,315 ./.:8:21:7:0,21,233 ./.:4:9:3:0,9,103 ./.:7:18:6:0,18,193 ./.:4:12:4:0,12,132 ./.:6:18:6:0,18,204 ./.:1:3:1:0,3,31 ./.:9:21:8:0,21,315 ./.:6:18:6:0,18,194 ./.:12:33:12:0,33,495 ./.:19:51:17:0,51,536 ./.:9:27:9:0,27,283 ./.:14:39:13:0,39,408 ./.:11:30:10:0,30,315 ./.:5:15:5:0,15,163 ./.:7:21:7:0,21,224 ./.:9:21:8:0,21,315 ./.:10:30:10:0,30,332 ./.:10:30:10:0,30,330 ./.:11:30:10:0,30,313 ./.:5:12:4:0,12,138 ./.:13:36:12:0,36,373 ./.:12:36:12:0,36,368 ./.:8:18:6:0,18,212 ./.:1:3:1:0,3,32 ./.:7:21:7:0,21,232 ./.:10:30:10:0,30,315 ./.:8:21:8:0,21,315

    Do not hesitate to tell me if this information is not what you wanted!

  • emeryj Member, Broadie
    edited July 10

    @dbailleul Sorry about the delay. Two questions. First, are you sure that is the right data above? The two records are exactly identical, which isn't what I expect; are you sure you copied the right things over? Second, why don't I see any INFO field annotations in your data? Did you disable annotations at some stage of this computation? When I run on the samples you sent us, I see annotations and some different depth values.

    It's also worth noting that the records you posted don't seem to correspond to the data you sent. The samples appear to be snippets of data on CM009946.1, as opposed to the CM009936.1 in the records you posted. Nevertheless, there do appear to be differences when I run CombineGVCFs on the samples at the same (chromosome-shifted) site after running HaplotypeCaller according to your script:
    (4.0.0.0)
    CM009946.1 6210534 . A T,*,<NON_REF> . . BaseQRankSum=0.00;ClippingRankSum=0.00;DP=238;ExcessHet=3.01;MQRankSum=0.00;RAW_MQ=122400.00;ReadPosRankSum=-3.190e-01 GT:AD:DP:GQ:MIN_DP:PGT:PID:PL:SB ./.:.:7:0:7:.:.:0,0,71,0,71,71,0,71,71,71 ./.:.:15:0:15:.:.:0,0,238,0,238,238,0,238,238,238 ./.:.:8:0:8:.:.:0,0,13,0,13,13,0,13,13,13 ./.:.:3:6:2:.:.:0,6,59,6,59,59,6,59,59,59 ./.:.:9:0:9:.:.:0,0,225,0,225,225,0,225,225,225 ./.:1,0,0,0:1:3:.:0|1:6210516_C_CT:0,3,34,3,34,34,3,34,34,34:1,0,0,0 ./.:.:10:0:10:.:.:0,0,191,0,191,191,0,191,191,191 ./.:3,1,0,0:4:18:.:.:.:18,0,110,27,113,139,27,113,139,139:3,0,1,0 ./.:7,0,0,0:7:24:.:0|1:6210534_A_T:0,24,305,24,305,305,24,305,305,305:4,3,0,0 ./.:.:12:0:12:.:.:0,0,52,0,52,52,0,52,52,52 ./.:.:7:0:7:.:.:0,0,165,0,165,165,0,165,165,165 ./.:.:9:0:9:.:.:0,0,244,0,244,244,0,244,244,244 ./.:.:4:0:2:.:.:0,0,0,0,0,0,0,0,0,0 ./.:3,1,0,0:4:19:.:0|1:6210516_C_CT:19,0,71,28,74,102,28,74,102,102:3,0,1,0 ./.:.:3:6:2:.:.:0,6,64,6,64,64,6,64,64,64 ./.:.:4:0:4:.:.:0,0,63,0,63,63,0,63,63,63 ./.:.:4:0:4:.:.:0,0,59,0,59,59,0,59,59,59 ./.:.:5:0:5:.:.:0,0,100,0,100,100,0,100,100,100 ./.:.:1:0:1:.:.:0,0,0,0,0,0,0,0,0,0 ./.:.:10:0:10:.:.:0,0,117,0,117,117,0,117,117,117 ./.:3,1,0,0:4:20:.:.:.:20,0,105,29,108,136,29,108,136,136:2,1,1,0 ./.:2,0,0,0:2:6:.:0|1:6210534_A_T:0,6,119,6,119,119,6,119,119,119:0,2,0,0 ./.:1,0,0,0:1:6:.:0|1:6210534_A_T:0,6,173,6,173,173,6,173,173,173:1,0,0,0 ./.:.:10:27:9:.:.:0,27,271,27,271,271,27,271,271,271 ./.:.:2:0:2:.:.:0,0,0,0,0,0,0,0,0,0 ./.:.:12:0:12:.:.:0,0,2,0,2,2,0,2,2,2 ./.:.:6:0:6:.:.:0,0,0,0,0,0,0,0,0,0 ./.:1,0,0,0:1:3:.:0|1:6210516_C_CT:0,3,93,3,93,93,3,93,93,93:1,0,0,0 ./.:.:12:0:12:.:.:0,0,247,0,247,247,0,247,247,247 ./.:2,0,0,0:2:6:.:0|1:6210516_C_CT:0,6,64,6,64,64,6,64,64,64:2,0,0,0 ./.:.:7:0:7:.:.:0,0,152,0,152,152,0,152,152,152 ./.:6,0,0,0:6:18:.:.:.:0,18,266,18,266,266,18,266,266,266:3,3,0,0 ./.:.:11:0:11:.:.:0,0,149,0,149,149,0,149,149,149 ./.:.:16:0:16:.:.:0,0,209,0,209,209,0,209,209,209 
./.:1,0,0,0:1:3:.:0|1:6210534_A_T:0,3,47,3,47,47,3,47,47,47:1,0,0,0 ./.:.:10:24:10:.:.:0,24,360,24,360,360,24,360,360,360 ./.:.:3:0:3:.:.:0,0,33,0,33,33,0,33,33,33 ./.:.:6:0:6:.:.:0,0,71,0,71,71,0,71,71,71 ./.:1,0,0,0:1:3:.:1|0:6210516_C_CT:0,3,45,3,45,45,3,45,45,45:0,1,0,0 ./.:.:2:0:2:.:.:0,0,0,0,0,0,0,0,0,0

    (4.1.1.0)
    CM009946.1 6210534 . A T,*,<NON_REF> . . BaseQRankSum=0.00;DP=210;ExcessHet=3.01;MQRankSum=0.00;RAW_MQandDP=122400,34;ReadPosRankSum=-3.190e-01 GT:AD:DP:GQ:MIN_DP:PGT:PID:PL:PS:SB ./.:.:4:9:4:.:.:0,9,135,9,135,135,9,135,135,135 ./.:.:4:0:4:.:.:0,0,69,0,69,69,0,69,69,69 ./.:.:8:0:8:.:.:0,0,13,0,13,13,0,13,13,13 ./.:.:3:6:2:.:.:0,6,59,6,59,59,6,59,59,59 ./.:.:9:0:9:.:.:0,0,225,0,225,225,0,225,225,225 .|.:1,0,0,0:1:3:.:0|1:6210516_C_CT:0,3,34,3,34,34,3,34,34,34:6210516:1,0,0,0 ./.:.:10:0:10:.:.:0,0,191,0,191,191,0,191,191,191 ./.:3,1,0,0:4:18:.:.:.:18,0,110,27,113,139,27,113,139,139:.:3,0,1,0 .|.:7,0,0,0:7:24:.:0|1:6210534_A_T:0,24,305,24,305,305,24,305,305,305:6210534:4,3,0,0 ./.:.:12:0:12:.:.:0,0,52,0,52,52,0,52,52,52 ./.:.:7:0:7:.:.:0,0,165,0,165,165,0,165,165,165 ./.:.:9:0:9:.:.:0,0,244,0,244,244,0,244,244,244 ./.:.:1:3:1:.:.:0,3,31,3,31,31,3,31,31,31 .|.:3,1,0,0:4:19:.:0|1:6210516_C_CT:19,0,71,28,74,102,28,74,102,102:6210516:3,0,1,0 ./.:.:3:6:2:.:.:0,6,64,6,64,64,6,64,64,64 ./.:.:4:0:4:.:.:0,0,63,0,63,63,0,63,63,63 ./.:.:4:0:4:.:.:0,0,59,0,59,59,0,59,59,59 ./.:.:5:0:5:.:.:0,0,100,0,100,100,0,100,100,100 ./.:.:1:0:1:.:.:0,0,0,0,0,0,0,0,0,0 ./.:.:10:0:10:.:.:0,0,117,0,117,117,0,117,117,117 ./.:3,1,0,0:4:20:.:.:.:20,0,105,29,108,136,29,108,136,136:.:2,1,1,0 .|.:2,0,0,0:2:6:.:0|1:6210534_A_T:0,6,119,6,119,119,6,119,119,119:6210534:0,2,0,0 .|.:1,0,0,0:1:6:.:0|1:6210534_A_T:0,6,173,6,173,173,6,173,173,173:6210534:1,0,0,0 ./.:.:10:27:9:.:.:0,27,271,27,271,271,27,271,271,271 ./.:.:2:0:2:.:.:0,0,0,0,0,0,0,0,0,0 ./.:.:12:0:12:.:.:0,0,2,0,2,2,0,2,2,2 ./.:.:6:0:6:.:.:0,0,0,0,0,0,0,0,0,0 .|.:1,0,0,0:1:3:.:0|1:6210516_C_CT:0,3,93,3,93,93,3,93,93,93:6210516:1,0,0,0 ./.:.:12:0:12:.:.:0,0,247,0,247,247,0,247,247,247 .|.:2,0,0,0:2:6:.:0|1:6210516_C_CT:0,6,64,6,64,64,6,64,64,64:6210516:2,0,0,0 ./.:.:7:0:7:.:.:0,0,152,0,152,152,0,152,152,152 ./.:6,0,0,0:6:18:.:.:.:0,18,266,18,266,266,18,266,266,266:.:3,3,0,0 ./.:.:11:0:11:.:.:0,0,149,0,149,149,0,149,149,149 
./.:.:5:0:5:.:.:0,0,94,0,94,94,0,94,94,94 .|.:1,0,0,0:1:3:.:0|1:6210534_A_T:0,3,47,3,47,47,3,47,47,47:6210534:1,0,0,0 ./.:.:10:24:10:.:.:0,24,360,24,360,360,24,360,360,360 ./.:.:3:0:3:.:.:0,0,33,0,33,33,0,33,33,33 ./.:.:6:0:6:.:.:0,0,71,0,71,71,0,71,71,71 .|.:1,0,0,0:1:3:.:1|0:6210516_C_CT:0,3,45,3,45,45,3,45,45,45:6210516:0,1,0,0 ./.:.:1:0:0:.:.:0,0,0,0,0,0,0,0,0,0

    On investigation, the differences seem to be largely related to slightly different reference confidence blocking (causing the DP count to change somewhat). This is expected and generally shouldn't affect the new MQ calculation. Upon genotyping the same site, I found that the newer version of GATK gave MQ=60 (it was 68.61 with 4.0.0.0) at that site:
    (4.0.0.0)
    CM009946.1 6210534 . A T 52.04 . AC=3;AF=0.043;AN=70;BaseQRankSum=0.00;ClippingRankSum=0.00;DP=238;ExcessHet=6.9029;FS=2.722;InbreedingCoeff=-0.2335;MLEAC=7;MLEAF=0.100;MQ=68.61;MQRankSum=0.00;QD=4.34;ReadPosRankSum=-3.190e-01;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 0/0:7,0:7:0:.:.:0,0,71 0/0:15,0:15:0:.:.:0,0,238 0/0:8,0:8:0:.:.:0,0,13 0/0:2,0:2:6:.:.:0,6,59 0/0:9,0:9:0:.:.:0,0,225 0/0:1,0:1:3:0|1:6210516_C_CT:0,3,34 0/0:10,0:10:0:.:.:0,0,191 0/1:3,1:4:18:.:.:18,0,110 0/0:7,0:7:24:0|1:6210534_A_T:0,24,305 0/0:12,0:12:0:.:.:0,0,52 0/0:7,0:7:0:.:.:0,0,165 0/0:9,0:9:0:.:.:0,0,244 ./.:2,0:2:.:.:.:0,0,0 0/1:3,1:4:19:0|1:6210516_C_CT:19,0,71 0/0:2,0:2:6:.:.:0,6,64 0/0:4,0:4:0:.:.:0,0,63 0/0:4,0:4:0:.:.:0,0,59 0/0:5,0:5:0:.:.:0,0,100 ./.:1,0:1:.:.:.:0,0,0 0/0:10,0:10:0:.:.:0,0,117 0/1:3,1:4:20:.:.:20,0,105 0/0:2,0:2:6:0|1:6210534_A_T:0,6,119 0/0:1,0:1:6:0|1:6210534_A_T:0,6,173 0/0:9,0:9:27:.:.:0,27,271 ./.:2,0:2:.:.:.:0,0,0 0/0:12,0:12:0:.:.:0,0,2 ./.:6,0:6:.:.:.:0,0,0 0/0:1,0:1:3:0|1:6210516_C_CT:0,3,93 0/0:12,0:12:0:.:.:0,0,247 0/0:2,0:2:6:0|1:6210516_C_CT:0,6,64 0/0:7,0:7:0:.:.:0,0,152 0/0:6,0:6:18:.:.:0,18,266 0/0:11,0:11:0:.:.:0,0,149 0/0:16,0:16:0:.:.:0,0,209 0/0:1,0:1:3:0|1:6210534_A_T:0,3,47 0/0:10,0:10:24:.:.:0,24,360 0/0:3,0:3:0:.:.:0,0,33 0/0:6,0:6:0:.:.:0,0,71 0/0:1,0:1:3:1|0:6210516_C_CT:0,3,45 ./.:2,0:2:.:.:.:0,0,0

    (4.1.1.0)
    CM009946.1 6210534 . A T 48.89 . AC=3;AF=0.042;AN=72;BaseQRankSum=0.00;DP=210;ExcessHet=5.5273;FS=2.722;InbreedingCoeff=-0.2226;MLEAC=6;MLEAF=0.083;MQ=60.00;MQRankSum=0.00;QD=4.07;ReadPosRankSum=-3.190e-01;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL:PS 0/0:4,0:4:9:.:.:0,9,135 0/0:4,0:4:0:.:.:0,0,69 0/0:8,0:8:0:.:.:0,0,13 0/0:2,0:2:6:.:.:0,6,59 0/0:9,0:9:0:.:.:0,0,225 0|0:1,0:1:3:0|1:6210516_C_CT:0,3,34:6210516 0/0:10,0:10:0:.:.:0,0,191 0/1:3,1:4:18:.:.:18,0,110 0|0:7,0:7:24:0|1:6210534_A_T:0,24,305:6210534 0/0:12,0:12:0:.:.:0,0,52 0/0:7,0:7:0:.:.:0,0,165 0/0:9,0:9:0:.:.:0,0,244 0/0:1,0:1:3:.:.:0,3,31 0|1:3,1:4:19:0|1:6210516_C_CT:19,0,71:6210516 0/0:2,0:2:6:.:.:0,6,64 0/0:4,0:4:0:.:.:0,0,63 0/0:4,0:4:0:.:.:0,0,59 0/0:5,0:5:0:.:.:0,0,100 ./.:1,0:1:.:.:.:0,0,0 0/0:10,0:10:0:.:.:0,0,117 0/1:3,1:4:20:.:.:20,0,105 0|0:2,0:2:6:0|1:6210534_A_T:0,6,119:6210534 0|0:1,0:1:6:0|1:6210534_A_T:0,6,173:6210534 0/0:9,0:9:27:.:.:0,27,271 ./.:2,0:2:.:.:.:0,0,0 0/0:12,0:12:0:.:.:0,0,2 ./.:6,0:6:.:.:.:0,0,0 0|0:1,0:1:3:0|1:6210516_C_CT:0,3,93:6210516 0/0:12,0:12:0:.:.:0,0,247 0|0:2,0:2:6:0|1:6210516_C_CT:0,6,64:6210516 0/0:7,0:7:0:.:.:0,0,152 0/0:6,0:6:18:.:.:0,18,266 0/0:11,0:11:0:.:.:0,0,149 0/0:5,0:5:0:.:.:0,0,94 0|0:1,0:1:3:0|1:6210534_A_T:0,3,47:6210534 0/0:10,0:10:24:.:.:0,24,360 0/0:3,0:3:0:.:.:0,0,33 0/0:6,0:6:0:.:.:0,0,71 0|0:1,0:1:3:1|0:6210516_C_CT:0,3,45:6210516 ./.:0,0:0:.:.:.:0,0,0

    Am I looking at the correct site? This result looks more correct now, and the MQ didn't dip below 60. Indeed, all of the variant sites from the 4.1.1.0 VCF have MQ=60 after my run. Are you sure that you were using the correct pairing of GATK versions through every stage of both trials, that is, that you used GATK 4.1.1.0 with HaplotypeCaller, CombineGVCFs, and GenotypeGVCFs in sequence? Could I ask you to try running with the latest master of GATK? I was running off a freshly built jar of GATK in both cases. Were you using the 4.1.1.0 docker image when you ran your trials?

    Here are the commands I ran in case there is some obvious difference you see:
    ls . | grep .bam | xargs -I test ../gatk_clean/gatk HaplotypeCaller -R GCA_003254395.1_Amel_HAv3_genomic.fa -I test -O 4.1.1.0/test.g.vcf -ERC GVCF -ploidy 2 -L CM009946.1:6209905-6211008
    ../gatk_clean/gatk CombineGVCFs -R GCA_003254395.1_Amel_HAv3_genomic.fa $(echo $samples41) -O 4.1.1.0/combined.g.vcf -L CM009946.1:6209905-6211008
    ../gatk_clean/gatk GenotypeGVCFs -R GCA_003254395.1_Amel_HAv3_genomic.fa --use-new-qual-calculator -V 4.1.1.0/combined.g.vcf -O 4.1.1.0/genotyped.g.vcf
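
To see at a glance which annotations differ between the two genotyped records above, the INFO columns can be compared key by key. This is only a sketch: parse_info and diff_info are hypothetical helpers, and the two strings are the INFO columns copied from the 4.0.0.0 and 4.1.1.0 records in this post.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict mapping key -> value (as strings)."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def diff_info(old, new):
    """Return {key: (old_value, new_value)} for every annotation that differs."""
    a, b = parse_info(old), parse_info(new)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


info_4000 = (
    "AC=3;AF=0.043;AN=70;BaseQRankSum=0.00;ClippingRankSum=0.00;DP=238;"
    "ExcessHet=6.9029;FS=2.722;InbreedingCoeff=-0.2335;MLEAC=7;MLEAF=0.100;"
    "MQ=68.61;MQRankSum=0.00;QD=4.34;ReadPosRankSum=-3.190e-01;SOR=1.609"
)
info_4110 = (
    "AC=3;AF=0.042;AN=72;BaseQRankSum=0.00;DP=210;ExcessHet=5.5273;FS=2.722;"
    "InbreedingCoeff=-0.2226;MLEAC=6;MLEAF=0.083;MQ=60.00;MQRankSum=0.00;"
    "QD=4.07;ReadPosRankSum=-3.190e-01;SOR=1.609"
)

for key, (old, new) in diff_info(info_4000, info_4110).items():
    print(key, old, new)
```

FS and SOR are identical in both records and so are not listed, while MQ is reported as 68.61 against 60.00 and DP as 238 against 210.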

  • dbailleul France Member

    I am very sorry for the mistake.
    I was aiming at chromosome CM009946.1, the same as the data I sent.

    First of all, here are the "correct" GVCF results for this SNP with GATK v4.0.0.0:

    CM009946.1 6210534 . A T,*,<NON_REF> . . BaseQRankSum=0;ClippingRankSum=0;DP=238;ExcessHet=3.01;MQRankSum=0;RAW_MQ=122400;ReadPosRankSum=-0.319 GT:AD:DP:GQ:MIN_DP:PGT:PID:PL:SB ./.:.:7:0:7:.:.:0,0,71,0,71,71,0,71,71,71:. ./.:.:15:0:15:.:.:0,0,238,0,238,238,0,238,238,238:. ./.:.:8:0:8:.:.:0,0,13,0,13,13,0,13,13,13:. ./.:.:3:6:2:.:.:0,6,59,6,59,59,6,59,59,59:. ./.:.:9:0:9:.:.:0,0,225,0,225,225,0,225,225,225:. ./.:1,0,0,0:1:3:.:0|1:6210516_C_CT:0,3,34,3,34,34,3,34,34,34:1,0,0,0 ./.:.:10:0:10:.:.:0,0,191,0,191,191,0,191,191,191:. ./.:3,1,0,0:4:18:.:.:.:18,0,110,27,113,139,27,113,139,139:3,0,1,0 ./.:7,0,0,0:7:24:.:0|1:6210534_A_T:0,24,305,24,305,305,24,305,305,305:4,3,0,0 ./.:.:12:0:12:.:.:0,0,52,0,52,52,0,52,52,52:. ./.:.:7:0:7:.:.:0,0,165,0,165,165,0,165,165,165:. ./.:.:9:0:9:.:.:0,0,244,0,244,244,0,244,244,244:. ./.:.:4:0:2:.:.:0,0,0,0,0,0,0,0,0,0:. ./.:3,1,0,0:4:19:.:0|1:6210516_C_CT:19,0,71,28,74,102,28,74,102,102:3,0,1,0 ./.:.:3:6:2:.:.:0,6,64,6,64,64,6,64,64,64:. ./.:.:4:0:4:.:.:0,0,63,0,63,63,0,63,63,63:. ./.:.:4:0:4:.:.:0,0,59,0,59,59,0,59,59,59:. ./.:.:5:0:5:.:.:0,0,100,0,100,100,0,100,100,100:. ./.:.:1:0:1:.:.:0,0,0,0,0,0,0,0,0,0:. ./.:.:10:0:10:.:.:0,0,117,0,117,117,0,117,117,117:. ./.:3,1,0,0:4:20:.:.:.:20,0,105,29,108,136,29,108,136,136:2,1,1,0 ./.:2,0,0,0:2:6:.:0|1:6210534_A_T:0,6,119,6,119,119,6,119,119,119:0,2,0,0 ./.:1,0,0,0:1:6:.:0|1:6210534_A_T:0,6,173,6,173,173,6,173,173,173:1,0,0,0 ./.:.:10:27:9:.:.:0,27,271,27,271,271,27,271,271,271:. ./.:.:2:0:2:.:.:0,0,0,0,0,0,0,0,0,0:. ./.:.:12:0:12:.:.:0,0,2,0,2,2,0,2,2,2:. ./.:.:6:0:6:.:.:0,0,0,0,0,0,0,0,0,0:. ./.:1,0,0,0:1:3:.:0|1:6210516_C_CT:0,3,93,3,93,93,3,93,93,93:1,0,0,0 ./.:.:12:0:12:.:.:0,0,247,0,247,247,0,247,247,247:. ./.:2,0,0,0:2:6:.:0|1:6210516_C_CT:0,6,64,6,64,64,6,64,64,64:2,0,0,0 ./.:.:7:0:7:.:.:0,0,152,0,152,152,0,152,152,152:. ./.:6,0,0,0:6:18:.:.:.:0,18,266,18,266,266,18,266,266,266:3,3,0,0 ./.:.:11:0:11:.:.:0,0,149,0,149,149,0,149,149,149:. 
./.:.:16:0:16:.:.:0,0,209,0,209,209,0,209,209,209:. ./.:1,0,0,0:1:3:.:0|1:6210534_A_T:0,3,47,3,47,47,3,47,47,47:1,0,0,0 ./.:.:10:24:10:.:.:0,24,360,24,360,360,24,360,360,360:. ./.:.:3:0:3:.:.:0,0,33,0,33,33,0,33,33,33:. ./.:.:6:0:6:.:.:0,0,71,0,71,71,0,71,71,71:. ./.:1,0,0,0:1:3:.:1|0:6210516_C_CT:0,3,45,3,45,45,3,45,45,45:0,1,0,0 ./.:.:2:0:2:.:.:0,0,0,0,0,0,0,0,0,0:.

    and the results for GATK v4.1.1.0:

    CM009946.1 6210534 . A T,*,<NON_REF> . . BaseQRankSum=0;ClippingRankSum=0;DP=238;ExcessHet=3.01;MQRankSum=0;RAW_MQ=7200;ReadPosRankSum=-0.319 GT:AD:DP:GQ:MIN_DP:PGT:PID:PL:SB ./.:.:7:0:7:.:.:0,0,71,0,71,71,0,71,71,71:. ./.:.:15:0:15:.:.:0,0,238,0,238,238,0,238,238,238:. ./.:.:8:0:8:.:.:0,0,13,0,13,13,0,13,13,13:. ./.:.:3:6:2:.:.:0,6,59,6,59,59,6,59,59,59:. ./.:.:9:0:9:.:.:0,0,225,0,225,225,0,225,225,225:. ./.:1,0,0,0:1:3:.:0|1:6210516_C_CT:0,3,34,3,34,34,3,34,34,34:1,0,0,0 ./.:.:10:0:10:.:.:0,0,191,0,191,191,0,191,191,191:. ./.:3,1,0,0:4:18:.:.:.:18,0,110,27,113,139,27,113,139,139:3,0,1,0 ./.:7,0,0,0:7:24:.:0|1:6210534_A_T:0,24,305,24,305,305,24,305,305,305:4,3,0,0 ./.:.:12:0:12:.:.:0,0,52,0,52,52,0,52,52,52:. ./.:.:7:0:7:.:.:0,0,165,0,165,165,0,165,165,165:. ./.:.:9:0:9:.:.:0,0,244,0,244,244,0,244,244,244:. ./.:.:4:0:2:.:.:0,0,0,0,0,0,0,0,0,0:. ./.:3,1,0,0:4:19:.:0|1:6210516_C_CT:19,0,71,28,74,102,28,74,102,102:3,0,1,0 ./.:.:3:6:2:.:.:0,6,64,6,64,64,6,64,64,64:. ./.:.:4:0:4:.:.:0,0,63,0,63,63,0,63,63,63:. ./.:.:4:0:4:.:.:0,0,59,0,59,59,0,59,59,59:. ./.:.:5:0:5:.:.:0,0,100,0,100,100,0,100,100,100:. ./.:.:1:0:1:.:.:0,0,0,0,0,0,0,0,0,0:. ./.:.:10:0:10:.:.:0,0,117,0,117,117,0,117,117,117:. ./.:3,1,0,0:4:20:.:.:.:20,0,105,29,108,136,29,108,136,136:2,1,1,0 ./.:2,0,0,0:2:6:.:0|1:6210534_A_T:0,6,119,6,119,119,6,119,119,119:0,2,0,0 ./.:1,0,0,0:1:6:.:0|1:6210534_A_T:0,6,173,6,173,173,6,173,173,173:1,0,0,0 ./.:.:10:27:9:.:.:0,27,271,27,271,271,27,271,271,271:. ./.:.:2:0:2:.:.:0,0,0,0,0,0,0,0,0,0:. ./.:.:12:0:12:.:.:0,0,2,0,2,2,0,2,2,2:. ./.:.:6:0:6:.:.:0,0,0,0,0,0,0,0,0,0:. ./.:1,0,0,0:1:3:.:0|1:6210516_C_CT:0,3,93,3,93,93,3,93,93,93:1,0,0,0 ./.:.:12:0:12:.:.:0,0,247,0,247,247,0,247,247,247:. ./.:2,0,0,0:2:6:.:0|1:6210516_C_CT:0,6,64,6,64,64,6,64,64,64:2,0,0,0 ./.:.:7:0:7:.:.:0,0,152,0,152,152,0,152,152,152:. ./.:6,0,0,0:6:18:.:.:.:0,18,266,18,266,266,18,266,266,266:3,3,0,0 ./.:.:11:0:11:.:.:0,0,149,0,149,149,0,149,149,149:. 
./.:.:16:0:16:.:.:0,0,209,0,209,209,0,209,209,209:. ./.:1,0,0,0:1:3:.:0|1:6210534_A_T:0,3,47,3,47,47,3,47,47,47:1,0,0,0 ./.:.:10:24:10:.:.:0,24,360,24,360,360,24,360,360,360:. ./.:.:3:0:3:.:.:0,0,33,0,33,33,0,33,33,33:. ./.:.:6:0:6:.:.:0,0,71,0,71,71,0,71,71,71:. ./.:1,0,0,0:1:3:.:1|0:6210516_C_CT:0,3,45,3,45,45,3,45,45,45:0,1,0,0 ./.:.:2:0:2:.:.:0,0,0,0,0,0,0,0,0,0:.

  • dbailleul France Member

    Secondly, these are the results extracted from the two VCFs built with both versions.
    You can see I can't type properly: the MQ value of this SNP with GATK v4.1.1.0 is 16.64, not 26.12. I guess I was a bit tired lately.

    GATK v4.0.0.0

    CM009946.1 6210534 . A T 52.04 . AC=3;AF=0.043;AN=70;BaseQRankSum=0;ClippingRankSum=0;DP=238;ExcessHet=6.9029;FS=2.722;InbreedingCoeff=-0.2335;MLEAC=7;MLEAF=0.1;MQ=68.61;MQRankSum=0;QD=4.34;ReadPosRankSum=-0.319;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 0/0:7,0:7:0:.:.:0,0,71 0/0:15,0:15:0:.:.:0,0,238 0/0:8,0:8:0:.:.:0,0,13 0/0:2,0:2:6:.:.:0,6,59 0/0:9,0:9:0:.:.:0,0,225 0/0:1,0:1:3:0|1:6210516_C_CT:0,3,34 0/0:10,0:10:0:.:.:0,0,191 0/1:3,1:4:18:.:.:18,0,110 0/0:7,0:7:24:0|1:6210534_A_T:0,24,305 0/0:12,0:12:0:.:.:0,0,52 0/0:7,0:7:0:.:.:0,0,165 0/0:9,0:9:0:.:.:0,0,244 ./.:2,0:2:.:.:.:0,0,0 0/1:3,1:4:19:0|1:6210516_C_CT:19,0,71 0/0:2,0:2:6:.:.:0,6,64 0/0:4,0:4:0:.:.:0,0,63 0/0:4,0:4:0:.:.:0,0,59 0/0:5,0:5:0:.:.:0,0,100 ./.:1,0:1:.:.:.:0,0,0 0/0:10,0:10:0:.:.:0,0,117 0/1:3,1:4:20:.:.:20,0,105 0/0:2,0:2:6:0|1:6210534_A_T:0,6,119 0/0:1,0:1:6:0|1:6210534_A_T:0,6,173 0/0:9,0:9:27:.:.:0,27,271 ./.:2,0:2:.:.:.:0,0,0 0/0:12,0:12:0:.:.:0,0,2 ./.:6,0:6:.:.:.:0,0,0 0/0:1,0:1:3:0|1:6210516_C_CT:0,3,93 0/0:12,0:12:0:.:.:0,0,247 0/0:2,0:2:6:0|1:6210516_C_CT:0,6,64 0/0:7,0:7:0:.:.:0,0,152 0/0:6,0:6:18:.:.:0,18,266 0/0:11,0:11:0:.:.:0,0,149 0/0:16,0:16:0:.:.:0,0,209 0/0:1,0:1:3:0|1:6210534_A_T:0,3,47 0/0:10,0:10:24:.:.:0,24,360 0/0:3,0:3:0:.:.:0,0,33 0/0:6,0:6:0:.:.:0,0,71 0/0:1,0:1:3:1|0:6210516_C_CT:0,3,45 ./.:2,0:2:.:.:.:0,0,0

    GATK v4.1.1.0

    CM009946.1 6210534 . A T 51.81 . AC=3;AF=0.043;AN=70;BaseQRankSum=0;ClippingRankSum=0;DP=238;ExcessHet=6.9029;FS=2.722;InbreedingCoeff=-0.2335;MLEAC=7;MLEAF=0.1;MQ=16.64;MQRankSum=0;QD=4.32;RAW_MQ=7200;ReadPosRankSum=-0.319;SOR=1.609 GT:AD:DP:GQ:PGT:PID:PL 0/0:7,0:7:0:.:.:0,0,71 0/0:15,0:15:0:.:.:0,0,238 0/0:8,0:8:0:.:.:0,0,13 0/0:2,0:2:6:.:.:0,6,59 0/0:9,0:9:0:.:.:0,0,225 0/0:1,0:1:3:0|1:6210516_C_CT:0,3,34 0/0:10,0:10:0:.:.:0,0,191 0/1:3,1:4:18:.:.:18,0,110 0/0:7,0:7:24:0|1:6210534_A_T:0,24,305 0/0:12,0:12:0:.:.:0,0,52 0/0:7,0:7:0:.:.:0,0,165 0/0:9,0:9:0:.:.:0,0,244 ./.:2,0:2:.:.:.:0,0,0 0/1:3,1:4:19:0|1:6210516_C_CT:19,0,71 0/0:2,0:2:6:.:.:0,6,64 0/0:4,0:4:0:.:.:0,0,63 0/0:4,0:4:0:.:.:0,0,59 0/0:5,0:5:0:.:.:0,0,100 ./.:1,0:1:.:.:.:0,0,0 0/0:10,0:10:0:.:.:0,0,117 0/1:3,1:4:20:.:.:20,0,105 0/0:2,0:2:6:0|1:6210534_A_T:0,6,119 0/0:1,0:1:6:0|1:6210534_A_T:0,6,173 0/0:9,0:9:27:.:.:0,27,271 ./.:2,0:2:.:.:.:0,0,0 0/0:12,0:12:0:.:.:0,0,2 ./.:6,0:6:.:.:.:0,0,0 0/0:1,0:1:3:0|1:6210516_C_CT:0,3,93 0/0:12,0:12:0:.:.:0,0,247 0/0:2,0:2:6:0|1:6210516_C_CT:0,6,64 0/0:7,0:7:0:.:.:0,0,152 0/0:6,0:6:18:.:.:0,18,266 0/0:11,0:11:0:.:.:0,0,149 0/0:16,0:16:0:.:.:0,0,209 0/0:1,0:1:3:0|1:6210534_A_T:0,3,47 0/0:10,0:10:24:.:.:0,24,360 0/0:3,0:3:0:.:.:0,0,33 0/0:6,0:6:0:.:.:0,0,71 0/0:1,0:1:3:1|0:6210516_C_CT:0,3,45 ./.:2,0:2:.:.:.:0,0,0

  • dbailleul France Member

    @emeryj said:
    Are you sure that you were using the correct pairing of GATK versions through every stage of both trials, that is you used gatk 4.1.1.0 with haplotype caller, combineGVCFs, and GenotypeGVCFs in sequence?

    I just checked both the command lines and the log files: yes.
    GATK v4.1.1.0 was used for HaplotypeCaller, CombineGVCFs, and GenotypeGVCFs.

    @emeryj said:
    Could I ask you to try running with the latest master of gatk? I was running off of a freshly built jar of gatk in both cases, were you using the 4.1.1.0 docker image when you ran your trials?

    I am running my commands on a SLURM cluster at my organization.
    I use module load to manage my program versions.
    I will ask the cluster maintainers where their GATK 4.1.1.0 build comes from.

    Thanks a lot for your help.

  • dbailleul France Member

    The cluster maintainers just answered me: they said they used this link:
    https://github.com/broadinstitute/gatk/releases/download/4.1.1.0/gatk-4.1.1.0.zip

    Does this correspond to the latest master?

  • emeryj Member, Broadie

    I tested with both the current master and the 4.1.1.0 version you used, and in each case it appeared to work. Try rerunning with the latest version (which should be located here: https://github.com/broadinstitute/gatk/releases/download/4.1.2.0/gatk-4.1.2.0.zip). I'm fascinated as to why our executions of the code appear to be giving such different results at this site. The first thing that gives me pause is that your 4.1.1.0 combined GVCF has the annotation RAW_MQ=7200, whereas my 4.1.1.0 run has RAW_MQandDP=122400,34. The latter is the correct annotation that should be produced by the new MQ calculation (it keeps track of only the reads that were part of the MQ sum, as opposed to reads from the GT field). Did you rerun HaplotypeCaller on the BAM using 4.1.1.0 before making these calls? HaplotypeCaller 4.1.1.0 should output the RAW_MQandDP annotation by default. Were you running your tools with any arguments other than the ones you posted above? Do my commands seem consistent with yours?

    There is an easy way to check this. Can you send me the GATKCommandLine header lines for an example HC output gvcf, the combined result, and the genotyped result for your 4.1.1.0 trial? It might illuminate what version you are using exactly in your configuration as well as what argument differences might be present. The lines should look like this:

    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --sample-ploidy 2 --emit-ref-confidence GVCF --output 4.1.1.0/NCA9_CAGATC_L003_red.bam.g.vcf --intervals CM009946.1:6209905-6211008 --input NCA9_CAGATC_L003_red.bam --reference GCA_003254395.1_Amel_HAv3_genomic.fa --use-new-qual-calculator true --use-old-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --num-reference-samples-if-no-call 0 --genotyping-mode DISCOVERY --genotype-filtered-alleles false --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99
--indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --disable-optimizations false --just-determine-active-regions false --dont-genotype false --max-mnp-distance 0 --do-not-run-physical-phasing false --use-filtered-reads-for-annotations false --correct-overlapping-quality false --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --consensus false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 1.0 --max-unpruned-variants 100 --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0
--disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.1.0-SNAPSHOT",Date="July 10, 2019 10:39:00 AM EDT">

  • dbailleul France Member

    @emeryj said:
    Can you send me the GATKCommandLine header lines for an example HC output gvcf, the combined result, and the genotyped result for your 4.1.1.0 trial? It might illuminate what version you are using exactly in your configuration as well as what argument differences might be present.

    @emeryj, thank you so much!
    We've found the problem, thanks to you!

    I exported the header lines as you asked, and while I was editing the paths to make them more readable, I found the cause.
    Although I had generated GVCF files with GATK v4.1.1.0, the GVCF files I was actually using for the v4.1.1.0 trial were the ones generated with v4.0.0.0.
    It is crystal clear with the renamed paths, but it was not at all clear with the original names.

    So apparently, when you feed GVCFs generated with HaplotypeCaller v4.0.0.0 to GenotypeGVCFs v4.1.1.0, there is an incompatibility with the RAW_MQ values.

    Indeed, now that we knew what to look for, we found this warning in the GenotypeGVCFs log file:
    06:15:11.129 WARN RMSMappingQuality - MQ annotation data is not properly formatted. This GATK version expects key RAW_MQandDP with an long tuple of sum of squared MQ values and total reads over variant genotypes as the value. Attempting to use deprecated MQ calculation.
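For anyone who wants to check older runs for the same symptom, here is a minimal sketch in Python (not part of GATK) that scans log lines for this warning. The function name and the placeholder line in the example are made up for illustration. The matched substring is copied from the warning above and may be worded differently in other GATK versions.

```python
# Substring copied from the warning quoted in this thread (assumption: other
# GATK versions may phrase it differently).
DEPRECATED_MQ_WARNING = "MQ annotation data is not properly formatted"


def find_deprecated_mq_warnings(log_lines):
    """Return (line_number, text) for each line carrying the RMSMappingQuality warning."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if "RMSMappingQuality" in line and DEPRECATED_MQ_WARNING in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Example input: the first entry is a placeholder, the second is the start of
# the warning reported above.
example_log = [
    "(placeholder for any unrelated log line)",
    "06:15:11.129 WARN RMSMappingQuality - MQ annotation data is not properly formatted.",
]

for number, text in find_deprecated_mq_warnings(example_log):
    print(f"line {number}: {text}")
```

With a real log you would pass the open file handle instead of `example_log`, since iterating a text file yields its lines.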

    And in the INFO header lines:
    ##INFO=<ID=RAW_MQandDP,Number=2,Type=Integer,Description="Raw data (sum of squared MQ and total depth) for improved RMS Mapping Quality calculation. Incompatible with deprecated RAW_MQ formulation.">
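A quick way to tell the two formulations apart before combining files is to look at which INFO key the GVCF header declares. The sketch below is Python and rests on two assumptions: that older files declare the key as `RAW_MQ` (as the warning text suggests), and that MQ is the plain root mean square, i.e. sqrt(sum of squared MQ / total reads), which is what the RAW_MQandDP description implies. The function names are hypothetical, and GATK's own handling may differ in the details.

```python
import math


def mq_raw_format(header_lines):
    """Classify a (G)VCF header by the raw MQ INFO key it declares."""
    has_new = any(l.startswith("##INFO=<ID=RAW_MQandDP,") for l in header_lines)
    # Assumption: the deprecated formulation is declared under the key RAW_MQ.
    has_old = any(l.startswith("##INFO=<ID=RAW_MQ,") for l in header_lines)
    if has_new and has_old:
        return "both"
    if has_new:
        return "RAW_MQandDP"
    if has_old:
        return "RAW_MQ"
    return "none"


def rms_mq(sum_squared_mq, total_reads):
    """Root-mean-square mapping quality from the (sum of squares, read count) tuple."""
    if total_reads <= 0:
        return float("nan")  # no reads, so no meaningful MQ
    return math.sqrt(sum_squared_mq / total_reads)


# 100 reads that all have MQ 60: the sum of squares is 100 * 60**2 = 360000.
print(rms_mq(360000, 100))
```

Checking every input GVCF with `mq_raw_format` before CombineGVCFs would have shown that our v4.1.1.0 trial was fed files without the RAW_MQandDP key.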

    So we tested with GVCFs all generated by v4.1.1.0 and with GenotypeGVCFs v4.1.1.0, and indeed everything was fine, just as in your analysis results.

    Thank you again for your help.
    We would still be stuck on this problem without you.
    For the real analysis, we were mixing GVCFs from different HaplotypeCaller versions ...

    Now we are generating new GVCFs with the latest version of GATK, and I intend to generate our scripts automatically, with a version variable to avoid mixing GATK versions (and also a path variable, I promise).
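As a starting point for that kind of safeguard, here is a rough Python sketch that reads the `##GATKCommandLine` header lines of a (G)VCF and reports every tool whose recorded version differs from the one you expect. The function names are hypothetical. The regular expression assumes the `Version=...,Date=` layout seen in the headers in this thread, where the version is quoted in v4.1.1.0 output and unquoted in v4.0.0.0 output.

```python
import gzip
import re
from collections import defaultdict

# Captures the tool ID and the version, whether the version is quoted or bare.
_CMDLINE = re.compile(r'^##GATKCommandLine=<ID=([^,]+),.*Version="?([^",>]+)"?,Date=')


def read_header(path):
    """Collect the '##' meta lines of a plain or gzip/bgzip-compressed VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    lines = []
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # reached the #CHROM line or the records
            lines.append(line.rstrip("\n"))
    return lines


def tool_versions(header_lines):
    """Return {tool ID: set of versions} found in ##GATKCommandLine lines."""
    versions = defaultdict(set)
    for line in header_lines:
        match = _CMDLINE.match(line)
        if match:
            versions[match.group(1)].add(match.group(2))
    return dict(versions)


def unexpected_versions(header_lines, expected):
    """List (tool, version) pairs whose version is not the expected one."""
    found = tool_versions(header_lines)
    return sorted((tool, v) for tool, vs in found.items() for v in vs if v != expected)


# Shortened header lines in the two layouts seen in this thread.
example_header = [
    '##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller --version false",Version=4.0.0.0,Date="September 1, 2018 7:33:07 AM CEST">',
    '##GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs --version false",Version="4.1.1.0",Date="June 11, 2019 2:39:23 PM CEST">',
]
print(unexpected_versions(example_header, "4.1.1.0"))
```

Calling `unexpected_versions(read_header(path), "4.1.1.0")` on each input before combining should flag any GVCF that still carries a v4.0.0.0 HaplotypeCaller entry, which is exactly what was hidden in our file names.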

    Here are the header lines you asked for, which helped me figure out the problem:

    HaplotypeCaller

    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller  --sample-ploidy 2 --emit-ref-confidence GVCF --output /gvcfv41/NCA17_ATGTCA_L003.g.vcf.gz --input /bootstraping/NCA17_ATGTCA_L003.bam --reference /Apis_mellifera_HAV3.fa  --use-new-qual-calculator true --use-old-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --num-reference-samples-if-no-call 0 --genotyping-mode DISCOVERY --genotype-filtered-alleles false --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 
--use-alleles-trigger false --disable-optimizations false --just-determine-active-regions false --dont-genotype false --max-mnp-distance 0 --do-not-run-physical-phasing false --use-filtered-reads-for-annotations false --correct-overlapping-quality false --adaptive-pruning false --do-not-recover-dangling-branches false --recover-dangling-heads false --consensus false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 1.0 --max-unpruned-variants 100 --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true 
--create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.1.0",Date="3 juin 2019 16:11:21 CEST">
    

    CombineGVCFs

    ##GATKCommandLine=<ID=CombineGVCFs,CommandLine="CombineGVCFs  --output /gvcfv41/MetaGenotypesCM009946.1.g.vcf.gz --variant /gvcfv40/NCA10_ACTTGA_L003/calling/NCA10_ACTTGA_L003.g.vcf.gz --variant /gvcfv40/NCA11_GATCAG_L003/calling/NCA11_GATCAG_L003.g.vcf.gz --variant /gvcfv40/NCA12_TAGCTT_L003/calling/NCA12_TAGCTT_L003.g.vcf.gz --variant /gvcfv40/NCA13_GGCTAC_L003/calling/NCA13_GGCTAC_L003.g.vcf.gz --variant /gvcfv40/NCA14_CTTGTA_L003/calling/NCA14_CTTGTA_L003.g.vcf.gz --variant /gvcfv40/NCA15_AGTCAA_L003/calling/NCA15_AGTCAA_L003.g.vcf.gz --variant /gvcfv40/NCA16_AGTTCC_L003/calling/NCA16_AGTTCC_L003.g.vcf.gz --variant /gvcfv40/NCA17_ATGTCA_L003/calling/NCA17_ATGTCA_L003.g.vcf.gz --variant /gvcfv40/NCA18_CCGTCC_L003/calling/NCA18_CCGTCC_L003.g.vcf.gz --variant /gvcfv40/NCA19_GTCCGC_L003/calling/NCA19_GTCCGC_L003.g.vcf.gz --variant /gvcfv40/NCA1_ACTGAT_L002/calling/NCA1_ACTGAT_L002.g.vcf.gz --variant /gvcfv40/NCA20_GTGAAA_L003/calling/NCA20_GTGAAA_L003.g.vcf.gz --variant /gvcfv40/NCA21_GTGGCC_L003/calling/NCA21_GTGGCC_L003.g.vcf.gz --variant /gvcfv40/NCA22_GTTTCG_L003/calling/NCA22_GTTTCG_L003.g.vcf.gz --variant /gvcfv40/NCA23_CGTACG_L003/calling/NCA23_CGTACG_L003.g.vcf.gz --variant /gvcfv40/NCA24_GAGTGG_L003/calling/NCA24_GAGTGG_L003.g.vcf.gz --variant /gvcfv40/NCA25_ACTGAT_L003/calling/NCA25_ACTGAT_L003.g.vcf.gz --variant /gvcfv40/NCA26_ATTCCT_L003/calling/NCA26_ATTCCT_L003.g.vcf.gz --variant /gvcfv40/NCA27_ATCACG_L002/calling/NCA27_ATCACG_L002.g.vcf.gz --variant /gvcfv40/NCA28_CGATGT_L002/calling/NCA28_CGATGT_L002.g.vcf.gz --variant /gvcfv40/NCA29_TTAGGC_L002/calling/NCA29_TTAGGC_L002.g.vcf.gz --variant /gvcfv40/NCA2_ATTCCT_L002/calling/NCA2_ATTCCT_L002.g.vcf.gz --variant /gvcfv40/NCA30_TGACCA_L002/calling/NCA30_TGACCA_L002.g.vcf.gz --variant /gvcfv40/NCA31_ACAGTG_L002/calling/NCA31_ACAGTG_L002.g.vcf.gz --variant /gvcfv40/NCA32_GCCAAT_L002/calling/NCA32_GCCAAT_L002.g.vcf.gz --variant /gvcfv40/NCA33_CAGATC_L002/calling/NCA33_CAGATC_L002.g.vcf.gz --variant 
/gvcfv40/NCA34_ACTTGA_L002/calling/NCA34_ACTTGA_L002.g.vcf.gz --variant /gvcfv40/NCA35_GATCAG_L002/calling/NCA35_GATCAG_L002.g.vcf.gz --variant /gvcfv40/NCA36_TAGCTT_L002/calling/NCA36_TAGCTT_L002.g.vcf.gz --variant /gvcfv40/NCA37_GGCTAC_L002/calling/NCA37_GGCTAC_L002.g.vcf.gz --variant /gvcfv40/NCA38_CTTGTA_L002/calling/NCA38_CTTGTA_L002.g.vcf.gz --variant /gvcfv40/NCA39_AGTCAA_L002/calling/NCA39_AGTCAA_L002.g.vcf.gz --variant /gvcfv40/NCA3_ATCACG_L003/calling/NCA3_ATCACG_L003.g.vcf.gz --variant /gvcfv40/NCA40_AGTTCC_L002/calling/NCA40_AGTTCC_L002.g.vcf.gz --variant /gvcfv40/NCA4_CGATGT_L003/calling/NCA4_CGATGT_L003.g.vcf.gz --variant /gvcfv40/NCA5_TTAGGC_L003/calling/NCA5_TTAGGC_L003.g.vcf.gz --variant /gvcfv40/NCA6_TGACCA_L003/calling/NCA6_TGACCA_L003.g.vcf.gz --variant /gvcfv40/NCA7_ACAGTG_L003/calling/NCA7_ACAGTG_L003.g.vcf.gz --variant /gvcfv40/NCA8_GCCAAT_L003/calling/NCA8_GCCAAT_L003.g.vcf.gz --variant /gvcfv40/NCA9_CAGATC_L003/calling/NCA9_CAGATC_L003.g.vcf.gz --intervals CM009946.1 --reference /Apis_mellifera_HAV3.fa  --convert-to-base-pair-resolution false --break-bands-at-multiples-of 0 --input-is-somatic false --drop-somatic-filtering-annotations false --ignore-variants-starting-outside-interval false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  
--disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.1.0",Date="4 juin 2019 14:19:04 CEST">
    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller  --emit-ref-confidence GVCF --sample-ploidy 2 --output /gvcfv41/NCA10_ACTTGA_L003.g.vcf.gz --input /bootstraping/NCA10_ACTTGA_L003.bam --reference /Apis_mellifera_HAV3.fa  --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --recover-dangling-heads false --do-not-recover-dangling-branches false --min-dangling-branch-length 4 --consensus false 
--max-num-haplotypes-in-population 128 --error-correct-kmers false --min-pruning 2 --debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false --use-filtered-reads-for-annotations false --bam-writer-type CALLED_HAPLOTYPES --disable-optimizations false --just-determine-active-regions false --dont-genotype false --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --use-new-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 10.0 --max-alternate-alleles 6 --max-genotype-count 1024 --genotyping-mode DISCOVERY --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching 
false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false --minimum-mapping-quality 20",Version=4.0.0.0,Date="September 1, 2018 7:33:07 AM CEST">
    

    GenotypeGVCFs

    ##GATKCommandLine=<ID=CombineGVCFs,CommandLine="CombineGVCFs  --output /gvcfv41/MetaGenotypesCM009946.1.g.vcf.gz --variant /gvcfv40/NCA10_ACTTGA_L003/calling/NCA10_ACTTGA_L003.g.vcf.gz --variant /gvcfv40/NCA11_GATCAG_L003/calling/NCA11_GATCAG_L003.g.vcf.gz --variant /gvcfv40/NCA12_TAGCTT_L003/calling/NCA12_TAGCTT_L003.g.vcf.gz --variant /gvcfv40/NCA13_GGCTAC_L003/calling/NCA13_GGCTAC_L003.g.vcf.gz --variant /gvcfv40/NCA14_CTTGTA_L003/calling/NCA14_CTTGTA_L003.g.vcf.gz --variant /gvcfv40/NCA15_AGTCAA_L003/calling/NCA15_AGTCAA_L003.g.vcf.gz --variant /gvcfv40/NCA16_AGTTCC_L003/calling/NCA16_AGTTCC_L003.g.vcf.gz --variant /gvcfv40/NCA17_ATGTCA_L003/calling/NCA17_ATGTCA_L003.g.vcf.gz --variant /gvcfv40/NCA18_CCGTCC_L003/calling/NCA18_CCGTCC_L003.g.vcf.gz --variant /gvcfv40/NCA19_GTCCGC_L003/calling/NCA19_GTCCGC_L003.g.vcf.gz --variant /gvcfv40/NCA1_ACTGAT_L002/calling/NCA1_ACTGAT_L002.g.vcf.gz --variant /gvcfv40/NCA20_GTGAAA_L003/calling/NCA20_GTGAAA_L003.g.vcf.gz --variant /gvcfv40/NCA21_GTGGCC_L003/calling/NCA21_GTGGCC_L003.g.vcf.gz --variant /gvcfv40/NCA22_GTTTCG_L003/calling/NCA22_GTTTCG_L003.g.vcf.gz --variant /gvcfv40/NCA23_CGTACG_L003/calling/NCA23_CGTACG_L003.g.vcf.gz --variant /gvcfv40/NCA24_GAGTGG_L003/calling/NCA24_GAGTGG_L003.g.vcf.gz --variant /gvcfv40/NCA25_ACTGAT_L003/calling/NCA25_ACTGAT_L003.g.vcf.gz --variant /gvcfv40/NCA26_ATTCCT_L003/calling/NCA26_ATTCCT_L003.g.vcf.gz --variant /gvcfv40/NCA27_ATCACG_L002/calling/NCA27_ATCACG_L002.g.vcf.gz --variant /gvcfv40/NCA28_CGATGT_L002/calling/NCA28_CGATGT_L002.g.vcf.gz --variant /gvcfv40/NCA29_TTAGGC_L002/calling/NCA29_TTAGGC_L002.g.vcf.gz --variant /gvcfv40/NCA2_ATTCCT_L002/calling/NCA2_ATTCCT_L002.g.vcf.gz --variant /gvcfv40/NCA30_TGACCA_L002/calling/NCA30_TGACCA_L002.g.vcf.gz --variant /gvcfv40/NCA31_ACAGTG_L002/calling/NCA31_ACAGTG_L002.g.vcf.gz --variant /gvcfv40/NCA32_GCCAAT_L002/calling/NCA32_GCCAAT_L002.g.vcf.gz --variant /gvcfv40/NCA33_CAGATC_L002/calling/NCA33_CAGATC_L002.g.vcf.gz --variant 
/gvcfv40/NCA34_ACTTGA_L002/calling/NCA34_ACTTGA_L002.g.vcf.gz --variant /gvcfv40/NCA35_GATCAG_L002/calling/NCA35_GATCAG_L002.g.vcf.gz --variant /gvcfv40/NCA36_TAGCTT_L002/calling/NCA36_TAGCTT_L002.g.vcf.gz --variant /gvcfv40/NCA37_GGCTAC_L002/calling/NCA37_GGCTAC_L002.g.vcf.gz --variant /gvcfv40/NCA38_CTTGTA_L002/calling/NCA38_CTTGTA_L002.g.vcf.gz --variant /gvcfv40/NCA39_AGTCAA_L002/calling/NCA39_AGTCAA_L002.g.vcf.gz --variant /gvcfv40/NCA3_ATCACG_L003/calling/NCA3_ATCACG_L003.g.vcf.gz --variant /gvcfv40/NCA40_AGTTCC_L002/calling/NCA40_AGTTCC_L002.g.vcf.gz --variant /gvcfv40/NCA4_CGATGT_L003/calling/NCA4_CGATGT_L003.g.vcf.gz --variant /gvcfv40/NCA5_TTAGGC_L003/calling/NCA5_TTAGGC_L003.g.vcf.gz --variant /gvcfv40/NCA6_TGACCA_L003/calling/NCA6_TGACCA_L003.g.vcf.gz --variant /gvcfv40/NCA7_ACAGTG_L003/calling/NCA7_ACAGTG_L003.g.vcf.gz --variant /gvcfv40/NCA8_GCCAAT_L003/calling/NCA8_GCCAAT_L003.g.vcf.gz --variant /gvcfv40/NCA9_CAGATC_L003/calling/NCA9_CAGATC_L003.g.vcf.gz --intervals CM009946.1 --reference /Apis_mellifera_HAV3.fa  --convert-to-base-pair-resolution false --break-bands-at-multiples-of 0 --input-is-somatic false --drop-somatic-filtering-annotations false --ignore-variants-starting-outside-interval false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  
--disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.1.0",Date="4 juin 2019 14:19:04 CEST">
    ##GATKCommandLine=<ID=GenotypeGVCFs,CommandLine="GenotypeGVCFs  --output /gvcfv41/MetaGenotypesCalledCM009946.1.vcf.gz --use-new-qual-calculator true --variant /gvcfv41/MetaGenotypesCM009946.1.g.vcf.gz --reference /Apis_mellifera_HAV3.fa  --include-non-variant-sites false --merge-input-intervals false --input-is-somatic false --tumor-lod-to-emit 3.5 --allele-fraction-error 0.001 --keep-combined-raw-annotations false --use-old-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 30.0 --max-alternate-alleles 6 --max-genotype-count 1024 --sample-ploidy 2 --num-reference-samples-if-no-call 0 --only-output-calls-starting-in-intervals false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.1.0",Date="June 11, 2019 2:39:23 PM CEST">
    ##GATKCommandLine=<ID=HaplotypeCaller,CommandLine="HaplotypeCaller  --emit-ref-confidence GVCF --sample-ploidy 2 --output /gvcfv41/NCA10_ACTTGA_L003.g.vcf.gz --input /gvcfv40/NCA10_ACTTGA_L003/bootstraping/NCA10_ACTTGA_L003.bam --reference /Apis_mellifera_HAV3.fa  --annotation-group StandardAnnotation --annotation-group StandardHCAnnotation --gvcf-gq-bands 1 --gvcf-gq-bands 2 --gvcf-gq-bands 3 --gvcf-gq-bands 4 --gvcf-gq-bands 5 --gvcf-gq-bands 6 --gvcf-gq-bands 7 --gvcf-gq-bands 8 --gvcf-gq-bands 9 --gvcf-gq-bands 10 --gvcf-gq-bands 11 --gvcf-gq-bands 12 --gvcf-gq-bands 13 --gvcf-gq-bands 14 --gvcf-gq-bands 15 --gvcf-gq-bands 16 --gvcf-gq-bands 17 --gvcf-gq-bands 18 --gvcf-gq-bands 19 --gvcf-gq-bands 20 --gvcf-gq-bands 21 --gvcf-gq-bands 22 --gvcf-gq-bands 23 --gvcf-gq-bands 24 --gvcf-gq-bands 25 --gvcf-gq-bands 26 --gvcf-gq-bands 27 --gvcf-gq-bands 28 --gvcf-gq-bands 29 --gvcf-gq-bands 30 --gvcf-gq-bands 31 --gvcf-gq-bands 32 --gvcf-gq-bands 33 --gvcf-gq-bands 34 --gvcf-gq-bands 35 --gvcf-gq-bands 36 --gvcf-gq-bands 37 --gvcf-gq-bands 38 --gvcf-gq-bands 39 --gvcf-gq-bands 40 --gvcf-gq-bands 41 --gvcf-gq-bands 42 --gvcf-gq-bands 43 --gvcf-gq-bands 44 --gvcf-gq-bands 45 --gvcf-gq-bands 46 --gvcf-gq-bands 47 --gvcf-gq-bands 48 --gvcf-gq-bands 49 --gvcf-gq-bands 50 --gvcf-gq-bands 51 --gvcf-gq-bands 52 --gvcf-gq-bands 53 --gvcf-gq-bands 54 --gvcf-gq-bands 55 --gvcf-gq-bands 56 --gvcf-gq-bands 57 --gvcf-gq-bands 58 --gvcf-gq-bands 59 --gvcf-gq-bands 60 --gvcf-gq-bands 70 --gvcf-gq-bands 80 --gvcf-gq-bands 90 --gvcf-gq-bands 99 --indel-size-to-eliminate-in-ref-model 10 --use-alleles-trigger false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --recover-dangling-heads false --do-not-recover-dangling-branches false 
--min-dangling-branch-length 4 --consensus false --max-num-haplotypes-in-population 128 --error-correct-kmers false --min-pruning 2 --debug-graph-transformations false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --debug false --use-filtered-reads-for-annotations false --bam-writer-type CALLED_HAPLOTYPES --disable-optimizations false --just-determine-active-regions false --dont-genotype false --dont-use-soft-clipped-bases false --capture-assembly-failure-bam false --error-correct-reads false --do-not-run-physical-phasing false --min-base-quality-score 10 --smith-waterman JAVA --use-new-qual-calculator false --annotate-with-num-discovered-alleles false --heterozygosity 0.001 --indel-heterozygosity 1.25E-4 --heterozygosity-stdev 0.01 --standard-min-confidence-threshold-for-calling 10.0 --max-alternate-alleles 6 --max-genotype-count 1024 --genotyping-mode DISCOVERY --contamination-fraction-to-filter 0.0 --output-mode EMIT_VARIANTS_ONLY --all-site-pls false --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 
--cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --disable-tool-default-read-filters false --minimum-mapping-quality 20",Version=4.0.0.0,Date="September 1, 2018 7:33:07 AM CEST">
    
  • emeryj Member, Broadie

    @dbailleul That is great to hear! I'm glad you were able to find the issue. When we updated the RMSMappingQuality calculation, we talked about making a mapping quality annotation version mismatch a failure state. Seeing as this took so long to figure out, perhaps I will ask around and see if we can't make it a little more obvious when this happens.
