If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra

Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
We will be out of the office on November 11th and 13th 2019, due to the U.S. holiday(Veteran's day) and due to a team event(Nov 13th). We will return to monitoring the GATK forum on November 12th and 14th respectively. Thank you for your patience.

Variant not being called by HC GATK v3.7-0-gcfedb67

nans_bnnans_bn Member
edited October 2018 in Ask the GATK team

We are calling variants on data that has been sequenced on NextSeq platform. We have been using the same pipeline , with the same commands since a year and in all our runs we have a control sample to check if the variant calling and sequencing has been done right. For this particular run, one of a known SNP 7:143013285that was called in the same sample in the previous 5 runs (over the last year) was missed by haplotype caller. On looking at the bam file, the variant seems to be present (highlighted BAM file). Above two bam files are from the same sample that were called in previous runs and HC was able to pick it up. The command I use are as follows

trim_galore -q 0 --paired --fastqc $R1_fastq $R2_fastq --output_dir $FASTQ

bwa mem -M -t 8 $ind.fa $FASTQ/${s_id}.R1_val_1.fq.gz $FASTQ/${s_id}.R2_val_2.fq.gz | sambamba_v0.6.6 view -t 8 -S -h -f bam -o $s_id.bam /dev/stdin
sambamba_v0.6.6 sort -t 8 -o $s_id.sorted.bam $s_id.bam
sambamba_v0.6.6 index -t 8 $s_id.sorted.bam

java -jar $picard AddOrReplaceReadGroups I=$s_id.sorted.bam O=$s_id.sorted.RG.bam SORT_ORDER=coordinate RGID=$s_id RGLB=$flowcell RGPL=illumina RGPU=U RGSM=$RUNNAME
sambamba_v0.6.6 index $s_id.sorted.RG.bam 
sambamba_v0.6.6 markdup -t 8 $s_id.sorted.RG.bam $s_id.markdup.bam 
sambamba_v0.6.6 index $s_id.markdup.bam 

java -Xmx8g -jar $gatk -T BaseRecalibrator \
    -I $s_id.markdup.bam \
    -R $ind.fa \
        -knownSites dbsnp_138.b37.vcf \
        -knownSites Mills_and_1000G_gold_standard.indels.b37.vcf \
        -knownSites 1000G_phase1.indels.b37.vcf \
    -o $s_id.recal_data.table \
    -L $bed 

#Apply the Recalibration
java -Xmx8g$TMPDIR -jar $gatk -T PrintReads \
    -I $s_id.markdup.bam \
    -R $ind.fa \
    -BQSR $s_id.recal_data.table \
    -o $s_id.${RUNNAME}.variant_ready.bam 

java -Xmx32g -jar $gatk -T HaplotypeCaller \
    -R $ind.fa --dbsnp $dbsnp_138.b37.vcf \
    -I $s_id.${RUNNAME}.variant_ready.bam \
    -stand_call_conf 30.0 \
    -L $bed \
    -o $s_id.${RUNNAME}.g.vcf

Things that I have tried which did not work:-
1. tried running HC with the options -allowNonUniqueKmers
2. also tried options to change parameters-stand_call_conf 2.0 -mmq 5
3. tried -ERC BP_RESOLUTION that results in
7 143013285 . C <NON_REF> . . . GT:AD:DP:GQ:PL 0/0:9,2:11:0:0,0,152

NOTE: The variant was picked up when I ran FREEBAYES and VARSCAN using default parameters.

7 143013285 . C T 206.73 . AB=0.394737;ABP=6.66752;AC=1;AF=0.5;AN=2;AO=15;CIGAR=1X;DP=38;DPB=38;DPRA=0;EPP=20.5268;EPPR=37.093;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=60;NS=1;NUMALT=1;ODDS=47.6012;PAIRED=1;PAIREDR=1;PAO=0;PQA=0;PQR=0;PRO=0;QA=495;QR=811;RO=23;RPL=2;RPP=20.5268;RPPR=37.093;RPR=13;RUN=1;SAF=15;SAP=35.5824;SAR=0;SRF=23;SRP=52.9542;SRR=0;TYPE=snp;technology.illumina=1 GT:DP:AD:RO:QR:AO:QA:GL 0/1:38:23,15:23:811:15:495:-33.4265,0,-61.8717

7 143013285 . C T . PASS ADP=38;WT=0;HET=1;HOM=0;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 0/1:52:38:38:23:15:39.47%:5.4463E-6:35:33:23:0:15:0

I can email the bamout file if required (though I am not allowed to upload it publicly.)

Any suggestions will be helpful. Thanks.


Best Answer


  • nans_bnnans_bn Member

    Dear GATK Team,

    Any inputs/advise on the above issue will be extremly helpful.

    Thank you

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @nans_bn,

    Our FTP site, where we ask users to upload bug reports, is secure and some of us have had HIPAA training, to enable review of such data. In your pipeline, did you by any chance upgrade to a different version of GATK between this last callset and those that pick up this variant at 7 143013285?

    Otherwise, you have a couple of options to figure out what is going on. Number one is to review the BAMOUT reassembly. Number two is to ask HaplotypeCaller to output the reassembly graph with --graph-output, which you can then use to diagram the graph with Graphviz, an external tool. If you want to fiddle with parameters, you may find increasing/decreasing the kmer size often changes results. Suggestions are outlined in Article#1235.

  • nans_bnnans_bn Member

    Hi @shlee ,

    Many thanks for your response.

    So, I did not upgrade to a different version of gatk. It has been 3.7 always.

    I tried some of the suggestions and I managed to pick up the variant when I used the option -dontUseSoftClippedBases

    Can you explain why is that ? Should't I lose variants on using this option ? if I continue to use this option, will this have an impact on large indels ?

    Thank you

Sign In or Register to comment.