Heads up:
We’re moving the GATK website, docs and forum to a new platform. Read the full story and breakdown of key changes on this blog.
Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office for a Broad Institute event from Dec 10th to Dec 11th 2019. We will be back to monitor the GATK forum on Dec 12th 2019. In the meantime we encourage you to help out other community members with their queries.
Thank you for your patience!

False indel PASSed but not covered in IGV

Hi there,

I came across a set of false positive indels obtained in some targeted sequencing experiments.
Situations are similar to the following, where a deletion looks to have high quality, according to quality and filter. Genotype quality looks also good. But in IGV the situation is different being not possible to view exactly the variant

chr1 68905301 GACTTGCTTATTGGATCTTCCTTGTC G 412.73 PASS AC=1;AF=0.500;AN=2;BaseQRankSum=-1.810e-01;ClippingRankSum=0.00;DP=99;ExcessHet=3.0103;FS=18.765;MLEAC=1;MLEAF=0.500;MQ=56.92;MQ0=0;MQRankSum=-3.160e-01;QD=4.17;ReadPosRankSum=-8.529e+00;SOR=2.712;set=variant2 GT:AD:DP:GQ:PGT:PID:PL 0/1:74,25:99:99:0|1:68905301_GA
CTTGCTTATTGGATCTTCCTTGTC_G:450,0,3034

Please find attached IGV screenshot of the previous variant. indel1.png and indel2.png represent the same variant with different zoom in.

The commands I used respectively for variant calling and filtering are the following:

HaplotypeCaller
java -Xmx64g -jar GenomeAnalysisTK.jar -T HaplotypeCaller -I EYE113_trim_align_sortidx.bam -R ucsc.hg19.fa -nct 10 --emitRefConfidence GVCF --variant_index_type LINEAR --variant_index_parameter 128000 -stand_call_conf 30 -stand_emit_conf 10 --min_base_quality_score 30 --intervals eye_haloplex.bed -A Coverage -A FisherStrand -A BaseQualityRankSumTest -A HaplotypeScore -A MappingQualityRankSumTest -A MappingQualityZero -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -A SpanningDeletions -o EYE113_varcall.g.vcf

GenotypeGVCFs
java -Xmx25g -jar GenomeAnalysisTK.jar -T GenotypeGVCFs -V EYE113_varcall.g.vcf -R ucsc.hg19.fa -stand_call_conf 30 -stand_emit_conf 10 -A Coverage -A FisherStrand -A BaseQualityRankSumTest -A HaplotypeScore -A InbreedingCoeff -A MappingQualityRankSumTest -A MappingQualityZero -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -o EYE113_TRY_genot.g.vcf

SelectVariants to select only Indels
java -Xmx8g -jar GenomeAnalysisTK.jar -T SelectVariants --variant EYE113_TRY_genot.g.vcf -R ucsc.hg19.fa -selectType INDEL -o EYE113_TRY_genot_varfilt_INDEL.vcf

Variant Filtration to filter only indels
/java -Xmx64g -jar GenomeAnalysisTK.jar -T VariantFiltration --variant EYE113_TRY_genot_varfilt_INDEL.vcf -R ucsc.hg19.fa --clusterWindowSize 10 --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" --filterName "HARD_TO_VALIDATE" --filterExpression "QUAL < 30.0" --filterName "VeryLowQual" --filterExpression "QUAL > 30.0 && QUAL < 100.0" --filterName "LowQual" -o EYE113_TRY_genot_varfilt_INDEL_filt.vcf

//I did selectVariants and VariantFiltration for SNPs, too then I combine the two vcfs with CombineVariants.

I did not have any specific error in any of the program. Only during VariantFiltration I have the warning:

WARN 21:46:22,732 Interpreter - ![38,47]: 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0;' undefined variable MQRankSum
but, as far as I can see in the forum, I can ignore this.

Best regards,

Francesco

Tagged:

Best Answers

  • SheilaSheila Broad Institute admin
    Accepted Answer

    @Vergilius
    Hi Francesco,

    The original BAM file is not always incorrect. The bamout file is very useful for explaining "confusing calls"at messy sites. We do not recommend generating the bamout file for all sites because it is just too computationally intensive. However, if you find some confusing calls, you should definitely look at the bamout file to see why those calls were made.

    As for making a bed file from the VCF, you can actually use the VCF as input to -L. But, to generate bamout files, you should also include -ip 100.

    -Sheila

Answers

  • VergiliusVergilius ItalyMember
    edited March 2017

    Dear Sheila,

    Thanks for the quick answer!

    I read about the bamout and started to solve a big issue... :smile:

    Anyway, since people I am working with (clinicians) use to look at the alignment in IGV to have a direct proof of the discovered variant,
    would you suggest to produce at any run the "bamout" file?
    Is the original BAM file always "incorrect"?

    I was thinking: if I generate a bed file from the final VCF file, then I can use that as -L argument and obtain the bamout just for the regions surrounding the variants...

    Best
    Francesco

  • SheilaSheila Broad InstituteMember, Broadie admin
    Accepted Answer

    @Vergilius
    Hi Francesco,

    The original BAM file is not always incorrect. The bamout file is very useful for explaining "confusing calls"at messy sites. We do not recommend generating the bamout file for all sites because it is just too computationally intensive. However, if you find some confusing calls, you should definitely look at the bamout file to see why those calls were made.

    As for making a bed file from the VCF, you can actually use the VCF as input to -L. But, to generate bamout files, you should also include -ip 100.

    -Sheila

  • VergiliusVergilius ItalyMember

    Thanks a lot!

Sign In or Register to comment.