Multi sample somatic variant calling filters

Hi,

For my PhD I have several patients with multiple biopsies available, so multi-sample variant calling is very interesting for me.

I followed the latest tutorial for running Mutect2 v4.1.2.0: I first created a panel of normals from 40 healthy samples sequenced on the same machine with the same library prep,
then performed joint variant calling according to the tutorial, followed by the filtering step.
From orthogonal validation of those exact samples with ddPCR, I know that these variants are actually present in the samples, but they get filtered out by two filters. The first is multiallelic, which I kind of understand, as there are a lot of different variants at this position (mutations at this position confer resistance to treatment). The second is normal_artifact, which I do not understand at all.
First of all, I would love to know if there is a way for me to not filter out these variants, especially as they are pretty "common" in some samples (VAF around 0.4).
Secondly, I would like to understand why the calculated AF for most of these alleles is not 0 in the tumour when the AD field says that no supporting read was found.
In general, this is unfortunate, because I was really hoping Mutect2 would be an easy solution.
I would love to understand why this happened and whether I can change the behaviour.

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FILTER=<ID=base_qual,Description="alt median base quality">
##FILTER=<ID=clustered_events,Description="Clustered events observed in the tumor">
##FILTER=<ID=contamination,Description="contamination">
##FILTER=<ID=duplicate,Description="evidence for alt allele is overrepresented by apparent duplicates">
##FILTER=<ID=fragment,Description="abs(ref - alt) median fragment length">
##FILTER=<ID=germline,Description="Evidence indicates this site is germline, not somatic">
##FILTER=<ID=haplotype,Description="Variant near filtered variant on same haplotype.">
##FILTER=<ID=low_allele_frac,Description="Allele fraction is below specified threshold">
##FILTER=<ID=map_qual,Description="ref - alt median mapping quality">
##FILTER=<ID=multiallelic,Description="Site filtered because too many alt alleles pass tumor LOD">
##FILTER=<ID=n_ratio,Description="Ratio of N to alt exceeds specified ratio">
##FILTER=<ID=normal_artifact,Description="artifact_in_normal">
##FILTER=<ID=numt_chimera,Description="NuMT variant with too many ALT reads originally from autosome">
##FILTER=<ID=numt_novel,Description="Alt depth is below expected coverage of NuMT in autosome">
##FILTER=<ID=orientation,Description="orientation bias detected by the orientation bias mixture model">
##FILTER=<ID=panel_of_normals,Description="Blacklisted site in panel of normals">
##FILTER=<ID=position,Description="median distance of alt variants from end of reads">
##FILTER=<ID=slippage,Description="Site filtered due to contraction of short tandem repeat region">
##FILTER=<ID=strand_bias,Description="Evidence for alt allele comes from one read direction only">
##FILTER=<ID=strict_strand,Description="Evidence for alt allele is not represented in both directions">
##FILTER=<ID=weak_evidence,Description="Mutation does not meet likelihood threshold">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fractions of alternate alleles in the tumor">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=F1R2,Number=R,Type=Integer,Description="Count of reads in F1R2 pair orientation supporting each allele">
##FORMAT=<ID=F2R1,Number=R,Type=Integer,Description="Count of reads in F2R1 pair orientation supporting each allele">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another">
##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="Phasing set (typically the position of the first variant in the set)">
##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias.">
##GATKCommandLine=<ID=FilterMutectCalls,CommandLine="FilterMutectCalls  --output /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_somatic.filtered.vcf.gz --contamination-table /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/contamination.table --tumor-segmentation /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/segments.table --orientation-bias-artifact-priors /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/readOrientationModel.tar.gz --variant /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_somatic.unfiltered.vcf.gz --reference /data/reference/indexes/human/hg19/fasta/Homo_sapiens.GRCh37.73.dna.toplevel.fa  --threshold-strategy OPTIMAL_F_SCORE --f-score-beta 1.0 --false-discovery-rate 0.05 --initial-threshold 0.1 --mitochondria-mode false --max-events-in-region 2 --max-alt-allele-count 1 --unique-alt-read-count 0 --min-median-mapping-quality 30 --min-median-base-quality 20 --max-median-fragment-length-difference 10000 --min-median-read-position 1 --max-n-ratio Infinity --min-reads-per-strand 0 --autosomal-coverage 0.0 --max-numt-fraction 0.85 --min-allele-fraction 0.0 --contamination-estimate 0.0 --log-snv-prior -13.815510557964275 --log-indel-prior -16.11809565095832 --log-artifact-prior -2.302585092994046 --normal-p-value-threshold 0.001 --min-slippage-length 8 --pcr-slippage-rate 0.1 --distance-on-haplotype 100 --long-indel-length 5 --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 --disable-bam-index-caching false --sites-only-vcf-output false 
--help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false",Version="4.1.2.0",Date="August 12, 2019 10:14:09 AM UTC">
##GATKCommandLine=<ID=Mutect2,CommandLine="Mutect2  --f1r2-tar-gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/10-f1r2.tar.gz --normal-sample CA99 --panel-of-normals /data/reference/dawson_labs/Mutect2/PanelOfNormals/pon.vcf.gz --germline-resource /data/reference/dawson_labs/Mutect2/af-only-gnomad.raw.sites.GRCh37.73.vcf.gz --output /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_10.somatic.unfiltered.vcf.gz --intervals 10 --input /home/shollizeck/CASCADE/analysis/CA99/germline/Bam/CA99_merged.markdups.bam --input /home/shollizeck/CASCADE/analysis/CA99/tumor/CA99-26/Bam/CA99-26_merged.markdups.bam --input /home/shollizeck/CASCADE/analysis/CA99/tumor/CA99-31/Bam/CA99-31_merged.markdups.bam --input /home/shollizeck/CASCADE/analysis/CA99/tumor/CA99-41/Bam/CA99-41_merged.markdups.bam --input /home/shollizeck/CASCADE/analysis/CA99/tumor/CA99-57/Bam/CA99-57_merged.markdups.bam --input /home/shollizeck/CASCADE/analysis/CA99/tumor/CA99-11/Bam/CA99-11_merged.markdups.bam --input /home/shollizeck/CASCADE/analysis/CA99/tumor/CA99-59/Bam/CA99-59_merged.markdups.bam --input /home/shollizeck/CASCADE/analysis/CA99/tumor/CA99-47/Bam/CA99-47_merged.markdups.bam --input /home/shollizeck/CASCADE/analysis/CA99/tumor/CA99-55/Bam/CA99-55_merged.markdups.bam --reference /data/reference/indexes/human/hg19/fasta/Homo_sapiens.GRCh37.73.dna.toplevel.fa  --f1r2-median-mq 50 --f1r2-min-bq 20 --f1r2-max-depth 200 --genotype-pon-sites false --genotype-germline-sites false --af-of-alleles-not-in-resource -1.0 --mitochondria-mode false --tumor-lod-to-emit 3.0 --initial-tumor-lod 2.0 --pcr-snv-qual 40 --pcr-indel-qual 40 --max-population-af 0.01 --downsampling-stride 1 --callable-depth 10 --max-suspicious-reads-per-alignment-start 0 --normal-lod 2.2 --ignore-itr-artifacts false --gvcf-lod-band -2.5 --gvcf-lod-band -2.0 --gvcf-lod-band -1.5 --gvcf-lod-band -1.0 --gvcf-lod-band -0.5 --gvcf-lod-band 0.0 --gvcf-lod-band 0.5 --gvcf-lod-band 1.0 --minimum-allele-fraction 
0.0 --genotype-filtered-alleles false --disable-adaptive-pruning false --dont-trim-active-regions false --max-disc-ar-extension 25 --max-gga-ar-extension 300 --padding-around-indels 150 --padding-around-snps 20 --kmer-size 10 --kmer-size 25 --dont-increase-kmer-sizes-for-cycles false --allow-non-unique-kmers-in-ref false --num-pruning-samples 1 --min-dangling-branch-length 4 --recover-all-dangling-branches false --max-num-haplotypes-in-population 128 --min-pruning 2 --adaptive-pruning-initial-error-rate 0.001 --pruning-lod-threshold 2.302585092994046 --max-unpruned-variants 100 --debug-assembly false --debug-graph-transformations false --capture-assembly-failure-bam false --error-correct-reads false --kmer-length-for-read-error-correction 25 --min-observations-for-kmer-to-be-solid 20 --likelihood-calculation-engine PairHMM --base-quality-score-threshold 18 --pair-hmm-gap-continuation-penalty 10 --pair-hmm-implementation FASTEST_AVAILABLE --pcr-indel-model CONSERVATIVE --phred-scaled-global-read-mismapping-rate 45 --native-pair-hmm-threads 4 --native-pair-hmm-use-double-precision false --bam-writer-type CALLED_HAPLOTYPES --dont-use-soft-clipped-bases false --min-base-quality-score 10 --smith-waterman JAVA --emit-ref-confidence NONE --max-mnp-distance 1 --min-assembly-region-size 50 --max-assembly-region-size 300 --assembly-region-padding 100 --max-reads-per-alignment-start 50 --active-probability-threshold 0.002 --max-prob-propagation-distance 50 --force-active false --interval-set-rule UNION --interval-padding 0 --interval-exclusion-padding 0 --interval-merging-rule ALL --read-validation-stringency SILENT --seconds-between-progress-updates 10.0 --disable-sequence-dictionary-validation false --create-output-bam-index true --create-output-bam-md5 false --create-output-variant-index true --create-output-variant-md5 false --lenient false --add-output-sam-program-record true --add-output-vcf-command-line true --cloud-prefetch-buffer 40 --cloud-index-prefetch-buffer -1 
--disable-bam-index-caching false --sites-only-vcf-output false --help false --version false --showHidden false --verbosity INFO --QUIET false --use-jdk-deflater false --use-jdk-inflater false --gcs-max-retries 20 --gcs-project-for-requester-pays  --disable-tool-default-read-filters false --max-read-length 2147483647 --min-read-length 30 --minimum-mapping-quality 20 --disable-tool-default-annotations false --enable-all-annotations false",Version="4.1.2.0",Date="July 29, 2019 12:11:09 AM UTC">
##INFO=<ID=CONTQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to contamination">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
##INFO=<ID=ECNT,Number=1,Type=Integer,Description="Number of events in this haplotype">
##INFO=<ID=GERMQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not germline variants">
##INFO=<ID=MBQ,Number=R,Type=Integer,Description="median base quality">
##INFO=<ID=MFRL,Number=R,Type=Integer,Description="median fragment length">
##INFO=<ID=MMQ,Number=R,Type=Integer,Description="median mapping quality">
##INFO=<ID=MPOS,Number=A,Type=Integer,Description="median distance from end of read">
##INFO=<ID=NALOD,Number=A,Type=Float,Description="Negative log 10 odds of artifact in normal with same allele fraction as tumor">
##INFO=<ID=NCount,Number=1,Type=Integer,Description="Count of N bases in the pileup">
##INFO=<ID=NLOD,Number=A,Type=Float,Description="Normal log 10 likelihood ratio of diploid het or hom alt genotypes">
##INFO=<ID=OCM,Number=1,Type=Integer,Description="Number of alt reads whose original alignment doesn't match the current contig.">
##INFO=<ID=PON,Number=0,Type=Flag,Description="site found in panel of normals">
##INFO=<ID=POPAF,Number=A,Type=Float,Description="negative log 10 population allele frequencies of alt alleles">
##INFO=<ID=ROQ,Number=1,Type=Float,Description="Phred-scaled qualities that alt allele are not due to read orientation artifact">
##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=SEQQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles are not sequencing errors">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##INFO=<ID=STRANDQ,Number=1,Type=Integer,Description="Phred-scaled quality of strand bias artifact">
##INFO=<ID=STRQ,Number=1,Type=Integer,Description="Phred-scaled quality that alt alleles in STRs are not polymerase slippage errors">
##INFO=<ID=TLOD,Number=A,Type=Float,Description="Log 10 likelihood ratio score of variant existing versus not existing">
##INFO=<ID=UNIQ_ALT_READ_COUNT,Number=1,Type=Integer,Description="Number of ALT reads with unique start and mate end positions at a variant site">
##MutectVersion=2.2
##bcftools_concatCommand=concat -o /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_somatic.unfiltered.vcf.gz -O z /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_10.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_11.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_12.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_13.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_14.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_15.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_16.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_17.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_18.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_19.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_1.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_20.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_21.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_22.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_2.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_3.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_4.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_5.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_6.somatic.unfiltered.vcf.gz 
/home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_7.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_8.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_9.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_MT.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_X.somatic.unfiltered.vcf.gz /home/shollizeck/CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_Y.somatic.unfiltered.vcf.gz; Date=Sat Aug 10 21:18:38 2019
##bcftools_concatVersion=1.9-80-gff3137d+htslib-1.9-66-gbcf9bff
##contig=<ID=1,length=249250621>
##contig=<ID=2,length=243199373>
##contig=<ID=3,length=198022430>
##contig=<ID=4,length=191154276>
##contig=<ID=5,length=180915260>
##contig=<ID=6,length=171115067>
##contig=<ID=7,length=159138663>
##contig=<ID=8,length=146364022>
##contig=<ID=9,length=141213431>
##contig=<ID=10,length=135534747>
##contig=<ID=11,length=135006516>
##contig=<ID=12,length=133851895>
##contig=<ID=13,length=115169878>
##contig=<ID=14,length=107349540>
##contig=<ID=15,length=102531392>
##contig=<ID=16,length=90354753>
##contig=<ID=17,length=81195210>
##contig=<ID=18,length=78077248>
##contig=<ID=19,length=59128983>
##contig=<ID=20,length=63025520>
##contig=<ID=21,length=48129895>
##contig=<ID=22,length=51304566>
##contig=<ID=X,length=155270560>
##contig=<ID=Y,length=59373566>
##contig=<ID=MT,length=16569>
##contig=<ID=GL000207.1,length=4262>
##contig=<ID=GL000226.1,length=15008>
##contig=<ID=GL000229.1,length=19913>
##contig=<ID=GL000231.1,length=27386>
##contig=<ID=GL000210.1,length=27682>
##contig=<ID=GL000239.1,length=33824>
##contig=<ID=GL000235.1,length=34474>
##contig=<ID=GL000201.1,length=36148>
##contig=<ID=GL000247.1,length=36422>
##contig=<ID=GL000245.1,length=36651>
##contig=<ID=GL000197.1,length=37175>
##contig=<ID=GL000203.1,length=37498>
##contig=<ID=GL000246.1,length=38154>
##contig=<ID=GL000249.1,length=38502>
##contig=<ID=GL000196.1,length=38914>
##contig=<ID=GL000248.1,length=39786>
##contig=<ID=GL000244.1,length=39929>
##contig=<ID=GL000238.1,length=39939>
##contig=<ID=GL000202.1,length=40103>
##contig=<ID=GL000234.1,length=40531>
##contig=<ID=GL000232.1,length=40652>
##contig=<ID=GL000206.1,length=41001>
##contig=<ID=GL000240.1,length=41933>
##contig=<ID=GL000236.1,length=41934>
##contig=<ID=GL000241.1,length=42152>
##contig=<ID=GL000243.1,length=43341>
##contig=<ID=GL000242.1,length=43523>
##contig=<ID=GL000230.1,length=43691>
##contig=<ID=GL000237.1,length=45867>
##contig=<ID=GL000233.1,length=45941>
##contig=<ID=GL000204.1,length=81310>
##contig=<ID=GL000198.1,length=90085>
##contig=<ID=GL000208.1,length=92689>
##contig=<ID=GL000191.1,length=106433>
##contig=<ID=GL000227.1,length=128374>
##contig=<ID=GL000228.1,length=129120>
##contig=<ID=GL000214.1,length=137718>
##contig=<ID=GL000221.1,length=155397>
##contig=<ID=GL000209.1,length=159169>
##contig=<ID=GL000218.1,length=161147>
##contig=<ID=GL000220.1,length=161802>
##contig=<ID=GL000213.1,length=164239>
##contig=<ID=GL000211.1,length=166566>
##contig=<ID=GL000199.1,length=169874>
##contig=<ID=GL000217.1,length=172149>
##contig=<ID=GL000216.1,length=172294>
##contig=<ID=GL000215.1,length=172545>
##contig=<ID=GL000205.1,length=174588>
##contig=<ID=GL000219.1,length=179198>
##contig=<ID=GL000224.1,length=179693>
##contig=<ID=GL000223.1,length=180455>
##contig=<ID=GL000195.1,length=182896>
##contig=<ID=GL000212.1,length=186858>
##contig=<ID=GL000222.1,length=186861>
##contig=<ID=GL000200.1,length=187035>
##contig=<ID=GL000193.1,length=189789>
##contig=<ID=GL000194.1,length=191469>
##contig=<ID=GL000225.1,length=211173>
##contig=<ID=GL000192.1,length=547496>
##filtering_status=These calls have been filtered by FilterMutectCalls to label false positives with a list of failed filters and true positives with PASS.
##normal_sample=CA99
##source=FilterMutectCalls
##source=Mutect2
##tumor_sample=CA99-11
##tumor_sample=CA99-26
##tumor_sample=CA99-31
##tumor_sample=CA99-41
##tumor_sample=CA99-47
##tumor_sample=CA99-55
##tumor_sample=CA99-57
##tumor_sample=CA99-59
##bcftools_viewVersion=1.9-80-gff3137d+htslib-1.9-66-gbcf9bff
##bcftools_viewCommand=view -h CASCADE/analysis/CA99/tumor/joined/mutect2/CA99_somatic.filtered.vcf.gz; Date=Wed Sep  4 13:01:27 2019
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  CA99    CA99-11 CA99-26 CA99-31 CA99-41 CA99-47 CA99-55 CA99-57 CA99-59
10  43615014    .   G   A,C,T   .   multiallelic;normal_artifact    CONTQ=93;DP=1438;ECNT=1;GERMQ=93;MBQ=37,37,37,37;MFRL=360,407,325,369;MMQ=60,60,60,60;MPOS=35,41,29;NALOD=0.957,1.52,1.52;NLOD=26.39,28.55,28.56;POPAF=6.00,6.00,6.00;ROQ=21;SEQQ=93;STRANDQ=93;TLOD=318.68,206.55,109.89   GT:AD:AF:DP:F1R2:F2R1:SB    0/0:95,1,0,0:0.018,9.934e-03,9.933e-03:96:39,0,0,0:56,1,0,0:49,46,1,0   0/1/2/3:104,69,1,0:0.394,5.737e-03,5.589e-03:174:59,30,0,0:43,36,0,0:49,55,34,36    0/1/2/3:86,0,32,4:7.869e-03,0.263,0.037:122:47,0,16,3:36,0,16,1:50,36,19,17 0/1/2/3:115,2,2,0:0.024,0.022,8.101e-03:119:57,1,2,0:56,1,0,0:63,52,2,2 0/1/2/3:189,0,3,1:5.039e-03,0.020,0.010:193:97,0,3,1:91,0,0,0:96,93,4,0 0/1/2/3:166,0,3,30:4.824e-03,0.020,0.151:199:89,0,2,18:75,0,1,11:82,84,15,18    0/1/2/3:184,2,19,17:0.013,0.089,0.080:222:97,0,13,7:84,2,6,10:80,104,18,20  0/1/2/3:101,0,35,0:7.072e-03,0.257,7.072e-03:136:49,0,21,0:50,0,14,0:53,48,17,18    0/1/2/3:85,56,0,0:0.390,6.877e-03,6.877e-03:141:47,28,0,0:38,25,0,0:42,43,27,29

Answers

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    @micknudsen Thank you for providing input; we appreciate it! We are hoping to have more community members share their GATK expertise.

    @SHollizeck Please let us know if this helps resolve the issue.

  • SHollizeck Member

    Thank you for the idea, but before I adjust the filtering threshold I would love to understand why this variant was flagged.
    The score for this should be NALOD, right? It is 0.957 for alt allele A,

    which means an estimated probability of this being an artifact of 0.8.

    I'd rather not raise the p-value threshold that high.

    My question is why this is estimated to be that high.
    There is only one read showing A instead of the reference G at that position in the normal, and every other quality measure, like MMQ and MBQ, seems to be in order.

    And finally, why is the estimated GT in all of the samples 0/1/2/3, even though some of them do not contain read support for every alt allele?
    For example

    0/1/2/3:85,56,0,0:0.390,6.877e-03,6.877e-03:141:47,28,0,0:38,25,0,0:42,43,27,29
    

    should only be 0/1
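The gap between AD and AF in that record can be checked by recomputing the raw fraction from AD. The following is a minimal Python sketch, not part of GATK; the function names are hypothetical, and it makes no claim about how Mutect2 derives its AF values. It parses the sample column quoted above and prints the observed alt-read fraction next to the reported AF.

```python
from typing import Dict, List

# FORMAT keys and the CA99-59 sample column, copied from the record above.
FORMAT_KEYS = "GT:AD:AF:DP:F1R2:F2R1:SB"
SAMPLE = "0/1/2/3:85,56,0,0:0.390,6.877e-03,6.877e-03:141:47,28,0,0:38,25,0,0:42,43,27,29"


def parse_sample(format_keys: str, sample_field: str) -> Dict[str, str]:
    """Map each FORMAT key to its raw string value for one sample."""
    return dict(zip(format_keys.split(":"), sample_field.split(":")))


def observed_allele_fractions(fields: Dict[str, str]) -> List[float]:
    """Raw alt-read fraction per alt allele: alt depth / total depth from AD."""
    depths = [int(d) for d in fields["AD"].split(",")]
    total = sum(depths)
    if total == 0:
        return [0.0] * (len(depths) - 1)
    return [alt / total for alt in depths[1:]]


def reported_allele_fractions(fields: Dict[str, str]) -> List[float]:
    """AF values exactly as written in the VCF."""
    return [float(x) for x in fields["AF"].split(",")]


fields = parse_sample(FORMAT_KEYS, SAMPLE)
observed = observed_allele_fractions(fields)
reported = reported_allele_fractions(fields)

for alt, obs, rep in zip(["A", "C", "T"], observed, reported):
    print(f"{alt}: observed={obs:.4f} reported={rep:.4f}")
```

For this sample only allele A has supporting reads in AD, while the reported AF is small but non-zero for C and T.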

  • @davidben
    Thank you very much for the reply.

    • I assumed that the genotype information was irrelevant, but I had cases where it seemed to match up.
    • That makes sense as well, thank you very much. That means I will still calculate the actually observed AF myself.
    • I agree on the third bullet point as well.
    • This is the one I have a bit of an issue with, because I normally wouldn't know which site is under positive selection. With WGS for 8 samples I have quite a lot of variants to consider. So, just like with the AF in one sample being above a threshold, could you not include a filtering strategy that considers whether the site is multi-allelic in several samples instead of in just one, seeing that most of the samples have only one alt allele present and not several? I have not checked how many variants were tagged with this filter, but I assume it is a high number, and just ignoring that filter seems unwise for the majority of cases.
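The per-sample idea in the last bullet can be sketched as a post-hoc check on the AD values. This is a hypothetical Python illustration, not how FilterMutectCalls works: the `MIN_ALT_READS` cutoff and the function name are assumptions made for the example, and the AD values are copied from the tumour columns of the record at 10:43615014.

```python
from typing import Dict, List

ALT_ALLELES = ["A", "C", "T"]

# AD (ref, A, C, T) per tumour sample, copied from the posted record.
TUMOR_AD: Dict[str, List[int]] = {
    "CA99-11": [104, 69, 1, 0],
    "CA99-26": [86, 0, 32, 4],
    "CA99-31": [115, 2, 2, 0],
    "CA99-41": [189, 0, 3, 1],
    "CA99-47": [166, 0, 3, 30],
    "CA99-55": [184, 2, 19, 17],
    "CA99-57": [101, 0, 35, 0],
    "CA99-59": [85, 56, 0, 0],
}

# Hypothetical cutoff: an alt allele counts as present with at least this many reads.
MIN_ALT_READS = 5


def supported_alts(ad: List[int], alts: List[str], min_alt_reads: int) -> List[str]:
    """Alt alleles whose read depth in this sample reaches the cutoff."""
    return [alt for alt, depth in zip(alts, ad[1:]) if depth >= min_alt_reads]


per_sample = {
    sample: supported_alts(ad, ALT_ALLELES, MIN_ALT_READS)
    for sample, ad in TUMOR_AD.items()
}

# Largest number of supported alt alleles seen within any single sample.
max_alts_in_one_sample = max(len(alts) for alts in per_sample.values())
# Alt alleles supported in at least one sample.
alts_across_samples = sorted({alt for alts in per_sample.values() for alt in alts})

for sample, alts in per_sample.items():
    print(sample, ",".join(alts) if alts else "-")
print("max alts in one sample:", max_alts_in_one_sample)
print("alts across samples:", ",".join(alts_across_samples))
```

With this cutoff, all three alt alleles are supported somewhere across the biopsies, but no single sample carries more than two of them, so a per-sample rule would see this site differently from a rule applied across all samples at once.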

    I think my question is mainly answered, and I will wait for a new version, which changes the filtering strategy.
