Attention:
The frontline support team will be offline as we are occupied with the GATK Workshop on March 21st and 22nd 2019. We will be back and available to answer questions on the forum on March 25th 2019.

GATK4 - VariantFiltration --genotype-filter-expression

thatguy027thatguy027 ChicagoMember

Hello there,
I am trying to apply some sample-level filters on a VCF generated using GATK4.0.2.1. My issue is that all variant sites are not getting an FT flag added and I am wondering why. Additionally, "PASS" is being added the the FILTER column at the variant-level (I am not sure if this behavior is expected, but it seems weird)

Here is some information about the system:

17:43:04.589 DEBUG NativeLibraryLoader - Extracting libgkl_compression.so to /tmp/szs315/libgkl_compression8694733123384787175.so
17:43:04.681 INFO  VariantFiltration - ------------------------------------------------------------
17:43:04.681 INFO  VariantFiltration - The Genome Analysis Toolkit (GATK) v4.0.2.1
17:43:04.681 INFO  VariantFiltration - For support and documentation go to https://software.broadinstitute.org/gatk/
17:43:04.681 INFO  VariantFiltration - Executing as [email protected] on Linux v3.10.0-514.36.5.el7.x86_64 amd64
17:43:04.681 INFO  VariantFiltration - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_112-b16
17:43:04.682 INFO  VariantFiltration - Start Date/Time: March 11, 2018 6:43:04 PM CDT
17:43:04.682 INFO  VariantFiltration - ------------------------------------------------------------
17:43:04.682 INFO  VariantFiltration - ------------------------------------------------------------
17:43:04.682 INFO  VariantFiltration - HTSJDK Version: 2.14.3
17:43:04.682 INFO  VariantFiltration - Picard Version: 2.17.2
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.BUFFER_SIZE : 131072
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.COMPRESSION_LEVEL : 1
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.CREATE_INDEX : false
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.CREATE_MD5 : false
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.CUSTOM_READER_FACTORY : 
17:43:04.684 INFO  VariantFiltration - HTSJDK Defaults.DISABLE_SNAPPY_COMPRESSOR : false
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.EBI_REFERENCE_SERVICE_URL_MASK : https://www.ebi.ac.uk/ena/cram/md5/%s
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.NON_ZERO_BUFFER_SIZE : 131072
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.REFERENCE_FASTA : null
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
17:43:04.685 INFO  VariantFiltration - HTSJDK Defaults.USE_CRAM_REF_DOWNLOAD : false
17:43:04.685 DEBUG ConfigFactory - Configuration file values: 
17:43:04.688 DEBUG ConfigFactory -  gcsMaxRetries = 20
17:43:04.688 DEBUG ConfigFactory -  gatk_stacktrace_on_user_exception = false
17:43:04.688 DEBUG ConfigFactory -  samjdk.use_async_io_read_samtools = false
17:43:04.688 DEBUG ConfigFactory -  samjdk.compression_level = 1
17:43:04.688 DEBUG ConfigFactory -  samjdk.use_async_io_write_samtools = true
17:43:04.688 DEBUG ConfigFactory -  samjdk.use_async_io_write_tribble = false
17:43:04.688 DEBUG ConfigFactory -  spark.kryoserializer.buffer.max = 512m
17:43:04.688 DEBUG ConfigFactory -  spark.driver.maxResultSize = 0
17:43:04.688 DEBUG ConfigFactory -  spark.driver.userClassPathFirst = true
17:43:04.688 DEBUG ConfigFactory -  spark.io.compression.codec = lzf
17:43:04.688 DEBUG ConfigFactory -  spark.yarn.executor.memoryOverhead = 600
17:43:04.689 DEBUG ConfigFactory -  spark.driver.extraJavaOptions = 
17:43:04.689 DEBUG ConfigFactory -  spark.executor.extraJavaOptions = 
17:43:04.689 DEBUG ConfigFactory -  codec_packages = [htsjdk.variant, htsjdk.tribble, org.broadinstitute.hellbender.utils.codecs]
17:43:04.689 DEBUG ConfigFactory -  cloudPrefetchBuffer = 40
17:43:04.689 DEBUG ConfigFactory -  cloudIndexPrefetchBuffer = -1
17:43:04.689 DEBUG ConfigFactory -  createOutputBamIndex = true
17:43:04.689 INFO  VariantFiltration - Deflater: IntelDeflater
17:43:04.689 INFO  VariantFiltration - Inflater: IntelInflater
17:43:04.689 INFO  VariantFiltration - GCS max retries/reopens: 20
17:43:04.689 INFO  VariantFiltration - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
17:43:04.689 INFO  VariantFiltration - Initializing engine

Here is the command I used to apply the filters

 gatk-launch VariantFiltration \
-variant wild_isolate.vcf.gz \
--genotype-filter-expression "DP < 2" \
--genotype-filter-name "depth" \
-O wi_dp_tet.vcf  \
--verbosity DEBUG \
--seconds-between-progress-updates 0.1 \
--disable-tool-default-read-filters true \
--lenient true \
--disable-sequence-dictionary-validation true \
--disable-bam-index-caching true

I added the --verbosity flag and all other flags below --verbosity after I noticed some variants were not receiving the FT field. I thought there may be some default filters being applied that may results in variants being skipped (maybe these flags need to be applied at previous steps?). I ran this step with and without those flags, and with/without the -R flag.

I am running this on a test data set to make sure my pipeline is working properly... 45576 variants are not receiving the FT field and 127762 variants did receive the FT field. Also, not that I am not going through the VQSR procedure because I do not have a truth set.

As for the steps proceeding VariantFiltration, I ran HaplotypeCaller in DISCOVERY with ERC GVCF (in chromosome blocks), performed ValidateVariants, combined chromosome gVCFs for each each sample using CombineGVCFs, combined individual sample gVCFs with GenomicsDBImport, and then ran GenotypeGVCFs on individual chromosomes, and collapsed the chromosome VCFs using GatherVcfs.

Here are the last few entries of test VCF, highlighting the inconsistent FORMAT/FT field.

MtDNA   12998   .   C   A,T 2457.39 PASS    AC=8,6;AF=0.571,0.429;AN=14;AS_QD=15.04,31.74;DP=74;ExcessHet=3.0103;FS=0.000;GQ_MEAN=31.14;GQ_STDDEV=28.46;MLEAC=8,6;MLEAF=0.571,0.429;MQ=59.59;NCC=1;QD=33.66;SOR=0.720   GT:AD:DP:GQ:PL  1/1:0,2,0:2:6:80,6,0,80,6,80    2/2:0,0,2:2:6:83,83,83,6,6,0    1/1:0,3,0:3:9:125,9,0,125,9,125 ./.:1,0,0:1:.:0,0,0,0,0,0   1/1:0,22,0:22:66:817,66,0,817,66,817    1/1:0,8,0:8:24:235,24,0,235,24,235  2/2:0,0,11:11:33:383,383,383,33,33,0    2/2:0,0,25:25:74:749,749,749,74,74,0
MtDNA   13029   .   T   C   74.63   PASS    AC=2;AF=0.125;AN=16;AS_QD=32.99;DP=62;ExcessHet=0.1472;FS=0.000;GQ_MEAN=22.13;GQ_STDDEV=20.47;MLEAC=1;MLEAF=0.063;MQ=60.00;NCC=0;QD=26.41;SOR=0.693 GT:AD:DP:FT:GQ:PL   1/1:0,2:2:PASS:6:90,6,0 0/0:1,0:1:depth:3:0,3,34    0/0:5,0:5:PASS:15:0,15,195  0/0:1,0:1:depth:3:0,3,32    0/0:18,0:18:PASS:48:0,48,720    0/0:7,0:7:PASS:21:0,21,213  0/0:8,0:8:PASS:24:0,24,288  0/0:20,0:20:PASS:57:0,57,855
MtDNA   13069   .   T   C   2144.05 PASS    AC=12;AF=1.00;AN=12;AS_QD=27.59;DP=51;ExcessHet=3.0103;FS=0.000;GQ_MEAN=25.50;GQ_STDDEV=13.52;MLEAC=14;MLEAF=1.00;MQ=60.00;NCC=2;QD=30.55;SOR=0.994 GT:AD:DP:GQ:PL  1/1:0,2:2:6:87,6,0  ./.:0,0:0:.:0,0,0   1/1:0,7:7:21:292,21,0   ./.:0,0:0:.:0,0,0   1/1:0,12:12:36:531,36,0 1/1:0,7:7:21:259,21,0   1/1:0,8:8:24:334,24,0   1/1:0,15:15:45:620,45,0
MtDNA   13208   .   C   T   788.24  PASS    AC=6;AF=0.500;AN=12;AS_QD=25.73;DP=53;ExcessHet=0.1809;FS=0.000;GQ_MEAN=20.00;GQ_STDDEV=19.22;MLEAC=8;MLEAF=0.667;MQ=60.00;NCC=2;QD=28.92;SOR=1.127 GT:AD:DP:GQ:PL  ./.:0,0:0:.:0,0,0   0/0:2,0:2:6:0,6,65  1/1:0,4:4:12:157,12,0   ./.:0,0:0:.:0,0,0   1/1:0,8:8:24:341,24,0   1/1:0,8:8:24:303,24,0   0/0:13,0:13:0:0,0,353   0/0:18,0:18:54:0,54,472
MtDNA   13344   .   G   A   226.02  PASS    AC=2;AF=0.200;AN=10;AS_QD=28.25;DP=17;ExcessHet=0.2482;FS=0.000;GQ_MEAN=9.60;GQ_STDDEV=8.85;MLEAC=3;MLEAF=0.300;MQ=60.00;NCC=3;QD=28.25;SOR=1.179   GT:AD:DP:FT:GQ:PL   0/0:1,0:1:depth:3:0,3,39    ./.:0,0:0:PASS:.:0,0,0  ./.:0,0:0:PASS:.:0,0,0  ./.:0,0:0:PASS:.:0,0,0  0/0:2,0:2:PASS:3:0,3,45 0/0:4,0:4:PASS:12:0,12,136  0/0:2,0:2:PASS:6:0,6,88 1/1:0,8:8:PASS:24:239,24,0
MtDNA   13700   .   TA  T   49.17   PASS    AC=2;AF=0.250;AN=8;AS_QD=24.58;DP=24;ExcessHet=0.3218;FS=0.000;GQ_MEAN=17.25;GQ_STDDEV=7.89;MLEAC=2;MLEAF=0.250;MQ=48.99;NCC=4;QD=24.58;RPA=8,7;RU=A;SOR=2.303;STR  GT:AD:DP:GQ:PL  ./.:0,0:0:.:0,0,0   ./.:0,0:0:.:0,0,0   ./.:1,0:1:.:0,0,0   ./.:0,0:0:.:0,0,0   0/0:7,0:7:21:0,21,298   0/0:6,0:6:18:0,18,141   1/1:0,2:2:6:61,6,0  0/0:8,0:8:24:0,24,211

Any and all helps is appreciated! I'm hoping it is something simple!

Thanks

Answers

  • thatguy027thatguy027 ChicagoMember

    looking back at my test runs from yesterday, i had a genotype-level filter of DP<10 and I got a lot more FT fields. I'm guessing that if, for a a given site, no samples are marked by the filter, PASS is placed in the FILTER column and no FT field is added. Is this correct?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @thatguy027
    Hi,

    I'm guessing that if, for a a given site, no samples are marked by the filter, PASS is placed in the FILTER column and no FT field is added. Is this correct?

    It should be correct, but notice at position MtDNA:13344, the FILTER column has PASS even though one of the samples fails the genotype filter. I don't think this should happen. Can you submit a bug report? Instructions are here.

    Thanks,
    Sheila

  • MishMish Member
    edited June 2018

    I faced the same problem. is there any solution for this problem ?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Mish
    Hi,

    Can you post some example records that have the FT field and don't have the FT field? Can you also try with the very latest version if you have not already?

    Thanks,
    Sheila

Sign In or Register to comment.