Attention:
The frontline support team will be unavailable to answer questions on April 15th and 17th 2019. We will be back soon after. Thank you for your patience and we apologize for any inconvenience!

about filtering the somatic variants

BogdanBogdan Palo Alto, CAMember ✭✭

Dear Sheila, and Geraldine,

please could you advice on filtering the somatic variants based on FS (Phred-scaled p-value using Fisher's exact test to detect strand bias):
starting from a vcf file produced by MUTECT2. I am doing the following :

a. to add the FS field :

**$GATK -T VariantAnnotator \
-R $REFERENCE_HG38 \
-I input.bam \
-V input.vcf \
-A FisherStrand \
-A StrandOddsRatio \
-A BaseCounts \
-A HomopolymerRun \
-o output.vcf \
--disable_auto_index_creation_and_locking_when_reading_rods **

b. to do the filtering :

**$GATK -T VariantFiltration \
-R $REFERENCE_HG38 \
--variant output.vcf \
-o output.filtered.vcf \
--filterExpression "FS > 60" --filterName "StrandBiasFisherFilter" \
--disable_auto_index_creation_and_locking_when_reading_rods **

However the second part is not running properly: it says :

WARN 18:18:55,595 Interpreter - ![0,2]: 'FS > 60;' undefined variable FS
WARN 18:18:55,595 Interpreter - ![0,2]: 'FS > 60;' undefined variable FS
WARN 18:18:55,596 Interpreter - ![0,2]: 'FS > 60;' undefined variable FS

although FS is certainly in the VCF files (please see below). thanks a lot !

** chr21 5216709 . A T . alt_allele_in_normal;clustered_events BaseCounts=241,0,1,8;ECNT=10;FS=2.572;HCNT=1;HRun=0;MAX_ED=53;MIN_ED=8;NLOD=25.05;SOR=1.491;TLOD=12.
01 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:234,5:0.021:4:1:0.200:0|1:5216709_A_T:5987,125:121:113 0/0:154,4:0.032:2:2:0.500:0|1:5216709_A_T:3895,102:7
8:76
chr21 5216717 . C T . clustered_events;homologous_mapping_event BaseCounts=0,231,0,19;ECNT=10;FS=11.684;HCNT=4;HRun=0;MAX_ED=53;MIN_ED=8;NLOD=46.35;SOR=2.27
9;TLOD=20.60 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:244,7:0.032:5:2:0.286:0|1:5216717_C_T:5801,171:125:119 0/0:153,0:0.00:0:0:.:0|1:5216717_C_T:3636,0:
78:75**

Best Answers

Answers

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Now I see ... I am getting these messages because some records in the VCF files do not have the FS field annotated by VariantAnnotator, especially the INDELS : an example is below :smile:

    chr21 44463137 . T TTAAA . clustered_events;multi_event_alt_allele_in_normal;t_lod_fstar ECNT=9;HCNT=4;MAX_ED=26;MIN_ED=7;NLOD=2.72;TLOD=5.14 GT:AD:AF:ALT_F1R2:ALT_F2R1:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:62,2:0.031:0:2:0|1:44463117_C_T:1396,42:33:29 0/0:20,1:0.042:0:1:0|1:44463117_C_T:483,17:12:8
    chr21 44463138 . AGGG A . clustered_events;multi_event_alt_allele_in_normal;t_lod_fstar ECNT=9;HCNT=4;MAX_ED=26;MIN_ED=7;NLOD=3.03;TLOD=5.11 GT:AD:AF:ALT_F1R2:ALT_F2R1:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:61,2:0.030:0:2:0|1:44463117_C_T:1397,53:32:29 0/0:22,1:0.040:0:1:0|1:44463117_C_T:545,11:13:9
    chr21 44463143 . AGGGCC A . clustered_events;multi_event_alt_allele_in_normal;t_lod_fstar ECNT=9;HCNT=4;MAX_ED=26;MIN_ED=7;NLOD=2.72;TLOD=5.12 GT:AD:AF:ALT_F1R2:ALT_F2R1:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:63,2:0.031:0:2:0|1:44463117_C_T:1481,47:35:28 0/0:23,1:0.042:0:1:0|1:44463117_C_T:553,11:14:9

    However, I am still a bit confused ... as HaplotypeCaller computes the FS for INDELS, could Mutect1 or Mutect2 do the same (possibly) ? thanks !

  • BogdanBogdan Palo Alto, CAMember ✭✭

    I was trying, the program , Mutect2 is still running, although it gives some errors during the run :smiley: (below)
    will let you know how it finishes ... ;)

    ERROR 17:25:15,810 MuTect2 - Null qss at 5217320
    ERROR 17:25:15,810 MuTect2 - Null qss at 5217326
    ERROR 17:25:15,811 MuTect2 - Null qss at 5217331
    ERROR 17:25:18,453 MuTect2 - Null qss at 5217469
    ERROR 17:25:18,453 MuTect2 - Null qss at 5217482
    ERROR 17:25:18,454 MuTect2 - Null qss at 5217490
    ERROR 17:25:18,454 MuTect2 - Null qss at 5217503
    ERROR 17:25:18,454 MuTect2 - Null qss at 5217510

  • BogdanBogdan Palo Alto, CAMember ✭✭

    The command line being :smile:
    $GATK \
    -T MuTect2 \
    -R $REFERENCE_HG38 \
    -L chr21 \
    -A FisherStrand \
    -I:normal normal.IR.RC.bam \
    -I:tumor tumor.IR.RC.bam \
    -o test.fisher.strand.MUTECT2.vcf \
    --disable_auto_index_creation_and_locking_when_reading_rods

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Yes, MUTECt2 finishes the run shall I add the -A FisherStrand , and it excludes the mutations that are highly strand biased.
    although it gives the errors such as the ones below along the run. Shall I worry about those ? thanks !

    ERROR 17:25:15,810 MuTect2 - Null qss at 5217320
    ERROR 17:25:15,810 MuTect2 - Null qss at 5217326

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Dear Sheila, could I also add the field -A when running MUTECT1 ?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Bogdan
    Hi Bogdan,

    I don't think you can add -A with the original MuTect.

    -Sheila

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Thanks Sheila, a question though : during MUTECt2 run (with -A FisherStrand), do you think it is safe to ignore the errors such as

    ERROR 17:25:15,810 MuTect2 - Null qss at 5217320
    ERROR 17:25:15,810 MuTect2 - Null qss at 5217326

    Issue · Github
    by Sheila

    Issue Number
    824
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    vdauwera
  • BogdanBogdan Palo Alto, CAMember ✭✭

    thanks Geraldine , and happy weekend !

Sign In or Register to comment.