We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

about filtering the somatic variants

BogdanBogdan Palo Alto, CAMember ✭✭

Dear Sheila, and Geraldine,

please could you advice on filtering the somatic variants based on FS (Phred-scaled p-value using Fisher's exact test to detect strand bias):
starting from a vcf file produced by MUTECT2. I am doing the following :

a. to add the FS field :

**$GATK -T VariantAnnotator \
-R $REFERENCE_HG38 \
-I input.bam \
-V input.vcf \
-A FisherStrand \
-A StrandOddsRatio \
-A BaseCounts \
-A HomopolymerRun \
-o output.vcf \
--disable_auto_index_creation_and_locking_when_reading_rods **

b. to do the filtering :

**$GATK -T VariantFiltration \
-R $REFERENCE_HG38 \
--variant output.vcf \
-o output.filtered.vcf \
--filterExpression "FS > 60" --filterName "StrandBiasFisherFilter" \
--disable_auto_index_creation_and_locking_when_reading_rods **

However the second part is not running properly: it says :

WARN 18:18:55,595 Interpreter - ![0,2]: 'FS > 60;' undefined variable FS
WARN 18:18:55,595 Interpreter - ![0,2]: 'FS > 60;' undefined variable FS
WARN 18:18:55,596 Interpreter - ![0,2]: 'FS > 60;' undefined variable FS

although FS is certainly in the VCF files (please see below). thanks a lot !

** chr21 5216709 . A T . alt_allele_in_normal;clustered_events BaseCounts=241,0,1,8;ECNT=10;FS=2.572;HCNT=1;HRun=0;MAX_ED=53;MIN_ED=8;NLOD=25.05;SOR=1.491;TLOD=12.
01 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:234,5:0.021:4:1:0.200:0|1:5216709_A_T:5987,125:121:113 0/0:154,4:0.032:2:2:0.500:0|1:5216709_A_T:3895,102:7
8:76
chr21 5216717 . C T . clustered_events;homologous_mapping_event BaseCounts=0,231,0,19;ECNT=10;FS=11.684;HCNT=4;HRun=0;MAX_ED=53;MIN_ED=8;NLOD=46.35;SOR=2.27
9;TLOD=20.60 GT:AD:AF:ALT_F1R2:ALT_F2R1:FOXOG:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:244,7:0.032:5:2:0.286:0|1:5216717_C_T:5801,171:125:119 0/0:153,0:0.00:0:0:.:0|1:5216717_C_T:3636,0:
78:75**

Best Answers

Answers

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Now I see ... I am getting these messages because some records in the VCF files do not have the FS field annotated by VariantAnnotator, especially the INDELS : an example is below :smile:

    chr21 44463137 . T TTAAA . clustered_events;multi_event_alt_allele_in_normal;t_lod_fstar ECNT=9;HCNT=4;MAX_ED=26;MIN_ED=7;NLOD=2.72;TLOD=5.14 GT:AD:AF:ALT_F1R2:ALT_F2R1:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:62,2:0.031:0:2:0|1:44463117_C_T:1396,42:33:29 0/0:20,1:0.042:0:1:0|1:44463117_C_T:483,17:12:8
    chr21 44463138 . AGGG A . clustered_events;multi_event_alt_allele_in_normal;t_lod_fstar ECNT=9;HCNT=4;MAX_ED=26;MIN_ED=7;NLOD=3.03;TLOD=5.11 GT:AD:AF:ALT_F1R2:ALT_F2R1:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:61,2:0.030:0:2:0|1:44463117_C_T:1397,53:32:29 0/0:22,1:0.040:0:1:0|1:44463117_C_T:545,11:13:9
    chr21 44463143 . AGGGCC A . clustered_events;multi_event_alt_allele_in_normal;t_lod_fstar ECNT=9;HCNT=4;MAX_ED=26;MIN_ED=7;NLOD=2.72;TLOD=5.12 GT:AD:AF:ALT_F1R2:ALT_F2R1:PGT:PID:QSS:REF_F1R2:REF_F2R1 0/1:63,2:0.031:0:2:0|1:44463117_C_T:1481,47:35:28 0/0:23,1:0.042:0:1:0|1:44463117_C_T:553,11:14:9

    However, I am still a bit confused ... as HaplotypeCaller computes the FS for INDELS, could Mutect1 or Mutect2 do the same (possibly) ? thanks !

  • BogdanBogdan Palo Alto, CAMember ✭✭

    I was trying, the program , Mutect2 is still running, although it gives some errors during the run :smiley: (below)
    will let you know how it finishes ... ;)

    ERROR 17:25:15,810 MuTect2 - Null qss at 5217320
    ERROR 17:25:15,810 MuTect2 - Null qss at 5217326
    ERROR 17:25:15,811 MuTect2 - Null qss at 5217331
    ERROR 17:25:18,453 MuTect2 - Null qss at 5217469
    ERROR 17:25:18,453 MuTect2 - Null qss at 5217482
    ERROR 17:25:18,454 MuTect2 - Null qss at 5217490
    ERROR 17:25:18,454 MuTect2 - Null qss at 5217503
    ERROR 17:25:18,454 MuTect2 - Null qss at 5217510

  • BogdanBogdan Palo Alto, CAMember ✭✭

    The command line being :smile:
    $GATK \
    -T MuTect2 \
    -R $REFERENCE_HG38 \
    -L chr21 \
    -A FisherStrand \
    -I:normal normal.IR.RC.bam \
    -I:tumor tumor.IR.RC.bam \
    -o test.fisher.strand.MUTECT2.vcf \
    --disable_auto_index_creation_and_locking_when_reading_rods

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Yes, MUTECt2 finishes the run shall I add the -A FisherStrand , and it excludes the mutations that are highly strand biased.
    although it gives the errors such as the ones below along the run. Shall I worry about those ? thanks !

    ERROR 17:25:15,810 MuTect2 - Null qss at 5217320
    ERROR 17:25:15,810 MuTect2 - Null qss at 5217326

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Dear Sheila, could I also add the field -A when running MUTECT1 ?

  • SheilaSheila Broad InstituteMember, Broadie ✭✭✭✭✭

    @Bogdan
    Hi Bogdan,

    I don't think you can add -A with the original MuTect.

    -Sheila

  • BogdanBogdan Palo Alto, CAMember ✭✭

    Thanks Sheila, a question though : during MUTECt2 run (with -A FisherStrand), do you think it is safe to ignore the errors such as

    ERROR 17:25:15,810 MuTect2 - Null qss at 5217320
    ERROR 17:25:15,810 MuTect2 - Null qss at 5217326

    Issue · Github
    by Sheila

    Issue Number
    824
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    vdauwera
  • BogdanBogdan Palo Alto, CAMember ✭✭

    thanks Geraldine , and happy weekend !

Sign In or Register to comment.