Multiallelic support for the VCF QD/FS fields

wang_yugui2 china,beijing Member

Hi,

Is there a way to make the VCF QD/FS fields support multiallelic sites?
I want to filter a VCF by the QD/FS INFO annotations for RNA data.

1) The QD/FS fields are not declared with Number 'A' in the VCF header. Per the VCF specification: "If the field has one value per alternate allele then this value should be 'A'."

2) There is no way to make GATK output the alleles of a multiallelic site as separate VCF records.
At present, all alleles of a multiallelic site share the same QD/FS value.

Best Regards.
Wang Yugui
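
To illustrate the point about per-allele fields, here is a minimal Python sketch. It assumes a hypothetical two-ALT record in which AC is declared Number=A while QD and FS are declared Number=1; the record values and the helper names (`parse_info`, `per_allele_view`) are made up for illustration and are not part of GATK.

```python
# Hypothetical header declarations: AC has one value per ALT allele (Number=A),
# while QD and FS have a single value for the whole site (Number=1).
HEADER_NUMBER = {"AC": "A", "QD": "1", "FS": "1"}

def parse_info(info):
    """Turn 'K1=v1;K2=v2' into a dict of string values."""
    return dict(item.split("=", 1) for item in info.split(";") if "=" in item)

def per_allele_view(alts, info):
    """Return one dict per ALT allele, repeating site-level values."""
    fields = parse_info(info)
    rows = []
    for i, alt in enumerate(alts.split(",")):
        row = {"ALT": alt}
        for key, value in fields.items():
            if HEADER_NUMBER.get(key) == "A":
                row[key] = value.split(",")[i]  # allele-specific value
            else:
                row[key] = value                # shared by every ALT allele
        rows.append(row)
    return rows

# Made-up multiallelic record with ALT alleles G and T.
rows = per_allele_view("G,T", "AC=3,1;QD=12.5;FS=4.2")
for row in rows:
    print(row)
```

Splitting the record this way gives each allele its own AC, but QD and FS are simply copied, which is the limitation described above.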


Answers

  • Sheila Broad Institute Member, Broadie, Moderator admin

    @wang_yugui2
    Hi Wang Yugui,

    1) The QD gives the overall confidence that there is a variant at the site across all the samples. However, you may want to look into filtering on GQ, because the GQ gives the confidence in the genotype of each individual sample. For strand bias, you can look into StrandAlleleCountsBySample. https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_StrandAlleleCountsBySample.php

    2) This thread will give you some insight into why we don't want to support splitting multiallelic sites. http://gatkforums.broadinstitute.org/discussion/3148/how-to-deal-with-multiallelic-sites-in-the-vcf

    -Sheila
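
The GQ-based genotype filtering suggested above could be sketched as follows. This is a minimal illustration in plain Python, not a GATK tool: the function name `mask_low_gq` and the threshold of 20 are assumptions chosen only for the example.

```python
def mask_low_gq(format_field, sample_field, min_gq=20):
    """Return the sample field with GT set to './.' when GQ < min_gq.

    min_gq=20 is an illustrative threshold, not a GATK recommendation.
    """
    keys = format_field.split(":")
    values = sample_field.split(":")
    if "GQ" not in keys or "GT" not in keys:
        return sample_field
    gq_index = keys.index("GQ")
    # VCF allows trailing fields to be dropped, so GQ may be absent.
    if gq_index >= len(values) or values[gq_index] == ".":
        return sample_field
    if int(values[gq_index]) < min_gq:
        values[keys.index("GT")] = "./."
    return ":".join(values)

# Made-up sample columns for a FORMAT of GT:AD:DP:GQ.
print(mask_low_gq("GT:AD:DP:GQ", "0/1:10,3:13:12"))
print(mask_low_gq("GT:AD:DP:GQ", "0/1:10,8:18:99"))
```

Because GQ is a per-sample FORMAT field, this filter acts on one genotype at a time, unlike QD, which describes the site as a whole.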

  • wang_yugui2 china,beijing Member

    According to https://www.broadinstitute.org/gatk/guide/tagged?tag=genotype-quality:
    "It is more effective to filter on site-level annotations first, then refine and filter genotypes as appropriate.
    That's the workflow we recommend, based on years of experience doing this at fairly large scales."

    If a site has a lot of background noise (small QD or large FS), then none of the alleles at that site will be reliable.

    But even if a site has little background noise, would it be better to check the QD/FS of each allele for low-coverage
    data such as RNA-seq and tumor samples before we check the GQ/SAC of a single sample?

    Best Regards
    Wang Yugui
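
The "site-level first" step of the workflow quoted above might look like the sketch below. The cutoffs (QD below 2.0, FS above 30.0) are assumptions for illustration and should be tuned on your own data; the helper names are hypothetical.

```python
def site_filters(info, min_qd=2.0, max_fs=30.0):
    """Return the names of the site-level filters that an INFO string fails.

    min_qd and max_fs are illustrative defaults, not validated cutoffs.
    """
    fields = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    failed = []
    if "QD" in fields and float(fields["QD"]) < min_qd:
        failed.append("QD")
    if "FS" in fields and float(fields["FS"]) > max_fs:
        failed.append("FS")
    return failed

def filter_column(info):
    """Build the value of the VCF FILTER column for one record."""
    failed = site_filters(info)
    return ";".join(failed) if failed else "PASS"

# Made-up INFO strings: the first site is noisy, the second is clean.
print(filter_column("QD=1.5;FS=45.0"))
print(filter_column("QD=12.0;FS=3.1"))
```

Only sites that come out as PASS would then move on to genotype-level checks such as GQ.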

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    We have less experience with RNAseq than with germline DNA, so we cannot yet provide very detailed recommendations for filtering callsets derived from that data type. If you are dealing with RNAseq, we have a few recommendations, but ultimately you will need to experiment to find what works best on your data.

    Tumor data is yet another case, which is handled rather differently at the moment.

    What type of data are you currently working with?

  • wang_yugui2 china,beijing Member

    I'm working with both RNA-seq and tumor data.

    For tumor studies, the most widely used design is the Normal/Tumor pair.
    For a single pair, Mutect/VarScan2 run a Fisher exact test to find the tumor's SNPs/indels.
    For many pairs (such as 100 pairs), we need to run a Fisher exact test across all samples to find the SNPs/indels of the tumor group,
    and for that we need GVCF support (records for non-variant sites). But Mutect/VarScan2 do NOT support GVCF (non-variant sites) well,
    so I'm trying out GATK/Samtools for tumor data.

    For the normal group, I want to use HaplotypeCaller in GVCF mode with the -contamination 0.15 option as a filter.
    For the tumor group, I want to use HaplotypeCaller in GVCF mode without the -contamination option, and then jointly genotype everything together.

    I am also trying to correct genotypes using the information provided in the VCF file, so that I end up with a VCF of all samples that is good enough
    to run a Fisher exact test across all samples.

    The GitHub version of samtools/bcftools is beginning to support INFO/AD (Number=R), INFO/ADF (Number=R) and INFO/ADR (Number=R),
    which provide part of the information needed for a multiallelic FS.

    samtools/bcftools 1.2 does NOT yet support INFO/ADF and INFO/ADR.

    I have no further information about QD for now.
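
As a rough sketch of how the forward/reverse allele depths mentioned above could feed a per-allele strand-bias score, the Python below runs a two-sided Fisher exact test on REF versus each ALT allele and Phred-scales the p-value. This is not GATK's FS implementation; the function names and the example depths are assumptions made for illustration.

```python
from math import comb, log10

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)

def per_allele_strand_bias(adf, adr):
    """Phred-scaled strand-bias score for each ALT allele.

    adf and adr are Number=R lists: index 0 is REF, 1.. are ALT alleles.
    """
    scores = []
    for i in range(1, len(adf)):
        p = fisher_exact_two_sided(adf[0], adr[0], adf[i], adr[i])
        scores.append(max(0.0, -10 * log10(max(p, 1e-300))))
    return scores

# Made-up depths: the first ALT is balanced, the second is forward-only.
scores = per_allele_strand_bias([10, 10, 8], [10, 10, 0])
print(scores)
```

Each ALT allele gets its own score here, so a strand-biased allele can be flagged without penalizing a well-supported allele at the same site.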
