VariantFiltration for non-human data

sagipolani Posts: 41 Member
edited October 2012 in Ask the GATK team

Hi all,

I'm currently analysing non-human mammalian whole-genome data (>30x coverage). No previously known variant databases are available.

I'm currently at the VariantFiltration step. I came across the following command, which is used for human data, and I'm wondering whether it is also suitable for non-human data:

java -Xmx10g -jar GenomeAnalysisTK.jar \
-R [reference.fasta] \
-T VariantFiltration \
--variant [input.recalibrated.vcf] \
-o [recalibrated.filtered.vcf] \
--clusterWindowSize 10 \
--filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
--filterName "HARD_TO_VALIDATE" \
--filterExpression "DP < 5 " \
--filterName "LowCoverage" \
--filterExpression "QUAL < 30.0 " \
--filterName "VeryLowQual" \
--filterExpression "QUAL > 30.0 && QUAL < 50.0 " \
--filterName "LowQual" \
--filterExpression "QD < 1.5 " \
--filterName "LowQD" \
--filterExpression "SB > -10.0 " \
--filterName "StrandBias"

I would appreciate your thoughts on this matter.

Thank you very much!

Sagi
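
One detail of the command above may be worth checking before reusing it: the two QUAL expressions use strict inequalities, so a site with QUAL exactly 30.0 matches neither of them. The sketch below only mimics those two comparisons with awk; it is not GATK's JEXL evaluation, and `classify_qual` is a hypothetical helper name introduced for illustration.

```shell
# Hypothetical helper: mimics only the two QUAL expressions from the command
# above ("QUAL < 30.0" and "QUAL > 30.0 && QUAL < 50.0"). Not GATK itself.
classify_qual() {
  awk -v q="$1" 'BEGIN {
    if (q < 30.0)                  print "VeryLowQual"
    else if (q > 30.0 && q < 50.0) print "LowQual"
    else                           print "PASS"   # neither QUAL filter matched
  }'
}

classify_qual 29.9
classify_qual 30.0
classify_qual 45
```

If the boundary value should also be flagged, one option is to write the second expression with `>=` instead of `>`.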

Answers

  • xujing China Posts: 1 Member

    Hi all,
    I saw a command that uses GATK VariantFiltration with --filterExpression, and I cannot understand the meaning of some of its arguments. The command is --filterExpression "QD < 20.0 || ReadPosRankSum < -8.0 || FS > 10.0 || QUAL < $MEANQUAL". I want to know the meaning of the parts in bold. Can you help me?

  • Geraldine_VdAuwera Posts: 7,364 Administrator, GATK Developer

    Hi @xujing,

    Please read the documentation on JEXL expressions. Use the search box in the top right corner of the page if you don't know where to find the documentation.

    Geraldine Van der Auwera, PhD
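
As a rough illustration of how the expression in the question reads: `||` is a logical OR, so a site is flagged when any one of the four conditions is true. `$MEANQUAL` is most likely a shell variable that is substituted before GATK receives the string. The thread does not say where it is set, so the value below is purely an assumption for the sketch.

```shell
# Assumed value for illustration only; the thread does not show how MEANQUAL is set.
MEANQUAL=50.0

# Inside double quotes the shell replaces $MEANQUAL with its value,
# so the program receiving this string sees a plain number.
FILTER_EXPR="QD < 20.0 || ReadPosRankSum < -8.0 || FS > 10.0 || QUAL < $MEANQUAL"

echo "$FILTER_EXPR"
# prints: QD < 20.0 || ReadPosRankSum < -8.0 || FS > 10.0 || QUAL < 50.0
```

Printing the expression this way before running the tool is a simple check that the variable was actually set and expanded as intended.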
