About the variant filtration process

MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member)

Hi,
I have read the tutorial about "(howto) Apply hard filters to a call set" at
https://www.broadinstitute.org/gatk/guide/article?id=2806

Before variant filtration, step 1 is to extract the SNPs from the call set.
The sample code in the tutorial was:
java -jar GenomeAnalysisTK.jar \
-T SelectVariants \
-R reference.fa \
-V raw_variants.vcf \
-L 20 \
-selectType SNP \
-o raw_snps.vcf

Why is the -L 20 option used here? I know it is the interval option, but why is the value explicitly set to 20?
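Conceptually, the two selection options in that command act as record filters: -L 20 keeps only records on contig "20", and -selectType SNP keeps only single-base substitutions. The sketch below is a simplified, hypothetical illustration on plain VCF text lines, not how GATK implements SelectVariants. The function names `select_snps` and `is_snp` are made up for this example, and the SNP test is a simplification (REF and every ALT allele must be one of A/C/G/T; symbolic and spanning-deletion alleles are ignored).

```python
from typing import Iterable, Iterator, Optional

BASES = set("ACGT")


def is_snp(ref: str, alt: str) -> bool:
    """Simplified SNP test: REF and every comma-separated ALT allele are single bases."""
    return ref.upper() in BASES and all(a.upper() in BASES for a in alt.split(","))


def select_snps(lines: Iterable[str], contig: Optional[str] = None) -> Iterator[str]:
    """Yield header lines plus SNP records, optionally restricted to one contig.

    contig=None behaves like dropping -L (whole genome);
    contig="20" behaves like -L 20 (chromosome 20 only).
    """
    for line in lines:
        if line.startswith("#"):
            yield line  # pass header lines through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, ref, alt = fields[0], fields[3], fields[4]
        if contig is not None and chrom != contig:
            continue  # record lies outside the requested interval
        if is_snp(ref, alt):
            yield line


EXAMPLE_RECORDS = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "20\t100\t.\tA\tG\t50\tPASS\t.",
    "20\t200\t.\tAT\tA\t50\tPASS\t.",  # deletion, not a SNP
    "21\t300\t.\tC\tT\t50\tPASS\t.",   # SNP on another contig
]

if __name__ == "__main__":
    print(len(list(select_snps(EXAMPLE_RECORDS, contig="20"))))  # header + 1 SNP
    print(len(list(select_snps(EXAMPLE_RECORDS))))               # header + 2 SNPs
```

Restricting to one contig, as the tutorial does, simply discards every record elsewhere in the genome, which is why the option should be dropped for a real whole-genome analysis.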

Secondly, when applying hard filters, parameters like "ClusterSize" and "ClusterWindowSize" are not considered in the Best Practices, yet some people use them to remove false-positive variants. I know it's a somewhat tangential question, but in your opinion, how important can these parameters be for variant filtration of Illumina sequencing data?

Many thanks!

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    This is because the commands were extracted from hands-on tutorials in which we ran on chromosome 20 only, to save time. You can remove the -L 20 option from the command when running actual analyses.

  • MUHAMMADSOHAILRAZA, Beijing Institute of Genomics, CAS (Member)

    @Geraldine_VdAuwera
    Thank you!

    Kind Regards

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie admin)

    Using cluster position as a filter is problematic because we know that real variants do occur naturally in clusters. I would say it comes down to how much you care about sensitivity versus specificity. Personally, I would prefer not to use this type of filter.
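To make that trade-off concrete, here is a minimal, hypothetical sketch of how a cluster filter of this kind might work, assuming the ClusterSize/ClusterWindowSize idea means "flag SNPs when at least N of them fall inside a W-bp window". The function name `flag_clustered` and the exact window rule (last - first + 1 <= W) are illustrative assumptions, not GATK's implementation. The key point is that the filter looks only at spacing, so a cluster of genuine variants is flagged exactly like a cluster of artifacts.

```python
from typing import Iterable, Set


def flag_clustered(positions: Iterable[int], cluster_size: int = 3, window_size: int = 10) -> Set[int]:
    """Return the positions of SNPs that fall inside a dense cluster.

    Assumed rule: a group of SNPs is a cluster when at least `cluster_size`
    of them fit in a window of `window_size` bp (last - first + 1 <= window_size).
    All positions are assumed to be on a single contig. In this sketch a
    non-positive window or cluster size disables the check.
    """
    if window_size < 1 or cluster_size < 1:
        return set()
    pos = sorted(set(positions))
    flagged: Set[int] = set()
    start = 0
    for end in range(len(pos)):
        # Shrink the window from the left until it spans at most window_size bp.
        while pos[end] - pos[start] + 1 > window_size:
            start += 1
        # pos[start..end] is the largest window ending at pos[end]; flag it if dense enough.
        if end - start + 1 >= cluster_size:
            flagged.update(pos[start:end + 1])
    return flagged


if __name__ == "__main__":
    snps = [100, 103, 107, 500, 2000]
    # The three SNPs within 8 bp are flagged whether they are real or not.
    print(sorted(flag_clustered(snps, cluster_size=3, window_size=10)))
```

Raising `cluster_size` or shrinking `window_size` flags fewer sites, keeping more real clustered variants (sensitivity) at the cost of letting more clustered artifacts through (specificity), which is the trade-off described in the answer above.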
