Is there a way to limit the genotyping to those sites with FILTER=PASS?

jfarrell Member ✭✭

With the genotyping discovery qscript, all the discovery sites were genotyped during the run instead of the much smaller set that have filter=PASS.

Instead of genotyping all of these sites, is there an option to specify that only sites with filter=PASS should be genotyped and output to the final VCF file? Or is there a reason these non-PASS sites are included?

Answers

  • bhandsaker Member, Broadie ✭✭✭✭

    I'm sure there are many vcf tools that can do this (or you can use grep).

    We have one such tool in GS (org.broadinstitute.sv.apps.VCFFilter -R reference -vcf input.vcf -O output.vcf ...).
    If you use -includeSitesByFilter PASS you can select just the passing sites.

    Sometimes, people genotype non-PASS sites (or filter less stringently) and then filter afterwards based on the genotyping. I'd say the standard practice is to genotype just the PASS sites.

  • jfarrell Member ✭✭

    Presently, the template discovery.sh script and the SVDiscovery.q provided in the distribution will genotype both the non-PASS and PASS calls. For discovery.sh or SVDiscovery.q, I think it would be useful to add a SelectVariants step, for example:

    java -cp ${classpath} -Xmx2g -jar ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -R $REF \
    -T SelectVariants \
    --variant ${runDir}/${sites} \
    -o candidate_sites_for_genotyping.vcf \
    -ef

    Otherwise, a first-time user of the script may try to genotype a million sites rather than the expected 10,000 or so that actually passed. With this change, the default behavior of the script would be to genotype only the FILTER=PASS sites.

    John
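
For readers who want the grep-style route mentioned in the first answer, here is a minimal sketch. It assumes an uncompressed, tab-delimited VCF in which FILTER is the seventh column; the function name filter_pass_sites and the file names are illustrative only and are not part of Genome STRiP or GATK.

```shell
# Keep VCF header lines (those starting with '#') plus every record
# whose FILTER column (7th tab-separated field) is exactly PASS.
# Assumption: the input is an uncompressed, tab-delimited VCF.
filter_pass_sites() {
    awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Example usage (file names are placeholders):
#   filter_pass_sites discovery.vcf > candidate_sites_for_genotyping.vcf
```

Matching the FILTER column exactly, rather than grepping for the string PASS anywhere on the line, avoids keeping records that merely mention PASS in another field.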
