Hard filtering homozygous reference calls in a non-model species

I'm using GATK to assess heterozygosity in a non-model species. I need to retain the confident variant calls as well as the confident homozygous-reference sites, because the heterozygosity calculation is simply (# het sites / (# hom ref sites + # hom alt sites)).
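
To make that ratio concrete, here is a minimal shell sketch that tallies genotypes in a single-sample VCF and prints het / (hom ref + hom alt). The function name `het_ratio` is made up for this example. It assumes an uncompressed VCF on standard input, with the sample in column 10 and GT as the first FORMAT field, and it does no filtering of its own.

```shell
# Hypothetical helper: count genotype classes in a single-sample VCF (stdin)
# and print "het hom_ref hom_alt ratio", where ratio = het / (hom_ref + hom_alt).
# Assumes the sample is column 10 and GT is the first FORMAT subfield.
het_ratio() {
  awk -F'\t' '
    /^#/ { next }                      # skip header lines
    {
      split($10, f, ":")
      gt = f[1]
      gsub(/[|]/, "/", gt)             # treat phased and unphased calls alike
      if (gt == "0/0") ref++
      else if (gt == "0/1" || gt == "1/0") het++
      else if (gt == "1/1") alt++      # anything else (e.g. ./.) is ignored
    }
    END {
      if (ref + alt == 0) { print "NA"; exit }
      printf "%d %d %d %.6f\n", het, ref, alt, het / (ref + alt)
    }'
}
```

It would be run as `het_ratio < filtered.vcf`, where `filtered.vcf` stands for whatever file holds the sites that survived filtering.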

For the homozygous-reference sites, I plan to require DP > 10, but I'm not sure what other filters are appropriate.
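
For reference, the depth cutoff on homozygous-reference sites could look roughly like this. It is only a sketch: `keep_homref_by_depth` is a hypothetical name, the input is assumed to be an uncompressed single-sample VCF on standard input, and DP is read from the sample's FORMAT fields rather than from INFO.

```shell
# Hypothetical helper: keep header lines plus hom-ref records whose
# sample-level DP exceeds a threshold (first argument, default 10).
# Assumes a single-sample VCF with FORMAT in column 9 and the sample in column 10.
keep_homref_by_depth() {
  awk -F'\t' -v min_dp="${1:-10}" '
    /^#/ { print; next }
    {
      n = split($9, key, ":"); split($10, val, ":")
      gt = ""; dp = ""
      for (i = 1; i <= n; i++) {       # look fields up by name, not position
        if (key[i] == "GT") gt = val[i]
        if (key[i] == "DP") dp = val[i]
      }
      if ((gt == "0/0" || gt == "0|0") && dp != "" && dp != "." && dp + 0 > min_dp + 0)
        print
    }'
}
```

Records with a missing DP are dropped rather than treated as zero depth, which seemed the safer default for a confidence filter.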

For the variants, since I'm working with a non-model species, I'm extracting biallelic SNPs that pass the GATK hard filters and have GQ > 20, DP > 10, and DP < 100. I also exclude any clustered SNPs (3 within 10 bp).
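
The per-site part of those variant criteria can be sketched in shell as below. The function name `keep_confident_snps` is hypothetical. The sketch assumes a single-sample, uncompressed VCF on standard input in which the hard filters have already been applied, so that passing records carry `PASS` in the FILTER column. The clustered-SNP exclusion is not covered here.

```shell
# Hypothetical helper: keep header lines plus biallelic SNPs with FILTER == PASS,
# GQ > 20 and 10 < DP < 100 (sample-level FORMAT values, sample in column 10).
keep_confident_snps() {
  awk -F'\t' -v min_gq=20 -v min_dp=10 -v max_dp=100 '
    /^#/ { print; next }
    # Biallelic SNP: one-base REF and exactly one one-base ALT allele.
    $4 !~ /^[ACGT]$/ || $5 !~ /^[ACGT]$/ { next }
    $7 != "PASS" { next }
    {
      n = split($9, key, ":"); split($10, val, ":")
      gq = ""; dp = ""
      for (i = 1; i <= n; i++) {
        if (key[i] == "GQ") gq = val[i]
        if (key[i] == "DP") dp = val[i]
      }
      if (gq == "" || gq == "." || dp == "" || dp == ".") next
      if (gq + 0 > min_gq && dp + 0 > min_dp && dp + 0 < max_dp) print
    }'
}
```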

My GATK calling was done as follows:
java -Xmx${MEM} -jar ${gatk_dir}/GenomeAnalysisTK.jar \
-R ${genome} \
-T HaplotypeCaller \
-I ${data_dir}/"11_"${SAMPLE_ABB}"IR_BQSR"${CHROM}"recalibrated.bam" \
--emitRefConfidence BP_RESOLUTION \
-L ${CHROM} \
-o ${data_dir}/"12_"${SAMPLE_ABB}"HapCaller"${CHROM}".g.vcf" \
2>>./"C_IR_BQSR"${SAMPLE_ABB}"_"${CHROM}".txt"

java -Xmx$MEM -jar ${gatk_dir}/GenomeAnalysisTK.jar \
-R ${genome} \
-T GenotypeGVCFs \
-L ${CHROM} \
--includeNonVariantSites \
--standard_min_confidence_threshold_for_calling 30 \
--standard_min_confidence_threshold_for_emitting 10 \
--variant ${data_dir}/"12_"${SAMPLE_ABB}"HapCaller"${CHROM}".g.vcf" \
-o ${data_dir}/"13_"${SAMPLE_ABB}"HapCaller"${CHROM}".vcf" \
2>>./"C_IR_BQSR"${SAMPLE_ABB}"_"${CHROM}".txt"
