Hard filtering homozygous reference calls in a non-model species

I'm using GATK to assess heterozygosity in a non-model species. I need to retain the confident variant calls as well as the confident homozygous-reference sites, because the heterozygosity calculation is simply (# het sites) / (# hom ref sites + # hom alt sites).
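As a rough illustration of that calculation, here is a minimal sketch that tallies genotype classes from a single-sample VCF and prints the ratio exactly as written above. The function name `heterozygosity` is hypothetical, and the sketch assumes one sample in column 10, diploid genotypes, and GT as the first FORMAT subfield; it does not apply any quality filters itself.

```shell
# Sketch only: count genotype classes in a single-sample VCF read from stdin
# and print het / (hom_ref + hom_alt), the ratio as written in the post.
# Assumes one sample (column 10) and diploid genotypes.
heterozygosity() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      split($10, sample, ":")              # GT is the first FORMAT subfield
      n = split(sample[1], a, "[/|]")      # accept unphased (/) and phased (|)
      if (n != 2 || a[1] == "." || a[2] == ".") next   # skip no-calls
      if (a[1] != a[2])      het++
      else if (a[1] == "0")  homref++
      else                   homalt++
    }
    END {
      hom = homref + homalt
      if (hom == 0) { print "NA"; exit }   # avoid division by zero
      printf "het=%d hom_ref=%d hom_alt=%d ratio=%.4f\n", het, homref, homalt, het / hom
    }'
}
```

It could be run as `heterozygosity < filtered.vcf`, where `filtered.vcf` is a placeholder for a VCF that has already been restricted to confident sites.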

For the hom-ref sites, I plan to require DP > 10, but I'm not sure what other filters are appropriate.

For the variants, since I'm working with a non-model species, I'm extracting biallelic SNPs that pass the GATK hard filters and have GQ > 20 and 10 < DP < 100. I also exclude any clustered SNPs (3 within 10 bp).
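The genotype-level thresholds described so far could be sketched roughly as follows. This is an illustration, not the exact commands used: the function name `filter_sites` is hypothetical, it assumes a single-sample VCF, and it assumes the DP and GQ values are the per-sample FORMAT fields rather than INFO DP. It does not cover the clustered-SNP exclusion, and it assumes the GATK hard filters have already been written to the FILTER column.

```shell
# Sketch only: keep confident sites from a single-sample VCF on stdin.
# Thresholds come from the post; reading the per-sample FORMAT fields DP and GQ
# (rather than INFO DP) is an assumption.
filter_sites() {
  awk -F'\t' -v min_dp=10 -v max_dp=100 -v min_gq=20 '
    /^#/ { print; next }                   # pass header lines through
    {
      # Locate DP and GQ by name, since FORMAT order varies between records.
      nf = split($9, key, ":"); split($10, val, ":")
      dp = ""; gq = ""
      for (i = 1; i <= nf; i++) {
        if (key[i] == "DP") dp = val[i]
        if (key[i] == "GQ") gq = val[i]
      }
      # Every retained site needs DP > min_dp.
      if (dp == "" || dp == "." || dp + 0 <= min_dp) next
      # Non-variant site (ALT is "."): the DP filter is the only one applied.
      if ($5 == ".") { print; next }
      # Variant record: biallelic SNP, FILTER PASS, GQ > min_gq, DP < max_dp.
      if ($4 !~ /^[ACGT]$/ || $5 !~ /^[ACGT]$/) next
      if ($7 != "PASS") next
      if (gq == "" || gq == "." || gq + 0 <= min_gq) next
      if (dp + 0 >= max_dp) next
      print
    }'
}
```

It could be run as `filter_sites < input.vcf > filtered.vcf`, with both file names as placeholders. Matching REF and ALT against a single base also drops indels, multi-allelic records, and spanning-deletion (`*`) alleles in one step.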

My GATK calling was done as follows:
java -Xmx${MEM} -jar ${gatk_dir}/GenomeAnalysisTK.jar \
-R ${genome} \
-T HaplotypeCaller \
-I ${data_dir}/"11_"${SAMPLE_ABB}"IR_BQSR"${CHROM}"recalibrated.bam" \
--emitRefConfidence BP_RESOLUTION \
-L ${CHROM} \
-o ${data_dir}/"12_"${SAMPLE_ABB}"HapCaller"${CHROM}".g.vcf" \
2>>./"C_IR_BQSR"${SAMPLE_ABB}"_"${CHROM}".txt"

java -Xmx$MEM -jar ${gatk_dir}/GenomeAnalysisTK.jar \
-R ${genome} \
-T GenotypeGVCFs \
-L ${CHROM} \
--includeNonVariantSites \
--standard_min_confidence_threshold_for_calling 30 \
--standard_min_confidence_threshold_for_emitting 10 \
--variant ${data_dir}/"12_"${SAMPLE_ABB}"HapCaller"${CHROM}".g.vcf" \
-o ${data_dir}/"13_"${SAMPLE_ABB}"HapCaller"${CHROM}".vcf" \
2>>./"C_IR_BQSR"${SAMPLE_ABB}"_"${CHROM}".txt"
