Hard filtering homozygous-reference calls in a non-model species

I'm using GATK to assess heterozygosity in a non-model species. I need to extract the confident variant calls as well as the confident homozygous-reference sites, because the heterozygosity calculation is simply (# het sites) / (# hom-ref sites + # hom-alt sites).
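The ratio above can be computed directly from genotype counts. The following is a minimal sketch, not part of the original pipeline: it assumes an uncompressed, single-sample VCF on stdin that has already been filtered, and the function name `het_ratio` is hypothetical. It counts 0/0, 0/1 and 1/1 genotypes (phased or unphased) and applies the formula exactly as stated in the question.

```shell
# Hypothetical helper: heterozygosity = het / (hom_ref + hom_alt).
# Assumes an uncompressed, single-sample, already-filtered VCF on stdin.
het_ratio() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    {
      n = split($9, fmt, ":")              # FORMAT column, e.g. GT:AD:DP:GQ:PL
      split($10, val, ":")                 # the single sample column
      gt = ""
      for (i = 1; i <= n; i++) if (fmt[i] == "GT") gt = val[i]
      gsub(/\|/, "/", gt)                  # treat phased and unphased genotypes alike
      if (gt == "0/0") homref++
      else if (gt == "0/1" || gt == "1/0") het++
      else if (gt == "1/1") homalt++       # no-calls (./.) and multiallelic GTs are ignored
    }
    END {
      printf "het=%d hom_ref=%d hom_alt=%d\n", het, homref, homalt
      if (homref + homalt > 0) printf "H=%.6f\n", het / (homref + homalt)
      else print "H=NA"
    }'
}
```

For example, `het_ratio < filtered.vcf`, where `filtered.vcf` is a placeholder for the combined filtered hom-ref and SNP records.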

For the homozygous-reference sites, I plan to require DP > 10, but I'm not sure what other filters are appropriate.
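A minimal sketch of the DP > 10 filter for homozygous-reference sites, assuming an uncompressed, single-sample VCF from GenotypeGVCFs run with `--includeNonVariantSites`, where non-variant records are assumed to carry ALT `.` and genotype 0/0. It reads the per-sample DP from the FORMAT/sample columns rather than INFO/DP. The function name `filter_homref` and its threshold argument are hypothetical.

```shell
# Hypothetical helper: print header lines plus hom-ref records whose
# per-sample FORMAT/DP is strictly greater than the threshold (default 10).
# Assumes an uncompressed, single-sample VCF on stdin.
filter_homref() {
  local min_dp="${1:-10}"
  awk -F'\t' -v min_dp="$min_dp" '
    /^#/ { print; next }                   # keep the header so the output stays a valid VCF
    $5 == "." {                            # non-variant record (no ALT allele)
      n = split($9, fmt, ":")
      split($10, val, ":")
      gt = ""; dp = ""
      for (i = 1; i <= n; i++) {
        if (fmt[i] == "GT") gt = val[i]
        if (fmt[i] == "DP") dp = val[i]
      }
      gsub(/\|/, "/", gt)
      if (gt == "0/0" && dp != "" && dp != "." && dp + 0 > min_dp + 0) print
    }'
}
```

Other per-site criteria could be added as further conditions in the same `if`, once it is decided which ones are appropriate.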

For the variants, since I'm working with a non-model species, I extract biallelic SNPs that pass the GATK hard filters and have GQ > 20 and 10 < DP < 100. I also exclude clustered SNPs (3 SNPs within 10 bp).
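The genotype-level part of these variant filters can be sketched in awk. Assumptions: an uncompressed, single-sample VCF on stdin, in which the FILTER column already holds the result of the GATK hard filters (PASS or a filter name); the function `filter_snps` and its argument order are hypothetical. Clustered-SNP removal (3 SNPs in 10 bp) is deliberately left out and would need a separate step.

```shell
# Hypothetical helper: keep biallelic SNPs with FILTER == PASS,
# GQ > min_gq and min_dp < DP < max_dp (defaults 20, 10, 100).
# Clustered SNPs are NOT removed here.
filter_snps() {
  local min_gq="${1:-20}" min_dp="${2:-10}" max_dp="${3:-100}"
  awk -F'\t' -v min_gq="$min_gq" -v min_dp="$min_dp" -v max_dp="$max_dp" '
    /^#/ { print; next }
    # biallelic SNP: single-base REF and exactly one single-base ALT
    length($4) == 1 && $5 ~ /^[ACGT]$/ && $7 == "PASS" {
      n = split($9, fmt, ":")
      split($10, val, ":")
      gq = ""; dp = ""
      for (i = 1; i <= n; i++) {
        if (fmt[i] == "GQ") gq = val[i]
        if (fmt[i] == "DP") dp = val[i]
      }
      if (gq == "" || gq == "." || dp == "" || dp == ".") next   # missing values fail the filter
      if (gq + 0 > min_gq + 0 && dp + 0 > min_dp + 0 && dp + 0 < max_dp + 0) print
    }'
}
```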

My GATK calling was done as follows:
java -Xmx${MEM} -jar ${gatk_dir}/GenomeAnalysisTK.jar \
-R ${genome} \
-T HaplotypeCaller \
-I ${data_dir}/"11_"${SAMPLE_ABB}"IR_BQSR"${CHROM}"recalibrated.bam" \
--emitRefConfidence BP_RESOLUTION \
-L ${CHROM} \
-o ${data_dir}/"12_"${SAMPLE_ABB}"HapCaller"${CHROM}".g.vcf" \
2>>./"C_IR_BQSR"${SAMPLE_ABB}"_"${CHROM}".txt"

java -Xmx$MEM -jar ${gatk_dir}/GenomeAnalysisTK.jar \
-R ${genome} \
-T GenotypeGVCFs \
-L ${CHROM} \
--includeNonVariantSites \
--standard_min_confidence_threshold_for_calling 30 \
--standard_min_confidence_threshold_for_emitting 10 \
--variant ${data_dir}/"12_"${SAMPLE_ABB}"HapCaller"${CHROM}".g.vcf" \
-o ${data_dir}/"13_"${SAMPLE_ABB}"HapCaller"${CHROM}".vcf" \
2>>./"C_IR_BQSR"${SAMPLE_ABB}"_"${CHROM}".txt"
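Because the denominator of the heterozygosity ratio relies on the homozygous-reference records that `--includeNonVariantSites` emits, a quick sanity check on the GenotypeGVCFs output is to count non-variant records (assumed to have ALT `.`) against variant ones. A minimal sketch, assuming an uncompressed VCF; the function name `count_site_types` is hypothetical.

```shell
# Hypothetical helper: count non-variant (ALT ".") vs. variant records in a VCF on stdin.
count_site_types() {
  awk -F'\t' '
    /^#/ { next }
    $5 == "." { nonvar++; next }
    { var++ }
    END { printf "non_variant=%d variant=%d\n", nonvar, var }'
}
```

For example, `count_site_types < ${data_dir}/"13_"${SAMPLE_ABB}"HapCaller"${CHROM}".vcf"`; a non_variant count of zero would mean no hom-ref sites are available for the denominator.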
