More variants after hard-filtering than before hard-filtering

Why do I have more variants after hard-filtering than before hard-filtering?

Variant Call without hard-filtering:
~/gatk-4.0.8.1/gatk HaplotypeCaller -R Reference_genome.fasta -I KMM2_bwamem_alignment_samblaster_samtoolsview_samtoolssort_09292018.bam -O KMM2_raw_snps_indels_annotations_09302018.vcf -A BaseQualityRankSumTest -A FisherStrand -A MappingQualityRankSumTest -A QualByDepth -A RMSMappingQuality -A ReadPosRankSumTest -A StrandOddsRatio -A InbreedingCoeff

(gatk) [[email protected] KMM2_analysis]$ grep -v '#' KMM2_raw_snps_indels_annotations_09302018.vcf | wc -l
701948

Variant Call with hard-filtering (snps):
~/gatk-4.0.8.1/gatk VariantFiltration -R Reference_genome.fasta -V KMM2_raw_snps_indels_annotations_09302018.vcf -filter "QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filter-name "snpfilter1" -O KMM2_filtered_snps_snpfilter1_10012018.vcf

Variant Call with hard-filtering (indels):
~/gatk-4.0.8.1/gatk VariantFiltration -R Reference_genome.fasta -V KMM2_raw_snps_indels_annotations_09302018.vcf -filter "QD < 2.0 || MQ < 40.0 || FS > 60.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0" --filter-name "indfilter1" -O KMM2_filtered_indels_indfilter1_10012018.vcf
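
Note that both VariantFiltration commands above read the same unsplit file, KMM2_raw_snps_indels_annotations_09302018.vcf, and VariantFiltration only annotates the FILTER column rather than removing records, so each output still holds every SNP and indel record. A minimal sketch of splitting by variant type first, assuming GATK4's SelectVariants and its `--select-type-to-include` argument. The helper name `select_type` and the `GATK` variable are hypothetical, introduced here only for illustration:

```shell
# Path to the GATK launcher. The default is the location used in the post;
# override it by setting GATK before calling the function.
GATK="${GATK:-$HOME/gatk-4.0.8.1/gatk}"

# Write one variant type (SNP or INDEL) from a VCF to a new VCF.
# Usage: select_type <reference.fasta> <input.vcf> <SNP|INDEL> <output.vcf>
select_type() {
    "$GATK" SelectVariants -R "$1" -V "$2" --select-type-to-include "$3" -O "$4"
}
```

Calling `select_type` once with `SNP` and once with `INDEL` on the raw VCF would give two separate inputs for the two filtering steps. The output file names are left to the caller.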

Merging hard-filtered snps and indels:
picard MergeVcfs I=KMM2_filtered_snps_snpfilter1_10012018.vcf I=KMM2_filtered_indels_indfilter1_10012018.vcf O=KMM2_combined_filtered_variants_filter1_10012018.vcf

[[email protected] KMM2_analysis]$ grep 'PASS' KMM2_combined_filtered_variants_filter1_10012018.vcf | wc -l
1302970
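
The two counts above are not measured the same way: `grep -v '#'` drops any line that contains `#` anywhere, while `grep 'PASS'` matches PASS anywhere on a line, including any header line that mentions it. A sketch of a stricter count that looks only at data lines and, for PASS, only at the FILTER column (column 7 in the VCF format). The function names are hypothetical:

```shell
# Count data records: lines that do not start with '#'.
count_records() {
    awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}

# Count data records whose FILTER column (7th, tab-separated) is exactly PASS.
count_pass() {
    awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}
```

Running both functions on the raw VCF, on each filtered VCF, and on the merged VCF shows at which step the record count changes.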

Answers

  • shlee (Cambridge; Member, Broadie)

    Hi again @kmmahan,

    I think this is a question you can answer with some probing, i.e. look into the VCF and see what is different. Given the difference between the two callsets is the merge of a SNPs-only callset with an INDEL-only callset, I suspect this may have to do with representing overlapping mixed variants in separate versus multiallelic variant records.
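
One way to start the probing suggested above, as a sketch: list positions that occur in more than one record of the merged VCF. Repeated CHROM/POS pairs would be expected both for overlapping variants kept as separate records and for a record that arrived from both input files, so the matching lines still need to be inspected by eye. The function name is hypothetical, and only `awk`, `sort`, and `uniq` are used:

```shell
# Print CHROM, POS, and record count (tab-separated) for every position
# that appears in more than one data record of the given VCF.
repeated_sites() {
    awk -F'\t' '!/^#/ { print $1 "\t" $2 }' "$1" \
        | sort \
        | uniq -c \
        | awk '$1 > 1 { print $2 "\t" $3 "\t" $1 }'
}
```

For example, `repeated_sites KMM2_combined_filtered_variants_filter1_10012018.vcf | head` would show the first few repeated positions, which can then be looked up in the SNP and indel files with `grep`.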
