Variant Annotator not annotating MappingQualityRankSumTest and ReadPosRankSumTest for INDELs

TechnicalVaultTechnicalVault Sanger, Cambridge, UKPosts: 86Member

Hi,

I'm attempting to use Variant Annotator to annotate some VCFs produced by samtools so I can run VQSR on them. Unfortunately I've gottent stuck and I'm trying to figure out why Variant Annotator wouldn't be annotating INDELs with MappingQualityRankSumTest and ReadPosRankSumTest, it seems to annotate SNPs fine. There are both Homs and het's called on the sample. Could it be I need to left align the indels to get enough coverage? What would you suggest is the best way to debug this? Is there a way to make GATK behave more verbosely about why it's refusing an annotation?

Thanks Martin

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

Best Answers

Answers

  • KurtKurt Posts: 174Member ✭✭✭

    Not sure why they are not annotating, but in regards to making GATK more verbose, have you already added -l DEBUG to your command line?

  • TechnicalVaultTechnicalVault Sanger, Cambridge, UKPosts: 86Member

    Thanks for the suggestion, -l DEBUG does make matters more verbose, but it appears there's no debugging output from the annotator itself. I've now tried running left align then annotating, still flummoxed.

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • TechnicalVaultTechnicalVault Sanger, Cambridge, UKPosts: 86Member
    edited November 2012

    Hmm also they seem to all be coming out with FS=0.000;HaplotypeScore=0.0000.

    If it helps this is the command line used:

    java -Xmx2000m -Xms2000m -server -XX:+UseSerialGC -jar /nfs/users/nfs_m/mercury/src/GenomeAnalysisTK-1.6/GenomeAnalysisTK.jar -T VariantAnnotator -R /lustre/scratch111/resources/vrpipe/ref/Homo_sapiens/1000Genomes/human_g1k_v37.fasta -I /lustre/scratch111/projects/helic/vcf_intermediate-2/lists/chr1-pooled.list --variant /lustre/scratch111/projects/helic/vcf_intermediate-2/pooled/1/1:14000001-15000000.samtools.vcf.gz -L 1:14000001-15000000 -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A InbreedingCoeff -A DepthOfCoverage -o /lustre/scratch111/projects/helic/vcf_intermediate-2/pooled/1/1:14000001-15000000.vcf.gz.part.vcf.gz
    
    Post edited by TechnicalVault on

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • TechnicalVaultTechnicalVault Sanger, Cambridge, UKPosts: 86Member
    edited November 2012

    My bad that was the command line from the older directory:

    java -Xmx2500m -Xms2500m -server -XX:+UseSerialGC -jar /nfs/users/nfs_m/mercury/src/GenomeAnalysisTK-2.2/GenomeAnalysisTKLite.jar -T VariantAnnotator -R /lustre/scratch111/resources/vrpipe/ref/Homo_sapiens/1000Genomes/human_g1k_v37.fasta -I /lustre/scratch111/projects/helic/vcf_intermediate-3/lists/chr1-pooled.list --variant /lustre/scratch111/projects/helic/vcf_intermediate-3/pooled/1/1:14000001-15000000.samtools.vcf.gz -L 1:14000001-15000000 -U LENIENT_VCF_PROCESSING -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A InbreedingCoeff -A DepthOfCoverage -o /lustre/scratch111/projects/helic/vcf_intermediate-3/pooled/1/1:14000001-15000000.vcf.gz.part.vcf.gz
    

    Rather than being 0 FS and HaplotypeScore are simply missing from the 2.2 annotated BAMs, similar to ReadPosRankSumTest and MappingQualityRankSumTest. Is the new functionality in the Lite GATK or just in Full?

    Post edited by TechnicalVault on

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • tommycarstensentommycarstensen United KingdomPosts: 153Member

    @TechnicalVault‌ I have the same problem for INDELs. What solution did you end up with? Did you "run your calls through the UG in GENOTYPE_GIVEN_ALLELES mode" as suggested?

  • TechnicalVaultTechnicalVault Sanger, Cambridge, UKPosts: 86Member

    Yes, that seemed to be the most effective work around.

    Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute

  • tommycarstensentommycarstensen United KingdomPosts: 153Member

    Martin and others. Just a warning. samtools1.1 (and UG) splits SNPs and INDELs in individual records. If you run your samtools calls through the UG in GENOTYPE_GIVEN_ALLELES mode, then you will get warnings like these:

    GenotypingGivenAllelesUtils - Multiple valid VCF records detected in the alleles input file at site $CHROM:$POS, only considering the first record
    

    I guess one needs to merge the records with bcftools merge -m both or similar first:

    http://samtools.github.io/bcftools/bcftools.html#merge
    

    CombineVariants used to do it according to this thread:

    http://gatkforums.broadinstitute.org/discussion/3375/combine-snp-and-indel-vcf
    

    In my case I think I'm just going to filter out the INDELs, since I only need those for the downstream analysis. UG emits SNPs and INDELs separately anyway, so it shouldn't be a problem, if a SNP is present at an INDEL site.

    I'm not sure what is going to happen, if the number of alleles exceed the default --max_alternate_alleles of 6.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,822Administrator, GATK Developer admin

    FYI all -- HaplotypeCaller also does GENOTYPE_GIVEN_ALLELES now. I'm hesitant to push it as the ultimate solution here, because we have noticed a few edge cases where HC in GGA mode doesn't do the right thing, but be aware that it exists.

    Do I understand correctly that the use case here is that you have calls from another caller (say, samtools) and the goal of this workaround is to be able to put those calls through VQSR?

    Geraldine Van der Auwera, PhD

  • tommycarstensentommycarstensen United KingdomPosts: 153Member

    @Geraldine_VdAuwera said: Do I understand correctly that the use case here is that you have calls from another caller (say, samtools) and the goal of this workaround is to be able to put those calls through VQSR?

    Yes, and it's straightforward for (biallelic) SNPs. I'm doing it for INDELs next week. Sorry for not responding sooner. I don't check all threads in which I have participated.

Sign In or Register to comment.