The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

#### ☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

#### ☞ Did we ask for a bug report?

Then follow instructions in Article#1894.

#### ☞ Formatting tip!

Surround blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block.
Powered by Vanilla. Made with Bootstrap.
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

# Variant Annotator not annotating MappingQualityRankSumTest and ReadPosRankSumTest for INDELs

Cambridge, UKMember Posts: 111 ✭✭✭

Hi,

I'm attempting to use Variant Annotator to annotate some VCFs produced by samtools so I can run VQSR on them. Unfortunately I've gottent stuck and I'm trying to figure out why Variant Annotator wouldn't be annotating INDELs with MappingQualityRankSumTest and ReadPosRankSumTest, it seems to annotate SNPs fine. There are both Homs and het's called on the sample. Could it be I need to left align the indels to get enough coverage? What would you suggest is the best way to debug this? Is there a way to make GATK behave more verbosely about why it's refusing an annotation?

Thanks
Martin

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute and Genetic Epidemiology Group - WTSI & Cambridge University

Tagged:

## Answers

• Member Posts: 255 ✭✭✭

Not sure why they are not annotating, but in regards to making GATK more verbose, have you already added -l DEBUG to your command line?

• Cambridge, UKMember Posts: 111 ✭✭✭

Thanks for the suggestion, -l DEBUG does make matters more verbose, but it appears there's no debugging output from the annotator itself. I've now tried running left align then annotating, still flummoxed.

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute and Genetic Epidemiology Group - WTSI & Cambridge University

• Cambridge, UKMember Posts: 111 ✭✭✭
edited November 2012

Hmm also they seem to all be coming out with FS=0.000;HaplotypeScore=0.0000.

If it helps this is the command line used:

java -Xmx2000m -Xms2000m -server -XX:+UseSerialGC -jar /nfs/users/nfs_m/mercury/src/GenomeAnalysisTK-1.6/GenomeAnalysisTK.jar -T VariantAnnotator -R /lustre/scratch111/resources/vrpipe/ref/Homo_sapiens/1000Genomes/human_g1k_v37.fasta -I /lustre/scratch111/projects/helic/vcf_intermediate-2/lists/chr1-pooled.list --variant /lustre/scratch111/projects/helic/vcf_intermediate-2/pooled/1/1:14000001-15000000.samtools.vcf.gz -L 1:14000001-15000000 -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A InbreedingCoeff -A DepthOfCoverage -o /lustre/scratch111/projects/helic/vcf_intermediate-2/pooled/1/1:14000001-15000000.vcf.gz.part.vcf.gz


Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute and Genetic Epidemiology Group - WTSI & Cambridge University

• Cambridge, UKMember Posts: 111 ✭✭✭
edited November 2012

My bad that was the command line from the older directory:

java -Xmx2500m -Xms2500m -server -XX:+UseSerialGC -jar /nfs/users/nfs_m/mercury/src/GenomeAnalysisTK-2.2/GenomeAnalysisTKLite.jar -T VariantAnnotator -R /lustre/scratch111/resources/vrpipe/ref/Homo_sapiens/1000Genomes/human_g1k_v37.fasta -I /lustre/scratch111/projects/helic/vcf_intermediate-3/lists/chr1-pooled.list --variant /lustre/scratch111/projects/helic/vcf_intermediate-3/pooled/1/1:14000001-15000000.samtools.vcf.gz -L 1:14000001-15000000 -U LENIENT_VCF_PROCESSING -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A InbreedingCoeff -A DepthOfCoverage -o /lustre/scratch111/projects/helic/vcf_intermediate-3/pooled/1/1:14000001-15000000.vcf.gz.part.vcf.gz


Rather than being 0 FS and HaplotypeScore are simply missing from the 2.2 annotated BAMs, similar to ReadPosRankSumTest and MappingQualityRankSumTest. Is the new functionality in the Lite GATK or just in Full?

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute and Genetic Epidemiology Group - WTSI & Cambridge University

• United KingdomMember Posts: 404 ✭✭✭

@TechnicalVault‌ I have the same problem for INDELs. What solution did you end up with? Did you "run your calls through the UG in GENOTYPE_GIVEN_ALLELES mode" as suggested?

• Cambridge, UKMember Posts: 111 ✭✭✭

Yes, that seemed to be the most effective work around.

Martin Pollard, Human Genetics Informatics - Wellcome Trust Sanger Institute and Genetic Epidemiology Group - WTSI & Cambridge University

• United KingdomMember Posts: 404 ✭✭✭

Martin and others. Just a warning. samtools1.1 (and UG) splits SNPs and INDELs in individual records. If you run your samtools calls through the UG in GENOTYPE_GIVEN_ALLELES mode, then you will get warnings like these:

GenotypingGivenAllelesUtils - Multiple valid VCF records detected in the alleles input file at site $CHROM:$POS, only considering the first record


I guess one needs to merge the records with bcftools merge -m both or similar first:

http://samtools.github.io/bcftools/bcftools.html#merge


CombineVariants used to do it according to this thread:

http://gatkforums.broadinstitute.org/discussion/3375/combine-snp-and-indel-vcf
`

In my case I think I'm just going to filter out the INDELs, since I only need those for the downstream analysis. UG emits SNPs and INDELs separately anyway, so it shouldn't be a problem, if a SNP is present at an INDEL site.

I'm not sure what is going to happen, if the number of alleles exceed the default --max_alternate_alleles of 6.

• Cambridge, MAMember, Administrator, Broadie Posts: 11,390 admin

FYI all -- HaplotypeCaller also does GENOTYPE_GIVEN_ALLELES now. I'm hesitant to push it as the ultimate solution here, because we have noticed a few edge cases where HC in GGA mode doesn't do the right thing, but be aware that it exists.

Do I understand correctly that the use case here is that you have calls from another caller (say, samtools) and the goal of this workaround is to be able to put those calls through VQSR?

Geraldine Van der Auwera, PhD

• United KingdomMember Posts: 404 ✭✭✭

@Geraldine_VdAuwera said:
Do I understand correctly that the use case here is that you have calls from another caller (say, samtools) and the goal of this workaround is to be able to put those calls through VQSR?

Yes, and it's straightforward for (biallelic) SNPs. I'm doing it for INDELs next week. Sorry for not responding sooner. I don't check all threads in which I have participated.

Sign In or Register to comment.