UnifiedGenotyper missed indels calling on 100 base pair reads

malwanamalwana Posts: 7Member

Hi, I'm using The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34

I was running the UnifiedGenotyper tools on a 100bp paired end read with the the command below:

Program Args: -T UnifiedGenotyper -R ref.fasta -I realignedBwa.bam -glm BOTH -log variantsCalling.log -o bwa_variants.vcf

I noticed that only SNPs were called and not a single indel was called. however, when i used a 150bp paired end read with the similar command above it worked fine, the SNPs and Indels were called.

To make sure that the realignedBWA.bam file that i used was not corrupt/malformed, i used two different other pileup program on this bam file and the SNPs and Indels were called nicely. Hope you can take a look at this. I attached together the log file just in case

thanks in advance

log
log
variantsCalling.log
18K

Best Answers

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,366Administrator, GATK Developer admin

    Hi there,

    Can you tell me what happens if you set the -glm argument to INDEL?

    Geraldine Van der Auwera, PhD

  • malwanamalwana Posts: 7Member

    If the -glm argument is set up to INDEL, nothing is called up except for the 150bp which called up the indel

    while if i set up the argument to BOTH as well as SNP, only the SNPs were called.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,366Administrator, GATK Developer admin

    Hmm. Try using the --output_mode EMIT_ALL_SITES argument and see if the indels get called and with what quality scores.

    Geraldine Van der Auwera, PhD

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,366Administrator, GATK Developer admin

    Oh, scratch that, that option doesn't work for indels.

    Have you tried calling it with the HaplotypeCaller? It is more advanced than the UG.

    Geraldine Van der Auwera, PhD

  • malwanamalwana Posts: 7Member

    I've been trying the HaplotypeCaller, and i'm not sure if it's my procedure/methods are wrong, but not a single SNPs or indel was called. the only output are the progress meter and no calling (for both 100bp and 150bp)

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,366Administrator, GATK Developer admin

    That's very odd. What kind of dataset are you running on?

    Geraldine Van der Auwera, PhD

  • malwanamalwana Posts: 7Member

    I simulated mutation read from a reference genome using DWGSIM, which produced reads (used the parameter to generate the read as if coming from illumina) which I align using several aligner including BWA..

  • malwanamalwana Posts: 7Member

    Sorry for the long delay, it's holiday here.. I'll take a quick look and reply back, thanks a lot for helping out

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,366Administrator, GATK Developer admin

    No worries, enjoy your holiday!

    Geraldine Van der Auwera, PhD

  • malwanamalwana Posts: 7Member

    The base qualities are good enough, higher than the default minimum of 17 needed by the UG. A check on BQSR.log showed only 253 reads were filtered out from 20257 (1.25%) and samtools flagstat showed a 100% QC-passed reads.

    Should I try to change any parameters involving the minimum base qualities because as for now i'm using the default.

  • malwanamalwana Posts: 7Member

    ok, my bad.. it turns out the DP for indels is lower than the minimum thus making the indels being ommited.

    Thanks for helping out pdexheimer and Geraldine.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 7,366Administrator, GATK Developer admin

    Glad to hear your problem is solved.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.