
Pileup format

arshi Posts: 7 Member
edited August 2012 in Ask the team

I am running Pileup with the verbose option and have two questions about the output. (1) Why are all the values in the mapping quality column 0? (2) There is another column, not mentioned in the Pileup description, whose values are separated by '@'. What does this column mean?

11 86988 A A D 0 C37@931@1036@0
11 86989 G G D 0 C37@932@1036@0
11 86990 T T B 0 C37@933@1036@0
11 86991 G G D 0 C37@934@1036@0
11 86992 A A B 0 C37@935@1036@0
11 86993 C C C 0 C37@936@1036@0
11 86994 C CCC D=A 0 C37@937@1036@0,38@0@100@0,39@0@100@0

Thanks, Arshi

Best Answer

  • ebanks Posts: 671 mod
    Answer ✓

    The mapping quality isn't emitted by default, so you can't be seeing it at all with that command line. I think perhaps you are seeing 0s because there are no RODs (e.g. VCFs) being input.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

Answers

  • ebanks Posts: 671 GSA Member mod

    Thanks for reporting this. I'm just about to add documentation for the verbose output. Here's what it will say: In addition to the standard pileup output, adds 'verbose' output too. The verbose output contains the number of spanning deletions, and for each read in the pileup it has the read name, offset in the base string, read length, and read mapping quality. These per read items are delimited with an '@' character.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT
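
The verbose layout described above can be unpacked with a few lines of Python. This is a minimal sketch, not GATK code: it assumes the columns are whitespace-separated, that the last two columns are the spanning-deletion count followed by the per-read list, and that multiple reads are comma-separated as in the sample output in the question. The names `ReadEntry` and `parse_verbose_line` are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class ReadEntry(NamedTuple):
    """One '@'-delimited per-read item from the verbose column."""
    read_name: str
    offset: int           # offset in the base string
    read_length: int
    mapping_quality: int


def parse_verbose_line(line: str) -> Tuple[int, List[ReadEntry]]:
    """Return (spanning deletions, per-read entries) for one pileup line.

    Assumes the last two whitespace-separated columns are the verbose
    output: a deletion count, then comma-separated read items.
    """
    fields = line.split()
    n_deletions = int(fields[-2])
    reads = []
    for item in fields[-1].split(","):
        # Split from the right so an '@' inside a read name is preserved.
        name, offset, length, mapq = item.rsplit("@", 3)
        reads.append(ReadEntry(name, int(offset), int(length), int(mapq)))
    return n_deletions, reads


if __name__ == "__main__":
    sample = "11 86994 C CCC D=A 0 C37@937@1036@0,38@0@100@0,39@0@100@0"
    deletions, reads = parse_verbose_line(sample)
    print(deletions)
    for read in reads:
        print(read)
```

Under these assumptions, the `0` that precedes `C37@...` in the sample lines would be the spanning-deletion count, and the mapping quality would be the last '@'-separated value of each read item.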

  • arshi Posts: 7 Member

    Thanks! Could you also help me with the issue where I am getting all 0 mapping qualities? I have checked my data in IGV, and very few of my reads have a mapping quality of 0. I do get the correct Phred quality scores, though. This is how I am running Pileup:

    java -Xmx8g -path/toGATK.jar/ \
      -T Pileup \
      -R path/toGATK/resources/hg19.fa \
      -I a.bam \
      -o a.pileup

    Thanks

  • arshi Posts: 7 Member

    Thanks a lot, Eric! The INDEL is a great option in Pileup. I am also trying to get all the indels and SNPs through UnifiedGenotyper (-glm BOTH). Is there a way for GATK to output the number of indels at each position, similar to a pileup format? I am interested in both known and predicted indels and their counts. Perhaps I can use the .vcf file from UnifiedGenotyper?

    Thanks, Arshi

  • arshi Posts: 7 Member

    Just to be a little clearer: I tried the --metadata option in Pileup and used the 1000G_indel.vcf file as the ROD.

  • ebanks Posts: 671 GSA Member mod

    Hmm, no I don't think you can do what you want with the GATK right now.

    Eric Banks, PhD -- Senior Group Leader, MPG Analysis, Broad Institute of Harvard and MIT

  • arshi Posts: 7 Member

    OK. Thanks for your quick reply!