Pileup format

arshi Member
edited August 2012 in Ask the GATK team

I am running Pileup with the verbose option. I have two questions about it.
(1) Why are all the values in the mapping quality column 0?
(2) There is another column, not mentioned in the description of the pileup format, whose items are separated by '@'. What does this column mean?

11 86988 A A D 0 [email protected]@[email protected]
11 86989 G G D 0 [email protected]@[email protected]
11 86990 T T B 0 [email protected]@[email protected]
11 86991 G G D 0 [email protected]@[email protected]
11 86992 A A B 0 [email protected]@[email protected]
11 86993 C C C 0 [email protected]@[email protected]
11 86994 C CCC D=A 0 [email protected]@[email protected],[email protected]@[email protected],[email protected]@[email protected]

Thanks,
Arshi

Answers

  • ebanks Broad Institute Member, Broadie, Dev ✭✭✭✭

    Thanks for reporting this. I'm just about to add documentation for the verbose output. Here's what it will say:
    Adds 'verbose' output in addition to the standard pileup output. The verbose output contains the number of spanning deletions and, for each read in the pileup, the read name, offset in the base string, read length, and read mapping quality. These per-read items are delimited with an '@' character.

  • arshi Member

    Thanks !
    Could you also help me with the issue where all my mapping qualities are 0? I have checked my data in IGV, and very few of my reads have a mapping quality of 0.
    I do get correct Phred quality scores, though.
    This is how I am running Pileup:
    java -Xmx8g -jar path/to/GATK.jar \
    -T Pileup \
    -R path/to/GATK/resources/hg19.fa \
    -I a.bam \
    -o a.pileup

    Thanks

  • ebanks Broad Institute Member, Broadie, Dev ✭✭✭✭
    Accepted Answer

    The mapping quality isn't emitted by default, so you can't be seeing them at all with that command line. I think perhaps you are seeing 0s because there are no RODs (e.g. VCFs) being input.

  • arshi Member

    Thanks a lot, Eric! The INDEL option in Pileup is great. I am also trying to call all indels and SNPs with UnifiedGenotyper (-glm BOTH). Is there a way for GATK to output the number of indels at each position, similar to the pileup format? I am interested in both known and predicted indels and their counts.
    Perhaps I can use the VCF file from UnifiedGenotyper?

    Thanks,
    Arshi

  • arshi Member

    Just to be a little clearer: I tried the --metadata option in Pileup and supplied the 1000G_indel.vcf file as a ROD.

  • ebanks Broad Institute Member, Broadie, Dev ✭✭✭✭

    Hmm, no, I don't think you can do what you want with the GATK right now.

  • arshi Member

    Ok. Thanks for your quick reply !
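The verbose-output description in ebanks's reply (number of spanning deletions, then for each read its name, offset in the base string, read length, and mapping quality, joined by '@') is enough to split a verbose Pileup line into its parts. The following is a minimal Python sketch, not part of GATK. It assumes that the verbose items are the last two whitespace-separated columns, as in the sample lines in the question, that the per-read entries are separated by commas, and that read names contain no commas. The helper names (`parse_verbose_pileup_line`, `parse_read_entry`) and the read names in the test line are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReadEntry:
    name: str
    offset: int           # the "offset in the base string" value reported by Pileup
    read_length: int
    mapping_quality: int


@dataclass
class VerbosePileupLine:
    contig: str
    position: int
    ref_base: str
    bases: str
    quals: str
    spanning_deletions: int
    reads: List[ReadEntry]


def parse_read_entry(item: str) -> ReadEntry:
    # Split from the right so that a read name containing '@' stays intact.
    name, offset, length, mapq = item.rsplit("@", 3)
    return ReadEntry(name, int(offset), int(length), int(mapq))


def parse_verbose_pileup_line(line: str) -> VerbosePileupLine:
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 columns, got {len(fields)}: {line!r}")
    # Standard pileup columns: contig, position, reference base, bases, qualities.
    contig, pos, ref, bases, quals = fields[:5]
    # Assumption: the verbose items are the last two columns
    # (spanning-deletion count, then the comma-separated per-read list).
    spanning_deletions = int(fields[-2])
    reads = [parse_read_entry(item) for item in fields[-1].split(",") if item]
    return VerbosePileupLine(contig, int(pos), ref, bases, quals,
                             spanning_deletions, reads)
```

For example, `[r.mapping_quality for r in parse_verbose_pileup_line(line).reads]` lists the per-read mapping qualities at one position, which can then be compared with what IGV shows for the same reads.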
