The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

Did you remember to?


1. Search using the upper-right search box, e.g. using the error message.
2. Try the latest version of tools.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

Did we ask for a bug report?


Then follow instructions in Article#1894.

Formatting tip!


Surround blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks ( ``` ) each to make a code block.
Powered by Vanilla. Made with Bootstrap.
Picard 2.9.0 is now available. Download and read release notes here.
GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

indels in Pileup output

akiezunakiezun Cambridge, MAMember Posts: 22

Hi, to get to indel information in pileup output (eg number and lengths of deletions spanning the position), do I have to run the verbose option? I'm asking because the verbose output produces all that per-read info that I don't need right now - just need the indels. Samtools pileup includes indel info in the standard output and like any other base.

./adam

Tagged:

Answers

  • ebanksebanks Broad InstituteMember, Broadie, Dev Posts: 692 admin

    Hey Adam,
    The standard output should give you deletions in the pileup (using the 'D' character).
    The verbose output will additionally give you the total number of spanning deletions (plus all that other info you don't need).
    But there's currently no way to get the length of deletions.
    This would be fairly straightforward to implement though, so if you want to give it a shot we'd be thrilled to incorporate a patch from you... :)

    Eric Banks, PhD -- Director, Data Sciences and Data Engineering, Broad Institute of Harvard and MIT

  • akiezunakiezun Cambridge, MAMember Posts: 22

    Thanks Eric. Do I need anything special to get this to work, I don't see any indication of indels in the output (samtools reports an indel though)? I'm on v2.7-2-g701cd16.
    Here's what I get from samtools mpileup:

    13  32920967    N   137 C$CCCCC-4NNNNC-4NNNNCCCCCC-4NNNNC-4NNNNCCCC-4NNNNCC-4NNNNC-4NNNNccC-4NNNNC-4NNNNC-4NNNNC-4NNNNCC-4NNNNcCcCCC-4NNNNC-4NNNNCCCc-4nnnnC-4NNNNCcCC-4NNNNCc-4nnnnCcC-4NNNNC-4NNNNC-4NNNNC-4NNNNc-4nnnnc-4nnnncc-4nnnnc-4nnnnCCcc-4nnnnc-4nnnnCc-4nnnnCcC-4NNNNCC-4NNNNcC-4NNNNCcCCc-4nnnncC-4NNNNccc-4nnnnCc-4nnnncc-4nnnnC-4NNNNCC-4NNNNC-4NNNNCc-4nnnncCc-4nnnnC-4NNNNcccC-4NNNNC-4NNNNcC-4NNNNc-4nnnnccc-4nnnnC-4NNNNccC-4NNNNCCc-4nnnnc-4nnnncCCCc-4nnnnCC-4NNNNcC-4NNNNC-4NNNNcCc-4nnnnc-4nnnnC-4NNNNCC-4NNNNcc-4nnnnCcC-4NNNN  CBDCEEHIIEHFFHFGIFHIIIIIDGHIIDH>FFIGGIHIIIIHIIIIIAHIIIIIIIIIIIIIIIIIIFIHIIIIIIHGIIHIIIHIHHAIIHII6IIHHIHIFIIHIIHBHIIDHHHIHAGGIIIIIIGGHGFFE
    13  32920968    N   136 AAAA**AAAAA**AAA*A**aa****A*aAaAA**AAA**AaA*A*Aa******a**AAa**A*Aa*A*a*AaAA*a*aa*A*a**A**A*aA**aaa**a**aa**aa*AA**aAAA*A*a**aA***A*a*Aa*    EEDECFHGGIGGGGGHEIEGGIHIIEGEIIFGHIIIFGIIGBGIIGGIBGGFIGIIGIIIGGIII@GIGGGIIFIGGGGFIGIGGGIGG>GEGGI?IEGGIFIBIEFIHGIGIG6GCGEGEIFHDHFEHFFFCECF
    13  32920969    N   138 A$A$A$A$**AAAAA**AAA*A**aa****A*aAaAA**AAA**AaA*A*Aa******a**AAa**A*Aa*A*a*AaAA*a*aa*A*a**A**A$*aA**aaa**a**aa**aa*AA**aAAA*A*a**aA***A*a*Aa*^]a^]a ECDECFFIGIFGGGGHEIEGEHHIIEHEHIEHFIIGIGIIHIGIIGEIBGGFIGIIGIIIGGIIIIGIGFGIIIIGFGFIIGIFGGIGGAGFGGIAIFGGIFIAIEFFFGEGIG@GGGEGEIFHIIFEHFFECGEFA>
    13  32920970    N   134 **TTTTT**TTT*T**tt****T*tTtTT**TTT**TtT*T*Tt******t**TTt**T*Tt*T*t*TtTT*t*tt*T*t**T***tT**ttt**t**tt**tt*TT**tTTT*T*t**tT***T*t*Tt*tt^]t    CFEGEHCGGFEFEIEGEHHIIEFEHIDFFIIIIFIIFHFIIGFHBGGFIGHIGHHHGGHIHIGDGFGHIHHGDGFIIEIFGGHGGGFFGICHDGGIFIBIEFIFGCEIGCEEEEEEIFHIGFEHEFFCFBFB>@
    13  32920971    N   134 **AAAAA**AAA*A**aa****A*aAaAA**AAA**AaA*A*Aa******a**AAa**A*Aa*A*a*AaAA*a*aa*A*a**A***aA**aaa**a**aa**aa*AA**aAAA*A*a**aA***A*a*Aa*aaa  CFFGEDBGGFFDEIEGGIHIIEFEIDGFFIIHIEIIFIEIHGFIBGGFIGIIGHHIGGHIHIGHGGGHIDHGGGGFIEIGGGHGGGGEGI@IGGGIFIAIEFIGGHFIGGFAEEFEIFHIGFEHDFECFDFC7B
    

    and this is from GATK Pileup:

        13 32920967 C CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC CBDCEEHIIEHFFHFGIFHIIIIIDGHIIDH>FFIGGIHIIIIHIIIIIAHIIIIIIIIIIIIIIIIIIFIHIIIIIIHGIIHIIIHIHHAIIHII6IIHHIHIFIIHIIHBHIIDHHHIHAGGIIIIIIGGHGFFE
        13 32920968 A AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA EEDEHGGIGGGHIGIGIIFGHIFGGBGIGIIIIIII@IGIIFIGGFGGI>EG?IEIBIIHIG6GCGGIDHFFEC
        13 32920969 A AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ECDEFIGIFGGHIEHHHIEHFGIGHIGIEIIIIIIIIIFIIIIFFIGFIAFGAIFIAIFFEG@GGGGIIIFEGEA>
        13 32920970 T TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT EGEHCFEFIEHFHIDFFIIFFHFIFHHHHHHHIDFHIHHDFIEFHFFCHDIBIIFCECEEEEIIGEFFBB>@
        13 32920971 A AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA FGEDBFFDIGIFIDGFFHIEFIEHFIIHHIHHIHGHIDHGGFEGHGE@IGIAIIGHFGFAEFIIGDEFDC7B
    
  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,388 admin

    Hmm, weird. Have you tried running on another region that has a clear indel, to see if all indels are being mishandled or just this one? To be honest, this might be a bug, but unfortunately we're all up to our ears in work and fixing this won't be a priority...

    Geraldine Van der Auwera, PhD

  • akiezunakiezun Cambridge, MAMember Posts: 22

    Understood.

    This indel is pretty clean. For your convenience I cut a small region that shows the same result. On the Broad servers it's here; /xchip/cga_home/akiezun/forSharing/debugPileupOnIndels

    If you do confirm it is a bug, I will take a shot at fixing it, but I do want to make sure I'm not missing something trivial like an option or something.

    thanks Geraldine

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,388 admin

    Nope, there are no other options, so I'd call it a bug. FYI we have another pileup-related tool called CheckPileup that compares the output from our Pileup to the samtools one. I'm not sure but maybe that could be helpful for debugging? In any case, would be awesome to get a fix from you.

    Geraldine Van der Auwera, PhD

  • akiezunakiezun Cambridge, MAMember Posts: 22

    Thanks. I got a bizarre error when running CheckPileup
    the SAM pileup line didn't have the expected number of tokens (expected = 10, saw = 6 on line ...

    this is strange because the sam pileup format is in fact 6 columns, per GATK docs: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html
    and per samtools docs:
    http://samtools.sourceforge.net/pileup.shtml

    I'm puzzled. I'm missing something super trivial.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,388 admin

    Hmm, it's calling the SAMPileupCodec which does specify it wants 10 tokens:

    public class SAMPileupCodec extends AsciiFeatureCodec<SAMPileupFeature> {
        // the number of tokens we expect to parse from a pileup line
        private static final int expectedTokenCount = 10;
    

    I wonder if this is an old codec that hasn't been updated in a while. I'll ask GSA if they know what's going on here.

    Geraldine Van der Auwera, PhD

  • KurtKurt Member Posts: 255 ✭✭✭

    There always was some confusion as to what the definition of the format of pileup was from samtools. There was the simple pileup which is 6 columns and there was the "consensus" or extended pileup that a lot of people called pileup that was 10 columns (It was the original 6 plus 4 extra columns, which most people used pre-vcf)...I forget what those extra 4 columns were...

  • akiezunakiezun Cambridge, MAMember Posts: 22

    OK, I think I know what may be going on. The old samtools command, pileup (now deprecated), used to have a flag -c which generated a consensus sequence and had 10 columns in the output. The doc is wrong then because it exploicity shows 6 columns: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_utils_codecs_sampileup_SAMPileupCodec.html

    With that, I was able to validate the pileup with the checker. I'm still missing the indels though - I guess the checker must just actively ignore them because they're right there in the pileup output. Will try to fix this.

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie Posts: 11,388 admin

    Ah, that makes sense, thanks. I'll fix the docs accordingly.

    Geraldine Van der Auwera, PhD

  • akiezunakiezun Cambridge, MAMember Posts: 22

    Hmm, Eric says that the output should contain indels but the Pileup walker inherits the default implementation of org.broadinstitute.sting.gatk.walkers.Walker.includeReadsWithDeletionAtLoci() which returns false so indels are not even seen by the walker in the pileup so it seems to have no chance of including indels right now.

  • KatieKatie United StatesMember Posts: 28

    Hi, Is there a way to generate a GATK pileup while excluding reads falling below a minimum mapping quality and base quality? I am essentially wondering the equivalent of samtools mpileup -q and -Q options. Thank you!

Sign In or Register to comment.