Trying to get a question in before the shutdown. Is everything shutting down in the US? :)

I am trying to use dcov to downsample to a fixed depth of, say, 100X coverage (note that I want a fixed depth, not fractional downsampling) and get the resulting reads in a BAM file. Getting the downsampled reads into the output file is the key thing here.

I have tried this with both PrintReads (which seemed the obvious choice) and IndelRealigner, but both seem to ignore the dcov argument passed to the GATK engine and write the whole input BAM to the output BAM.

Does anyone know which tool could be used to carry out this task?


  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    Hi Tim,

    That's weird, can you post your command line? Using PrintReads with dcov is exactly what you should do...

  • Geraldine_VdAuwera (Cambridge, MA) Member, Administrator, Broadie admin

    /facepalm Thanks for jumping in, @droazen. My brain is having its own little Shutdown/Outage today.

    Tim, there's your answer...

  • TimHughes Member

    Thanks for the answer. As you say, unfortunate :)

    But, just as a note: I did have several hundred reads starting at the same positions (this is a HaloPlex dataset where you get big "blocks" of amplicons). So, although I would not have achieved my goal of getting 100X coverage at the target loci, given what @droazen says about read walkers, I would have expected fewer reads in my output BAM than in the input BAM. But that is not what I got: the input and output contain the same number of reads.

  • droazen (Cambridge, MA) Member, Broadie, Dev ✭✭

    I should also have mentioned that the downsampling is done per-sample. So, as an example, PrintReads with -dcov 10 means "no more than 10 reads per alignment start position per sample".
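
    To make the "per alignment start position per sample" rule concrete, here is a minimal Python sketch of that capping behaviour. It is only an illustration, not GATK's implementation: the function name `downsample_by_start`, the `(name, sample, contig, start)` tuple layout, and the choice of uniform random selection within each group are all assumptions made for the example.

```python
import random
from collections import defaultdict


def downsample_by_start(reads, cap, seed=0):
    """Keep at most `cap` reads per (sample, contig, start) group.

    `reads` is an iterable of (name, sample, contig, start) tuples.
    This tuple layout is hypothetical and chosen only for the example.
    """
    rng = random.Random(seed)

    # Group reads by sample and alignment start position.
    groups = defaultdict(list)
    for read in reads:
        _name, sample, contig, start = read
        groups[(sample, contig, start)].append(read)

    kept = []
    for key in sorted(groups):
        members = groups[key]
        if len(members) > cap:
            # Assumption: pick uniformly at random among the group's reads.
            members = rng.sample(members, cap)
        kept.extend(members)
    return kept


# Toy amplicon-like input: 300 reads at one start, 5 at another, one sample.
reads = [(f"r{i}", "S1", "chr1", 1000) for i in range(300)]
reads += [(f"q{i}", "S1", "chr1", 2000) for i in range(5)]

kept = downsample_by_start(reads, cap=100)
print(len(reads), len(kept))  # 305 105
```

    Under this rule the cap applies to each start position independently, so the result is not a fixed depth at every locus: the 300-read block shrinks to 100 reads, while the 5-read position is left untouched.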

    As a test, I'd suggest that you try running PrintReads with -dcov 1, and see if you ever get more than one read with the same alignment start for a single sample.

