RevertSam per readgroup rather than using OUTPUT_BY_READGROUP

FPBarthel Houston Member ✭✭

Hi GATK team,

I have been having trouble running RevertSam with OUTPUT_BY_READGROUP set to true because of pipeline software constraints (see that issue here; it is unrelated to GATK). Suffice it to say, I need to run a separate instance of RevertSam for each readgroup.

I tried running the following:

samtools view -br {READGROUP_ID} {SAMPLE_BAM} | gatk RevertSam --INPUT=/dev/stdin --OUTPUT={OUTPUT}

And this seems to work fine (despite being a tad slow). Is there any reason why I should not do it this way?
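
To apply the same command to every readgroup in a BAM, a minimal sketch could look like the following. It reads the readgroup IDs from the @RG header lines and prints one samtools | RevertSam pipeline per ID. The function names (list_rg_ids, revert_cmd, revert_all) and the {outdir}/{rg}.unmapped.bam naming are hypothetical, and the sketch assumes paths and readgroup IDs contain no whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the output naming scheme are assumptions,
# not part of samtools or GATK.

# Print the ID of every @RG line in a SAM header read from stdin.
list_rg_ids() {
    awk -F'\t' '$1 == "@RG" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^ID:/) print substr($i, 4)
    }'
}

# Print (do not run) the per-readgroup pipeline from the post above.
# Arguments: BAM path, readgroup ID, output directory.
revert_cmd() {
    printf 'samtools view -br %s %s | gatk RevertSam --INPUT=/dev/stdin --OUTPUT=%s/%s.unmapped.bam\n' \
        "$2" "$1" "$3" "$2"
}

# Emit one command line per readgroup found in the BAM header.
# Needs samtools on PATH. Arguments: BAM path, output directory.
revert_all() {
    samtools view -H "$1" | list_rg_ids | while IFS= read -r rg; do
        revert_cmd "$1" "$rg" "$2"
    done
}
```

Calling `revert_all sample.bam out` only prints the commands, so they can be inspected first and then piped to `bash`, or handed to a workflow manager as one job per readgroup.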


Best Answer

  • shlee Cambridge ✭✭✭✭✭
    Accepted Answer

    Hi @FPBarthel,

    If this streaming approach works for you, then great. Thanks for sharing the solution. I suppose the one reason to look for an alternative is the one you already mention:

    [...] a tad slow

    Unrelatedly, I see that you're pipelining with Snakemake, which I heard about recently. How are you finding it? Have you tried WDL?


