Bug in HaplotypeCaller: Lines meant to STDERR go to STDOUT

I use HaplotypeCaller v. 3.6-0-g89b7209. I want to compress the output before writing it to disk, so I use a command like this:

gatk -T HaplotypeCaller ... -o /dev/stdout | grep -v -e --- -e WARN | bgzip -s - > output.gvcf.gz

The grep command is needed to work around a bug that sends lines meant for STDERR to STDOUT. This is similar to a bug I reported in v. 3.3:
http://gatkforums.broadinstitute.org/gatk/discussion/4947/bug-pairhmm-outputs-to-stdout-instead-of-stderr#latest
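
One way to confirm which stream a given message arrives on is to silence one stream at a time. The following is a minimal sketch, not GATK itself: `noisy_tool` is a hypothetical stand-in that deliberately prints one log line on STDOUT and one on STDERR.

```shell
# Hypothetical stand-in for a tool that misroutes part of its log output.
noisy_tool() {
    echo "data line"
    echo "WARN misplaced message"      # wrongly printed on stdout
    echo "WARN proper message" >&2     # correctly printed on stderr
}

# Discard stderr: whatever is captured here arrived via stdout.
stdout_only=$(noisy_tool 2>/dev/null)
# Send stderr to the capture, then discard stdout: only stderr is captured.
stderr_only=$(noisy_tool 2>&1 >/dev/null)

echo "stdout: $stdout_only"
echo "stderr: $stderr_only"
```

Any log line that shows up in the first capture would end up inside a file written via `-o /dev/stdout`.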

Answers

  • valentin Cambridge, MA Member, Dev ✭✭
    edited October 2016

    Are you saying that that bug was not fixed and you are seeing the same kind of misplaced messages, or are your messages different? Based on your grep exclusion pattern, it seems to be the latter. We need to know what those messages are to find the source of the problem, so please post them.

    In the meantime, I think you should try simply writing the output to a file whose name ends in .vcf.gz; that may produce the compression you want, along with a tabix index.

    If you want more control, you can either write an uncompressed VCF and compress it afterwards, or use a named pipe like so:

    mkfifo temp-pipe
    # -c writes to stdout; with no file argument bgzip reads stdin
    bgzip -c < temp-pipe > myoutput.vcf.gz &
    gatk -T HaplotypeCaller ... -o temp-pipe
    wait # for bgzip to complete.
    rm temp-pipe
    

    The named pipe avoids the need for a large temporary VCF, which I think is what you are trying to avoid. I believe the buffer of a named pipe does not grow beyond a fixed size; I think it was 64 KB back when I had to deal with them in the mid-90s ... wow, time flies!
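
The named-pipe approach can be tried without GATK. The sketch below assumes that the stray messages go to the process's standard output rather than into the `-o` file, so they stay out of the compressed result. `fake_caller` is a hypothetical stand-in for the caller, `gzip` stands in for bgzip, and all file names are made up.

```shell
# Hypothetical stand-in for the caller: writes records to the path given
# as $1 and (wrongly) prints its log summary on stdout.
fake_caller() {
    printf 'record1\nrecord2\n' > "$1"
    echo "WARN summary printed on stdout"
}

workdir=$(mktemp -d)
mkfifo "$workdir/temp-pipe"

# Reader: gzip (standing in for bgzip) compresses whatever enters the pipe.
gzip -c < "$workdir/temp-pipe" > "$workdir/out.gz" &

# Writer: records go into the pipe, stdout is captured separately.
stray=$(fake_caller "$workdir/temp-pipe")
wait    # let the compressor finish

compressed=$(gzip -dc "$workdir/out.gz")
echo "in compressed file: $compressed"
echo "left on stdout:     $stray"
rm -r "$workdir"
```

Only the records pass through the pipe, so no grep filter is needed between the caller and the compressor.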

  • Hi,
    This is a new bug that is similar to the earlier one: both write output meant for STDERR to STDOUT. The new one adds a few lines at the end of the file. They can look like this:

    ------------------------------------------------------------------------------------------
    Done. There were 5 WARN messages, the first 5 are repeated below.
    WARN  08:55:57,428 GATKVCFUtils - Naming your output file using the .g.vcf extension will automatically set the appropriate values  for --variant_index_type and --variant_index_parameter
    WARN  08:55:57,803 InbreedingCoeff - Annotation will not be calculated. InbreedingCoeff requires at least 10 unrelated samples.
    WARN  08:55:58,798 HaplotypeScore - Annotation will not be calculated, must be called from UnifiedGenotyper
    WARN  08:56:21,180 AnnotationUtils - Annotation will not be calculated, genotype is not called
    WARN  08:56:21,181 AnnotationUtils - Annotation will not be calculated, genotype is not called
    ------------------------------------------------------------------------------------------
    

    Regards, Ari
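
If the grep workaround is kept for now, its patterns can be anchored to the start of the line so that a record that merely contains `WARN` or `---` is not dropped. This is a sketch that assumes the stray lines always begin with `WARN`, `Done. There were`, or a run of dashes, which is inferred only from the sample above. The `##comment` header line is made up for the demonstration.

```shell
# Sample stream: two header-like lines followed by a trailer of the kind quoted above.
sample() {
    printf '%s\n' \
        '##fileformat=VCFv4.2' \
        '##comment=hypothetical header that mentions WARN and ---' \
        '------------------------------------------' \
        'Done. There were 1 WARN messages, the first 1 are repeated below.' \
        'WARN  08:56:21,180 AnnotationUtils - Annotation will not be calculated, genotype is not called' \
        '------------------------------------------'
}

# Anchored patterns: only lines that begin with a trailer marker are removed.
filtered=$(sample | grep -v -e '^-\{10,\}$' -e '^Done\. There were ' -e '^WARN ')
echo "$filtered"
```

With the unanchored patterns `-e --- -e WARN`, the made-up `##comment` line would be removed as well.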

  • valentin Cambridge, MA Member, Dev ✭✭

    Thanks @ariloytynoja, that was very helpful. I can see what is wrong now.
    We will take care of it.
    Are any of the workarounds above good enough for you to make progress for now?
