Confusing error message: complaint of missing header columns that aren't missing?

esalinas · Broad · Member, Broadie ✭✭✭

Hi,

I pass an interval_list file to PadTargets, and it complains of missing columns, but the values are on the first line.

Do you notice anything wrong with what I'm doing? Note that I'm running the semi-recent commit 3a2bb0d from "gatk-protected".

It complains that something is missing that doesn't look missing, so I'm confused.

thanks

-eddie

[email protected]:/opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/execution# java -Xmx1g -jar /opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/inputs/opt/data/gatk-protected-all-3a2bb0d-SNAPSHOT-spark_standalone.jar PadTargets --targets agilent.tsv   --output agilent.padded.tsv
19:45:28.231 INFO  IntelGKLUtils - Trying to load Intel GKL library from:
    jar:file:/mnt/opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/inputs/opt/data/gatk-protected-all-3a2bb0d-SNAPSHOT-spark_standalone.jar!/com/intel/gkl/native/libIntelGKL.so
19:45:28.302 INFO  IntelGKLUtils - Intel GKL library loaded from classpath.
[December 2, 2016 7:45:28 PM UTC] org.broadinstitute.hellbender.tools.exome.PadTargets  --targets agilent.tsv --output agilent.padded.tsv  --padding 250 --help false --version false --verbosity INFO --QUIET false --use_jdk_deflater false
[December 2, 2016 7:45:28 PM UTC] Executing as [email protected] on Linux 3.16.0-0.bpo.4-amd64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_101-b13; Version: Version:3a2bb0d-SNAPSHOT
19:45:28.308 INFO  PadTargets - Defaults.BUFFER_SIZE : 131072
19:45:28.309 INFO  PadTargets - Defaults.COMPRESSION_LEVEL : 5
19:45:28.309 INFO  PadTargets - Defaults.CREATE_INDEX : false
19:45:28.309 INFO  PadTargets - Defaults.CREATE_MD5 : false
19:45:28.309 INFO  PadTargets - Defaults.CUSTOM_READER_FACTORY : 
19:45:28.309 INFO  PadTargets - Defaults.EBI_REFERENCE_SERVICE_URL_MASK : http://www.ebi.ac.uk/ena/cram/md5/%s
19:45:28.309 INFO  PadTargets - Defaults.NON_ZERO_BUFFER_SIZE : 131072
19:45:28.309 INFO  PadTargets - Defaults.REFERENCE_FASTA : null
19:45:28.309 INFO  PadTargets - Defaults.SAM_FLAG_FIELD_FORMAT : DECIMAL
19:45:28.309 INFO  PadTargets - Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
19:45:28.309 INFO  PadTargets - Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false
19:45:28.310 INFO  PadTargets - Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
19:45:28.310 INFO  PadTargets - Defaults.USE_CRAM_REF_DOWNLOAD : false
19:45:28.310 INFO  PadTargets - Deflater IntelDeflater
19:45:28.310 INFO  PadTargets - Initializing engine
19:45:28.310 INFO  PadTargets - Done initializing engine
19:45:28.312 INFO  TargetTableReader - Reading targets from '/mnt/opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/execution/agilent.tsv' ...
19:45:28.333 INFO  PadTargets - Shutting down engine
[December 2, 2016 7:45:28 PM UTC] org.broadinstitute.hellbender.tools.exome.PadTargets done. Elapsed time: 0.00 minutes.
Runtime.totalMemory()=230686720
***********************************************************************

A USER ERROR has occurred: Bad input: format error at line 1: Bad header in file.  Not all mandatory columns are present.  Missing: VN:1.0, @HD, SO:coordinate

***********************************************************************
[email protected]:/opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/execution# head -3 agilent.tsv 
@HD VN:1.0  SO:coordinate
@SQ SN:1    LN:249250621    AS:GRCh37   UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   M5:1b22b98cdeb4a9304cb5d48026a85128 SP:Homo Sapiens
@SQ SN:2    LN:243199373    AS:GRCh37   UR:http://www.broadinstitute.org/ftp/pub/seq/references/Homo_sapiens_assembly19.fasta   M5:a0d9851da00400dec1098a9255ac712e SP:Homo Sapiens
[email protected]:/opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/execution# head -1 agilent.tsv |cut -f1
@HD
[email protected]:/opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/execution# head -1 agilent.tsv |cut -f2
VN:1.0
[email protected]:/opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/execution# head -1 agilent.tsv |cut -f3
SO:coordinate
[email protected]:/opt/data/cromwell-executions/pon_gatk_workflow/31cc0bd9-71d3-4270-9be2-06143e7c0a88/call-PadTargets/execution# 

Issue · Github
by Sheila

Issue Number: 1507
State: closed
Closed By: vdauwera
Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator admin

    @esalinas
    Hi Eddie,

    I am checking with the team. Someone will get back to you soon.

    -Sheila

  • Geraldine_VdAuwera · Cambridge, MA · Member, Administrator, Broadie admin

    Hi Eddie, that looks like it could be a bug. Can you tell me where you got the version of GATK4 protected that you're running? Can you try the package made available here and see if it reproduces the issue?

  • ramaniak · Toronto · Member

    Hello,
    I am seeing the same error even though the header is present.

    Runtime.totalMemory()=1553989632
    ***********************************************************************
    
    A USER ERROR has occurred: Bad input: format error at line 1: Bad header in file.  Not all mandatory columns are present.  Missing: VN:1.4, SO:unsorted, @HD
    
    ******************************************************************** 
    

    I tried the alternate version in your link as well as the version from (https://github.com/broadinstitute/gatk-protected/releases/tag/1.0.0.0-alpha1.2.3), and I see the same error.

    I have also tried an .interval_list file, a .bed file, and a .list file for good measure.

    thanks
    Arun

  • valentin · Cambridge, MA · Member, Dev ✭✭
    edited March 2017

    I think there is a bug in the error message: the column names it lists as missing are wrong. They are in fact the "dubious column" names found in the input. That must be addressed.

    However, the root cause is that the input file agilent.tsv does not follow the expected format:

    e.g.:

    name <tab> contig <tab> start <tab> stop
    target_1 <tab> chr1 <tab> 1001 <tab> 1101 
    target_2 <tab> chr1 <tab> 3136 <tab> 3912
    ...
    

    Just to be clear, " <tab> " represents a single tab character.
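
For anyone starting from a Picard-style interval_list like the one in the original post, here is a minimal Python sketch of one way to produce a table in the layout shown above. It is not part of GATK. The function names are hypothetical, the set of mandatory columns is only inferred from the example header (the tool's actual required set may differ), and it assumes each interval_list data row is contig, start, end, strand, name, with coordinates that can be copied over unchanged.

```python
import csv
import io

# Assumption: column names taken from the example header in this thread.
EXPECTED_COLUMNS = ("contig", "start", "stop", "name")


def missing_columns(first_line):
    """Return the expected column names absent from a tab-separated header line."""
    found = first_line.rstrip("\n").split("\t")
    return [c for c in EXPECTED_COLUMNS if c not in found]


def interval_list_to_targets(src, dst):
    """Copy interval_list rows from src into a name/contig/start/stop table on dst.

    src and dst are text file objects. Lines starting with '@' (the SAM-style
    header) and blank lines are skipped. Rows with fewer than five fields
    raise ValueError.
    """
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(["name", "contig", "start", "stop"])
    for line in src:
        if line.startswith("@") or not line.strip():
            continue
        contig, start, stop, _strand, name = line.rstrip("\n").split("\t")[:5]
        writer.writerow([name, contig, start, stop])


# Small in-memory demonstration with made-up intervals.
example = io.StringIO(
    "@HD\tVN:1.0\tSO:coordinate\n"
    "@SQ\tSN:1\tLN:249250621\n"
    "1\t1001\t1101\t+\ttarget_1\n"
    "1\t3136\t3912\t+\ttarget_2\n"
)
converted = io.StringIO()
interval_list_to_targets(example, converted)
print(converted.getvalue(), end="")
```

On a real file, the same function could be called with two opened files, for example agilent.tsv as the source and a new output name of your choosing as the destination. Running missing_columns on the original first line (`@HD`, `VN:1.0`, `SO:coordinate`) returns every expected name, which matches the explanation above: the first line is being read as a column header, and none of the expected names are in it.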
