The SVPreprocess error about Invalid sequence position

xiayanok china Member
edited May 2015 in GenomeSTRiP

Hi Bob and all,

When I run my SVpreprocessing I get an error saying:
INFO 13:58:01,433 HelpFormatter - ----------------------------------------------------------
INFO 13:58:01,435 HelpFormatter - Program Name: org.broadinstitute.sv.apps.ComputeGCProfiles
INFO 13:58:01,441 HelpFormatter - Program Args: -O /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata/gcprofile/reference.gcprof.zip -R /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.fasta -md /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata -writeReferenceProfile true -genomeMaskFile /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.svmask.fasta -copyNumberMaskFile /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.gcmask.fasta -configFile /disk/disk1/work/xiayan/bin/svtoolkit/conf/genstrip_parameters.txt
INFO 13:58:01,447 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.3.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_45-mockbuild_2013_11_22_18_30-b00.
INFO  13:58:01,448 HelpFormatter - Date/Time: 2015/05/08 13:58:01
INFO  13:58:01,448 HelpFormatter - ----------------------------------------------------------
INFO  13:58:01,448 HelpFormatter - ----------------------------------------------------------
INFO  13:58:01,470 MetaData - Opening metadata ...  
INFO  13:58:01,470 MetaData - Adding metadata directory /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata ... 
INFO  13:58:01,471 MetaData - Opened metadata. 
INFO  13:58:01,471 ComputeGCProfiles - Opening reference sequence ... 
INFO  13:58:01,472 ComputeGCProfiles - Opened reference sequence. 
INFO  13:58:01,472 ComputeGCProfiles - Opening genome mask ... 
INFO  13:58:01,473 ComputeGCProfiles - Opened genome mask. 
INFO  13:58:01,474 ComputeGCProfiles - Opening copy number mask ... 
INFO  13:58:01,474 ComputeGCProfiles - Opened copy number mask. 
INFO  13:58:01,475 ComputeGCProfiles - Initializing algorithm ... 
#INFO: ReadCountAlgorithm: detected metadata version 1, forcing legacy behavior
INFO  13:58:01,479 ComputeGCProfiles - Algorithm initialized. 
INFO  13:58:01,480 ComputeGCProfiles - Computing reference GC profile ... 
#DBG: Fri May 08 13:58:01 CST 2015 computing reference profile for the interval Chr01:1-73727935
#DBG: Fri May 08 13:58:14 CST 2015 computing reference profile for the interval Chr02:1-77694824
#DBG: Fri May 08 13:58:27 CST 2015 computing reference profile for the interval Chr03:1-74408397
#DBG: Fri May 08 13:58:42 CST 2015 computing reference profile for the interval Chr04:1-67966759
#DBG: Fri May 08 13:58:54 CST 2015 computing reference profile for the interval Chr05:1-62243505
#DBG: Fri May 08 13:59:04 CST 2015 computing reference profile for the interval Chr06:1-62192017
#DBG: Fri May 08 13:59:15 CST 2015 computing reference profile for the interval Chr07:1-64263908
#DBG: Fri May 08 13:59:26 CST 2015 computing reference profile for the interval Chr08:1-55354556
#DBG: Fri May 08 13:59:36 CST 2015 computing reference profile for the interval Chr09:1-59454246
#DBG: Fri May 08 13:59:46 CST 2015 computing reference profile for the interval Chr10:1-61085274
#DBG: Fri May 08 13:59:56 CST 2015 computing reference profile for the interval super_10:1-8818317
Exception in thread "main" java.lang.RuntimeException: Invalid sequence position: super_10:746
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:61)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.apps.ComputeGCProfiles.main(ComputeGCProfiles.java:125)
Caused by: java.lang.IllegalArgumentException: Invalid sequence position: super_10:746
    at org.broadinstitute.sv.mask.GenomeMaskFastaFile.getMaskBit(GenomeMaskFastaFile.java:80)
    at org.broadinstitute.sv.metadata.gc.GCProfileAlgorithm.computeIntervalProfile(GCProfileAlgorithm.java:338)
    at org.broadinstitute.sv.metadata.gc.GCProfileAlgorithm.computeReferenceProfileMap(GCProfileAlgorithm.java:225)
    at org.broadinstitute.sv.metadata.gc.GCProfileAlgorithm.computeReferenceProfile(GCProfileAlgorithm.java:181)
    at org.broadinstitute.sv.apps.ComputeGCProfiles.run(ComputeGCProfiles.java:172)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:50)
    ... 4 more

The error is in SVPreprocess-5.out.

My command is:

classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
java -Xmx30g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/SVPreprocess.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-tempDir ${SV_TMPDIR} \
-R $reference \
-I ${outdir}/population.list \
-md ${metadata} \
-runDirectory ${run1} \
-useMultiStep \
-reduceInsertSizeDistributions true \
-computeReadCounts true \
-jobLogDir ${run1}/logs \
-computeGCProfiles true \
-jobRunner Shell \
-gatkJobRunner Shell \
-run
My server has no LSF or SGE because it is a single machine, so I want to run the pipeline directly. Apart from this error, the run eventually failed and exited.

Answers

  • xiayanok china Member

    After several hours it ended with "INFO 05:49:10,559 QCommandLine - Script failed: 6597 Pend, 0 Run, 5 Fail, 145 Done". Did it fail because there is no LSF or SGE?

  • bhandsaker Member, Broadie ✭✭✭✭

    You should check the log files for details. You may have to also look at the log files for sub-jobs, based on what the top level log file says.

    In our environment, we occasionally get transient failures, which may be the case here.
    If so, then you can simply rerun the Queue command which will rerun the jobs that failed (a la "make").

  • xiayanok china Member

    Why did my comment show "Your comment will appear after it is approved" but then never appear?

  • xiayanok china Member

    Maybe I pasted too much error information.

  • xiayanok china Member

    @bhandsaker said:
    You should check the log files for details. You may have to also look at the log files for sub-jobs, based on what the top level log file says.

    In our environment, we occasionally get transient failures, which may be the case here.
    If so, then you can simply rerun the Queue command which will rerun the jobs that failed (a la "make").

    Thank you, Bob. I have replied several times but my replies did not appear. I don't know why; maybe I pasted too much error information from my logs.
    I have checked the error and rerun my command several times, and each time I got the same error and the run finally failed.
    I don't know whether it is caused by limited resources, since I run the command directly without an LSF or SGE platform.
    Finally, I ran the command with "-L Chr01" and it finished successfully with no errors. When I run the whole genome it fails with "INFO 05:49:10,559 QCommandLine - Script failed: 6597 Pend, 0 Run, 5 Fail, 145 Done". The output for the five failed jobs is too long to paste here; if I do, my comment will not appear again.

    Can I run my command on each chromosome separately? What is the difference in the results between running single chromosomes and running the whole genome?
    Bob, thank you again.

  • xiayanok china Member

    Bob, these are my logs for the 5 failed jobs from the whole-genome run. When I run the whole genome, do I need to set "input.genomeSize, input.genomeSizeMale, input.genomeSizeFemale"? My species has no separate sexes and my reference genome includes the super contigs. My input .bam files are also aligned to this reference genome, so how should I set "input.genomeSize"?
    Thank you for your patience.

  • bhandsaker Member, Broadie ✭✭✭✭

    If the log files are too bulky to post, you can email them to me directly. You can find my email address on the McCarroll lab web page: http://mccarrolllab.com/people.

    You should be able to run per chromosome, although we don't routinely do that here. During stage 5, each sample is evaluated based on the degree of variation, so if you run per chromosome you will get some sample-to-sample bias in genome-wide calling, especially with respect to rare variants.

    You don't need to set input genome sizes in GS 2.0 any more. They will be inferred based on the reference and ploidy map.

  • xiayanok china Member
    edited May 2015

    I have checked SVPreprocess-5.out. The sub-job is "org.broadinstitute.sv.apps.ComputeGCProfiles" and its output file is "reference.gcprof.zip". There is no "reference.gcprof.zip" in the metadata directory. I guess something is wrong with my gcmask file, which I created from the RepeatMasker results. I only included the 10 chromosomes in it. The error output looks like this:

    #DBG: Tue May 12 19:03:25 CST 2015 computing reference profile for the interval Chr01:1-73727935
    #DBG: Tue May 12 19:03:38 CST 2015 computing reference profile for the interval Chr02:1-77694824
    #DBG: Tue May 12 19:03:51 CST 2015 computing reference profile for the interval Chr03:1-74408397
    #DBG: Tue May 12 19:04:03 CST 2015 computing reference profile for the interval Chr04:1-67966759
    #DBG: Tue May 12 19:04:14 CST 2015 computing reference profile for the interval Chr05:1-62243505
    #DBG: Tue May 12 19:04:24 CST 2015 computing reference profile for the interval Chr06:1-62192017
    #DBG: Tue May 12 19:04:36 CST 2015 computing reference profile for the interval Chr07:1-64263908
    #DBG: Tue May 12 19:04:46 CST 2015 computing reference profile for the interval Chr08:1-55354556
    #DBG: Tue May 12 19:04:55 CST 2015 computing reference profile for the interval Chr09:1-59454246
    #DBG: Tue May 12 19:05:05 CST 2015 computing reference profile for the interval Chr10:1-61085274
    #DBG: Tue May 12 19:05:15 CST 2015 computing reference profile for the interval super_10:1-8818317
    Exception in thread "main" java.lang.RuntimeException: Invalid sequence position: super_10:746
        at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:61)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
        at org.broadinstitute.sv.apps.ComputeGCProfiles.main(ComputeGCProfiles.java:125)
    Caused by: java.lang.IllegalArgumentException: Invalid sequence position: super_10:746
    

    Do I need to build the gcmask file from all of the chromosomes plus the super contigs?

  • bhandsaker Member, Broadie ✭✭✭✭

    Is this the entire log? I think there is more information that is missing.

    When you paste these messages, it would be better to format them as code.

  • xiayanok china Member

    @bhandsaker said:
    Is this the entire log? I think there is more information that is missing.

    When you paste these messages, it would be better to format them as code.

    Bob, sorry for the messy formatting; it is not what I expected when I pasted the log. The attachment contains the complete logs. I hope it helps to solve my problem.

    Thank you so much.

  • xiayanok china Member
    edited May 2015

    @bhandsaker said:
    Is this the entire log? I think there is more information that is missing.

    When you paste these messages, it would be better to format them as code.

    Bob, this is the log from SVPreprocess-5.out.
    INFO 22:26:52,650 HelpFormatter - ----------------------------------------------------------
    INFO 22:26:52,652 HelpFormatter - Program Name: org.broadinstitute.sv.apps.ComputeGCProfiles
    INFO 22:26:52,658 HelpFormatter - Program Args: -O /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata/gcprofile/reference.gcprof.zip -R /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.fasta -md /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata -writeReferenceProfile true -genomeMaskFile /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.svmask.fasta -copyNumberMaskFile /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.gcmask.fasta -configFile /disk/disk1/work/xiayan/bin/svtoolkit/conf/genstrip_parameters.txt
    INFO 22:26:52,664 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.3.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_45-mockbuild_2013_11_22_18_30-b00.
    INFO 22:26:52,665 HelpFormatter - Date/Time: 2015/05/07 22:26:52
    INFO 22:26:52,665 HelpFormatter - ----------------------------------------------------------
    INFO 22:26:52,665 HelpFormatter - ----------------------------------------------------------
    INFO 22:26:52,672 MetaData - Opening metadata ...
    INFO 22:26:52,673 MetaData - Adding metadata directory /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata ...
    INFO 22:26:52,673 MetaData - Opened metadata.
    INFO 22:26:52,674 ComputeGCProfiles - Opening reference sequence ...
    INFO 22:26:52,675 ComputeGCProfiles - Opened reference sequence.
    INFO 22:26:52,675 ComputeGCProfiles - Opening genome mask ...
    INFO 22:26:52,676 ComputeGCProfiles - Opened genome mask.
    INFO 22:26:52,676 ComputeGCProfiles - Opening copy number mask ...
    INFO 22:26:52,676 ComputeGCProfiles - Opened copy number mask.
    INFO 22:26:52,677 ComputeGCProfiles - Initializing algorithm ...
    #INFO: ReadCountAlgorithm: detected metadata version 1, forcing legacy behavior
    INFO 22:26:52,680 ComputeGCProfiles - Algorithm initialized.
    INFO 22:26:52,680 ComputeGCProfiles - Computing reference GC profile ...
    #DBG: Thu May 07 22:26:52 CST 2015 computing reference profile for the interval Chr01:1-73727935
    #DBG: Thu May 07 22:27:04 CST 2015 computing reference profile for the interval Chr02:1-77694824
    #DBG: Thu May 07 22:27:17 CST 2015 computing reference profile for the interval Chr03:1-74408397
    #DBG: Thu May 07 22:27:30 CST 2015 computing reference profile for the interval Chr04:1-67966759
    #DBG: Thu May 07 22:27:41 CST 2015 computing reference profile for the interval Chr05:1-62243505
    #DBG: Thu May 07 22:27:51 CST 2015 computing reference profile for the interval Chr06:1-62192017
    #DBG: Thu May 07 22:28:02 CST 2015 computing reference profile for the interval Chr07:1-64263908
    #DBG: Thu May 07 22:28:13 CST 2015 computing reference profile for the interval Chr08:1-55354556
    #DBG: Thu May 07 22:28:22 CST 2015 computing reference profile for the interval Chr09:1-59454246
    #DBG: Thu May 07 22:28:32 CST 2015 computing reference profile for the interval Chr10:1-61085274
    #DBG: Thu May 07 22:28:42 CST 2015 computing reference profile for the interval super_10:1-8818317
    Exception in thread "main" java.lang.RuntimeException: Invalid sequence position: super_10:746
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:61)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.apps.ComputeGCProfiles.main(ComputeGCProfiles.java:125)
    Caused by: java.lang.IllegalArgumentException: Invalid sequence position: super_10:746
    at org.broadinstitute.sv.mask.GenomeMaskFastaFile.getMaskBit(GenomeMaskFastaFile.java:80)
    at org.broadinstitute.sv.metadata.gc.GCProfileAlgorithm.computeIntervalProfile(GCProfileAlgorithm.java:338)
    at org.broadinstitute.sv.metadata.gc.GCProfileAlgorithm.computeReferenceProfileMap(GCProfileAlgorithm.java:225)
    at org.broadinstitute.sv.metadata.gc.GCProfileAlgorithm.computeReferenceProfile(GCProfileAlgorithm.java:181)
    at org.broadinstitute.sv.apps.ComputeGCProfiles.run(ComputeGCProfiles.java:172)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:50)
    ... 4 more

  • xiayanok china Member

    Sorry, I don't know why the code always comes out garbled when I paste it, even though I changed the format. Next time I will pay more attention to the code formatting when I paste.

  • xiayanok china Member

    @bhandsaker said:
    What is in /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.svmask.fasta.fai and
    /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.gcmask.fasta.fai ?

    Are the contigs in the right order and do they have the right lengths?

    I have checked the two files, and Sbicolor_v2.1_255_cop.gcmask.fasta.fai only contains the 10 chromosomes. I made the gcmask file following the documentation (http://www.broadinstitute.org/software/genomestrip/node_ReferenceMetadata.html), but maybe I did not fully understand it. I will rerun the command with all contigs included and report back.

    Thank you Bob.

  • xiayanok china Member

    @bhandsaker said:
    What is in /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.svmask.fasta.fai and
    /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/data/Sbicolor_v2.1_255_cop.gcmask.fasta.fai ?

    Are the contigs in the right order and do they have the right lengths?

    Yes. When I reran the command with a Sbicolor_v2.1_255_cop.gcmask.fasta that includes all contigs, the "Invalid sequence position" error disappeared. It is running now.

    Thank you Bob.
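
The fix above comes down to each mask FASTA listing the same contigs, in the same order and with the same lengths, as the reference. A minimal Python sketch of that check follows. It assumes standard samtools faidx index files (tab-separated, contig name in column 1, length in column 2). The names read_fai and compare_fai are illustrative and are not part of Genome STRiP.

```python
from itertools import zip_longest


def read_fai(path):
    """Return [(contig_name, length), ...] in file order from a .fai index."""
    contigs = []
    with open(path) as handle:
        for line in handle:
            if not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            contigs.append((fields[0], int(fields[1])))
    return contigs


def compare_fai(reference_fai, mask_fai):
    """List human-readable differences between a reference and a mask index."""
    problems = []
    pairs = zip_longest(read_fai(reference_fai), read_fai(mask_fai))
    for position, (ref, mask) in enumerate(pairs, start=1):
        if mask is None:
            problems.append(f"entry {position}: {ref[0]} is missing from the mask")
        elif ref is None:
            problems.append(f"entry {position}: {mask[0]} is not in the reference")
        elif ref != mask:
            problems.append(f"entry {position}: reference has {ref}, mask has {mask}")
    return problems
```

An empty list means the two indexes agree. Under these assumptions, comparing the reference index with a gcmask index that holds only the 10 chromosomes would report super_10 and every later contig as missing from the mask.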

  • xiayanok china Member

    Bob, there is another problem.
    The script is running, but one job failed with this error:
    ERROR 14:46:18,236 FunctionEdge - Error: samtools index /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/headers.bam
    ERROR 14:46:18,240 FunctionEdge - Contents of /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/run1_whole/logs/SVPreprocess-3.out:

    The SVPreprocess-3.out log file is empty, and it is also empty in the installtest logs. The headers.bam file exists but is empty. Is that expected?

  • xiayanok china Member

    My input BAM files are the indel-realigned files produced for GATK. Could that cause the samtools error?

  • xiayanok china Member

    I also tried the non-realigned BAM files as input, and the samtools error still occurs.

  • xiayanok china Member

    The script is running and displays "6764 Pend, 2 Run, 2 Fail, 6 Done", but the log files contain no failure information. How can I find out which jobs failed?

    Thank you so much.

  • bhandsaker Member, Broadie ✭✭✭✭

    @xiayanok said:
    The script is running and displays "6764 Pend, 2 Run, 2 Fail, 6 Done", but the log files contain no failure information. How can I find out which jobs failed?

    Thank you so much.

    If you search that output, it will list which jobs failed and the log files for the specific jobs.
    Queue will keep running parts of the pipeline that are not blocked as long as it can make progress.
    You will eventually have to rerun to have it retry the 2 failed jobs (but be careful not to start a new Queue command running in parallel as they will conflict over the output files).

  • xiayanok china Member

    @bhandsaker said:
    You will eventually have to rerun to have it retry the 2 failed jobs (but be careful not to start a new Queue command running in parallel as they will conflict over the output files).

    Bob, the script finally failed with "6713 Pend, 0 Run, 6 Fail, 55 Done". The attachment is the output log, including all of the failure information. What can I do now? Rerun the failed jobs?

    What I want to know is why several jobs always fail and the script eventually exits. How can I avoid or fix the failures?

    I don't know whether the failures are caused by my data, my script, or the limited compute resources.

    I'm confused by the failure messages.

  • bhandsaker Member, Broadie ✭✭✭✭

    The failures look spurious to me. The logs indicate the programs completed successfully.

    This could be caused by something in your environment that is causing Queue to see a non-zero exit status from the processes, even though they completed successfully, or perhaps some kind of bug that is making the processes return a non-zero exit status.

    The cure is to rerun and most likely the jobs will run to completion and Queue will see a zero exit status.
    If the failures are reproducible, let me know.
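
As described above, Queue seems to be reacting to the exit status of each process rather than to the content of its log. One way to see the status a command actually returns is to run it by hand and inspect the return code. A minimal Python sketch using only the standard library follows. The helper name run_and_report is illustrative, and the samtools call in the comment assumes samtools is on PATH.

```python
import subprocess


def run_and_report(command):
    """Run a command given as a list of arguments.

    Returns (exit_status, stdout_text, stderr_text). A status of 0 is the
    conventional signal for success; anything else is treated as failure.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout, result.stderr


# Example (assumes samtools is on PATH and headers.bam is in the current directory):
#     status, out, err = run_and_report(["samtools", "index", "headers.bam"])
#     print("exit status:", status)
```

If the log ends with "Program completed." but the status printed here is not 0, the problem is more likely in the environment or wrapper than in the data.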

  • xiayanok china Member

    @bhandsaker said:
    The failures look spurious to me. The logs indicate the programs completed successfully.

    Thank you, Bob. Before your reply, I guessed the failures might be caused by limited compute resources, because I am not using an LSF or SGE platform. So I reduced my input: this time I used only 5 BAM files as a test. The "ERROR" messages appeared again and 4 jobs have failed, but the script is still running. I reran by hand the sub-jobs that the log marked with "ERROR", and they actually completed successfully. The script is still running and the current status is "INFO 23:11:12,889 QGraph - 4361 Pend, 7 Run, 4 Fail, 2237 Done". I will report back after it finishes.
    One more piece of information: the "ERROR" sub-jobs are
    samtools index /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/headers.bam
    org.broadinstitute.sv.apps.MergeReadDepthCoverage
    org.broadinstitute.sv.apps.MergeReadSpanCoverage
    org.broadinstitute.sv.apps.MergeGCProfiles

    I don't know whether the headers of my BAM files are OK. I merged the BAM files from different libraries or platforms into one BAM per sample, so each input BAM file is one sample. I have 28 samples, so there are 28 merged BAM files. But the "LB" value in the headers of the 28 merged BAM files is the same. Do I need to change their headers?

    Thank you Bob.

  • bhandsaker Member, Broadie ✭✭✭✭

    When you say headers.bam is empty, do you mean the file is zero length? What does "samtools view -H" report?

    Maybe you could post the header from one of your input bam files.
    As to whether having the same LB is a problem: It depends on whether this is one sequencing library or not.
    If it is all one library, then they should all have the same LB tag. If they are different, they should be different.
    The LB tag is how we tell one library from another.
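
The LB rule above can be checked directly from a header. A minimal Python sketch follows that groups the LB values of the @RG lines by sample, assuming header text as printed by samtools view -H. Splitting on any whitespace is an assumption made so that headers pasted into a forum post also parse (real SAM headers are tab-separated, and values containing spaces would be truncated). The function names are illustrative.

```python
from collections import defaultdict


def libraries_by_sample(header_text):
    """Map each SM (sample) value to the set of LB (library) values in @RG lines."""
    libraries = defaultdict(set)
    for line in header_text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "@RG":
            continue
        # Each remaining field looks like TAG:value; split on the first colon only.
        tags = dict(field.split(":", 1) for field in fields[1:] if ":" in field)
        libraries[tags.get("SM", "<no SM>")].add(tags.get("LB", "<no LB>"))
    return dict(libraries)


def samples_sharing_library(header_text):
    """Map each LB value to the sorted list of samples that use it."""
    shared = defaultdict(set)
    for sample, libs in libraries_by_sample(header_text).items():
        for lib in libs:
            shared[lib].add(sample)
    return {lib: sorted(samples) for lib, samples in shared.items()}
```

If samples that were sequenced from different libraries all appear under a single LB value, the read groups probably need distinct LB tags.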

  • xiayanok china Member

    The output of "samtools view -h headers.bam" looks like this:
    @HD VN:1.4 GO:none SO:coordinate
    @SQ SN:Chr01 LN:73727935
    @SQ SN:Chr02 LN:77694824
    @SQ SN:Chr03 LN:74408397
    @SQ SN:Chr04 LN:67966759
    @SQ SN:Chr05 LN:62243505
    @SQ SN:Chr06 LN:62192017
    @SQ SN:Chr07 LN:64263908
    @SQ SN:Chr08 LN:55354556
    @SQ SN:Chr09 LN:59454246
    @SQ SN:Chr10 LN:61085274
    @SQ SN:super_10 LN:8818317
    @SQ SN:super_11 LN:7339604
    ......
    @SQ SN:super_3326 LN:1005
    @RG ID:Insert PL:Illumina LB:sequence SM:SB1
    @RG ID:Insert.1 PL:Illumina LB:sequence SM:SB2
    @RG ID:Insert.2 PL:Illumina LB:sequence SM:SB3
    @RG ID:Insert.3 PL:Illumina LB:sequence SM:SB4
    @RG ID:Insert.4 PL:Illumina LB:sequence SM:SB5
    ......
    @PG ID:GATK IndelRealigner CL:knownAlleles=[] targetIntervals=SB1.realn.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=150000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
    ......

    In the end, my script finished with "Script failed: 3223 Pend, 0 Run, 4 Fail, 3382 Done".
    What should I do with the result?
    The output log is about 7.4M. I don't know whether I can upload it, but I will try.

  • xiayanok china Member

    Bob, this is the long log file. It's really nice of you to help me patiently.

  • bhandsaker Member, Broadie ✭✭✭✭

    Re: headers.bam. What happens if you try to index from the command line using "samtools index headers.bam"?
    And what is the exit status from that command?

    Re: log file errors. In addition to headers.bam, you have three other errors, each with a subsidiary log file listed (look at the end of the log file, long lines truncated for clarity):

    INFO 00:45:35,454 QGraph - Failed: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' ...
    INFO 00:45:35,454 QGraph - Log: /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/run1_whole/logs/SVPreprocess-19.out
    INFO 00:45:35,454 QGraph - -------
    INFO 00:45:35,454 QGraph - Failed: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' ...
    INFO 00:45:35,454 QGraph - Log: /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/run1_whole/logs/SVPreprocess-25.out
    INFO 00:45:35,454 QGraph - -------
    INFO 00:45:35,455 QGraph - Failed: 'java' '-Xmx2048m' '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=4' ...
    INFO 00:45:35,455 QGraph - Log: /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/run1_whole/logs/SVPreprocess-31.out
    INFO 00:45:35,455 QGraph - -------
    INFO 00:45:35,455 QGraph - Failed: samtools index /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/headers.bam
    INFO 00:45:35,455 QGraph - Log: /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/run1_whole/logs/SVPreprocess-3.out
    INFO 00:45:35,455 QCommandLine - Script failed: 3223 Pend, 0 Run, 4 Fail, 3382 Done
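
The failed jobs and their log files can also be pulled out of a long Queue log programmatically. A minimal Python sketch follows, assuming each "QGraph - Failed: ..." line is followed by a "QGraph - Log: ..." line as in the excerpt above. The name failed_jobs is illustrative.

```python
import re

# Patterns match the QGraph summary lines shown in the log excerpt.
FAILED = re.compile(r"QGraph - Failed: (.*)$")
LOG = re.compile(r"QGraph - Log: (.*)$")


def failed_jobs(log_lines):
    """Return [(command, log_path), ...] for each Failed/Log pair in a Queue log."""
    jobs = []
    pending_command = None
    for line in log_lines:
        line = line.rstrip()
        failed = FAILED.search(line)
        if failed:
            pending_command = failed.group(1)
            continue
        log = LOG.search(line)
        if log and pending_command is not None:
            jobs.append((pending_command, log.group(1)))
            pending_command = None
    return jobs
```

The function accepts any iterable of lines, so an open file handle on the captured Queue output works as well as a list.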

  • xiayanok china Member

    There is no error if I run "samtools index headers.bam", and it produces a file named "headers.bam.bai".

    I reran the 4 failed jobs by hand and they appeared to succeed, because each ended with "Program completed." and no error messages.
    Since I guessed that the header was one cause of the failures, I ran the script again with only 3 samples that have different "LB" values in their headers. I used Picard tools to change the headers. To my surprise, the samtools error disappeared, although "SVPreprocess-3.out" is still empty. The bad news is that the error in SVPreprocess-19.out appeared again, and I expect the errors in "SVPreprocess-25.out" and "SVPreprocess-31.out" to appear as before.

    The script is running and I am waiting to see whether there are other errors. I have also rerun the sub-job from SVPreprocess-19.out. The script is still running and its state is "6572 Pend, 1 Run, 1 Fail, 23 Done".

  • bhandsaker Member, Broadie ✭✭✭✭

    If you haven't read about Queue, you need to understand that it works like "make" and tracks progress through creating .*.done (and sometimes .*.fail) files. Also, you need to be careful if you change parameters in such a way as to change the overall number of jobs in the pipeline - in this case the log files won't match, etc. If you change too many things like this, make sure you do a clean run in a fresh run directory at the end.

  • xiayanok china Member

    The previous run had 4 failed jobs and finally exited. The errors were in "samtools, MergeReadDepthCoverage, MergeReadSpanCoverage, MergeGCProfiles". This time, after changing the headers of the input BAM files, only the "MergeReadSpanCoverage" error remains, but "SVPreprocess-19.out" actually contains no error information. I checked the log output: the "MergeReadSpanCoverage" and "MergeGCProfiles" sub-jobs have already completed successfully.

    The running state is now "6451 Pend, 5 Run, 1 Fail, 140 Done".

    SVPreprocess-19.out looks like this:

    [[email protected] logs]$ more SVPreprocess-19.out

    INFO 20:12:39,638 HelpFormatter - --------------------------------------------------------------
    INFO 20:12:39,640 HelpFormatter - Program Name: org.broadinstitute.sv.apps.MergeReadSpanCoverage
    INFO 20:12:39,646 HelpFormatter - Program Args: -I /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/spans/SB20_newRG_new.realn.spans.txt -I /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/span/SB21_newRG_new.realn.spans.txt -I /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/spans/SB22_newRG_new.realn.spans.txt -O /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/spans.dat
    INFO 20:12:39,653 HelpFormatter - Executing as [email protected]caldomain on Linux 2.6.32-431.3.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_45-mockbuild_2013_11_22_18_30-b00.

    INFO 20:12:39,653 HelpFormatter - Date/Time: 2015/05/15 20:12:39
    INFO 20:12:39,654 HelpFormatter - --------------------------------------------------------------
    INFO 20:12:39,654 HelpFormatter - --------------------------------------------------------------
    INFO 20:12:39,659 CommandLineProgram - Program completed.

    Why does the "MergeReadDepthCoverage" function always fail? Could it be caused by something in the headers of the input BAM files? This is just a guess, but I hope it is useful.

    Bob, thanks for your help.

  • bhandsaker Member, Broadie ✭✭✭✭

    Post one of the input files and the output file (they are just small text files).

  • xiayanok china Member

    @bhandsaker said:
    If you haven't read about Queue, you need to understand that it works like "make" and tracks progress through creating .*.done (and sometimes .*.fail) files. Also, you need to be careful if you change parameters in such a way as to change the overall number of jobs in the pipeline - in this case the log files won't match, etc. If you change too many things like this, make sure you do a clean run in a fresh run directory at the end.

    Sorry, I should not have changed so many parameters; I did it to make the run faster given my limited compute resources. I have rerun the script in a clean, fresh run directory and backed up the output.

    I am trying to understand Queue and am checking the logs carefully. When I change the parameters, the number of jobs changes, so the log numbers differ even for the same function. To be able to compare the output, it is better not to change too many parameters at once. Is my understanding right?

  • xiayanok china Member

    Here are the input file and the output file. There really is a file named ".spans.dat.fail". I had to rename "spans.dat" because files with the ".dat" extension cannot be posted.

  • xiayanok china Member

    The 3 input files each have a ".done" file; only the output file has a ".fail" file.

  • bhandsaker Member, Broadie ✭✭✭✭

    I'm confused about what reproducible problems you are actually seeing right now.

  • xiayanok china Member

    The reproducible problem is that the "spans.dat" step always fails when I run the whole genome.

  • bhandsaker Member, Broadie ✭✭✭✭

    If you are running just one sample, then those files (posted above) look correct.
    If the command is returning a non-zero exit status, you should check the log files, etc., and perhaps check whether there is something in your environment that might be causing the command to return a non-zero exit status even though it ran correctly.

  • xiayanok china Member

    @bhandsaker said:
    If you are running just one sample, then those files (posted above) look correct.

    Bob, I'm sorry my feedback is so late. I tested three samples. Whether I run with three samples or only one, the run always fails with one error: the spans.dat step fails, with the log shown below. But when I run with only one chromosome there is no error. I don't know what is wrong.

    SVPreprocess-19.out:

    INFO 17:51:06,381 HelpFormatter - --------------------------------------------------------------
    INFO 17:51:06,383 HelpFormatter - Program Name: org.broadinstitute.sv.apps.MergeReadSpanCoverage
    INFO 17:51:06,389 HelpFormatter - Program Args: -I /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/spans/SB20_newRG_new.realn.spans.txt -I /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/spans/SB21_newRG_new.realn.spans.txt -I /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/spans/SB22_newRG_new.realn.spans.txt -O /disk/disk1/work/xiayan/bin/svtoolkit/sorghum_data/metadata_whole/spans.dat
    INFO 17:51:06,396 HelpFormatter - Executing as [email protected] on Linux 2.6.32-431.3.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_45-mockbuild_2013_11_22_18_30-b00.
    INFO 17:51:06,396 HelpFormatter - Date/Time: 2015/06/01 17:51:06
    INFO 17:51:06,396 HelpFormatter - --------------------------------------------------------------
    INFO 17:51:06,397 HelpFormatter - --------------------------------------------------------------
    INFO 17:51:06,402 CommandLineProgram - Program completed.

    spans.dat:

    SAMPLE LIBRARY READGROUP SPANCOVERAGE
    SB20 LB20 20 4093051901
    SB21 LB21 21 2113248290
    SB22 LB22 22 3915178907

    Thank you so much.

  • bhandsaker Member, Broadie ✭✭✭✭

    I don't know what is wrong either.
    Queue is acting like it is getting a non-zero exit status from your process.
    You can always remove the .spans.dat.fail file and manually touch .spans.dat.done if you want Queue to not rerun this step.
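
The manual workaround above can be written as a small helper. A minimal Python sketch follows. It assumes the marker files sit next to the output file and are named by prefixing a dot and appending .fail or .done, as with .spans.dat.fail and .spans.dat.done in this thread. The name mark_done is illustrative. The helper only tells Queue to skip the step; it does not validate the output.

```python
from pathlib import Path


def mark_done(output_path):
    """Replace the .fail marker for an output file with a .done marker."""
    output = Path(output_path)
    if not output.exists():
        # Refuse to mark a step done when its output was never written.
        raise FileNotFoundError(f"{output} does not exist; refusing to mark it done")
    fail_marker = output.with_name(f".{output.name}.fail")
    done_marker = output.with_name(f".{output.name}.done")
    if fail_marker.exists():
        fail_marker.unlink()
    done_marker.touch()
    return done_marker
```

It is worth confirming by hand that the output file looks complete before calling this, since Queue will then treat the step as finished.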

  • xiayanok china Member

    @bhandsaker said:
    I don't know what is wrong either.
    Queue is acting like it is getting a non-zero exit status from your process.
    You can always remove the .spans.dat.fail file and manually touch .spans.dat.done if you want Queue to not rerun this step.

    Ok, I will try again. Thank you Bob.
