SVPreprocess Error: Alignment file does not exist

Dear Genome STRiP users,

I ran SVPreprocess successfully on one cohort. I am now applying the same script to another cohort to call the same variants, but it fails with unexpected errors such as the following:

Exception in thread "main" org.broadinstitute.sv.commandline.ArgumentException: Alignment file does not exist: /proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/H_TK-12498-AB33938473
    at org.broadinstitute.sv.dataset.SAMLocation.create(SAMLocation.java:99)
    at org.broadinstitute.sv.commandline.CommandLineParser.createSAMLocation(CommandLineParser.java:256)
    at org.broadinstitute.sv.commandline.CommandLineParser.parseSAMLocationFile(CommandLineParser.java:247)
    at org.broadinstitute.sv.commandline.CommandLineParser.parseSAMLocations(CommandLineParser.java:234)
    at org.broadinstitute.sv.commandline.CommandLineParser.parseSAMLocations(CommandLineParser.java:220)
    at org.broadinstitute.sv.apps.ExtractBAMSubset.run(ExtractBAMSubset.java:79)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.apps.ExtractBAMSubset.main(ExtractBAMSubset.java:74) 

I have 950 samples, and nearly 900 of them produce this kind of error during SVPreprocess. I am not sure whether this is related to the reference file, but I am using the reference file listed on the Broad Institute's website. In my earlier analysis there was just one alignment file, headers.bam, for all 3418 samples, but now the alignment files appear to be different for each BAM file. Is this related to my script? I have included my script below. May I have your suggestions? Thank you very much.

classpath="${SV_DIR}/lib/SVToolkit.jar:${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar:${SV_DIR}/lib/gatk/Queue.jar"
gs_dir="$4"
rundir="${gs_dir}/$7_$1/$3"

java -Xmx4g -cp ${classpath}\
     org.broadinstitute.gatk.queue.QCommandLine\
     -S ${SV_DIR}/qscript/SVPreprocess.q\
     -S ${SV_DIR}/qscript/SVQScript.q\
     -cp ${classpath}\
     -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
     -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
     -R ${gs_dir}/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta \
     -L $1:$2 \
     -I ${gs_dir}/$7_$1/supporting_$7_$1/$7_$1_$5_sample.list \
     -md ${rundir}/md_tempdir \
     -tempDir ${gs_dir}/gs_tempdir/svpre_tmp \
     -runDirectory ${rundir} \
     -ploidyMapFile ${gs_dir}/$7_$1/supporting_$7_$1/$7_$1_$8_ploidy.map \
     -jobLogDir ${rundir}/logs \
     -run \
     || exit 1

Best regards,
Wusheng


Answers

  • bhandsaker Member, Broadie ✭✭✭✭

    You are using the .list extension, which is correct (getting the extension wrong is the most common mistake).
    I suspect it is some problem within the *_sample.list file. Perhaps something with the character set or line terminators. The file path being read from the list file does not appear to exist.
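
    A minimal sketch of how the list file could be checked for the two problems suggested above (unusual line terminators and paths that do not exist). The function name check_list is hypothetical and not part of Genome STRiP; the sketch assumes the list file holds one alignment file path per line.

    ```shell
    # check_list: hypothetical helper, not part of Genome STRiP.
    # Reads a list file (one alignment path per line) and reports entries
    # that end in a carriage return or that are not existing files.
    check_list() (
        cr=$(printf '\r')   # a literal carriage return character
        n=0
        bad=0
        while IFS= read -r line || [ -n "$line" ]; do
            n=$((n + 1))
            [ -z "$line" ] && continue      # ignore blank lines
            case "$line" in
                *"$cr")
                    echo "line $n: ends with a carriage return (CRLF line terminator)"
                    bad=$((bad + 1))
                    continue
                    ;;
            esac
            if [ ! -f "$line" ]; then
                echo "line $n: file does not exist: $line"
                bad=$((bad + 1))
            fi
        done < "$1"
        [ "$bad" -eq 0 ]    # exit status 0 only if no problems were found
    )
    ```

    Calling check_list on the file passed to -I prints one message per problematic line and returns a non-zero status if any entry fails either check.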

  • Hi @bhandsaker :

    Thank you very much for your explanation. Here is part of my *_sample.list file:

    /proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/H_TK-12498-AB33938473
    /proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/H_TK-12538-AB33938624
    /proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/H_TK-12734-AB35899176
    ...
    

    The corresponding .bam and .bam.bai files are in the directory "/proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/".
    This directory appears in the error message above. Is this directory the file path that should be read from the list file?

    I will try to generate this file with Python and set the line terminator to "\n" explicitly. But before that, may I ask for your help with two questions:

    1. When I ran SVPreprocess on the JHS cohort, there was only one alignment file, md_tempdir/headers.bam. But for the PAGE cohort I am running now, error messages such as
    Alignment file does not exist: /proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/H_TK-12498-AB33938473
    

    imply that the alignment file should be in the same directory as the .bam files. I am not sure whether this difference in the location of the alignment file means that I did something wrong.

    2. When I ran SVPreprocess on the JHS cohort, the files in md_tempdir were
    chimerism.dat     headers.bam      isd.stats.dat   rccache.bin.idx           svtoolkit.version.dat
    depth.dat         headers.bam.bai  mdversion.txt   rccache.list
    gcprofiles.zip    isd.dist.bin     profiles_100Kb  sample_gender.report.txt
    genome_sizes.txt  isd.hist.bin     rccache.bin     spans.dat
    

    but when I run SVPreprocess on the PAGE cohort, the contents of md_tempdir seem incomplete:

    depth      genome_sizes.txt  mdversion.txt   rccache        spans
    gcprofile  isd               profiles_100Kb  rccache.merge  svtoolkit.version.dat
    

    Does this mean that the step of merging all the .bam files did not complete? In other words, for any cohort, if SVPreprocess completes successfully, should the file names in md_tempdir be the same as those I saw in md_tempdir for the JHS cohort?

    Thank you very much.

    Best regards,
    Wusheng

  • Hi @bhandsaker :

    It works now (after adding .bam to every line in my .list file)! Thank you very much. Thank you also for the explanation of headers.bam.

    Best regards,
    Wusheng
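
For readers who hit the same error, the fix reported above (adding .bam to every line of the .list file) could be scripted roughly as follows. This is only a sketch: add_bam_suffix is a hypothetical helper name, and it assumes that every entry is a BAM path given with or without its .bam extension. It also drops blank lines and trailing carriage returns, which goes beyond the fix reported in the thread.

```shell
# add_bam_suffix: hypothetical helper, not part of Genome STRiP.
# $1 = existing list file, $2 = corrected list file to write.
add_bam_suffix() {
    awk '
        { sub(/\r$/, "") }                    # drop a trailing carriage return
        $0 == "" { next }                     # skip blank lines
        $0 !~ /\.bam$/ { $0 = $0 ".bam" }     # append .bam only where it is missing
        { print }
    ' "$1" > "$2"
}
```

Writing to a separate output file leaves the original list untouched, so the corrected list can be inspected before it is passed to -I.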
