
SVPreprocess Error: Alignment file does not exist

Dear Genome STRiP users,

I successfully ran SVPreprocess on one cohort. Now I am applying the same script to another cohort to call the same variants, but it raises unexpected errors such as the following:

Exception in thread "main" org.broadinstitute.sv.commandline.ArgumentException: Alignment file does not exist: /proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/H_TK-12498-AB33938473
    at org.broadinstitute.sv.dataset.SAMLocation.create(SAMLocation.java:99)
    at org.broadinstitute.sv.commandline.CommandLineParser.createSAMLocation(CommandLineParser.java:256)
    at org.broadinstitute.sv.commandline.CommandLineParser.parseSAMLocationFile(CommandLineParser.java:247)
    at org.broadinstitute.sv.commandline.CommandLineParser.parseSAMLocations(CommandLineParser.java:234)
    at org.broadinstitute.sv.commandline.CommandLineParser.parseSAMLocations(CommandLineParser.java:220)
    at org.broadinstitute.sv.apps.ExtractBAMSubset.run(ExtractBAMSubset.java:79)
    at org.broadinstitute.sv.commandline.CommandLineProgram.execute(CommandLineProgram.java:54)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:256)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:158)
    at org.broadinstitute.sv.commandline.CommandLineProgram.runAndReturnResult(CommandLineProgram.java:29)
    at org.broadinstitute.sv.commandline.CommandLineProgram.run(CommandLineProgram.java:25)
    at org.broadinstitute.sv.apps.ExtractBAMSubset.main(ExtractBAMSubset.java:74) 

I have 950 samples, and nearly 900 of them hit this error during SVPreprocess. I am not sure whether it is related to the reference file, but I am using the reference file listed on the Broad Institute's website. Also, in my earlier analysis there was just one alignment file, headers.bam, for all 3418 samples, whereas now there appears to be a different alignment file for each BAM file. Could this be caused by my script? My script is below. Any suggestions would be appreciated. Thank you very much.


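# Run SVPreprocess through the GATK Queue entry point (QCommandLine).
java -Xmx4g -cp ${classpath} \
     org.broadinstitute.gatk.queue.QCommandLine \
     -S ${SV_DIR}/qscript/SVPreprocess.q \
     -S ${SV_DIR}/qscript/SVQScript.q \
     -cp ${classpath} \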
     -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
     -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
     -R ${gs_dir}/Homo_sapiens_assembly38/Homo_sapiens_assembly38.fasta \
     -L $1:$2 \
     -I ${gs_dir}/$7_$1/supporting_$7_$1/$7_$1_$5_sample.list \
     -md ${rundir}/md_tempdir \
     -tempDir ${gs_dir}/gs_tempdir/svpre_tmp \
     -runDirectory ${rundir} \
     -ploidyMapFile ${gs_dir}/$7_$1/supporting_$7_$1/$7_$1_$8_ploidy.map \
     -jobLogDir ${rundir}/logs \
     -run \
     || exit 1

Best regards,


Best Answer


  • bhandsaker (Member, Broadie)

    You are using the .list extension, which is correct; leaving it off is the most common mistake.
    I suspect a problem within the *_sample.list file itself, perhaps with the character set or the line terminators. The file path being read from the list file does not appear to exist.
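    One way to look for such problems is to test every path in the list before launching SVPreprocess. The bash function below is a minimal sketch; check_bam_list is a hypothetical name (not part of Genome STRiP), and it assumes the list holds one BAM path per line. It warns about CRLF line endings and reports any path that is not an existing regular file. A path like the one in the error message above, which has no .bam extension, would be reported as missing if only the .bam file exists on disk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every path in a Genome STRiP .list file
# (assumed to hold one BAM path per line) that is not an existing file.
check_bam_list() {
    local list_file="$1" path missing=0

    # Any carriage return means the file has Windows (CRLF) line endings.
    if grep -q $'\r' "$list_file"; then
        echo "warning: CRLF line endings in $list_file" >&2
    fi

    # '|| [ -n "$path" ]' also handles a last line with no trailing newline.
    while IFS= read -r path || [ -n "$path" ]; do
        path="${path%$'\r'}"          # drop a trailing carriage return, if any
        [ -z "$path" ] && continue    # ignore blank lines
        if [ ! -f "$path" ]; then
            echo "missing: $path"
            missing=$((missing + 1))
        fi
    done < "$list_file"

    echo "$missing missing"
    [ "$missing" -eq 0 ]              # exit status 0 only if every path exists
}
```

    For example, running check_bam_list on the same file passed to -I in the script above lists each entry that SVPreprocess would fail to open.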

  • Hi @bhandsaker,

    Thank you very much for your explanation. Here is part of my *_sample.list file:


    The corresponding .bam and .bam.bai files are in the directory "/proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/".
    This is the directory shown in the error message above. Is this directory the file path that should be read from the list file?

    I will try to regenerate this file with Python, using "\n" as the line terminator. Before that, could you help me with two points of confusion:

    1. When I ran SVPreprocess on the JHS cohort, there was only one alignment file, md_tempdir/headers.bam. For the PAGE cohort I am running now, however, the error message
    Alignment file does not exist: /proj/yunligrp/users/minzhi/gs/PAGE_chr16/bam_PAGE_chr16_1-500000/H_TK-12498-AB33938473

    implies that the alignment file should be in the same directory as the .bam files. Does this difference in the alignment file's location mean that I did something wrong?

    2. When I ran SVPreprocess on the JHS cohort, md_tempdir contained these files:
    chimerism.dat     headers.bam      isd.stats.dat   rccache.bin.idx           svtoolkit.version.dat
    depth.dat         headers.bam.bai  mdversion.txt   rccache.list
    gcprofiles.zip    isd.dist.bin     profiles_100Kb  sample_gender.report.txt
    genome_sizes.txt  isd.hist.bin     rccache.bin     spans.dat

    But when I ran SVPreprocess on the PAGE cohort, md_tempdir looked incomplete:

    depth      genome_sizes.txt  mdversion.txt   rccache        spans
    gcprofile  isd               profiles_100Kb  rccache.merge  svtoolkit.version.dat

    Does this mean that the step that merges all the .bam files did not complete? In other words, for any cohort on which SVPreprocess completes successfully, should md_tempdir contain the same file names that I see in the JHS cohort's md_tempdir?
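    To see exactly which entries differ between the two runs, the metadata directories can be compared by name. The bash function below is a hypothetical helper (missing_md_entries is not a Genome STRiP tool); it prints names present in the first directory but absent from the second. It compares names only, so on its own it cannot tell whether a particular file is required for a successful run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print entries of a reference metadata directory
# (e.g. the JHS md_tempdir) that are absent from another one (e.g. PAGE).
missing_md_entries() {
    local ref_dir="$1" new_dir="$2"
    # Sort both listings and compare them byte-wise (C locale) so that
    # comm sees the same ordering that sort produced; -23 keeps only
    # names that appear in the first listing.
    LC_ALL=C comm -23 \
        <(ls -1 "$ref_dir" | LC_ALL=C sort) \
        <(ls -1 "$new_dir" | LC_ALL=C sort)
}
```

    For example, missing_md_entries /path/to/JHS/md_tempdir /path/to/PAGE/md_tempdir, where both paths are placeholders for the actual metadata directories.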

    Thank you very much.

    Best regards,

  • Hi @bhandsaker,

    It works now, after adding .bam to the end of every line in my .list file! Thank you very much, and thank you also for the explanation of headers.bam.
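    The fix described here can be scripted rather than done by hand. The bash function below is a minimal sketch (add_bam_suffix is a hypothetical name); it assumes every input is a BAM file, strips carriage returns, drops blank lines, and appends .bam to any line that does not already end with it. The result goes to a new file, so the original list is kept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a cleaned copy of a sample list, assuming
# every entry should point to a .bam file.
add_bam_suffix() {
    local in_list="$1" out_list="$2"
    # tr removes carriage returns (CRLF -> LF); sed then deletes empty
    # lines and appends ".bam" to lines that do not already end in ".bam".
    tr -d '\r' < "$in_list" \
        | sed -e '/^$/d' -e '/\.bam$/!s/$/.bam/' > "$out_list"
}
```

    Give the output file a name ending in .list so that it is still read as a list when passed to -I.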

    Best regards,
