Preprocess error

mikyatope (Barcelona, Member)

Hi, I'm running preprocess on a complete hg18 genome with these basic options:

java -cp ${classpath} ${mx} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVPreprocess.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -cp ${classpath} \
    -tempDir ${SV_TMPDIR} \
    -configFile genstrip_parameters.txt \
    -R ~/hg18/hg18.fa \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -reduceInsertSizeDistributions false \
    -computeGCProfiles true \
    -computeReadCounts true \
    -jobLogDir ${runDir}/logs \
    -I ${bam} \
    -run \
    || exit 1

And it fails with the error log attached.

Any suggestions on where to look first?? Thanks!


Best Answers


  • mikyatope (Barcelona, Member)


    I've tried again, and now it shows me an error about the genome mask file. I'm attaching the log again in case I'm missing something else.

    I'm a bit confused, since neither the ploidy map nor the genome mask is listed as a required input in the documentation. Do I need to create all the files in the metadata bundle for the hg18 version?

  • mikyatope (Barcelona, Member)

    Sorry, let me clarify the previous question: according to the section "Building a custom reference metadata bundle", the only optional files are reference.gendermask.bed and reference.gcmask.fasta. Am I wrong? Thanks!

  • mikyatope (Barcelona, Member)

    Hello again,

    I've indexed and used the suggested file, renamed as hg18.fa.svmask.fasta, but it's complaining about chr10. What do you suggest I check? Thx!

    "Exception in thread "main" Mismatch found between genome mask and reference sequence: Interval chr10:1-135374737 not found in genome mask"

  • bhandsaker (Member, Broadie ✭✭✭✭)

    If you look at hg18.fa.fai, what do the first few reference contigs look like - what are their names and order?
    It looks like your reference contains "chr10" whereas the svmask you are using doesn't use "chr" (just "1", "2", "3").
    If the chromosome lengths match, you can just rename the fasta entries, reorder if necessary to match your reference, re-index and things should work ok.
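The check described above (same names, same order, same lengths) can be sketched in a few lines of Python. This is only an illustration, not part of Genome STRiP: the function names are made up for this example, and it assumes both the reference and the mask have been indexed so that a `.fai` file exists for each, with the contig name in column 1 and its length in column 2.

```python
def read_fai(lines):
    """Return [(name, length), ...] from the lines of a .fai index."""
    contigs = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        contigs.append((fields[0], int(fields[1])))
    return contigs


def compare_contigs(ref, mask):
    """Describe every reference contig that the mask does not match in place."""
    problems = []
    mask_names = {name for name, _ in mask}
    for i, (name, length) in enumerate(ref):
        if name not in mask_names:
            problems.append(f"{name}: missing from mask")
        elif i >= len(mask) or mask[i][0] != name:
            problems.append(f"{name}: present in mask but at a different position")
        elif mask[i][1] != length:
            problems.append(f"{name}: length {length} in reference, {mask[i][1]} in mask")
    return problems


def check_fai_files(ref_fai_path, mask_fai_path):
    """Hypothetical helper: print the differences between two .fai files."""
    with open(ref_fai_path) as ref_fh, open(mask_fai_path) as mask_fh:
        for problem in compare_contigs(read_fai(ref_fh), read_fai(mask_fh)):
            print(problem)
```

Calling `check_fai_files("hg18.fa.fai", "hg18.fa.svmask.fasta.fai")` (file names assumed from this thread) would print one line per mismatching contig, and nothing if the two indexes agree.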

  • mikyatope (Barcelona, Member)

    Hi again! After making sure that both the reference and the mask have the same chromosomes, I got this error:

    Exception in thread "main" java.lang.RuntimeException: End of file while reading fasta file: /project/devel/mramia/hg18/refsGenomeStrip/hg18.fa.svmask.fasta

    I should explain: earlier, preprocess complained that the chromosome names had 'chr' (the names were the same in all files). I changed this but did not reorder. Could this be the cause?

    PS: Is not recognising 'chr' as a chromosome name intended behavior?

  • bhandsaker (Member, Broadie ✭✭✭✭)

    Everything needs to be in the same order (and have matching sequence/chromosome names - "1" doesn't match "chr1").
    That includes the reference file you aligned to, and this in turn determines the order of the alignments in the bam files.
    Similarly all of the mask files in Genome STRiP also need to be in reference order.

    There shouldn't be any explicit restriction on the names of the chromosomes.
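For the rename-and-reorder step, a minimal Python sketch might look like the following. It is not a Genome STRiP utility: `read_fasta`, `reorder_fasta` and the `rename` mapping are names invented for this example, and the sketch holds the whole mask in memory, which may be impractical for a full genome without adapting it.

```python
def read_fasta(lines):
    """Parse FASTA lines into {name: [sequence lines]}; name is the first word of the header."""
    records = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None and line:
            records[current].append(line)
    return records


def reorder_fasta(lines, ref_order, rename=None):
    """Return FASTA lines with records renamed and emitted in ref_order.

    ref_order: contig names in reference order (e.g. column 1 of the reference .fai).
    rename:    optional {old_name: new_name} mapping applied before reordering.
    """
    rename = rename or {}
    records = {rename.get(name, name): seq for name, seq in read_fasta(lines).items()}
    missing = [name for name in ref_order if name not in records]
    if missing:
        raise ValueError("contigs missing from mask: " + ", ".join(missing))
    out = []
    for name in ref_order:
        out.append(">" + name)
        out.extend(records[name])
    return out
```

The returned lines would then be written to a new mask file and re-indexed (for example with `samtools faidx`) before rerunning preprocessing. Records in the mask that are not listed in `ref_order` are dropped, which is a choice made for this sketch rather than a requirement.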
