CNVDiscovery pipeline: Argument with name '--genderMapFile' / '--ploidyMapFile' is missing

I'm testing the CNVDiscovery pipeline on a small set of individuals and can't get it running. I tried supplying a ploidy file containing * * * * 2, but the pipeline still requires a gender map, which is information I don't have. Although both the gender map and the ploidy map are supposed to be optional parameters, I keep getting the following error:

Exception in thread "main" org.broadinstitute.gatk.utils.commandline.MissingArgumentException:
Argument with name '--ploidyMapFile' (-ploidyMapFile) is missing.
Argument with name '--genderMapFile' (-genderMapFile) is missing.
at org.broadinstitute.gatk.utils.commandline.ParsingEngine.validate(
at org.broadinstitute.gatk.utils.commandline.ParsingEngine.validate(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(

The command I'm running is:

java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-jobRunner Shell \
-gatkJobRunner Shell \
-R ${ref} \
-I ${prefix}/test.list \
-md ${output}/metadata \
-runDirectory ${output}/run1 \
-tilingWindowSize 1000 \
-tilingWindowOverlap 500 \
-maximumReferenceGapLength 1000 \
-boundaryPrecision 100 \
-minimumRefinedLength 500
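
For reference, a minimal sketch of how the one-line ploidy map mentioned above could be written and passed to the command. The file path is hypothetical, and the column meanings in the comment (sequence, start, end, sex, ploidy) are an assumption that should be checked against the Genome STRiP documentation. Only the -ploidyMapFile argument name comes from the error message.

```shell
# Hypothetical location for the ploidy map; adjust to your own layout.
ploidy_map="${TMPDIR:-/tmp}/test.ploidymap.txt"

# The single wildcard line quoted in the post.
# Assumed column meaning: sequence, start, end, sex, ploidy,
# i.e. "treat every region of every sample as diploid".
printf '%s\n' '* * * * 2' > "${ploidy_map}"

# Extra argument to append to the Queue command above.
extra_args="-ploidyMapFile ${ploidy_map}"
echo "${extra_args}"
```

This only covers the ploidy map. The error also lists -genderMapFile, so this sketch alone would not be expected to make the error go away.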

Thanks for your help,


Best Answer


  • jkreiner UofT Member

    Thanks for your quick response!

    Actually, I am not working with human data. I have run the SVPreprocess step first, but I think the issue is as you identified: I don't have a reference metadata bundle. Is it feasible to create mask files for a fragmented PacBio reference genome? Are the rdmask and svmask files the only ones required?

  • bhandsaker Member, Broadie, Moderator

    There are a couple of threads on creating a reference bundle; I can't remember whether they are on the forum or on the older mailing list. It requires some work and reasonable computing skills.

    I do not know how well Genome STRiP works in general on non-human data, but I know people have used it with rice, cows, etc.

    My main concern about the PacBio reference would be whether there are lots of errors at the base-pair level (i.e. small indels). I don't know what effect this would have on the ability to align the Illumina reads accurately and uniformly to the reference (i.e. as uniformly as to a more finished reference).

    If you want to try making a reference bundle, it may be easier to email me directly; once it is working, we can post the recipe here.
