CNVdiscovery pipeline: Argument with name '--genderMapFile' / '--ploidyMapFile' missing

jkreiner (UofT) Member

I'm testing the CNVdiscovery pipeline on a small set of individuals and can't get it running. I keep getting the following error, even though the gender and ploidy maps are both supposed to be optional parameters. (I tried supplying a ploidy file containing * * * * 2, but the pipeline still requires a gender map, which is information I don't have.)

Exception in thread "main" org.broadinstitute.gatk.utils.commandline.MissingArgumentException:
Argument with name '--ploidyMapFile' (-ploidyMapFile) is missing.
Argument with name '--genderMapFile' (-genderMapFile) is missing.
at org.broadinstitute.gatk.utils.commandline.ParsingEngine.validate(
at org.broadinstitute.gatk.utils.commandline.ParsingEngine.validate(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(
at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(

The command I'm running is:

java -Xmx4g -cp ${classpath} \
org.broadinstitute.gatk.queue.QCommandLine \
-S ${SV_DIR}/qscript/discovery/cnv/CNVDiscoveryPipeline.q \
-S ${SV_DIR}/qscript/SVQScript.q \
-cp ${classpath} \
-gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
-configFile ${SV_DIR}/conf/genstrip_parameters.txt \
-jobRunner Shell \
-gatkJobRunner Shell \
-R ${ref} \
-I ${prefix}/test.list \
-md ${output}/metadata \
-runDirectory ${output}/run1 \
-tilingWindowSize 1000 \
-tilingWindowOverlap 500 \
-maximumReferenceGapLength 1000 \
-boundaryPrecision 100 \
-minimumRefinedLength 500 \

Thanks for your help,


Best Answer


  • jkreiner (UofT) Member

    Thanks for your quick response!

    Actually, I am not working with human data. I have run the SVPreprocess step first, but I think the issue is as you identified: I don't have a reference metadata bundle. Is it feasible to create mask files for a fragmented PacBio reference genome? Are the rdmask and svmask files the only required ones?

  • bhandsaker Member, Broadie ✭✭✭✭

    There are a couple of threads on creating a reference bundle, though I can't remember whether they are on the forum or on the older mailing list. It requires some work and reasonable computing skills.

    I do not know how well Genome STRiP works in general on non-human data, but I know people have used it with rice, cows, etc.

    My main concern about the PacBio reference would be whether there are lots of errors at the base-pair level (i.e. small indels). I don't know what effect this would have on the ability to align the Illumina reads accurately and uniformly to the reference (i.e. as uniformly as to a more finished reference).

    If you want to try making a reference bundle, maybe it would be easier to email me directly and then once it is working we can post the recipe here.
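
For anyone hitting the same MissingArgumentException without a reference metadata bundle, here is a minimal sketch of one possible workaround: write both map files by hand and pass them explicitly. The ploidy line is the one quoted in the question, and the two flag names come from the error message. Everything else is an assumption not confirmed in this thread: the gender map layout (tab-separated sample ID and sex code), the sample names, the file names, and the placeholder sex assigned to every sample. Check the Genome STRiP documentation before relying on it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working directory; override by exporting WORKDIR.
workdir="${WORKDIR:-$(mktemp -d)}"
mkdir -p "$workdir"

# Ploidy map: the single wildcard line quoted in the question
# (every sequence, every position, every sex -> ploidy 2).
printf '* * * * 2\n' > "$workdir/ploidy.map"

# Gender map. ASSUMPTION: one line per sample, "<sampleID><TAB><sex>".
# The sample names and the uniform placeholder "F" are hypothetical.
samples=(sampleA sampleB)
: > "$workdir/gender.map"
for s in "${samples[@]}"; do
  printf '%s\t%s\n' "$s" "F" >> "$workdir/gender.map"
done

# Extra arguments to append to the QCommandLine invocation from the question.
extra_args=(-ploidyMapFile "$workdir/ploidy.map" -genderMapFile "$workdir/gender.map")
printf '%s\n' "${extra_args[*]}"
```

The printed arguments would go at the end of the java command above. Whether a uniform placeholder sex is acceptable for a given organism is a judgment call this sketch does not settle.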
