GenomeSTRiP v2.0 CNV calling

Hi Bob,
I upgraded to version 2 and ran my samples with a background population of 30 samples from 1000G for deletion (DEL) discovery. I know that SVToolkit now calls both deletions and copy number variants. My question is how to run CNV discovery. Here are my commands:
java -cp ${classpath} ${mx} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVPreprocess.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -cp ${classpath} \
    -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R data/human_g1k_v37.fasta \
    -genomeMaskFile data/human_g1k_v37.mask.100.fasta \
    -ploidyMapFile data/humgen_g1k_v37_ploidy.map \
    -copyNumberMaskFile data/cn2_mask_g1k_v37.fasta \
    -genderMapFile data/geneder1000GandMysamples.map \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -reduceInsertSizeDistributions true \
    -bamFilesAreDisjoint true \
    -computeGCProfiles true \
    -jobLogDir ${runDir}/logs \
    -I ${bamList} \
    -run

java -cp ${classpath} ${mx} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVDiscovery.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    -cp ${classpath} \
    -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R data/human_g1k_v37.fasta \
    -genomeMaskFile data/human_g1k_v37.mask.100.fasta \
    -genderMapFile data/geneder1000GandMysamples.map \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -jobLogDir ${runDir}/logs \
    -minimumSize 100 \
    -maximumSize 1000000 \
    -suppressVCFCommandLines \
    -I ${bamList} \
    -O ${sites} \
    -run

java -cp ${classpath} ${mx} \
    org.broadinstitute.gatk.queue.QCommandLine \
    -S ${SV_DIR}/qscript/SVGenotyper.q \
    -S ${SV_DIR}/qscript/SVQScript.q \
    -gatk ${SV_DIR}/lib/gatk/GenomeAnalysisTK.jar \
    --disableJobReport \
    -cp ${classpath} \
    -configFile ${SV_DIR}/conf/genstrip_parameters.txt \
    -tempDir ${SV_TMPDIR} \
    -R data/human_g1k_v37.fasta \
    -genomeMaskFile data/human_g1k_v37.mask.100.fasta \
    -ploidyMapFile data/humgen_g1k_v37_ploidy.map \
    -genderMapFile data/geneder1000GandMysamples.map \
    -runDirectory ${runDir} \
    -md ${runDir}/metadata \
    -jobLogDir ${runDir}/logs \
    -I ${bamList} \
    -vcf ${sites} \
    -O ${genotypes} \
    -run
Could you please tell me what I should do to run CNV calling? Also, please let me know if I did anything wrong in my DEL calling.
Thank you so much,
Reza
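The three commands above share most of their arguments, and the `${sites}` VCF written by SVDiscovery is the `-vcf` input of SVGenotyper, so the stages must run in order. The sketch below is a minimal, hypothetical Python wrapper (the helper names `common_args`, `build_stages` and `run_all` are illustrative, not part of Genome STRiP) that assembles the same argument lists and stops at the first failing stage. It reuses only flags that appear in the commands above and does not add CNV discovery. It assumes the shell variables are supplied in a dict and that `mx` holds a single JVM option such as `-Xmx4g`; argument order differs slightly from the post, which should not matter for named Queue options.

```python
import subprocess
from typing import Dict, List

# Data files from the commands above
REF = "data/human_g1k_v37.fasta"
MASK = "data/human_g1k_v37.mask.100.fasta"
PLOIDY = "data/humgen_g1k_v37_ploidy.map"
CN2_MASK = "data/cn2_mask_g1k_v37.fasta"
GENDER = "data/geneder1000GandMysamples.map"


def common_args(env: Dict[str, str], qscript: str) -> List[str]:
    """Arguments shared by all three Queue invocations (hypothetical helper)."""
    sv, run = env["SV_DIR"], env["runDir"]
    return [
        "java", "-cp", env["classpath"], env["mx"],  # mx assumed to be e.g. "-Xmx4g"
        "org.broadinstitute.gatk.queue.QCommandLine",
        "-S", f"{sv}/qscript/{qscript}",
        "-S", f"{sv}/qscript/SVQScript.q",
        "-gatk", f"{sv}/lib/gatk/GenomeAnalysisTK.jar",
        "-cp", env["classpath"],
        "-configFile", f"{sv}/conf/genstrip_parameters.txt",
        "-tempDir", env["SV_TMPDIR"],
        "-R", REF,
        "-genomeMaskFile", MASK,
        "-genderMapFile", GENDER,
        "-runDirectory", run,
        "-md", f"{run}/metadata",
        "-jobLogDir", f"{run}/logs",
        "-I", env["bamList"],
    ]


def build_stages(env: Dict[str, str]) -> List[List[str]]:
    """Return the preprocess, discovery and genotyping commands in run order."""
    preprocess = common_args(env, "SVPreprocess.q") + [
        "-ploidyMapFile", PLOIDY,
        "-copyNumberMaskFile", CN2_MASK,
        "-reduceInsertSizeDistributions", "true",
        "-bamFilesAreDisjoint", "true",
        "-computeGCProfiles", "true",
        "-run",
    ]
    discovery = common_args(env, "SVDiscovery.q") + [
        "-minimumSize", "100",
        "-maximumSize", "1000000",
        "-suppressVCFCommandLines",
        "-O", env["sites"],  # discovered sites VCF
        "-run",
    ]
    genotyper = common_args(env, "SVGenotyper.q") + [
        "--disableJobReport",
        "-ploidyMapFile", PLOIDY,
        "-vcf", env["sites"],  # discovery output feeds genotyping
        "-O", env["genotypes"],
        "-run",
    ]
    return [preprocess, discovery, genotyper]


def run_all(env: Dict[str, str]) -> None:
    """Run the stages sequentially; check=True raises CalledProcessError on failure."""
    for cmd in build_stages(env):
        subprocess.run(cmd, check=True)
```

Keeping each stage as a separate process with `check=True` mirrors running the three shell commands one after another, but makes it explicit that genotyping never starts if discovery failed.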
Answers
bhandsaker:
The command lines above look plausible.
Probably the best place to start with the new CNV pipeline is the online documentation here:
http://www.broadinstitute.org/software/genomestrip/node_CNVPipelineOverview.html
http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_CNVDiscoveryPipeline.html
Thanks Bob,
I have 4 samples with about 20x coverage each and a background of 30 samples from 1000G. Do you think the parameters below would be good for this run? Thank you very much.
-tilingWindowSize 5000 -tilingWindowOverlap 2500 -maximumReferenceGapLength 2500 -boundaryPrecision 200 -minimumRefinedLength 2500
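To get a feel for what `-tilingWindowSize 5000 -tilingWindowOverlap 2500` implies, the sketch below assumes the windows are laid out so that consecutive windows start (size minus overlap) bp apart, which means most bases fall in two windows. This is an illustrative assumption about the layout, not Genome STRiP's actual tiling code, and `tile_windows` is a hypothetical name.

```python
from typing import List, Tuple


def tile_windows(start: int, end: int, size: int = 5000,
                 overlap: int = 2500) -> List[Tuple[int, int]]:
    """Return 1-based inclusive windows of `size` bp inside [start, end],
    with consecutive windows sharing `overlap` bp (assumed layout).
    A trailing window that would run past `end` is dropped in this sketch."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    windows = []
    pos = start
    while pos + size - 1 <= end:
        windows.append((pos, pos + size - 1))
        pos += step
    return windows


# A 20 kb interval with 5 kb windows and 2.5 kb overlap
wins = tile_windows(1, 20000)
print(len(wins))  # 7
print(wins[:3])   # [(1, 5000), (2501, 7500), (5001, 10000)]
```

Under this assumption, halving the overlap would roughly halve the number of windows, at the cost of coarser breakpoint positions before refinement.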
Are you using the 1000G high coverage samples or low coverage samples?
Samples HG00096 to HG00127. These BAM files are about 10 GB each, so I believe they are the 4-6x coverage ones.
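Rather than inferring coverage from BAM file size, mean coverage can be estimated directly as C = N × L / G (number of reads × read length / genome size). The sketch below assumes you already have the read count (for example from `samtools flagstat`) and uses a GRCh37 size of roughly 3.1e9 bp, which is an approximation (the non-gap length is somewhat smaller). The read count in the example is a hypothetical value, not taken from any 1000G sample.

```python
def mean_coverage(n_reads: int, read_length: int, genome_size: float = 3.1e9) -> float:
    """Estimate mean depth as total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_length / genome_size


# Hypothetical example: 200 million reads of 101 bp
print(round(mean_coverage(200_000_000, 101), 1))  # 6.5
```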
One factor we have found to be important for best calling is to match the read length between the target and background samples as much as possible. GS (Genome STRiP) normalizes for alignability, which should in theory normalize for read length, but in practice we see subtle batch effects when mixing read lengths. The 1000G low coverage sequencing (from phase 3) is mostly 70-100bp reads.
There are also 30 samples sequenced at high coverage (2x250 PCR free Illumina reads).
If you are using the low coverage samples, then your settings are probably the right place to start.
We are also finding that the GS calling is quite a bit better at 100 samples or so compared to 30-40 and even better at 500+.
It is not bad to start with 30 samples as a background population, but you might also try using several hundred 1000G samples.
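The two suggestions above (match read length between target and background, and use more background samples) can be combined when choosing a background set. A minimal sketch, assuming you already have a per-sample read-length table: `pick_background` is a hypothetical helper, and the sample IDs and lengths below are placeholders, not real 1000G metadata.

```python
from typing import Dict, List


def pick_background(read_lengths: Dict[str, int], target_length: int,
                    max_samples: int, tolerance: int = 0) -> List[str]:
    """Return up to max_samples sample IDs whose read length is within
    `tolerance` bp of the target read length, in sorted ID order."""
    matching = sorted(
        sample for sample, length in read_lengths.items()
        if abs(length - target_length) <= tolerance
    )
    return matching[:max_samples]


# Hypothetical read-length table (placeholder IDs and values)
read_lengths = {"SAMPLE_A": 101, "SAMPLE_B": 76, "SAMPLE_C": 101, "SAMPLE_D": 100}
print(pick_background(read_lengths, 101, max_samples=500))  # ['SAMPLE_A', 'SAMPLE_C']
```

If the returned list is much shorter than the 100+ samples discussed above, loosening `tolerance` trades read-length matching for population size; the selected IDs would still need to be mapped to BAM paths for the `-I` list.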
Thanks Bob,
My reads are 101 bp long, and these samples also appear to be 101 bp. I have 100 samples already downloaded from 1000G, and as you advised, I will try running with a larger population.
Are you going to include algorithms to call other variant types (specifically insertions and inversions) in the tool? If not, what tools is 1000G currently using for calling insertions and inversions?
Thank you so much for the information!
Ah, those seem like pretty sensible defaults, then, and it is a good idea to stick to 1000G samples with 101bp reads.
Using 100 or 200 samples will take a bit longer but likely improve your results.
A number of tools were used in the 1000 Genomes project. For inversions, Delly (Rausch, 2012) works well.
MELT from the University of Maryland (Devine Lab) also did well in phase 3 for calling mobile element insertions.
Thank you so much,
For calling structural variants, do I generally need a large background population regardless of the tool I use, or is this the case only for Genome STRiP?
Sorry, I know this is off-topic for this thread.
I found Delly, but I couldn't find MELT after some Google searching. Do you know where I can download the tool and its documentation?
I guess they could use some SEO
http://melt.igs.umaryland.edu/
I believe the Devine lab website is here:
http://medschool.umaryland.edu/FACULTYRESEARCHPROFILE/viewprofile.aspx?id=21601
Eugene Gardner is the primary author of MELT.
Genome STRiP is fairly unique (to my knowledge) in terms of being a population-aware method and requiring multiple genomes.
Hi Bob,
The second link doesn't work; it shows "The page you are looking for is not found." Several pages about the pipelines (Queue scripts) also don't work, and I would like to learn about these scripts.
For example:
http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_SVPreprocess.html
http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_QCommandLine.html
http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_SVDiscovery.html
http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_SVGenotyper.html
What's wrong with these pages?
Thank you, Bob.
There was a problem with our recent documentation update for r1563. It should be fixed now.