GermlineCNVCaller - How to add samples to existing model

stanedav (Prague) Member

GATK 4.1.2, Linux Ubuntu 16.04, using a bash script

I am in the process of testing your GermlineCNVCaller. First I built a cohort model from my 45 WES samples. These were the steps:
1. Collect read counts for each sample:

       java -Xmx30G -jar "$gatk" CollectReadCounts \
           -I "${fileline[2]}" \
           -L "$interval_file" \
           --interval-merging-rule OVERLAPPING_ONLY \
           -O "$output_path/${fileline[1]}.counts.hdf5"

2. Determine germline contig ploidy from the list of all HDF5 count files:

       java -jar "$gatk" DetermineGermlineContigPloidy \
           -L "$interval_file" \
           --interval-merging-rule OVERLAPPING_ONLY \
           --input 45_hdf5.list \
           --contig-ploidy-priors "$ploidy_table" \
           --output "$output_path" \
           --output-prefix WES_45_ploidy

3. Call CNVs with GermlineCNVCaller in COHORT mode:

       java -jar "$gatk" GermlineCNVCaller \
           --run-mode COHORT \
           -L "$interval_file" \
           --interval-merging-rule OVERLAPPING_ONLY \
           --contig-ploidy-calls "$output_path/WES_45_ploidy-calls" \
           --input 45_hdf5.list \
           --output "$output_path" \
           --output-prefix WES_45_cohort

4. Postprocess the calls (one sample per run, here sample index 4):

       java -jar "$gatk" PostprocessGermlineCNVCalls \
           --calls-shard-path "$output_path/WES_45_cohort-calls" \
           --model-shard-path "$output_path/WES_45_cohort-model" \
           --contig-ploidy-calls "$output_path/WES_45_ploidy-calls" \
           --sample-index 4 \
           --autosomal-ref-copy-number 2 \
           --allosomal-contig chrX \
           --allosomal-contig chrY \
           --output-genotyped-intervals outputintervals.vcf \
           --output-genotyped-segments outputsegments.vcf \
           -imr OVERLAPPING_ONLY \
           -R /home/dnalab/bioinformatics/hg19/ucsc.hg19.fasta

Now I would like to use this model on another 9 WES samples (same sequencing setup). Is there any way to add them to the existing model without re-running all of these steps? Or maybe I just don't understand the flow of the pipeline.


  • AdelaideR Member, admin

    If the 9 WES samples were run with the same kit and you are happy with the model generated from the first 45 samples (this depends on sequencing depth, sample quality, etc.), then you can run GermlineCNVCaller in CASE mode. From the GermlineCNVCaller documentation:

    The parametrized coverage model can be used for CNV calling on future case samples provided that they are strictly compatible with the cohort used to generate the model parameters (in terms of tissue type(s), library preparation and sequencing protocols). We refer to this mode as the CASE run mode. There is no lower limit on the number of samples for running GermlineCNVCaller in CASE mode.
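    A minimal sketch of what CASE mode could look like for the 9 new samples, written in the same bash style as the cohort script. It assumes the cohort run gave DetermineGermlineContigPloidy and GermlineCNVCaller different output prefixes (each tool writes a `<prefix>-calls` and `<prefix>-model` directory, so sharing one prefix in the same output folder makes them overwrite each other). The directory names `WES_45_ploidy-model` and `WES_45_cohort-model`, the prefix `WES_9_case`, the default paths, and the `run`/`call_case_mode` helpers are hypothetical. The new samples must be counted with the same interval list and merging rule as the cohort. In case mode, DetermineGermlineContigPloidy takes the cohort ploidy model via `--model` instead of a priors table, and GermlineCNVCaller takes the cohort gCNV model via `--model`.

```shell
#!/usr/bin/env bash
# Hedged sketch: CASE-mode gCNV calling of new samples against an existing
# cohort model. All default paths and prefixes below are hypothetical.
set -euo pipefail

gatk=${gatk:-gatk.jar}                                  # path to the GATK 4.1.2 jar
interval_file=${interval_file:-targets.interval_list}   # SAME intervals as the cohort run
output_path=${output_path:-.}
ref=${ref:-/home/dnalab/bioinformatics/hg19/ucsc.hg19.fasta}
ploidy_model=$output_path/WES_45_ploidy-model   # cohort DetermineGermlineContigPloidy model (hypothetical name)
gcnv_model=$output_path/WES_45_cohort-model     # cohort GermlineCNVCaller model (hypothetical name)
case_prefix=WES_9_case                          # hypothetical prefix for the case run

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: call_case_mode new1.bam new2.bam ...
call_case_mode() {
  local bam sample i
  local -a inputs=()

  # 1. Read counts with the same intervals and merging rule as the cohort.
  for bam in "$@"; do
    sample=$(basename "$bam" .bam)
    run java -Xmx30G -jar "$gatk" CollectReadCounts \
      -I "$bam" \
      -L "$interval_file" \
      --interval-merging-rule OVERLAPPING_ONLY \
      -O "$output_path/$sample.counts.hdf5"
    inputs+=(--input "$output_path/$sample.counts.hdf5")
  done

  # 2. Contig ploidy in case mode: reuse the cohort ploidy model, no priors table.
  run java -jar "$gatk" DetermineGermlineContigPloidy \
    --model "$ploidy_model" \
    "${inputs[@]}" \
    --output "$output_path" \
    --output-prefix "${case_prefix}_ploidy"

  # 3. CNV calling in CASE mode against the cohort gCNV model.
  run java -jar "$gatk" GermlineCNVCaller \
    --run-mode CASE \
    --model "$gcnv_model" \
    --contig-ploidy-calls "$output_path/${case_prefix}_ploidy-calls" \
    "${inputs[@]}" \
    --output "$output_path" \
    --output-prefix "$case_prefix"

  # 4. Postprocess each new sample; indices follow the --input order above.
  for (( i = 0; i < $#; i++ )); do
    run java -jar "$gatk" PostprocessGermlineCNVCalls \
      --calls-shard-path "$output_path/${case_prefix}-calls" \
      --model-shard-path "$gcnv_model" \
      --contig-ploidy-calls "$output_path/${case_prefix}_ploidy-calls" \
      --sample-index "$i" \
      --autosomal-ref-copy-number 2 \
      --allosomal-contig chrX \
      --allosomal-contig chrY \
      --output-genotyped-intervals "$output_path/${case_prefix}_sample${i}_intervals.vcf" \
      --output-genotyped-segments "$output_path/${case_prefix}_sample${i}_segments.vcf" \
      -R "$ref"
  done
}

# Run only when executed directly with at least one BAM argument.
if [[ ${BASH_SOURCE[0]} == "$0" && $# -gt 0 ]]; then
  call_case_mode "$@"
fi
```

    Usage: save it as, e.g., `gcnv_case.sh`, check the generated commands with `DRY_RUN=1 ./gcnv_case.sh new1.bam ... new9.bam`, then run it again without `DRY_RUN`. Since case mode has no lower limit on sample count, the same function works for a single sample or for all nine at once.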