Trouble using CombineReadCounts

Hi,
I am trying to create a PoN following
http://gatkforums.broadinstitute.org/gatk/discussion/6791/description-and-examples-of-the-steps-in-the-cnv-case-and-cnv-pon-creation-workflows#create-pon-workflow
I ran "GATK 4 CNV Proportional Coverage for Capture" and compiled the paths of the normal samples in one file (attached: listPropCov_TSCA11_15.txt)
I tried to run CombineReadCounts as follows using GATK4alpha (I tried using GATK3.7 but it gave a different error):
--
pathToGATK_jar = "....../gatk4-latest/gatk-protected.jar"
pathTotext_file_list_of_proportional_coverage_files = "....../listPropCov_TSCA11_15.txt"
pathToOutputFile = "...../merged_prop_cov_TSCA11_15.tsv"
javaPath = "/broad/software/free/Linux/redhat_6_x86_64/pkgs/jdk1.8.0_121/bin/java"
command = paste0(javaPath," -Xmx8g -jar ",pathToGATK_jar," CombineReadCounts --inputList ", pathTotext_file_list_of_proportional_coverage_files," -O ",pathToOutputFile, " -MOF 0")
system(command)
And I got the following error:
"A USER ERROR has occurred: Bad input: the input contains the sample repeated, e.g.:A3"
I checked my sample names and all of them are unique. I also checked that the interval names are unique. I tried using only 46 samples, but it gave me the same error.
What am I doing wrong? Is there another way to create a panel of normals to be used in GATK_Somatic_CNV_Toolchain_Capture?
Thanks!
Answers
Hi @Sahar90,
If you're sure that all of your sample names are unique, it sounds like it could possibly be a bug in the sample-name parsing. Is there any way you could share a list of the sample names derived from the BAMs? Feel free to email the list to me directly if you prefer.
Hi,
I was having the same error, trying to create a PoN with samples from different sequencing batches.
The error, as it turns out, happens because two or more samples have the same sample column name in the output of CalculateTargetCoverage (e.g. G7, A3).
I had no trouble creating the PoN with samples from the same batch, so I realized that this column name is the well name, taken from the sample name in the BAM file. When combining samples from different batches, those well names are bound to be repeated.
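A quick way to find the repeated names before running CombineReadCounts is to read the header row of every file in the input list. The following is only a sketch under assumptions: it assumes each coverage file is tab-separated, that any comment lines start with "#", and that the header row holds the target columns contig, start, stop and name followed by one column per sample. The function names are made up for illustration.

```python
from collections import defaultdict

# Assumed fixed columns that precede the per-sample coverage column(s).
TARGET_COLUMNS = {"contig", "start", "stop", "name"}


def sample_columns(path):
    """Return the sample column names found in one coverage file's header row."""
    with open(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip comment and blank lines before the header
            header = line.rstrip("\r\n").split("\t")
            return [col for col in header if col not in TARGET_COLUMNS]
    return []


def find_repeated_samples(list_path):
    """Map each sample name seen more than once to the files that contain it."""
    with open(list_path) as handle:
        paths = [line.strip() for line in handle if line.strip()]
    seen = defaultdict(list)
    for path in paths:
        for sample in sample_columns(path):
            seen[sample].append(path)
    return {name: files for name, files in seen.items() if len(files) > 1}
```

Calling find_repeated_samples on the same file that is passed to --inputList should return an empty dictionary when every sample column name is unique. Any name it reports (such as A3) points to the files that need a different sample name.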
This should probably be documented in https://gatkforums.broadinstitute.org/gatk/discussion/9143/how-to-call-somatic-copy-number-variants-using-gatk4-cnv
As this might be useful for others: I ended up changing the sample name in the BAM file from the well name to the sample_id, using the commands in:
https://github.com/IARCbioinfo/BAM-tricks#change-sample-name
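For anyone who prefers to script the header edit, here is a minimal sketch of the text substitution only. It is not taken from the linked page. It assumes the header is available as plain SAM header text (for example the output of samtools view -H) and that the sample name is stored in the SM field of the @RG lines. The function name rename_sample is hypothetical.

```python
import re


def rename_sample(header_text, old_name, new_name):
    """Replace SM:old_name with SM:new_name on the @RG lines of a SAM header."""
    # Match the whole SM value only, so that "A3" does not also match "A30".
    pattern = re.compile(r"(?<=\tSM:)" + re.escape(old_name) + r"(?=\t|$)")
    lines = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            # A function replacement keeps new_name literal (no backslash escapes).
            line = pattern.sub(lambda match: new_name, line)
        lines.append(line)
    return "".join(line + "\n" for line in lines)
```

The edited text can then be written to a file and applied to the BAM with samtools reheader. Only @RG lines are touched, so other header lines that happen to contain the old name are left as they are.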
@mcadosch
Hi,
Thank you for posting your solutions. I will let the original documenter of https://gatkforums.broadinstitute.org/gatk/discussion/9143/how-to-call-somatic-copy-number-variants-using-gatk4-cnv know.
-Sheila