We've moved!
This site is now read-only. You can find our new documentation site and support forum for posting questions here.
Be sure to read our welcome blog!

GATK4 4.0.9.0 PlotModeledSegments error

rcorbett2rcorbett2 vancouverMember

Hi there,

I'm relatively new to GATK copy number analysis. I have been following the following Tutorial#11682 and Tutorial#11683 without issue until I got to to the PlotModeledSegments section.

Here are the steps I followed so far:

#preprocess the reference to make it compatible with GATK
/gsc/software/linux-x86_64/jre1.8.0_66/bin/java -Xms512m -Xmx2g -jar ~/bin/picard_2.18.11.jar CreateSequenceDictionary REFERENCE=GRCh37-lite.fa O=GRCh37-lite.dict

#preprocess the intervals
 ~/bin/gatk-4.0.9.0/gatk PreprocessIntervals -R GRCh37-lite.fa --bin-length 1000 -interval-merging-rule OVERLAPPING_ONLY -O GRCh37-lite.1K.interval_list

#Collect the read counts for a sample (repeated for all 100 normals and all tumours)
/home/rcorbett/bin/gatk-4.0.8.1/gatk CollectReadCounts  -I normal.bam -L GRCh37-lite.1K.interval_list --interval-merging-rule OVERLAPPING_ONLY -O ${1}.hdf5

#merge normals into a panel of normals (PON)
~/bin/gatk-4.0.9.0/gatk --java-options "-Xmx32000m" CreateReadCountPanelOfNormals -I normal1.bam.hdf5 -I normal2.bam.hdf5 ........ -O PON

#denoise a single tumour against the PON (run this for all tumours)
~/bin/gatk-4.0.9.0/gatk --java-options "-Xms24g" DenoiseReadCounts -I tumour1.bam.hdf5 --count-panel-of-normals PON --standardized-copy-ratios tumour.standardizedCR.tsv --denoised-copy-ratios tumour.denoised.tsv

#collect the variants from the normals that have the G5A dbSNP flag
ls *vcf | xargs -i echo "java -Xmx16g -jar /gsc/software/linux-x86/snpEff-4.1/snpEff/SnpSift.jar filter \"(ID =~ 'rs') && (exists(G5A)) && (!exists(INDEL))\" {} > {}.G5A.vcf" | parallel -j 24 --progress

#change the headers of the VCFs so they'll work with GATK
ls *.G5A.vcf | xargs -i echo "cat {} | sed 's/PL,Number=-1/PL,Number=A/' | sed 's/Number=0/Number=A/'  > {}.clean.vcf" | bash

#Use the lists of variants to get the allele counts in both the tumour and normal at each position (this is an example for one normal and one tumour)
~/bin/gatk-4.0.9.0/gatk --java-options "-Xms24g" CollectAllelicCounts -L normal1*G5A.vcf.clean.vcf -I normal1.bam -R GRCh37-lite.fa -O normal1.allelicCounts.tsv
~/bin/gatk-4.0.9.0/gatk --java-options "-Xms24g" CollectAllelicCounts -L normal1*G5A.vcf.clean.vcf -I tumour1.bam -R GRCh37-lite.fa -O tumour1.allelicCounts.tsv

#segmentation
~/bin/gatk-4.0.9.0/gatk --java-options "-Xms4g" ModelSegments --denoised-copy-ratios tumour1.bam.hdf5.denoised.tsv --allelic-counts tumour1.allelicCounts.tsv --normal-allelic-counts normal1.allelicCounts.tsv --output results/ --output-prefix number1

#Plot
~/bin/gatk-4.0.9.0/gatk PlotModeledSegments --denoised-copy-ratios tumour1.bam.hdf5.denoised.tsv --allelic-counts tumour1.allelicCounts.tsv --segments results/number1.modelFinal.seg --sequence-dictionary GRCh37-lite.dict --minimum-contig-length 46709983 --output plots --output-prefix number1

Everything seems to make sense up until the last step where I get the following error:

[September 28, 2018 3:48:09 PM PDT] org.broadinstitute.hellbender.tools.copynumber.plotting.PlotModeledSegments done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=3468689408
java.lang.IllegalArgumentException: Number of allelic-count points in input modeled-segments file is inconsistent with that in input heterozygous allelic-counts file.
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)

My tumour allelicCounts.tsv file has 2198786 entries while the sum of the 5th column of my modelFinal.seg file is 1313278. However, I'm really just guessing that these are the two numbers causing the difference.

Could you help me figure this out?

thanks,
Richard

Tagged:

Answers

  • rcorbett2rcorbett2 vancouverMember

    Just noticed that my CollectAllelicCounts command was run with version 4.0.8.1. I've seen that there are some issues with variant calling near the ends of chromosomes with that version. I'll re-run that step and re-try here.

  • rcorbett2rcorbett2 vancouverMember

    After regenerating the allelic-counts with 4.0.9.0 I get the same error as above.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @rcorbett2,

    For the PlotModeledSegments error:

    java.lang.IllegalArgumentException: Number of allelic-count points in input modeled-segments file is inconsistent with that in input heterozygous allelic-counts file.

    Please check that you are using the correct allelic-counts file with PlotModeledSegments. Remember from Tutorial#11683 that ModelSegments produces two different allelic-counts files:

    1. hcc1143_T_clean.hets.normal.tsv
    2. hcc1143_T_clean.hets.tsv

    The .hets.normal.tsv file contains allelic counts for sites in the control/normal that are heterozygous. The .hets.tsv file contains allelic counts for the same sites for the case/tumor. PlotModeledSegments expects [9] .hets.tsv.

  • rcorbett2rcorbett2 vancouverMember

    Thanks! It worked!

Sign In or Register to comment.