Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office on November 11th and 13th 2019, due to the U.S. holiday(Veteran's day) and due to a team event(Nov 13th). We will return to monitoring the GATK forum on November 12th and 14th respectively. Thank you for your patience.

GATK4 4.0.9.0 PlotModeledSegments error

rcorbett2rcorbett2 vancouverMember

Hi there,

I'm relatively new to GATK copy number analysis. I have been following the following Tutorial#11682 and Tutorial#11683 without issue until I got to to the PlotModeledSegments section.

Here are the steps I followed so far:

#preprocess the reference to make it compatible with GATK
/gsc/software/linux-x86_64/jre1.8.0_66/bin/java -Xms512m -Xmx2g -jar ~/bin/picard_2.18.11.jar CreateSequenceDictionary REFERENCE=GRCh37-lite.fa O=GRCh37-lite.dict

#preprocess the intervals
 ~/bin/gatk-4.0.9.0/gatk PreprocessIntervals -R GRCh37-lite.fa --bin-length 1000 -interval-merging-rule OVERLAPPING_ONLY -O GRCh37-lite.1K.interval_list

#Collect the read counts for a sample (repeated for all 100 normals and all tumours)
/home/rcorbett/bin/gatk-4.0.8.1/gatk CollectReadCounts  -I normal.bam -L GRCh37-lite.1K.interval_list --interval-merging-rule OVERLAPPING_ONLY -O ${1}.hdf5

#merge normals into a panel of normals (PON)
~/bin/gatk-4.0.9.0/gatk --java-options "-Xmx32000m" CreateReadCountPanelOfNormals -I normal1.bam.hdf5 -I normal2.bam.hdf5 ........ -O PON

#denoise a single tumour against the PON (run this for all tumours)
~/bin/gatk-4.0.9.0/gatk --java-options "-Xms24g" DenoiseReadCounts -I tumour1.bam.hdf5 --count-panel-of-normals PON --standardized-copy-ratios tumour.standardizedCR.tsv --denoised-copy-ratios tumour.denoised.tsv

#collect the variants from the normals that have the G5A dbSNP flag
ls *vcf | xargs -i echo "java -Xmx16g -jar /gsc/software/linux-x86/snpEff-4.1/snpEff/SnpSift.jar filter \"(ID =~ 'rs') && (exists(G5A)) && (!exists(INDEL))\" {} > {}.G5A.vcf" | parallel -j 24 --progress

#change the headers of the VCFs so they'll work with GATK
ls *.G5A.vcf | xargs -i echo "cat {} | sed 's/PL,Number=-1/PL,Number=A/' | sed 's/Number=0/Number=A/'  > {}.clean.vcf" | bash

#Use the lists of variants to get the allele counts in both the tumour and normal at each position (this is an example for one normal and one tumour)
~/bin/gatk-4.0.9.0/gatk --java-options "-Xms24g" CollectAllelicCounts -L normal1*G5A.vcf.clean.vcf -I normal1.bam -R GRCh37-lite.fa -O normal1.allelicCounts.tsv
~/bin/gatk-4.0.9.0/gatk --java-options "-Xms24g" CollectAllelicCounts -L normal1*G5A.vcf.clean.vcf -I tumour1.bam -R GRCh37-lite.fa -O tumour1.allelicCounts.tsv

#segmentation
~/bin/gatk-4.0.9.0/gatk --java-options "-Xms4g" ModelSegments --denoised-copy-ratios tumour1.bam.hdf5.denoised.tsv --allelic-counts tumour1.allelicCounts.tsv --normal-allelic-counts normal1.allelicCounts.tsv --output results/ --output-prefix number1

#Plot
~/bin/gatk-4.0.9.0/gatk PlotModeledSegments --denoised-copy-ratios tumour1.bam.hdf5.denoised.tsv --allelic-counts tumour1.allelicCounts.tsv --segments results/number1.modelFinal.seg --sequence-dictionary GRCh37-lite.dict --minimum-contig-length 46709983 --output plots --output-prefix number1

Everything seems to make sense up until the last step where I get the following error:

[September 28, 2018 3:48:09 PM PDT] org.broadinstitute.hellbender.tools.copynumber.plotting.PlotModeledSegments done. Elapsed time: 0.17 minutes.
Runtime.totalMemory()=3468689408
java.lang.IllegalArgumentException: Number of allelic-count points in input modeled-segments file is inconsistent with that in input heterozygous allelic-counts file.
at org.broadinstitute.hellbender.utils.Utils.validateArg(Utils.java:724)

My tumour allelicCounts.tsv file has 2198786 entries while the sum of the 5th column of my modelFinal.seg file is 1313278. However, I'm really just guessing that these are the two numbers causing the difference.

Could you help me figure this out?

thanks,
Richard

Tagged:

Answers

  • rcorbett2rcorbett2 vancouverMember

    Just noticed that my CollectAllelicCounts command was run with version 4.0.8.1. I've seen that there are some issues with variant calling near the ends of chromosomes with that version. I'll re-run that step and re-try here.

  • rcorbett2rcorbett2 vancouverMember

    After regenerating the allelic-counts with 4.0.9.0 I get the same error as above.

  • shleeshlee CambridgeMember, Broadie ✭✭✭✭✭

    Hi @rcorbett2,

    For the PlotModeledSegments error:

    java.lang.IllegalArgumentException: Number of allelic-count points in input modeled-segments file is inconsistent with that in input heterozygous allelic-counts file.

    Please check that you are using the correct allelic-counts file with PlotModeledSegments. Remember from Tutorial#11683 that ModelSegments produces two different allelic-counts files:

    1. hcc1143_T_clean.hets.normal.tsv
    2. hcc1143_T_clean.hets.tsv

    The .hets.normal.tsv file contains allelic counts for sites in the control/normal that are heterozygous. The .hets.tsv file contains allelic counts for the same sites for the case/tumor. PlotModeledSegments expects [9] .hets.tsv.

  • rcorbett2rcorbett2 vancouverMember

    Thanks! It worked!

Sign In or Register to comment.