GATK4: CollectAllelicCounts output & ModelSegments

Hi everyone,
I am trying to run the CNV discovery pipeline, and I have noticed that the header of sample.allelicCounts.tsv (produced by CollectAllelicCounts) causes problems when the file is used to run ModelSegments.

Indeed, ModelSegments gives me this error:
"A USER ERROR has occurred: Bad input: Bad header in file. Not all mandatory columns are present. Missing: POSITION, REF_COUNT, REF_NUCLEOTIDE, ALT_NUCLEOTIDE, ALT_COUNT"

I think it's because of the header format of the CollectAllelicCounts TSV:
"CONTIG POSITION REF_COUNT ALT_COUNT REF_NUCLEOTIDE ALT_NUCLEOTIDE"

Is there any specific option to modify the column order? Can I directly parse the file?

Regards,

Alessandra

Answers

  • slee Member, Broadie, Dev ✭✭✭
    edited January 2018

    Hi @alegasp89,

    ModelSegments expects the output of CollectAllelicCounts, so you should not be running into this issue unless there is some other unexpected formatting problem with your file (perhaps due to a nonstandard sample name). The order of the columns in the error message is arbitrary and does not need to match the order of the columns in the file.

    Could you attach a snippet of the rest of the header in your sample.allelicCounts.tsv file (being careful to preserve tabs, etc.)?
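A quick way to test the point above is to compare the set of column names in the file against the set named in the error message, ignoring order. The following is a minimal sketch, not GATK's own parser: the function names are hypothetical, and it assumes the column header is the first non-empty line that does not start with `@` (the SAM-style header lines) and that fields are tab-separated.

```python
import io

# Column names taken from the ModelSegments error message and the
# CollectAllelicCounts header quoted in this thread.
ALLELIC_COUNTS_COLUMNS = {
    "CONTIG", "POSITION", "REF_COUNT", "ALT_COUNT",
    "REF_NUCLEOTIDE", "ALT_NUCLEOTIDE",
}


def read_column_header(handle):
    """Return the column names from the first line that is neither
    blank nor a SAM-style '@' header line (assumed layout)."""
    for line in handle:
        if line.startswith("@") or not line.strip():
            continue
        return line.rstrip("\r\n").split("\t")
    return []


def missing_columns(handle, required=ALLELIC_COUNTS_COLUMNS):
    """Return the required columns absent from the header, in sorted
    order. Column order in the file does not matter."""
    return sorted(set(required) - set(read_column_header(handle)))


# Illustrative in-memory file; a real call would pass open("sample.allelicCounts.tsv").
example = io.StringIO(
    "@HD\tVN:1.6\n"
    "CONTIG\tPOSITION\tREF_COUNT\tALT_COUNT\tREF_NUCLEOTIDE\tALT_NUCLEOTIDE\n"
    "1\t100\t5\t3\tA\tG\n"
)
print(missing_columns(example))  # prints []
```

If the header were separated by spaces instead of tabs, the whole line would be read as a single column name and all six columns would be reported missing, which is why preserving tabs in the snippet matters.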

  • slee Member, Broadie, Dev ✭✭✭
    edited January 2018

    Just following up, @alegasp89, did you figure out if your file contained a nonstandard sample name? If not, it would be great if we could get a bug fix in if necessary. Perhaps we could throw a more informative message even if the sample name was to blame.

  • dcampo Los Angeles Member

    Hi, I am having the same issue, and don't know how to fix it.
    Attached there's a snippet of the sample file.

    Thanks!

  • slee Member, Broadie, Dev ✭✭✭
    edited June 28

    @dcampo I didn't run into any issue with parsing your snippet when running GATK 4.1.2.0 ModelSegments. Can you post your command line and version number?

    EDIT: It occurs to me that one possible cause is that you are inadvertently passing some other file to the --allelic-counts argument (perhaps a denoised copy ratios file, which contains a CONTIG column but none of the other columns listed in the exception).

  • dcampo Los Angeles Member

    Hi, thanks for your reply. Here's my command line:

    java -jar $GATK ModelSegments \
    --denoised-copy-ratios ${name}_dir/${name}_CTC.denoisedCR.tsv \
    --allelic-counts ${name}_dir/${name}_CTC_clean.allelicCounts.tsv \
    --normal-allelic-counts ${name}_dir/${name}_WBC.denoisedCR.tsv \
    --output ${name}_dir \
    --output-prefix ${name}_CTC_clean

    I am using gatk-4.1.2.0, and the snippet I sent is from the same ${name}_CTC_clean.allelicCounts.tsv file that I am passing to --allelic-counts.
    I've also checked the paths in the scripts, and everything seems to be in order. I also made sure that the field separator is a tab, as expected.
    And I tried to run one of the samples alone on a virtual node (as opposed to as a batch submission to the cluster), specifying full paths and all, but I am getting the same error.
    Now, the denoisedCR.tsv files have different headers, right? After the SAM-style header, they have the fields CONTIG START END LOG2_COPY_RATIO, so the error only refers to the allelicCounts.tsv file?
    These allelicCounts files come from CollectAllelicCounts, and I am following the tutorials 11682 and 11683.
    I am completely lost here, so any help will be much appreciated.
    Thanks!

  • slee Member, Broadie, Dev ✭✭✭

    @dcampo looks like you are passing a denoisedCR.tsv file to --normal-allelic-counts. You should be passing the result of running CollectAllelicCounts on the matched normal to this argument. Hope that resolves the issue!
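This kind of mix-up can be caught before launching ModelSegments by checking each input file's columns against what its argument expects. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the expected columns are the ones quoted in this thread rather than taken from GATK's source, and the header is assumed to be the first non-empty line not starting with `@`.

```python
import io

ALLELIC = {"CONTIG", "POSITION", "REF_COUNT", "ALT_COUNT",
           "REF_NUCLEOTIDE", "ALT_NUCLEOTIDE"}

# Expected columns per ModelSegments argument, as quoted in this thread.
EXPECTED_COLUMNS = {
    "--denoised-copy-ratios": {"CONTIG", "START", "END", "LOG2_COPY_RATIO"},
    "--allelic-counts": ALLELIC,
    "--normal-allelic-counts": ALLELIC,
}


def header_columns(handle):
    """Return tab-separated column names from the first line that is
    neither blank nor a SAM-style '@' header line (assumed layout)."""
    for line in handle:
        if line.startswith("@") or not line.strip():
            continue
        return line.rstrip("\r\n").split("\t")
    return []


def check_inputs(inputs):
    """Given a mapping of argument name to open text handle, return one
    message per argument whose file lacks the expected columns."""
    problems = []
    for argument, handle in inputs.items():
        missing = EXPECTED_COLUMNS[argument] - set(header_columns(handle))
        if missing:
            problems.append(f"{argument}: missing {', '.join(sorted(missing))}")
    return problems


# In-memory stand-ins reproducing the situation in this thread: a denoised
# copy ratios file passed to --normal-allelic-counts.
copy_ratios = "@HD\tVN:1.6\nCONTIG\tSTART\tEND\tLOG2_COPY_RATIO\n"
allelic = ("@HD\tVN:1.6\n"
           "CONTIG\tPOSITION\tREF_COUNT\tALT_COUNT\tREF_NUCLEOTIDE\tALT_NUCLEOTIDE\n")
for message in check_inputs({
    "--denoised-copy-ratios": io.StringIO(copy_ratios),
    "--allelic-counts": io.StringIO(allelic),
    "--normal-allelic-counts": io.StringIO(copy_ratios),
}):
    print(message)
```

Note that the columns it reports as missing for --normal-allelic-counts are the same five listed in the original exception, with CONTIG absent from the list because both file types contain it.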

  • dcampo Los Angeles Member

    Oh, yes, that's the issue...it's running now.
    I knew it had to be something silly, but could not see it. Rookie mistake :)
    Thanks!
