GATK4: CollectAllelicCounts output & ModelSegments

Hi everyone,
I am trying to run the CNV discovery pipeline, and I have noticed that the header of sample.allelicCounts.tsv (produced by CollectAllelicCounts) causes problems when the file is used to run ModelSegments.

Indeed, ModelSegments gives me this error:
"A USER ERROR has occurred: Bad input: Bad header in file. Not all mandatory columns are present. Missing: POSITION, REF_COUNT, REF_NUCLEOTIDE, ALT_NUCLEOTIDE, ALT_COUNT"

I think it's because of the format of the CollectAllelicCounts tsv header:
"CONTIG POSITION REF_COUNT ALT_COUNT REF_NUCLEOTIDE ALT_NUCLEOTIDE"

Is there any specific option to modify the column order? Can I directly parse the file?

Regards,

Alessandra

Answers

  • sleeslee Member, Broadie, Dev
    edited January 19

    Hi @alegasp89,

    ModelSegments expects the output of CollectAllelicCounts, so you should not be running into this issue unless there is some other unexpected formatting problem with your file (perhaps due to a nonstandard sample name). The order of the columns in the error message is arbitrary and does not need to match the order of the columns in the file.

    Could you attach a snippet of the rest of the header in your sample.allelicCounts.tsv file (being careful to preserve tabs, etc.)?
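
One way to check the header before attaching a snippet is sketched below. It is only an illustration, not part of GATK: the function name `inspect_header` is hypothetical, the column names are taken from the header quoted in the question, and the sketch assumes that any metadata lines in the file start with `@` and that the first other line holds the column names. It splits that line on tabs only and compares the result with the expected names as a set, so column order plays no role.

```python
# Column names as quoted in the question; treated here as the expected set.
MANDATORY = {
    "CONTIG", "POSITION", "REF_COUNT",
    "ALT_COUNT", "REF_NUCLEOTIDE", "ALT_NUCLEOTIDE",
}


def inspect_header(lines):
    """Return (column_line, columns, missing) for an allelic-counts file.

    Assumption: metadata lines start with '@' and the first line that does
    not is the column header. `lines` is any iterable of strings, such as
    an open file handle.
    """
    for line in lines:
        if line.startswith("@"):
            continue  # skip metadata lines
        column_line = line.rstrip("\r\n")
        # Split on tabs only: a space-separated header stays one column.
        columns = column_line.split("\t")
        # Set difference ignores column order entirely.
        missing = sorted(MANDATORY - set(columns))
        return column_line, columns, missing
    raise ValueError("no column header line found")
```

Calling `inspect_header(open("sample.allelicCounts.tsv"))` and printing `repr()` of the first returned value shows tabs as `\t`, which makes it easy to see whether the separators survived. A non-empty `missing` list on a header that looks correct would suggest the separators are not tabs.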

  • sleeslee Member, Broadie, Dev
    edited January 23

    Just following up, @alegasp89: did you figure out whether your file contained a nonstandard sample name? If it did not, it would be great to get a bug fix in if one is necessary. Even if the sample name was to blame, perhaps we could throw a more informative message.
