ERROR MESSAGE: Unknown file is malformed in DepthOfCoverage

Hi there,

I am using DepthOfCoverage to get the exome coverage of a target reference, and I am getting this error:
ERROR MESSAGE: Unknown file is malformed: Could not parse location from line: chr1 11873 12227 NR_046018_exon_0_0_chr1_11874_f 0 +

My command line is

java -Xmx64g -jar GenomeAnalysisTK.jar
-T DepthOfCoverage
-I bamfiles.list //A list of paths to bam files
-R ucsc.hg19.fa
-L clinical_exome_cod.bed.interval_list
-geneList:REFSEQ exonTrack.refSeq.chr121XYM.sort
-ct 10 -ct 20 -ct 40 -ct 80 -ct 100
-o bamfiles

I used AWK to modify the refseq file, following the instructions at https://www.broadinstitute.org/gatk/guide/article?id=1329, and this same command worked when I ran it to get gene coverage.

So, what am I doing wrong now?

Thanks

Answers

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
    This seems to be complaining about your gene list. Make sure the values are separated by tabs, not spaces, and use the .refseq extension in the filename.
  • Vergilius, Italy (Member)

    I am sure that the extension is .refseq and the values are tab separated, but I still cannot understand this error.

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @Vergilius
    Hi,

    I'm a little confused why you used AWK? I ran this tool with no issue simply by following the instructions in the article you linked to above, and I did not need to modify the output file. Also, what do you mean "this same command was working when I did to get gene coverage"?

    -Sheila

  • Vergilius, Italy (Member)

    Hi Sheila,
    sorry if I have been confusing.

    I followed the instructions in the article and used AWK to satisfy this requirement: "To run with the GATK, contigs other than the standard 1-22,X,Y,MT must be removed, and the file sorted in karyotypic order." So I used the following commands to remove the non-standard chromosomes and sort the file:

    awk '$3 ~ /^chr[12]?[0-9]$/' geneTrack.refSeq > geneTrack1
    awk '$3 ~ /^chr[XY]$/' geneTrack.refSeq > geneTrackxy
    awk '$3 ~ /^chrMT$/' geneTrack.refSeq > geneTrackM
    cat geneTrack1 geneTrackxy geneTrackM > geneTrackFinal
    sort -k3,3V -k5,5n -k6,6n geneTrackFinal > geneTrack1xy_sort.txt

    When I say that "this same command was working ....", I mean that I wanted both exome and gene coverage. I first downloaded geneTrack and processed it with the procedure above, and that run worked. Then I downloaded exonTrack, and with that file I got the error.
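
One way to check the layout of a track file before passing it to -geneList is to tally, for each line, how many tab-separated fields it has and which column holds the contig name. The line quoted in the error message has the contig (chr1) in the first of six fields, whereas the awk filters above test the third column, so the exon file may not share the gene file's layout. The following is a minimal sketch, not part of GATK: the function name summarize_track is hypothetical, and the contig pattern assumes UCSC-style "chr" names.

```shell
# summarize_track FILE
# Hypothetical helper (not a GATK tool): groups the lines of FILE by
# (number of tab-separated fields, first column that looks like a contig)
# and prints one summary line per group.
summarize_track() {
    awk -F'\t' '
        {
            # Find the first tab-separated column that starts like a
            # UCSC-style contig name (assumption: chr1..chr22, chrX, chrY, chrM).
            col = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^chr[0-9XYM]/) { col = i; break }
            }
            key = NF " tab-separated fields, contig in column " col
            count[key]++
        }
        END {
            for (k in count) print count[k] " lines: " k
        }
    ' "$1"
}
```

Running `summarize_track exonTrack.refSeq.chr121XYM.sort` and then the same on the gene track that worked would show whether the two files agree. A file that reports a single tab-separated field per line is delimited by spaces rather than tabs.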
