DepthofCoverage

jujo Posts: 3 Member

Hi,

I'm trying to calculate the coverage of an exome sequencing project.

My command is:

java -jar $gatk_jar \
-T DepthOfCoverage \
-R $ref_genome \
-I "$sample_name".recal_reads.bam \
-o $result_dir/$sample_name/coverage/"$sample_name"  \
-geneList $ref_refseq \
-L test.bed \
-ct 10 -ct 20 -ct 30
But I get an error that I can't solve:
ERROR MESSAGE: Unknown file is malformed:

Could not parse location from line: chr1 66999814 67000061 NM_032291_exon_0_10_chr1_66999825_f 0 +

This error relates to my refseq file, but I can't find the problem with it. The line in the message is the first line of my geneList, so something is messed up.

I created the file following http://www.broadinstitute.org/gatk/guide/article?id=1329 and used the script sortByRef.pl to sort it by the .fai file from your bundle.

Can you give me any advice on what to check?

Thank you very much for helping!

Answers

  • Geraldine_VdAuwera Posts: 6,957 Administrator, GATK Developer

    Hi there,

    Try passing the refseq as -geneList:REFSEQ $ref_refseq to ensure that the file gets parsed using the correct format presets.

    Geraldine Van der Auwera, PhD

  • jujo Posts: 3 Member
    edited November 2013

    Yeah this solves the error!

    But now I have a new one, which comes from the order of the file. I have tried to use your suggested Perl script. The error is:

    MESSAGE: Input file is not sorted by start position.
    ERROR We saw a record with a start of chr1:48998527 after a record with a start of chr1:66999825, for input source: geneTrack.refSeq.sorted.txt

    I know the message is clear, but I don't know what is going wrong.

    I called your perl script with:

    ./sortbyref.pl geneTrack.refSeq ucsc.hg19.fasta.fai > geneTrack.refSeq.sorted.txt

    But I guess there is something wrong with that .fai file?
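
A quick way to see where the ordering breaks, independent of GATK, is to scan the sorted file for the first record whose start goes backwards on the same contig. The sketch below is only an illustration: the function name `first_unsorted_record` is made up here, the file is assumed to be tab-delimited, and the column numbers are arguments because they depend on how the table was exported (a UCSC-style table with a leading bin column would typically have the contig in column 3 and the start in column 5, but check your own file).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of GATK.
# Prints the first record whose start position is lower than the previous
# record's start on the same contig; prints nothing if no such record exists.
# Usage: first_unsorted_record FILE CHROM_COL START_COL   (columns are 1-based)
first_unsorted_record() {
  local file=$1 chrom_col=$2 start_col=$3
  awk -F'\t' -v c="$chrom_col" -v s="$start_col" '
    /^#/ { next }                                   # skip header/comment lines
    $c == prev_chrom && ($s + 0) < prev_start {     # start went backwards
      printf "line %d: %s:%s follows %s:%d\n", NR, $c, $s, prev_chrom, prev_start
      exit
    }
    { prev_chrom = $c; prev_start = $s + 0 }        # remember the last record
  ' "$file"
}
```

For example, `first_unsorted_record geneTrack.refSeq.sorted.txt 3 5` would report the first out-of-order line under the column assumption above. If the reported positions do not match the ones in the GATK message, the column numbers are probably wrong for that file.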

  • pdexheimer Posts: 407 Member, GSA Collaborator ✭✭✭✭

    It's been a while since I've used it, but I think that script has a default value (which can be overridden on the command line) for which column contains the actual position values. I think that default must be wrong for your file.
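
If the column default is hard to track down, one alternative is to do the sort with standard tools and state the columns explicitly. This is a rough sketch, not a replacement for the Perl script and not its actual logic: `sort_by_fai` is a hypothetical name, the input is assumed to be tab-delimited, and the contig and start columns are passed as arguments because they depend on the file. It orders records by the contig order in the .fai file and then by start position within each contig.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of GATK.
# Sorts FILE by the contig order listed in FAI (first column of the .fai),
# then numerically by start position within each contig.
# Records whose contig is not listed in FAI (including header lines) are dropped.
# Usage: sort_by_fai FILE FAI CHROM_COL START_COL   (columns are 1-based)
sort_by_fai() {
  local file=$1 fai=$2 chrom_col=$3 start_col=$4
  awk -F'\t' -v OFS='\t' -v c="$chrom_col" -v s="$start_col" '
    NR == FNR { rank[$1] = FNR; next }       # first file: contig order from .fai
    ($c in rank) { print rank[$c], $s, $0 }  # prefix each record with sort keys
  ' "$fai" "$file" |
    sort -t $'\t' -k1,1n -k2,2n |            # contig rank, then start position
    cut -f3-                                 # strip the two helper key columns
}
```

Under the same column assumption as a UCSC-style table with a leading bin column, the call would look like `sort_by_fai geneTrack.refSeq ucsc.hg19.fasta.fai 3 5 > geneTrack.refSeq.sorted.txt`.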
