
RE: Gene list headers

lkchan Member Posts: 2
edited October 2012 in Ask the GATK team

Hi,

I am learning to use the DepthOfCoverage tool to obtain gene coverage information for a collection of bacterial contigs to which metagenomic reads were mapped. The original post introducing this tool is here: http://gatkforums.broadinstitute.org/discussion/40/depthofcoverage-v3-0-how-much-data-do-i-have#latest

In that post, you describe the gene list as follows:

-geneList /path/to/gene/list.txt

The provided gene list must be of the following format:

585     NM_001005484    chr1    +       58953   59871   58953   59871   1       58953,  59871,  0       OR4F5   cmpl    cmpl    0,
587     NM_001005224    chr1    +       357521  358460  357521  358460  1       357521, 358460, 0       OR4F3   cmpl    cmpl    0,

I have two questions:

  1. Can you please provide headers to the values in each column?
  2. I am working with bacterial genomic contigs. Can you please specify what basic information is needed for a gene list (e.g., name of contig, name of gene, location of the gene in the contig as from ... to ..., etc.)?

Thanks so much!

Leo

Answers

  • Geraldine_VdAuwera Administrator, Dev Posts: 11,163
    Accepted Answer

    This article explains how to work with RefSeq gene lists:

    http://www.broadinstitute.org/gatk/guide/article?id=1329

    Geraldine Van der Auwera, PhD

  • lkchan Member Posts: 2

    Hi Geraldine,
    Thanks. I am working on a genome assembled from metagenomes, so I do not have a RefSeq annotation for this "genome". I have contigs and coding regions predicted from those contigs. I mapped the raw metagenomic reads to the contigs and would like to get coverage for all genes, so I would have to generate a custom gene list. Thanks!
    Leo
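
Editorial sketch: for predicted coding regions on bacterial contigs, one row per gene could be generated along the following lines. This is a minimal sketch under stated assumptions, not a confirmed GATK recipe. It assumes the 16 columns follow the order seen in the example rows quoted in the question (which appear to match the UCSC refGene table layout), that every gene is a single exon, and that coordinates are given in the same convention as the target file. The function name `make_gene_list_row` and its parameters are hypothetical, and the thread does not confirm that DepthOfCoverage accepts a file built this way.

```python
def make_gene_list_row(contig, start, end, strand, gene_name,
                       transcript_id=None, bin_id=0):
    """Build one tab-separated, 16-column row for a single-exon gene.

    Column order is ASSUMED to match the example rows in the question;
    bin_id is a placeholder for the first (indexing) column.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    if start >= end:
        raise ValueError("start must be smaller than end")
    if transcript_id is None:
        transcript_id = gene_name  # no separate transcript name available

    fields = [
        str(bin_id),             # column 1: bin (placeholder)
        transcript_id,           # column 2: transcript/gene identifier
        contig,                  # column 3: contig name
        strand,                  # column 4: strand
        str(start), str(end),    # columns 5-6: transcript start/end
        str(start), str(end),    # columns 7-8: coding start/end
        "1",                     # column 9: exon count (single exon assumed)
        f"{start},",             # column 10: exon starts, comma-terminated
        f"{end},",               # column 11: exon ends, comma-terminated
        "0",                     # column 12: score
        gene_name,               # column 13: gene name
        "cmpl", "cmpl",          # columns 14-15: status fields, copied from the example
        "0,",                    # column 16: exon frames, comma-terminated
    ]
    return "\t".join(fields)


# Reproduce the first example row from the question.
row = make_gene_list_row("chr1", 58953, 59871, "+", "OR4F5",
                         transcript_id="NM_001005484", bin_id=585)
print(row)
```

Rows would still need to be sorted in the same contig order as the reference, and the result should be checked against the RefSeq article linked above before use.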

  • marlaux Brazil Member Posts: 3

    Hi lkchan, did you find a solution to your problem? I am trying to do exactly this right now: I need to construct a custom gene list from a GFF3/BED file, but I just can't find the headers. It is a .tsv file, so I should be able to construct it easily, right? But the example above doesn't tell us which column is which. Thank you!
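
Editorial sketch: the example rows have 16 whitespace-separated fields, which appear to line up with the column names of the UCSC Genome Browser refGene table. The snippet below labels the first example row with those names. The mapping is an assumption based on that resemblance and is not confirmed anywhere in this thread. `REFGENE_COLUMNS` and `label_gene_list_line` are hypothetical names introduced for illustration.

```python
# Column names ASSUMED from the UCSC refGene table schema; the thread itself
# does not state them.
REFGENE_COLUMNS = [
    "bin", "name", "chrom", "strand",
    "txStart", "txEnd", "cdsStart", "cdsEnd",
    "exonCount", "exonStarts", "exonEnds",
    "score", "name2", "cdsStartStat", "cdsEndStat", "exonFrames",
]


def label_gene_list_line(line):
    """Split one gene-list row on whitespace and pair each value with a column name."""
    values = line.split()
    if len(values) != len(REFGENE_COLUMNS):
        raise ValueError(
            f"expected {len(REFGENE_COLUMNS)} fields, found {len(values)}"
        )
    return dict(zip(REFGENE_COLUMNS, values))


# First example row from the original question.
example = (
    "585     NM_001005484    chr1    +       58953   59871   58953   59871"
    "   1       58953,  59871,  0       OR4F5   cmpl    cmpl    0,"
)
record = label_gene_list_line(example)
for column, value in record.items():
    print(f"{column}\t{value}")
```

Read this way, the contig name is `chrom`, the gene name is `name2`, and the gene location is given by `txStart` and `txEnd`, with `exonStarts` and `exonEnds` holding comma-terminated lists of exon boundaries.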
