How can I identify the library ID in the fastq file name?

Dear NGS experts,

I have just started analysing NGS data. I recently received some files that were generated at EMBL. I want to start analysing them, but I am having a hard time figuring out the read groups I need in order to start the GATK data preprocessing (from FASTQ to BAM).
I am using the following article to learn about the different read group fields I need to build the BAM file: here
My FASTQ file name is structured exactly like the one in the red rectangle in the image below. Sadly, there is no "lib" in the file name.
Can anyone identify the read group fields for me, especially the library identifier?
The image source is here.

Thank you so much.

zmk

Answers

  • SkyWarrior, Turkey, Member ✭✭✭

    Open the BAM file with samtools using samtools view filename.bam | head -n 25

    You should be able to see the read group IDs of the mapped reads directly in the BAM.

    Or just open the BAM file in IGV and click on a read.
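
If a BAM file is available, the read groups are also declared in the @RG lines of its header, which can be easier to read than individual alignment records. Below is a minimal sketch of a helper that summarises them; the function name list_read_groups is made up for this example, and it only assumes standard tab-separated SAM header text on standard input.

```shell
# list_read_groups (hypothetical helper): read SAM header text on stdin and
# print one line per @RG record with its ID, SM, LB and PU fields.
# A field that is missing from the record is reported as "NA".
list_read_groups() {
  awk -F'\t' '
    /^@RG/ {
      id = sm = lb = pu = "NA"
      for (i = 2; i <= NF; i++) {
        tag = substr($i, 1, 2)   # two-letter tag, e.g. "LB"
        val = substr($i, 4)      # everything after "LB:"
        if (tag == "ID") id = val
        else if (tag == "SM") sm = val
        else if (tag == "LB") lb = val
        else if (tag == "PU") pu = val
      }
      printf "ID=%s SM=%s LB=%s PU=%s\n", id, sm, lb, pu
    }'
}
```

It would typically be fed from the header-only output of samtools, for example: samtools view -H filename.bam | list_read_groups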

  • Hi SkyWarrior. Sorry, perhaps I did not explain it properly.

    I received only FASTQ files, no BAM files. I am now trying to build BAM files using Picard,
    so I have to specify the read group tags.
    From the file name I can read off the sample name and the lane ID, but I don't know which part of the name represents the library ID.

    Can you identify the different read group fields from the file name in the red rectangle?

    Thanks
    Zmk

  • Great,
    this is exactly what I did in the meantime.
    Now I have all the information I need to start the conversion from FASTQ to uBAM.
    Thanks,
    zmk

    P.S.
    The library name should be 4s002434.
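
For readers who arrive with only FASTQ files and an uninformative file name: the flowcell and lane can often be recovered from the read headers rather than the file name. The sketch below assumes Illumina CASAVA 1.8+ style headers (@instrument:run:flowcell:lane:tile:x:y ...); older or non-Illumina headers will not parse this way. The function name fastq_flowcell_lane is made up for this example. Note that the library ID is not encoded in the read header, so it still has to come from the file name or from the sequencing facility.

```shell
# fastq_flowcell_lane (hypothetical helper): read FASTQ text on stdin and
# print "<flowcell>.<lane>" taken from the first header line only.
# Assumption: CASAVA 1.8+ headers of the form
#   @instrument:run:flowcell:lane:tile:x:y read:filtered:control:index
fastq_flowcell_lane() {
  awk -F':' 'NR == 1 {
    if (NF < 7) exit 1     # header does not have the expected fields
    print $3 "." $4        # field 3 = flowcell ID, field 4 = lane
    exit
  }'
}
```

For a gzipped file this could be used as zcat reads_R1.fastq.gz | fastq_flowcell_lane (the file name is a placeholder). The resulting flowcell.lane string is a common choice for the platform unit and read group ID, which Picard FastqToSam accepts as PLATFORM_UNIT and READ_GROUP_NAME, alongside SAMPLE_NAME and LIBRARY_NAME.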
