How to use bwa mem for paired-end Illumina reads

suggi_01suggi_01 germanyMember
edited November 2014 in Ask the GATK team

Dear All,

we would like to use the bwa mem algorithm for the alignment of paired-end (100 bp) Illumina reads and variant calling
with GATK. Unfortunately there are some problems understanding the command description.
Do I need to use the -p and [mates.fq] options for paired-end reads?
And what about simply using the command below?

bwa mem -M -t 16 ref.fa read1.fq read2.fq > aln.sam

Command description (http://bio-bwa.sourceforge.net/bwa.shtml):
"If mates.fq file is absent and option -p is not set, this command regards input reads are single-end. If mates.fq is present, this command assumes the i-th read in reads.fq and the i-th read in mates.fq constitute a read pair. If -p is used, the command assumes the 2i-th and the (2i+1)-th read in reads.fq constitute a read pair (such input file is said to be interleaved). In this case, mates.fq is ignored. In the paired-end mode, the mem command will infer the read orientation and the insert size distribution from a batch of reads.
The BWA-MEM algorithm performs local alignment. It may produce multiple primary alignments for different part of a query sequence. This is a crucial feature for long sequences. However, some tools such as Picard’s markDuplicates does not work with split alignments. One may consider to use option -M to flag shorter split hits as secondary."

We appreciate your help!

Best regards,
Sugi

Best Answer

Answers

  • cruckertcruckert GermanyMember

    If you have two fastq files your command:
    bwa mem -M -t 16 ref.fa read1.fq read2.fq > aln.sam
    is absolutely fine. Your read2.fq is called mates.fq in the bwa examples. If you view the first lines of both files the read names are identical despite a 1 or 2 for the corresponding read of the pair.

    If you only have one interleaved fastq file you would use the -p option:
    bwa mem -M -t 16 -p ref.fa read.fq > aln.sam
    In this case both reads of a pair are in the same fastq file successively. Have a look at the read names.

    For the unlikely case you would like to handle your paired-end reads as single ends the command is:
    bwa mem -M -t 16 ref.fa read.fq > aln.sam

  • ss.santosss.santos LisbonMember

    Hi, I have a set of paired end reads that were heavily contaminated with adapters, and after I cleaned them only about 30% of both pairs survived, and I got a big single-end file remaining. Is there any way I can run bwa mem with 3 sets of reads, 2 pair-ended and one single-ended? I tried the command below, but I got the exact same result as when I used only the pair-ended files, so it seems like bwa mem simply didn't read the third file.

    bwa mem ref.fa reads1.fq reads2.fq single_end.fq > reads.sam

    Thanks a lot to anyone who can give me a hand!!

    Sandra

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @ss.santos
    Hi Sandra,

    We cannot help with bwa questions. However, I think bwa mem takes only paired reads or single end reads. Instead of adding all three files, add the two paired end files and the single end file separately. http://bio-bwa.sourceforge.net/bwa.shtml

    -Sheila

  • ss.santosss.santos LisbonMember

    Hi Sheila, thanks for that. Another question, about the read group. I read that it's important for gatk's downstream processing, but I've never used it. Where do I find the information to create the read group information??

    Sandra

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @ss.santos
    Hi Sandra,

    I think this article will answer all of your questions: http://gatkforums.broadinstitute.org/discussion/1317/collected-faqs-about-bam-files

    -Sheila

  • Hi, I am trying to run BWA using the following command:

    module load bwa/0.7.15

    time bwa mem -aCMP -t 12 -R '@RG ID:MiSeq SM:VA-002 PL:illumina LB:XT40 PU:Batch22' referencebasename R1_001.fastq.gz R2_001.fastq.gz > VA-002-aligned_reads.sam

    Error is: [E::bwa_idx_load_from_disk] fail to locate the index files real 0m0.005s user 0m0.001s sys 0m0.001s /var/spool/torque/mom_priv/jobs/2184674.pbs.scm.SC: line 12: R2_001.fastq.gz: command not found

    So it seems to be unable to read which of the files are my indexes and which are the read pairs? I have already ran bwa index successfully and the samtools faidx on the reference.

    Thanks!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @imginger
    Hi,

    Have a look at this thread. We don't provide support for bwa, but if you google the error message, you should find some other helpful threads :smile:

    -Sheila

  • dhwani2410dhwani2410 Member
    edited November 2017

    how to do exact mapping using BWA. What is the parameter used for it .
    How can we use the option -M INT: Mismatch penalty for the same. I want to only keep reads with exact mapping to the reference sequence.

    Post edited by dhwani2410 on
  • EADGEADG KielMember ✭✭✭

    Hi @dhwani2410,

    if you want only reads with exact mapping, just filter your mapped sam file. If you run bwa mem with default options (Matching Score is 1 ) a perfect matching read should have an alignment score which is identical to the read length.

    This thread on seqAnswers explain to you who to do it :)
    seqanswers.com/forums/showthread.php?t=24123

    Greetings EADG

  • BegaliBegali GermanyMember

    @Sheila
    @Geraldine_VdAuwera

    I am mapping one input fastq( Panceratics cfDNA) with bwa mem
    bwa mem hg83 input.fq -> output.sam
    RAM memory around 32GB

    but [M::bwa_idx_load_from_disk] read 0 ALT contigs
    more than 3 hours still in the first line
    is there any parameter should be add as I applied before last year but with plant genome was fine now also with plant the same problem still [M::bwa_idx_load_from_disk] read 0 ALT contigs
    I want to check if bwa mem command line has been updated or what

    Thanks in advance

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @Begali
    Hi,

    Are you using the latest version of BWA? You may want to check with their team as well, since we do not help with BWA questions.

    -Sheila

  • BegaliBegali GermanyMember

    @Sheila
    hi

    yes last version and many thanks for your reply , the problem was solved. it worked suddenly

    -Begali

  • madzayasodaramadzayasodara UCLAMember
    edited January 26
    Hello,
    I am using:
    bwa mem -R @RG\tID:firstrun\tSM:1Z_S57\tPL:illumina\tLB:lib1\tPU:006 -p ref.fa read_u.fq > aln.sam
    to align my paired-end reads, and the alignment happens without errors.
    However, in the log files I see:
    [M::mem_process_seqs] Processed 87234 reads in 15.023 CPU sec, 14.867 real sec
    [M::process] 101954 single-end sequences; 0 paired-end sequences
    [M::process] read 115544 sequences (9922652 bp)...
    Does this means that my reads are being aligned as single-end?
    -madza
  • bhanuGandhambhanuGandham Cambridge MAMember, Administrator, Broadie, Moderator admin

    HI @madzayasodara

    Has @SkyWarrior's suggestion resolved your issue?

Sign In or Register to comment.