Question about BQSR

Hello!
I have 100 samples and followed the Best Practices. Every step ran fine until PrintReads, which failed with an error.

I am using GATK version 3.8.

The commands I ran:
$BWA mem -t 4 -aM -R '@RG\tID:seq103\tSM:seq103\tPL:ILLUMINA\tLB:seq103' $GENOME 37_1_paird.fq 37_2_paird.fq > seq103.sam
java -XX:+UseSerialGC -jar $ReorderSam.jar I=$sample.sam O=$sample-reorder.sam R=$GENOME
java -jar $SamFormatConverter.jar I=$sample-reorder.sam O=$sample.bam
java -jar $SortSam.jar I=$sample.bam O=$sample-sort.bam SO=coordinate
java -jar $MarkDuplicates.jar MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=2000 I=$sample-sort.bam O=$sample-md.bam M=$sample-md.m
samtools index $sample-md.bam
java -Xmx10240m -jar $GATK -T HaplotypeCaller -R $GENOME -I $sample-md.bam -stand_call_conf 30 --emitRefConfidence GVCF -o $sample.raw1.g.vcf
java -jar $GATK -T BaseRecalibrator -R $GENOME -I seq103-md.bam -knownSites seq103.raw1.g.vcf -o seq103-BQSR.1.grp -bqsrBAQGOP 30 -nct 2
java -jar $GATK -T PrintReads -R $GENOME -I seq103-md.bam -BQSR seq103-BQSR.1.grp -o seq103.b1.bam -nct 2

It stopped at the beginning of PrintReads. The error message is:

ERROR MESSAGE: SAM/BAM/CRAM file htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter@4f6a5cc9 is malformed. Please see https://software.broadinstitute.org/gatk/documentation/article?id=1317 for more information.

Error details: the BAM file has a read with no stored bases (i.e. it uses '*') which is not supported in the GATK; see the --filter_bases_not_stored argument. Offender: K00132:80:H3WVJBBXX:5:1106:19258:44746

So why does the BAM file have '*'? Is there something wrong with my commands?
Thanks!
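
One way to check whether the BAM really contains reads with no stored bases is to count the records whose SEQ column (the tenth SAM field) is '*'. The sketch below assumes samtools is on the PATH when it is actually used; the helper name count_no_seq is made up for illustration and is not part of any tool.

```shell
# count_no_seq: reads SAM text on stdin and prints how many alignment
# records have '*' in the SEQ column (field 10). Header lines (@...) are skipped.
count_no_seq() {
  awk -F'\t' '!/^@/ && $10 == "*" { n++ } END { print n + 0 }'
}

# Example use (assumes samtools is installed and the file exists):
#   samtools view seq103-md.bam | count_no_seq
```

Running the same check on each intermediate file would show at which step such reads first appear.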

Answers

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @niuguohao
    Hi,

    Can you retrace your steps and find out exactly when the issue occurred? Can you try validating your input BAM file at each step with ValidateSamFile and letting us know which step throws an error? It looks like BaseRecalibrator from your post, but I just want to make sure the error did not arise in a previous step.

    Thanks,
    Sheila
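
The per-step check suggested above could be scripted roughly as follows. This is a sketch under assumptions: PICARD is a hypothetical variable pointing at a single picard.jar (the commands in the question use one jar per tool instead), the file names are taken from the question, and validate_step is an illustrative helper name.

```shell
# Assumption: a single picard.jar; adjust to match your installation.
PICARD="${PICARD:-picard.jar}"
sample="${sample:-seq103}"

# validate_step FILE: run Picard ValidateSamFile in summary mode on FILE.
# Files that are not present are skipped so the loop can run on partial outputs.
validate_step() {
  if [ ! -f "$1" ]; then
    echo "skip $1 (not found)"
    return 0
  fi
  echo "== $1 =="
  java -jar "$PICARD" ValidateSamFile I="$1" MODE=SUMMARY \
    || echo "problems reported for $1"
}

# Intermediate files in the order the pipeline produced them.
for f in "$sample.sam" "$sample-reorder.sam" "$sample.bam" \
         "$sample-sort.bam" "$sample-md.bam"; do
  validate_step "$f"
done
```

The first file for which problems are reported points to the step that introduced them.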

  • Thanks, Sheila!
    I think the error arose in an earlier step.
    Yesterday I added two steps before BaseRecalibrator, and it worked.
    The steps I ran are RealignerTargetCreator and IndelRealigner.
    Command lines:
    java -jar $GATK -T RealignerTargetCreator -R $GENOME -I $sample-md.bam -o $sample-md.intervals
    java -jar $GATK -T IndelRealigner -R $GENOME -filterNoBases -targetIntervals $sample-md.intervals -I $sample-md.bam -o $sample-md_rl.bam
    Then I used the new BAM file to run BQSR. It ran without any errors.

    But I think these two steps are not necessary for the Best Practices, right?

    Thanks,
    Niu.
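
The IndelRealigner command above includes -filterNoBases, the short form of the --filter_bases_not_stored argument named in the error message, so that flag (rather than realignment itself) may be what made the error disappear. A minimal sketch of passing the same flag straight to the BQSR commands from the question follows. It only prints the command lines; the default values for GATK and GENOME and the helper name bqsr_commands are assumptions for illustration.

```shell
# Assumptions: placeholder paths; set GATK and GENOME to your own files.
GATK="${GATK:-GenomeAnalysisTK.jar}"
GENOME="${GENOME:-genome.fa}"
sample="${sample:-seq103}"

# bqsr_commands SAMPLE: print the two BQSR command lines from the question
# with -filterNoBases added. Pipe the output to `sh` to actually run them.
bqsr_commands() {
  echo "java -jar $GATK -T BaseRecalibrator -R $GENOME -I $1-md.bam -knownSites $1.raw1.g.vcf -o $1-BQSR.1.grp -bqsrBAQGOP 30 -nct 2 -filterNoBases"
  echo "java -jar $GATK -T PrintReads -R $GENOME -I $1-md.bam -BQSR $1-BQSR.1.grp -o $1.b1.bam -nct 2 -filterNoBases"
}

bqsr_commands "$sample"
```

Whether dropping those reads is acceptable depends on why they have no stored bases, which this sketch does not determine.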

  • Sheila, Broad Institute (Member, Broadie, Moderator)

    @niuguohao
    Hi Niu,

    So, running the Indel Realignment step caused the error to go away? Weird. Indeed, those steps are not required anymore. You can read more about why here.

    -Sheila
