Sorting a BAM file with Picard

Bogdan, Palo Alto, CA, Member ✭✭

Dear all,

Could you please advise? I am using Picard to sort a BAM file by read name (it is a BAM file from EGA that contains cancer sequencing data). When I run Picard SortSam, I get the error below and the file does not get sorted. Is there a way I could fix it? Thank you very much!

**Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 957453876, Read name HWI-ST7001002R:223:C14GPACXX:3:1305:7471:56486, MAPQ should be 0 for unmapped read**

The Picard command is:

**java -jar $PICARD SortSam \
I=$FILE \
O="${FILE}.sorted.picard.queryname.bam"**

Answers

  • Sheila, Broad Institute, Member, Broadie, Moderator, admin

    @Bogdan
    Hi Bogdan,

    It looks like your BAM file is malformed. You will need to fix those errors before proceeding.

    Which aligner did you use? It looks like some unmapped reads have mapping qualities that are not 0.

    -Sheila
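
One way the malformed records described above might be handled is sketched below, using Picard's own tools: CleanSam sets MAPQ to 0 for unmapped reads, and SortSam can then be run on the cleaned file with SORT_ORDER=queryname. The `run` helper, the `DRY_RUN` switch, the default file names, and the `.cleaned.bam` suffix are assumptions made for illustration and do not come from the thread. The legacy KEY=value argument style matches the command in the question; newer Picard releases may prefer a different syntax.

```shell
#!/bin/sh
# Hypothetical defaults; point these at your own jar and BAM file.
PICARD="${PICARD:-picard.jar}"
FILE="${FILE:-input.bam}"
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Step 1: CleanSam resets MAPQ to 0 on unmapped reads.
run java -jar "$PICARD" CleanSam \
    I="$FILE" \
    O="${FILE}.cleaned.bam"

# Step 2: sort the cleaned file by read name.
run java -jar "$PICARD" SortSam \
    I="${FILE}.cleaned.bam" \
    O="${FILE}.sorted.picard.queryname.bam" \
    SORT_ORDER=queryname
```

If rewriting the file is not wanted, adding VALIDATION_STRINGENCY=LENIENT to the SortSam command is another commonly used option: it downgrades validation errors to warnings instead of repairing the records.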

  • Theostef974, Member
    edited November 2018

    Hello everybody,

    I am asking my question in this existing thread because, as a new user, I cannot create a new topic.
    I am working with RNA-seq data and I'm trying to create VCF files with GATK. Here is my command line:

    SNPref=/home/theos974/projects/def-thchlava/Chromomes_GATK_Files/GenomeAnalysisTK.jar
    Humanref=/home/theos974/projects/def-benlab11/reference/hg38ercc.fa

    readarray -t input_bam_files_cohort2 < input_bam_files_cohort2.txt
    readarray -t output_vcf_Files_chr1_Cohort2 < output_vcf_Files_chr1_Cohort2.txt

    java -jar $SNPref -L /home/theos974/projects/def-thchlava/Chromomes_GATK_Files/chr1_KG.all.chr.bim.hg38.intervals -T HaplotypeCaller -R $Humanref -U ALLOW_N_CIGAR_READS -rf ReassignMappingQuality -DMQ 60 -I ${input_bam_files_cohort2[$SLURM_ARRAY_TASK_ID]} -stand_call_conf 20 -o ${output_vcf_Files_chr1_Cohort2[$SLURM_ARRAY_TASK_ID]}

    This command seems to be correct because it worked with my first set of data. But with my new set I get this error message:

    ERROR MESSAGE: SAM/BAM/CRAM file /home/theos974/projects/def-thchlava/Cohort2/bam/NAC_215.sorted.bam is malformed. Please see "software.broadinstitute.org/gatk/documentation/article?id=1317" for more information. Error details: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups

    I hope this is not a big problem,

    Thank you very much in advance for any answer you can provide.

  • Hi Banu,

    Thank you for the link you gave me. I think I will be able to find a solution reading this.

    Theo
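
For the missing read group error reported above, one commonly used fix is Picard's AddOrReplaceReadGroups, sketched here under stated assumptions. The read group values (`RGID`, `RGLB`, `RGPL`, `RGPU`, `RGSM`), the output file name, the `run` helper, and the `DRY_RUN` switch are placeholders chosen for illustration, not values from the thread; replace them with the real library, platform, and sample information.

```shell
#!/bin/sh
# Hypothetical defaults; adjust to your own jar and BAM files.
PICARD="${PICARD:-picard.jar}"
IN_BAM="${IN_BAM:-NAC_215.sorted.bam}"
OUT_BAM="${OUT_BAM:-NAC_215.sorted.rg.bam}"
# DRY_RUN=1 (the default here) only prints the command; set DRY_RUN=0 to execute it.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Write a copy of the BAM whose header and reads carry a single read group.
# All RG* values below are placeholders.
run java -jar "$PICARD" AddOrReplaceReadGroups \
    I="$IN_BAM" \
    O="$OUT_BAM" \
    RGID=NAC_215 \
    RGLB=lib1 \
    RGPL=ILLUMINA \
    RGPU=unit1 \
    RGSM=NAC_215
```

The output BAM would then be indexed and passed to HaplotypeCaller with `-I` in place of the original file.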
