Sorting a BAM file with Picard

Bogdan (Palo Alto, CA), Member

Dear all,

Would you please advise? I am using Picard SortSam to sort a BAM file by read name (it is a BAM file from EGA that contains cancer sequencing data). When I run SortSam, I get the error below and the file does not get sorted. Is there a way I could fix this? Thank you very much!

Exception in thread "main" htsjdk.samtools.SAMFormatException: SAM validation error: ERROR: Record 957453876, Read name HWI-ST7001002R:223:C14GPACXX:3:1305:7471:56486, MAPQ should be 0 for unmapped read
The Picard command is:

java -jar $PICARD SortSam \
I=$FILE \
O="${FILE}.sorted.picard.queryname.bam"
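The error means htsjdk's validation found a record whose FLAG marks it as unmapped (bit 0x4) but whose MAPQ (column 5) is not 0. A quick way to see how many records are affected is to filter the SAM text for exactly that combination. This is a sketch: it assumes samtools is available to decode the BAM (samtools does not appear in the original post), and the function name `find_unmapped_nonzero_mapq` is hypothetical.

```shell
#!/usr/bin/env bash
# List unmapped records (FLAG bit 0x4 set) whose MAPQ is not 0.
# Input: SAM text on stdin. Output: QNAME, FLAG and MAPQ, tab-separated.
find_unmapped_nonzero_mapq() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { next }                            # skip header lines
    # int(FLAG / 4) % 2 tests bit 0x4 without needing gawk and()
    int($2 / 4) % 2 == 1 && $5 != 0 {
      print $1, $2, $5
    }'
}
```

For example, `samtools view "$FILE" | find_unmapped_nonzero_mapq | wc -l` would count the offending records.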

Answers

  • Sheila (Broad Institute), Member, Broadie, Moderator admin

    @Bogdan
    Hi Bogdan,

    It looks like your BAM file is malformed. You will need to fix those errors before proceeding.

    Which aligner did you use? It looks like some unmapped reads have mapping qualities that are not 0.

    -Sheila
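A minimal sketch of one way to make such a file pass this particular check, assuming the only problem is unmapped records with a non-zero MAPQ: rewrite column 5 to 0 whenever FLAG bit 0x4 is set. Picard's CleanSam tool does this (it also soft-clips alignments that run past the end of the reference) and is the more established route; the awk version below, with the hypothetical function name `zero_mapq_for_unmapped`, only illustrates the rule. It works on SAM text, so it assumes samtools is available to convert to and from BAM.

```shell
#!/usr/bin/env bash
# Set MAPQ (column 5) to 0 for every unmapped record (FLAG bit 0x4 set).
# Header lines and mapped records pass through unchanged.
zero_mapq_for_unmapped() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                     # copy header lines as-is
    int($2 / 4) % 2 == 1 { $5 = 0 }          # unmapped: force MAPQ to 0
    { print }'
}
```

A possible pipeline is `samtools view -h "$FILE" | zero_mapq_for_unmapped | samtools view -b -o fixed.bam -`, after which SortSam can be rerun on `fixed.bam`. Setting `FS` and `OFS` to a tab keeps the rebuilt records tab-delimited.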

  • Theostef974, Member

    Hello everybody,

    I am asking my question in this existing thread because, as a new user, I cannot create a new thread.
    I am working with RNA-seq data and I am trying to create VCF files with GATK. Here is my command line:

    SNPref=/home/theos974/projects/def-thchlava/Chromomes_GATK_Files/GenomeAnalysisTK.jar
    Humanref=/home/theos974/projects/def-benlab11/reference/hg38ercc.fa

    readarray -t input_bam_files_cohort2 < input_bam_files_cohort2.txt
    readarray -t output_vcf_Files_chr1_Cohort2 < output_vcf_Files_chr1_Cohort2.txt

    java -jar $SNPref -L /home/theos974/projects/def-thchlava/Chromomes_GATK_Files/chr1_KG.all.chr.bim.hg38.intervals -T HaplotypeCaller -R $Humanref -U ALLOW_N_CIGAR_READS -rf ReassignMappingQuality -DMQ 60 -I ${input_bam_files_cohort2[$SLURM_ARRAY_TASK_ID]} -stand_call_conf 20 -o ${output_vcf_Files_chr1_Cohort2[$SLURM_ARRAY_TASK_ID]}

    This command seems to be correct, because it worked with my first set of data. But with my new set I get this error message:

    ERROR MESSAGE: SAM/BAM/CRAM file /home/theos974/projects/def-thchlava/Cohort2/bam/NAC_215.sorted.bam is malformed. Please see "software.broadinstitute.org/gatk/documentation/article?id=1317" for more information. Error details: SAM file doesn't have any read groups defined in the header. The GATK no longer supports SAM files without read groups

    I hope this is not a big problem.

    Thank you very much in advance for any help you can provide.
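The error means the BAM header has no @RG lines, and GATK needs every read assigned to a read group (with a sample name, SM). Picard's AddOrReplaceReadGroups is the usual tool for adding one. As a sketch of what such a tool does, the hypothetical shell function `add_read_group` below adds one @RG header line and an `RG:Z:` tag to every read of a SAM stream. It assumes the input has no read groups at all, writes only the ID and SM fields (downstream tools may want others, such as PL or LB), and assumes samtools for the BAM/SAM conversion.

```shell
#!/usr/bin/env bash
# Add a single read group to a SAM stream that has none:
#   - one @RG header line, written just before the first alignment
#   - an RG:Z:<id> tag appended to every alignment line
# Usage: add_read_group <ID> <SM>   (SAM on stdin, SAM on stdout)
add_read_group() {
  local rg_id="$1" sample="$2"
  awk -v id="$rg_id" -v sm="$sample" '
    BEGIN { OFS = "\t" }
    /^@/ { print; next }                     # copy existing header lines
    !done { print "@RG", "ID:" id, "SM:" sm; done = 1 }
    { print $0 "\tRG:Z:" id }                # tag each alignment
    END { if (!done) print "@RG", "ID:" id, "SM:" sm }  # header-only input
  '
}
```

For example, `samtools view -h NAC_215.sorted.bam | add_read_group NAC_215 NAC_215 | samtools view -b -o NAC_215.sorted.rg.bam -` followed by `samtools index NAC_215.sorted.rg.bam` (here the read-group ID and sample name are placeholders). Record order is unchanged, so the coordinate sort is preserved, but the new BAM needs its own index.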

  • Hi Banu,

    Thank you for the link you gave me. I think I will be able to find a solution from it.

    Theo
