Using previously processed WXS BAM file

pagarwal14 (Durham, NC), Member

Hi,
I have received whole-exome sequencing (WXS) data from cancer patient-derived xenograft samples as BAM files that appear to have been partially processed through the GATK pipeline. Based on the header, it looks like bwa alignment, indel realignment (IndelRealigner) and duplicate marking (MarkDuplicates) were run. I was planning to convert each BAM file back to FASTQ and take it through the latest GATK pipeline, but because these BAMs have been processed beyond the alignment stage, I am not sure how best to handle them.
Could you please advise on the best strategy for processing these BAM files with the GATK pipeline? Also, can you tell from the header which version of GATK was used, and whether that version is still available and usable? Part of the header is shown below, and the full header is attached as a text file.
Thanks,

Pankaj
    $ samtools view -H 10960X13.bam
    @HD VN:1.4 GO:none SO:coordinate
    @SQ SN:1 LN:249250621
    @SQ SN:2 LN:243199373
    @SQ SN:3 LN:198022430
    @SQ SN:4 LN:191154276
    ...
    @SQ SN:GL000192.1 LN:547496
    @RG ID:10960X13_utah-140807-C5E3HANXX_s_4-lumd_rnaseq_and_exome-CGATGT PL:Illumina PU:4-lumd_rnaseq_and_exome-CGATGT SM:10960X13
    @PG ID:GATK IndelRealigner CL:knownAlleles=[(RodBinding name=knownAlleles source=/Volumes/hts_core/Shared/gatk_resources/2.3/b37/1000G_phase1.indels.b37.vcf), (RodBinding name=knownAlleles2 source=/Volumes/hts_core/Shared/gatk_resources/2.3/b37/Mills_and_1000G_gold_standard.indels.b37.vcf)] targetIntervals=./06_intervals/cleaned.intervals LODThresholdForCleaning=5.0 consensusDeterminationModel=USE_READS entropyThreshold=0.15 maxReadsInMemory=1000000 maxIsizeForMovement=3000 maxPositionalMoveAllowed=200 maxConsensuses=30 maxReadsForConsensuses=120 maxReadsForRealignment=20000 noOriginalAlignmentTags=false nWayOut=null generate_nWayOut_md5s=false check_early=false noPGTag=false keepPGTags=false indelsFileForDebugging=null statisticsFileForDebugging=null SNPsFileForDebugging=null
    @PG ID:MarkDuplicates PN:MarkDuplicates VN:1.118(2329276ea55d31ab6b19bab55b9ee7b51e4a446e_1406559781) CL:picard.sam.MarkDuplicates INPUT=[03_sorted_bams/0.bam] OUTPUT=./05_dup_marked/cleaned.bam METRICS_FILE=./05_dup_marked/mark_dups_metrics.txt TMP_DIR=[/Volumes/hts_raw/scratch/fast_scratch/tmp.km28SVoZxZ.10960X13.17895] VALIDATION_STRINGENCY=LENIENT COMPRESSION_LEVEL=8 MAX_RECORDS_IN_RAM=6000000 CREATE_INDEX=true PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates REMOVE_DUPLICATES=false ASSUME_SORTED=false MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false CREATE_MD5_FILE=false
    @PG ID:bwa PN:bwa VN:0.7.10-r789 CL:bwa mem -v 2 -M -t 10 -R @RG\tID:10960X13_utah-140807-C5E3HANXX_s_4-lumd_rnaseq_and_exome-CGATGT\tSM:10960X13\tPL:Illumina\tPU:4-lumd_rnaseq_and_exome-CGATGT /Volumes/hts_core/Shared/gatk_resources/2.3/b37/bwa_7_6_indexed/human_g1k_v37.fasta /Volumes/qac/welma/20141121_lumd_rnaseq_and_exome/00_incoming_data/10960R_HCI001_to_19_Exome_Seq/Fastq/10960X13_140807_D00294_0121_AC5E3HANXX_4_1.txt.gz /Volumes/qac/welma/20141121_lumd_rnaseq_and_exome/00_incoming_data/10960R_HCI001_to_19_Exome_Seq/Fastq/10960X13_140807_D00294_0121_AC5E3HANXX_4_2.txt.gz
    @PG ID:GATK PrintReads VN:3.2-2-gec30cee CL:readGroup=null platform=null number=-1 sample_file=[] sample_name=[] simplify=false no_pg_tag=false
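
Most of the version question can be answered from the @PG lines in the excerpt above: bwa reports VN:0.7.10-r789, Picard MarkDuplicates reports VN:1.118, and the GATK PrintReads entry reports VN:3.2-2-gec30cee. The GATK IndelRealigner entry carries no VN tag, so its exact version is not recorded on that line. The resource paths (gatk_resources/2.3/b37) suggest b37 reference files from a directory labelled 2.3, but that is only inferred from the path names. Below is a minimal Python sketch that pulls these tags out of any header and also rebuilds the `bwa mem -R` read-group string from the `@RG` line, which would be needed if the reads are converted back to FASTQ and realigned. It assumes tab-separated input as printed by `samtools view -H` (the paste above appears to have lost its tabs). The function names are hypothetical.

```python
from typing import Dict, List, Optional


def parse_record(line: str) -> Dict[str, str]:
    """Split one tab-separated SAM header line into a {tag: value} dict.

    The record type (e.g. "@PG") is stored under the key "type". Each field is
    split at its first ":" only, so colons inside a CL command line survive.
    """
    fields = line.rstrip("\r\n").split("\t")
    record = {"type": fields[0]}
    for field in fields[1:]:
        tag, _, value = field.partition(":")
        record[tag] = value
    return record


def program_versions(header: str) -> List[Dict[str, Optional[str]]]:
    """Return ID, PN and VN for every @PG line; missing tags come back as None."""
    programs = []
    for line in header.splitlines():
        if line.startswith("@PG\t"):
            rec = parse_record(line)
            programs.append({key: rec.get(key) for key in ("ID", "PN", "VN")})
    return programs


def bwa_read_group_args(header: str) -> List[str]:
    """Rewrite each @RG line as a string suitable for `bwa mem -R`.

    bwa expects the tabs inside -R to be written as the two characters
    backslash + t, which is how the bwa CL in this header shows them.
    """
    return [
        line.rstrip("\r\n").replace("\t", "\\t")
        for line in header.splitlines()
        if line.startswith("@RG\t")
    ]


if __name__ == "__main__":
    # Hypothetical sample: a trimmed, tab-separated excerpt of the header above
    # (CL fields omitted). In practice, pass the full `samtools view -H` output.
    sample = "\n".join([
        "@HD\tVN:1.4\tGO:none\tSO:coordinate",
        "@RG\tID:10960X13_utah-140807-C5E3HANXX_s_4-lumd_rnaseq_and_exome-CGATGT"
        "\tPL:Illumina\tPU:4-lumd_rnaseq_and_exome-CGATGT\tSM:10960X13",
        "@PG\tID:GATK IndelRealigner",
        "@PG\tID:MarkDuplicates\tPN:MarkDuplicates"
        "\tVN:1.118(2329276ea55d31ab6b19bab55b9ee7b51e4a446e_1406559781)",
        "@PG\tID:bwa\tPN:bwa\tVN:0.7.10-r789",
        "@PG\tID:GATK PrintReads\tVN:3.2-2-gec30cee",
    ])
    for pg in program_versions(sample):
        print(f"{pg['ID']}: {pg['VN'] or '(no VN tag)'}")
    for rg in bwa_read_group_args(sample):
        print(f"-R '{rg}'")
```

To run it on the real file, the header text can be captured with `subprocess.run(["samtools", "view", "-H", "10960X13.bam"], capture_output=True, text=True).stdout` and passed to `program_versions` or `bwa_read_group_args`.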
