How to Prepare the normal.bam and tumor.bam files

Dear Sir,

I am new to this and am currently trying to learn whole exome analysis of breast cancer samples using the GDC Bioinformatics Pipeline: https://docs.gdc.cancer.gov/Data/Bioinformatics_Pipelines/DNA_Seq_Variant_Calling_Pipeline/

The .bam data file was downloaded from the GDC legacy archive:
https://portal.gdc.cancer.gov/legacy-archive/files/9efa8d39-37e0-4236-9737-e14ddcfd93ff

The reference genome was downloaded from here:
https://gdc.cancer.gov/about-data/data-harmonization-and-generation/gdc-reference-files
GRCh38.d1.vd1.fa.tar.gz

I was able to complete the Genome Alignment and Alignment Co-Cleaning steps. Next, I want to run the variant calling step using MuSE:

MuSE call -f <reference> -r <region> <tumor.bam> <normal.bam> -O <intermediate_muse_call.txt>

I don't know what the region is (is it the chromosome number, or the read group?).
I also don't know how to prepare the normal.bam and tumor.bam files. Please help.

Thanks
Dr. Prabhakar

Answers

  • Sheila (Broad Institute) Member, Broadie, Moderator

    @prabhakar8279
    Hi Dr. Prabhakar,

    Unfortunately, we cannot help you use MuSE, as we only provide support for GATK tools.

    -Sheila

    P.S. The region is usually the interval you would like the tool to run on. In your case, you would provide an exome intervals file (obtained from your sequencing provider). We have a Best Practices section for pre-processing your files that may be helpful as well. Of course, we would also encourage you to use Mutect2 in GATK instead of MuSE :smile:
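
As a rough illustration of what such an interval looks like, the sketch below converts exome targets from BED format (0-based, half-open) into 1-based inclusive `chr:start-end` strings. It assumes the capture targets are supplied as a tab-separated BED file and that the variant caller accepts region strings in this form; check the usage message of your tool version to confirm. The function name `bed_to_regions` and the coordinates are made up for the example. GATK's -L argument can also take a BED or interval list file directly, so this conversion is only needed when a tool wants plain region strings.

```shell
# Hypothetical helper: convert BED lines (0-based start, exclusive end)
# into 1-based inclusive "chr:start-end" region strings.
# Reads the files given as arguments, or stdin when none are given.
bed_to_regions() {
    awk 'BEGIN { FS = "\t" }
         /^(#|track|browser)/ { next }   # skip comment and header lines
         NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }' "$@"
}

# Example with two made-up target intervals passed on stdin.
printf 'chr17\t43044294\t43045802\nchr13\t32315507\t32316527\n' | bed_to_regions
```

Each output line is one region; the BED start is shifted by one because BED counts from zero, while the end coordinate stays the same.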

  • prabhakar8279 Member
    edited February 6 Accepted Answer

    Hi,

    Thanks for the response. I am now using MuTect2 with the command below:

    -----------------------------------------------------------------------------------------

    java -jar GenomeAnalysisTK.jar \
    -T MuTect2 \
    -R <reference> \
    -L <region> \
    -I:tumor <tumor.bam> \
    -I:normal <normal.bam> \
    --normal_panel <pon.vcf> \
    --cosmic <cosmic.vcf> \
    --dbsnp <dbsnp.vcf> \
    --contamination_fraction_to_filter 0.02 \
    -o <mutect_variants.vcf> \
    --output_mode EMIT_VARIANTS_ONLY \
    --disable_auto_index_creation_and_locking_when_reading_

    -----------------------------------------------------------------------------------------

    For the tumor.bam, I am using the exome data downloaded from the GDC legacy archive (after pre-processing and co-cleaning):
    https://portal.gdc.cancer.gov/legacy-archive/files/9efa8d39-37e0-4236-9737-e14ddcfd93ff

    Could you please suggest what could be used for normal.bam? Is there any publicly available normal exome data that could be used in this step? And is it OK to use normal exome data that was not obtained from the same study?

    My goal is to identify variants and find copy number variations. Please help with downloading the tumor exome data.

    Thanks
    Dr. Prabhakar
