Attention: Want an end-to-end pipelining solution for GATK Best Practices?


Check out Terra here! For more details on whether this is the right fit for you checkout our blogs here.

Number of SNPs are so low in VCF File

wukhanwukhan PakistanMember
edited August 2015 in Ask the GATK team

Hi,

Dear GATK Team,

I have WGS data of one healthy individual. I want to make a VCF file. I am running the following pipeline but the number of SNPs are low, only in thousands. I have a big bam file (256GB) with depth of 38X.

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar MarkDuplicates MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=16000 REMOVE_DUPLICATES=true INPUT=gcxall_sort_mapped.bam OUTPUT=gcxall_sort_mapped_dedup.bam METRICS_FILE=gcx.mat ASSUME_SORTED=true

nohup java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar AddOrReplaceReadGroups INPUT=gcxall_sort_mapped_dedup.bam OUTPUT=gcxall_sort_mapped_dedup_crctd.bam SORT_ORDER=coordinate RGLB=MP RGPL=SOLID RGPU=NOBARCODE RGSM=GCXI

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar CreateSequenceDictionary REFERENCE=new_newhg38.fa OUTPUT=new_newhg38.dict

java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/picard-1.133/dist/picard.jar ReorderSam I=gcxall_sort_mapped_dedup_crctd.bam O=gcxall_sort_mapped_dedup_crctd_rordr.bam REFERENCE=new_newhg38.fa

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T RealignerTargetCreator -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr.bam --known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf --known 1000G_phase1.indels.hg19.sites.vcf -o gcxall_sort_mapped_dedup_crctd_rordr.intervals --defaultBaseQualities 35

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T IndelRealigner -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr.bam -targetIntervals gcxall_sort_mapped_dedup_crctd_rordr.intervals -o gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -known 1000G_phase1.indels.hg19.sites.vcf -l INFO --defaultBaseQualities 35

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T BaseRecalibrator -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -knownSites 1000G_phase1.indels.hg19.sites.vcf -knownSites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf -knownSites dbsnp_138.hg19_out.vcf -o gcxi_recal_data.table --covariate QualityScoreCovariate --covariate ReadGroupCovariate --covariate ContextCovariate --covariate CycleCovariate --solid_nocall_strategy PURGE_READ --solid_recal_mode SET_Q_ZERO_BASE_N

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T PrintReads -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval.bam -BQSR gcxi_recal_data.table -o gcxall_sort_mapped_dedup_crctd_rordr_interval_recal.bam -allowPotentiallyMisencodedQuals

/home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T UnifiedGenotyper -glm BOTH -R /scratch/gcxi_all_bam/new_newhg38.fa -I gcxall_sort_mapped_dedup_crctd_rordr_interval_recal.bam -o raw_variants.vcf -D dbsnp_138.hg19_out.vcf -allowPotentiallyMisencodedQuals

Kindly help me in this regard, what I am missing..,,!!!??

Thanks.

Answers

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @wukhan
    Hi,

    Unfortunately, we cannot help you understand why there are not enough SNPs being called. It is up to you to do some QC on your data and determine what is going on. Perhaps the quality scores are bad. We also do not support Unified Genotyper any longer. We recommend Haplotype Caller in our Best Practices.

    -Sheila

  • wukhanwukhan PakistanMember

    Thanks Sheila,

    I also tried the HaplotypeCaller:

    /home/lifescope/Desktop/usr/java/jdk1.7.0_79/bin/java -Xmx32g -Djava.io.tmpdir=java_tmp -XX:MaxPermSize=512m -XX:-UseGCOverheadLimit -jar /home/lifescope/Desktop/GenomeAnalysisTK.jar -T HaplotypeCaller -R /scratch/sindhi_all_bam/new_newhg38.fa -I sindhall_sort_mapped_dedup_crctd_rordr_interval_recal.bam -o raw_variants_hc.vcf --variant_index_type LINEAR --variant_index_parameter 128000 --dbsnp dbsnp_138.hg19_out.vcf -L sindhall_sort_mapped_dedup_crctd_rordr.intervals -allowPotentiallyMisencodedQuals

    but still no luck..,,!!!! Is it safe to remove -L parameter.?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin
Sign In or Register to comment.