Help for using GATK

hliu2hliu2 worcester Member

I have a question and would like help from the developers.
I am running GATK in two modes:
1. I run UnifiedGenotyper on a prepared BAM file to call variants, which gives a single VCF file (call it A.vcf).
2. I run UnifiedGenotyper one chromosome at a time, using the "-L" argument, which gives a set of small VCF files.
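
For illustration, mode 2 could be driven by a loop like the one below. This is only a sketch: it prints the final UnifiedGenotyper command for each contig instead of running it, and it leaves out the other options from the full script. The function name `print_per_chr_commands`, the contig list (chr1 to chr22, chrX, chrY, chrM) and the per-chromosome output name `${sample_name}.${chr}.vcf` are assumptions, not part of the original script.

```shell
# Hypothetical helper: print one UnifiedGenotyper command per contig (dry run).
# Usage: print_per_chr_commands <sampleName>
print_per_chr_commands() {
    sample_name=$1
    # Assumed contig list for a UCSC-style hg19 reference.
    for n in $(seq 1 22) X Y M; do
        chr="chr${n}"
        # Output file name is illustrative only.
        echo "java -Xmx30g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ucsc.hg19.fasta -L ${chr} -I ${sample_name}.realigned.recal.bam -o ${sample_name}.${chr}.vcf"
    done
}
```

Calling `print_per_chr_commands mySample` lists the commands so they can be checked before anything is run.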

But the result for chrY confuses me a lot. The chrY part of A.vcf is quite different from the small VCF file generated with "-L chrY"; more than 50% of the calls seem to differ.
I have also checked the other chromosomes, and there the difference is slight. Only chrY has this problem.
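
One way to put a number on the difference is to compare the call sites of the two files for a single contig. The sketch below assumes uncompressed, tab-separated VCF files and treats two records as the same call when CHROM, POS, REF and ALT all match; the function name `compare_vcf_sites` and the file name `chrY.vcf` in the usage line are hypothetical.

```shell
# Hypothetical helper: count shared and file-specific sites for one contig.
# Usage: compare_vcf_sites <first.vcf> <second.vcf> <contig>
compare_vcf_sites() {
    first_vcf=$1
    second_vcf=$2
    contig=$3
    tmp_a=$(mktemp)
    tmp_b=$(mktemp)
    # Skip header lines and key each record by CHROM:POS:REF:ALT.
    awk -F'\t' -v c="$contig" '!/^#/ && $1 == c {print $1":"$2":"$4":"$5}' "$first_vcf" | LC_ALL=C sort -u > "$tmp_a"
    awk -F'\t' -v c="$contig" '!/^#/ && $1 == c {print $1":"$2":"$4":"$5}' "$second_vcf" | LC_ALL=C sort -u > "$tmp_b"
    # comm needs sorted input; -12 keeps common lines, -23 and -13 the unique ones.
    echo "shared: $(LC_ALL=C comm -12 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')"
    echo "only_in_first: $(LC_ALL=C comm -23 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')"
    echo "only_in_second: $(LC_ALL=C comm -13 "$tmp_a" "$tmp_b" | wc -l | tr -d ' ')"
    rm -f "$tmp_a" "$tmp_b"
}
```

For example, `compare_vcf_sites A.vcf chrY.vcf chrY` prints three counts. This only compares which sites were called, not the genotypes or annotations at those sites.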

Our script is pasted here:

# -L is added only in step 2; it was omitted when generating A.vcf.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T RealignerTargetCreator \
-I ${sampleName}.bam \
-o ${sampleName}.intervals \
-known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-known 1000G_phase1.indels.hg19.sites.vcf

# -L is added only in step 2; it was omitted when generating A.vcf.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T IndelRealigner \
-targetIntervals ${sampleName}.intervals \
-I ${sampleName}.bam \
-o ${sampleName}.realigned.bam \
-known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-known 1000G_phase1.indels.hg19.sites.vcf

samtools index ${sampleName}.realigned.bam

# -L is added only in step 2; it was omitted when generating A.vcf.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T BaseRecalibrator \
-nct 8 \
-I ${sampleName}.realigned.bam \
-knownSites dbsnp_138.hg19.vcf \
-knownSites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-knownSites 1000G_phase1.indels.hg19.sites.vcf \
-o ${sampleName}.recal_data.grp

# -L is added only in step 2; it was omitted when generating A.vcf.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T PrintReads \
-nct 8 \
-I ${sampleName}.realigned.bam \
-BQSR ${sampleName}.recal_data.grp \
-o ${sampleName}.realigned.recal.bam

samtools index ${sampleName}.realigned.recal.bam

# -L is added only in step 2; it was omitted when generating A.vcf.
# The -o file below is A.vcf (no -L) or one of the small per-chromosome VCF files (with -L).
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T UnifiedGenotyper \
-nct 8 \
-glm BOTH \
-I ${sampleName}.realigned.recal.bam \
-D dbsnp_138.hg19.vcf \
-o ${sampleName}.vcf \
-stand_call_conf 50.0 \
-stand_emit_conf 10.0 \
-dcov 200 \
-A AlleleBalance -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A RMSMappingQuality -A InbreedingCoeff -A Coverage

I hope you can offer me some help.

Thanks,
