Help for using GATK

hliu2 (worcester), Member

I have a question that I hope the developers can help with.
I am running GATK in two modes:
1. I run UnifiedGenotyper on the whole prepared BAM file at once and get a single VCF file (call it A.vcf).
2. I run UnifiedGenotyper one chromosome at a time, using the "-L" argument, and get a set of small per-chromosome VCF files.

But the chrY results confuse me a lot: the chrY part of A.vcf is QUITE different from the small VCF file generated with "-L chrY", and the difference seems to be larger than 50%.
That means the calls on chrY are DIFFERENT between the two modes.
However, when I checked the other chromosomes, the differences were slight. ONLY chrY has this problem.
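One way to put a number on the chrY difference is to compare the variant sites each VCF reports on that chromosome. This is a minimal sketch, assuming two records count as the same call when CHROM, POS, REF and ALT all match (genotypes, QUAL and annotations are ignored); the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare the variant sites two VCFs report on one chromosome.
# Assumption: a site is keyed as CHROM:POS:REF:ALT; genotypes and annotations are ignored.

# Print the sorted, de-duplicated site keys for chromosome $2 in VCF $1 (header lines skipped).
vcf_site_keys() {
  local vcf="$1" chrom="$2"
  awk -v c="$chrom" 'BEGIN { FS = "\t" } !/^#/ && $1 == c { print $1 ":" $2 ":" $4 ":" $5 }' "$vcf" |
    LC_ALL=C sort -u
}

# Report how many sites are only in VCF $1, only in VCF $2, or shared, for chromosome $3.
compare_vcf_sites() {
  local vcf_a="$1" vcf_b="$2" chrom="$3"
  local keys_a keys_b only_a only_b shared
  keys_a=$(mktemp)
  keys_b=$(mktemp)
  vcf_site_keys "$vcf_a" "$chrom" > "$keys_a"
  vcf_site_keys "$vcf_b" "$chrom" > "$keys_b"
  # comm needs both inputs sorted with the same collation, hence LC_ALL=C on both sides.
  only_a=$(LC_ALL=C comm -23 "$keys_a" "$keys_b" | wc -l)
  only_b=$(LC_ALL=C comm -13 "$keys_a" "$keys_b" | wc -l)
  shared=$(LC_ALL=C comm -12 "$keys_a" "$keys_b" | wc -l)
  rm -f "$keys_a" "$keys_b"
  # $(( )) strips any padding that wc -l adds on some platforms.
  echo "only_in_first=$((only_a)) only_in_second=$((only_b)) shared=$((shared))"
}
```

Usage: `compare_vcf_sites A.vcf chrY_run.vcf chrY`, where `chrY_run.vcf` is a hypothetical name for the VCF produced with "-L chrY". Comparing sites rather than whole lines keeps differences in annotation values (which can shift between runs) from being counted as different calls.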

Our script is pasted below:

# -L xx is added only in mode 2 (per-chromosome runs); it is omitted when generating A.vcf
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T RealignerTargetCreator \
-I ${sampleName}.bam \
-o ${sampleName}.intervals \
-known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-known 1000G_phase1.indels.hg19.sites.vcf

# -L xx is added only in mode 2 (per-chromosome runs); it is omitted when generating A.vcf
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T IndelRealigner \
-targetIntervals ${sampleName}.intervals \
-I ${sampleName}.bam \
-o ${sampleName}.realigned.bam \
-known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-known 1000G_phase1.indels.hg19.sites.vcf

samtools index ${sampleName}.realigned.bam

# -L xx is added only in mode 2 (per-chromosome runs); it is omitted when generating A.vcf
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T BaseRecalibrator \
-nct 8 \
-I ${sampleName}.realigned.bam \
-knownSites dbsnp_138.hg19.vcf \
-knownSites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-knownSites 1000G_phase1.indels.hg19.sites.vcf \
-o ${sampleName}.recal_data.grp

# -L xx is added only in mode 2 (per-chromosome runs); it is omitted when generating A.vcf
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T PrintReads \
-nct 8 \
-I ${sampleName}.realigned.bam \
-BQSR ${sampleName}.recal_data.grp \
-o ${sampleName}.realigned.recal.bam

samtools index ${sampleName}.realigned.recal.bam

# -L xx is added only in mode 2 (per-chromosome runs); it is omitted when generating A.vcf
# The output is A.vcf in mode 1, or one small per-chromosome VCF in mode 2
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T UnifiedGenotyper \
-nct 8 \
-glm BOTH \
-I ${sampleName}.realigned.recal.bam \
-D dbsnp_138.hg19.vcf \
-o ${sampleName}.vcf \
-stand_call_conf 50.0 \
-stand_emit_conf 10.0 \
-dcov 200 \
-A AlleleBalance -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A RMSMappingQuality -A InbreedingCoeff -A Coverage
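For reference, the per-chromosome runs of mode 2 can be scripted as a loop. This sketch loops only the final UnifiedGenotyper step (in the script above, -L xx also restricts the realignment and recalibration steps) and assumes a samtools faidx index, ucsc.hg19.fasta.fai, next to the reference. GATK_JAR, REF, the function names and the `<sample>.<contig>.vcf` output naming are hypothetical; the UnifiedGenotyper arguments are copied from the command above.

```bash
#!/usr/bin/env bash
# Sketch of "mode 2": one UnifiedGenotyper run per contig in the reference index.
# Assumptions (not from the original post): GATK_JAR/REF defaults, reading contigs
# from the .fai file, and the <sample>.<contig>.vcf output naming.
GATK_JAR=${GATK_JAR:-/data/SG/Env/software_installed/GenomeAnalysisTK.jar}
REF=${REF:-ucsc.hg19.fasta}

# Print one contig name per line (column 1 of a samtools faidx .fai index).
list_contigs() {
  cut -f1 "$1"
}

# Call variants on a single contig; arguments mirror the UnifiedGenotyper step above.
call_one_contig() {
  local sample="$1" contig="$2"
  java -Xmx30g -jar "$GATK_JAR" \
    -T UnifiedGenotyper \
    -R "$REF" \
    -L "$contig" \
    -nct 8 \
    -glm BOTH \
    -I "${sample}.realigned.recal.bam" \
    -D dbsnp_138.hg19.vcf \
    -o "${sample}.${contig}.vcf" \
    -stand_call_conf 50.0 \
    -stand_emit_conf 10.0 \
    -dcov 200 \
    -A AlleleBalance -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest \
    -A ReadPosRankSumTest -A FisherStrand -A RMSMappingQuality -A InbreedingCoeff -A Coverage
}

# Run call_one_contig once per contig listed in ${REF}.fai.
run_per_contig() {
  local sample="$1"
  list_contigs "${REF}.fai" | while read -r contig; do
    # stdin comes from /dev/null so the command cannot consume the remaining contig names.
    call_one_contig "$sample" "$contig" < /dev/null
  done
}
```

Usage: `run_per_contig "$sampleName"`. This makes one run for every contig listed in the index, including any unplaced or random contigs, so the resulting small VCFs cover the same contigs as A.vcf.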

I hope you can offer me some help.

Thanks,
