
Help for using GATK

hliu2 (worcester), Member

I have a question that I hope the developers can help with.
I am using GATK in two ways:
1. I run UnifiedGenotyper on a prepared BAM file without any -L restriction, which gives me one VCF file (call it A.vcf).
2. I run UnifiedGenotyper chromosome by chromosome, restricting each run with the "-L" argument, which gives me a set of small VCF files.
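Approach 2 amounts to a loop over chromosomes. Below is a minimal sketch of such a loop, assuming UCSC hg19 contig names (chr1 to chr22, chrX, chrY) and showing only some of the UnifiedGenotyper flags from the script. `hg19_chroms`, `call_one_chrom`, the `DRY_RUN` switch and the sample name `mysample` are hypothetical names for illustration; by default the commands are only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch of approach 2: one UnifiedGenotyper run per chromosome, restricted with -L.
# Assumptions: UCSC hg19 contig names, the jar path from the script in this post,
# and a hypothetical DRY_RUN switch (on by default) that prints commands instead of running them.
GATK_JAR=/data/SG/Env/software_installed/GenomeAnalysisTK.jar
DRY_RUN=${DRY_RUN:-1}

# Print the primary hg19 chromosomes, one per line: chr1..chr22, chrX, chrY.
hg19_chroms() {
  local i
  for i in $(seq 1 22); do echo "chr$i"; done
  echo chrX
  echo chrY
}

# Run (or, in dry-run mode, print) UnifiedGenotyper for one sample and one chromosome.
# Only a subset of the flags from the post is shown.
call_one_chrom() {
  local sample=$1 chr=$2
  local -a cmd
  cmd=(java -Xmx30g -jar "$GATK_JAR"
    -T UnifiedGenotyper -R ucsc.hg19.fasta
    -L "$chr" -glm BOTH
    -I "${sample}.realigned.recal.bam"
    -D dbsnp_138.hg19.vcf
    -o "${sample}.${chr}.vcf")
  if [ "$DRY_RUN" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# "mysample" is a hypothetical sample name; each chromosome gets its own small VCF.
for chr in $(hg19_chroms); do
  call_one_chrom mysample "$chr"
done
```

Setting `DRY_RUN=0` in the environment would execute the commands instead of printing them.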

But the chrY results confuse me a lot. The chrY portion of A.vcf is QUITE different from the small VCF file generated with "-L chrY"; the difference seems to be larger than 50%.
In other words, the chrY results DIFFER between the two approaches.
I have also checked the other chromosomes, and for them the difference is slight. ONLY chrY has this problem.
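To put a number on a difference like this, the chrY sites of the two files can be compared directly. The sketch below is one way to do it with standard awk, sort and comm. It keys each site on CHROM, POS, REF and ALT only (genotypes, QUAL and FILTER are ignored), and the function names `site_keys` and `compare_contig`, as well as the file name in the usage line, are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count chrY (or any contig) sites shared by, or unique to, two VCFs.
# Sites are keyed on CHROM:POS:REF:ALT; genotypes and quality fields are ignored.

# Print sorted, unique CHROM:POS:REF:ALT keys for one contig of a VCF.
site_keys() {
  local vcf=$1 contig=$2
  awk -F '\t' -v c="$contig" '!/^#/ && $1 == c { print $1 ":" $2 ":" $4 ":" $5 }' "$vcf" \
    | LC_ALL=C sort -u
}

# Report sites only in the first VCF, only in the second, and in both.
compare_contig() {
  local a=$1 b=$2 contig=$3
  local ka kb
  ka=$(mktemp)
  kb=$(mktemp)
  site_keys "$a" "$contig" > "$ka"
  site_keys "$b" "$contig" > "$kb"
  # comm needs both inputs sorted in the same collation order (C locale here).
  echo "only_in_first=$(LC_ALL=C comm -23 "$ka" "$kb" | awk 'END { print NR }')"
  echo "only_in_second=$(LC_ALL=C comm -13 "$ka" "$kb" | awk 'END { print NR }')"
  echo "shared=$(LC_ALL=C comm -12 "$ka" "$kb" | awk 'END { print NR }')"
  rm -f "$ka" "$kb"
}

# Usage (hypothetical file names):
#   compare_contig A.vcf mysample.chrY.vcf chrY
```

Comparing only site keys keeps the check simple; a difference in genotypes or annotations at the same site would not be counted.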

Here is our script:

# -L xx is used only for the per-chromosome runs (case 2 above); it is omitted when generating A.vcf.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T RealignerTargetCreator \
-I ${sampleName}.bam \
-o ${sampleName}.intervals \
-known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-known 1000G_phase1.indels.hg19.sites.vcf

# -L xx is used only for the per-chromosome runs (case 2 above); it is omitted when generating A.vcf.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T IndelRealigner \
-targetIntervals ${sampleName}.intervals \
-I ${sampleName}.bam \
-o ${sampleName}.realigned.bam \
-known Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-known 1000G_phase1.indels.hg19.sites.vcf

samtools index ${sampleName}.realigned.bam

# -L xx is used only for the per-chromosome runs (case 2 above); it is omitted when generating A.vcf.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T BaseRecalibrator \
-nct 8 \
-I ${sampleName}.realigned.bam \
-knownSites dbsnp_138.hg19.vcf \
-knownSites Mills_and_1000G_gold_standard.indels.hg19.sites.vcf \
-knownSites 1000G_phase1.indels.hg19.sites.vcf \
-o ${sampleName}.recal_data.grp

# -L xx is used only for the per-chromosome runs (case 2 above); it is omitted when generating A.vcf.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T PrintReads \
-nct 8 \
-I ${sampleName}.realigned.bam \
-BQSR ${sampleName}.recal_data.grp \
-o ${sampleName}.realigned.recal.bam

samtools index ${sampleName}.realigned.recal.bam

# -L xx is used only for the per-chromosome runs (case 2 above); it is omitted when generating A.vcf.
# Output: A.vcf for the whole-genome run, or one small VCF per chromosome for the -L runs.
java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar \
-L xx \
-R ucsc.hg19.fasta \
-T UnifiedGenotyper \
-nct 8 \
-glm BOTH \
-I ${sampleName}.realigned.recal.bam \
-D dbsnp_138.hg19.vcf \
-o ${sampleName}.vcf \
-stand_call_conf 50.0 \
-stand_emit_conf 10.0 \
-dcov 200 \
-A AlleleBalance -A QualByDepth -A HaplotypeScore -A MappingQualityRankSumTest -A ReadPosRankSumTest -A FisherStrand -A RMSMappingQuality -A InbreedingCoeff -A Coverage
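In this script the `-L xx` line is added or removed by hand in every step. Also, in bash a comment placed after a trailing backslash ends the line continuation, so per-argument notes need another home. One way to handle both, sketched here assuming the script runs under bash, is to collect the arguments in a bash array: an empty `INTERVAL` gives the whole-genome run (A.vcf), a value such as `chrY` adds `-L chrY`, and comments are allowed between array elements. `INTERVAL`, `build_ug_args` and `UG_ARGS` are hypothetical names, and only a few of the UnifiedGenotyper flags are shown.

```shell
#!/usr/bin/env bash
# Sketch: switch between the whole-genome run and a per-chromosome run with one variable.
# INTERVAL, build_ug_args and UG_ARGS are hypothetical names, not part of GATK.

# Empty INTERVAL = whole genome (A.vcf); e.g. INTERVAL=chrY = per-chromosome run.
INTERVAL=${INTERVAL:-}

# Fill the global array UG_ARGS with UnifiedGenotyper arguments for one sample.
build_ug_args() {
  local sample=$1
  local interval_args=()
  if [ -n "$INTERVAL" ]; then
    interval_args=(-L "$INTERVAL")
  fi
  UG_ARGS=(
    -R ucsc.hg19.fasta                    # reference from the script
    -T UnifiedGenotyper
    "${interval_args[@]}"                 # expands to nothing when INTERVAL is empty
    -I "${sample}.realigned.recal.bam"
    -o "${sample}.vcf"                    # A.vcf-style output when INTERVAL is empty
  )
}

# Usage (not run here):
#   build_ug_args "$sampleName"
#   java -Xmx30g -jar /data/SG/Env/software_installed/GenomeAnalysisTK.jar "${UG_ARGS[@]}"
```

The same pattern could be repeated for the other steps so that a single `INTERVAL` setting controls whether `-L` is passed anywhere.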

I hope you can offer me some help.

Thanks,
