Why is data missing in my VCF file?
I am doing SNP analysis on a plant species. For SNP discovery I am using the reference genome of another plant species, available on NCBI, because the two species are phylogenetically very close. The generated VCF does not show the main information; the first few lines of output look like this:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CF
speciesch01 1 . A . . . . GT ./.
speciesch01 2 . G . . . . GT ./.
speciesch01 3 . A . . . . GT ./.
speciesch01 4 . G . . . . GT ./.
speciesch01 5 . G . . . . GT ./.
speciesch01 6 . T . . . . GT ./.
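Since only the first few positions are shown, it may help to check whether every site in the file is a no-call (./.) or only the start of the chromosome is. The following is a minimal sketch, not part of GATK: the function name summarize_genotypes is hypothetical, and it assumes a single-sample VCF whose fields contain no embedded spaces, so that splitting on whitespace is safe.

```python
from collections import Counter


def summarize_genotypes(vcf_lines):
    """Tally the GT value of the first sample column across VCF records.

    Assumption: fields are separated by tabs or spaces and contain no
    embedded whitespace, so a plain split() recovers the columns.
    """
    counts = Counter()
    for line in vcf_lines:
        line = line.strip()
        # Skip blank lines and header lines (## metadata and #CHROM).
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Columns 9 and 10 (FORMAT and the first sample) must be present.
        if len(fields) < 10:
            continue
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "GT" not in format_keys:
            continue
        counts[sample_values[format_keys.index("GT")]] += 1
    return counts


if __name__ == "__main__":
    # Records copied from the output shown above.
    example = [
        "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CF",
        "speciesch01 1 . A . . . . GT ./.",
        "speciesch01 2 . G . . . . GT ./.",
        "speciesch01 3 . A . . . . GT ./.",
    ]
    print(summarize_genotypes(example))
```

To run it on the real output, the list can be replaced with an open file handle, for example open("species.realigned.snps.vcf"). If the tally contains nothing but ./., no site in the file received a genotype call.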
I have used the commands as follows:
java -Xmx16g -jar GenomeAnalysisTK.jar -R Reference_genome.fasta -T UnifiedGenotyper -I species.realigned.bam -o species.realigned.snps.vcf -stand_call_conf 30 -stand_emit_conf 10 --output_mode EMIT_ALL_SITES
java -Xmx16g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -I species.realigned.bam -R Reference_genome.fasta -knownSites species.realigned.snps.vcf -o species.realigned_data.grp
java -Xmx16g -jar GenomeAnalysisTK.jar -T PrintReads -I species.realigned.bam -R Reference_genome.fasta -BQSR species.realigned_data.grp -o species.realigned.recal.bam
java -Xmx16g -jar GenomeAnalysisTK.jar -R Reference_genome.fasta -T UnifiedGenotyper -I species.realigned.bam --dbsnp species.realigned.snps.vcf -o species.realigned.recal.snps.vcf -stand_call_conf 30 -stand_emit_conf 10 --output_mode EMIT_ALL_SITES
Of the two UnifiedGenotyper commands (1 and 4), only the 4th uses the --dbsnp flag, where it points to the output of command 1, because I don't have any known SNP information for either plant species. I need a VCF file with complete data for further analysis. Please let me know where I am making a mistake and what the possible solutions to this problem are.