No common samples in VCF and BAM headers, so nothing could possibly be phased!

alisonewr Member
edited November 2015 in Ask the GATK team

I want to phase some DNA-seq data.

java -jar GenomeAnalysisTK.jar -T ReadBackedPhasing -R ref.fasta -I readnames.bam --variant test.vcf -L Chr.list -o phased_SNPs.vcf --phaseQualityThresh 20.0

My vcf file looks like this and only contains information for 1 sample

##fileformat=VCFv4.0
##source=pileup_to_vcf.pyV1.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=SAF,Number=.,Type=Float,Description="Specific Allele Frequency">
##FILTER=<ID=DP,Description="Minimum depth of 10">
##FILTER=<ID=SAF,Description="Allele frequency of at least 0.3 with base quality minimum 0">
#CHROM POS ID REF ALT QUAL FILTER INFO
NC_024331.1 131 . G GA . PASS SAF=0.655738;DP=61
NC_024331.1 147 . C G . PASS SAF=0.320000;DP=25
NC_024331.1 422 . C A . PASS SAF=0.414545;DP=275

I previously had an error message saying my bam file did not have read groups, so I ran
java -jar AddOrReplaceReadGroups.jar I=sorted.bam O=readnames.bam RGLB=LaneX RGPU=NONE RGSM=AnySampleName RGPL=illumina

Now I am getting an error

ERROR
ERROR MESSAGE: No common samples in VCF and BAM headers, so nothing could possibly be phased!

Is there somewhere in the header of the vcf I can add AnySampleName?

Thanks!!

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie

    Hi there,

    The problem is deeper than that -- you need to have genotype calls for your samples in the VCF. Otherwise there's nothing to phase. Have you read the documentation about phasing?

  • alisonewr Member
    edited November 2015

    Thanks! So as I only have one sample, does that mean I have to specify
    -v, --VCF Compute genotype likelihoods and output them in the variant call format (VCF) when I run samtools mpileup?

    And as my bam file is only one sample, it has no sample information in the header. How do I fix this?

    Thanks very much!!
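To illustrate the point made in the answer, here is a minimal Python sketch of the kind of comparison the error message describes: sample names taken from the VCF `#CHROM` line versus `SM` values in the BAM `@RG` header lines. It is not GATK's actual implementation. The helper names `vcf_samples` and `bam_samples`, the `@HD` line, the read group ID `1`, and the `0/1` genotype are all assumptions made up for the example; the BAM header text is what `samtools view -H` might print after the AddOrReplaceReadGroups step above.

```python
def vcf_samples(vcf_lines):
    """Return the sample names listed on the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # Columns 1-8 are fixed and column 9 is FORMAT,
            # so sample names start at column 10 (index 9).
            return columns[9:]
    return []


def bam_samples(header_lines):
    """Return the set of SM values found on @RG lines of a SAM/BAM text header."""
    samples = set()
    for line in header_lines:
        if line.startswith("@RG"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SM:"):
                    samples.add(field[3:])
    return samples


# A VCF like the one in the question: eight columns, no FORMAT, no samples.
vcf_without_genotypes = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "NC_024331.1\t147\t.\tC\tG\t.\tPASS\tSAF=0.320000;DP=25\n",
]

# The same site with a FORMAT column and a per-sample genotype.
# The 0/1 genotype is a hypothetical value for illustration only.
vcf_with_genotypes = [
    "##fileformat=VCFv4.0\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tAnySampleName\n",
    "NC_024331.1\t147\t.\tC\tG\t.\tPASS\tDP=25\tGT\t0/1\n",
]

# Illustrative BAM header text with the read group fields from the question.
bam_header = [
    "@HD\tVN:1.4\tSO:coordinate\n",
    "@RG\tID:1\tLB:LaneX\tPL:illumina\tPU:NONE\tSM:AnySampleName\n",
]

common_before = set(vcf_samples(vcf_without_genotypes)) & bam_samples(bam_header)
common_after = set(vcf_samples(vcf_with_genotypes)) & bam_samples(bam_header)
print("shared samples, sites-only VCF:", common_before)
print("shared samples, VCF with genotypes:", common_after)
```

The sites-only VCF yields no sample names at all, so the intersection with the BAM samples is empty. Adding a sample name to the header alone would not be enough, because each record also needs a FORMAT column and a genotype for that sample.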
