Service note: Geraldine is on vacation this week; other members of GSA will be responding to questions, but they have a lot of work besides this, so be aware that responses may be a little slower than usual. Thank you for your patience.

How read groups affect Variant Calling?

Gaurav1983Gaurav1983 Posts: 9Member

Hi,

I have a bam file with multiple read groups for same sample. Does variant calling algorithm (UnifiedGenotyper) will consider bam file as multiple-sample data or a single sample-data (irrespective of read groups) for calling varaints?

Eg.

Read Groups in BAM file:

@RG     ID:41852        PL:illumina     PU:41852        LB:nolib        SM:41852p
@RG     ID:41852.1      PL:illumina     PU:41852        LB:nolib        SM:41852s
@RG     ID:41853        PL:illumina     PU:41853        LB:nolib        SM:41853s
@RG     ID:41854        PL:illumina     PU:41854        LB:nolib        SM:41854p
@RG     ID:41854.4      PL:illumina     PU:41854        LB:nolib        SM:41854s
@RG     ID:41855        PL:illumina     PU:41855        LB:nolib        SM:41855p
@RG     ID:41855.6      PL:illumina     PU:41855        LB:nolib        SM:41855s

Variants call in VCF file

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  41852p  41852s  41853s  41854p  41854s  41855p  41855s

chrM    150     .       T       C       41679.01        PASS    ABHom=0.999;AC=14;AF=1.00;AN=14;BaseQRankSum=1.362;DP=1255;DS;Dels=0.00;FS=0.000;HaplotypeScore=4.6324;MLEAC=14;MLEAF=1.00;MQ=40.73;MQ0=1;MQRankSum=-0.678;OND=3.149e-03;QD=33.21;ReadPosRankSum=1.479;SB=-2.052e+04    GT:AD:DP:GQ:PL  1/1:0,200:200:99:6075,544,0     1/1:0,200:200:99:6315,568,0     1/1:0,200:200:99:7094,599,0     1/1:1,197:198:99:6820,547,0     1/1:0,113:113:99:3624,322,0     1/1:0,200:200:99:7111,599,0     1/1:0,141:141:99:4640,403,0

Regards

Gaurav

Post edited by Geraldine_VdAuwera on
Tagged:

Comments

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 2,238Administrator, GSA Official Member admin

    Hi there,

    The caller will consider reads that have the same sample name (SM) together, even if they belong to different read groups. But the sample name must be the same. In your example file it seems your sample names are all different, with "p" or "s" differentiating samples that have the same number. If you want those to be taken together, you will need to modify the sample names to lose the p or s.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.