The current GATK version is 3.4-46


# Probable serious bug in VariantsToBinaryPed causing incorrect mapping of sample to genotype


VariantsToBinaryPed appears to expect the .fam file (the first six columns of a PED file) to list the samples in the same order as the input VCF file. If the orders differ, it does not seem to map sample IDs to the correct genotypes in the output binary PED.
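Before running the conversion, the two orders can be compared directly. The following is a minimal Python sketch, assuming the VCF sample names are exactly the individual IDs in column 2 of the .fam file; the helper names are hypothetical and not part of GATK.

```python
from itertools import zip_longest


def vcf_sample_ids(vcf_lines):
    """Return the sample IDs from the '#CHROM' header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine tab-separated columns are fixed (CHROM .. FORMAT);
            # every column after them is a sample.
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached data lines without seeing the header
    raise ValueError("no #CHROM header line found")


def fam_sample_ids(fam_lines):
    """Return individual IDs (column 2) from .fam lines, skipping blank lines."""
    return [line.split()[1] for line in fam_lines if line.strip()]


def order_mismatches(vcf_ids, fam_ids):
    """List (position, vcf_id, fam_id) wherever the two sample orders disagree.

    If one list is longer, the extra positions are reported with None
    standing in for the missing ID.
    """
    return [
        (i, v, f)
        for i, (v, f) in enumerate(zip_longest(vcf_ids, fam_ids))
        if v != f
    ]
```

An empty result from `order_mismatches` means the .fam already follows the VCF order.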

I found this issue when I converted trio VCF files to binary PED and then computed kinship coefficients from the binary PED file: the inferred relationships were wrong. When I fixed the .fam file so that the sample IDs were in the same order as in the .vcf file and reran the conversion to binary PED, the kinship coefficients matched what the pedigree predicts.

This also made me wonder whether PhaseByTransmission has the same problem, but initial tests suggest that PhaseByTransmission may correctly handle the case where the sample order differs between the .fam file and the .vcf files.

Hi Tim, can you confirm that you're using the latest version of GATK?

Geraldine Van der Auwera, PhD


Hi Geraldine,

I am using version 2.5 (I haven't upgraded to 2.6 because I don't have Java 1.7).

I have worked around the problem by making sure the .fam file follows the VCF sample order.
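This workaround can be scripted. Below is a minimal Python sketch, assuming the .fam individual IDs (column 2) match the VCF sample names exactly; `reorder_fam` is a hypothetical helper, not a GATK utility. Records for individuals absent from the VCF are dropped, and VCF samples missing from the .fam raise an error rather than being silently skipped.

```python
def reorder_fam(fam_lines, vcf_ids):
    """Return .fam lines rearranged so that their IIDs follow vcf_ids order."""
    by_iid = {}
    for line in fam_lines:
        if not line.strip():
            continue  # ignore blank lines
        iid = line.split()[1]  # column 2 of a .fam record is the individual ID
        if iid in by_iid:
            raise ValueError(f"duplicate IID in .fam: {iid}")
        by_iid[iid] = line
    missing = [sid for sid in vcf_ids if sid not in by_iid]
    if missing:
        raise ValueError(f"VCF samples absent from .fam: {missing}")
    # Emit one .fam record per VCF sample, in VCF column order.
    return [by_iid[sid] for sid in vcf_ids]
```

The VCF sample list can be taken from the `#CHROM` header line (the columns after `FORMAT`).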

It should be fairly quick for the developer of VariantsToBinaryPed to check whether the tool's code includes a routine that matches sample IDs between the .fam file and the VCF.
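For illustration only, ID-based matching could look up each .fam individual's genotype column among the VCF samples by name rather than by position. This is a hypothetical sketch, not VariantsToBinaryPed's actual code, and it again assumes VCF sample names equal .fam individual IDs.

```python
def fam_to_vcf_columns(vcf_ids, fam_ids):
    """For each .fam sample, return the index of its column among VCF samples."""
    index = {sid: i for i, sid in enumerate(vcf_ids)}
    if len(index) != len(vcf_ids):
        raise ValueError("duplicate sample IDs in VCF header")
    missing = [sid for sid in fam_ids if sid not in index]
    if missing:
        raise ValueError(f".fam samples absent from VCF: {missing}")
    return [index[sid] for sid in fam_ids]


def genotypes_in_fam_order(record_genotypes, columns):
    """Reorder one VCF record's per-sample genotypes to follow the .fam order."""
    return [record_genotypes[c] for c in columns]
```

Computing the column mapping once and reusing it for every record keeps the per-variant cost to a simple list lookup.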


A quick test is to feed a trio VCF file to VariantsToBinaryPed together with a .fam file that is correct but lists the samples in a different order from the VCF file.
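One way to produce the mismatched .fam for this test is to rotate the records of a .fam that already follows the VCF order: with two or more distinct samples, every record moves to a new row while the pedigree itself stays correct. A minimal sketch; `rotated_fam` is a hypothetical helper name.

```python
def rotated_fam(fam_lines):
    """Return the non-blank .fam records rotated up by one position."""
    rows = [line for line in fam_lines if line.strip()]
    # Row i takes the record originally at row i + 1; the first goes last.
    return rows[1:] + rows[:1]
```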

Then run KING (http://people.virginia.edu/~wc9c/KING/manual.html) on the resulting .bed file:

```
king -b myFile.bed --kinship
```


The last column of the output file king.kin flags any discrepancies between the declared and the empirically estimated kinships.
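To pull out the flagged pairs, a short parser can read king.kin and keep the rows whose last column is non-zero. This is a sketch assuming a whitespace-delimited table with a single header row and that 0 in the last column means no conflict with the pedigree; check the KING manual for the exact codes your version writes. `flagged_pairs` is a hypothetical helper name.

```python
def flagged_pairs(kin_lines):
    """Return rows of a KING .kin table whose last column is non-zero.

    Each returned row is a dict keyed by the header's column names.
    """
    rows = [line.split() for line in kin_lines if line.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    # Assumption: a last-column value of 0 means declared and inferred
    # relationships agree; any other value marks a discrepancy.
    return [dict(zip(header, row)) for row in body if float(row[-1]) != 0]
```

For example, `with open("king.kin") as fh: print(flagged_pairs(fh))` lists the pairs worth inspecting after the test above.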