Using GATK to convert PED to VCF

Jinyan Member Posts: 3
edited January 2013 in Ask the GATK team

There used to be a web page explaining how to convert PLINK PED format to VCF
format, but that link seems to have disappeared:

http://www.broadinstitute.org/gsa/wiki/index.php/Converting_ped_to_vcf

Thank you very much in advance.

Answers

  • chongm Member Posts: 33
    edited December 2013

    Thanks chapmanb for the script!

    For me, the script seems to be running into an error partway through chromosome 1:

    Traceback (most recent call last):
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 158, in <module>
        main(*sys.argv[1:])
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 29, in main
        fix_nonref_positions(vcf_file, ref_file)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 148, in fix_nonref_positions
        parts = fix_vcf_line(parts, ref_base)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 97, in fix_vcf_line
        elif ref_base != ref and complements[ref] == ref_base:
    KeyError: 'D'

    Any suggestions on how to fix this?

    Thanks,

    MC

  • chongm Member Posts: 33

    Actually, I just realized that my chip data contains some deletions and insertions, which might be problematic.

    Thanks,

    MC
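The `KeyError: 'D'` above comes from looking up a non-ACGT allele code in a complement table. Below is a minimal sketch of a lookup that tolerates such codes. The names `COMPLEMENTS`, `safe_complement`, and `is_strand_flip` are hypothetical and are not taken from chapmanb's script; the assumption is that chip data may encode indels as `D`/`I` and missing alleles as `0`.

```python
# Hypothetical helpers for illustration; not part of plink_to_vcf.py.
COMPLEMENTS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def safe_complement(allele):
    """Return the complementary base, or None for non-ACGT codes ('D', 'I', '0', ...)."""
    return COMPLEMENTS.get(allele.upper())


def is_strand_flip(ref, ref_base):
    """True when `ref` differs from the genome base but its complement matches it."""
    comp = safe_complement(ref)
    return comp is not None and ref_base != ref and comp == ref_base
```

With this shape, an indel code such as `D` simply fails the strand-flip check instead of raising, and the caller can decide whether to skip or report that variant.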

  • chrchang Hong Kong Member Posts: 1
    edited December 2013

    @chongm said:
    Thanks chapmanb for the script!

    For me, the script seems to be running into an error partway through chromosome 1:

    Traceback (most recent call last):
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 158, in <module>
        main(*sys.argv[1:])
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 29, in main
        fix_nonref_positions(vcf_file, ref_file)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 148, in fix_nonref_positions
        parts = fix_vcf_line(parts, ref_base)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 97, in fix_vcf_line
        elif ref_base != ref and complements[ref] == ref_base:
    KeyError: 'D'

    Any suggestions on how to fix this?

    Thanks,

    MC

    PLINK 1.9 can handle this:

    plink --vcf [filename] --make-bed --out [new prefix]

    Some useful flags:

    --keep-allele-order keeps the original reference allele (instead of automatically resetting based on minor/major)

    --biallelic-only throws out all variants with 2+ alternate alleles that show up (without this flag, the most common alternate allele is kept).

    --double-id, --const-fid, and --id-delim let you fine-tune how VCF sample IDs are converted to PLINK family + individual IDs.

    You can see more details at https://www.cog-genomics.org/plink2/input#vcf .
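For scripting the PLINK call above, here is a minimal sketch that only assembles the argument list and does not run anything. The function names and keyword arguments are hypothetical. The second helper uses `--file` with `--recode vcf`, which is not mentioned in the post; it is PLINK 1.9's export route for the PED-to-VCF direction asked about at the top of the thread, so check it against the PLINK documentation for your version.

```python
# Hypothetical command builders; they return argument lists and do not invoke PLINK.
def vcf_to_bed_cmd(vcf_path, out_prefix, keep_allele_order=False,
                   biallelic_only=False, double_id=False):
    """Build `plink --vcf ... --make-bed --out ...` with optional import flags."""
    cmd = ["plink", "--vcf", vcf_path, "--make-bed", "--out", out_prefix]
    if keep_allele_order:
        cmd.append("--keep-allele-order")  # keep the original reference allele
    if biallelic_only:
        cmd.append("--biallelic-only")     # drop variants with 2+ alternate alleles
    if double_id:
        cmd.append("--double-id")          # use the VCF sample ID as both FID and IID
    return cmd


def ped_to_vcf_cmd(ped_prefix, out_prefix):
    """Build `plink --file ... --recode vcf --out ...` (reads <prefix>.ped/.map)."""
    return ["plink", "--file", ped_prefix, "--recode", "vcf", "--out", out_prefix]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once PLINK 1.9 is on the `PATH`.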

  • blueskypy Member Posts: 261

    @chapmanb I ran the script, but it did not work for me either. I used b37 as the reference. Here is the error message:

    Creating new project specification file [ /gwas/96samplesfinnal.pseq ]
    expecting 4548474 variants on 96 individuals
    inserted 96 individuals, 4548474 variants
    Did not associate ref G with line: ['1', '900427', 'kgp15625134', 'A', 'G,0', '.', '.', '.', 'GT']
    Did not associate ref G with line: ['1', '934345', 'rs9697457', 'A', 'G,0', '.', '.', '.', 'GT']
    Traceback (most recent call last):
      File "/site/ne/home/cuiji01/seqs/software/bin/ped2vcf.py", line 162, in <module>
        main(*sys.argv[1:])
      File "/site/ne/home/cuiji01/seqs/software/bin/ped2vcf.py", line 32, in main
        fix_nonref_positions(vcf_file, ref_file)
      File "/site/ne/home/cuiji01/seqs/software/bin/ped2vcf.py", line 152, in fix_nonref_positions
        parts = fix_vcf_line(parts, ref_base)
      File "/site/ne/home/cuiji01/seqs/software/bin/ped2vcf.py", line 103, in fix_vcf_line
        varinfo[4] = ",".join([complements[v] for v in var.split(",")])
    KeyError: '0'

    I cannot use the method @ebanks recommends either because I don't have the reference allele file. What a headache!
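The `KeyError: '0'` above is triggered by an ALT field of `G,0`, where `0` is PLINK's missing-allele code. A minimal sketch of cleaning such a field before complementing it follows. `clean_alt` and `complement_alt` are hypothetical names, and this is not necessarily how the updated script handles the case; non-ACGT alleles are passed through unchanged here as a simplifying assumption.

```python
# Hypothetical helpers for illustration; not taken from ped2vcf.py / plink_to_vcf.py.
MISSING = "0"  # PLINK's code for a missing allele
COMPLEMENTS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def clean_alt(alt_field):
    """Drop missing-allele codes from a comma-separated ALT field ('.' if none remain)."""
    alleles = [a for a in alt_field.split(",") if a != MISSING]
    return ",".join(alleles) if alleles else "."


def complement_alt(alt_field):
    """Complement each ACGT allele after cleaning; other codes are left as they are."""
    cleaned = clean_alt(alt_field)
    if cleaned == ".":
        return cleaned
    return ",".join(COMPLEMENTS.get(a, a) for a in cleaned.split(","))
```

For the two lines reported in the error output, `clean_alt("G,0")` reduces the ALT field to a single allele, so the complement lookup never sees `0`.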

  • chapmanb Boston, MA Member Posts: 28

    Sorry about the issues. The latest version of the script with fixes is here:

    https://github.com/chapmanb/bcbio-nextgen/blob/master/scripts/utils/plink_to_vcf.py

    Hopefully that will work cleanly for your input data. If not, I'm happy to help debug further.

    Brad Chapman, Bioinformatics Core at Harvard Chan School
