Using GATK to convert PED to VCF

Jinyan (Member, 3 posts)
edited January 2013 in Ask the GATK team

There used to be a webpage explaining how to convert PLINK PED format to VCF,
but the link seems to have disappeared:

http://www.broadinstitute.org/gsa/wiki/index.php/Converting_ped_to_vcf

Thank you very much in advance.
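For context, here is a minimal Python sketch of what a PED/MAP to VCF conversion involves. It is hypothetical: the function name `ped_map_to_vcf` and the example records are made up, and it assumes a standard 4-column MAP file and PLINK's default `0` missing-allele code. Crucially, it consults no reference genome and simply writes the first allele it sees as REF. That assumption is exactly why the converters discussed in the replies need a reference FASTA to fix REF/ALT.

```python
MISSING = "0"  # PLINK's default missing-allele code (assumed, not configurable here)


def ped_map_to_vcf(ped_lines, map_lines):
    """Convert PLINK text PED/MAP records into minimal VCF text.

    Assumption: no reference genome is consulted, so the first
    non-missing allele seen at each site is written as REF.
    """
    # 4-column MAP: chromosome, variant ID, genetic distance, bp position
    sites = [line.split() for line in map_lines if line.strip()]
    # PED: FID, IID, father, mother, sex, phenotype, then two alleles per site
    samples = [line.split() for line in ped_lines if line.strip()]
    header = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]
    out = [
        "##fileformat=VCFv4.2",
        '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
        "\t".join(header + [s[1] for s in samples]),  # IID becomes the VCF sample name
    ]
    for i, (chrom, snp_id, _cm, pos) in enumerate(sites):
        pairs = [(s[6 + 2 * i], s[7 + 2 * i]) for s in samples]
        alleles = []  # distinct observed alleles, in order of first appearance
        for pair in pairs:
            for allele in pair:
                if allele != MISSING and allele not in alleles:
                    alleles.append(allele)
        if not alleles:
            continue  # every call is missing; nothing to write
        gts = [
            "./." if MISSING in pair
            else f"{alleles.index(pair[0])}/{alleles.index(pair[1])}"
            for pair in pairs
        ]
        alt = ",".join(alleles[1:]) or "."
        out.append("\t".join([chrom, pos, snp_id, alleles[0], alt, ".", ".", ".", "GT"] + gts))
    return "\n".join(out) + "\n"
```

For example, a MAP line `1 rs1 0 900427` with PED lines `F1 S1 0 0 1 -9 A G` and `F2 S2 0 0 2 -9 G G` yields REF `A`, ALT `G` and genotypes `0/1` and `1/1`, regardless of which base the genome actually has at that position.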


Answers

  • chongm (Member, 33 posts)
    edited December 2013

    Thanks chapmanb for the script!

    For me, the script seems to be running into an error partway through chromosome 1:

    Traceback (most recent call last):
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 158, in <module>
        main(*sys.argv[1:])
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 29, in main
        fix_nonref_positions(vcf_file, ref_file)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 148, in fix_nonref_positions
        parts = fix_vcf_line(parts, ref_base)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 97, in fix_vcf_line
        elif ref_base != ref and complements[ref] == ref_base:
    KeyError: 'D'

    Any suggestions on how to fix this?

    Thanks,

    MC


  • chongm (Member, 33 posts)

    Actually, I just realized that my chip data include some insertions and deletions, which might be what's causing the problem...

    Thanks,

    MC
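The `KeyError: 'D'` comes from the complement lookup shown in the traceback (`complements[ref]`): a strand-complement table only has entries for A/C/G/T, so a `D` allele (presumably the deletion code that some chip exports use for indels) has no key. The sketch below is hypothetical and is not the code in `plink_to_vcf.py`; `classify_ref` and its return labels are made up for illustration. It performs the same REF-versus-reference comparison using `dict.get`, so non-nucleotide alleles are reported as skippable instead of crashing.

```python
# Hypothetical helper, not taken from plink_to_vcf.py.
COMPLEMENTS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def classify_ref(ref, ref_base):
    """Compare a VCF REF allele with the reference-genome base at that position."""
    ref, ref_base = ref.upper(), ref_base.upper()
    if ref == ref_base:
        return "match"
    # .get() returns None for I/D indel codes, "0", etc., instead of raising KeyError
    comp = COMPLEMENTS.get(ref)
    if comp is None:
        return "skip"
    if comp == ref_base:
        return "strand_flip"  # allele appears to be reported on the opposite strand
    return "mismatch"
```

For example, `classify_ref("D", "G")` returns `"skip"` rather than raising. Note that A/T and C/G SNPs stay strand-ambiguous under this check, so a real fix would likely need extra handling for them.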

  • chrchang, Hong Kong (Member, 1 post)
    edited December 2013

    @chongm said:
    For me, the script seems to be running into an error partway through chromosome 1: [...] KeyError: 'D' [...] Any suggestions on how to fix this?

    PLINK 1.9 can handle this. The following imports a VCF into PLINK binary (.bed/.bim/.fam) format:

    plink --vcf [filename] --make-bed --out [new prefix]

    Some useful flags:

    --keep-allele-order keeps the original reference allele (instead of automatically resetting based on minor/major)

    --biallelic-only throws out all variants with 2+ alternate alleles that show up (without this flag, the most common alternate allele is kept).

    --double-id, --const-fid, and --id-delim let you fine-tune how VCF sample IDs are converted to PLINK family + individual IDs.

    You can see more details at https://www.cog-genomics.org/plink2/input#vcf .
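If you want to script this, a small Python wrapper can assemble the commands. This is a sketch: the helper names are hypothetical and it assumes a PLINK 1.9 binary named `plink` on your PATH. The first helper uses the flags described above; the second covers the direction originally asked about (PED/MAP to VCF) via PLINK 1.9's `--recode vcf`. Since a PED file carries no reference-genome information, REF/ALT in that output should still be checked against a reference.

```python
import shlex
import subprocess


def vcf_to_bed_cmd(vcf, out_prefix, keep_allele_order=True,
                   biallelic_only=False, double_id=False, plink="plink"):
    """Build a PLINK 1.9 command that imports a VCF into .bed/.bim/.fam."""
    cmd = [plink, "--vcf", vcf, "--make-bed", "--out", out_prefix]
    if keep_allele_order:
        cmd.append("--keep-allele-order")  # don't reset alleles by minor/major
    if biallelic_only:
        cmd.append("--biallelic-only")  # drop variants with 2+ alternate alleles
    if double_id:
        cmd.append("--double-id")  # use the VCF sample ID as both FID and IID
    return cmd


def ped_to_vcf_cmd(ped_prefix, out_prefix, plink="plink"):
    """Build a PLINK 1.9 command that exports <ped_prefix>.ped/.map as VCF."""
    return [plink, "--file", ped_prefix, "--recode", "vcf", "--out", out_prefix]


def run(cmd):
    """Echo the command in shell form, then run it, raising on a non-zero exit."""
    print(shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, `run(ped_to_vcf_cmd("mydata", "mydata"))` (with `mydata` standing in for your own file prefix) would execute `plink --file mydata --recode vcf --out mydata`. Keeping command construction separate from execution makes the argument lists easy to inspect before anything runs.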

  • blueskypy (Member ✭✭, 253 posts)

    @chapmanb I ran the script, but it doesn't work for me either. I used b37 as the reference. Here is the error message:

    Creating new project specification file [ /gwas/96samplesfinnal.pseq ]
    expecting 4548474 variants on 96 individuals
    inserted 96 individuals, 4548474 variants
    Did not associate ref G with line: ['1', '900427', 'kgp15625134', 'A', 'G,0', '.', '.', '.', 'GT']
    Did not associate ref G with line: ['1', '934345', 'rs9697457', 'A', 'G,0', '.', '.', '.', 'GT']
    Traceback (most recent call last):
      File "/site/ne/home/cuiji01/seqs/software/bin/ped2vcf.py", line 162, in <module>
        main(*sys.argv[1:])
      File "/site/ne/home/cuiji01/seqs/software/bin/ped2vcf.py", line 32, in main
        fix_nonref_positions(vcf_file, ref_file)
      File "/site/ne/home/cuiji01/seqs/software/bin/ped2vcf.py", line 152, in fix_nonref_positions
        parts = fix_vcf_line(parts, ref_base)
      File "/site/ne/home/cuiji01/seqs/software/bin/ped2vcf.py", line 103, in fix_vcf_line
        varinfo[4] = ",".join([complements[v] for v in var.split(",")])
    KeyError: '0'

    I can't use the method @ebanks recommends either, because I don't have the reference allele file. What a headache!
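In this log the ALT field is `G,0`, and the traceback fails on `complements[v]` with `KeyError: '0'`. The `0` appears to be PLINK's missing-allele code leaking into ALT, where it is not a real allele. One possible pre-processing step, sketched here as a hypothetical helper (`drop_missing_alts` is not part of the script), is to drop such placeholders from ALT and remap the GT indices so that any call pointing at a dropped allele becomes missing. It assumes GT is the only FORMAT field, as in the lines shown in the log.

```python
# Hypothetical pre-processing helper, not part of the plink_to_vcf.py script.
MISSING_CODES = {"0", "."}  # PLINK missing allele, VCF missing value


def drop_missing_alts(parts):
    """Remove missing-allele placeholders from ALT and remap GT indices.

    parts: one VCF data line split on tabs; FORMAT is assumed to be just "GT".
    """
    parts = list(parts)
    alts = parts[4].split(",")
    kept = [a for a in alts if a not in MISSING_CODES]
    # Old allele index -> new index; REF stays 0, dropped ALTs are absent.
    remap = {"0": "0"}
    for old_index, allele in enumerate(alts, start=1):
        if allele in kept:
            remap[str(old_index)] = str(kept.index(allele) + 1)
    parts[4] = ",".join(kept) if kept else "."
    for col in range(9, len(parts)):
        gt = parts[col]
        sep = "|" if "|" in gt else "/"
        # Dropped indices (and ".") become missing calls.
        parts[col] = sep.join(remap.get(x, ".") for x in gt.replace("|", "/").split("/"))
    return parts
```

For example, `['1', '900427', 'kgp15625134', 'A', 'G,0', '.', '.', '.', 'GT', '0/1', '1/2']` (the sample genotypes are invented for illustration) becomes ALT `G` with genotypes `0/1` and `1/.`, after which every remaining allele has a complement.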

  • chapmanb, Boston, MA (Member, 21 posts)

    Sorry about the issues. The latest version of the script with fixes is here:

    https://github.com/chapmanb/bcbio-nextgen/blob/master/scripts/utils/plink_to_vcf.py

    Hopefully that'll work cleanly for your input data. If not, I'm happy to try to debug further.

    Brad Chapman, Bioinformatics Core at Harvard Chan School
