Using GATK to convert ped to vcf

Jinyan Posts: 3 Member
edited January 2013 in Ask the GATK team

There used to be a webpage explaining how to convert PLINK ped format to VCF format, but the link seems to have disappeared:

http://www.broadinstitute.org/gsa/wiki/index.php/Converting_ped_to_vcf

Thank you very much in advance.

Answers

  • chongm Posts: 33 Member
    edited December 2013

    Thanks chapmanb for the script!

    For me, the script seems to be running into an error partway through chromosome 1:

    Traceback (most recent call last):
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 158, in <module>
        main(*sys.argv[1:])
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 29, in main
        fix_nonref_positions(vcf_file, ref_file)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 148, in fix_nonref_positions
        parts = fix_vcf_line(parts, ref_base)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 97, in fix_vcf_line
        elif ref_base != ref and complements[ref] == ref_base:
    KeyError: 'D'

    Any suggestions on how to fix this?

    Thanks,

    MC

  • chongm Posts: 33 Member

    Actually, I just realized that my chip data has some deletions and insertions, which might be the problem...

    Thanks,

    MC
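
For anyone hitting the same error: the last traceback frame shows a dictionary lookup `complements[ref]`, and the `KeyError: 'D'` suggests that the lookup table only covers the four bases while the chip data presumably codes deletions and insertions as `D` and `I`. Below is a minimal sketch of one way to guard that comparison. The names `COMPLEMENTS`, `is_strand_flip`, and `is_snp_allele` are hypothetical and are not taken from chapmanb's script; only the compared expression comes from the traceback.

```python
# Hypothetical reconstruction for illustration; NOT the actual code
# from plink_to_vcf_chapmanb.py.
COMPLEMENTS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_snp_allele(allele):
    """Return True only for plain A/C/G/T alleles."""
    return allele in COMPLEMENTS


def is_strand_flip(ref, ref_base):
    """Return True if `ref` is the complement of the reference base.

    dict.get returns None for codes such as 'D' or 'I', so indel
    markers yield False here instead of raising KeyError.
    """
    return ref_base != ref and COMPLEMENTS.get(ref) == ref_base


print(is_strand_flip("A", "T"))  # complementary bases
print(is_strand_flip("D", "A"))  # indel code, no KeyError
print(is_snp_allele("D"))
```

Whether indel markers should then be skipped, or dropped from the ped/map files before conversion, depends on the data; the sketch only avoids the crash.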

  • chrchang Hong Kong Posts: 1 Member
    edited December 2013

    @chongm said: Thanks chapmanb for the script!

    For me, the script seems to be running into an error partway through chromosome 1:

    Traceback (most recent call last):
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 158, in <module>
        main(*sys.argv[1:])
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 29, in main
        fix_nonref_positions(vcf_file, ref_file)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 148, in fix_nonref_positions
        parts = fix_vcf_line(parts, ref_base)
      File "/home/chongm/Scripts/plink_to_vcf_chapmanb.py", line 97, in fix_vcf_line
        elif ref_base != ref and complements[ref] == ref_base:
    KeyError: 'D'

    Any suggestions on how to fix this?

    Thanks,

    MC

    PLINK 1.9 can handle this:

    plink --vcf [filename] --make-bed --out [new prefix]

    Some useful flags:

    --keep-allele-order keeps the original reference allele (instead of automatically resetting based on minor/major)

    --biallelic-only throws out all variants with 2+ alternate alleles that show up (without this flag, the most common alternate allele is kept).

    --double-id, --const-fid, and --id-delim let you fine-tune how VCF sample IDs are converted to PLINK family + individual IDs.

    You can see more details at https://www.cog-genomics.org/plink2/input#vcf .
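
The flags above can be combined into a single call. Here is a small sketch that only assembles the command line, assuming a `plink` (1.9) binary is on the PATH; the function names and file names are made up for illustration. The second function goes in the direction the original question asked about (ped to VCF) using PLINK 1.9's `--recode vcf`, which is not mentioned in the post above.

```python
import shlex


def vcf_to_bed_cmd(vcf_path, out_prefix, keep_allele_order=True,
                   biallelic_only=False, double_id=False):
    """Build a PLINK 1.9 command that imports a VCF as a binary fileset."""
    cmd = ["plink", "--vcf", vcf_path]
    if keep_allele_order:
        cmd.append("--keep-allele-order")  # keep original reference allele
    if biallelic_only:
        cmd.append("--biallelic-only")     # drop variants with 2+ alt alleles
    if double_id:
        cmd.append("--double-id")          # sample ID used as both FID and IID
    cmd += ["--make-bed", "--out", out_prefix]
    return cmd


def ped_to_vcf_cmd(ped_prefix, out_prefix):
    """Build a PLINK 1.9 command that exports a ped/map pair as VCF."""
    return ["plink", "--file", ped_prefix, "--recode", "vcf",
            "--out", out_prefix]


def as_shell(cmd):
    """Render an argument list as a copy-pasteable shell line."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Hypothetical file names; nothing is executed here.
print(as_shell(vcf_to_bed_cmd("input.vcf", "output", biallelic_only=True)))
print(as_shell(ped_to_vcf_cmd("mydata", "mydata_vcf")))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` once the paths point at real files.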