What INFO/FORMAT fields are required for the ReadBackedPhasing algorithm?

I have been trying to use ReadBackedPhasing, but it produces output with no phasing tags (the HP tag). It works with a VCF file that was called from a single BAM file, but it does not produce any output from VCF files generated with vcf-subset. In short, I would like to know which INFO/FORMAT fields it looks for when assigning the HP tag.

Answers

  • gouthamatla (Spain, Member)

    My VCF line looks like:

    gi|589289699|ref|NW_006711278.1|        2239908 .       C       A       9686.62 .       NS=1,TYPE=snp   GT:AO:DP:PL:QA:QR:RO    1/1:3:3:112,9
    
    Gouthams-MacBook-Pro:Downloads goutham$ java -jar GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T ReadBackedPhasing -R ~/Desktop/74533_ref_PanTig1.0_chrUn.fa -I re.sorted_sbset_RG.bam --variant new_vcf.vcf -o phased_out.vcf --phaseQualityThresh 0
    
    INFO  09:55:03,348 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  09:55:03,350 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56 
    INFO  09:55:03,351 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  09:55:03,351 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  09:55:03,354 HelpFormatter - Program Args: -T ReadBackedPhasing -R /Users/goutham/Desktop/74533_ref_PanTig1.0_chrUn.fa -I re.sorted_sbset_RG.bam --variant new_vcf.vcf -o phased_out.vcf --phaseQualityThresh 0 
    INFO  09:55:03,689 HelpFormatter - Executing as [email protected] on Mac OS X 10.11.1 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_79-b15. 
    INFO  09:55:03,690 HelpFormatter - Date/Time: 2015/12/27 09:55:03 
    INFO  09:55:03,690 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  09:55:03,690 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  09:55:04,104 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  09:55:04,410 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    INFO  09:55:04,416 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  09:55:04,571 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.15 
    WARN  09:55:04,669 RMDTrackBuilder - Index file /Users/goutham/Downloads/new_vcf.vcf.idx is out of date (index older than input file), deleting and updating the index file 
    INFO  09:55:05,012 RMDTrackBuilder - Writing Tribble index to disk for file /Users/goutham/Downloads/new_vcf.vcf.idx 
    INFO  09:55:05,401 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
    INFO  09:55:05,420 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  09:55:05,421 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  09:55:05,421 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  09:55:05,421 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    Coverage over ALL samples:
    Number of reads observed: 22118
    Number of variant sites observed: 1389
    Average coverage: 15.92368610511159
    
    --- Phasing summary [minimal haplotype quality (PQ): 0.0, maxPhaseSites: 10, cacheWindow: 20000] ---
    
    INFO  09:55:13,993 ProgressMeter -            done     86222.0     8.0 s      99.0 s       99.9%     8.0 s       0.0 s 
    INFO  09:55:13,994 ProgressMeter - Total runtime 8.57 secs, 0.14 min, 0.00 hours 
    INFO  09:55:13,994 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 14128 total reads (0.00%) 
    INFO  09:55:13,994 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter 
    INFO  09:55:13,995 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter 
    INFO  09:55:13,995 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
    INFO  09:55:13,995 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter 
    INFO  09:55:13,995 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityZeroFilter 
    INFO  09:55:13,996 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter 
    INFO  09:55:13,996 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter 
    INFO  09:55:15,328 GATKRunReport - Uploaded run statistics report to AWS S3 
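
A quick way to see which FORMAT fields and genotype a record actually carries is to parse one data line directly. The sketch below is only a diagnostic aid, not a description of what ReadBackedPhasing checks internally. It rests on one assumption that is not confirmed anywhere in this thread: that the tool phases only heterozygous diploid genotypes, so a homozygous call such as `1/1` would not be expected to receive an HP tag. The function name `summarize_record` is hypothetical, and the record is the one pasted above, used as-is (its sample column appears cut off in the post).

```python
def summarize_record(line):
    """Pair the FORMAT keys of one VCF data line with the first sample's values."""
    line = line.rstrip("\n")
    # Real VCF is tab-delimited; fall back to whitespace for lines pasted into a forum.
    cols = line.split("\t") if "\t" in line else line.split()
    if len(cols) < 10:
        raise ValueError("expected at least 10 columns (8 fixed, FORMAT, one sample)")

    info, fmt, sample = cols[7], cols[8], cols[9]
    keys = fmt.split(":")
    values = sample.split(":")

    # Trailing sample fields may be absent, so pair only what is present.
    fields = dict(zip(keys, values))

    gt = fields.get("GT", ".")
    alleles = [a for a in gt.replace("|", "/").split("/") if a != "."]
    # Assumption: only a diploid call with two different alleles is a phasing candidate.
    is_het = len(alleles) == 2 and alleles[0] != alleles[1]

    return {
        "info": info,
        "format_keys": keys,
        "missing_keys": keys[len(values):],  # keys declared in FORMAT but without a value
        "genotype": gt,
        "is_het": is_het,
    }


if __name__ == "__main__":
    record = (
        "gi|589289699|ref|NW_006711278.1|        2239908 .       C       A       "
        "9686.62 .       NS=1,TYPE=snp   GT:AO:DP:PL:QA:QR:RO    1/1:3:3:112,9"
    )
    for key, value in summarize_record(record).items():
        print(f"{key}: {value}")
```

For the record shown in this post, the genotype is reported as `1/1` and `is_het` is `False`. If the assumption above holds, running the same check over the records in the vcf-subset output would show whether any heterozygous calls are left to phase.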
    
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @gouthamatla,

    I'm not sure. What processing did you apply with vcf-subset? Is it something you can't do with GATK's own SelectVariants?
