What INFO/FORMAT fields are required by the ReadBackedPhasing algorithm?

I have been trying to use ReadBackedPhasing, but it is producing output with no phasing tags (HP tag). It works with a VCF file that was called from a single BAM file, but it does not produce any phased output for VCF files generated with vcf-subset. In short, I would like to know which INFO/FORMAT fields it looks for when assigning the HP tag.

Answers

  • gouthamatla (Spain; Member)

    My VCF line looks like this (followed by the command I ran and its output):

    gi|589289699|ref|NW_006711278.1|        2239908 .       C       A       9686.62 .       NS=1,TYPE=snp   GT:AO:DP:PL:QA:QR:RO    1/1:3:3:112,9
    
    Gouthams-MacBook-Pro:Downloads goutham$ java -jar GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar -T ReadBackedPhasing -R ~/Desktop/74533_ref_PanTig1.0_chrUn.fa -I re.sorted_sbset_RG.bam --variant new_vcf.vcf -o phased_out.vcf --phaseQualityThresh 0
    
    INFO  09:55:03,348 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  09:55:03,350 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.5-0-g36282e4, Compiled 2015/11/25 04:03:56 
    INFO  09:55:03,351 HelpFormatter - Copyright (c) 2010 The Broad Institute 
    INFO  09:55:03,351 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
    INFO  09:55:03,354 HelpFormatter - Program Args: -T ReadBackedPhasing -R /Users/goutham/Desktop/74533_ref_PanTig1.0_chrUn.fa -I re.sorted_sbset_RG.bam --variant new_vcf.vcf -o phased_out.vcf --phaseQualityThresh 0 
    INFO  09:55:03,689 HelpFormatter - Executing as [email protected] on Mac OS X 10.11.1 x86_64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_79-b15. 
    INFO  09:55:03,690 HelpFormatter - Date/Time: 2015/12/27 09:55:03 
    INFO  09:55:03,690 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  09:55:03,690 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO  09:55:04,104 GenomeAnalysisEngine - Strictness is SILENT 
    INFO  09:55:04,410 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
    INFO  09:55:04,416 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
    INFO  09:55:04,571 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.15 
    WARN  09:55:04,669 RMDTrackBuilder - Index file /Users/goutham/Downloads/new_vcf.vcf.idx is out of date (index older than input file), deleting and updating the index file 
    INFO  09:55:05,012 RMDTrackBuilder - Writing Tribble index to disk for file /Users/goutham/Downloads/new_vcf.vcf.idx 
    INFO  09:55:05,401 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
    INFO  09:55:05,420 GenomeAnalysisEngine - Done preparing for traversal 
    INFO  09:55:05,421 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
    INFO  09:55:05,421 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining 
    INFO  09:55:05,421 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime 
    Coverage over ALL samples:
    Number of reads observed: 22118
    Number of variant sites observed: 1389
    Average coverage: 15.92368610511159
    
    --- Phasing summary [minimal haplotype quality (PQ): 0.0, maxPhaseSites: 10, cacheWindow: 20000] ---
    
    INFO  09:55:13,993 ProgressMeter -            done     86222.0     8.0 s      99.0 s       99.9%     8.0 s       0.0 s 
    INFO  09:55:13,994 ProgressMeter - Total runtime 8.57 secs, 0.14 min, 0.00 hours 
    INFO  09:55:13,994 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 14128 total reads (0.00%) 
    INFO  09:55:13,994 MicroScheduler -   -> 0 reads (0.00% of total) failing BadCigarFilter 
    INFO  09:55:13,995 MicroScheduler -   -> 0 reads (0.00% of total) failing DuplicateReadFilter 
    INFO  09:55:13,995 MicroScheduler -   -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
    INFO  09:55:13,995 MicroScheduler -   -> 0 reads (0.00% of total) failing MalformedReadFilter 
    INFO  09:55:13,995 MicroScheduler -   -> 0 reads (0.00% of total) failing MappingQualityZeroFilter 
    INFO  09:55:13,996 MicroScheduler -   -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter 
    INFO  09:55:13,996 MicroScheduler -   -> 0 reads (0.00% of total) failing UnmappedReadFilter 
    INFO  09:55:15,328 GATKRunReport - Uploaded run statistics report to AWS S3 
    
  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi @gouthamatla,

    I'm not sure. What processing did you apply with vcf-subset? Is it something you can't do with GATK's own SelectVariants?
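
To see what the vcf-subset step left in the file, it can help to look at the structure of a record directly. The thread does not say which INFO/FORMAT fields ReadBackedPhasing requires, so the following is only a minimal sketch that reports what is present in the VCF line posted above. The function name `inspect_vcf_record` and the returned keys are hypothetical, chosen for illustration. The sketch relies on two points from the VCF specification: INFO entries are separated by semicolons, and trailing FORMAT values may be dropped from a sample column. The heterozygosity flag is included on the assumption that phasing is only meaningful at heterozygous genotypes.

```python
def inspect_vcf_record(line):
    """Report the INFO entries, FORMAT keys and sample values of one VCF data line."""
    cols = line.split()  # columns are tab-separated in a real VCF; any whitespace works here
    if len(cols) < 10:
        raise ValueError("expected at least 10 columns (8 fixed + FORMAT + one sample)")
    info, fmt, sample = cols[7], cols[8], cols[9]

    format_keys = fmt.split(":")
    sample_values = sample.split(":")

    # GT, when present, is the first FORMAT key.
    gt = sample_values[0] if format_keys[0] == "GT" else None
    alleles = [a for a in gt.replace("|", "/").split("/") if a != "."] if gt else []

    return {
        # The VCF spec separates INFO entries with ';', so a comma-joined
        # string stays a single entry here.
        "info_entries": info.split(";"),
        "format_keys": format_keys,
        "sample_values": sample_values,
        # FORMAT keys that have no corresponding value in the sample column.
        "missing_sample_values": format_keys[len(sample_values):],
        # Assumption: only heterozygous genotypes are candidates for phasing.
        "is_het": len(set(alleles)) > 1,
    }


# The record posted in this thread.
record = (
    "gi|589289699|ref|NW_006711278.1|\t2239908\t.\tC\tA\t9686.62\t.\t"
    "NS=1,TYPE=snp\tGT:AO:DP:PL:QA:QR:RO\t1/1:3:3:112,9"
)

report = inspect_vcf_record(record)
for key, value in report.items():
    print(f"{key}: {value}")
```

For the posted record, this reports a single INFO entry (the two key=value pairs are joined by a comma rather than a semicolon), seven FORMAT keys against four sample values, and a homozygous genotype. Whether any of these explains the missing HP tags is not established in this thread.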
