
How can I check the I/O of GATK?

I want to optimize GATK with Hadoop.
GATK internally supports a MapReduce framework, so I want to change the framework used in the I/O process.
To do that, I'll try to analyze the source, but I don't know exactly how the I/O process works.
Is /BiO/tool/GenomeAnalysisTK-1.6-5-g557da77/o/org/broadinstitute/sting/gatk/io the right path?
How is the I/O process composed?


Best Answers


  • sangrholee Member

    Thank you for replying to my question :)

    Yes, I think it is a big project. But if it is completed, the efficiency of GATK can be improved.
    So I am trying to change the I/O framework rather than the internal framework (the existing MapReduce and so on).
    I tried changing the I/O framework of another tool, and that finished successfully.
    I think GATK's I/O framework can also be changed by adding MapReduce.
    You said to rewrite executive.* and traversals*.
    Do I only need to rewrite those frameworks?
    I think this project is related to gatk.io.*. (Of course, the I/O process extends across the whole framework.)

    Can I get a blueprint or architecture document for GATK? (with respect to the framework)

  • sangrholee Member

    Thank you for the detailed reply.
    Yes, I won't try to change internal functions such as the MapReduce-like parts.
    I think that would be too hard a project, as you say.
    So I am trying to change the I/O with MapReduce.
    For input processing, I think efficiency would improve if MapReduce were applied (using a Linux pipe).
    To work on this project, I have been analyzing the GATK framework.
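The pipe-based input idea above can be sketched as a Hadoop-streaming-style mapper. This is only an illustration, not something GATK provides: it assumes the reads arrive on standard input as SAM text (tab-separated, header lines starting with `@`), and the function name `map_sam_lines` and the bin width `BIN_SIZE` are hypothetical choices made for this sketch.

```python
import sys
from typing import Iterable, Iterator

BIN_SIZE = 1_000_000  # hypothetical region width in bases, chosen for illustration


def map_sam_lines(lines: Iterable[str], bin_size: int = BIN_SIZE) -> Iterator[str]:
    """Yield 'key<TAB>value' records keyed by contig name and position bin."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # not a complete alignment record
        contig, pos = fields[2], int(fields[3])  # SAM columns 3 (RNAME) and 4 (POS)
        key = f"{contig}:{pos // bin_size}"
        # Streaming treats the text before the first tab as the key.
        yield f"{key}\t{line}"


def main() -> None:
    """Read SAM text from stdin and write keyed records to stdout."""
    for record in map_sam_lines(sys.stdin):
        print(record)
```

To try it as a mapper, call `main()` at the bottom of the script and pipe SAM text into it. Grouping reads by region like this only covers the input side; whether a GATK walker can then consume each group from a pipe is not established in this thread.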

    My analysis result so far is:
    1) CommandLineGATK.class (creates a CommandLineGATK object in the main function)
    2) CommandLineExecutable.class (creates a GenomeAnalysisEngine object and runs GenomeAnalysisEngine.execute())
    3) GenomeAnalysisEngine.class (runs initializeOutputStream())
    4) OutputTracker.class (creates input objects via ArgumentSource.class in a hash map, and output objects via Stub.class and Storage.class in a hash map)
    and so on.
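The call chain in the list above can be mirrored in a toy model to make the order of the steps explicit. The Python classes below are stand-ins named after the Java classes in the list; their bodies, the method names, the subclass relationship, and the file names `reads.bam` and `calls.vcf` are all assumptions made for this sketch and do not reproduce GATK code.

```python
from typing import Dict, List


class OutputTracker:
    """Stand-in for step 4: keeps inputs and outputs in hash maps."""

    def __init__(self, trace: List[str]) -> None:
        self.trace = trace
        self.inputs: Dict[str, str] = {}   # argument source name -> value
        self.outputs: Dict[str, str] = {}  # stub name -> storage target

    def register(self, inputs: Dict[str, str], outputs: Dict[str, str]) -> None:
        self.inputs.update(inputs)
        self.outputs.update(outputs)
        self.trace.append("OutputTracker: inputs and outputs registered")


class GenomeAnalysisEngine:
    """Stand-in for step 3: sets up output streams when executed."""

    def __init__(self, trace: List[str]) -> None:
        self.trace = trace
        self.output_tracker = OutputTracker(trace)

    def initialize_output_streams(self) -> None:
        self.trace.append("GenomeAnalysisEngine: initialize output streams")
        # Placeholder file names, not taken from the thread.
        self.output_tracker.register({"input_file": "reads.bam"}, {"out": "calls.vcf"})

    def execute(self) -> None:
        self.trace.append("GenomeAnalysisEngine: execute")
        self.initialize_output_streams()


class CommandLineExecutable:
    """Stand-in for step 2: creates the engine and runs it."""

    def __init__(self, trace: List[str]) -> None:
        self.trace = trace

    def run(self) -> None:
        engine = GenomeAnalysisEngine(self.trace)
        self.trace.append("CommandLineExecutable: engine created")
        engine.execute()


class CommandLineGATK(CommandLineExecutable):
    """Stand-in for step 1; modelled as a subclass here, which is an assumption."""


def run_chain() -> List[str]:
    """Play the role of main(): create the entry object and record each step."""
    trace: List[str] = []
    entry = CommandLineGATK(trace)
    trace.append("CommandLineGATK: object created in main")
    entry.run()
    return trace
```

Calling `run_chain()` returns the steps in the order they fire, which gives one place to mark where an alternative I/O layer would have to hook in.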

    When I tried to improve another sequencing tool, I worked through this same process (analyzing the source).

    Do you provide documentation that explains the framework in this way?

  • jitendrasbhati India Member
    edited January 2014

    @droazen: Hi. Can you elaborate on Hadoop streaming with GATK, such as what steps need to be followed or what command runs GATK with Hadoop streaming? It would be great if you could also share your ideas on the code.

  • jitendrasbhati India Member

    @sangrholee Were you able to run GATK on Hadoop? If so, kindly share your thoughts. Your help would be appreciated. Thanks.

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Hi @jitendrasbhati,

    We are currently looking into the possibility of porting GATK to Hadoop. However we don't have the time right now to discuss this at length on the forum. When we have actionable results we will let you know. In the meantime I'm afraid you're on your own. Good luck!

  • jitendrasbhati India Member

    Thanks, Geraldine. I will wait to hear from you once you have actionable results.

  • fanliangze China Member

    Any update on this?

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    We have moved away from Hadoop for now and are looking at other technologies.
