Pedigree files for a Queue script

Hi,
I've got my Queue script running the Best Practices pipeline mostly together, but I'm having a heck of a time getting it to accept a pedigree file.

Within my inputs section I have:

@Input(doc="Pedigree file", fullName="pedigree", shortName="ped", required=false)
var pedigree: File = _

then within my methods:

// General arguments to GATK walkers
trait CommandLineGATKArgs extends CommandLineGATK with ExternalCommonArgs {
this.reference_sequence = qscript.reference
this.memoryLimit = memoryLimit
this.num_threads = num_threads
this.num_cpu_threads_per_data_thread = num_threads
this.interval_padding = interval_padding
this.pedigree = pedigree
}

I've also tried declaring the input as an @Argument instead of an @Input, and typing the value as a File and as a String. I have also tried putting this.pedigree = pedigree directly in the VariantAnnotator method, with no success.

In all cases I get the error:

ERROR MESSAGE: Possible de novos annotation can only be used from the Variant Annotator, and must be provided a valid PED file (-ped) from the command line.

My program args indicate that VariantAnnotator isn't given the pedigree file flag or reference:
INFO 13:37:12,220 HelpFormatter - Program Args: -T VariantAnnotator -L /data/gatk_yale_test/.queue/scatterGather/.qlog/project.varAnn.vcf.varannotator-sg/temp_19_of_20/scatter.intervals -R /data/cccb/db/gatk/hg19/ucsc.hg19.fasta -V /data/gatk_yale_test/project.vcf -D /data/cccb/db/gatk/hg19/dbsnp_137.hg19.vcf -o /data/gatk_yale_test/.queue/scatterGather/.qlog/project.varAnn.vcf.varannotator-sg/temp_19_of_20/project.varAnn.vcf -A GenotypeSummaries -A VariantType -A InbreedingCoeff -A TransmissionDisequilibriumTest -A PossibleDeNovo -alwaysAppendDbsnpId

Help please!

Thanks,
-Alex-

Answers

  • Oh, and I forgot to mention, when calling the Queue script I add the line:
    --pedigree /data/this_is_my_pedigree.ped \

  • also, my VariantAnnotator method looks like:

    case class varannotator (inVcf: File, outVcf: File) extends VariantAnnotator {
    this.variant = inVcf
    // this.snpEffFile = inSnpEffFile
    this.out = outVcf
    this.alwaysAppendDbsnpId = true
    this.dbsnp = dbSNPvqsr
    this.R = reference
    // this.annotation = Seq("GenotypeSummaries", "VariantType")
    this.annotation = Seq("GenotypeSummaries", "VariantType", "InbreedingCoeff", "TransmissionDisequilibriumTest", "PossibleDeNovo")
    this.isIntermediate = false
    this.analysisName = queueLogDir + outVcf + ".varannotator"
    this.jobName = queueLogDir + outVcf + ".varannotator"
    this.scatterCount = nContigs
    }

  • Hmm. That was certainly ONE error.
    However, it's still not working, and I did previously try putting "this.pedigree = pedigree" directly into the VariantAnnotator case class.

    Should the input be an @Input or an @Argument, and as a File or String?

    Currently my script looks like:

    @Input(doc="Pedigree file", fullName="pedigree", shortName="ped", required=false)
    var pedigree: File = _
    // @Argument(doc="Pedigree file", fullName="pedigree", shortName="ped", required=false)
    // var pedigree: File = _

    ...

    // General arguments to GATK walkers
    trait CommandLineGATKArgs extends CommandLineGATK with ExternalCommonArgs {
    this.reference_sequence = qscript.reference
    this.memoryLimit = memoryLimit
    this.num_threads = num_threads
    this.num_cpu_threads_per_data_thread = num_threads
    this.interval_padding = interval_padding
    this.pedigree = pedigree
    this.pedigreeValidationType = org.broadinstitute.gatk.engine.samples.PedigreeValidationType.SILENT
    }

    ...

    case class varannotator (inVcf: File, outVcf: File) extends VariantAnnotator with CommandLineGATKArgs {
    this.variant = inVcf
    // this.snpEffFile = inSnpEffFile
    this.out = outVcf
    this.alwaysAppendDbsnpId = true
    this.dbsnp = dbSNPvqsr
    this.R = reference
    // this.annotation = Seq("GenotypeSummaries", "VariantType")
    this.annotation = Seq("GenotypeSummaries", "VariantType", "InbreedingCoeff", "TransmissionDisequilibriumTest", "PossibleDeNovo")
    this.isIntermediate = false
    this.analysisName = queueLogDir + outVcf + ".varannotator"
    this.jobName = queueLogDir + outVcf + ".varannotator"
    this.scatterCount = nContigs
    }

    and kicks out:

    INFO 14:03:51,104 HelpFormatter - Program Args: -T VariantAnnotator -L /data/gatk_yale_test/.queue/scatterGather/.qlog/project.varAnn.vcf.varannotator-sg/temp_19_of_20/scatter.intervals -R /data/cccb/db/gatk/hg19/ucsc.hg19.fasta -pedValidationType SILENT -V /data/gatk_yale_test/project.vcf -D /data/cccb/db/gatk/hg19/dbsnp_137.hg19.vcf -o /data/gatk_yale_test/.queue/scatterGather/.qlog/project.varAnn.vcf.varannotator-sg/temp_19_of_20/project.varAnn.vcf -A GenotypeSummaries -A VariantType -A InbreedingCoeff -A TransmissionDisequilibriumTest -A PossibleDeNovo -alwaysAppendDbsnpId

    ERROR MESSAGE: Possible de novos annotation can only be used from the Variant Annotator, and must be provided a valid PED file (-ped) from the command line.
  • pdexheimer (Member ✭✭✭✭)

    Hmm, I'm a little surprised this compiled, now that I look at it more closely.

    Your pedigree variable should be a File, though there are implicit conversions defined in the QFunction class that will give you a little breathing room. But the CommandLineGATK.pedigree field should be a List[File], not a File - so you either need to add your variable to the list (this.pedigree :+= pedigree) or wrap it into a List before assigning it (this.pedigree = List(pedigree)). I can't quite figure out why a straight assignment of a File to a List compiled...

  • Okay, so I modified my CommandLineGATKArgs to:

    // General arguments to GATK walkers
    trait CommandLineGATKArgs extends CommandLineGATK with ExternalCommonArgs {
    this.reference_sequence = qscript.reference
    this.memoryLimit = memoryLimit
    this.num_threads = num_threads
    this.num_cpu_threads_per_data_thread = num_threads
    this.interval_padding = interval_padding
    // this.pedigree = pedigree
    this.pedigree :+= pedigree
    // this.pedigree = List(pedigree)
    this.pedigreeValidationType = org.broadinstitute.gatk.engine.samples.PedigreeValidationType.SILENT
    }

    and got a compile-time error of:

    INFO 14:22:41,002 QCommandLine - Shutting down jobs. Please wait...
    INFO 14:24:13,082 QScriptManager - Compiling 1 QScript
    ERROR 14:24:14,951 QScriptManager - ExomeGATKPipeline.scala:450: type mismatch; found : Seq[Object] required: Seq[java.io.File]
    ERROR 14:24:14,955 QScriptManager - this.pedigree :+= pedigree
    ERROR 14:24:14,956 QScriptManager - ^
    ERROR 14:24:15,660 QScriptManager - two errors found

    After that I also tried changing my input type, which made no difference:
    from:
    var pedigree: File = _
    to:
    var pedigree: Seq[File] = Seq()

  • pdexheimer (Member ✭✭✭✭)

    Sorry, I doubt I'm going to be able to figure this out without sitting down with the code. I suspect that the error is pointing you in the right area - make sure that pedigree is a java.io.File, not some other kind of File or String or something. I've occasionally run into shadowing problems in Scala, where what you think is a File isn't actually the File you think it is. If that makes sense...

  • Hmm. I've been able to muddle through my other problems, but this really has me stumped.
    Do you have some example code internally that I could crib off of? Do any of your internal pipelines read in a PED file? I'd be interested to see how you take the PED file as an argument, and how you pass it to the internal GATK calls.

    Alternatively, I'd be immensely appreciative if you'd take a look at my Queue script. It's a modified version of the old Broad data processing pipeline, so it should be relatively familiar. It's hosted on GitHub at: https://github.com/alexholman/gatk_queue_pipeline

    Thanks,
    -Alex-

  • wait... wait... I think I have it!
    For the file input I should have been modeling after the IntervalsFile input, not the BAM file input:

    @Argument(doc="the -L interval string to be used by GATK - output bams at interval only", fullName="gatk_interval_string", shortName="L", required=false)
    var intervalString: String = ""

    This seems to work:

    @Argument(doc="Pedigree file", fullName="pedigree", shortName="ped", required=false)
    var pedigree: String = ""

    with:

    trait CommandLineGATKArgs extends CommandLineGATK with ExternalCommonArgs {
    this.reference_sequence = qscript.reference
    this.memoryLimit = memoryLimit
    this.num_threads = num_threads
    this.num_cpu_threads_per_data_thread = num_threads
    this.interval_padding = interval_padding
    this.pedigree ++= Seq(qscript.pedigree)
    this.pedigreeValidationType = org.broadinstitute.gatk.engine.samples.PedigreeValidationType.SILENT
    }

    This is running now, nothing has broken, and the logs show that VariantAnnotator is correctly getting the PED file.
    I think this might just be working...

    Thanks for all your help.
    -Alex-
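
One possible reading of the thread, offered as a sketch rather than a confirmed diagnosis: inside the CommandLineGATKArgs trait, an unqualified pedigree may resolve to the walker's own inherited pedigree field (a Seq[File], going by the compile error) rather than to the script-level argument of the same name. That would make this.pedigree = pedigree a self-assignment, which compiles but never passes -ped, and this.pedigree :+= pedigree an attempt to append a Seq[File] to a Seq[File], which matches the Seq[Object] mismatch. The version that worked qualifies the name as qscript.pedigree. The example below does not use the real Queue classes; FakeCommandLineGATK, FakeQScript, CommonArgs, Annotator and family.ped are hypothetical stand-ins chosen only to show the name resolution.

```scala
import java.io.File

// Hypothetical stand-in for Queue's CommandLineGATK: only the pedigree field matters here.
trait FakeCommandLineGATK {
  var pedigree: Seq[File] = Nil
}

// Hypothetical stand-in for the QScript; `qscript` is a self alias for the outer instance.
class FakeQScript { qscript =>
  // Script-level argument that shares its name with the walker field above.
  var pedigree: File = new File("family.ped")

  trait CommonArgs extends FakeCommandLineGATK {
    // Qualifying with `qscript.` selects the outer File, not the inherited Seq[File].
    // An unqualified `pedigree` here would refer to the inherited member in older
    // Scala 2 compilers (newer compilers flag the ambiguity).
    this.pedigree :+= qscript.pedigree
  }

  class Annotator extends CommonArgs
}

object ShadowingDemo {
  def main(args: Array[String]): Unit = {
    val script = new FakeQScript
    val job = new script.Annotator
    // The walker-level list now holds the script-level file.
    println(job.pedigree.map(_.getName).mkString(","))  // prints "family.ped"
  }
}
```

If this reading is right, the switch from File to String was incidental, and the qscript. qualifier was the change that mattered; checking the Program Args line for -ped, as done above, is the way to confirm it in a real run.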
