
ROD files out of FASTA? + other questions

Hey all, newbie here.
tl;dr:
I have a FASTA file containing two sequences of my region of interest (~5.5 kbp) that differ at ~100 SNPs. What is the fastest way to generate a ROD file from these sequences, as an input to BQSR?

So, hey.
I'm trying to determine the frequency of a genetic fragment I introduced into a bacterial strain, in several different samples. As I wrote, my current challenge is to create the aforementioned ROD file; however, my project is a bit different from 'usual' variant calling projects, and any advice regarding processing and analysis would be appreciated.

  1. I have a WT bacterial strain. I introduced a 5.5 kbp genetic fragment into it by electroporation and homologous recombination. It is safe to assume that different parts of the fragment have invaded the host's genome with different efficiencies (so I may have 'hybrid' variants that are half WT and half mutated). The introduced fragment has ~100 SNPs compared to the WT fragment.
  2. I took that sample and grew it under different conditions, in order to determine whether the fragment I introduced is beneficial to the bacteria.
  3. The fragments were PCR-amplified, sheared to smaller DNA fragments (~300-500 bp), and sequenced (150 bp per read, paired-end). I have a coverage of 10^6 reads per base for each sample.
  4. I'd like to determine the frequency of each SNP in each sample, and ideally, the identity and frequency of each variant.
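
To make point 4 concrete, here is a minimal sketch of the per-SNP frequency in question, assuming per-position base counts are already available for a sample. The function name `allele_frequency` and the example counts are hypothetical and are not the output of any GATK tool.

```python
def allele_frequency(counts, alt_base):
    """Return the fraction of reads at one position that carry alt_base.

    counts: dict mapping a base ("A", "C", "G", "T") to the number of
            reads showing that base at the position (hypothetical input).
    alt_base: the base carried by the introduced fragment at this SNP.
    Returns None when no reads cover the position.
    """
    total = sum(counts.values())
    if total == 0:
        return None
    return counts.get(alt_base, 0) / total


# Made-up counts for one SNP position in one sample:
# WT base is A, introduced base is G.
example_counts = {"A": 750000, "G": 250000}
print(allele_frequency(example_counts, "G"))
```

This only gives the frequency of each SNP on its own; it says nothing about which SNPs occur together on the same variant.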

I have:
The sequencing samples (1 sample of the initial pool, 6 biological replicates for one condition, and 3 biological replicates for the second condition), the sequence of the WT's genome, and the sequence of the fragment I introduced.

My questions:
1. How do I turn the FASTA file containing my WT and modified fragments into a ROD file (type doesn't matter) for the BQSR procedure? I do not need to rely on the sequenced samples to determine the differences between the sequences; I already know them.
2. Since all my reads originate from a PCR-amplified fragment, can de-duplication introduce biases or underestimation into my data?
3. I have a huge coverage. Does it require any different processing methods?
4. Any other advice?
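
For question 1, one possible approach is sketched below: compare the two sequences position by position and write every mismatch as a VCF record, since VCF is one of the ROD types. This is a minimal sketch and not a confirmed GATK recipe. It assumes the two FASTA records have the same length and differ only by SNPs (no indels), that the first record is the WT and the second is the modified fragment, and that `offset` is the 0-based start of the fragment within the reference contig. The names `read_fasta`, `snps_to_vcf`, `chrom` and `offset` are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            # Use the first word of the header line as the record name.
            name = (line[1:].split() or [""])[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


def snps_to_vcf(chrom, ref_seq, alt_seq, offset=0):
    """Return minimal VCF text listing positions where the sequences differ.

    chrom:  contig name to write (assumed to match the reference used
            for alignment).
    offset: 0-based start of the fragment within that contig (assumption;
            leave at 0 if reads are aligned to the fragment itself).
    """
    if len(ref_seq) != len(alt_seq):
        raise ValueError("sequences differ in length; align them first")
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for i, (ref, alt) in enumerate(zip(ref_seq, alt_seq)):
        # Skip identical positions and ambiguous bases such as N.
        if ref != alt and ref in "ACGT" and alt in "ACGT":
            # VCF positions are 1-based.
            lines.append(f"{chrom}\t{offset + i + 1}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


# Toy example with two made-up 8 bp sequences.
toy_fasta = ">wt\nACGTACGT\n>modified\nACCTACGA\n"
(wt_name, wt_seq), (mod_name, mod_seq) = read_fasta(toy_fasta)
print(snps_to_vcf("chr1", wt_seq, mod_seq, offset=100))
```

Depending on the GATK version, the resulting file may also need contig header lines and an index before it is accepted as a known-sites input; that would need to be checked against the tool documentation.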

Thanks,
Omer
