Talk abstract
High-throughput genomic methods such as ChIP-Seq have greatly expanded our knowledge of how transcription factors (TFs) bind to DNA. A frequent observation is that the binding sites of an individual TF span a wide range of apparent affinities. A quantitative, biophysical understanding of what determines TF binding strength is necessary for unraveling the physiological impact of this diverse binding, and would pave the way for designing novel interactions for synthetic biology applications. We have developed a convolutional neural network that learns a biophysical model of TF binding energy from high-throughput ChIP-Seq data, and we have successfully applied it to new ChIP-Seq data for over 200 TFs. Our model correctly identifies all known binding sites with single-nucleotide accuracy. It also goes beyond traditional positional-frequency models by quantifying the contribution of each nucleotide at each position of a site to the overall binding strength. Using these quantitative models, we have successfully engineered novel binding sites, including by extrapolating to sites stronger than any present naturally. Our novel binding sites are validated using both high-throughput genomics and quantitative biophysical measurements.
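The idea of quantifying "the contribution of each nucleotide at each position" can be illustrated with a simple additive energy matrix. The sketch below is not the authors' convolutional network; it assumes an additive, position-independent energy model of the kind such a network could be read out as. The matrix values, the chemical potential `mu`, and all function names are hypothetical and chosen only for illustration. It shows three things: scoring a site, converting energy to a two-state binding probability, and scanning a sequence to localize the strongest window at single-nucleotide resolution. It also shows how picking the lowest-energy base at every position yields a site that may be stronger than any found in a given genome.

```python
import math

# Hypothetical energy matrix (illustrative values, NOT fitted to any data).
# ENERGY[i][b] is the energy contribution, in units of kT, of base b at
# position i. More negative means stronger binding.
ENERGY = [
    {"A": -1.0, "C": 0.5, "G": 0.2, "T": 0.3},
    {"A": 0.4, "C": -0.8, "G": 0.6, "T": 0.1},
    {"A": 0.2, "C": 0.3, "G": -1.2, "T": 0.5},
    {"A": 0.1, "C": 0.6, "G": 0.4, "T": -0.9},
]


def site_energy(site, energy=ENERGY):
    """Sum per-position contributions (assumes positions act additively)."""
    if len(site) != len(energy):
        raise ValueError("site length must match the matrix width")
    return sum(energy[i][base] for i, base in enumerate(site))


def occupancy(site, mu=0.0, energy=ENERGY):
    """Two-state (Fermi-Dirac) binding probability; mu is a hypothetical
    chemical potential in kT standing in for TF concentration."""
    return 1.0 / (1.0 + math.exp(site_energy(site, energy) - mu))


def scan(sequence, energy=ENERGY):
    """Energy of every window, indexed by its start position."""
    width = len(energy)
    return [(i, site_energy(sequence[i:i + width], energy))
            for i in range(len(sequence) - width + 1)]


def strongest_site(energy=ENERGY):
    """Lowest-energy base at each position: the optimal site under the
    additive model, which need not occur anywhere in a real genome."""
    return "".join(min(pos, key=pos.get) for pos in energy)


if __name__ == "__main__":
    best_pos, best_e = min(scan("TTACGTTT"), key=lambda t: t[1])
    print(best_pos, round(best_e, 2), strongest_site())
```

Under these made-up values, scanning `"TTACGTTT"` places the strongest window at position 2 (`ACGT`, energy about -3.9 kT). A real model learned from ChIP-Seq data would need fitted parameters and could include non-additive terms that this sketch omits.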
Dr. James Galagan, Precision Diagnostics Center, Boston University