What is computational biology?

Biosci 1540: Computational Biology Nathan Brouwer nlb24 at pitt edu

What is “Computational Biology”?

3 Types of biologists

Lab / Field / Clinical biologists study how the world IS

Theoretical biologists study how the world COULD BE

Computational biologists

Computational biologists often work on…

Computational biology is an approach NOT a field of study

What are some of the hardest questions in biology?

The Thinker

One of the hardest questions:

How do proteins fold?

Two issues

Phospholipase A2

“How do proteins fold” has a unique place in biology

REVIEW: Central Dogma

REVIEW: Expanded Central Dogma

Levels of protein structure

Protein-x interactions are also important

STRUCTURAL BIOLOGY is the empirical study of macromolecular structures

STRUCTURAL BIOINFORMATICS is the computational study of macromolecular structures

Goal: Can we …

Structural bioinformatics uses several approaches

Ab initio methods try to predict structure from just sequence

Ab initio methods are REALLY hard
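A back-of-the-envelope count hints at why. This is only a sketch: the 3 conformations per amino acid and the 10^12 conformations checked per second are illustrative assumptions, not measured values.

```r
# Rough count of the conformations a brute-force search would face.
# ASSUMPTIONS (illustrative only): each residue has 3 possible
# conformations, and a computer checks 1e12 conformations per second.
n_residues     <- 100
states_per_res <- 3
checks_per_sec <- 1e12

# Every residue multiplies the number of possibilities
n_conformations <- states_per_res^n_residues

secs_per_year <- 60 * 60 * 24 * 365
years_needed  <- n_conformations / checks_per_sec / secs_per_year

format(n_conformations, scientific = TRUE, digits = 3)
format(years_needed, scientific = TRUE, digits = 3)
```

Even under these generous assumptions, trying every conformation of a 100-residue protein is out of reach, so ab initio methods have to search much more cleverly than brute force.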

NOTE: Computer simulations and working with powerful computers are a key feature of computational biology

Homology modeling uses evolutionarily similar proteins to predict 3D structure

Number of known structures is increasing exponentially

# Total number of structures in the PDB, listed newest to oldest
strucs <- c(194259, 185472, 172880, 158882, 147386,
            136217, 125151, 114337, 105085, 95523,
            86184, 77429, 69502, 61757, 54468, 47565,
            40431, 34024, 28690, 23541, 19394, 16400,
            13587, 10960, 8604, 6548,
            4984, 3812, 2871, 1582, 886, 694, 507,
            365, 291, 238, 213,
            195, 175, 153, 117, 85,
            69, 53, 42, 36, 13)

# Reverse so the values run oldest to newest
strucs <- rev(strucs)

# One value per year, starting in 1976
yrs <- seq_along(strucs) + 1975
plot(strucs ~ yrs, type = "b", pch = 16, xlab = "Year",
     ylab = "Total Structures",
     main = "Growth of PDB over time")
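
One way to check the "exponential" claim is to look at the same totals on a log scale, where exponential growth appears as a straight line. The sketch below repeats the data so that it runs on its own. The straight-line fit and the doubling time are a rough summary added for illustration, not part of the original analysis.

```r
# Same PDB totals as above, reversed to run oldest to newest (1976 onward)
strucs <- rev(c(194259, 185472, 172880, 158882, 147386,
                136217, 125151, 114337, 105085, 95523,
                86184, 77429, 69502, 61757, 54468, 47565,
                40431, 34024, 28690, 23541, 19394, 16400,
                13587, 10960, 8604, 6548,
                4984, 3812, 2871, 1582, 886, 694, 507,
                365, 291, 238, 213,
                195, 175, 153, 117, 85,
                69, 53, 42, 36, 13))
yrs <- seq_along(strucs) + 1975

# Exponential growth looks like a straight line on a log scale
plot(log(strucs) ~ yrs, pch = 16, xlab = "Year",
     ylab = "log(Total Structures)",
     main = "Growth of PDB over time (log scale)")

# Fit a straight line to the logged totals and overlay it
fit <- lm(log(strucs) ~ yrs)
abline(fit)

# The slope is the average yearly growth rate on the log scale;
# log(2) / slope gives an approximate doubling time in years
growth_rate   <- coef(fit)[["yrs"]]
doubling_time <- log(2) / growth_rate
doubling_time
```

If the points bend away from the fitted line, the growth rate has changed over time, so a single doubling time is only an average.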

Homology modeling only works for proteins w/ close evolutionary relatives (HOMOLOGS)
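
How "close" two proteins are is often summarized as percent sequence identity. The sketch below computes it for two short sequences that are assumed to be already aligned. Both sequences and the `percent_identity()` helper are hypothetical and made up for illustration. Real analyses use dedicated alignment tools first.

```r
# Two HYPOTHETICAL, pre-aligned protein fragments ("-" marks a gap)
seq_a <- "MKTAYIAKQR-QISFVKSHF"
seq_b <- "MKTAYLAKQRGQISFIKSHF"

# Percent of aligned (non-gap) positions where the residues match
percent_identity <- function(a, b) {
  a_chars <- strsplit(a, "")[[1]]
  b_chars <- strsplit(b, "")[[1]]
  stopifnot(length(a_chars) == length(b_chars))

  # Ignore columns where either sequence has a gap
  aligned <- a_chars != "-" & b_chars != "-"
  100 * sum(a_chars[aligned] == b_chars[aligned]) / sum(aligned)
}

percent_identity(seq_a, seq_b)
```

The higher the identity between a protein of unknown structure and one with a solved structure, the more reasonable it is to use the solved structure as a template.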

NOTE: Studying evolution / using evolutionary information is a common aspect of computational biology

NOTE: Using lots of empirical data produced by others is a common feature of comp bio

NOTE: Using databases is a common feature of comp bio

THREADING pieces together solved subunits of many proteins to predict unknown structure

Structural bioinformatics approaches can be combined

CASP is a biennial competition to predict the structure of proteins

“Critical Assessment of protein Structure Prediction”

NOTE: Key theme of MACHINE LEARNING (ML) is “holding out” data from analysis so that a finalized model can be judged

CASP uses the machine-learning (ML) strategy of holding out data from participants
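
The hold-out idea can be sketched in a few lines. The example below uses simulated data and a simple linear model (both assumptions chosen for illustration, not protein data and not CASP's procedure): some rows are set aside before fitting, and the finished model is judged only on those rows.

```r
set.seed(1)  # make the random split reproducible

# SIMULATED data (not real protein data): y depends on x plus noise
n   <- 100
x   <- runif(n, 0, 10)
y   <- 2 + 3 * x + rnorm(n, sd = 2)
dat <- data.frame(x = x, y = y)

# Hold out 20% of the rows; the model never sees them during fitting
test_rows <- sample(n, size = 0.2 * n)
train <- dat[-test_rows, ]
test  <- dat[test_rows, ]

# Build the model using the training rows only
fit <- lm(y ~ x, data = train)

# Judge the finalized model only on the held-out rows
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))
rmse
```

Because the held-out rows played no part in fitting, the error on them is an honest estimate of how the model handles data it has never seen.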

Incremental progress has been made on predicting 3D structures

Hello AlphaFold!

Recently ML has been used to revolutionize the prediction of macromolecular structures

AlphaFold uses

What is a “deep learning algorithm”?

Aside: Math vs. algorithms

Math

Aside: Math vs. algorithms

Algorithms

Back to “Deep learning algorithm”

What does the “deep” mean?

So, AlphaFold is …

ML / DL is gaining mainstream recognition, but

Applications of ML/DL in biology

Genomics

Medicine

Ecology

(Current) limits of AlphaFold

These should all be tackled with time…

How does AlphaFold (kinda) work?

Once AlphaFold was fully trained it was used for prediction

Analogies for the training / validation / test set
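
The three roles can also be sketched in code. The example below is a generic illustration on simulated data, not a description of how AlphaFold was actually trained: the training set fits several candidate models, the validation set picks among them, and the test set is used once at the end. The 60/20/20 split and the polynomial models are assumptions chosen for simplicity.

```r
set.seed(42)  # make the random split reproducible

# SIMULATED data: a curved relationship plus noise
n   <- 300
x   <- runif(n, -3, 3)
y   <- sin(x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)

# Randomly assign each row to one of three sets (60% / 20% / 20%)
set_label <- sample(rep(c("train", "validation", "test"),
                        times = c(180, 60, 60)))
train      <- dat[set_label == "train", ]
validation <- dat[set_label == "validation", ]
test       <- dat[set_label == "test", ]

# Root mean squared error of a model's predictions on a data set
rmse <- function(model, newdata) {
  sqrt(mean((newdata$y - predict(model, newdata = newdata))^2))
}

# Training set: fit candidate models of increasing flexibility
degrees <- 1:6
models  <- lapply(degrees, function(d) lm(y ~ poly(x, d), data = train))

# Validation set: choose between the candidates
val_rmse    <- sapply(models, rmse, newdata = validation)
best_degree <- degrees[which.min(val_rmse)]

# Test set: used once, to report how the chosen model performs
test_rmse <- rmse(models[[which.min(val_rmse)]], test)
best_degree
test_rmse
```

Keeping the test set untouched until the very end is what makes the final error a fair judgment, because neither fitting nor model choice has been tuned to it.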

Key aspects of comp bio apparent in AlphaFold

This is “data science” and the focus of this class: