Objective

Compare biodiesel potential across five algal groups and test whether the group means differ.

## **Analyzed groups:**  diatom, Thalassiosira, Chlorella, Nannochloropsis, Scenedesmus

From algae to biodiesel (overview)

  • Cultivation: photobioreactors or ponds supplied with CO₂, nitrogen (N), phosphorus (P), and trace nutrients.
  • Harvesting: flocculation, filtration, or centrifugation to concentrate the biomass.
  • Lipid extraction: solvent extraction, supercritical CO₂, or mechanical pressing.
  • Transesterification: triglycerides + methanol (with a catalyst) → biodiesel (fatty-acid methyl esters, FAME) + glycerol.
  • Polishing and blending: wash and dry the FAME, then blend to fuel specifications; glycerol goes to a by-product stream.
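The transesterification bullet as a balanced reaction. A triglyceride (TG) carries three fatty-acid chains, so one mole of TG consumes three moles of methanol and yields three moles of FAME (written generically as RCOOCH₃) plus one mole of glycerol. Because the reaction is reversible, methanol is normally supplied in excess of this 3:1 molar ratio.

\[\text{TG} + 3\,\text{CH}_3\text{OH} \xrightarrow{\text{catalyst}} 3\,\text{RCOOCH}_3 + \text{C}_3\text{H}_5(\text{OH})_3\]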

Summary stats

## # A tibble: 5 × 4
##   group           mean_bf  sd_bf     n
##   <fct>             <dbl>  <dbl> <int>
## 1 diatom            0.286 0.0287    96
## 2 Thalassiosira     0.299 0.0284   119
## 3 Chlorella         0.463 0.0363    76
## 4 Nannochloropsis   0.600 0.0409    74
## 5 Scenedesmus       0.437 0.0311    55

ggplot: biodiesel fraction by group

Math: biodiesel by group

Let \(Y_{gi}\) be the biodiesel fraction of observation \(i\) in group \(g\), and \(n_g\) the number of observations in group \(g\).
The group mean and sample variance are \[\bar Y_g=\frac{1}{n_g}\sum_{i=1}^{n_g}Y_{gi},\quad s_g^2=\frac{1}{n_g-1}\sum_{i=1}^{n_g}(Y_{gi}-\bar Y_g)^2.\]
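A minimal base-R sketch of these two formulas, run on a few made-up values. These are toy numbers, not the study data, and the two group labels are there only for illustration. It checks the hand-written versions against `mean()` and `var()`. Since `var()` also divides by \(n_g-1\), the `sd_bf` column in the summary table matches \(s_g\) under this definition, assuming it was computed with `sd()`.

```r
# Group mean and sample variance written out from the formulas above.
# Toy values for illustration only; these are not the study data.
group_mean <- function(y) sum(y) / length(y)

group_var <- function(y) {
  n <- length(y)
  sum((y - group_mean(y))^2) / (n - 1)  # n_g - 1 denominator
}

grp <- factor(c("diatom", "diatom", "diatom", "Chlorella", "Chlorella"))
bf  <- c(0.28, 0.31, 0.26, 0.45, 0.47)

# Apply each formula within every group
means <- sapply(split(bf, grp), group_mean)
vars  <- sapply(split(bf, grp), group_var)
means
vars

# Base R uses the same definitions (var() divides by n - 1)
all.equal(means, sapply(split(bf, grp), mean))
all.equal(vars, sapply(split(bf, grp), var))
```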

Code: biodiesel fraction by group

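library(ggplot2)

# Violin = full distribution, box = quartiles, jittered points = raw observations
ggplot(df, aes(group, biodiesel_fraction)) +
  geom_violin(trim = FALSE) +
  geom_boxplot(width = 0.2, outlier.shape = NA) +  # outliers hidden; the points show them
  geom_point(position = position_jitter(width = 0.15), alpha = 0.35) +
  labs(x = "Group", y = "Biodiesel fraction")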

ggplot: lipid% vs biodiesel fraction

Math: lipid% vs biodiesel

Per-group OLS fits \[Y_{gi}=\beta_{0g}+\beta_{1g} L_{gi}+\varepsilon_{gi},\] with \(L\) = lipid% and \(Y\) = biodiesel fraction; each group's fitted line is \(\hat Y_{gi}=\hat\beta_{0g}+\hat\beta_{1g}L_{gi}\).
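With a single predictor, the OLS estimates have a closed form: \(\hat\beta_{1g}=\operatorname{cov}(L_g,Y_g)/\operatorname{var}(L_g)\) and \(\hat\beta_{0g}=\bar Y_g-\hat\beta_{1g}\bar L_g\). The sketch below applies that formula within each group on a toy data frame and checks one group against `lm()`. The groups `g1` and `g2` and all values are hypothetical. The result should be the same line that `geom_smooth(method = "lm")` draws for each colour.

```r
# Closed-form simple OLS: slope = cov(L, Y) / var(L),
# intercept = mean(Y) - slope * mean(L). Toy data, not the study data.
ols_fit <- function(L, Y) {
  b1 <- cov(L, Y) / var(L)
  b0 <- mean(Y) - b1 * mean(L)
  c(intercept = b0, slope = b1)
}

toy <- data.frame(
  group = rep(c("g1", "g2"), each = 4),  # hypothetical group labels
  lipid_pct = c(20, 25, 30, 35, 40, 45, 50, 55),
  biodiesel_fraction = c(0.20, 0.24, 0.27, 0.32, 0.50, 0.53, 0.57, 0.60)
)

# One fit per group: a 2 x 2 matrix (rows intercept/slope, columns g1/g2)
fits <- sapply(split(toy, toy$group),
               function(d) ols_fit(d$lipid_pct, d$biodiesel_fraction))
fits

# lm() on a single group gives the same coefficients
coef(lm(biodiesel_fraction ~ lipid_pct, data = subset(toy, group == "g1")))
```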

Code: lipid% vs biodiesel

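library(ggplot2)

# One OLS line per group (colour), matching the per-group model above
ggplot(df, aes(lipid_pct, biodiesel_fraction, color = group)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Lipid %", y = "Biodiesel fraction", color = "Group")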

Energy-normalized yield (biodiesel per kWh)

Math: energy-normalized yield

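For observation \(i\) with energy input \(E_i\) (kWh/L) and biodiesel fraction \(Y_i\), \[\text{yield\_per\_kWh}_i=\dfrac{Y_i}{E_i}.\] Because \(Y_i\) is dimensionless, the ratio has units of L/kWh. Rows with missing values or \(E_i \le 0\) are dropped before computing it. The violin and boxplot summarize its distribution by group.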

Code: energy-normalized yield

library(dplyr)
library(ggplot2)

# Keep rows with a usable fraction and a strictly positive energy input
yd <- df %>%
  filter(!is.na(biodiesel_fraction),
         !is.na(energy_kWh_per_L),
         energy_kWh_per_L > 0) %>%
  mutate(yield_per_kWh = biodiesel_fraction / energy_kWh_per_L)

ggplot(yd, aes(group, yield_per_kWh, fill = group)) +
  geom_violin(trim = FALSE, alpha = 0.4) +
  geom_boxplot(width = 0.2, outlier.shape = NA) +
  stat_summary(fun = mean, geom = "point", size = 2) +  # group mean
  labs(x = "Group", y = "Biodiesel fraction per kWh/L (L/kWh)", title = "Higher is better") +
  theme(legend.position = "none")

3D plotly: lipid %, growth rate, biodiesel fraction

Math: multivariate relation

A plane fit corresponds to multiple regression: \[Y_i=\beta_0+\beta_1 L_i+\beta_2 r_i+\varepsilon_i,\] with \(L\)=lipid% and \(r\)=growth rate per day.
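The plane's coefficients solve the normal equations \((X^\top X)\hat\beta = X^\top Y\), where \(X\) has the columns \(1, L, r\). Below is a sketch on simulated data, compared with `lm()`. The true coefficients (0.05, 0.008, 0.10) and the value ranges are arbitrary choices, not estimates from this study.

```r
# Least-squares plane Y = b0 + b1*L + b2*r via the normal equations
# (X'X) b = X'Y, checked against lm(). Simulated toy data only.
set.seed(1)
n <- 50
L <- runif(n, 10, 50)    # lipid %
r <- runif(n, 0.1, 1.0)  # growth rate per day
Y <- 0.05 + 0.008 * L + 0.10 * r + rnorm(n, sd = 0.01)

X <- cbind(1, L, r)      # design matrix with an intercept column
beta_hat <- solve(crossprod(X), crossprod(X, Y))

beta_lm <- coef(lm(Y ~ L + r))
cbind(normal_eq = drop(beta_hat), lm = beta_lm)
```

`lm()` solves the same least-squares problem through a QR decomposition. That approach is numerically safer than forming \(X^\top X\) directly when predictors are strongly correlated.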

Code: 3D plotly

library(plotly)
# Interactive 3D scatter, one colour per group
plot_ly(df, x = ~lipid_pct, y = ~growth_rate_per_day, z = ~biodiesel_fraction,
        type = "scatter3d", mode = "markers", color = ~group)

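Fit and coefficients

One-way linear model of biodiesel fraction on group, with diatom as the reference level. The intercept equals the diatom mean (0.286), and each group coefficient is that group's mean minus the diatom mean (e.g. Nannochloropsis: 0.600 − 0.286 = 0.314). Each t statistic and p-value tests a group against diatom only, not all pairs of groups.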

## # A tibble: 5 × 5
##   term                 estimate std.error statistic   p.value
##   <chr>                   <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)            0.286    0.00335     85.4  2.50e-265
## 2 groupThalassiosira     0.0135   0.00450      2.99 2.93e-  3
## 3 groupChlorella         0.177    0.00504     35.2  1.03e-126
## 4 groupNannochloropsis   0.314    0.00507     61.9  9.11e-212
## 5 groupScenedesmus       0.151    0.00555     27.3  1.07e- 94
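A toy check of the treatment coding behind this table. With diatom as the first factor level, `lm(y ~ group)` returns the diatom mean as the intercept and mean differences as the other coefficients. The data below are made up, and only three of the five groups are included to keep the example short.

```r
# Treatment (dummy) coding: intercept = reference-group mean,
# each group coefficient = that group's mean minus the reference mean.
# Toy data; "diatom" is made the reference by listing it first in levels.
toy <- data.frame(
  group = factor(c("diatom", "diatom", "Chlorella", "Chlorella",
                   "Scenedesmus", "Scenedesmus"),
                 levels = c("diatom", "Chlorella", "Scenedesmus")),
  biodiesel_fraction = c(0.28, 0.30, 0.45, 0.47, 0.43, 0.44)
)

fit <- lm(biodiesel_fraction ~ group, data = toy)
means <- tapply(toy$biodiesel_fraction, toy$group, mean)

coef(fit)
means[-1] - means["diatom"]  # matches the group* coefficients
```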

Extended model

We also adjust for lipid% (\(L\)), growth rate (\(r\)), and total energy (\(E\)). \[ \begin{aligned} Y &= \beta_0 \\ &\quad + \sum_{g\ne \text{diatom}} \beta_g \mathbf{1}\{G=g\} \\ &\quad + \gamma_1 L + \gamma_2 r + \gamma_3 E + \varepsilon. \end{aligned} \]
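A sketch of fitting the extended model with `lm()`, using simulated data only. The column names follow the earlier plotting code. Using `energy_kWh_per_L` for the total energy \(E\) is an assumption, and the simulated effect sizes are arbitrary. `model.matrix()` shows the indicator columns \(\mathbf{1}\{G=g\}\) that R builds for the non-reference groups.

```r
# Adjusted model: group dummies (diatom = reference) plus lipid %, growth
# rate and energy as covariates. Simulated data; effect sizes are arbitrary.
set.seed(42)
n <- 200
sim <- data.frame(
  group = factor(sample(c("diatom", "Chlorella", "Scenedesmus"), n, replace = TRUE),
                 levels = c("diatom", "Chlorella", "Scenedesmus")),
  lipid_pct = runif(n, 10, 50),
  growth_rate_per_day = runif(n, 0.1, 1.0),
  energy_kWh_per_L = runif(n, 1, 5)  # assumed stand-in for total energy E
)
group_eff <- c(diatom = 0, Chlorella = 0.10, Scenedesmus = 0.15)
sim$biodiesel_fraction <- 0.10 + group_eff[as.character(sim$group)] +
  0.004 * sim$lipid_pct + 0.05 * sim$growth_rate_per_day +
  0.001 * sim$energy_kWh_per_L + rnorm(n, sd = 0.02)

fit_adj <- lm(biodiesel_fraction ~ group + lipid_pct + growth_rate_per_day +
                energy_kWh_per_L, data = sim)
round(coef(fit_adj), 4)

# Design matrix: intercept, indicator columns for non-reference groups, covariates
head(model.matrix(fit_adj), 3)
```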

Conclusion

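## * Largest adjusted group effect (vs. diatom): **Scenedesmus** (coef = 0.156, p = 3.59e-79); without adjustment, Nannochloropsis has the highest mean (0.600).
## * Lipid% and growth rate are positive predictors; the energy coefficient is small in this sample.
## * Some groups overlap once within-group spread is taken into account (e.g. diatom vs Thalassiosira: means 0.286 vs 0.299, SDs ≈ 0.028); the ranking reflects differences in means, not that every observation in a higher-ranked group beats every observation in a lower one.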
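One way to make the overlap point concrete is to assume each group is roughly normal and use the means and SDs from the summary table. Under that assumption, the probability that a random observation from group \(a\) exceeds an independent one from group \(b\) is \(\Phi\big((\mu_a-\mu_b)/\sqrt{\sigma_a^2+\sigma_b^2}\big)\). This is an illustrative calculation, not part of the original analysis.

```r
# P(Y_a > Y_b) for independent normal draws:
# pnorm((mu_a - mu_b) / sqrt(sd_a^2 + sd_b^2)).
# Means and SDs copied from the summary table; normality is an assumption.
prob_superiority <- function(mu_a, sd_a, mu_b, sd_b) {
  pnorm((mu_a - mu_b) / sqrt(sd_a^2 + sd_b^2))
}

# Thalassiosira vs diatom: small mean gap relative to the spread
p_thal <- prob_superiority(0.299, 0.0284, 0.286, 0.0287)
# Nannochloropsis vs diatom: gap of roughly six combined SDs
p_nanno <- prob_superiority(0.600, 0.0409, 0.286, 0.0287)

round(c(Thalassiosira_vs_diatom = p_thal, Nannochloropsis_vs_diatom = p_nanno), 3)
```

With these inputs, Thalassiosira exceeds diatom only about 63% of the time, while Nannochloropsis exceeds diatom almost always. How much the groups overlap therefore depends strongly on which pair is compared.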