Yield Performance Analytics for the EYIA Greenhouse Programme 2024

Author

Attah Sandra

Published

May 10, 2026

1. Executive Summary

Eupepsia Place Ltd operates the Enterprise for Youth in Agriculture (EYIA) programme, a structured greenhouse-farming initiative that trains and equips young agri-entrepreneurs across six cohorts to produce vegetable crops commercially. In 2024, 140 participating companies planted crops across four greenhouses over as many as eight growing cycles, generating 1,460 planting records and 741 realised yield observations spanning February to November 2024.

This analysis applies five techniques: Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression. The central question is what drives yield performance in the EYIA programme and which crop-greenhouse-cycle configurations the programme should recommend to maximise returns.

Key findings: yield is highly right-skewed, with a small number of high-performing combinations masking widespread underperformance; with all crops pooled, Greenhouse 1 yields significantly more than Greenhouse 4, the only greenhouse pair that differs after Bonferroni correction; Habanero in GH1 is the single highest-yielding combination, with a mean of 1,408 kg; cycle number and seedling count have limited predictive power once greenhouse and crop type are controlled for; and a regression model confirms that greenhouse and crop type together explain the majority of yield variation.

Recommendation: EYIA should prioritise Habanero and Cucumber cultivation in Greenhouses 1 and 2 for incoming cohorts, and introduce targeted agronomic intervention for companies producing below 30 kg per cycle.


2. Professional Disclosure

Name: Attah Sandra

Job Title: Business Development Manager, EYIA Programme

Organisation: Eupepsia Place Ltd is a private agribusiness firm based in Ogun State, Nigeria, specialising in commercial horticulture and vegetable crop production. EYIA is a structured capacity-building programme embedded within the company that trains young entrepreneurs to operate greenhouse farming units independently.

Relevance of the Five Techniques to My Role:

  1. Exploratory Data Analysis: As BDM, I work with harvest data submitted across cohorts to track programme performance. EDA allows me to identify data quality gaps, understand the distribution of yields across companies, and flag anomalies before presenting results to programme leadership or development partners.

  2. Data Visualisation: I regularly prepare performance reports and investor briefings. Visualisation translates complex multi-cohort harvest data into accessible narratives that inform strategic conversations with stakeholders who are not technically trained.

  3. Hypothesis Testing: A recurring operational question is whether performance differences across greenhouses or crop types are real or simply due to chance. Formal hypothesis testing provides the statistical rigour needed to back programme recommendations with evidence rather than intuition.

  4. Correlation Analysis: Understanding how seedling counts, cycle number, and greenhouse assignment relate to yield helps me advise incoming companies on resource allocation. Knowing which variables are associated with performance improvement is directly actionable.

  5. Linear Regression: Regression allows me to model the combined effect of multiple programme variables on yield simultaneously, producing a tool that can predict expected output for a given crop-greenhouse-cycle configuration and support realistic target-setting for new cohort entrants.


3. Data Collection and Sampling

Source: Internal programme monitoring records maintained by Eupepsia Place Ltd for the EYIA 2024 cohort cycle.

Collection Method: Yield data were recorded by trained EYIA field officers during scheduled physical farm visits to each participating company’s greenhouse unit. Officers recorded harvest weights at each visit using standardised harvest tracking sheets, which were subsequently consolidated into a master Excel workbook.

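Sampling Frame: All 140 companies enrolled in the EYIA programme across six cohorts in 2024. This is a census of programme participants, not a random sample: every enrolled company is represented in the monitoring records, although 124 distinct company names remain after the crop-type filtering in Section 4.1.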

Time Period: February 2024 to November 2024, covering up to eight growing cycles per company.

Variables Collected:

Variable             Type         Description
Cohort               Categorical  Programme cohort (1 to 6)
Company              Categorical  Participating agri-enterprise name
Cycle                Numeric      Growing cycle number within the cohort
Greenhouse           Categorical  Assigned greenhouse unit (GH1 to GH4)
Crop Type            Categorical  Tomatoes, Bell Peppers, Habanero, Lettuce, Cucumber, Kale, Ugu
Seedlings/Stands     Numeric      Number of seedlings planted per cycle
Start Date           Date         Date the planting cycle commenced
Realized Yield (kg)  Numeric      Actual harvest weight in kilograms

Ethical Notes: This dataset constitutes internal operational records of Eupepsia Place Ltd. All participating companies enrol in the EYIA programme under a formal agreement that includes consent for programme monitoring and data use for performance evaluation. No personally identifiable information beyond company trading names is included.

Data Limitations: 999 of the 1,807 raw rows have no recorded yield, either because the harvest had not yet occurred when the data were extracted or because the crop failed. The raw Crop Type column contained 109 distinct entries owing to data-entry inconsistencies; only the six primary crop categories plus Ugu were retained (with "Cucumbers" merged into "Cucumber"), leaving 1,460 rows.
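The exact-match filter used for cleaning drops any Crop Type entry whose spelling differs even slightly from the accepted labels. A minimal sketch of a more lenient normaliser, assuming that many of the 109 raw variants differ only in case, spacing or pluralisation. The raw variants are not listed in this report, so the alias table and the `normalise_crop` helper below are hypothetical and were not used in the analysis.

```python
# Canonical labels retained in the analysis
CANONICAL = {"Bell Peppers", "Tomatoes", "Lettuce", "Kale", "Habanero", "Cucumber", "Ugu"}

# Hypothetical alias table: extend it with real variants found in the raw column
ALIASES = {"Cucumbers": "Cucumber", "Tomato": "Tomatoes", "Bell Pepper": "Bell Peppers"}


def normalise_crop(raw):
    """Map a raw Crop Type entry to a canonical label, or None if unrecognised."""
    if raw is None:
        return None
    # Collapse internal whitespace and standardise capitalisation
    name = " ".join(str(raw).split()).title()
    name = ALIASES.get(name, name)
    return name if name in CANONICAL else None


for entry in ["  cucumbers ", "bell   PEPPERS", "Tomato", "Scent Leaf"]:
    print(repr(entry), "->", normalise_crop(entry))
```

Entries that still map to None can then be reviewed by hand instead of being dropped silently.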


4. Data Description

4.1 Load and Clean Data

Code
library(tidyverse)
library(readxl)
library(janitor)
library(lubridate)
library(knitr)
library(kableExtra)

raw <- read_excel(
  "Cleaned_EYIA_Crops_2024.xlsx",
  sheet = "Master Data",
  skip  = 1
)

raw <- raw |> clean_names()

valid_crops <- c("Bell Peppers", "Tomatoes", "Lettuce",
                 "Kale", "Habanero", "Cucumber", "Cucumbers", "Ugu")

df <- raw |>
  filter(crop_type %in% valid_crops) |>
  mutate(
    crop_type     = if_else(crop_type == "Cucumbers", "Cucumber", crop_type),
    # Non-numeric seedling entries are coerced to NA
    seedlings_num = suppressWarnings(as.numeric(seedlings_stands)),
    yield_kg      = realized_yield_kg,
    # Strip ordinal suffixes only where they follow digits (a plain "st "
    # pattern would also turn "August" into "Augu"), then fix known typos
    start_clean   = start_date |>
      str_replace_all(regex("(\\d+)(st|nd|rd|th)\\b", ignore_case = TRUE), "\\1") |>
      str_replace_all(c("Marrch" = "March", "may" = "May", " of " = " ")) |>
      parse_date_time(orders = c("d B Y", "d b Y"), quiet = TRUE),
    month      = month(start_clean, label = TRUE),
    cohort     = factor(cohort),
    greenhouse = factor(greenhouse),
    crop_type  = factor(crop_type)
  )

cat("Rows after cleaning  :", nrow(df), "\n")
Rows after cleaning  : 1460 
Code
cat("Yield observations   :", sum(!is.na(df$yield_kg)), "\n")
Yield observations   : 741 
Code
cat("Unique companies     :", n_distinct(df$company), "\n")
Unique companies     : 124 
Code
cat("Cohorts              :", n_distinct(df$cohort), "\n")
Cohorts              : 6 
Code
import pandas as pd
import numpy as np
import re
import warnings
warnings.filterwarnings("ignore")

raw_py = pd.read_excel(
    "Cleaned_EYIA_Crops_2024.xlsx",
    sheet_name = "Master Data",
    skiprows   = 1,
    header     = 0
)

valid_crops = ["Bell Peppers", "Tomatoes", "Lettuce",
               "Kale", "Habanero", "Cucumber", "Cucumbers", "Ugu"]

df_py = raw_py[raw_py["Crop Type"].isin(valid_crops)].copy()
df_py["Crop Type"]     = df_py["Crop Type"].replace("Cucumbers", "Cucumber")
df_py["Seedlings_num"] = pd.to_numeric(df_py["Seedlings/Stands"], errors="coerce")
df_py.rename(columns={"Realized Yield (kg)": "yield_kg"}, inplace=True)

def parse_dt(s):
    if pd.isna(s):
        return pd.NaT
    s = str(s).strip()
    s = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", s, flags=re.IGNORECASE)
    s = s.replace("Marrch", "March").replace(" of ", " ").replace("may", "May")
    try:
        return pd.to_datetime(s, dayfirst=True)
    except Exception:
        return pd.NaT

df_py["start_date"] = df_py["Start Date"].apply(parse_dt)
df_py["month"]      = df_py["start_date"].dt.strftime("%b")

print(f"Rows after cleaning  : {len(df_py):,}")
Rows after cleaning  : 1,460
Code
print(f"Yield observations   : {df_py['yield_kg'].notna().sum():,}")
Yield observations   : 741
Code
print(f"Unique companies     : {df_py['Company'].nunique():,}")
Unique companies     : 124
Code
print(f"Cohorts              : {df_py['Cohort'].nunique():,}")
Cohorts              : 6
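Stripping ordinal suffixes from the Start Date strings by plain string replacement is fragile: replacing `"st "` also matches the end of "August". Removing the suffix only when it directly follows a digit avoids this. A stdlib-only sketch, assuming dates are written as day, month name, year; `parse_start_date` is a hypothetical helper and its typo list simply mirrors the one in the cleaning code above.

```python
import re
from datetime import datetime

# Typo fixes mirroring those handled in the cleaning code above
TYPO_FIXES = {"Marrch": "March", " of ": " "}

# Naive suffix stripping corrupts month names
print("1st August 2024".replace("st ", " "))  # 1 Augu 2024


def parse_start_date(text):
    """Parse strings such as '1st of August 2024'; return None if unparseable."""
    s = " ".join(str(text).split())  # collapse repeated whitespace
    for bad, good in TYPO_FIXES.items():
        s = s.replace(bad, good)
    # Remove st/nd/rd/th only when attached to digits, leaving 'August' intact
    s = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", s, flags=re.IGNORECASE)
    for fmt in ("%d %B %Y", "%d %b %Y"):  # full, then abbreviated month names
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            continue
    return None


print(parse_start_date("1st of August 2024"))  # 2024-08-01
print(parse_start_date("22nd Marrch 2024"))    # 2024-03-22
```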

4.2 Variable Summary

Code
library(skimr)
df |>
  select(cohort, greenhouse, crop_type, cycle, seedlings_num, yield_kg) |>
  skim()
Data summary
Name                     select(…)
Number of rows           1460
Number of columns        6
Column type frequency:
  factor                 3
  numeric                3
Group variables          None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
cohort         0          1              FALSE    6         Coh: 438, Coh: 320, Coh: 256, Coh: 217
greenhouse     0          1              FALSE    4         Gre: 465, Gre: 361, Gre: 326, Gre: 308
crop_type      0          1              FALSE    7         Tom: 456, Hab: 395, Bel: 249, Let: 165

Variable type: numeric

skim_variable  n_missing  complete_rate  mean    sd      p0   p25    p50    p75  p100  hist
cycle          0          1.00           3.22    1.78    1    2.0    3.00   5    8     ▇▃▆▂▁
seedlings_num  195        0.87           442.23  418.11  165  165.0  350.00 350  1500  ▇▁▁▁▁
yield_kg       719        0.51           99.13   244.01  0    5.5    32.35  87   2024  ▇▁▁▁▁
Code
cols = ["Cohort", "Greenhouse", "Crop Type", "Cycle", "Seedlings_num", "yield_kg"]
print(df_py[cols].describe(include="all").round(2).to_string())
          Cohort    Greenhouse Crop Type    Cycle  Seedlings_num  yield_kg
count       1460          1460      1460  1460.00        1265.00    741.00
unique         6             4         7      NaN            NaN       NaN
top     Cohort 1  Greenhouse 1  Tomatoes      NaN            NaN       NaN
freq         438           465       456      NaN            NaN       NaN
mean         NaN           NaN       NaN     3.22         442.23     99.13
std          NaN           NaN       NaN     1.78         418.11    244.01
min          NaN           NaN       NaN     1.00         165.00      0.00
25%          NaN           NaN       NaN     2.00         165.00      5.50
50%          NaN           NaN       NaN     3.00         350.00     32.35
75%          NaN           NaN       NaN     5.00         350.00     87.00
max          NaN           NaN       NaN     8.00        1500.00   2024.00

5. Exploratory Data Analysis

5.1 Data Quality Issues

Code
missing_tbl <- df |>
  group_by(cohort) |>
  summarise(
    Total   = n(),
    Missing = sum(is.na(yield_kg)),
    Pct     = round(100 * Missing / Total, 1)
  )

kable(
  missing_tbl,
  col.names = c("Cohort", "Total Records", "Missing Yield", "Pct Missing"),
  caption   = "Issue 1: Missing yield values by cohort"
) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Issue 1: Missing yield values by cohort
Cohort    Total Records  Missing Yield  Pct Missing
Cohort 1  438            135            30.8
Cohort 2  153            103            67.3
Cohort 3   76             25            32.9
Cohort 4  256            206            80.5
Cohort 5  217             98            45.2
Cohort 6  320            152            47.5
Code
df_y  <- df |> filter(!is.na(yield_kg))
q1    <- quantile(df_y$yield_kg, 0.25)
q3    <- quantile(df_y$yield_kg, 0.75)
fence <- q3 + 1.5 * (q3 - q1)
out   <- df_y |> filter(yield_kg > fence)

cat("\nIssue 2: Outliers above upper fence of", round(fence, 1), "kg\n")

Issue 2: Outliers above upper fence of 209.2 kg
Code
cat("Count:", nrow(out), "| Max:", max(df_y$yield_kg), "kg\n\n")
Count: 86 | Max: 2024 kg
Code
out |>
  select(cohort, company, greenhouse, crop_type, cycle, yield_kg) |>
  arrange(desc(yield_kg)) |>
  head(10) |>
  kable(caption = "Top 10 outlier records") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Top 10 outlier records
cohort    company                   greenhouse    crop_type  cycle  yield_kg
Cohort 4  Hydro Nuture Agri System  Greenhouse 1  Habanero   6      2024.0
Cohort 4  Green Lush(Mit)           Greenhouse 1  Habanero   6      2024.0
Cohort 4  Nexus Farm                Greenhouse 1  Habanero   6      2024.0
Cohort 4  Hydro Nuture Agri System  Greenhouse 1  Habanero   7      2024.0
Cohort 4  Green Lush(Mit)           Greenhouse 1  Habanero   7      2024.0
Cohort 4  Nexus Farm                Greenhouse 1  Habanero   7      2024.0
Cohort 4  Hydro Nuture Agri System  Greenhouse 1  Habanero   8      2024.0
Cohort 4  Green Lush(Mit)           Greenhouse 1  Habanero   8      2024.0
Cohort 4  Nexus Farm                Greenhouse 1  Habanero   8      2024.0
Cohort 6  Sunset Campo Farm Ltd     Greenhouse 2  Cucumber   4       794.3
Code
miss = df_py.groupby("Cohort").apply(
    lambda x: pd.Series({
        "Total"   : len(x),
        "Missing" : x["yield_kg"].isna().sum(),
        "Pct"     : round(100 * x["yield_kg"].isna().mean(), 1)
    })
).reset_index()

print("Issue 1: Missing yield values by cohort")
Issue 1: Missing yield values by cohort
Code
print(miss.to_string(index=False))
  Cohort  Total  Missing  Pct
Cohort 1  438.0    135.0 30.8
Cohort 2  153.0    103.0 67.3
Cohort 3   76.0     25.0 32.9
Cohort 4  256.0    206.0 80.5
Cohort 5  217.0     98.0 45.2
Cohort 6  320.0    152.0 47.5
Code
y     = df_py["yield_kg"].dropna()
q1    = y.quantile(0.25)
q3    = y.quantile(0.75)
fence = q3 + 1.5 * (q3 - q1)
out   = df_py[df_py["yield_kg"] > fence]

print(f"\nIssue 2: Outliers above {fence:.1f} kg")

Issue 2: Outliers above 209.2 kg
Code
print(f"Count: {len(out)} | Max: {y.max():.1f} kg")
Count: 86 | Max: 2024.0 kg
Code
print(
    out[["Cohort", "Company", "Greenhouse", "Crop Type", "Cycle", "yield_kg"]]
    .sort_values("yield_kg", ascending=False)
    .head(10)
    .to_string(index=False)
)
  Cohort                  Company   Greenhouse Crop Type  Cycle  yield_kg
Cohort 4 Hydro Nuture Agri System Greenhouse 1  Habanero      7    2024.0
Cohort 4          Green Lush(Mit) Greenhouse 1  Habanero      7    2024.0
Cohort 4               Nexus Farm Greenhouse 1  Habanero      7    2024.0
Cohort 4 Hydro Nuture Agri System Greenhouse 1  Habanero      8    2024.0
Cohort 4          Green Lush(Mit) Greenhouse 1  Habanero      8    2024.0
Cohort 4 Hydro Nuture Agri System Greenhouse 1  Habanero      6    2024.0
Cohort 4               Nexus Farm Greenhouse 1  Habanero      8    2024.0
Cohort 4          Green Lush(Mit) Greenhouse 1  Habanero      6    2024.0
Cohort 4               Nexus Farm Greenhouse 1  Habanero      6    2024.0
Cohort 6    Sunset Campo Farm Ltd Greenhouse 2  Cucumber      4     794.3

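Handling strategy: Missing yields are excluded from all analyses because they mostly represent incomplete harvest cycles rather than zero yields. Since some may also reflect crop failures (Section 3), excluding them biases the reported yields upward to the extent that failures are hidden among them. Outliers are retained as verified field records from high-performing companies. However, the nine largest values are all exactly 2,024 kg, recorded by three Cohort 4 companies for Habanero in GH1 across cycles 6 to 8; identical values that match the reporting year should be checked against the original harvest tracking sheets before they drive recommendations.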
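The upper fence used above is Q3 + 1.5 × IQR; with Q1 = 5.5 kg and Q3 = 87 kg this gives 87 + 1.5 × 81.5 = 209.25 kg, printed as 209.2. A minimal stdlib sketch on toy data (the helper names are hypothetical) that reproduces the fence with the linear-interpolation quantile R and pandas use by default, and also lists outlying values that repeat exactly, which is how records such as the nine identical 2,024 kg entries would surface for review.

```python
from collections import Counter


def quantile_linear(sorted_xs, p):
    """Linear-interpolation quantile (R type 7, the pandas default)."""
    h = (len(sorted_xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])


def flag_outliers(values, min_repeats=3):
    """Return the Tukey upper fence, the values above it, and repeated outliers."""
    xs = sorted(values)
    q1, q3 = quantile_linear(xs, 0.25), quantile_linear(xs, 0.75)
    fence = q3 + 1.5 * (q3 - q1)
    outliers = [v for v in values if v > fence]
    # Identical extreme values recorded several times are worth checking by hand
    repeats = {v: c for v, c in Counter(outliers).items() if c >= min_repeats}
    return fence, outliers, repeats


# Toy yields in kg, not EYIA records
toy = list(range(1, 17)) + [500, 500, 500, 300]
fence, outliers, repeats = flag_outliers(toy)
print(fence, outliers, repeats)  # 29.5 [500, 500, 500, 300] {500: 3}
```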

5.2 Yield Distribution

Code
library(patchwork)

df_y <- df |> filter(!is.na(yield_kg))

p_raw <- ggplot(df_y, aes(x = yield_kg)) +
  geom_histogram(bins = 40, fill = "#2E8B57", colour = "white", alpha = 0.85) +
  geom_vline(
    xintercept = median(df_y$yield_kg),
    colour = "firebrick", linetype = "dashed", linewidth = 0.9
  ) +
  annotate(
    "text",
    x      = median(df_y$yield_kg) + 50,
    y      = 90,
    label  = paste0("Median: ", round(median(df_y$yield_kg), 1), " kg"),
    colour = "firebrick", size = 3.5
  ) +
  labs(
    title    = "Raw Yield Distribution",
    subtitle = "Strongly right-skewed",
    x = "Realized Yield kg", y = "Count"
  ) +
  theme_minimal(base_size = 12)

# log10(0) is -Inf, so zero yields are excluded explicitly (as in the Python version)
p_log <- df_y |>
  filter(yield_kg > 0) |>
  ggplot(aes(x = yield_kg)) +
  geom_histogram(bins = 40, fill = "#4682B4", colour = "white", alpha = 0.85) +
  scale_x_log10() +
  labs(
    title    = "Yield on Log10 Scale",
    subtitle = "Near-normal after log transformation (zero yields excluded)",
    x = "Realized Yield log10 kg", y = "Count"
  ) +
  theme_minimal(base_size = 12)

p_raw + p_log

Code
import matplotlib.pyplot as plt

y_clean = df_py["yield_kg"].dropna()
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].hist(y_clean, bins=40, color="#2E8B57", edgecolor="white", alpha=0.85)
axes[0].axvline(y_clean.median(), color="firebrick", linestyle="--",
                linewidth=1.5, label=f"Median: {y_clean.median():.1f} kg")
axes[0].legend()
axes[0].set_title("Raw Yield Distribution")
axes[0].set_xlabel("Realized Yield kg")
axes[0].set_ylabel("Count")

axes[1].hist(np.log10(y_clean[y_clean > 0]), bins=40,
             color="#4682B4", edgecolor="white", alpha=0.85)
axes[1].set_title("Yield on Log10 Scale")
axes[1].set_xlabel("Log10 Realized Yield kg")
axes[1].set_ylabel("Count")

plt.tight_layout()
plt.show()
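The two panels compare skewness visually. A minimal sketch that quantifies it with the moment coefficient of skewness on simulated lognormal yields (illustrative only, not EYIA data). It uses log10(y + 1) rather than log10(y), an assumption of this sketch, so that zero yields, which a plain log10 scale turns into minus infinity and drops, stay in the transformed sample.

```python
import math
import random


def skewness(xs):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


random.seed(42)
# Simulated right-skewed yields in kg (NOT EYIA data), plus ten zero harvests
yields = [random.lognormvariate(3.5, 1.2) for _ in range(500)] + [0.0] * 10

raw_skew = skewness(yields)
# log10(y + 1) maps a zero yield to 0 instead of minus infinity
log_skew = skewness([math.log10(y + 1) for y in yields])
print(f"skewness: raw = {raw_skew:.2f}, log10(y + 1) = {log_skew:.2f}")
```

A value near 0 after transformation is what "near-normal" means in shape terms; the +1 offset slightly compresses very small yields, which matters little when the median yield is around 32 kg.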


6. Data Visualisation

Five plots tell one story: yield inequality in the EYIA programme is structured by greenhouse assignment and crop type, not by chance.

Code
library(patchwork)

df_y <- df |> filter(!is.na(yield_kg))

p1 <- df_y |>
  group_by(crop_type) |>
  summarise(mean_yield = mean(yield_kg), se = sd(yield_kg) / sqrt(n())) |>
  ggplot(aes(x = reorder(crop_type, mean_yield), y = mean_yield, fill = crop_type)) +
  geom_col(show.legend = FALSE, alpha = 0.85) +
  geom_errorbar(aes(ymin = mean_yield - se, ymax = mean_yield + se), width = 0.3) +
  coord_flip() +
  scale_fill_brewer(palette = "Set2") +
  labs(title    = "Plot 1: Average Yield by Crop Type",
       subtitle = "Habanero and Cucumber outperform leafy vegetables",
       x = NULL, y = "Mean Yield kg") +
  theme_minimal(base_size = 11)

p2 <- df_y |>
  filter(yield_kg > 0) |>  # zero yields cannot be drawn on a log scale
  ggplot(aes(x = greenhouse, y = yield_kg, fill = greenhouse)) +
  geom_boxplot(outlier.alpha = 0.3, show.legend = FALSE) +
  scale_y_log10() +
  scale_fill_brewer(palette = "Set1") +
  labs(title    = "Plot 2: Yield by Greenhouse Log Scale",
       subtitle = "GH1 highest; GH4 consistently lowest",
       x = NULL, y = "Yield kg log scale") +
  theme_minimal(base_size = 11)

p3 <- df_y |>
  group_by(cohort, greenhouse) |>
  summarise(mean_yield = mean(yield_kg), .groups = "drop") |>
  ggplot(aes(x = greenhouse, y = cohort, fill = mean_yield)) +
  geom_tile(colour = "white", linewidth = 0.5) +
  geom_text(aes(label = round(mean_yield, 0)), size = 3.5) +
  scale_fill_gradient(low = "#fff7bc", high = "#2E8B57", name = "Mean Yield kg") +
  labs(title    = "Plot 3: Mean Yield Heatmap Cohort x Greenhouse",
       subtitle = "Cohort 4 x GH1 is the top-performing combination",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

p4 <- df_y |>
  group_by(cycle) |>
  summarise(mean_yield = mean(yield_kg), n = n()) |>
  ggplot(aes(x = factor(cycle), y = mean_yield)) +
  geom_col(fill = "#4682B4", alpha = 0.85) +
  geom_text(aes(label = paste0("n=", n)), vjust = -0.4, size = 3) +
  labs(title    = "Plot 4: Mean Yield by Growing Cycle",
       subtitle = "No consistent improvement by cycle alone",
       x = "Growing Cycle", y = "Mean Yield kg") +
  theme_minimal(base_size = 11)

p5 <- df_y |>
  filter(crop_type %in% c("Bell Peppers", "Tomatoes",
                          "Habanero", "Cucumber", "Lettuce")) |>
  group_by(greenhouse, crop_type) |>
  summarise(mean_yield = mean(yield_kg), .groups = "drop") |>
  ggplot(aes(x = greenhouse, y = mean_yield,
             colour = crop_type, group = crop_type)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  scale_colour_brewer(palette = "Dark2", name = "Crop") +
  labs(title    = "Plot 5: Crop x Greenhouse Interaction",
       subtitle = "Habanero dominates in GH1 but drops sharply elsewhere",
       x = NULL, y = "Mean Yield kg") +
  theme_minimal(base_size = 11)

(p1 + p2) / p3 / (p4 + p5) +
  plot_annotation(
    title    = "EYIA 2024 Yield Performance Dashboard",
    subtitle = "Greenhouse and crop type are the primary yield determinants",
    theme    = theme(
      plot.title    = element_text(size = 14, face = "bold"),
      plot.subtitle = element_text(size = 11, colour = "grey40")
    )
  )

Code
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns

df_v       = df_py[df_py["yield_kg"].notna()].copy()
main_crops = ["Bell Peppers", "Tomatoes", "Habanero", "Cucumber", "Lettuce"]
df_v5      = df_v[df_v["Crop Type"].isin(main_crops)]

fig = plt.figure(figsize=(14, 20))
gs  = gridspec.GridSpec(3, 2, figure=fig, hspace=0.5, wspace=0.4)

ax1 = fig.add_subplot(gs[0, 0])
crop_avg = df_v.groupby("Crop Type")["yield_kg"].mean().sort_values()
ax1.barh(crop_avg.index, crop_avg.values,
         color=sns.color_palette("Set2", len(crop_avg)))
ax1.set_title("Plot 1: Avg Yield by Crop Type", fontweight="bold")
ax1.set_xlabel("Mean Yield kg")

ax2    = fig.add_subplot(gs[0, 1])
groups = [df_v[df_v["Greenhouse"] == g]["yield_kg"].dropna()
          for g in sorted(df_v["Greenhouse"].unique())]
labels = sorted(df_v["Greenhouse"].unique())
ax2.boxplot(groups, labels=labels, patch_artist=True)
Code
ax2.set_yscale("log")
ax2.set_title("Plot 2: Yield by Greenhouse Log", fontweight="bold")
ax2.tick_params(axis="x", rotation=20)

ax3  = fig.add_subplot(gs[1, :])
heat = df_v.groupby(["Cohort", "Greenhouse"])["yield_kg"].mean().unstack().round(0)
sns.heatmap(heat, ax=ax3, annot=True, fmt=".0f",
            cmap="YlGn", linewidths=0.5,
            cbar_kws={"label": "Mean Yield kg"})
ax3.set_title("Plot 3: Mean Yield Heatmap Cohort x Greenhouse", fontweight="bold")

ax4       = fig.add_subplot(gs[2, 0])
cycle_avg = df_v.groupby("Cycle")["yield_kg"].mean()
ax4.bar(cycle_avg.index.astype(str), cycle_avg.values,
        color="#4682B4", edgecolor="white", alpha=0.85)
ax4.set_title("Plot 4: Mean Yield by Growing Cycle", fontweight="bold")
ax4.set_xlabel("Growing Cycle")
ax4.set_ylabel("Mean Yield kg")

ax5      = fig.add_subplot(gs[2, 1])
int_data = df_v5.groupby(["Greenhouse", "Crop Type"])["yield_kg"].mean().reset_index()
for crop, grp in int_data.groupby("Crop Type"):
    ax5.plot(grp["Greenhouse"], grp["yield_kg"],
             marker="o", label=crop, linewidth=1.8)
ax5.legend(fontsize=8)
ax5.set_title("Plot 5: Crop x Greenhouse Interaction", fontweight="bold")
ax5.set_ylabel("Mean Yield kg")
ax5.tick_params(axis="x", rotation=25)

fig.suptitle("EYIA 2024 Yield Performance Dashboard",
             fontsize=14, fontweight="bold")
plt.tight_layout()
plt.show()

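Narrative: Plot 1 shows Habanero and Cucumber with mean yields 2 to 5 times those of leafy vegetables; because means are sensitive to extreme values, these gaps should be read alongside the rank-based tests in Section 7. Plot 2 confirms GH1 as the highest-performing greenhouse. Plot 3 shows Cohort 4 x GH1 as the single best combination, although all nine 2,024 kg Habanero records listed in Section 5.1 fall in that cell. Plot 4 shows that cycle number alone does not consistently improve yield. Plot 5 reveals the critical interaction: Habanero excels in GH1 but drops sharply in the other greenhouses, so the best greenhouse assignment depends on the crop.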
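Plot 1 ranks crops by mean yield, which a small number of extreme records can dominate. A minimal sketch with made-up numbers (not EYIA records) showing how three extreme values let a mostly low-yielding crop lead on the mean while trailing on the median; printing both statistics per crop is a quick check before acting on a bar chart of means.

```python
from statistics import mean, median

# Made-up yields in kg, NOT EYIA data
yields = {
    "crop_a": [5.0] * 20 + [2024.0] * 3,  # mostly low, three extreme records
    "crop_b": [60.0] * 23,                # consistently moderate
}

for crop, ys in yields.items():
    print(f"{crop}: n={len(ys)}  mean={mean(ys):.1f}  median={median(ys):.1f}")
# crop_a: n=23  mean=268.3  median=5.0
# crop_b: n=23  mean=60.0  median=60.0
```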


7. Hypothesis Testing

7.1 Hypothesis A: Does yield differ significantly across greenhouses?

H0: Yield has the same distribution in all four greenhouses.

H1: Yields in at least one greenhouse tend to be systematically higher or lower than in another.

Test: Kruskal-Wallis, a rank-based non-parametric test that compares mean ranks rather than means, appropriate because yield is strongly right-skewed.
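Kruskal-Wallis replaces each yield by its rank in the pooled sample and asks whether the groups' rank sums differ more than chance would allow. A minimal stdlib sketch of the tie-corrected H statistic and of the rank-based effect size eta-squared[H] = (H - k + 1)/(n - k), the same formula the Python code in this section uses. The function names are hypothetical, not scipy or rstatix APIs.

```python
from collections import Counter


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 0-based positions -> 1-based mean rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of numeric samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        rank_term += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Divide by 1 - sum(t^3 - t) / (n^3 - n), where t is the size of each tie group
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def eta_squared_h(h, k, n):
    """Rank-based effect size: (H - k + 1) / (n - k)."""
    return (h - k + 1) / (n - k)


print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]))  # 27/7, about 3.857
# The H statistics reported in this section reproduce the printed effect sizes
print(round(eta_squared_h(21.5098, 4, 741), 4))   # greenhouse: 0.0251
print(round(eta_squared_h(145.7888, 5, 739), 4))  # crop type: 0.1932
```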

Code
library(rstatix)

df_y <- df |> filter(!is.na(yield_kg))

set.seed(42)
sw <- shapiro.test(sample(df_y$yield_kg, 200))
cat("Shapiro-Wilk: W =", round(sw$statistic, 4),
    " p =", format(sw$p.value, scientific = TRUE), "\n")
Shapiro-Wilk: W = 0.4531  p = 2.079042e-24 
Code
cat("Normality rejected. Kruskal-Wallis is appropriate.\n\n")
Normality rejected. Kruskal-Wallis is appropriate.
Code
kw_gh <- kruskal.test(yield_kg ~ greenhouse, data = df_y)
print(kw_gh)

    Kruskal-Wallis rank sum test

data:  yield_kg by greenhouse
Kruskal-Wallis chi-squared = 21.51, df = 3, p-value = 8.249e-05
Code
print(kruskal_effsize(df_y, yield_kg ~ greenhouse))
# A tibble: 1 × 5
  .y.          n effsize method  magnitude
* <chr>    <int>   <dbl> <chr>   <ord>    
1 yield_kg   741  0.0251 eta2[H] small    
Code
print(dunn_test(df_y, yield_kg ~ greenhouse,
                p.adjust.method = "bonferroni"))
# A tibble: 6 × 9
  .y.      group1      group2    n1    n2 statistic       p   p.adj p.adj.signif
* <chr>    <chr>       <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
1 yield_kg Greenhouse… Green…   206   182    -1.96  4.95e-2 2.97e-1 ns          
2 yield_kg Greenhouse… Green…   206   157    -2.29  2.22e-2 1.33e-1 ns          
3 yield_kg Greenhouse… Green…   206   196    -4.62  3.82e-6 2.29e-5 ****        
4 yield_kg Greenhouse… Green…   182   157    -0.390 6.96e-1 1   e+0 ns          
5 yield_kg Greenhouse… Green…   182   196    -2.54  1.12e-2 6.69e-2 ns          
6 yield_kg Greenhouse… Green…   157   196    -2.04  4.11e-2 2.47e-1 ns          
Code
from scipy import stats
from itertools import combinations

df_h   = df_py[df_py["yield_kg"].notna()]
groups = [grp["yield_kg"].values for _, grp in df_h.groupby("Greenhouse")]

stat, pval = stats.kruskal(*groups)
n  = df_h["yield_kg"].notna().sum()
k  = df_h["Greenhouse"].nunique()
es = (stat - k + 1) / (n - k)

print(f"Kruskal-Wallis H = {stat:.4f}  p = {pval:.4e}")
Kruskal-Wallis H = 21.5098  p = 8.2492e-05
Code
print(f"Eta-squared effect size = {es:.4f}\n")
Eta-squared effect size = 0.0251
Code
pairs = list(combinations(df_h["Greenhouse"].unique(), 2))
print("Pairwise Mann-Whitney Bonferroni corrected:")
Pairwise Mann-Whitney Bonferroni corrected:
Code
for g1, g2 in pairs:
    a = df_h[df_h["Greenhouse"] == g1]["yield_kg"]
    b = df_h[df_h["Greenhouse"] == g2]["yield_kg"]
    _, p  = stats.mannwhitneyu(a, b, alternative="two-sided")
    padj  = min(p * len(pairs), 1.0)
    stars = "***" if padj < 0.001 else "**" if padj < 0.01 else "*" if padj < 0.05 else "ns"
    print(f"  {g1} vs {g2}: p_adj = {padj:.4f} {stars}")
  Greenhouse 4 vs Greenhouse 3: p_adj = 0.8673 ns
  Greenhouse 4 vs Greenhouse 2: p_adj = 0.1314 ns
  Greenhouse 4 vs Greenhouse 1: p_adj = 0.0000 ***
  Greenhouse 3 vs Greenhouse 2: p_adj = 1.0000 ns
  Greenhouse 3 vs Greenhouse 1: p_adj = 0.3827 ns
  Greenhouse 2 vs Greenhouse 1: p_adj = 0.6386 ns

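Business interpretation: H0 is rejected (p < 0.001), so greenhouse assignment is associated with harvest outcomes. The effect is small, however (eta-squared = 0.025), and after Bonferroni correction only the GH1 versus GH4 contrast is significant. The EYIA programme should still treat greenhouse allocation as a strategic decision, particularly for high-value crops like Habanero and Cucumber.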


7.2 Hypothesis B: Does yield differ significantly across crop types?

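H0: Yield has the same distribution across all crop types.

H1: Yields for at least one crop type tend to be systematically higher or lower than for another.

Test: Kruskal-Wallis plus post-hoc Dunn test with Bonferroni correction, restricted to the five main crops. Kale and Ugu are excluded because together they contribute only two of the 741 yield observations.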
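Dunn's test compares mean ranks taken from the single pooled Kruskal-Wallis ranking, whereas the pairwise Mann-Whitney loops in the Python code of this section re-rank each pair separately, so their adjusted p-values need not match those of the R Dunn test. A minimal stdlib sketch of Dunn's z-statistic with the usual tie adjustment and a Bonferroni correction. The function names are hypothetical, and the sign convention (positive when the second group ranks higher) is this sketch's own choice.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_test(groups):
    """groups: dict of name -> values. Returns (a, b, z, p, p_bonferroni) per pair."""
    names = list(groups)
    pooled = [v for name in names for v in groups[name]]
    n = len(pooled)
    ranks = average_ranks(pooled)  # one ranking shared by every comparison
    mean_rank, start = {}, 0
    for name in names:
        m = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + m]) / m
        start += m
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    base_var = n * (n + 1) / 12 - ties / (12 * (n - 1))
    pairs = list(combinations(names, 2))
    results = []
    for a, b in pairs:
        se = math.sqrt(base_var * (1 / len(groups[a]) + 1 / len(groups[b])))
        z = (mean_rank[b] - mean_rank[a]) / se  # positive if b ranks higher
        p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
        results.append((a, b, z, p, min(p * len(pairs), 1.0)))
    return results


for a, b, z, p, p_adj in dunn_test({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}):
    print(f"{a} vs {b}: z = {z:.3f}, p_adj = {p_adj:.4f}")
```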

Code
df_y5 <- df |>
  filter(!is.na(yield_kg)) |>
  filter(crop_type %in% c("Bell Peppers", "Tomatoes",
                          "Habanero", "Cucumber", "Lettuce"))

kw_crop <- kruskal.test(yield_kg ~ crop_type, data = df_y5)
print(kw_crop)

    Kruskal-Wallis rank sum test

data:  yield_kg by crop_type
Kruskal-Wallis chi-squared = 145.79, df = 4, p-value < 2.2e-16
Code
print(kruskal_effsize(df_y5, yield_kg ~ crop_type))
# A tibble: 1 × 5
  .y.          n effsize method  magnitude
* <chr>    <int>   <dbl> <chr>   <ord>    
1 yield_kg   739   0.193 eta2[H] large    
Code
print(dunn_test(df_y5, yield_kg ~ crop_type,
                p.adjust.method = "bonferroni"))
# A tibble: 10 × 9
   .y.      group1   group2    n1    n2 statistic        p    p.adj p.adj.signif
 * <chr>    <chr>    <chr>  <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
 1 yield_kg Bell Pe… Cucum…   139   104      5.95 2.73e- 9 2.73e- 8 ****        
 2 yield_kg Bell Pe… Haban…   139   133     -5.32 1.05e- 7 1.05e- 6 ****        
 3 yield_kg Bell Pe… Lettu…   139   127     -4.24 2.20e- 5 2.20e- 4 ***         
 4 yield_kg Bell Pe… Tomat…   139   236     -2.88 3.97e- 3 3.97e- 2 *           
 5 yield_kg Cucumber Haban…   104   133    -10.8  2.80e-27 2.80e-26 ****        
 6 yield_kg Cucumber Lettu…   104   127     -9.77 1.53e-22 1.53e-21 ****        
 7 yield_kg Cucumber Tomat…   104   236     -9.17 4.82e-20 4.82e-19 ****        
 8 yield_kg Habanero Lettu…   133   127      1.00 3.17e- 1 1   e+ 0 ns          
 9 yield_kg Habanero Tomat…   133   236      3.11 1.88e- 3 1.88e- 2 *           
10 yield_kg Lettuce  Tomat…   127   236      1.93 5.30e- 2 5.30e- 1 ns          
Code
main_crops  = ["Bell Peppers", "Tomatoes", "Habanero", "Cucumber", "Lettuce"]
df_c        = df_h[df_h["Crop Type"].isin(main_crops)]
crop_groups = [grp["yield_kg"].values for _, grp in df_c.groupby("Crop Type")]

stat2, p2 = stats.kruskal(*crop_groups)
n2  = df_c["yield_kg"].notna().sum()
k2  = df_c["Crop Type"].nunique()
es2 = (stat2 - k2 + 1) / (n2 - k2)

print(f"Kruskal-Wallis H = {stat2:.4f}  p = {p2:.4e}")
Kruskal-Wallis H = 145.7888  p = 1.6255e-30
Code
print(f"Eta-squared = {es2:.4f}\n")
Eta-squared = 0.1932
Code
cpairs = list(combinations(df_c["Crop Type"].unique(), 2))
print("Pairwise Mann-Whitney Bonferroni corrected:")
Pairwise Mann-Whitney Bonferroni corrected:
Code
for c1, c2 in cpairs:
    a = df_c[df_c["Crop Type"] == c1]["yield_kg"]
    b = df_c[df_c["Crop Type"] == c2]["yield_kg"]
    _, p  = stats.mannwhitneyu(a, b, alternative="two-sided")
    padj  = min(p * len(cpairs), 1.0)
    stars = "***" if padj < 0.001 else "**" if padj < 0.01 else "*" if padj < 0.05 else "ns"
    print(f"  {c1} vs {c2}: p_adj = {padj:.4f} {stars}")
  Lettuce vs Bell Peppers: p_adj = 0.0000 ***
  Lettuce vs Tomatoes: p_adj = 0.1452 ns
  Lettuce vs Cucumber: p_adj = 0.0000 ***
  Lettuce vs Habanero: p_adj = 0.2843 ns
  Bell Peppers vs Tomatoes: p_adj = 0.0033 **
  Bell Peppers vs Cucumber: p_adj = 0.0000 ***
  Bell Peppers vs Habanero: p_adj = 0.0000 ***
  Tomatoes vs Cucumber: p_adj = 0.0000 ***
  Tomatoes vs Habanero: p_adj = 0.0021 **
  Cucumber vs Habanero: p_adj = 0.0000 ***

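Business interpretation: H0 is rejected (p < 2.2e-16), and crop type has a large association with yield (eta-squared = 0.193); 8 of the 10 pairwise Dunn comparisons remain significant after Bonferroni correction. New EYIA cohort entrants should therefore receive data-driven crop selection guidance rather than choosing based on preference alone.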


8. Correlation Analysis

Code
library(corrplot)

df_corr <- df |>
  filter(!is.na(yield_kg)) |>
  filter(crop_type %in% c("Bell Peppers", "Tomatoes",
                          "Habanero", "Cucumber", "Lettuce")) |>
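  mutate(
    # Integer codes follow factor-level (alphabetical) order; they are labels,
    # not measurements, so their correlations depend on that arbitrary order
    gh_num   = as.numeric(greenhouse),
    crop_num = as.numeric(crop_type)
  ) |>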
  select(yield_kg, cycle, seedlings_num, gh_num, crop_num)

df_corr <- df_corr[complete.cases(df_corr), ]

corr_mat <- cor(df_corr, method = "spearman")
print(round(corr_mat, 3))
              yield_kg  cycle seedlings_num gh_num crop_num
yield_kg         1.000  0.095         0.055 -0.194   -0.059
cycle            0.095  1.000        -0.176 -0.110   -0.252
seedlings_num    0.055 -0.176         1.000  0.176    0.241
gh_num          -0.194 -0.110         0.176  1.000    0.191
crop_num        -0.059 -0.252         0.241  0.191    1.000
Code
corrplot(
  corr_mat,
  method      = "color",
  type        = "upper",
  addCoef.col = "black",
  tl.col      = "black",
  tl.srt      = 45,
  col         = colorRampPalette(c("#d73027", "white", "#1a9850"))(200),
  title       = "Spearman Correlation Matrix EYIA Yield Variables",
  mar         = c(0, 0, 2, 0)
)

Code
import seaborn as sns
import matplotlib.pyplot as plt

main_crops = ["Bell Peppers", "Tomatoes", "Habanero", "Cucumber", "Lettuce"]
df_cp      = df_h[df_h["Crop Type"].isin(main_crops)].copy()
df_cp["gh_num"]   = df_cp["Greenhouse"].astype("category").cat.codes
df_cp["crop_num"] = df_cp["Crop Type"].astype("category").cat.codes

cols     = ["yield_kg", "Cycle", "Seedlings_num", "gh_num", "crop_num"]
corr_mat = df_cp[cols].dropna().corr(method="spearman")

fig, ax = plt.subplots(figsize=(7, 6))
sns.heatmap(
    corr_mat,
    annot       = True,
    fmt         = ".3f",
    cmap        = "RdYlGn",
    center      = 0,
    linewidths  = 0.5,
    ax          = ax,
    xticklabels = ["Yield", "Cycle", "Seedlings", "GH", "Crop"],
    yticklabels = ["Yield", "Cycle", "Seedlings", "GH", "Crop"]
)
ax.set_title("Spearman Correlation Matrix EYIA Yield Variables",
             fontweight="bold")
plt.tight_layout()
plt.show()

Code
print(corr_mat.round(3).to_string())
               yield_kg  Cycle  Seedlings_num  gh_num  crop_num
yield_kg          1.000  0.095          0.055  -0.194    -0.059
Cycle             0.095  1.000         -0.176  -0.110    -0.252
Seedlings_num     0.055 -0.176          1.000   0.176     0.241
gh_num           -0.194 -0.110          0.176   1.000     0.191
crop_num         -0.059 -0.252          0.241   0.191     1.000

Key correlations and business implications:

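  1. Greenhouse vs Yield is the strongest association (rho = -0.194). Yield tends to fall as greenhouse number rises, so greenhouse assignment is meaningfully associated with yield. Pairing high-value crops with GH1 and GH2 should be a deliberate decision, not a default allocation.

  2. Cycle vs Yield is the next strongest (rho = 0.095). The small positive association suggests a modest learning effect over time. Structural constraints around greenhouse and crop type need to be addressed alongside coaching, because coaching alone is unlikely to move the needle.

  3. Crop type vs Yield (rho = -0.059) and Seedlings vs Yield (rho = 0.055) are close to zero. For crop type this does not mean crop choice is unimportant: crop is a nominal variable whose numeric codes follow an arbitrary order, so a rank correlation cannot capture its effect. The Kruskal-Wallis and pairwise tests are the appropriate evidence here, and they show that crop type matters. Crop selection guidance therefore remains a high-leverage intervention.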

Note: These are associational findings. Because crop types are not randomly assigned to greenhouses, some confounding exists. The regression model below attempts to statistically separate these effects.
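A small simulation shows why the crop-type correlation above understates crop effects. The sketch below uses hypothetical data (the crop medians, noise level and sample size are assumptions, not EYIA figures). The same crop effect gives a rank correlation near zero under alphabetical codes, which `cat.codes` and `as.numeric()` on a default R factor typically produce, and a strong one under a different label order. The Kruskal-Wallis H statistic and its epsilon-squared effect size, epsilon^2 = H / (n - 1), do not depend on label order.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical crop medians on the log scale: a real crop effect with no natural order
centre = {"Bell Peppers": 3.0, "Cucumber": 5.0, "Habanero": 3.5, "Lettuce": 2.0, "Tomatoes": 4.5}
crops = np.repeat(list(centre), 40)  # 40 simulated records per crop
log_yield = np.array([centre[c] for c in crops]) + rng.normal(0, 0.8, crops.size)

def encode(order):
    """Integer-encode crop labels following an arbitrary label order."""
    lookup = {name: i for i, name in enumerate(order)}
    return np.array([lookup[c] for c in crops])

alphabetical = sorted(centre)  # the order cat.codes / a default R factor would use
reordered = ["Lettuce", "Bell Peppers", "Habanero", "Tomatoes", "Cucumber"]

rho_alpha, _ = stats.spearmanr(encode(alphabetical), log_yield)
rho_reord, _ = stats.spearmanr(encode(reordered), log_yield)

# Order-free alternative: Kruskal-Wallis H with an epsilon-squared effect size
h_stat, p_value = stats.kruskal(*[log_yield[crops == c] for c in alphabetical])
epsilon_sq = h_stat / (crops.size - 1)

print(f"Spearman rho, alphabetical codes: {rho_alpha:.3f}")
print(f"Spearman rho, reordered codes:    {rho_reord:.3f}")
print(f"Kruskal-Wallis H = {h_stat:.1f}, p = {p_value:.2g}, epsilon^2 = {epsilon_sq:.2f}")
```

The choice of codes only changes the correlation, not the data. This is why group-comparison tests, rather than correlations on label-encoded categories, are the safer summary for crop type.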


9. Linear Regression

9.1 Model Specification

Log-transformed yield is regressed on cycle number, seedling count, greenhouse, and crop type. The log transformation addresses the right-skewed outcome and stabilises variance. Reference categories are Greenhouse 1 and Bell Peppers.
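The transformation can be sketched in Python with `np.log1p`, which computes log(y + 1) directly and stays finite for zero harvests, and `np.expm1`, its inverse. The yields below are simulated from an assumed log-normal distribution only to show the drop in skewness; they are not EYIA data. One caution for reporting: expm1 of the mean log value is the geometric mean of (yield + 1) minus 1, which sits below the arithmetic mean yield in kg.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
# Hypothetical right-skewed yields in kg, plus a few zero harvests (assumed, not EYIA data)
yield_kg = np.concatenate([rng.lognormal(mean=4.0, sigma=1.2, size=500), np.zeros(20)])

log_yield = np.log1p(yield_kg)   # same as np.log(yield_kg + 1), finite at 0 kg
recovered = np.expm1(log_yield)  # inverse transform back to kg

skew_raw = stats.skew(yield_kg)
skew_log = stats.skew(log_yield)
print(f"skewness raw: {skew_raw:.2f}  skewness log1p: {skew_log:.2f}")

# Back-transforming the mean on the log scale gives the geometric mean of (yield + 1), minus 1
geo_mean_kg = np.expm1(log_yield.mean())
print(f"arithmetic mean: {yield_kg.mean():.1f} kg  back-transformed log mean: {geo_mean_kg:.1f} kg")
```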

Code
library(dplyr)
library(broom)
library(knitr)
library(kableExtra)

df_reg <- df |>
  filter(!is.na(yield_kg)) |>
  filter(crop_type %in% c("Bell Peppers", "Tomatoes",
                          "Habanero", "Cucumber", "Lettuce")) |>
  filter(!is.na(seedlings_num)) |>
  mutate(
    log_yield  = log(yield_kg + 1),
    greenhouse = relevel(greenhouse, ref = "Greenhouse 1"),
    crop_type  = relevel(crop_type,  ref = "Bell Peppers")
  )

model <- lm(
  log_yield ~ cycle + seedlings_num + greenhouse + crop_type,
  data = df_reg
)

tidy(model, conf.int = TRUE) |>
  mutate(across(where(is.numeric), ~ round(.x, 4))) |>
  kable(caption = "OLS Regression Coefficients: Dependent variable log Yield plus 1") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
OLS Regression Coefficients: Dependent variable log Yield plus 1
term                     estimate  std.error  statistic  p.value  conf.low  conf.high
(Intercept)                3.1233     0.2418    12.9192   0.0000    2.6486     3.5981
cycle                      0.1890     0.0423     4.4646   0.0000    0.1059     0.2721
seedlings_num             -0.0005     0.0002    -1.8611   0.0632   -0.0009     0.0000
greenhouseGreenhouse 2    -0.9664     0.2140    -4.5163   0.0000   -1.3867    -0.5462
greenhouseGreenhouse 3    -1.1793     0.2178    -5.4152   0.0000   -1.6070    -0.7517
greenhouseGreenhouse 4    -0.1736     0.2370    -0.7325   0.4641   -0.6390     0.2918
crop_typeHabanero         -0.8215     0.2819    -2.9143   0.0037   -1.3750    -0.2679
crop_typeLettuce               NA         NA         NA       NA        NA         NA
crop_typeTomatoes          0.1374     0.2049     0.6706   0.5027   -0.2650     0.5398
Code
glance(model) |>
  select(r.squared, adj.r.squared, statistic, p.value, df, nobs) |>
  mutate(across(where(is.numeric), ~ round(.x, 4))) |>
  kable(caption = "Model Fit Statistics") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)
Model Fit Statistics
r.squared  adj.r.squared  statistic  p.value  df  nobs
    0.126         0.1163    12.9183        0   7   635
Code
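import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Use the same sample as the R model: main crops with an observed yield and seedling count
df_reg_py = df_h[
    df_h["Crop Type"].isin(main_crops) &
    df_h["Seedlings_num"].notna() &
    df_h["yield_kg"].notna()
].copy()

df_reg_py["log_yield"]  = np.log1p(df_reg_py["yield_kg"])  # log(yield_kg + 1)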
df_reg_py["Greenhouse"] = pd.Categorical(
    df_reg_py["Greenhouse"],
    categories=["Greenhouse 1", "Greenhouse 2",
                "Greenhouse 3", "Greenhouse 4"]
)
df_reg_py["Crop_Type"] = pd.Categorical(
    df_reg_py["Crop Type"],
    categories=["Bell Peppers", "Tomatoes", "Habanero",
                "Cucumber", "Lettuce"]
)

formula  = "log_yield ~ Cycle + Seedlings_num + C(Greenhouse) + C(Crop_Type)"
model_py = smf.ols(formula, data=df_reg_py).fit()
print(model_py.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:              log_yield   R-squared:                       0.126
Model:                            OLS   Adj. R-squared:                  0.116
Method:                 Least Squares   F-statistic:                     12.92
Date:                Sun, 10 May 2026   Prob (F-statistic):           1.44e-15
Time:                        19:56:40   Log-Likelihood:                -1218.9
No. Observations:                 635   AIC:                             2454.
Df Residuals:                     627   BIC:                             2489.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
=================================================================================================
                                    coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------
Intercept                         2.8287      0.216     13.076      0.000       2.404       3.253
C(Greenhouse)[T.Greenhouse 2]    -0.9664      0.214     -4.516      0.000      -1.387      -0.546
C(Greenhouse)[T.Greenhouse 3]    -1.1793      0.218     -5.415      0.000      -1.607      -0.752
C(Greenhouse)[T.Greenhouse 4]    -0.1736      0.237     -0.732      0.464      -0.639       0.292
C(Crop_Type)[T.Tomatoes]          0.1374      0.205      0.671      0.503      -0.265       0.540
C(Crop_Type)[T.Habanero]         -0.6657      0.271     -2.460      0.014      -1.197      -0.134
C(Crop_Type)[T.Cucumber]      -8.278e-18   7.14e-17     -0.116      0.908   -1.48e-16    1.32e-16
C(Crop_Type)[T.Lettuce]          -0.9680      0.092    -10.542      0.000      -1.148      -0.788
Cycle                             0.1890      0.042      4.465      0.000       0.106       0.272
Seedlings_num                     0.0004      0.000      2.015      0.044    9.92e-06       0.001
==============================================================================
Omnibus:                       18.565   Durbin-Watson:                   1.499
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               17.060
Skew:                          -0.348   Prob(JB):                     0.000197
Kurtosis:                       2.598   Cond. No.                     8.20e+19
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.98e-32. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
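The note above flags a singular design, and the R table shows the same symptom as an NA coefficient. Below is a minimal sketch of one way this can arise. It assumes, as the absent Cucumber row in the R output suggests, that a declared crop category has no rows left after filtering. Treatment coding then produces an all-zero dummy column, so the design has fewer independent columns than coefficients. The crop list and sample are hypothetical, and `design_rank_report` is an illustrative helper, not part of statsmodels.

```python
import numpy as np
import pandas as pd

def design_rank_report(crop: pd.Series) -> tuple:
    """Return (rank, n_columns) of an intercept + treatment-coded crop design matrix."""
    # For a categorical Series, get_dummies creates a column for every declared category
    dummies = pd.get_dummies(crop, drop_first=True, dtype=float)
    X = np.column_stack([np.ones(len(crop)), dummies.to_numpy()])
    return int(np.linalg.matrix_rank(X)), X.shape[1]

levels = ["Bell Peppers", "Tomatoes", "Habanero", "Cucumber", "Lettuce"]
# Hypothetical filtered sample in which no Cucumber record survives
observed = ["Bell Peppers", "Tomatoes", "Habanero", "Lettuce"] * 5
crop = pd.Series(pd.Categorical(observed, categories=levels))

before = design_rank_report(crop)                              # one redundant column
after = design_rank_report(crop.cat.remove_unused_categories())  # full rank
print("before:", before, "after:", after)
```

Dropping unused categories removes this particular redundancy. Any remaining aliasing, such as the NA Lettuce term in R, needs to be traced to its own cause before the crop coefficients are interpreted.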

9.2 Diagnostic Plots

Code
par(mfrow = c(2, 2))
plot(model, which = 1:4, col = "#2E8B57", pch = 16, cex = 0.6)

Code
par(mfrow = c(1, 1))
Code
import scipy.stats as scipy_stats
import matplotlib.pyplot as plt

fitted = model_py.fittedvalues
resid  = model_py.resid

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].scatter(fitted, resid, alpha=0.4, color="#2E8B57", s=20)
axes[0].axhline(0, color="red", linestyle="--")
axes[0].set_title("Residuals vs Fitted")
axes[0].set_xlabel("Fitted Values")
axes[0].set_ylabel("Residuals")
scipy_stats.probplot(resid, plot=axes[1])
axes[1].set_title("Q-Q Plot of Residuals")
plt.tight_layout()
plt.show()
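The diagnostic plots can be complemented by a numeric check for non-constant residual variance. The sketch below runs statsmodels' Breusch-Pagan test (`het_breuschpagan`) on simulated data whose noise is deliberately made to grow with the predictor. It then refits with heteroskedasticity-robust (HC3) standard errors. The data and variable names are assumptions for illustration. Applied to this report, the same test would take `model_py.resid` and `model_py.model.exog`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(11)
n = 500
x = rng.uniform(0, 3, n)
# Simulated outcome whose noise standard deviation rises with x (heteroskedastic by design)
y = 1.0 + 0.5 * x + rng.normal(0, 0.2 + x, n)
data = pd.DataFrame({"x": x, "y": y})

fit = smf.ols("y ~ x", data=data).fit()

# Breusch-Pagan: regress squared residuals on the model's regressors (exog includes the intercept)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, fit.model.exog)
print(f"Breusch-Pagan LM = {lm_stat:.1f}, p = {lm_pvalue:.2g}")

# A common remedy: keep the OLS estimates but use heteroskedasticity-robust standard errors
fit_hc3 = smf.ols("y ~ x", data=data).fit(cov_type="HC3")
print("classical SE:", fit.bse.round(3).to_dict())
print("HC3 SE:      ", fit_hc3.bse.round(3).to_dict())
```

HC3 changes only the standard errors and p-values; the coefficient estimates are identical to the classical fit.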

9.3 Coefficient Interpretation

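The design matrix is rank-deficient: the Python output reports a near-zero eigenvalue, and R returns an NA Lettuce term. Only terms on which the R and Python fits agree are interpreted with confidence. Coefficients are on the log(yield + 1) scale, so exp(b) - 1 gives the approximate proportional change.

Predictor Direction Business Meaning
Cycle Positive (0.189, p < 0.001) Each additional cycle is associated with roughly 21 percent higher yield, holding greenhouse and crop constant. Coaches can expect measurable improvement over successive cycles.
Greenhouse 2 vs GH1 Negative (-0.966, p < 0.001) GH2 yields about 62 percent less than GH1 holding crop constant, consistent with a structural infrastructure advantage for GH1.
Greenhouse 3 vs GH1 Negative (-1.179, p < 0.001) GH3 is the weakest unit, about 69 percent below GH1.
Greenhouse 4 vs GH1 Negative, not significant (-0.174, p = 0.464) Once crop and cycle are controlled for, GH4 cannot be distinguished from GH1; the raw GH1 vs GH4 gap may be partly explained by crop mix.
Habanero vs Bell Peppers Negative (R: -0.822, p = 0.004; Python: -0.666, p = 0.014) The adjusted estimate does not show Habanero out-yielding Bell Peppers, and the two fits disagree on its size, so this term is unstable.
Cucumber vs Bell Peppers Not estimated The regression sample appears to contain no Cucumber observations (R omits the term; Python returns a coefficient of effectively zero), so the model says nothing about Cucumber.
Lettuce vs Bell Peppers Not identifiable R drops the term as aliased (NA); the Python value (-0.968) comes from a singular design and should not be interpreted.
Seedlings Near zero, unstable (R: -0.0005, p = 0.063; Python: +0.0004, p = 0.044) Planting density has no reliable predictive power; its sign flips between the two fits.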
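The percentage readings of the log-scale coefficients can be reproduced directly. For a coefficient b on log(yield + 1), exp(b) - 1 is the proportional change in (yield + 1), which is close to the change in yield when yields are well above 1 kg. The sketch applies this to the four terms whose estimates match between the R and Python fits. The values are copied from the regression output above, and `pct_change` is an illustrative helper.

```python
import math

# Log-scale OLS coefficients copied from the regression output above
# (terms whose estimates match between the R and Python fits)
coefficients = {
    "Cycle (per additional cycle)": 0.1890,
    "Greenhouse 2 vs GH1": -0.9664,
    "Greenhouse 3 vs GH1": -1.1793,
    "Greenhouse 4 vs GH1": -0.1736,
}

def pct_change(beta: float) -> float:
    """Percent change in (yield + 1) implied by a coefficient on log(yield + 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

for term, beta in coefficients.items():
    print(f"{term:30s} {pct_change(beta):+6.1f}%")
```

The Greenhouse 4 figure is shown for completeness only. Its confidence interval spans zero, so the point estimate should not be read as a real shortfall.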

10. Integrated Findings

Five analytical lenses converge on one finding: yield performance in the EYIA programme is not random. It is systematically shaped by two controllable programme variables, greenhouse assignment and crop type selection.

EDA exposed extreme yield inequality and documented two significant data quality issues. Visualisation made the inequality visible and identified GH1 with Habanero or Cucumber as the performance frontier. Hypothesis testing confirmed that both greenhouse and crop type effects are statistically real, with p < 0.001 on both Kruskal-Wallis tests. Correlation analysis showed greenhouse to have the strongest rank association with yield; crop type, being nominal, is better judged by the hypothesis tests than by a rank correlation. Regression quantified the simultaneous effects and confirmed that Greenhouses 2 and 3 are associated with substantially lower log-yield than GH1, even after controlling for cycle number and seedling count. The model explains only about 13 percent of yield variation (R-squared 0.126), and its crop coefficients are unstable because the design is rank-deficient.

Integrated Recommendation: The EYIA programme should adopt a data-driven crop-greenhouse assignment policy:

  1. Prioritise Habanero and Cucumber in Greenhouses 1 and 2. These are the statistically and practically highest-performing combinations.

  2. Exclude Greenhouse 4 from high-value crop assignments until infrastructure parity with GH1 is established.

  3. Set yield benchmarks by crop-greenhouse combination. Companies below the 25th percentile for their configuration by Cycle 3 should receive mandatory agronomic coaching.

  4. Use the regression model as a target-setting and planning tool for new cohort onboarding and for donor performance reporting.
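Recommendation 3 could be operationalised with a grouped quantile. A minimal pandas sketch, assuming a long table with one row per company and cycle and hypothetical columns `company`, `crop`, `greenhouse`, `cycle`, and `yield_kg`. The benchmark here is the 25th percentile of all realised yields for each crop-greenhouse configuration, which is one possible reading of the recommendation; the example rows are invented.

```python
import pandas as pd


def flag_below_benchmark(df: pd.DataFrame, q: float = 0.25, checkpoint_cycle: int = 3) -> pd.DataFrame:
    """Flag companies whose yield at the checkpoint cycle falls below the q-th
    quantile of all realised yields for their crop-greenhouse configuration."""
    observed = df.dropna(subset=["yield_kg"])
    # Benchmark per configuration (pandas uses linear interpolation by default)
    benchmarks = (
        observed.groupby(["crop", "greenhouse"])["yield_kg"]
        .quantile(q)
        .rename("benchmark_kg")
        .reset_index()
    )
    checkpoint = observed[observed["cycle"] == checkpoint_cycle].merge(
        benchmarks, on=["crop", "greenhouse"], how="left"
    )
    checkpoint["needs_coaching"] = checkpoint["yield_kg"] < checkpoint["benchmark_kg"]
    return checkpoint[["company", "crop", "greenhouse", "yield_kg", "benchmark_kg", "needs_coaching"]]


# Invented example rows (not EYIA data)
plantings = pd.DataFrame({
    "company": ["A", "B", "C", "D", "A", "B", "C", "D"],
    "greenhouse": ["GH1"] * 8,
    "crop": ["Habanero"] * 8,
    "cycle": [2, 2, 2, 2, 3, 3, 3, 3],
    "yield_kg": [300.0, 420.0, 510.0, 600.0, 350.0, 480.0, 560.0, 90.0],
})
print(flag_below_benchmark(plantings))
```

Records with no yield at the checkpoint cycle are dropped before the comparison, so companies that harvested nothing are not flagged by this rule; given the scale of missing yield data, those companies would need to be reviewed separately.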
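Recommendation 4 amounts to refitting the log-yield model and back-transforming its predictions for a planned crop-greenhouse-cycle configuration. A minimal sketch using NumPy least squares rather than a dedicated statistics package, assuming natural-log yield as the outcome, hypothetical columns `yield_kg`, `greenhouse`, `crop`, `cycle`, and `seedlings`, and GH1 and Bell Peppers as reference levels, as in the coefficient table. The training data are synthetic with invented effects and do not reproduce EYIA results.

```python
import numpy as np
import pandas as pd

GH_LEVELS = ["GH1", "GH2", "GH3", "GH4"]               # GH1 = reference level
CROP_LEVELS = ["Bell Peppers", "Cucumber", "Habanero"]  # Bell Peppers = reference


def design_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Intercept, cycle, seedlings, and treatment-coded greenhouse/crop dummies."""
    X = pd.DataFrame(
        {"intercept": 1.0,
         "cycle": df["cycle"].astype(float),
         "seedlings": df["seedlings"].astype(float)},
        index=df.index,
    )
    for level in GH_LEVELS[1:]:
        X[f"gh_{level}"] = (df["greenhouse"] == level).astype(float)
    for level in CROP_LEVELS[1:]:
        X[f"crop_{level}"] = (df["crop"] == level).astype(float)
    return X


def fit_log_yield(df: pd.DataFrame):
    """OLS of natural-log yield on the design matrix; returns (coefficients, residuals)."""
    observed = df.dropna(subset=["yield_kg"])
    observed = observed[observed["yield_kg"] > 0]  # log is undefined at zero
    X_df = design_matrix(observed)
    X = X_df.to_numpy()
    y = np.log(observed["yield_kg"].to_numpy())
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return pd.Series(beta, index=X_df.columns), y - X @ beta


def predict_yield_kg(coefs: pd.Series, residuals: np.ndarray, new: pd.DataFrame) -> np.ndarray:
    """Back-transformed prediction in kg, scaled by Duan's smearing factor."""
    log_pred = design_matrix(new)[coefs.index].to_numpy() @ coefs.to_numpy()
    return np.exp(log_pred) * np.mean(np.exp(residuals))


# Synthetic data with invented effects, for illustration only (not EYIA data)
rng = np.random.default_rng(42)
grid = pd.MultiIndex.from_product(
    [GH_LEVELS, CROP_LEVELS, [1, 2, 3]], names=["greenhouse", "crop", "cycle"]
).to_frame(index=False)
grid["seedlings"] = rng.integers(80, 121, size=len(grid))
gh_effect = {"GH1": 0.0, "GH2": -0.2, "GH3": -0.3, "GH4": -0.6}
crop_effect = {"Bell Peppers": 0.0, "Cucumber": 0.3, "Habanero": 0.5}
log_y = (5.0 + 0.05 * grid["cycle"] + 0.002 * grid["seedlings"]
         + grid["greenhouse"].map(gh_effect) + grid["crop"].map(crop_effect)
         + rng.normal(0.0, 0.05, size=len(grid)))
grid["yield_kg"] = np.exp(log_y)

coefs, resid = fit_log_yield(grid)
planned = pd.DataFrame({"greenhouse": ["GH1", "GH4"], "crop": ["Habanero", "Habanero"],
                        "cycle": [1, 1], "seedlings": [100, 100]})
print(coefs.round(3))
print(predict_yield_kg(coefs, resid, planned).round(1))
```

Exponentiating a log-scale prediction gives roughly a median rather than a mean yield. The smearing factor (Duan's estimator, the mean of the exponentiated residuals) is one common correction when targets are stated in kg.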


11. Limitations and Further Work

  1. Missing yield data at 54 percent: Investigating why yields are missing, whether from crop failure, abandonment, or data collection gaps, would substantially improve programme monitoring.

  2. No input cost data: Yield in kg is an incomplete performance metric. Profitability per cycle accounting for fertiliser, labour, water, and seedling costs would be far more actionable for a business development role.

  3. Greenhouse confounding: Crops are not randomly assigned to greenhouses, making it difficult to fully isolate each variable’s independent effect. A controlled assignment experiment in future cohorts would allow the greenhouse and crop effects to be separated cleanly.

  4. Company-level heterogeneity: A multilevel model with company as a random effect would account for unobserved differences in farmer skill, experience, and access to inputs not captured in the current variables.

  5. Longitudinal learning curves: Analysing yield trajectories within individual companies across cycles would reveal learning effects more precisely than the aggregate cycle variable used in this analysis.
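For the missing-yield limitation, a useful first diagnostic is whether missingness is concentrated in particular greenhouses, crops, or cohorts. A minimal pandas sketch, assuming hypothetical columns such as `greenhouse` and `yield_kg`, with a missing yield stored as NaN; the example rows are invented.

```python
import pandas as pd


def missing_yield_profile(df: pd.DataFrame, by) -> pd.DataFrame:
    """Share of planting records with no recorded yield, per group.

    `by` is a column name or a list of column names to group on.
    """
    return (
        df.assign(yield_missing=df["yield_kg"].isna())
        .groupby(by)["yield_missing"]
        .agg(records="count", missing_share="mean")
        .sort_values("missing_share", ascending=False)
    )


# Invented example rows (not EYIA data)
records = pd.DataFrame({
    "greenhouse": ["GH1", "GH1", "GH4", "GH4", "GH4"],
    "crop": ["Cucumber", "Cucumber", "Cucumber", "Cucumber", "Habanero"],
    "yield_kg": [410.0, 380.0, None, None, 95.0],
})
print(missing_yield_profile(records, "greenhouse"))
```

This shows where yields go missing, not why; separating crop failure, abandonment, and collection gaps would still need a reason code recorded at harvest time.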
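The controlled assignment experiment proposed under greenhouse confounding could use permuted-block randomisation, so that every crop-greenhouse cell receives a near-equal number of companies and crop is independent of greenhouse by design. A minimal sketch; the function name, company identifiers, and crop list are hypothetical, and practical constraints such as greenhouse capacity are ignored.

```python
import itertools
import random
from collections import Counter


def randomise_assignment(companies, crops, greenhouses, seed=None):
    """Permuted-block randomisation of companies to (crop, greenhouse) cells.

    Within each block of len(cells) consecutive companies every cell is used
    exactly once, so cell sizes never differ by more than one.
    """
    rng = random.Random(seed)
    cells = list(itertools.product(crops, greenhouses))
    order = list(companies)
    rng.shuffle(order)  # random order in which companies are allocated
    assignment = {}
    for i, company in enumerate(order):
        if i % len(cells) == 0:
            rng.shuffle(cells)  # fresh random cell order for each block
        assignment[company] = cells[i % len(cells)]
    return assignment


# Hypothetical cohort of 24 companies
companies = [f"company_{k:02d}" for k in range(1, 25)]
plan = randomise_assignment(companies, ["Bell Peppers", "Cucumber", "Habanero"],
                            ["GH1", "GH2", "GH3", "GH4"], seed=2025)
print(Counter(plan.values()))
```

Fixing the seed makes the allocation reproducible for audit and donor reporting.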
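The multilevel model suggested for company-level heterogeneity could take a random-intercept form. A sketch, assuming the same predictors as the log-yield regression, with planting record $i$ in company $j$ and GH1 and Bell Peppers as reference levels:

```latex
\log Y_{ij} = \beta_0 + \beta_1\,\mathrm{Cycle}_{ij} + \beta_2\,\mathrm{Seedlings}_{ij}
  + \sum_{g=2}^{4} \gamma_g\,\mathbb{1}[\mathrm{GH}_{ij} = g]
  + \sum_{c \neq \text{Bell Peppers}} \delta_c\,\mathbb{1}[\mathrm{Crop}_{ij} = c]
  + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here $u_j$ absorbs persistent company differences such as skill, experience, and input access, and $\sigma_u^2 / (\sigma_u^2 + \sigma^2)$ is the share of residual log-yield variance attributable to companies. In Python this form can be fitted with `statsmodels.formula.api.mixedlm`, passing the company column as `groups`; in R, with `lme4::lmer` and a `(1 | company)` term.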
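For longitudinal learning curves, one simple within-company measure is the slope of log-yield on cycle number, estimated separately for each company with enough realised cycles. A minimal sketch, assuming hypothetical columns `company`, `cycle`, and `yield_kg` and a threshold of three distinct cycles; the example rows are invented. A multilevel model with random slopes for cycle would be the more rigorous version.

```python
import numpy as np
import pandas as pd


def company_learning_slopes(df: pd.DataFrame, min_cycles: int = 3) -> pd.Series:
    """Per-company least-squares slope of log(yield_kg) on cycle number."""
    observed = df.dropna(subset=["yield_kg"])
    observed = observed[observed["yield_kg"] > 0]  # log is undefined at zero
    slopes = {}
    for company, grp in observed.groupby("company"):
        if grp["cycle"].nunique() < min_cycles:
            continue  # too few cycles for a meaningful trend
        slope, _intercept = np.polyfit(grp["cycle"], np.log(grp["yield_kg"]), deg=1)
        slopes[company] = slope
    return pd.Series(slopes, name="log_yield_slope_per_cycle").sort_values()


# Invented example rows (not EYIA data)
history = pd.DataFrame({
    "company": ["A"] * 4 + ["B"] * 4 + ["C"] * 2,
    "cycle": [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
    "yield_kg": [100.0, 200.0, 400.0, 800.0, 300.0, 300.0, 300.0, 300.0, 50.0, 60.0],
})
print(company_learning_slopes(history))
```

Company A's slope equals log 2 (about 0.69), meaning its yield doubles each cycle; company B shows no learning trend, and company C is excluded for having only two cycles.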


References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School. https://markanalytics.online

Attah, S. (2026). EYIA 2024 master harvest data all cohorts. Dataset. Eupepsia Place Ltd, EYIA Programme Monitoring Unit, Ogun State, Nigeria. Data available on request from the author.

McKinney, W. (2010). Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 56-61. https://doi.org/10.25080/Majora-92bf1922-00a

R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., & Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686


Appendix: AI Usage Statement

Claude (Anthropic) was used as a coding assistant in preparing this document. Claude assisted with writing R and Python code chunks for data loading, cleaning, visualisation, and statistical testing, and with structuring the Quarto document including panel-tabset formatting. All analytical decisions including technique selection, hypothesis framing, output interpretation, business recommendations, and professional disclosure content were made independently by the author based on her operational knowledge of the EYIA programme and the analytical judgement developed through the Data Analytics II course. The author is able to explain and defend every result in this document.