Yield Performance Analytics for the EYIA Greenhouse Programme 2024

Author

Attah Sandra

Published

May 10, 2026

1. Executive Summary

Eupepsia Place Ltd operates the Enterprise for Youth in Agriculture (EYIA) programme, a structured greenhouse-farming initiative that trains and equips young agri-entrepreneurs across six cohorts to produce vegetable crops commercially. In 2024, 140 enrolled companies planted crops across four greenhouses over as many as eight growing cycles. After cleaning, the dataset contains 1,460 planting records from 124 distinct companies, 741 of which have a realised yield; records span February to November 2024.

This analysis applies five techniques: Exploratory Data Analysis, Data Visualisation, Hypothesis Testing, Correlation Analysis, and Linear Regression. The central question is what drives yield performance in the EYIA programme, and which crop-greenhouse-cycle configurations the programme should recommend to maximise returns.

Key findings: yield is highly right-skewed, with a small number of high-performing combinations masking widespread underperformance; Greenhouse 1 (GH1) yields significantly more than Greenhouse 4 (GH4), the only greenhouse pair that differs significantly after Bonferroni correction; Habanero in GH1 is the single highest-yielding combination, with a mean of 1,408 kg; cycle number and seedling count have limited predictive power once greenhouse and crop type are controlled for; and a regression model confirms that greenhouse and crop type together explain the majority of yield variation.

Recommendation: EYIA should prioritise Habanero and Cucumber cultivation in Greenhouses 1 and 2 for incoming cohorts, and introduce targeted agronomic intervention for companies producing below 30 kg per cycle.

2. Professional Disclosure

Name: Attah Sandra

Job Title: Business Development Manager, EYIA Programme

Organisation: Eupepsia Place Ltd is a private agribusiness firm based in Ogun State, Nigeria, specialising in commercial horticulture and vegetable crop production. EYIA is a structured capacity-building programme embedded within the company that trains young entrepreneurs to operate greenhouse farming units independently.

Relevance of the Five Techniques to My Role:

Exploratory Data Analysis: As BDM, I work with harvest data submitted across cohorts to track programme performance. EDA allows me to identify data quality gaps, understand the distribution of yields across companies, and flag anomalies before presenting results to programme leadership or development partners.
Data Visualisation: I regularly prepare performance reports and investor briefings. Visualisation translates complex multi-cohort harvest data into accessible narratives that inform strategic conversations with stakeholders who are not technically trained.
Hypothesis Testing: A recurring operational question is whether performance differences across greenhouses or crop types are real or simply due to chance. Formal hypothesis testing provides the statistical rigour needed to back programme recommendations with evidence rather than intuition.
Correlation Analysis: Understanding how seedling counts, cycle number, and greenhouse assignment relate to yield helps me advise incoming companies on resource allocation. Knowing which variables are associated with performance improvement is directly actionable.
Linear Regression: Regression allows me to model the combined effect of multiple programme variables on yield simultaneously, producing a tool that can predict expected output for a given crop-greenhouse-cycle configuration and support realistic target-setting for new cohort entrants.

3. Data Collection and Sampling

Source: Internal programme monitoring records maintained by Eupepsia Place Ltd for the EYIA 2024 cohort cycle.

Collection Method: Yield data were recorded by trained EYIA field officers during scheduled physical farm visits to each participating company’s greenhouse unit. Officers recorded harvest weights at each visit using standardised harvest tracking sheets, which were subsequently consolidated into a master Excel workbook.

Sampling Frame: All 140 companies enrolled in the EYIA programme across six cohorts in 2024. This is a census of programme participants, not a random sample, and every enrolled company is represented in the raw records; after crop-type cleaning, 124 distinct company names remain (Section 4.1).

Time Period: February 2024 to November 2024, covering up to eight growing cycles per company.

Variables Collected:

Variable	Type	Description
Cohort	Categorical	Programme cohort 1 to 6
Company	Categorical	Participating agri-enterprise name
Cycle	Numeric	Growing cycle number within the cohort
Greenhouse	Categorical	Assigned greenhouse unit GH1 to GH4
Crop Type	Categorical	Tomatoes, Bell Peppers, Habanero, Lettuce, Cucumber, Kale, Ugu
Seedlings/Stands	Numeric	Number of seedlings planted per cycle
Start Date	Date	Date the planting cycle commenced
Realized Yield (kg)	Numeric	Actual harvest weight in kilograms

Ethical Notes: This dataset constitutes internal operational records of Eupepsia Place Ltd. All participating companies enrol in the EYIA programme under a formal agreement that includes consent for programme monitoring and data use for performance evaluation. No personally identifiable information beyond company trading names is included.

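Data Limitations: In the raw extract, 999 of 1,807 rows contain missing yield values, representing cycles where harvests had not yet occurred at the time of data extraction or crops that failed; after cleaning, 719 of the 1,460 retained rows (49%) still lack a yield. The raw Crop Type column contained 109 distinct entries due to data entry inconsistencies, and only exact matches to seven crop categories (the six primary crops plus Ugu, with "Cucumbers" merged into "Cucumber") were retained for analysis.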
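The cleaning step in Section 4.1 keeps only exact matches to the accepted crop names (after merging "Cucumbers" into "Cucumber"), so misspelled or differently cased entries among the 109 raw variants are dropped. A more forgiving approach is to normalise whitespace and case before matching against a hand-built lookup table. The sketch below assumes such a table (`CANONICAL`, hypothetical) and made-up input strings; the real variant spellings would need to be read from the workbook.

```python
import re

# Hypothetical lookup table: case-folded spellings -> canonical crop names.
# Canonical names follow the seven categories kept in Section 4.1; the
# variant spellings are assumptions for illustration.
CANONICAL = {
    "bell pepper": "Bell Peppers",
    "bell peppers": "Bell Peppers",
    "tomato": "Tomatoes",
    "tomatoes": "Tomatoes",
    "habanero": "Habanero",
    "habaneros": "Habanero",
    "lettuce": "Lettuce",
    "kale": "Kale",
    "cucumber": "Cucumber",
    "cucumbers": "Cucumber",
    "ugu": "Ugu",
}


def normalise_crop(label):
    """Return the canonical crop name for a free-text entry, or None if unmapped."""
    if label is None:
        return None
    # Collapse runs of whitespace, trim the ends, and case-fold before lookup.
    key = re.sub(r"\s+", " ", str(label)).strip().casefold()
    return CANONICAL.get(key)


# Made-up entries, not taken from the EYIA workbook.
entries = ["Cucumbers", "  bell  peppers ", "TOMATO", "Habanero", "Tomatoes/Kale", None]
print([normalise_crop(e) for e in entries])
# ['Cucumber', 'Bell Peppers', 'Tomatoes', 'Habanero', None, None]
```

Entries that map to `None`, such as mixed plantings, would then be reviewed by hand rather than silently discarded.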

4. Data Description

4.1 Load and Clean Data

Code

library(tidyverse)
library(readxl)
library(janitor)
library(lubridate)
library(knitr)
library(kableExtra)

raw <- read_excel(
  "Cleaned_EYIA_Crops_2024.xlsx",
  sheet = "Master Data",
  skip  = 1
)

raw <- raw |> clean_names()

valid_crops <- c("Bell Peppers", "Tomatoes", "Lettuce",
                 "Kale", "Habanero", "Cucumber", "Cucumbers", "Ugu")

df <- raw |>
  filter(crop_type %in% valid_crops) |>
  mutate(
    crop_type     = if_else(crop_type == "Cucumbers", "Cucumber", crop_type),
    seedlings_num = suppressWarnings(as.numeric(seedlings_stands)),
    yield_kg      = realized_yield_kg,
    start_clean   = parse_date_time(
      str_replace_all(
        start_date,
        c(
          # strip ordinal suffixes only after a digit, so "August" is left intact
          "(?i)(\\d+)(st|nd|rd|th)\\b" = "\\1",
          "Marrch" = "March", "may" = "May", " of " = " "
        )
      ),
      orders = c("d B Y", "d b Y"),
      quiet  = TRUE
    ),
    month      = month(start_clean, label = TRUE),
    cohort     = factor(cohort),
    greenhouse = factor(greenhouse),
    crop_type  = factor(crop_type)
  )

cat("Rows after cleaning  :", nrow(df), "\n")

Rows after cleaning  : 1460

Code

cat("Yield observations   :", sum(!is.na(df$yield_kg)), "\n")

Yield observations   : 741

Code

cat("Unique companies     :", n_distinct(df$company), "\n")

Unique companies     : 124

Code

cat("Cohorts              :", n_distinct(df$cohort), "\n")

Cohorts              : 6

Code

import pandas as pd
import numpy as np
import re
import warnings
warnings.filterwarnings("ignore")

raw_py = pd.read_excel(
    "Cleaned_EYIA_Crops_2024.xlsx",
    sheet_name = "Master Data",
    skiprows   = 1,
    header     = 0
)

valid_crops = ["Bell Peppers", "Tomatoes", "Lettuce",
               "Kale", "Habanero", "Cucumber", "Cucumbers", "Ugu"]

df_py = raw_py[raw_py["Crop Type"].isin(valid_crops)].copy()
df_py["Crop Type"]     = df_py["Crop Type"].replace("Cucumbers", "Cucumber")
df_py["Seedlings_num"] = pd.to_numeric(df_py["Seedlings/Stands"], errors="coerce")
df_py.rename(columns={"Realized Yield (kg)": "yield_kg"}, inplace=True)

def parse_dt(s):
    if pd.isna(s):
        return pd.NaT
    s = str(s).strip()
    s = re.sub(r"(\d+)(st|nd|rd|th)", r"\1", s, flags=re.IGNORECASE)
    s = s.replace("Marrch", "March").replace(" of ", " ").replace("may", "May")
    try:
        return pd.to_datetime(s, dayfirst=True)
    except Exception:
        return pd.NaT

df_py["start_date"] = df_py["Start Date"].apply(parse_dt)
df_py["month"]      = df_py["start_date"].dt.strftime("%b")

print(f"Rows after cleaning  : {len(df_py):,}")

Rows after cleaning  : 1,460

Code

print(f"Yield observations   : {df_py['yield_kg'].notna().sum():,}")

Yield observations   : 741

Code

print(f"Unique companies     : {df_py['Company'].nunique():,}")

Unique companies     : 124

Code

print(f"Cohorts              : {df_py['Cohort'].nunique():,}")

Cohorts              : 6
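Both cleaning scripts normalise free-text start dates before parsing. One pitfall is that stripping ordinal suffixes by plain substring replacement (for example, replacing every "st ") also corrupts month names such as "August". A minimal stdlib sketch that only strips a suffix directly after a day number is shown below; the function name `parse_start_date` is hypothetical, and English month names are assumed.

```python
import re
from datetime import datetime

# Ordinal suffix directly after a 1-2 digit day number, e.g. "1st", "22nd", "12th".
ORDINAL = re.compile(r"\b(\d{1,2})(st|nd|rd|th)\b", flags=re.IGNORECASE)

# Spelling fixes taken from the cleaning code in Section 4.1.
TYPO_FIXES = {"Marrch": "March"}


def parse_start_date(text):
    """Parse entries such as '12th of August 2024' into a date, or return None."""
    if text is None:
        return None
    s = str(text).strip()
    for bad, good in TYPO_FIXES.items():
        s = s.replace(bad, good)
    s = ORDINAL.sub(r"\1", s)                              # "12th" -> "12"
    s = re.sub(r"\s+of\s+", " ", s, flags=re.IGNORECASE)  # "12 of August" -> "12 August"
    s = re.sub(r"\s+", " ", s)
    # strptime matches month names case-insensitively; try full then abbreviated names.
    for fmt in ("%d %B %Y", "%d %b %Y"):
        try:
            return datetime.strptime(s, fmt).date()
        except ValueError:
            continue
    return None


print(parse_start_date("12th of August 2024"))  # 2024-08-12
print(parse_start_date("1st Marrch 2024"))      # 2024-03-01
print(parse_start_date("3rd may 2024"))         # 2024-05-03
```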

4.2 Variable Summary

Code

library(skimr)
df |>
  select(cohort, greenhouse, crop_type, cycle, seedlings_num, yield_kg) |>
  skim()

Data summary
Name	select(…)
Number of rows	1460
Number of columns	6
_______________________
Column type frequency:
factor	3
numeric	3
________________________
Group variables	None

Variable type: factor

skim_variable	complete_rate	ordered	n_unique	top_counts
cohort	1	FALSE	6	Coh: 438, Coh: 320, Coh: 256, Coh: 217
greenhouse	1	FALSE	4	Gre: 465, Gre: 361, Gre: 326, Gre: 308
crop_type	1	FALSE	7	Tom: 456, Hab: 395, Bel: 249, Let: 165

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
cycle	0	1.00	3.22	1.78	1	2.0	3.00	5	8	▇▃▆▂▁
seedlings_num	195	0.87	442.23	418.11	165	165.0	350.00	350	1500	▇▁▁▁▁
yield_kg	719	0.51	99.13	244.01	0	5.5	32.35	87	2024	▇▁▁▁▁

Code

cols = ["Cohort", "Greenhouse", "Crop Type", "Cycle", "Seedlings_num", "yield_kg"]
print(df_py[cols].describe(include="all").round(2).to_string())

          Cohort    Greenhouse Crop Type    Cycle  Seedlings_num  yield_kg
count       1460          1460      1460  1460.00        1265.00    741.00
unique         6             4         7      NaN            NaN       NaN
top     Cohort 1  Greenhouse 1  Tomatoes      NaN            NaN       NaN
freq         438           465       456      NaN            NaN       NaN
mean         NaN           NaN       NaN     3.22         442.23     99.13
std          NaN           NaN       NaN     1.78         418.11    244.01
min          NaN           NaN       NaN     1.00         165.00      0.00
25%          NaN           NaN       NaN     2.00         165.00      5.50
50%          NaN           NaN       NaN     3.00         350.00     32.35
75%          NaN           NaN       NaN     5.00         350.00     87.00
max          NaN           NaN       NaN     8.00        1500.00   2024.00

5. Exploratory Data Analysis

5.1 Data Quality Issues

Code

missing_tbl <- df |>
  group_by(cohort) |>
  summarise(
    Total   = n(),
    Missing = sum(is.na(yield_kg)),
    Pct     = round(100 * Missing / Total, 1)
  )

kable(
  missing_tbl,
  col.names = c("Cohort", "Total Records", "Missing Yield", "Pct Missing"),
  caption   = "Issue 1: Missing yield values by cohort"
) |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Issue 1: Missing yield values by cohort
Cohort	Total Records	Missing Yield	Pct Missing
Cohort 1	438	135	30.8
Cohort 2	153	103	67.3
Cohort 3	76	25	32.9
Cohort 4	256	206	80.5
Cohort 5	217	98	45.2
Cohort 6	320	152	47.5

Code

df_y  <- df |> filter(!is.na(yield_kg))
q1    <- quantile(df_y$yield_kg, 0.25)
q3    <- quantile(df_y$yield_kg, 0.75)
fence <- q3 + 1.5 * (q3 - q1)
out   <- df_y |> filter(yield_kg > fence)

cat("\nIssue 2: Outliers above upper fence of", round(fence, 1), "kg\n")


Issue 2: Outliers above upper fence of 209.2 kg

Code

cat("Count:", nrow(out), "| Max:", max(df_y$yield_kg), "kg\n\n")

Count: 86 | Max: 2024 kg

Code

out |>
  select(cohort, company, greenhouse, crop_type, cycle, yield_kg) |>
  arrange(desc(yield_kg)) |>
  head(10) |>
  kable(caption = "Top 10 outlier records") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Top 10 outlier records
cohort	company	greenhouse	crop_type	cycle	yield_kg
Cohort 4	Hydro Nuture Agri System	Greenhouse 1	Habanero	6	2024.0
Cohort 4	Green Lush(Mit)	Greenhouse 1	Habanero	6	2024.0
Cohort 4	Nexus Farm	Greenhouse 1	Habanero	6	2024.0
Cohort 4	Hydro Nuture Agri System	Greenhouse 1	Habanero	7	2024.0
Cohort 4	Green Lush(Mit)	Greenhouse 1	Habanero	7	2024.0
Cohort 4	Nexus Farm	Greenhouse 1	Habanero	7	2024.0
Cohort 4	Hydro Nuture Agri System	Greenhouse 1	Habanero	8	2024.0
Cohort 4	Green Lush(Mit)	Greenhouse 1	Habanero	8	2024.0
Cohort 4	Nexus Farm	Greenhouse 1	Habanero	8	2024.0
Cohort 6	Sunset Campo Farm Ltd	Greenhouse 2	Cucumber	4	794.3

Code

miss = df_py.groupby("Cohort").apply(
    lambda x: pd.Series({
        "Total"   : len(x),
        "Missing" : x["yield_kg"].isna().sum(),
        "Pct"     : round(100 * x["yield_kg"].isna().mean(), 1)
    })
).reset_index()

print("Issue 1: Missing yield values by cohort")

Issue 1: Missing yield values by cohort

Code

print(miss.to_string(index=False))

  Cohort  Total  Missing  Pct
Cohort 1  438.0    135.0 30.8
Cohort 2  153.0    103.0 67.3
Cohort 3   76.0     25.0 32.9
Cohort 4  256.0    206.0 80.5
Cohort 5  217.0     98.0 45.2
Cohort 6  320.0    152.0 47.5

Code

y     = df_py["yield_kg"].dropna()
q1    = y.quantile(0.25)
q3    = y.quantile(0.75)
fence = q3 + 1.5 * (q3 - q1)
out   = df_py[df_py["yield_kg"] > fence]

print(f"\nIssue 2: Outliers above {fence:.1f} kg")


Issue 2: Outliers above 209.2 kg

Code

print(f"Count: {len(out)} | Max: {y.max():.1f} kg")

Count: 86 | Max: 2024.0 kg

Code

print(
    out[["Cohort", "Company", "Greenhouse", "Crop Type", "Cycle", "yield_kg"]]
    .sort_values("yield_kg", ascending=False)
    .head(10)
    .to_string(index=False)
)

  Cohort                  Company   Greenhouse Crop Type  Cycle  yield_kg
Cohort 4 Hydro Nuture Agri System Greenhouse 1  Habanero      7    2024.0
Cohort 4          Green Lush(Mit) Greenhouse 1  Habanero      7    2024.0
Cohort 4               Nexus Farm Greenhouse 1  Habanero      7    2024.0
Cohort 4 Hydro Nuture Agri System Greenhouse 1  Habanero      8    2024.0
Cohort 4          Green Lush(Mit) Greenhouse 1  Habanero      8    2024.0
Cohort 4 Hydro Nuture Agri System Greenhouse 1  Habanero      6    2024.0
Cohort 4               Nexus Farm Greenhouse 1  Habanero      8    2024.0
Cohort 4          Green Lush(Mit) Greenhouse 1  Habanero      6    2024.0
Cohort 4               Nexus Farm Greenhouse 1  Habanero      6    2024.0
Cohort 6    Sunset Campo Farm Ltd Greenhouse 2  Cucumber      4     794.3

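Handling strategy: Missing yields are excluded from all analyses on the assumption that they represent incomplete harvest cycles rather than zero yields. Because some missing values may instead correspond to failed crops (Section 3), this exclusion can bias yield estimates upward, particularly for cohorts with high missingness such as Cohort 4 (80.5%). Outliers are retained as verified field records from high-performing companies. Note, however, that the nine largest values are all exactly 2,024.0 kg (three Cohort 4 companies growing Habanero in GH1, cycles 6 to 8); identical values that coincide with the reporting year are worth re-checking against the original harvest sheets.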
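The outlier rule above is Tukey's upper fence, Q3 + 1.5 × IQR. A small stdlib sketch reproduces it and also counts outlier values that repeat exactly, which is one way to surface entries like the nine identical 2,024.0 kg records for a manual check. Function names and toy values are illustrative.

```python
import statistics
from collections import Counter


def tukey_upper_fence(values, k=1.5):
    """Q3 + k * IQR, with quartiles by linear interpolation (the 'inclusive'
    method, which matches R's default quantile type 7 and pandas' default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def flag_outliers(values, k=1.5):
    """Return the fence, the values above it, and any outlier value repeated exactly."""
    fence = tukey_upper_fence(values, k)
    above = [v for v in values if v > fence]
    repeated = {v: c for v, c in Counter(above).items() if c > 1}
    return fence, above, repeated


# Toy yields in kg (made up), with two identical extreme entries.
fence, above, repeated = flag_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100, 100])
print(fence, above, repeated)  # 14.5 [100, 100] {100: 2}
```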

5.2 Yield Distribution

Code

library(patchwork)

df_y <- df |> filter(!is.na(yield_kg))

p_raw <- ggplot(df_y, aes(x = yield_kg)) +
  geom_histogram(bins = 40, fill = "#2E8B57", colour = "white", alpha = 0.85) +
  geom_vline(
    xintercept = median(df_y$yield_kg),
    colour = "firebrick", linetype = "dashed", linewidth = 0.9
  ) +
  annotate(
    "text",
    x      = median(df_y$yield_kg) + 50,
    y      = 90,
    label  = paste0("Median: ", round(median(df_y$yield_kg), 1), " kg"),
    colour = "firebrick", size = 3.5
  ) +
  labs(
    title    = "Raw Yield Distribution",
    subtitle = "Strongly right-skewed",
    x = "Realized Yield kg", y = "Count"
  ) +
  theme_minimal(base_size = 12)

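# log10(0) is undefined, so zero yields are excluded from the log-scale panel
p_log <- ggplot(filter(df_y, yield_kg > 0), aes(x = yield_kg)) +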
  geom_histogram(bins = 40, fill = "#4682B4", colour = "white", alpha = 0.85) +
  scale_x_log10() +
  labs(
    title    = "Yield on Log10 Scale",
    subtitle = "Near-normal after log transformation",
    x = "Realized Yield log10 kg", y = "Count"
  ) +
  theme_minimal(base_size = 12)

p_raw + p_log

Code

import matplotlib.pyplot as plt

y_clean = df_py["yield_kg"].dropna()
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

axes[0].hist(y_clean, bins=40, color="#2E8B57", edgecolor="white", alpha=0.85)
axes[0].axvline(y_clean.median(), color="firebrick", linestyle="--",
                linewidth=1.5, label=f"Median: {y_clean.median():.1f} kg")
axes[0].legend()
axes[0].set_title("Raw Yield Distribution")
axes[0].set_xlabel("Realized Yield kg")
axes[0].set_ylabel("Count")

axes[1].hist(np.log10(y_clean[y_clean > 0]), bins=40,
             color="#4682B4", edgecolor="white", alpha=0.85)
axes[1].set_title("Yield on Log10 Scale")
axes[1].set_xlabel("Log10 Realized Yield kg")
axes[1].set_ylabel("Count")

plt.tight_layout()
plt.show()
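The two histograms compare shape visually; sample skewness gives a numeric check of how much the log transform helps. Because log10(0) is undefined and the minimum recorded yield is 0 kg, zero yields have to be dropped (as the Python histogram does) or offset before transforming. The sketch below, with hypothetical helper names and made-up values, drops them and reports how many were removed.

```python
import math


def skewness(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def raw_vs_log_skew(yields):
    """Skewness of raw yields and of log10 yields; zeros are dropped for the log scale."""
    positive = [y for y in yields if y > 0]
    log_skew = skewness([math.log10(y) for y in positive])
    return skewness(yields), log_skew, len(yields) - len(positive)


# Made-up yields spanning several orders of magnitude, including one zero.
raw, logged, dropped = raw_vs_log_skew([0, 1, 10, 100, 1000])
print(f"raw skew = {raw:.2f}, log10 skew = {logged:.2f}, zeros dropped = {dropped}")
```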

6. Data Visualisation

Five plots tell one story: yield inequality in the EYIA programme is structured by greenhouse assignment and crop type, not by chance.

Code

library(patchwork)

df_y <- df |> filter(!is.na(yield_kg))

p1 <- df_y |>
  group_by(crop_type) |>
  summarise(mean_yield = mean(yield_kg), se = sd(yield_kg) / sqrt(n())) |>
  ggplot(aes(x = reorder(crop_type, mean_yield), y = mean_yield, fill = crop_type)) +
  geom_col(show.legend = FALSE, alpha = 0.85) +
  geom_errorbar(aes(ymin = mean_yield - se, ymax = mean_yield + se), width = 0.3) +
  coord_flip() +
  scale_fill_brewer(palette = "Set2") +
  labs(title    = "Plot 1: Average Yield by Crop Type",
       subtitle = "Habanero and Cucumber outperform leafy vegetables",
       x = NULL, y = "Mean Yield kg") +
  theme_minimal(base_size = 11)

p2 <- df_y |>
  ggplot(aes(x = greenhouse, y = yield_kg, fill = greenhouse)) +
  geom_boxplot(outlier.alpha = 0.3, show.legend = FALSE) +
  scale_y_log10() +
  scale_fill_brewer(palette = "Set1") +
  labs(title    = "Plot 2: Yield by Greenhouse Log Scale",
       subtitle = "GH1 highest; GH4 consistently lowest",
       x = NULL, y = "Yield kg log scale") +
  theme_minimal(base_size = 11)

p3 <- df_y |>
  group_by(cohort, greenhouse) |>
  summarise(mean_yield = mean(yield_kg), .groups = "drop") |>
  ggplot(aes(x = greenhouse, y = cohort, fill = mean_yield)) +
  geom_tile(colour = "white", linewidth = 0.5) +
  geom_text(aes(label = round(mean_yield, 0)), size = 3.5) +
  scale_fill_gradient(low = "#fff7bc", high = "#2E8B57", name = "Mean Yield kg") +
  labs(title    = "Plot 3: Mean Yield Heatmap Cohort x Greenhouse",
       subtitle = "Cohort 4 x GH1 is the top-performing combination",
       x = NULL, y = NULL) +
  theme_minimal(base_size = 11) +
  theme(axis.text.x = element_text(angle = 30, hjust = 1))

p4 <- df_y |>
  group_by(cycle) |>
  summarise(mean_yield = mean(yield_kg), n = n()) |>
  ggplot(aes(x = factor(cycle), y = mean_yield)) +
  geom_col(fill = "#4682B4", alpha = 0.85) +
  geom_text(aes(label = paste0("n=", n)), vjust = -0.4, size = 3) +
  labs(title    = "Plot 4: Mean Yield by Growing Cycle",
       subtitle = "No consistent improvement by cycle alone",
       x = "Growing Cycle", y = "Mean Yield kg") +
  theme_minimal(base_size = 11)

p5 <- df_y |>
  filter(crop_type %in% c("Bell Peppers", "Tomatoes",
                          "Habanero", "Cucumber", "Lettuce")) |>
  group_by(greenhouse, crop_type) |>
  summarise(mean_yield = mean(yield_kg), .groups = "drop") |>
  ggplot(aes(x = greenhouse, y = mean_yield,
             colour = crop_type, group = crop_type)) +
  geom_line(linewidth = 1) +
  geom_point(size = 3) +
  scale_colour_brewer(palette = "Dark2", name = "Crop") +
  labs(title    = "Plot 5: Crop x Greenhouse Interaction",
       subtitle = "Habanero dominates in GH1 but drops sharply elsewhere",
       x = NULL, y = "Mean Yield kg") +
  theme_minimal(base_size = 11)

(p1 + p2) / p3 / (p4 + p5) +
  plot_annotation(
    title    = "EYIA 2024 Yield Performance Dashboard",
    subtitle = "Greenhouse and crop type are the primary yield determinants",
    theme    = theme(
      plot.title    = element_text(size = 14, face = "bold"),
      plot.subtitle = element_text(size = 11, colour = "grey40")
    )
  )

Code

import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns

df_v       = df_py[df_py["yield_kg"].notna()].copy()
main_crops = ["Bell Peppers", "Tomatoes", "Habanero", "Cucumber", "Lettuce"]
df_v5      = df_v[df_v["Crop Type"].isin(main_crops)]

fig = plt.figure(figsize=(14, 20))
gs  = gridspec.GridSpec(3, 2, figure=fig, hspace=0.5, wspace=0.4)

ax1 = fig.add_subplot(gs[0, 0])
crop_avg = df_v.groupby("Crop Type")["yield_kg"].mean().sort_values()
ax1.barh(crop_avg.index, crop_avg.values,
         color=sns.color_palette("Set2", len(crop_avg)))
ax1.set_title("Plot 1: Avg Yield by Crop Type", fontweight="bold")
ax1.set_xlabel("Mean Yield kg")

ax2    = fig.add_subplot(gs[0, 1])
groups = [df_v[df_v["Greenhouse"] == g]["yield_kg"].dropna()
          for g in sorted(df_v["Greenhouse"].unique())]
labels = sorted(df_v["Greenhouse"].unique())
ax2.boxplot(groups, labels=labels, patch_artist=True)


Code

ax2.set_yscale("log")
ax2.set_title("Plot 2: Yield by Greenhouse Log", fontweight="bold")
ax2.tick_params(axis="x", rotation=20)

ax3  = fig.add_subplot(gs[1, :])
heat = df_v.groupby(["Cohort", "Greenhouse"])["yield_kg"].mean().unstack().round(0)
sns.heatmap(heat, ax=ax3, annot=True, fmt=".0f",
            cmap="YlGn", linewidths=0.5,
            cbar_kws={"label": "Mean Yield kg"})
ax3.set_title("Plot 3: Mean Yield Heatmap Cohort x Greenhouse", fontweight="bold")

ax4       = fig.add_subplot(gs[2, 0])
cycle_avg = df_v.groupby("Cycle")["yield_kg"].mean()
ax4.bar(cycle_avg.index.astype(str), cycle_avg.values,
        color="#4682B4", edgecolor="white", alpha=0.85)
ax4.set_title("Plot 4: Mean Yield by Growing Cycle", fontweight="bold")
ax4.set_xlabel("Growing Cycle")
ax4.set_ylabel("Mean Yield kg")

ax5      = fig.add_subplot(gs[2, 1])
int_data = df_v5.groupby(["Greenhouse", "Crop Type"])["yield_kg"].mean().reset_index()
for crop, grp in int_data.groupby("Crop Type"):
    ax5.plot(grp["Greenhouse"], grp["yield_kg"],
             marker="o", label=crop, linewidth=1.8)
ax5.legend(fontsize=8)
ax5.set_title("Plot 5: Crop x Greenhouse Interaction", fontweight="bold")
ax5.set_ylabel("Mean Yield kg")
ax5.tick_params(axis="x", rotation=25)

fig.suptitle("EYIA 2024 Yield Performance Dashboard",
             fontsize=14, fontweight="bold")
plt.tight_layout()
plt.show()

Narrative: Plot 1 shows Habanero and Cucumber yielding 2 to 5 times more than leafy vegetables on average; because these are means, they are sensitive to the extreme records flagged in Section 5.1. Plot 2 shows GH1 as the highest-performing greenhouse, which Section 7.1 tests formally. Plot 3 shows Cohort 4 x GH1 as the single best combination. Plot 4 shows that cycle number alone does not consistently improve yield. Plot 5 reveals the critical interaction: Habanero excels in GH1 but drops sharply in other greenhouses, making greenhouse assignment crop-dependent.
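Plot 5's interaction claim amounts to saying that the best greenhouse depends on the crop. A minimal stdlib sketch of that comparison computes mean yield per (greenhouse, crop) cell and picks the best greenhouse for each crop, with a minimum cell size so that a handful of records cannot decide a recommendation on their own. The `min_n` threshold, helper names, and records are assumptions for illustration.

```python
from collections import defaultdict


def cell_stats(records):
    """Mean yield and record count per (greenhouse, crop) cell."""
    totals, counts = defaultdict(float), defaultdict(int)
    for greenhouse, crop, yield_kg in records:
        totals[(greenhouse, crop)] += yield_kg
        counts[(greenhouse, crop)] += 1
    return {cell: (totals[cell] / counts[cell], counts[cell]) for cell in counts}


def best_greenhouse_by_crop(records, min_n=1):
    """For each crop, the greenhouse with the highest mean among cells with >= min_n records."""
    best = {}
    for (greenhouse, crop), (mean, n) in cell_stats(records).items():
        if n < min_n:
            continue
        if crop not in best or mean > best[crop][1]:
            best[crop] = (greenhouse, mean)
    return best


# Made-up records: (greenhouse, crop, realised yield in kg).
records = [
    ("GH1", "Habanero", 900.0), ("GH1", "Habanero", 1100.0), ("GH2", "Habanero", 40.0),
    ("GH1", "Cucumber", 60.0), ("GH2", "Cucumber", 120.0), ("GH2", "Cucumber", 80.0),
    ("GH3", "Cucumber", 500.0),
]
print(best_greenhouse_by_crop(records))           # a single GH3 record wins for Cucumber
print(best_greenhouse_by_crop(records, min_n=2))  # GH2 wins once cells need two records
```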

7. Hypothesis Testing

7.1 Hypothesis A: Does yield differ significantly across greenhouses?

H0: Yield has the same distribution in all four greenhouses.

H1: At least one greenhouse tends to produce systematically higher or lower yields than the others.

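Test: Kruskal-Wallis, a rank-based (non-parametric) test, chosen because yield is strongly right-skewed; the Shapiro-Wilk check below confirms non-normality. Pairwise follow-up uses Dunn's test in R and Bonferroni-corrected Mann-Whitney U tests in Python.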
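For readers who want to see what the test computes, here is a dependency-free sketch of the tie-corrected Kruskal-Wallis H statistic and the eta²[H] effect size, (H - k + 1)/(n - k), the same formula the Python code in this section uses. The magnitude labels apply the commonly cited 0.01 / 0.06 / 0.14 cut-offs; labelling values below 0.01 as "negligible" is this sketch's own convention. Function names are illustrative.

```python
from collections import Counter
from itertools import chain


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Divide by the tie correction 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def eta2_h(h, k, n):
    """Effect size eta^2[H] = (H - k + 1) / (n - k) for k groups and n observations."""
    return (h - k + 1) / (n - k)


def magnitude(es):
    """Label an eta^2[H] value using the 0.01 / 0.06 / 0.14 cut-offs."""
    if es < 0.01:
        return "negligible"
    if es < 0.06:
        return "small"
    if es < 0.14:
        return "moderate"
    return "large"


# H statistics reported in Sections 7.1 (4 greenhouses, n = 741) and 7.2 (5 crops, n = 739).
print(round(eta2_h(21.51, 4, 741), 4), magnitude(eta2_h(21.51, 4, 741)))    # 0.0251 small
print(round(eta2_h(145.79, 5, 739), 4), magnitude(eta2_h(145.79, 5, 739)))  # 0.1932 large
```

Applied to the reported H statistics, `eta2_h` reproduces the effect sizes printed by rstatix and by the Python code.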

Code

library(rstatix)

df_y <- df |> filter(!is.na(yield_kg))

set.seed(42)
sw <- shapiro.test(sample(df_y$yield_kg, 200))
cat("Shapiro-Wilk: W =", round(sw$statistic, 4),
    " p =", format(sw$p.value, scientific = TRUE), "\n")

Shapiro-Wilk: W = 0.4531  p = 2.079042e-24

Code

cat("Normality rejected. Kruskal-Wallis is appropriate.\n\n")

Normality rejected. Kruskal-Wallis is appropriate.

Code

kw_gh <- kruskal.test(yield_kg ~ greenhouse, data = df_y)
print(kw_gh)


    Kruskal-Wallis rank sum test

data:  yield_kg by greenhouse
Kruskal-Wallis chi-squared = 21.51, df = 3, p-value = 8.249e-05

Code

print(kruskal_effsize(df_y, yield_kg ~ greenhouse))

# A tibble: 1 × 5
  .y.          n effsize method  magnitude
* <chr>    <int>   <dbl> <chr>   <ord>    
1 yield_kg   741  0.0251 eta2[H] small

Code

print(dunn_test(df_y, yield_kg ~ greenhouse,
                p.adjust.method = "bonferroni"))

# A tibble: 6 × 9
  .y.      group1      group2    n1    n2 statistic       p   p.adj p.adj.signif
* <chr>    <chr>       <chr>  <int> <int>     <dbl>   <dbl>   <dbl> <chr>       
1 yield_kg Greenhouse… Green…   206   182    -1.96  4.95e-2 2.97e-1 ns          
2 yield_kg Greenhouse… Green…   206   157    -2.29  2.22e-2 1.33e-1 ns          
3 yield_kg Greenhouse… Green…   206   196    -4.62  3.82e-6 2.29e-5 ****        
4 yield_kg Greenhouse… Green…   182   157    -0.390 6.96e-1 1   e+0 ns          
5 yield_kg Greenhouse… Green…   182   196    -2.54  1.12e-2 6.69e-2 ns          
6 yield_kg Greenhouse… Green…   157   196    -2.04  4.11e-2 2.47e-1 ns

Code

from scipy import stats
from itertools import combinations

df_h   = df_py[df_py["yield_kg"].notna()]
groups = [grp["yield_kg"].values for _, grp in df_h.groupby("Greenhouse")]

stat, pval = stats.kruskal(*groups)
n  = df_h["yield_kg"].notna().sum()
k  = df_h["Greenhouse"].nunique()
es = (stat - k + 1) / (n - k)

print(f"Kruskal-Wallis H = {stat:.4f}  p = {pval:.4e}")

Kruskal-Wallis H = 21.5098  p = 8.2492e-05

Code

print(f"Eta-squared effect size = {es:.4f}\n")

Eta-squared effect size = 0.0251

Code

pairs = list(combinations(df_h["Greenhouse"].unique(), 2))
print("Pairwise Mann-Whitney Bonferroni corrected:")

Pairwise Mann-Whitney Bonferroni corrected:

Code

for g1, g2 in pairs:
    a = df_h[df_h["Greenhouse"] == g1]["yield_kg"]
    b = df_h[df_h["Greenhouse"] == g2]["yield_kg"]
    _, p  = stats.mannwhitneyu(a, b, alternative="two-sided")
    padj  = min(p * len(pairs), 1.0)
    stars = "***" if padj < 0.001 else "**" if padj < 0.01 else "*" if padj < 0.05 else "ns"
    print(f"  {g1} vs {g2}: p_adj = {padj:.4f} {stars}")

  Greenhouse 4 vs Greenhouse 3: p_adj = 0.8673 ns
  Greenhouse 4 vs Greenhouse 2: p_adj = 0.1314 ns
  Greenhouse 4 vs Greenhouse 1: p_adj = 0.0000 ***
  Greenhouse 3 vs Greenhouse 2: p_adj = 1.0000 ns
  Greenhouse 3 vs Greenhouse 1: p_adj = 0.3827 ns
  Greenhouse 2 vs Greenhouse 1: p_adj = 0.6386 ns

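Business interpretation: Greenhouse assignment is associated with a statistically significant but small difference in harvest outcomes (eta² = 0.025). After Bonferroni correction, the only significant pairwise difference is between GH1 and GH4, with GH4 yielding less. Because this is observational programme data, the result shows an association rather than proof that the greenhouse itself causes the gap. The EYIA programme should still treat greenhouse allocation as a strategic decision, particularly for high-value crops such as Habanero and Cucumber.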

7.2 Hypothesis B: Does yield differ significantly across crop types?

H0: Yield has the same distribution across the five crop types analysed.

H1: At least one crop type tends to produce systematically higher or lower yields than the others.

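Test: Kruskal-Wallis plus post-hoc Dunn test, restricted to the five crops with enough yield observations (Kale and Ugu together account for only 2 of the 741 yield records). The Python code substitutes pairwise Mann-Whitney U tests for Dunn's test, so its pairwise p-values differ somewhat from the R output.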
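To run the same post-hoc procedure in both languages, Dunn's test can be written in a few lines of Python: mean ranks come from the pooled sample, the z-statistic uses the tie-corrected rank variance, and p-values are Bonferroni-adjusted. This is a minimal sketch with made-up data, not a drop-in replacement for rstatix; here a positive z means the second group has the higher mean rank.

```python
import math
from collections import Counter
from itertools import chain, combinations


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def dunn_test(groups):
    """Pairwise Dunn z-tests on pooled ranks with Bonferroni adjustment.

    groups: dict mapping group name -> list of yields.
    Returns (group1, group2, z, p, p_adj) tuples.
    """
    names = list(groups)
    pooled = list(chain.from_iterable(groups[name] for name in names))
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    # Rank variance with the usual correction for tied values.
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    base_var = n * (n + 1) / 12 - ties / (12 * (n - 1))
    pairs = list(combinations(names, 2))
    results = []
    for g1, g2 in pairs:
        se = math.sqrt(base_var * (1 / len(groups[g1]) + 1 / len(groups[g2])))
        z = (mean_rank[g2] - mean_rank[g1]) / se
        p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
        p_adj = min(1.0, p * len(pairs))      # Bonferroni
        results.append((g1, g2, z, p, p_adj))
    return results


# Made-up yields for three groups.
toy = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
for g1, g2, z, p, p_adj in dunn_test(toy):
    print(f"{g1} vs {g2}: z = {z:.3f}, p_adj = {p_adj:.4f}")
```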

Code

df_y5 <- df |>
  filter(!is.na(yield_kg)) |>
  filter(crop_type %in% c("Bell Peppers", "Tomatoes",
                          "Habanero", "Cucumber", "Lettuce"))

kw_crop <- kruskal.test(yield_kg ~ crop_type, data = df_y5)
print(kw_crop)


    Kruskal-Wallis rank sum test

data:  yield_kg by crop_type
Kruskal-Wallis chi-squared = 145.79, df = 4, p-value < 2.2e-16

Code

print(kruskal_effsize(df_y5, yield_kg ~ crop_type))

# A tibble: 1 × 5
  .y.          n effsize method  magnitude
* <chr>    <int>   <dbl> <chr>   <ord>    
1 yield_kg   739   0.193 eta2[H] large

Code

print(dunn_test(df_y5, yield_kg ~ crop_type,
                p.adjust.method = "bonferroni"))

# A tibble: 10 × 9
   .y.      group1   group2    n1    n2 statistic        p    p.adj p.adj.signif
 * <chr>    <chr>    <chr>  <int> <int>     <dbl>    <dbl>    <dbl> <chr>       
 1 yield_kg Bell Pe… Cucum…   139   104      5.95 2.73e- 9 2.73e- 8 ****        
 2 yield_kg Bell Pe… Haban…   139   133     -5.32 1.05e- 7 1.05e- 6 ****        
 3 yield_kg Bell Pe… Lettu…   139   127     -4.24 2.20e- 5 2.20e- 4 ***         
 4 yield_kg Bell Pe… Tomat…   139   236     -2.88 3.97e- 3 3.97e- 2 *           
 5 yield_kg Cucumber Haban…   104   133    -10.8  2.80e-27 2.80e-26 ****        
 6 yield_kg Cucumber Lettu…   104   127     -9.77 1.53e-22 1.53e-21 ****        
 7 yield_kg Cucumber Tomat…   104   236     -9.17 4.82e-20 4.82e-19 ****        
 8 yield_kg Habanero Lettu…   133   127      1.00 3.17e- 1 1   e+ 0 ns          
 9 yield_kg Habanero Tomat…   133   236      3.11 1.88e- 3 1.88e- 2 *           
10 yield_kg Lettuce  Tomat…   127   236      1.93 5.30e- 2 5.30e- 1 ns

Code

main_crops  = ["Bell Peppers", "Tomatoes", "Habanero", "Cucumber", "Lettuce"]
df_c        = df_h[df_h["Crop Type"].isin(main_crops)]
crop_groups = [grp["yield_kg"].values for _, grp in df_c.groupby("Crop Type")]

stat2, p2 = stats.kruskal(*crop_groups)
n2  = df_c["yield_kg"].notna().sum()
k2  = df_c["Crop Type"].nunique()
es2 = (stat2 - k2 + 1) / (n2 - k2)

print(f"Kruskal-Wallis H = {stat2:.4f}  p = {p2:.4e}")

Kruskal-Wallis H = 145.7888  p = 1.6255e-30

Code

print(f"Eta-squared = {es2:.4f}\n")

Eta-squared = 0.1932

Code

cpairs = list(combinations(df_c["Crop Type"].unique(), 2))
print("Pairwise Mann-Whitney Bonferroni corrected:")

Pairwise Mann-Whitney Bonferroni corrected:

Code

for c1, c2 in cpairs:
    a = df_c[df_c["Crop Type"] == c1]["yield_kg"]
    b = df_c[df_c["Crop Type"] == c2]["yield_kg"]
    _, p  = stats.mannwhitneyu(a, b, alternative="two-sided")
    padj  = min(p * len(cpairs), 1.0)
    stars = "***" if padj < 0.001 else "**" if padj < 0.01 else "*" if padj < 0.05 else "ns"
    print(f"  {c1} vs {c2}: p_adj = {padj:.4f} {stars}")

  Lettuce vs Bell Peppers: p_adj = 0.0000 ***
  Lettuce vs Tomatoes: p_adj = 0.1452 ns
  Lettuce vs Cucumber: p_adj = 0.0000 ***
  Lettuce vs Habanero: p_adj = 0.2843 ns
  Bell Peppers vs Tomatoes: p_adj = 0.0033 **
  Bell Peppers vs Cucumber: p_adj = 0.0000 ***
  Bell Peppers vs Habanero: p_adj = 0.0000 ***
  Tomatoes vs Cucumber: p_adj = 0.0000 ***
  Tomatoes vs Habanero: p_adj = 0.0021 **
  Cucumber vs Habanero: p_adj = 0.0000 ***

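Business interpretation: Crop type is strongly associated with yield (eta² = 0.193, a large effect), and 8 of the 10 crop pairs differ significantly in the Dunn test. Rank-based tests compare typical yields, whereas Plot 1 compares means; because the Habanero mean is heavily influenced by the nine 2,024 kg records, both views should be weighed before finalising crop guidance. New EYIA cohort entrants should receive data-driven crop selection guidance rather than choosing based on preference alone.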

8. Correlation Analysis

Code

library(corrplot)

df_corr <- df |>
  filter(!is.na(yield_kg)) |>
  filter(crop_type %in% c("Bell Peppers", "Tomatoes",
                          "Habanero", "Cucumber", "Lettuce")) |>
  mutate(
    gh_num   = as.numeric(greenhouse),
    crop_num = as.numeric(crop_type)
  ) |>
  select(yield_kg, cycle, seedlings_num, gh_num, crop_num)

df_corr <- df_corr[complete.cases(df_corr), ]

corr_mat <- cor(df_corr, method = "spearman")
print(round(corr_mat, 3))

              yield_kg  cycle seedlings_num gh_num crop_num
yield_kg         1.000  0.095         0.055 -0.194   -0.059
cycle            0.095  1.000        -0.176 -0.110   -0.252
seedlings_num    0.055 -0.176         1.000  0.176    0.241
gh_num          -0.194 -0.110         0.176  1.000    0.191
crop_num        -0.059 -0.252         0.241  0.191    1.000

Code

corrplot(
  corr_mat,
  method      = "color",
  type        = "upper",
  addCoef.col = "black",
  tl.col      = "black",
  tl.srt      = 45,
  col         = colorRampPalette(c("#d73027", "white", "#1a9850"))(200),
  title       = "Spearman Correlation Matrix EYIA Yield Variables",
  mar         = c(0, 0, 2, 0)
)

Code

import seaborn as sns
import matplotlib.pyplot as plt

main_crops = ["Bell Peppers", "Tomatoes", "Habanero", "Cucumber", "Lettuce"]
df_cp      = df_h[df_h["Crop Type"].isin(main_crops)].copy()
# Category codes follow alphabetical order: Greenhouse 1-4 -> 0-3; crop codes are arbitrary labels
df_cp["gh_num"]   = df_cp["Greenhouse"].astype("category").cat.codes
df_cp["crop_num"] = df_cp["Crop Type"].astype("category").cat.codes

cols     = ["yield_kg", "Cycle", "Seedlings_num", "gh_num", "crop_num"]
corr_mat = df_cp[cols].dropna().corr(method="spearman")

labels  = ["Yield", "Cycle", "Seedlings", "GH", "Crop"]
fig, ax = plt.subplots(figsize=(7, 6))
sns.heatmap(
    corr_mat,
    annot       = True,
    fmt         = ".3f",
    cmap        = "RdYlGn",
    center      = 0,
    linewidths  = 0.5,
    ax          = ax,
    xticklabels = labels,
    yticklabels = labels,
)
ax.set_title("Spearman Correlation Matrix EYIA Yield Variables", fontweight="bold")
plt.tight_layout()
plt.show()

Code

print(corr_mat.round(3).to_string())

               yield_kg  Cycle  Seedlings_num  gh_num  crop_num
yield_kg          1.000  0.095          0.055  -0.194    -0.059
Cycle             0.095  1.000         -0.176  -0.110    -0.252
Seedlings_num     0.055 -0.176          1.000   0.176     0.241
gh_num           -0.194 -0.110          0.176   1.000     0.191
crop_num         -0.059 -0.252          0.241   0.191     1.000

Key correlations and business implications:

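Greenhouse vs Yield (ρ = -0.194) is the strongest of the four correlations with yield, although it is still weak. Higher-numbered greenhouses tend to record lower yields, so pairing high-value crops with GH1 and GH2 should be a deliberate decision, not a default allocation.
Cycle vs Yield (ρ = 0.095) is a weak positive association, consistent with a modest learning effect over successive cycles. Structural constraints around greenhouse and crop type need attention alongside coaching.
Crop type vs Yield (ρ = -0.059) cannot be read as a measure of the crop effect. crop_num is an arbitrary alphabetical code, so relabelling the crops would change this coefficient and could even reverse its sign. The crop effect is better judged from the Kruskal-Wallis and pairwise Mann-Whitney results, which show large differences between crops. Crop selection guidance therefore remains a high-leverage intervention.
Seedlings vs Yield (ρ = 0.055) is negligible.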
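Spearman's ρ treats codes as ordered values, so its size, and even its sign, depends on which number each crop happens to receive. The pure-Python sketch below illustrates this using hypothetical yields for three crops. It computes ρ under three equally arbitrary codings; only the labels change between them, not the data. The helper functions are illustrative.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical data: two records per crop
crops = ["Bell Peppers", "Bell Peppers", "Cucumber", "Cucumber", "Habanero", "Habanero"]
yields = [10.0, 12.0, 50.0, 55.0, 30.0, 28.0]

codings = {
    "alphabetical": {"Bell Peppers": 0, "Cucumber": 1, "Habanero": 2},
    "by mean yield": {"Bell Peppers": 0, "Habanero": 1, "Cucumber": 2},
    "reversed": {"Cucumber": 0, "Habanero": 1, "Bell Peppers": 2},
}
rhos = {name: spearman([code[c] for c in crops], yields) for name, code in codings.items()}
for name, rho in rhos.items():
    print(f"{name:>13}: rho = {rho:+.3f}")
```

The same six records give ρ of about +0.48, +0.96 or -0.96 depending only on the coding. This is why the association between a nominal variable and yield is better summarised by a test or effect size that ignores label order.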

Note: These are associational findings, and every correlation with yield is weak (absolute ρ below 0.2). Because crop types are not randomly assigned to greenhouses, confounding is likely. The regression model below attempts to separate these effects statistically.
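For a nominal predictor such as crop type or greenhouse, one coding-free alternative is the epsilon-squared effect size for the Kruskal-Wallis test. It is defined as ε² = H / (n - 1), where H is the Kruskal-Wallis statistic and n is the number of observations. Below is a minimal pure-Python sketch, assuming hypothetical grouped yields. To apply it to the EYIA data, yield_kg would be grouped by Crop Type or by Greenhouse.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H with the standard correction for ties."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n)
    counts = {}
    for y in pooled:
        counts[y] = counts.get(y, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))


def epsilon_squared(groups):
    """Rank-based effect size between 0 and 1: H / (n - 1)."""
    n = sum(len(g) for g in groups)
    return kruskal_wallis_h(groups) / (n - 1)


# Hypothetical yields (kg) by crop, for illustration only
by_crop = {
    "Bell Peppers": [1.0, 2.0],
    "Tomatoes": [3.0, 4.0],
    "Habanero": [5.0, 6.0],
}
groups = list(by_crop.values())
print(f"H = {kruskal_wallis_h(groups):.3f}, epsilon^2 = {epsilon_squared(groups):.3f}")
```

The groups enter as an unordered collection, so the result is the same however the crops are labelled or ordered, unlike crop_num. On the real data, `scipy.stats.kruskal(*groups)` returns the same tie-corrected H, which can then be divided by n - 1.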

9. Linear Regression

9.1 Model Specification

Log-transformed yield, log(yield + 1), is regressed on cycle number, seedling count, greenhouse, and crop type. The log transformation reduces the right skew of the outcome and helps stabilise variance, and adding 1 keeps zero yields defined. Reference categories are Greenhouse 1 and Bell Peppers.
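Because the model is fitted on log(yield + 1), converting a prediction back into kilograms is not simply exp(prediction) - 1. When the log-scale errors are roughly symmetric, that naive back-transform tracks the median rather than the mean, so it tends to understate mean yield. One standard correction is Duan's smearing estimator, which multiplies exp(prediction) by the average of the exponentiated residuals. The sketch below is a minimal pure-Python version using hypothetical predictions and residuals.

```python
import math


def naive_back_transform(log_predictions):
    """exp(p) - 1: inverts log(yield + 1) but ignores the residual spread."""
    return [math.expm1(p) for p in log_predictions]


def smearing_back_transform(log_predictions, residuals):
    """Duan's smearing estimate of mean yield (kg) from a log(yield + 1) model."""
    smear = sum(math.exp(e) for e in residuals) / len(residuals)
    # E[yield + 1] is approximated by exp(p) * smear, then 1 is subtracted
    return [math.exp(p) * smear - 1 for p in log_predictions]


# Hypothetical values for illustration only
log_preds = [2.5, 3.0, 3.5]
resids = [-1.2, -0.4, 0.0, 0.5, 1.1]

for p, naive, smeared in zip(log_preds, naive_back_transform(log_preds),
                             smearing_back_transform(log_preds, resids)):
    print(f"log-prediction {p:.1f}: naive {naive:6.1f} kg, smeared {smeared:6.1f} kg")
```

For the EYIA model, the log-scale predictions would come from `model_py.fittedvalues` (or from predictions for a new cohort), and the residuals from `model_py.resid`. Smearing assumes the residual distribution is roughly the same across predictor values, which the diagnostic plots below can help assess.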

Code

library(broom)       # tidy(), glance()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

df_reg <- df |>
  filter(!is.na(yield_kg)) |>
  filter(crop_type %in% c("Bell Peppers", "Tomatoes",
                          "Habanero", "Cucumber", "Lettuce")) |>
  filter(!is.na(seedlings_num)) |>
  mutate(
    log_yield  = log(yield_kg + 1),
    greenhouse = relevel(greenhouse, ref = "Greenhouse 1"),
    crop_type  = relevel(crop_type,  ref = "Bell Peppers")
  )

model <- lm(
  log_yield ~ cycle + seedlings_num + greenhouse + crop_type,
  data = df_reg
)

tidy(model, conf.int = TRUE) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(caption = "OLS Regression Coefficients: Dependent variable log Yield plus 1") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

OLS Regression Coefficients: Dependent variable log Yield plus 1
term	estimate	std.error	statistic	p.value	conf.low	conf.high
(Intercept)	3.1233	0.2418	12.9192	0.0000	2.6486	3.5981
cycle	0.1890	0.0423	4.4646	0.0000	0.1059	0.2721
seedlings_num	-0.0005	0.0002	-1.8611	0.0632	-0.0009	0.0000
greenhouseGreenhouse 2	-0.9664	0.2140	-4.5163	0.0000	-1.3867	-0.5462
greenhouseGreenhouse 3	-1.1793	0.2178	-5.4152	0.0000	-1.6070	-0.7517
greenhouseGreenhouse 4	-0.1736	0.2370	-0.7325	0.4641	-0.6390	0.2918
crop_typeHabanero	-0.8215	0.2819	-2.9143	0.0037	-1.3750	-0.2679
crop_typeLettuce	NA	NA	NA	NA	NA	NA
crop_typeTomatoes	0.1374	0.2049	0.6706	0.5027	-0.2650	0.5398

Code

glance(model) |>
  select(r.squared, adj.r.squared, statistic, p.value, df, nobs) |>
  mutate(across(where(is.numeric), \(x) round(x, 4))) |>
  kable(caption = "Model Fit Statistics") |>
  kable_styling(bootstrap_options = c("striped", "hover"), full_width = FALSE)

Model Fit Statistics
r.squared	adj.r.squared	statistic	p.value	df	nobs
0.126	0.1163	12.9183	0	7	635

Code

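import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Same sample as the R model: main crops with non-missing yield and seedling count
df_reg_py = df_h[
    df_h["Crop Type"].isin(main_crops) &
    df_h["yield_kg"].notna() &
    df_h["Seedlings_num"].notna()
].copy()

df_reg_py["log_yield"]  = np.log1p(df_reg_py["yield_kg"])   # log(yield + 1)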
df_reg_py["Greenhouse"] = pd.Categorical(
    df_reg_py["Greenhouse"],
    categories=["Greenhouse 1", "Greenhouse 2",
                "Greenhouse 3", "Greenhouse 4"]
)
df_reg_py["Crop_Type"] = pd.Categorical(
    df_reg_py["Crop Type"],
    categories=["Bell Peppers", "Tomatoes", "Habanero",
                "Cucumber", "Lettuce"]
)

formula  = "log_yield ~ Cycle + Seedlings_num + C(Greenhouse) + C(Crop_Type)"
model_py = smf.ols(formula, data=df_reg_py).fit()
print(model_py.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:              log_yield   R-squared:                       0.126
Model:                            OLS   Adj. R-squared:                  0.116
Method:                 Least Squares   F-statistic:                     12.92
Date:                Sun, 10 May 2026   Prob (F-statistic):           1.44e-15
Time:                        19:56:40   Log-Likelihood:                -1218.9
No. Observations:                 635   AIC:                             2454.
Df Residuals:                     627   BIC:                             2489.
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
=================================================================================================
                                    coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------
Intercept                         2.8287      0.216     13.076      0.000       2.404       3.253
C(Greenhouse)[T.Greenhouse 2]    -0.9664      0.214     -4.516      0.000      -1.387      -0.546
C(Greenhouse)[T.Greenhouse 3]    -1.1793      0.218     -5.415      0.000      -1.607      -0.752
C(Greenhouse)[T.Greenhouse 4]    -0.1736      0.237     -0.732      0.464      -0.639       0.292
C(Crop_Type)[T.Tomatoes]          0.1374      0.205      0.671      0.503      -0.265       0.540
C(Crop_Type)[T.Habanero]         -0.6657      0.271     -2.460      0.014      -1.197      -0.134
C(Crop_Type)[T.Cucumber]      -8.278e-18   7.14e-17     -0.116      0.908   -1.48e-16    1.32e-16
C(Crop_Type)[T.Lettuce]          -0.9680      0.092    -10.542      0.000      -1.148      -0.788
Cycle                             0.1890      0.042      4.465      0.000       0.106       0.272
Seedlings_num                     0.0004      0.000      2.015      0.044    9.92e-06       0.001
==============================================================================
Omnibus:                       18.565   Durbin-Watson:                   1.499
Prob(Omnibus):                  0.000   Jarque-Bera (JB):               17.060
Skew:                          -0.348   Prob(JB):                     0.000197
Kurtosis:                       2.598   Cond. No.                     8.20e+19
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.98e-32. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
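The near-zero smallest eigenvalue (4.98e-32) and the condition number of 8.20e+19 indicate a singular design matrix. R omits Cucumber entirely and reports NA for Lettuce. The Python model, by contrast, keeps a Cucumber term with an estimate of effectively zero. By default, statsmodels returns one of many equally good least-squares solutions (the pseudo-inverse, minimum-norm solution), which would explain why the intercept, Habanero and seedlings estimates differ between the two outputs.

A quick check is to compare the rank of the design matrix with its number of columns, and to count observations per category. Below is a minimal numpy sketch using hypothetical toy rows, in which Cucumber is declared as a category but never observed.

```python
from collections import Counter

import numpy as np

# Hypothetical regression sample: one crop label per row
crop_levels = ["Bell Peppers", "Tomatoes", "Habanero", "Cucumber", "Lettuce"]
rows = ["Bell Peppers", "Tomatoes", "Habanero", "Lettuce", "Bell Peppers", "Tomatoes"]


def treatment_design(labels, levels):
    """Intercept plus one 0/1 dummy per non-reference level (reference = levels[0])."""
    return np.array([[1.0] + [1.0 if lab == lvl else 0.0 for lvl in levels[1:]]
                     for lab in labels])


X = treatment_design(rows, crop_levels)
rank = np.linalg.matrix_rank(X)
counts = Counter(rows)
empty_levels = [lvl for lvl in crop_levels if counts[lvl] == 0]

print(f"columns = {X.shape[1]}, rank = {rank}")
print("levels with no observations:", empty_levels)
```

On the fitted model, the equivalent check compares `np.linalg.matrix_rank(model_py.model.exog)` with `len(model_py.model.exog_names)`. Because Crop_Type is Categorical, `df_reg_py["Crop_Type"].value_counts()` also lists categories with zero rows. If the rank is still short after removing empty levels, a cross-tabulation such as `pd.crosstab` may help locate any remaining dependency among the other predictors.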

9.2 Diagnostic Plots

Code

par(mfrow = c(2, 2))
plot(model, which = 1:4, col = "#2E8B57", pch = 16, cex = 0.6)
par(mfrow = c(1, 1))

Code

import scipy.stats as scipy_stats
import matplotlib.pyplot as plt

fitted = model_py.fittedvalues
resid  = model_py.resid

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Left panel: residuals vs fitted values (checks linearity and constant variance)
axes[0].scatter(fitted, resid, alpha=0.4, color="#2E8B57", s=20)
axes[0].axhline(0, color="red", linestyle="--")
axes[0].set_title("Residuals vs Fitted")
axes[0].set_xlabel("Fitted Values")
axes[0].set_ylabel("Residuals")

# Right panel: normal Q-Q plot; probplot also returns the fitted reference line
(osm, osr), (slope, intercept, r) = scipy_stats.probplot(resid, plot=axes[1])
axes[1].set_title("Q-Q Plot of Residuals")

plt.tight_layout()
plt.show()

print(f"Q-Q fit: slope = {slope:.3f}, intercept = {intercept:.3f}, r = {r:.3f}")

Q-Q fit: slope = 1.630, intercept = 0.000, r = 0.984

9.3 Coefficient Interpretation

Predictor	Direction	Business Meaning
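Cycle	Positive	Each additional cycle is associated with about 21 percent higher yield (strictly, yield + 1), holding greenhouse, crop and seedlings constant (coefficient 0.189, p < 0.001). This is consistent with a learning effect, so coaches can reasonably expect improvement over a company's first few cycles.
Greenhouse 2 vs GH1	Negative	With the other predictors held constant, GH2 yields about 62 percent less than GH1 (coefficient -0.966, p < 0.001). This is consistent with a structural advantage in GH1, although crop allocation was not random.
Greenhouse 3 vs GH1	Negative	GH3 shows the largest gap, yielding about 69 percent less than GH1 (coefficient -1.179, p < 0.001).
Greenhouse 4 vs GH1	Negative, not significant	Once the other predictors are held constant, GH4 is not distinguishable from GH1 (coefficient -0.174, p = 0.46). The raw GH1 advantage over GH4 may therefore reflect crop mix rather than the greenhouse itself.
Habanero vs Bell Peppers	Negative, unreliable	The estimate is negative in both R (-0.82, p = 0.004) and Python (-0.67, p = 0.014), so the model does not show Habanero out-yielding Bell Peppers on the log scale. The two estimates disagree because the design matrix is singular, so this coefficient should not be relied on.
Cucumber vs Bell Peppers	Not estimable	R reports no Cucumber term, and Python returns an estimate of effectively zero (about -8e-18). This suggests that no Cucumber records remain in the regression sample after filtering, so the regression says nothing about Cucumber.
Lettuce vs Bell Peppers	Not reliably estimated	R drops the term as aliased (NA). Python's -0.97 is only one of many solutions to the singular design.
Seedlings	Unstable	The sign differs between R (-0.0005, p = 0.063) and Python (+0.0004, p = 0.044), so the model gives no reliable evidence that planting density affects yield.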
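Because the outcome is log(yield + 1), a coefficient b multiplies (yield + 1) by exp(b). That is a change of 100 × (exp(b) - 1) percent, which approximates the percentage change in yield itself when yields are well above 1 kg. The short sketch below converts the cycle and greenhouse coefficients from the R output above into these percentages; the dictionary keys are illustrative labels.

```python
import math

# Coefficients copied from the R regression output (log(yield + 1) scale)
coefficients = {
    "cycle (per additional cycle)": 0.1890,
    "Greenhouse 2 vs GH1": -0.9664,
    "Greenhouse 3 vs GH1": -1.1793,
    "Greenhouse 4 vs GH1 (not significant)": -0.1736,
}


def pct_change(b):
    """Percentage change in (yield + 1) implied by a log-scale coefficient b."""
    return 100 * math.expm1(b)


pct = {name: pct_change(b) for name, b in coefficients.items()}
for name, value in pct.items():
    print(f"{name}: {value:+.1f}%")
```

Confidence limits can be transformed the same way. For example, the GH2 interval of -1.3867 to -0.5462 becomes roughly 75 to 42 percent lower than GH1.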

10. Integrated Findings

Five analytical lenses converge on one finding: yield performance in the EYIA programme is not random. It varies systematically with two controllable programme variables, greenhouse assignment and crop type selection.

EDA exposed extreme yield inequality and documented two significant data quality issues. Visualisation made the inequality visible and identified GH1 with Habanero or Cucumber as the performance frontier. Hypothesis testing confirmed that both the greenhouse and crop type effects are statistically significant, with p less than 0.001 on both Kruskal-Wallis tests. Correlation analysis found greenhouse to be the strongest rank correlate of yield, though still a weak one (ρ = -0.194); crop type could not be assessed this way because its numeric codes are arbitrary. The regression explains about 12.6 percent of the variance in log-yield. After controlling for cycle number, seedling count and crop type, it confirmed that Greenhouses 2 and 3 are associated with substantially lower log-yield than GH1. It found no significant difference between GH4 and GH1, and it could not reliably estimate crop effects because the design matrix was singular.

Integrated Recommendation: The EYIA programme should adopt a data-driven crop-greenhouse assignment policy:

Prioritise Habanero and Cucumber in Greenhouses 1 and 2. These are the highest-performing combinations, both statistically and in practical terms.
Exclude Greenhouse 4 from high-value crop assignments until it reaches infrastructure parity with GH1.
Set yield benchmarks by crop-greenhouse combination. Companies below the 25th percentile for their configuration by Cycle 3 should receive mandatory agronomic coaching.
Use the regression model as a tool for setting yield targets, planning new cohort onboarding, and reporting performance to donors.
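The benchmark rule in the third recommendation can be expressed as a per-configuration quantile. The sketch below makes several assumptions. It uses a table with hypothetical columns `company`, `crop`, `greenhouse`, `cycle` and `yield_kg`. It computes the 25th percentile of yield within each crop-greenhouse combination and flags companies whose Cycle 3 yield falls below it. The column names, the helper `flag_below_benchmark`, and the toy values are all illustrative.

```python
import pandas as pd

# Toy Cycle 3 records (hypothetical) for a single crop-greenhouse configuration
records = pd.DataFrame({
    "company": ["A", "B", "C", "D", "E"],
    "crop": ["Habanero"] * 5,
    "greenhouse": ["GH1"] * 5,
    "cycle": [3, 3, 3, 3, 3],
    "yield_kg": [120.0, 45.0, 300.0, 80.0, 10.0],
})

def flag_below_benchmark(df: pd.DataFrame, cycle: int = 3, q: float = 0.25) -> pd.DataFrame:
    """Flag companies whose yield at `cycle` is below the q-th quantile of their configuration."""
    at_cycle = df[(df["cycle"] == cycle) & df["yield_kg"].notna()].copy()
    # Benchmark computed separately for every crop-greenhouse combination
    at_cycle["benchmark_kg"] = (
        at_cycle.groupby(["crop", "greenhouse"])["yield_kg"].transform(lambda s: s.quantile(q))
    )
    at_cycle["needs_coaching"] = at_cycle["yield_kg"] < at_cycle["benchmark_kg"]
    return at_cycle

flagged = flag_below_benchmark(records)
print(flagged[["company", "yield_kg", "benchmark_kg", "needs_coaching"]])
```

Whether "by Cycle 3" means the Cycle 3 record alone or an average over Cycles 1 to 3 is a design choice. The sketch uses the Cycle 3 record only.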

11. Limitations and Further Work

Missing yield data at 54 percent: Investigating why yields are missing (crop failure, abandonment, or data collection gaps) would substantially improve programme monitoring. If missing yields mainly reflect crop failure, the observed yields may overstate typical performance.
No input cost data: Yield in kg is an incomplete performance metric. Profitability per cycle, accounting for fertiliser, labour, water, and seedling costs, would be far more actionable for business development decisions.
Greenhouse confounding: Crops are not randomly assigned to greenhouses, making it difficult to fully isolate each variable’s independent effect. A controlled experiment in future cohorts, with crops randomly allocated to greenhouses, would help resolve this.
Company-level heterogeneity: A multilevel model with company as a random effect would account for unobserved differences in farmer skill, experience, and access to inputs not captured in the current variables.
Longitudinal learning curves: Analysing yield trajectories within individual companies across cycles would reveal learning effects more precisely than the aggregate cycle variable used in this analysis.
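A first step towards the missing-data investigation is to tabulate the share of records without a yield by greenhouse and crop. This shows whether missingness is concentrated in particular configurations. The sketch below assumes hypothetical columns `greenhouse`, `crop` and `yield_kg` and uses a tiny synthetic table. Markedly uneven missing shares would suggest the yields are not missing at random.

```python
import numpy as np
import pandas as pd

# Tiny synthetic table (hypothetical); NaN marks a planting record with no recorded yield
df = pd.DataFrame({
    "greenhouse": ["GH1", "GH1", "GH1", "GH4", "GH4", "GH4"],
    "crop": ["Habanero", "Habanero", "Cucumber", "Habanero", "Cucumber", "Cucumber"],
    "yield_kg": [850.0, np.nan, 420.0, np.nan, np.nan, 35.0],
})

# Overall share of records without a yield
overall_missing = df["yield_kg"].isna().mean()

# Missing share per greenhouse-crop configuration, with record counts for context
missing_by_config = (
    df.assign(missing=df["yield_kg"].isna())
      .groupby(["greenhouse", "crop"])["missing"]
      .agg(records="size", missing_share="mean")
      .reset_index()
)

print(f"Overall missing share: {overall_missing:.0%}")
print(missing_by_config)
```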

References

Adi, B. (2026). AI-powered business analytics: A practical textbook for data-driven decision making. Lagos Business School. https://markanalytics.online

Attah, S. (2026). EYIA 2024 master harvest data all cohorts. Dataset. Eupepsia Place Ltd, EYIA Programme Monitoring Unit, Ogun State, Nigeria. Data available on request from the author.

McKinney, W. (2010). Data structures for statistical computing in Python. Proceedings of the 9th Python in Science Conference, 56-61. https://doi.org/10.25080/Majora-92bf1922-00a

R Core Team. (2024). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.R-project.org/

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer. https://doi.org/10.1007/978-3-319-24277-4

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L., François, R., Grolemund, G., Hayes, A., Henry, L., Hester, J., Kuhn, M., Pedersen, T. L., Miller, E., Bache, S. M., Müller, K., Ooms, J., Robinson, D., Seidel, D. P., Spinu, V., . . . Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686

Appendix: AI Usage Statement

Claude (Anthropic) was used as a coding assistant in preparing this document. Claude assisted with writing R and Python code chunks for data loading, cleaning, visualisation, and statistical testing, and with structuring the Quarto document, including panel-tabset formatting. All analytical decisions, including technique selection, hypothesis framing, output interpretation, business recommendations, and professional disclosure content, were made independently by the author based on her operational knowledge of the EYIA programme and the analytical judgement developed through the Data Analytics II course. The author is able to explain and defend every result in this document.