Generalised Linear Models
“GLMs: the sum of residuals times any regressor variable is always zero.”
Introduction/terminology
In regression, covariates are the independent variables \(x_1, x_2, \dots, x_p\) used to predict the dependent outcome. A linear predictor is the weighted sum of the covariates:
\[
\eta = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p,
\]
which is the purely linear part of the model before it is “bent to fit” non-normal data using a link function. The link function \(g\) relates the expected value of the response, \(\mu = \mathbb{E}[Y]\), to the linear predictor via \(g(\mu) = \eta\). The systematic component is this linear predictor:
\[
\text{Systematic component:} \quad \eta = \mathbf{X} \boldsymbol{\beta},
\]
representing the part of the model that is “systematic” – determined entirely by the covariates and coefficients, with no randomness. The remaining part is the random component (noise or error), which specifies how the outcome \(Y\) varies about its mean. This component always follows a distribution from the exponential family. The shape and variability of each \(Y_i\) depend on its mean \(\mu_i = \mathbb{E}[Y_i]\) and a dispersion parameter \(\phi\), which controls the spread of the distribution beyond the mean, further generalising the model.
The complete GLM framework is
\[
\boxed{
\begin{aligned}
&\text{Random component:} && Y_i \sim \text{Exponential family}(\mu_i, \phi) \\
&\text{Systematic component:} && \eta_i = \mathbf{x}_i^\top \boldsymbol{\beta} \\
&\text{Link function:} && g(\mu_i) = \eta_i
\end{aligned}
}
\]
This general form covers many common distributions used in GLMs, such as the Gaussian, Binomial, Poisson, and Gamma. Each has a canonical link function that arises naturally from the exponential-family form of the response distribution; canonical links have useful mathematical properties, including simplifying estimation and interpretation.
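To make the three components concrete, the following sketch simulates data from a Poisson GLM with a log link and then recovers the coefficients by maximum likelihood. This is a minimal sketch assuming numpy and statsmodels are available; the sample size, design, and true coefficients are illustrative.
```python
# Minimal sketch of the three GLM components for a Poisson model,
# assuming numpy and statsmodels are available.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, p = 500, 2
X = sm.add_constant(rng.normal(size=(n, p)))   # design matrix with intercept
beta = np.array([0.5, 0.3, -0.2])              # true coefficients (illustrative)

eta = X @ beta                 # systematic component: linear predictor
mu = np.exp(eta)               # inverse of the log link: mu = g^{-1}(eta)
y = rng.poisson(mu)            # random component: Y_i ~ Poisson(mu_i)

# Fit the same model by maximum likelihood (log link is the Poisson default).
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)              # should be close to beta for large n
```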
Exponential Distributions (Distribution Class)
This is a huge class of distributions (Normal, Binomial, Poisson, Gamma, etc.). A probability distribution belongs to the exponential family if its probability density (or mass) function can be written as:
\[ f(y; \theta, \phi) = \exp\left( \frac{y \theta - b(\theta)}{a(\phi)} + c(y, \phi) \right) \]
where:
- \(\theta\) is the natural parameter, which parametrizes the distribution in a way that appears linearly in the exponent.
- \(\phi\) is the dispersion parameter, controlling the spread or variance, often known or fixed in some models.
- \(b(\theta)\) is the cumulant function, a convex function that ensures proper normalization and encodes moments of the distribution.
- \(a(\phi)\) is a dispersion (variance-scaling) function.
- \(c(y, \phi)\) is a normalisation term that does not depend on \(\theta\).
The nice thing about this form is that:
- The mean of the distribution is linked to the natural parameter by \(\mu = \mathbb{E}[Y] = b'(\theta)\).
- The variance is \(\mathrm{Var}(Y) = a(\phi) \, b''(\theta)\).
These properties make the exponential family distributions especially convenient for statistical modeling and inference.
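As a quick numerical illustration of these two identities, the sketch below checks \(\mu = b'(\theta)\) and \(\mathrm{Var}(Y) = a(\phi)\,b''(\theta)\) for the Poisson case, where \(b(\theta) = e^{\theta}\) and \(a(\phi) = 1\). A minimal sketch assuming numpy is available; the chosen \(\theta\) is illustrative.
```python
# Numerical check of mu = b'(theta) and Var(Y) = a(phi) * b''(theta)
# for the Poisson case, where b(theta) = exp(theta) and a(phi) = 1.
import numpy as np

rng = np.random.default_rng(1)
theta = 0.7                     # natural parameter (illustrative value)
b_prime = np.exp(theta)         # b'(theta)  -> mean
b_double_prime = np.exp(theta)  # b''(theta) -> variance (since phi = 1)

y = rng.poisson(np.exp(theta), size=200_000)
print(y.mean(), b_prime)        # both approximately exp(0.7) ~ 2.01
print(y.var(), b_double_prime)  # both approximately exp(0.7) ~ 2.01
```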
Catalogue
Linear Regression
The canonical link here is the identity function \(g(\mu)=\mu\).
Gaussian (Normal) distribution in standard form: \(f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right)\)
Gaussian in exponential family form \(f(y; \theta, \phi) = \exp\left( \frac{y\theta - b(\theta)}{a(\phi)} + c(y, \phi) \right)\)
For the Gaussian: \(\theta = \mu\), \(\phi = \sigma^2\), \(b(\theta) = \frac{\theta^2}{2}\), \(a(\phi) = \phi\), and \(c(y, \phi) = -\frac{y^2}{2\phi} - \frac12 \log(2\pi\phi)\).
Full exponential family form for Gaussian: \(f(y; \theta, \phi) = \exp\left( \frac{y\theta - \frac{\theta^2}{2}}{\phi} - \frac{y^2}{2\phi} - \frac12\log(2\pi\phi) \right)\)
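A small sanity check that the two forms agree: the snippet below evaluates both expressions on a grid of \(y\) values. A minimal sketch assuming numpy is available; the values of \(\mu\) and \(\sigma^2\) are illustrative.
```python
# Check that the exponential-family form of the Gaussian density matches
# the standard form on a grid of y values (a numerical sanity check).
import numpy as np

mu, sigma2 = 1.5, 2.0
theta, phi = mu, sigma2                     # canonical parameter and dispersion
y = np.linspace(-5, 5, 11)

standard = np.exp(-(y - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
exp_family = np.exp((y * theta - theta ** 2 / 2) / phi
                    - y ** 2 / (2 * phi)
                    - 0.5 * np.log(2 * np.pi * phi))

print(np.allclose(standard, exp_family))    # True
```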
Poisson (Count) Regression
The canonical link here is the log function \(g(\mu)=\operatorname{log}(\mu)\).
Binomial (Logistic) Regression
The canonical link here is the logit function \(g(\mu)=\operatorname{log}\left(\frac{\mu}{1-\mu}\right)\).
Gamma Regression: Modeling positive continuous skewed data (e.g., insurance claims)
The canonical link here is the inverse function \(g(\mu)=\frac{1}{\mu}\).
Inverse Gaussian Regression: Highly skewed positive outcomes
The canonical link here is the inverse-squared function \(g(\mu)=\frac{1}{\mu^2}\).
Negative Binomial: Overdispersed count data
The link typically used here is the log function \(g(\mu)=\operatorname{log}(\mu)\).
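The catalogue can be collected into a small lookup table of canonical links, sketched below under the assumption that numpy is available; the dictionary keys are illustrative, and the negative binomial is omitted since, as noted above, it is typically fitted with the log link in practice.
```python
# The catalogue above as a small lookup table of canonical link functions.
import numpy as np

canonical_links = {
    "gaussian":         lambda mu: mu,                      # identity
    "poisson":          lambda mu: np.log(mu),              # log
    "binomial":         lambda mu: np.log(mu / (1 - mu)),   # logit
    "gamma":            lambda mu: 1 / mu,                  # inverse
    "inverse_gaussian": lambda mu: 1 / mu**2,               # inverse squared
}

print(canonical_links["binomial"](0.8))   # logit(0.8) ~ 1.386
```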
The Elegance of Generalised Linear Models (OpenAI, 2025)
Generalised Linear Models (GLMs) are often presented as a combination of three ingredients:
1. An exponential family distribution,
2. A link function, and
3. A linear predictor.
While this is correct, the real elegance of GLMs lies in the deeper mathematical and geometric properties they inherit from the exponential family and maximum likelihood estimation. This document highlights those properties, and provides first-principle derivations in the appendix.
1. Orthogonality of Residuals and Regressors
Statement: In a GLM with a canonical link, the residuals are orthogonal to every regressor.
Why it matters: This generalises the familiar OLS property that residuals are orthogonal to the regressors, even though GLMs are nonlinear.
Equation:
\[ X^T (y - \mu) = 0, \]
where \(X\) is the design matrix, \(y\) is the vector of responses, and \(\mu = \mathbb{E}[Y|X]\).
Proof idea (short): These are the score equations from maximum likelihood estimation. Differentiating the log-likelihood with respect to the coefficients and setting equal to zero yields this condition.
(Full derivation in Appendix A.1)
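This condition can be verified empirically on any fitted canonical-link GLM: \(X^T(y - \hat{\mu})\) is zero up to the solver's tolerance. Below is a minimal sketch using logistic regression, assuming numpy and statsmodels are available; the simulated design and coefficients are illustrative.
```python
# Verify the orthogonality X^T (y - mu) = 0 on a fitted canonical-link GLM
# (logistic regression here).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
X = sm.add_constant(rng.normal(size=(n, 2)))
beta = np.array([-0.3, 1.0, -0.7])
mu_true = 1 / (1 + np.exp(-X @ beta))
y = rng.binomial(1, mu_true)

res = sm.GLM(y, X, family=sm.families.Binomial()).fit()
mu_hat = res.fittedvalues                 # fitted means mu_i
print(X.T @ (y - mu_hat))                 # ~ [0, 0, 0] up to numerical error
```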
2. Fisher Information and the Information Equality
Statement: For exponential family distributions, the variance of the score equals the Fisher information.
Why it matters: This property allows GLMs to be estimated efficiently by iteratively reweighted least squares (IRLS). It also explains why they inherit OLS-like properties despite being more general.
Equation:
\[ \mathbb{E}\left[ \left( \frac{\partial \ell}{\partial \theta} \right)^2 \right] = - \mathbb{E}\left[ \frac{\partial^2 \ell}{\partial \theta^2} \right]. \]
Proof idea (short): Using the exponential family form, we can interchange differentiation and integration under regularity conditions.
(Full derivation in Appendix A.2)
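The equality can also be checked by simulation. The sketch below does so for a single Poisson observation with natural parameter \(\theta\): the empirical variance of the score matches minus the (constant) second derivative of the log-likelihood. A minimal sketch assuming numpy is available; the value of \(\theta\) is illustrative.
```python
# Monte Carlo check of the information equality for a scalar Poisson
# parameter theta: Var(score) should equal minus the expected Hessian.
import numpy as np

rng = np.random.default_rng(3)
theta = 0.4
y = rng.poisson(np.exp(theta), size=500_000)

score = y - np.exp(theta)          # d ell / d theta = y - b'(theta), phi = 1
hessian = -np.exp(theta)           # d^2 ell / d theta^2 = -b''(theta), constant in y

print(score.var())                 # ~ exp(0.4)
print(-hessian)                    # exp(0.4) exactly
```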
3. Canonical Links as the Natural Choice
Statement: Choosing the canonical link aligns the linear predictor with the natural parameter of the exponential family.
Why it matters:
- Score equations simplify,
- Variance-mean relationships are clean,
- Estimation is more stable,
- Model interpretation aligns with sufficient statistics.
Example (Bernoulli):
\[ \eta = \log \frac{\mu}{1 - \mu}. \]
Explanation: For the Bernoulli, the canonical link is the logit. Substituting this into the score equations yields simple residual equations.
(Appendix A.3)
4. Unified Framework
Statement: GLMs unify linear regression, logistic regression, Poisson regression, and log-linear models under one framework.
Why it matters:
- The same estimation algorithm (IRLS),
- The same asymptotic theory for inference (Wald, score, and likelihood ratio tests),
- The same diagnostic toolkit.
This explains the broad applicability of GLMs across fields from epidemiology to social sciences.
5. Geometric Viewpoint
Statement: Fitting a GLM corresponds to projecting observed sufficient statistics onto the convex set of realisable means defined by the model.
Why it matters: This geometric interpretation explains why residual orthogonality appears naturally: it is the geometry of projections in the exponential family space.
Optional detail: This can be formalised using convex duality and information geometry.
(Appendix A.4)
Appendix: First-Principle Proofs (AI authored)
A.1 Residual Orthogonality
This appendix gives a first-principles derivation of the residual-orthogonality condition for Generalised Linear Models (GLMs). The derivation starts from the exponential-family formulation, applies the chain rule, and shows how the canonical link yields the simple orthogonality condition
\[ X^T (y - \mu) = 0. \]
It also states the more general weighted orthogonality that appears with non-canonical links.
1. Exponential-family setup
Assume observations \(y_1, \dots, y_n\) are independent and each has a density (or mass) in the exponential family written in canonical form:
\[ f(y_i; \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i,\phi) \right\}, \]
where:
- \(\theta_i\) is the canonical (natural) parameter for observation \(i\),
- \(b(\theta)\) is the cumulant function,
- \(\phi > 0\) is the dispersion parameter (for many common models \(\phi = 1\)),
- \(c(y_i,\phi)\) does not depend on \(\theta_i\).
Standard exponential-family identities are
\[ \mu_i \equiv \mathbb{E}[Y_i] = b'(\theta_i), \qquad \mathrm{Var}(Y_i) = \phi\, b''(\theta_i). \]
2. GLM structure and link function
In a GLM we relate the mean \(\mu_i\) to a linear predictor:
\[ \eta_i = x_i^T \beta, \]
where \(x_i\) is the p-vector of regressors for observation \(i\) and \(\beta\) is the coefficient vector. The link function \(g(\cdot)\) relates \(\mu_i\) and \(\eta_i\) by
\[ \eta_i = g(\mu_i). \]
The canonical link is the choice for which the linear predictor equals the canonical parameter:
\[ \eta_i = \theta_i. \]
3. Log-likelihood and score
The log-likelihood for the full sample is
\[ \ell(\beta) = \sum_{i=1}^n \log f(y_i; \theta_i(\beta), \phi) = \sum_{i=1}^n \left[ \frac{y_i \theta_i(\beta) - b(\theta_i(\beta))}{\phi} + c(y_i,\phi) \right]. \]
Differentiate \(\ell(\beta)\) with respect to \(\beta\). By the chain rule,
\[ \frac{\partial \ell}{\partial \beta} = \sum_{i=1}^n \frac{1}{\phi} \left( y_i - b'(\theta_i) \right) \frac{\partial \theta_i}{\partial \beta}. \]
Using \(b'(\theta_i) = \mu_i\), this becomes
\[ \frac{\partial \ell}{\partial \beta} = \frac{1}{\phi} \sum_{i=1}^n (y_i - \mu_i) \frac{\partial \theta_i}{\partial \beta}. \]
Setting the score equal to zero for the maximum-likelihood estimator yields the generic score condition
\[ \sum_{i=1}^n (y_i - \mu_i) \frac{\partial \theta_i}{\partial \beta} = 0. \tag{1} \]
Equation (1) is the scalar-observation form of the score equations.
4. Expressing the derivative \(\partial\theta_i / \partial\beta\)
We now expand \(\partial\theta_i/\partial\beta\) according to the link choice.
(a) Canonical link. If the link is canonical then \(\theta_i = \eta_i = x_i^T \beta\). Hence
\[ \frac{\partial \theta_i}{\partial \beta} = x_i. \]
Substituting into (1) gives
\[ \sum_{i=1}^n (y_i - \mu_i) x_i = 0, \]
or in matrix notation, with \(X\) the n-by-p design matrix and \(y\), \(\mu\) the n-vectors,
\[ X^T (y - \mu) = 0. \]
This is the residual-orthogonality condition: the residual vector \(y - \mu\) is orthogonal to every column of \(X\).
(b) General (non-canonical) link. If the link is not canonical then \(\theta_i\) is a function of \(\eta_i\), say \(\theta_i = h(\eta_i)\), and \(\eta_i = x_i^T \beta\). By the chain rule
\[ \frac{\partial \theta_i}{\partial \beta} = \frac{d\theta_i}{d\eta_i} \frac{\partial \eta_i}{\partial \beta} = \frac{d\theta_i}{d\eta_i} x_i. \]
Then (1) becomes
\[ \sum_{i=1}^n (y_i - \mu_i) \frac{d\theta_i}{d\eta_i} x_i = 0, \]
or equivalently
\[ X^T W (y - \mu) = 0, \]
where \(W\) is the n-by-n diagonal matrix with diagonal entries \(w_i = d\theta_i/d\eta_i\). Using standard identities (and algebraic manipulation) one commonly rewrites the weights in the IRLS form as
\[ w_i = \frac{1}{\mathrm{Var}(Y_i)} \left( \frac{d\mu_i}{d\eta_i} \right)^2, \]
but the key point here is that non-canonical links yield a weighted orthogonality: the residuals are orthogonal to the regressors under the weighted inner product defined by W, not the plain inner product.
5. Remarks and interpretation
Why the canonical link is special. When the canonical link is used, \(\partial\theta_i/\partial\beta = x_i\) and the score simplifies to the unweighted orthogonality condition \(X^T(y - \mu) = 0\). This mirrors ordinary least squares.
Residuals are orthogonal to regressors. The condition \(X^T(y - \mu) = 0\) implies for each column \(j\) of \(X\) that
\[ \sum_{i=1}^n x_{ij} (y_i - \mu_i) = 0, \]
so (when the model includes an intercept) the sample covariance between each regressor and the residuals is zero.
Weighted orthogonality for non-canonical links. With non-canonical links the score yields \(X^T W (y - \mu) = 0\). In practice IRLS solves a sequence of weighted least squares problems using such weights; the canonical link makes those weights trivial in the score.
Dispersion parameter. The factor \(1/\phi\) in the score does not affect the root of the score equations (unless \(\phi\) itself is a parameter to be estimated), so it does not change the orthogonality aside from a scalar factor that cancels when setting the score to zero.
Connection to sufficiency. For canonical links and exponential-family models, sufficient statistics for the regression coefficients typically appear linearly (for example \(X^T y\)), which is another reason the score (which involves \(X^T y\) and \(X^T \mu\)) has a linear form.
6. Worked example: Bernoulli (logit) with canonical link
For Bernoulli outcomes \(Y_i \in \{0,1\}\) with probability \(\mu_i\), the canonical-form log-likelihood per observation is
\[ \ell_i = y_i \theta_i - b(\theta_i), \qquad b(\theta) = \log(1 + e^{\theta}), \qquad \mu_i = b'(\theta_i) = \frac{e^{\theta_i}}{1 + e^{\theta_i}}. \]
With the canonical link \(\theta_i = \eta_i = x_i^T \beta\), the score is
\[ \frac{\partial \ell}{\partial \beta} = \sum_i (y_i - \mu_i) x_i, \]
so at the maximum-likelihood estimator
\[ X^T (y - \mu) = 0. \]
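To connect the worked example to the orthogonality condition numerically, the sketch below fits this Bernoulli model by Newton/Fisher scoring from scratch and checks that \(X^T(y - \hat{\mu})\) vanishes at the optimum. A minimal sketch assuming numpy is available; the simulated data, starting values, and iteration count are illustrative.
```python
# From-scratch Newton/Fisher-scoring fit of the Bernoulli (logit) model,
# checking that X^T (y - mu) vanishes at convergence.
import numpy as np

rng = np.random.default_rng(4)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.2, -1.0, 0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

beta = np.zeros(X.shape[1])
for _ in range(25):
    mu = 1 / (1 + np.exp(-X @ beta))          # mu_i = b'(theta_i)
    W = mu * (1 - mu)                         # b''(theta_i)
    score = X.T @ (y - mu)                    # score vector
    info = X.T @ (X * W[:, None])             # Fisher information X^T diag(W) X
    beta = beta + np.linalg.solve(info, score)

mu = 1 / (1 + np.exp(-X @ beta))
print(X.T @ (y - mu))                         # ~ zero vector at the MLE
```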
Conclusion
The orthogonality condition \(X^T (y - \mu) = 0\) follows directly from the score equations for GLMs when the canonical link is chosen. For non-canonical links the same first-principles approach yields a weighted orthogonality \(X^T W (y - \mu) = 0\). This derivation explains why residuals in canonical-link GLMs behave like OLS residuals: the likelihood geometry forces the same linear orthogonality with respect to the design matrix.
A.2 Information Equality
This appendix gives a first-principles derivation of the information equality for exponential-family models and its matrix generalisation for GLMs. The main identity is
\[ \mathrm{Var}\left(\frac{\partial \ell}{\partial \theta}\right) = -\mathbb{E}\left[\frac{\partial^2 \ell}{\partial \theta^2}\right], \]
and it holds under standard regularity conditions. We show the scalar-parameter case first, then specialise to the exponential family, and finally give the multivariate version for GLM regression parameters.
1. Scalar-parameter proof (basic identity)
Let \(f(y;\theta)\) be the likelihood for a single observation and write the log-likelihood as \(\ell(\theta;y)=\log f(y;\theta)\). Define the score
\[ u(\theta;y) = \frac{\partial}{\partial\theta} \ell(\theta;y). \]
Under regularity conditions we may differentiate under the integral sign. Normalisation of the density gives
\[ 1 = \int f(y;\theta)\,dy. \]
Differentiate both sides with respect to \(\theta\) to obtain
\[ 0 = \int \frac{\partial}{\partial\theta} f(y;\theta)\,dy = \int f(y;\theta) \frac{\partial}{\partial\theta} \ell(\theta;y)\,dy = \mathbb{E}[u(\theta;Y)]. \]
Thus the score has zero expectation:
\[ \mathbb{E}[u(\theta;Y)] = 0. \]
Differentiate this identity with respect to \(\theta\) again, remembering that the density \(f(y;\theta)\) itself depends on \(\theta\) (and interchange differentiation and integration). By the product rule, and using \(\partial f/\partial\theta = f\,u\),
\[ 0 = \frac{\partial}{\partial\theta} \int u(\theta;y)\, f(y;\theta)\,dy = \int \frac{\partial u}{\partial\theta}\, f\,dy + \int u\, \frac{\partial f}{\partial\theta}\,dy = \mathbb{E}\left[\frac{\partial}{\partial\theta} u(\theta;Y)\right] + \mathbb{E}\left[u(\theta;Y)^2\right]. \]
Since
\[ \frac{\partial}{\partial\theta} u(\theta;Y) = \frac{\partial^2}{\partial\theta^2} \ell(\theta;Y), \]
the same result is obtained more directly by differentiating \(\int f = 1\) twice:
\[ 0 = \int \frac{\partial^2}{\partial\theta^2} f(y;\theta)\,dy = \int f \left(\frac{\partial^2 \ell}{\partial\theta^2} + \left(\frac{\partial \ell}{\partial\theta}\right)^2\right) dy. \]
Hence
\[ 0 = \mathbb{E}\left[\frac{\partial^2 \ell}{\partial\theta^2}\right] + \mathbb{E}\left[\left(\frac{\partial \ell}{\partial\theta}\right)^2\right]. \]
Rearranging gives the scalar information equality:
\[ \mathbb{E}\left[\left(\frac{\partial \ell}{\partial\theta}\right)^2\right] = -\mathbb{E}\left[\frac{\partial^2 \ell}{\partial\theta^2}\right]. \]
The left-hand side is the variance of the score (since the score has mean zero), so this says
\[ \mathrm{Var}\left(\frac{\partial \ell}{\partial\theta}\right) = -\mathbb{E}\left[\frac{\partial^2 \ell}{\partial\theta^2}\right]. \]
2. Exponential-family specialisation
Consider the one-parameter exponential family in canonical form
\[ f(y;\theta,\phi) = \exp\left\{\frac{y\theta - b(\theta)}{\phi} + c(y,\phi)\right\}, \]
with canonical parameter \(\theta\) and dispersion \(\phi>0\). The log-likelihood for one observation is
\[ \ell(\theta;y) = \frac{y\theta - b(\theta)}{\phi} + c(y,\phi). \]
Compute derivatives with respect to \(\theta\):
\[ \frac{\partial \ell}{\partial\theta} = \frac{y - b'(\theta)}{\phi} = \frac{y - \mu}{\phi}, \]
and
\[ \frac{\partial^2 \ell}{\partial\theta^2} = -\frac{b''(\theta)}{\phi}. \]
Now take expectations. Using the standard identities
\[ \mu = \mathbb{E}[Y] = b'(\theta), \qquad \mathrm{Var}(Y) = \phi\, b''(\theta), \]
we have
\[ \mathbb{E}\left[\left(\frac{\partial \ell}{\partial\theta}\right)^2\right] = \mathbb{E}\left[\frac{(Y-\mu)^2}{\phi^2}\right] = \frac{\mathrm{Var}(Y)}{\phi^2} = \frac{\phi\, b''(\theta)}{\phi^2} = \frac{b''(\theta)}{\phi}. \]
On the other hand
\[ -\mathbb{E}\left[\frac{\partial^2 \ell}{\partial\theta^2}\right] = -\left(-\frac{b''(\theta)}{\phi}\right) = \frac{b''(\theta)}{\phi}, \]
so the identity holds. This direct calculation also clarifies why the exponential-family form makes the information quantity simple: the second derivative depends only on \(b''(\theta)\), which is directly related to the variance function.
3. Multivariate parameter case and GLMs
For a vector parameter \(\psi\in\mathbb{R}^p\) the scalar identity becomes the matrix identity for the Fisher information matrix:
\[ \mathcal{I}(\psi) = \mathbb{E}\left[ -\frac{\partial^2 \ell}{\partial\psi\,\partial\psi^T} \right] = \mathbb{E}\left[\left( \frac{\partial \ell}{\partial\psi} \right)\left(\frac{\partial \ell}{\partial\psi}\right)^T\right]. \]
In a GLM with independent observations and canonical link, let \(\beta\) be the regression coefficient vector, \(X\) the \(n\times p\) design matrix, and write the per-observation log-likelihood as in Section A.1. For canonical links the score vector is
\[ s(\beta) = \frac{\partial \ell}{\partial\beta} = \frac{1}{\phi} \sum_{i=1}^n (y_i - \mu_i) x_i = \frac{1}{\phi} X^T (y - \mu). \]
Compute the covariance (or variance) of the score:
\[ \mathrm{Var}(s(\beta)) = \frac{1}{\phi^2} \, \mathrm{Var}\left(X^T (y - \mu)\right) = \frac{1}{\phi^2} X^T \mathrm{diag}(\mathrm{Var}(Y_i)) X. \]
Using \(\mathrm{Var}(Y_i)=\phi\, b''(\theta_i)\) we obtain
\[ \mathrm{Var}(s(\beta)) = \frac{1}{\phi^2} X^T \mathrm{diag}(\phi\, b''(\theta_i)) X = \frac{1}{\phi} X^T \mathrm{diag}(b''(\theta_i)) X. \]
Now compute minus the expected Hessian directly:
\[ -\mathbb{E}\left[\frac{\partial^2 \ell}{\partial\beta\,\partial\beta^T}\right] = -\left(-\frac{1}{\phi} \sum_{i=1}^n b''(\theta_i) x_i x_i^T \right) = \frac{1}{\phi} X^T \mathrm{diag}(b''(\theta_i)) X, \]
which matches the variance of the score. Thus the matrix form of the information equality holds:
\[ \mathcal{I}(\beta) = \mathrm{Var}(s(\beta)) = -\mathbb{E}\left[\frac{\partial^2 \ell}{\partial\beta\,\partial\beta^T}\right] = \frac{1}{\phi} X^T \mathrm{diag}(b''(\theta_i)) X. \]
Using \(\mathrm{Var}(Y_i)=\phi\, b''(\theta_i)\) one may also write
\[ \mathcal{I}(\beta) = \frac{1}{\phi^2} X^T \mathrm{diag}(\mathrm{Var}(Y_i)) X. \]
4. Relation to IRLS and Newton updates
The information equality is the reason the expected Hessian (the Fisher information) is used as a natural curvature matrix in Newton-Raphson or Fisher scoring updates. For example, a Fisher-scoring update for \(\beta\) is
\[ \beta_{\text{new}} = \beta_{\text{old}} + \mathcal{I}(\beta_{\text{old}})^{-1} s(\beta_{\text{old}}). \]
For canonical-link GLMs this Newton/Fisher-scoring step can be rearranged to the familiar iteratively reweighted least-squares (IRLS) update: at each iteration one solves a weighted least-squares problem with weights related to \(b''(\theta_i)\) (or equivalently \(\mathrm{Var}(Y_i)\)) and a working response that incorporates the current linear predictor. The equality between the variance of the score and minus the expected Hessian is what makes the Fisher-scoring step equivalent, in expectation, to the Newton step and gives IRLS its stability and OLS-like form.
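The rearrangement into weighted least squares can be written out explicitly; the sketch below does so for a canonical-link Poisson GLM, assuming numpy is available. The simulated design, true coefficients, and fixed iteration count are illustrative.
```python
# Fisher scoring rewritten as IRLS for a canonical-link Poisson GLM:
# each step is a weighted least-squares solve with a working response.
import numpy as np

rng = np.random.default_rng(5)
n = 1500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.3, 0.4, -0.6])
y = rng.poisson(np.exp(X @ beta_true))

beta = np.zeros(X.shape[1])
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)                      # inverse log link
    W = mu                                # IRLS weights: b''(theta_i) = mu_i
    z = eta + (y - mu) / mu               # working response
    WX = X * W[:, None]
    beta = np.linalg.solve(X.T @ WX, X.T @ (W * z))   # weighted LS solve

print(beta)                               # close to beta_true
```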
5. Worked example: Bernoulli (logit, canonical link)
For Bernoulli observations with canonical parameter \(\theta_i = x_i^T\beta\) we have \(b''(\theta_i) = \mu_i(1-\mu_i)\) and \(\phi=1\). The Fisher information matrix is
\[ \mathcal{I}(\beta) = X^T \mathrm{diag}(\mu_i(1-\mu_i)) X. \]
The score is \(s(\beta) = X^T (y - \mu)\) and its variance is
\[ \mathrm{Var}(s(\beta)) = X^T \mathrm{diag}(\mu_i(1-\mu_i)) X, \]
matching the expected negative Hessian as required.
Conclusion
The information equality is a direct consequence of the normalisation of the likelihood and differentiability under the integral sign. For exponential-family models the identity becomes particularly explicit because the second derivative of the cumulant function, \(b''(\theta)\), directly encodes the variance function. In GLMs the equality explains why the expected Hessian (Fisher information) is the correct curvature matrix to use in Fisher scoring/IRLS and why those algorithms have such close ties to weighted least squares.
A.3 Canonical Link Simplification
This appendix shows why the canonical link in a Generalised Linear Model (GLM) is special: it aligns the linear predictor with the natural (canonical) parameter of the exponential family. This yields much simpler score equations and variance functions, and explains the OLS-like residual orthogonality property described in Appendix A.1.
1. General GLM score equations
Recall the log-likelihood for independent observations \(y_1,...,y_n\) from an exponential family distribution:
\[ \ell(\beta) = \sum_{i=1}^n \left[ \frac{y_i \theta_i - b(\theta_i)}{\phi} + c(y_i,\phi) \right]. \]
Differentiating with respect to \(\beta\) gives the score vector:
\[ \frac{\partial \ell}{\partial \beta} = \frac{1}{\phi} \sum_{i=1}^n (y_i - \mu_i) \frac{\partial \theta_i}{\partial \beta}. \]
By the chain rule,
\[ \frac{\partial \theta_i}{\partial \beta} = \frac{d\theta_i}{d\mu_i} \frac{d\mu_i}{d\eta_i} \frac{\partial \eta_i}{\partial \beta}, \]
where \(\eta_i = x_i^T \beta\) is the linear predictor.
So in general,
\[ \frac{\partial \ell}{\partial \beta} = \frac{1}{\phi} \sum_{i=1}^n (y_i - \mu_i) \left(\frac{d\theta_i}{d\mu_i}\right)\left(\frac{d\mu_i}{d\eta_i}\right) x_i. \tag{1} \]
This is the general GLM score equation.
2. Canonical link condition
The canonical link is defined by \(\eta_i = \theta_i\), i.e. the linear predictor is exactly the natural parameter. In this case,
\[ \frac{d\theta_i}{d\mu_i} \frac{d\mu_i}{d\eta_i} = 1. \]
To see this, recall the exponential-family identity
\[ \mu_i = b'(\theta_i). \]
So
\[ \frac{d\mu_i}{d\theta_i} = b''(\theta_i), \qquad \frac{d\theta_i}{d\mu_i} = \frac{1}{b''(\theta_i)}. \]
If the link is canonical, then \(\eta_i = \theta_i\), so \(d\mu_i/d\eta_i = b''(\theta_i)\). Multiplying gives
\[ \frac{d\theta_i}{d\mu_i}\frac{d\mu_i}{d\eta_i} = \frac{1}{b''(\theta_i)} b''(\theta_i) = 1. \]
3. Simplified score equation
Substituting back into (1), for canonical links the score reduces to
\[ \frac{\partial \ell}{\partial \beta} = \frac{1}{\phi} \sum_{i=1}^n (y_i - \mu_i) x_i = \frac{1}{\phi} X^T (y - \mu). \]
Setting this equal to zero gives
\[ X^T (y - \mu) = 0, \]
which is the exact analogue of the normal equations in OLS.
4. Consequences
- Simplified estimation equations. The orthogonality condition \(X^T(y - \mu)=0\) is unweighted. For non-canonical links the equations involve additional weights depending on derivatives of the link function.
- Variance function alignment. With canonical links, the variance of \(Y\) is directly tied to the cumulant function \(b(\theta)\), making the information matrix simpler and the IRLS weights natural.
- Sufficiency. In canonical-link GLMs, the sufficient statistics for the coefficients are linear in \(y\) (e.g. \(X^T y\)), which aligns estimation neatly with the design matrix.
5. Worked example: Poisson regression
For Poisson(\(\mu\)), the exponential-family form has \(b(\theta) = e^{\theta}\), so \(\mu = e^{\theta}\). The canonical link sets \(\eta = \theta\), so
\[ \mu = e^{\eta} = e^{x^T \beta}. \]
The score is
\[ \frac{\partial \ell}{\partial \beta} = \sum_{i=1}^n (y_i - \mu_i) x_i, \]
and the information matrix is
\[ \mathcal{I}(\beta) = X^T \mathrm{diag}(\mu_i) X. \]
Again, the estimating equations are exactly in the form \(X^T(y - \mu)=0\).
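As a numerical companion to this example, the sketch below fits a Poisson GLM, checks the estimating equations \(X^T(y - \hat{\mu}) = 0\), and forms the information matrix \(X^T \mathrm{diag}(\hat{\mu}_i) X\) directly. A minimal sketch assuming numpy and statsmodels are available; the simulated data are illustrative.
```python
# Check the Poisson estimating equations X^T (y - mu) = 0 and form the
# information matrix X^T diag(mu) X for a fitted model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 1000
X = sm.add_constant(rng.normal(size=(n, 2)))
y = rng.poisson(np.exp(X @ np.array([0.2, 0.5, -0.4])))

res = sm.GLM(y, X, family=sm.families.Poisson()).fit()
mu = res.fittedvalues                        # fitted means mu_i

print(X.T @ (y - mu))                        # ~ zero vector (score equations)
info = X.T @ (X * mu[:, None])               # Fisher information X^T diag(mu) X
print(info)
```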
Conclusion
The canonical link eliminates the derivative terms in the score function by aligning the linear predictor with the natural parameter. This yields the simplest possible estimating equations, makes residual orthogonality exact rather than weighted, and gives GLMs their elegant OLS-like geometry.
A.4 Projection Interpretation
This appendix gives a geometric interpretation of the score equations for GLMs.
In the case of canonical links, the equations can be seen as an orthogonal projection of the observed data vector onto the model subspace spanned by the regressors. This parallels the OLS case, where residuals are orthogonal to the design matrix.
1. OLS as a projection
For ordinary least squares, the model is
\[ y = X \beta + \varepsilon, \]
and minimising the squared error \(||y - X\beta||^2\) yields the normal equations
\[ X^T (y - X\hat{\beta}) = 0. \]
This condition means that the residual vector \(r = y - X\hat{\beta}\) is orthogonal to every column of \(X\).
Geometrically, \(\hat{y} = X\hat{\beta}\) is the orthogonal projection of \(y\) onto the column space of \(X\).
2. GLMs and weighted projections
For a GLM, the mean vector is \(\mu(\beta)\) with \(\mu_i = b'(\theta_i)\), and the score equations are
\[ X^T W (y - \mu) = 0, \]
for some diagonal weight matrix \(W\) that depends on the link function and variance structure (see Appendix A.1).
With a canonical link, \(W = I\), and the equations reduce to
\[ X^T (y - \mu) = 0, \]
exactly mirroring OLS orthogonality.
With a non-canonical link, the weights \(W\) modify the inner product, so the residuals are orthogonal to the regressors under the weighted inner product
\[ \langle u, v \rangle_W = u^T W v. \]
Thus GLM estimation corresponds to a weighted projection of \(y\) onto the modelled mean space.
3. Iterative reweighted least squares (IRLS)
This projection interpretation underlies the IRLS algorithm for solving GLM score equations:
At each step, construct a “working dependent variable” \(z\) (a linearised version of \(y\) around the current mean).
Update \(\beta\) by solving a weighted least squares problem:
\[ \hat{\beta}_{\text{new}} = \arg\min_\beta (z - X\beta)^T W (z - X\beta). \]
Thus, IRLS can be viewed as repeatedly re-projecting the working response onto the column space of \(X\) with changing weights until convergence.
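One IRLS step can be written explicitly as a weighted projection of the working response onto the columns of \(X\); the sketch below does this for logistic regression. A minimal sketch assuming numpy is available; the helper name irls_step, the simulated data, and the iteration count are illustrative.
```python
# One IRLS step for logistic regression, written explicitly as a weighted
# least-squares projection of the working response z onto the columns of X.
import numpy as np

def irls_step(X, y, beta):
    """Return updated coefficients from one weighted projection step."""
    eta = X @ beta
    mu = 1 / (1 + np.exp(-eta))
    W = mu * (1 - mu)                       # weights from the variance function
    z = eta + (y - mu) / W                  # linearised working response
    WX = X * W[:, None]
    return np.linalg.solve(X.T @ WX, X.T @ (W * z))

rng = np.random.default_rng(7)
X = np.column_stack([np.ones(400), rng.normal(size=(400, 2))])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ np.array([0.1, 0.8, -0.5]))))

beta = np.zeros(3)
for _ in range(20):
    beta = irls_step(X, y, beta)
print(beta)                                 # approximate MLE after convergence
```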
4. Geometric consequences
- Canonical links give true orthogonal projections. With \(W = I\), the fitted mean \(\mu\) is the plain orthogonal projection of \(y\) onto the model subspace (in expectation).
- Non-canonical links give weighted projections. Residuals are not orthogonal in the standard Euclidean sense, but are orthogonal under the inner product induced by \(W\).
- Connection to sufficiency and information. The projection view reflects the fact that \(X^T y\) (or its weighted analogue) is sufficient for \(\beta\) in exponential families with canonical links. The Fisher information matrix, \(X^T W X\), describes the geometry of the projection.
5. Worked example: Logistic regression
For Bernoulli(\(\mu\)) with canonical logit link:
Score equations:
\[ X^T (y - \mu) = 0. \]
The fitted means \(\hat{\mu}\) are the projection of \(y\) onto the model space spanned by \(X\), under the Bernoulli likelihood geometry.
In IRLS, at each step one solves a weighted least squares projection with weights \(W = \text{diag}(\mu_i(1-\mu_i))\).
Conclusion
GLM estimation can be viewed geometrically as a projection problem:
- Canonical links: exact orthogonal projection of the data vector (in expectation).
- Non-canonical links: weighted projection under an inner product defined by the link and variance function.
This projection interpretation unifies OLS and GLMs under a common geometric framework and highlights why the canonical link yields particularly elegant estimation properties.