class: center, middle, inverse, title-slide

.title[
# Generalized Linear Models With Julia
]

.subtitle[
## Statistical Modelling with Julia
]

---

<style type="text/css">
body{
  font-size: 20pt;
}
</style>

```julia
using DataFrames
using Distributions  # Distribution types used to simulate the data
using GLM            # For Generalized Linear Models
using StatsPlots     # For plotting (optional)
using Random

# 1. Create or load your data
# (Example using a DataFrame; you can adapt this to your data structure)

# Simulate some data (replace with your actual data)
Random.seed!(123)  # For reproducibility
n = 100
x = rand(n)
true_beta0 = 2.0
true_beta1 = -1.5
# No error standard deviation is needed: a Poisson response has variance equal to its mean
```

---

```julia
# Linear predictor
linear_predictor = true_beta0 .+ true_beta1 .* x

# Generate data based on a Poisson distribution (example)
# You can change this to other distributions like Binomial, Gamma, etc.
# exp is the inverse of the log link: it maps the linear predictor to the Poisson mean
y = rand.(Poisson.(exp.(linear_predictor)))

df = DataFrame(x = x, y = y)
println(first(df, 5))  # Show the first few rows
```

---

```julia
# 2. Specify the model
# Choose the appropriate distribution and link function

# Example 1: Poisson regression (as per simulated data)
# LogLink is the canonical link for Poisson
model_poisson = glm(@formula(y ~ x), df, Poisson(), LogLink())

# Example 2: Logistic regression (if y were binary 0/1)
# using Statistics  # For median
# df.y_binary = ifelse.(df.y .> median(df.y), 1, 0)  # Create some binary outcome
# model_logistic = glm(@formula(y_binary ~ x), df, Binomial(), LogitLink())

# Example 3: Gamma regression (if y were positive and continuous)
# model_gamma = glm(@formula(y ~ x), df, Gamma(), InverseLink())
```

---

```julia
# 3. Fit the model
# glm() fits the model as part of the call above, so no separate fit! call is needed

# 4.
# Examine the results
println(model_poisson)  # Print model summary

# Coefficients and standard errors
println("Coefficients: ", coef(model_poisson))
println("Standard Errors: ", stderror(model_poisson))

# Confidence intervals (confint is available after `using GLM`)
println("95% Confidence Intervals: ", confint(model_poisson))
```

---

```julia
# Goodness of fit (for Poisson, check the dispersion)
# deviance() returns a scalar, so it is divided directly by the residual degrees of freedom
dispersion_statistic = deviance(model_poisson) / dof_residual(model_poisson)
println("Dispersion Statistic: ", dispersion_statistic)

# A value well above 1 suggests overdispersion; a value well below 1 suggests underdispersion.
# Either way Poisson may not be the best fit. For overdispersion, a negative binomial
# or quasi-Poisson model might be more appropriate.
```

---

```julia
# 5. Make predictions
new_data = DataFrame(x = [0.2, 0.5, 0.8])  # Create new data for prediction

# predict() returns values on the response scale, so for this Poisson model
# the predictions are already the *expected* counts (no extra exp() is needed)
predictions = predict(model_poisson, new_data)
println("Predictions (expected counts): ", predictions)

# Taking the log recovers the linear predictor (the log link scale)
linear_predictions = log.(predictions)
println("Linear Predictor: ", linear_predictions)
```

---

```julia
# 6. Model diagnostics and plotting (optional, but highly recommended)

# Pearson residuals, computed by hand: for Poisson the variance equals the mean
fitted_mean = predict(model_poisson)
residuals_pearson = (df.y .- fitted_mean) ./ sqrt.(fitted_mean)

# Plots (using StatsPlots)
# scatter(df.x, residuals_pearson, xlabel = "x", ylabel = "Pearson Residuals", title = "Residual Plot")

# scatter(df.x, df.y, label = "Observed", xlabel = "x", ylabel = "y")
# order = sortperm(df.x)  # Sort by x so the fitted line is drawn left to right
# plot!(df.x[order], fitted_mean[order], label = "Fitted", color = :red)  # For Poisson
```

---

```julia
# Example of Negative Binomial Regression (for overdispersion)
# negbin() from GLM estimates the dispersion parameter along with the coefficients
# model_negbin = negbin(@formula(y ~ x), df, LogLink())
# println(model_negbin)
```

---

**Explanation and Key Improvements:**

1. **Data Simulation:** The code simulates data from a Poisson distribution, which makes the example self-contained and reproducible.
**Replace this with your actual data loading.**
2. **Clearer Model Specification:** The model is specified explicitly with `@formula` together with a distribution and a link function.
3. **Model Fitting:** `glm()` fits the model as part of the call, so the returned object is already fitted; a separate `fit!` call is not required.
4. **Comprehensive Results:** The example prints coefficients, standard errors, and confidence intervals.
5. **Goodness of Fit:** A dispersion statistic (deviance divided by residual degrees of freedom) is computed for the Poisson regression to assess potential over- or underdispersion. This check matters for Poisson models because they assume the variance equals the mean.
6. **Prediction:** Demonstrates how to make predictions on new data using `predict()`. The returned values are on the response scale, so for Poisson (and other count models with a log link) they are the *expected counts*; taking `log` of them recovers the linear predictor.
7. **Residuals and Plotting:** Includes the calculation of Pearson residuals and commented-out code for plotting using `StatsPlots`. Visualizing residuals is *essential* for model diagnostics.
8. **Negative Binomial Example:** A commented-out example shows how to fit a Negative Binomial model. This is useful if your Poisson model shows overdispersion (dispersion statistic well above 1).
9. **Comments and Structure:** Comments and code structure are arranged step by step for readability.

---

**How to Run:**

1. Make sure you have the necessary packages installed. At the Julia REPL, press `]` to enter package mode and run:

   ```julia
   ] add DataFrames GLM StatsPlots Distributions
   ```

2. Save the code as a `.jl` file (e.g., `glm_example.jl`).
3. Open a Julia REPL.
4. Navigate to the directory where you saved the file.
5. Type `include("glm_example.jl")` and press Enter.

---

The output will show the model summary, coefficients, standard errors, confidence intervals, dispersion statistic, predictions, and (if you uncomment it) the residual plot. Remember to replace the example simulated data with your actual data. Adjust the distribution and link function as needed for your specific problem.
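
---

**Checking the prediction scale and dispersion:** When adapting the example to other distributions or links, it can help to verify which scale `predict()` reports and to compare a Pearson-based dispersion statistic with the deviance-based one. The following is a minimal sketch on freshly simulated data. The names `check_df`, `check_model`, `check_new`, the sample size, and the true coefficients are assumptions chosen for illustration only.

```julia
using DataFrames, Distributions, GLM, Random

# Simulate a small Poisson data set (illustrative values, not real data)
Random.seed!(123)
check_x = rand(200)
check_y = rand.(Poisson.(exp.(2.0 .- 1.5 .* check_x)))
check_df = DataFrame(x = check_x, y = check_y)

# Fit a Poisson regression with a log link
check_model = glm(@formula(y ~ x), check_df, Poisson(), LogLink())

# Compare predict() with the hand-computed inverse link of the linear predictor
check_new = DataFrame(x = [0.2, 0.5, 0.8])
mu_hat = predict(check_model, check_new)
b0, b1 = coef(check_model)
eta_hat = b0 .+ b1 .* check_new.x
same_scale = all(isapprox.(mu_hat, exp.(eta_hat); rtol = 1e-8))
println("predict() matches exp(linear predictor): ", same_scale)

# Pearson dispersion: sum of squared Pearson residuals over residual degrees of freedom
mu_fit = predict(check_model)
pearson = (check_df.y .- mu_fit) ./ sqrt.(mu_fit)  # Poisson variance equals the mean
pearson_dispersion = sum(abs2, pearson) / dof_residual(check_model)
deviance_dispersion = deviance(check_model) / dof_residual(check_model)
println("Pearson dispersion:  ", pearson_dispersion)
println("Deviance dispersion: ", deviance_dispersion)
```

If `same_scale` is `true`, the values from `predict()` can be read directly as expected counts. For data that really are Poisson, both dispersion figures should sit near 1; the Pearson version is the one usually quoted for quasi-Poisson adjustments.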