Linear models are among the most important mathematical models because they capture essential characteristics of the real world. In statistics, the term linear model is used in connection with regression models, and it is often taken as synonymous with linear regression model. The term “linear” identifies a subclass of models for which a substantial reduction in the complexity of the related statistical theory is possible (https://en.wikipedia.org/wiki/Linear_model).
A linear additive model is used to describe an observation as a mean plus an error term. The quality of a statistical analysis is best judged by how well the assumed model describes the data. Linear models represent the relationship between the response variable and the predictor variables.
Linear regression requires a linear model. A model is linear when each term is either a constant or the product of a parameter and a predictor variable. A linear equation is constructed by adding the results for each term. This constrains the equation to just one basic form:
Response = constant + parameter * predictor + … + parameter * predictor
Y = b0 + b1X1 + b2X2 + … + bkXk
In statistics, a regression equation (or function) is linear when it is linear in the parameters. While the equation must be linear in the parameters, you can transform the predictor variables in ways that produce curvature. For instance, you can include a squared variable to produce a U-shaped curve.
Y = b0 + b1X1 + b2X1²
This model is still linear in the parameters even though the predictor variable is squared. You can also use log and inverse functional forms that are linear in the parameters to produce different types of curves.
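To make linearity in the parameters concrete, here is a minimal sketch (using NumPy, with hypothetical data) that fits Y = b0 + b1X1 + b2X1² by ordinary least squares; the squared term produces curvature, yet the estimation is still an ordinary linear solve.

```python
import numpy as np

# Illustrative (hypothetical) data: a predictor X1 and a U-shaped response.
rng = np.random.default_rng(0)
X1 = np.linspace(-3, 3, 50)
y = 2.0 + 0.5 * X1 + 1.5 * X1**2 + rng.normal(scale=0.5, size=X1.size)

# Design matrix with a constant, X1, and X1 squared.
# The model is still linear in the parameters b0, b1, b2,
# even though it is curved in X1.
X = np.column_stack([np.ones_like(X1), X1, X1**2])

# Ordinary least squares: solve for b = (b0, b1, b2).
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print("Estimated parameters:", b)  # should be close to (2.0, 0.5, 1.5)
```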
Example of a Linear Model
There are three (3) conceptual models of a linear model: 1. True model (describes the data perfectly, with no residual or unexplained variation); 2. Ideal model (the closest practical approximation to the true model that could be used for the analysis); 3. Operational model (a simplified version of the ideal model and the one actually used in the analysis).
The operational model is a representation of how the study is to be performed: how to investigate the behavior of interdependent variables and capture the dynamic interactions of the phenomenon under study.
Observations
Y usually denotes the observation vector, which is assumed to have a multivariate normal distribution. An example of a normally distributed continuous variable is growth rate, which can take on an unlimited (infinite) number of values.
Categorical Traits
A categorical variable is a variable that takes a fixed number of possible values, assigning each individual to a particular group or category, such as blood type or sex.
Factors
Factors are variables, either discrete or continuous, that may influence or be related to the elements of the observation vector. For example, the growth rate of fish can be influenced by age at stocking, the season in which the fish were grown, and their genetic potential. Factors should be quantifiable so that they do not become an unstated assumption or limitation of the study. There are different types of factors: 1) discrete factors (have classes or levels, e.g., distance of a mangrove from a given point); 2) nuisance factors (have an effect on the response but are not the factor of interest; a known but uncontrollable nuisance factor can be removed using the analysis of covariance); 3) fixed and random factors (treated under the traditional statistical approach or the Bayesian approach).
Fixed Factors
Fixed factors are factors whose classes comprise all of the possible classes of interest that could be observed, such as the sex of an animal in a livestock study. Fixed factors are typically used when the number of classes is small.
Random Factors
Random factors are factors whose levels are considered to be drawn randomly from an infinite population. Random factors are typically used when the number of levels is large.
Residuals
Residual (e) is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ) in regression analysis. Each data point has one residual.
Residual = Observed value - Predicted value
e = y - ŷ
For least-squares regression with an intercept term, both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.
Source: https://stattrek.com/statistics/dictionary.aspx?definition=residual
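As a quick illustration of these properties, the following sketch (NumPy, hypothetical data) fits a straight line with an intercept and verifies that the residuals sum and average to zero.

```python
import numpy as np

# A tiny hypothetical dataset and a straight-line least-squares fit.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
X = np.column_stack([np.ones_like(x), x])   # intercept + slope
b, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ b       # predicted values ŷ
e = y - y_hat       # residuals: e = y - ŷ, one per data point

# With an intercept in the model, the residuals sum (and average)
# to zero up to floating-point error.
print("sum of residuals:", e.sum())    # ≈ 0
print("mean of residuals:", e.mean())  # ≈ 0
```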
Alternative Models
Alternative models express the competing hypotheses: the reduced model corresponds to H0 (no difference among treatment means), and the full model corresponds to HA (some differences among treatments).
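To make the full-versus-reduced comparison concrete, here is a minimal sketch (NumPy, hypothetical one-way treatment data, not from the source) of the F-test that contrasts the reduced model's error sum of squares with the full model's.

```python
import numpy as np

# Hypothetical one-way layout: three treatments, five observations each.
rng = np.random.default_rng(1)
groups = [rng.normal(loc=m, scale=1.0, size=5) for m in (10.0, 10.5, 13.0)]
y = np.concatenate(groups)

# Reduced model (H0): a single grand mean for all observations.
sse_reduced = ((y - y.mean()) ** 2).sum()

# Full model (HA): a separate mean for each treatment.
sse_full = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F statistic: drop in error per extra parameter, relative to
# the full model's error per residual degree of freedom.
k, n = len(groups), y.size
df_num, df_den = k - 1, n - k
F = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
print("F =", F)  # compare against an F(df_num, df_den) critical value
```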
Nonlinear equations can take many different forms, while a linear equation has only one, which is why nonlinear regression provides the most flexible curve-fitting functionality. Minitab's nonlinear function catalog lists many examples; in these functions, the thetas represent the parameters and X represents the predictor. Unlike linear regression, nonlinear functions can have more than one parameter per predictor variable.
Example of a Nonlinear Model
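As an illustration in the same spirit (not a specific entry from Minitab's catalog), here is a sketch that fits an exponential-growth function, θ1·exp(θ2·X), with SciPy's curve_fit (assuming SciPy is available); the function is nonlinear in θ2, so the fit is iterative and needs starting values.

```python
import numpy as np
from scipy.optimize import curve_fit

# An exponential-growth function: nonlinear in the parameter theta2.
def growth(x, theta1, theta2):
    return theta1 * np.exp(theta2 * x)

# Hypothetical data generated from the same functional form.
rng = np.random.default_rng(2)
x = np.linspace(0, 4, 40)
y = growth(x, 2.0, 0.8) + rng.normal(scale=1.0, size=x.size)

# Iterative nonlinear least squares; p0 gives starting values,
# which nonlinear (unlike linear) regression requires.
theta, _ = curve_fit(growth, x, y, p0=(1.0, 0.5))
print("Estimated thetas:", theta)  # should be close to (2.0, 0.8)
```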
Model Building
A statistical model is a mathematical model that embodies a set of statistical assumptions concerning the generation of sample data (and similar data from a larger population). It represents, often in considerably idealized form, the data-generating process, and it shows the mathematical relationship between one or more random variables and other non-random variables (https://en.wikipedia.org/wiki/Statistical_model).
Multicollinearity is a state of very high intercorrelations or inter-associations among the independent variables; in effect, one predictor is largely determined by the others. It is therefore a type of disturbance in the data, and if it is present, the statistical inferences made about the data may not be reliable (https://www.statisticssolutions.com/multicollinearity/).
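One common diagnostic is the variance inflation factor (VIF), which measures how well each predictor is explained by the others; values far above 1 signal multicollinearity. Here is a sketch (NumPy, hypothetical predictors) in which one predictor is nearly a linear combination of the others, so its VIF is correspondingly large.

```python
import numpy as np

# Hypothetical predictors: x3 is almost a linear combination of x1 and x2,
# so the design matrix is nearly collinear.
rng = np.random.default_rng(3)
x1 = rng.normal(size=100)
x2 = rng.normal(size=100)
x3 = x1 + 0.5 * x2 + rng.normal(scale=0.05, size=100)
X = np.column_stack([x1, x2, x3])

# VIF for predictor j: 1 / (1 - R^2) from regressing x_j on the others.
def vif(X, j):
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])   # add an intercept
    beta, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ beta
    r2 = 1 - resid.var() / X[:, j].var()
    return 1 / (1 - r2)

for j in range(X.shape[1]):
    print(f"VIF for predictor {j + 1}: {vif(X, j):.1f}")  # x3's VIF is huge
```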
A confounding variable is an “extra” variable that was not accounted for, and it can ruin an experiment and give useless results. Confounders can suggest a correlation where in fact there is none, and they can introduce bias.
Example of a Confounding Variable
Source: https://www.statisticshowto.datasciencecentral.com/experimental-design/confounding-variable/
Regression analysis is used to model the relationship between a response variable and one or more predictor variables (http://www.statgraphics.com/regression-analysis).
ANOVA is a statistical technique that assesses potential differences in a scale-level dependent variable by a nominal-level variable having 2 or more categories (https://www.statisticssolutions.com/manova-analysis-anova/).
ANCOVA (Analysis of covariance) is a general linear model which blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV) often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV) or nuisance variables (https://en.wikipedia.org/wiki/Analysis_of_covariance).
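For a concrete illustration of both analyses, here is a minimal sketch using the statsmodels formula API (assuming pandas and statsmodels are available; the data and variable names are hypothetical).

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Hypothetical data: a response, a 3-level treatment, and a covariate.
rng = np.random.default_rng(4)
df = pd.DataFrame({
    "treatment": np.repeat(["A", "B", "C"], 20),
    "age": rng.uniform(1, 5, size=60),
})
effect = df["treatment"].map({"A": 0.0, "B": 1.0, "C": 2.5})
df["growth"] = 10 + effect + 1.2 * df["age"] + rng.normal(scale=1.0, size=60)

# ANOVA: does mean growth differ across treatment categories?
anova_fit = ols("growth ~ C(treatment)", data=df).fit()
print(sm.stats.anova_lm(anova_fit, typ=2))

# ANCOVA: the same question, statistically controlling for the
# continuous covariate (age), as in the definition above.
ancova_fit = ols("growth ~ C(treatment) + age", data=df).fit()
print(sm.stats.anova_lm(ancova_fit, typ=2))
```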