Load/install libraries

library(knitr) # descriptive library(MASS) library(psych) library(gmodels) library(vcd) library(plotrix) # inferential library(Lock5Data) library(car) library(forecast) library(tseries) library(Kendall) library(seasonal) library(ResourceSelection) library(survival) library(ggplot2)
library(agricolae)

Description

R was initially written as a research project by Ross Ihaka and Robert Gentleman at Department of Statistics in the University of Auckland, New Zealand during 1990s. To provide a statistical environment in their teaching lab (a language for data analysis and graphics) based on the S language. The aim of the training is to introduce model applications commonly used in the analysis of data from studies.We seek to give sufficient background so that you better understand the types of analyses that can be implemented for designed experiments. We give practical examples and hands-on applications so that you can experience how these analyses are conducted. We introduce a framework that will easily extend to more complex structures in the models. We will assume some underlying knowledge of mathematical concepts and statistical notation. # Basic descriptive analysis and graphics in R

Summary statistics

summary(mtcars$mpg)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.40   15.43   19.20   20.09   22.80   33.90

Means by group

#One Quantitative (y) and One Categorical (x)

by(InsectSprays$count,InsectSprays$spray,mean,na.rm=TRUE)

## InsectSprays$spray: A
## [1] 14.5
## ------------------------------------------------------------ 
## InsectSprays$spray: B
## [1] 15.33333
## ------------------------------------------------------------ 
## InsectSprays$spray: C
## [1] 2.083333
## ------------------------------------------------------------ 
## InsectSprays$spray: D
## [1] 4.916667
## ------------------------------------------------------------ 
## InsectSprays$spray: E
## [1] 3.5
## ------------------------------------------------------------ 
## InsectSprays$spray: F
## [1] 16.66667

Basic graphs

Pie Charts

Pie chart is popularly used in practice to show percentage break down of data. It is a circle representing a set of data by dividing the circle into sectors proportional to the number of items in the categories or a pie chart is a circle representing the total, cut into slices in proportional to the size of the parts that make up the total. It gives the proportional sizes of different data groups as slice of a pie or a circle.

slices <- c(10, 12,4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")

pie(slices, labels = lbls,main="Simple Pie Chart")

# Simple bar plot Simple Bar Chart: is a diagram in which categories of a variable are marked on the X axis and the frequencies of the categories are marked on the Y axis. It is applicable for discrete variables, that is, for data given according to some period, places and timings. These periods and timings are represented on the base line (X-axis) at regular interval and the corresponding frequencies are represented on the Y-axis.

barplot(GNP ~ Year, data = longley,col = "red", lwd = 2)

# Stacked BarPlots Multiple Bar Charts: used to display data on more than one variable. In the multiple bars diagram two or more sets of inter-related data are interpreted.

barplot(VADeaths, beside = TRUE,
        col = c("lightblue", "mistyrose", "lightcyan",
                "lavender", "cornsilk"),
        legend.text = rownames(VADeaths), ylim = c(0, 100))
title(main = "Death Rates in Virginia", font.main = 4)

Boxplots

Produce box-and-whisker plot(s) of the given (grouped) values.

boxplot(mtcars$hp ~ mtcars$am,
        xlab="Transformation",
        ylab="horse power", 
        main = "Boxplots of horse power by Transformation Status", 
        col="lightblue")

Scatter Plots

A scatter plot is a type of graph used to show the relationship between two numerical variables. Each point represents one observation.

plot(mtcars$mpg,mtcars$weight,
     xlab="Weight" ,
     ylab="miles per age",
     main="Plot of Weight vs miles per age ")

Histograms

A histogram is graph showing the frequency of occurrence of each value of the variable being analyzed. In histogram, data are plotted as a series of rectangles. Class intervals are shown on the “X-axis” and the frequencies on the “Y-axis”. The height of each rectangle represents the frequency of the class interval. Each rectangle is formed with the other so as to give a continuous picture.

hist(mtcars$mpg, breaks=12,col="red",
     xlab="Miles Per Gallon",
     main="Colored histogram with 12 bins")

2-D and 3-D plots

A 2-D plot displays the relationship between two variables on a flat graph using the x- and y-axes. A 3-D plot shows the relationship among three variables by adding a z-axis, giving the graph depth and a more realistic view of data.

x <- 10*(1:nrow(volcano))
y <- 10*(1:ncol(volcano))

Creates a 2-D image of x and y co-ordinates.

image(x, y, volcano, col = terrain.colors(100),
axes = FALSE)
# Adds contour lines to the current plot.
contour(x, y, volcano, levels = seq(90, 200, by=5),
add = TRUE, col = "peru")
# Adds x and y axes to the plot.
axis(1, at = seq(100, 800, by = 100))
axis(2, at = seq(100, 600, by = 100))
# Draws a box around the plot.
box()
# Adds a title.
title(main = "Maunga Whau Volcano", font.main = 4)

Infrential Statistics

Inferential statistics is a branch of statistics that uses sample data to make conclusions or predictions about a larger population. It involves methods like estimation, hypothesis testing, and confidence intervals to draw reliable inferences.

Chisquare test

The chi-square test is a statistical method used to determine whether there is a significant association between categorical variables or whether observed data fits an expected distribution.

chisq.test(mtcars$vs,mtcars$am)#H1:there is significant association

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  mtcars$vs and mtcars$am
## X-squared = 0.34754, df = 1, p-value = 0.5555

Simple Random Sampling(SRS)

SRS is a method of selecting a sample where every member of the population has an equal chance of being chosen.

sample(1:2000,100)

##   [1] 1177 1984 1523 1401 1006  430  856  212  853 1959  815  986  626  589 1858
##  [16]  494  929 1581  800  754  982  894   15  337  299  258 1525 1758  817 1960
##  [31] 1335  920 1168  958 1797  167  778 1552  501 1976  799 1848  133 1765 1066
##  [46] 1462 1033 1186  376 1491  241 1328 1664  558  245 1566 1610 1135   14 1399
##  [61] 1456 1087 1132   11 1746 1494 1389  925 1378 1262 1449 1668 1990 1846 1913
##  [76] 1764 1419  135  875  492  981  551 1778 1911 1196  327  305 1267  873 1160
##  [91]  246 1561 1855  581  910 1094  542  914  268  426

sample(c("H","T"), 100, replace =TRUE)

##   [1] "T" "H" "H" "H" "T" "H" "H" "H" "T" "T" "H" "H" "T" "T" "H" "T" "T" "T"
##  [19] "H" "H" "T" "H" "T" "T" "H" "H" "H" "H" "H" "T" "H" "T" "H" "H" "H" "H"
##  [37] "H" "T" "T" "H" "H" "H" "H" "H" "T" "T" "H" "H" "H" "H" "T" "T" "H" "H"
##  [55] "H" "T" "T" "T" "H" "H" "T" "H" "H" "H" "T" "T" "H" "H" "H" "T" "H" "T"
##  [73] "H" "T" "T" "T" "T" "H" "H" "T" "H" "T" "H" "H" "H" "H" "T" "T" "T" "T"
##  [91] "H" "T" "T" "H" "T" "T" "T" "T" "T" "H"

Tabular values for distributions

They are pre-calculated values used to compare with your test statistic to decide whether to accept or reject a hypothesis. # Tabular value in the normal distribution

1-pnorm(2.15)

## [1] 0.01577761

qt(0.025,20)

## [1] -2.085963

Tabular value in the F distribution

qf(0.05,3,20, lower.tail=F)#To find upper 5% point for an F(3,20) distribution

## [1] 3.098391

Parametric and non parametric tests

Parametric tests are statistical tests that assume the data follows a specific distribution (usually normal distribution) and involve parameters like mean and standard deviation. Non-parametric tests do not assume any specific distribution of the data. # Two sample Parametric model A two-sample parametric model is used to compare the means of two independent groups when the data is numerical and assumes a normal distribution.

t.test(cars$dist,cars$speed)  #H1: true difference in means is not equal to 0

## 
##  Welch Two Sample t-test
## 
## data:  cars$dist and cars$speed
## t = 7.4134, df = 53.119, p-value = 9.628e-10
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  20.11848 35.04152
## sample estimates:
## mean of x mean of y 
##     42.98     15.40

Paired sample t test

A paired sample t-test (also called dependent t-test) is used to compare the means of two related groups to see if there is a significant difference.

t.test(cars$dist,cars$speed, paired=TRUE, alternative = 'two')

## 
##  Paired t-test
## 
## data:  cars$dist and cars$speed
## t = 8.9753, df = 49, p-value = 6.417e-12
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  21.40484 33.75516
## sample estimates:
## mean difference 
##           27.58

Compare two variances

The F-test is a parametric test used to determine whether two populations have equal variances. It is often used before performing t-tests to check the assumption of equal variance.

var.test(cars$dist,cars$speed) #H1:true ratio of variances is not equal to 1

## 
##  F test to compare two variances
## 
## data:  cars$dist and cars$speed
## F = 23.751, num df = 49, denom df = 49, p-value < 2.2e-16
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  13.47817 41.85388
## sample estimates:
## ratio of variances 
##           23.75108

Two-sample Wilcoxon test

The two-sample Wilcoxon test (also called the Mann–Whitney U test) is a non-parametric test used to compare the medians of two independent groups when the data does not assume a normal distribution.

wilcox.test(cars$dist,cars$speed,conf.int=TRUE)#H1: true location shift is not equal to 0

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  cars$dist and cars$speed
## W = 2198, p-value = 6.319e-11
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  15.99997 31.00003
## sample estimates:
## difference in location 
##                     22

The matched-pairs Wilcoxon test

The matched-pairs Wilcoxon test (also called the Wilcoxon Signed-Rank Test) is a non-parametric test used to compare two related or paired samples when the data does not assume normality.

wilcox.test(cars$dist,cars$speed,conf.int=TRUE, paired=TRUE)

## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  cars$dist and cars$speed
## V = 1268.5, p-value = 1.148e-09
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
##  19.49998 32.50006
## sample estimates:
## (pseudo)median 
##       25.00005

Ansari-Bradley test

The Ansari–Bradley test is a non-parametric test used to compare the variability (dispersion) of two independent groups when the data does not assume normality.

ansari.test(cars$dist,cars$speed)#H1:true ratio of scales is not equal to 1

## 
##  Ansari-Bradley test
## 
## data:  cars$dist and cars$speed
## AB = 1092, p-value = 0.01328
## alternative hypothesis: true ratio of scales is not equal to 1

Simple linear regression

Simple linear regression is a statistical method used to model the relationship between two variables: one independent variable (X) and one dependent variable (Y).

sl=lm(cars$speed~cars$dist)
summary(sl)

## 
## Call:
## lm(formula = cars$speed ~ cars$dist)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.5293 -2.1550  0.3615  2.4377  6.4179 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.28391    0.87438   9.474 1.44e-12 ***
## cars$dist    0.16557    0.01749   9.464 1.49e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.156 on 48 degrees of freedom
## Multiple R-squared:  0.6511, Adjusted R-squared:  0.6438 
## F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.49e-12

Model assumption check

Scatter plot for linearity

A scatter plot is often used to check if a linear relationship exists between two variables before performing linear regression.

plot(cars$dist,cars$speed, xlab="distance",ylab="speed",main="Scatter plot")
abline(lm(cars$speed~cars$dist),lty=2)

QQ plot and Shapiro test for normmality

Both the QQ plot and the Shapiro-Wilk test are used to check if data is normally distributed, which is an important assumption for many parametric tests.

qqnorm(resid(sl))#normality
qqline(resid(sl),lty=2)

shapiro.test(cars$speed) #H1: The data are not normally distributed

## 
##  Shapiro-Wilk normality test
## 
## data:  cars$speed
## W = 0.97765, p-value = 0.4576

Residual vs fitted plot for Constant variance

A Residuals vs Fitted plot is a diagnostic tool in regression to check whether the variance of residuals is constant across all levels of predicted values.

plot(resid(sl),fitted(sl))# constant variance
abline(h=0,lty=2)

Diagnostic plots of the residuals

Residual diagnostic plots are graphical tools used to check the assumptions of a regression model, such as linearity, constant variance, and normality of errors. They help ensure the model is reliable.

par(mfrow=c(3,2))
plot(sl, which=1:5)
plot(cars$dist,cars$speed, xlab="distance",ylab="speed",main="Scatter plot")

Model Prediction

Model prediction refers to using a fitted statistical model to estimate or forecast the value of the dependent variable based on given independent variable(s).

predict(sl,int="p")

## Warning in predict.lm(sl, int = "p"): predictions on current data refer to _future_ responses

##          fit       lwr      upr
## 1   8.615041  2.046715 15.18337
## 2   9.939581  3.427222 16.45194
## 3   8.946176  2.392929 15.49942
## 4  11.926392  5.475837 18.37695
## 5  10.932987  4.454892 17.41108
## 6   9.939581  3.427222 16.45194
## 7  11.264122  4.795959 17.73228
## 8  12.588663  6.152686 19.02464
## 9  13.913203  7.497220 20.32919
## 10 11.098554  4.625520 17.57159
## 11 12.919798  6.489968 19.34963
## 12 10.601852  4.113078 17.09063
## 13 11.595257  5.136275 18.05424
## 14 12.257527  5.814641 18.70041
## 15 12.919798  6.489968 19.34963
## 16 12.588663  6.152686 19.02464
## 17 13.913203  7.497220 20.32919
## 18 13.913203  7.497220 20.32919
## 19 15.900014  9.490931 22.30910
## 20 12.588663  6.152686 19.02464
## 21 14.244338  7.831434 20.65724
## 22 18.217960 11.781853 24.65407
## 23 21.529312 14.990142 28.06848
## 24 11.595257  5.136275 18.05424
## 25 12.588663  6.152686 19.02464
## 26 17.224555 10.804639 23.64447
## 27 13.582068  7.162237 20.00190
## 28 14.906609  8.497548 21.31567
## 29 13.582068  7.162237 20.00190
## 30 14.906609  8.497548 21.31567
## 31 16.562284 10.149326 22.97524
## 32 15.237744  8.829448 21.64604
## 33 17.555690 11.131142 23.98024
## 34 20.867041 14.354431 27.37965
## 35 22.191582 15.622947 28.76022
## 36 14.244338  7.831434 20.65724
## 37 15.900014  9.490931 22.30910
## 38 19.542501 13.074147 26.01085
## 39 13.582068  7.162237 20.00190
## 40 16.231149  9.820514 22.64178
## 41 16.893420 10.477367 23.30947
## 42 17.555690 11.131142 23.98024
## 43 18.880230 12.429514 25.33095
## 44 19.211366 12.752207 25.67052
## 45 17.224555 10.804639 23.64447
## 46 19.873636 13.395335 26.35194
## 47 23.516123 16.879994 30.15225
## 48 23.681690 17.036335 30.32705
## 49 28.152015 21.194665 35.10936
## 50 22.357149 15.780699 28.93360

Quadratic model

A quadratic model is a type of regression model used to describe the relationship between a dependent variable 𝑌and an independent variable 𝑋when the relationship is curved, not linear.

Qm=lm(formula = cars$speed~ cars$dist + I(cars$dist^2))
summary(Qm)

## 
## Call:
## lm(formula = cars$speed ~ cars$dist + I(cars$dist^2))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.559 -1.722  0.473  1.932  5.942 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     5.1439610  1.2954573   3.971 0.000244 ***
## cars$dist       0.3274544  0.0547392   5.982 2.86e-07 ***
## I(cars$dist^2) -0.0015284  0.0004939  -3.095 0.003316 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.907 on 47 degrees of freedom
## Multiple R-squared:  0.7101, Adjusted R-squared:  0.6978 
## F-statistic: 57.57 on 2 and 47 DF,  p-value: 2.299e-13

Comparing Models

ANOVA (Analysis of Variance) can be used to compare nested regression models to see if a more complex model significantly improves the fit compared to a simpler model.

anova(sl,Qm) #H1:There is a significant improvement in the model

## Analysis of Variance Table
## 
## Model 1: cars$speed ~ cars$dist
## Model 2: cars$speed ~ cars$dist + I(cars$dist^2)
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
## 1     48 478.02                                
## 2     47 397.11  1    80.911 9.5762 0.003316 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation

Correlation measures the strength and direction of a relationship between two variables. It tells us how one variable changes when the other changes. The Pearson correlation is rooted in the two-dimensional normal distribution where the theoretical correlation describes the contour ellipses for the density. Spearman’s ρ, you may be interested in nonparametric variants.These have the advantage of not depending on the normal distribution and, indeed, being invariant to monotone transformations of the coordinates. cor.test(cars\(dist,cars\)speed,method=“spearman”)

cor(cars$dist,cars$speed,use="complete.obs")

## [1] 0.8068949

cor(cars,use="complete.obs")

##           speed      dist
## speed 1.0000000 0.8068949
## dist  0.8068949 1.0000000

cor.test(cars$speed,cars$dist)

## 
##  Pearson's product-moment correlation
## 
## data:  cars$speed and cars$dist
## t = 9.464, df = 48, p-value = 1.49e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6816422 0.8862036
## sample estimates:
##       cor 
## 0.8068949

Multiple regression models

Multiple regression is an extension of simple linear regression that predicts a dependent variable (Y) using two or more independent variables (X₁, X₂, …, Xₖ).

ml=lm(UrbanPop~Murder+Assault+Rape, data=USArrests)
summary(ml)

## 
## Call:
## lm(formula = UrbanPop ~ Murder + Assault + Rape, data = USArrests)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -35.456  -6.950   0.077   7.770  25.221 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 52.84187    4.82483  10.952 2.09e-14 ***
## Murder      -1.41154    0.71954  -1.962   0.0559 .  
## Assault      0.05190    0.04161   1.247   0.2186    
## Rape         0.69841    0.26776   2.608   0.0122 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.08 on 46 degrees of freedom
## Multiple R-squared:  0.2337, Adjusted R-squared:  0.1837 
## F-statistic: 4.676 on 3 and 46 DF,  p-value: 0.006208

Analysis of Variance (ANOVA)

ANOVA is a statistical method used to compare the means of three or more groups to see if at least one group mean is significantly different from the others.

summary(InsectSprays)# data sammary for each variable

##      count       spray 
##  Min.   : 0.00   A:12  
##  1st Qu.: 3.00   B:12  
##  Median : 7.00   C:12  
##  Mean   : 9.50   D:12  
##  3rd Qu.:14.25   E:12  
##  Max.   :26.00   F:12

Anova.Spray<-aov(sqrt(count)~spray,data= InsectSprays)
summary(Anova.Spray)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## spray        5  88.44  17.688    44.8 <2e-16 ***
## Residuals   66  26.06   0.395                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Bartlett test of homogeneity of variances

Bartlett’s test is a statistical test used to check whether multiple groups have equal variances. It is important because many parametric tests assume homogeneity of variances.

bartlett.test(sqrt(count)~spray,data= InsectSprays)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  sqrt(count) by spray
## Bartlett's K-squared = 3.7525, df = 5, p-value = 0.5856

#Ho: the distribution of a variable has the same variance in all groups

Kruskal–Wallis test

The Kruskal–Wallis test is a non-parametric alternative to one-way ANOVA. It is used to compare three or more independent groups when the data does not meet the normality assumption.

kruskal.test(sqrt(count)~spray,data= InsectSprays)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  sqrt(count) by spray
## Kruskal-Wallis chi-squared = 54.691, df = 5, p-value = 1.511e-10

#H1: the location parameters of the distribution of x are differ in at least one group (sample).

ANOVA for Repeated measure

Repeated Measures ANOVA is an extension of ANOVA used when the same subjects are measured multiple times under different conditions or over time. It accounts for within-subject correlation.

rmanova=aov(weight~Time+Error(Chick/Time),data=ChickWeight)
summary(rmanova)

## 
## Error: Chick
##           Df Sum Sq Mean Sq F value Pr(>F)   
## Time       1  89700   89700   9.776  0.003 **
## Residuals 48 440405    9175                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Chick:Time
##           Df  Sum Sq Mean Sq F value Pr(>F)    
## Time       1 1962914 1962914   280.1 <2e-16 ***
## Residuals 49  343364    7007                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Error: Within
##            Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 478  78173   163.5

Logistic regression

Logistic regression is a statistical model used to predict a binary or categorical outcome based on one or more predictor variables.

fit.full <- glm( mtcars$am~ mtcars$vs+mtcars$mpg+mtcars$cyl, data=mtcars, family="binomial")

## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred

summary(fit.full)

## 
## Call:
## glm(formula = mtcars$am ~ mtcars$vs + mtcars$mpg + mtcars$cyl, 
##     family = "binomial", data = mtcars)
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)    69.0287 12251.6506   0.006    0.996
## mtcars$vs     -37.9198  6125.8244  -0.006    0.995
## mtcars$mpg      0.2910     0.1933   1.505    0.132
## mtcars$cyl     -9.4229  1531.4562  -0.006    0.995
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 43.230  on 31  degrees of freedom
## Residual deviance: 20.578  on 28  degrees of freedom
## AIC: 28.578
## 
## Number of Fisher Scoring iterations: 18

Time series model

A time series model is used to analyze data points collected or recorded over time to understand patterns, trends, and make forecasts.

summary(AirPassengers)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   104.0   180.0   265.5   280.3   360.5   622.0

plot(AirPassengers)

Seasonality effect

boxplot(AirPassengers~cycle(AirPassengers))

Decompose the time series data in to its component(trend, seasonality and irregularity)

tsdata=ts(AirPassengers,freq=12)

Decompose using multiplicative model

ddata=decompose(tsdata, "multiplicative")

plot(ddata)

Fit time series model

mymodel=arima(AirPassengers)# shows the best model
summary(mymodel)

##           Length Class  Mode     
## coef        1    -none- numeric  
## sigma2      1    -none- numeric  
## var.coef    1    -none- numeric  
## mask        1    -none- logical  
## loglik      1    -none- numeric  
## aic         1    -none- numeric  
## arma        7    -none- numeric  
## residuals 144    ts     numeric  
## call        2    -none- call     
## series      1    -none- character
## code        1    -none- numeric  
## n.cond      1    -none- numeric  
## nobs        1    -none- numeric  
## model      10    -none- list

Residual Plot

plot.ts(arima(AirPassengers)$residuals)

Autocorelation (ACF or PACF)

acf(ts(arima(AirPassengers)$residuals),main="ACF Residuals")

pacf(ts(arima(AirPassengers)$residuals),main="PACF Residuals")

Test a model

#H0: The residuals are independently distributed.
#HA: The residuals are not independently distributed; they exhibit serial correlation.
Box.test(arima(AirPassengers)$residuals,lag=5,type="Ljung-Box")#lag 5

## 
##  Box-Ljung test
## 
## data:  arima(AirPassengers)$residuals
## X-squared = 504.8, df = 5, p-value < 2.2e-16

Box.test(arima(AirPassengers)$residuals,lag=10,type="Ljung-Box")# lag 10

## 
##  Box-Ljung test
## 
## data:  arima(AirPassengers)$residuals
## X-squared = 857.07, df = 10, p-value < 2.2e-16

Box.test(arima(AirPassengers)$residuals,lag=15,type="Ljung-Box")#lag 15

## 
##  Box-Ljung test
## 
## data:  arima(AirPassengers)$residuals
## X-squared = 1241.5, df = 15, p-value < 2.2e-16

Analysis using R

Tegenu_Tesema

2026-03-27