Load the datasets package and the iris data.
library(datasets)
data("iris")
dim(iris)
## [1] 150 5
The data contain 150 cases (rows) and 5 variables (columns).
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
sapply(iris, class)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## "numeric" "numeric" "numeric" "numeric" "factor"
The sapply() output shows that there are 4 numeric variables: “Sepal.Length”, “Sepal.Width”, “Petal.Length”, and “Petal.Width”. The fifth variable, “Species”, is a factor.
levels(iris$Species)
## [1] "setosa" "versicolor" "virginica"
The only categorical variable is “Species”. The corresponding levels include “setosa”, “versicolor”, and “virginica”.
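As a quick check of how the 150 cases are spread across these levels, here is a minimal sketch using base R's table(). The name species_counts is only illustrative.

```r
library(datasets)

# Count how many rows fall into each level of the Species factor.
species_counts <- table(iris$Species)
species_counts  # each of the three species has 50 rows
```

The classes are perfectly balanced, so the species can be compared later without adjusting for group size.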
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
iris %>%
describe()
## vars n mean sd median trimmed mad min max range skew
## Sepal.Length 1 150 5.84 0.83 5.80 5.81 1.04 4.3 7.9 3.6 0.31
## Sepal.Width 2 150 3.06 0.44 3.00 3.04 0.44 2.0 4.4 2.4 0.31
## Petal.Length 3 150 3.76 1.77 4.35 3.76 1.85 1.0 6.9 5.9 -0.27
## Petal.Width 4 150 1.20 0.76 1.30 1.18 1.04 0.1 2.5 2.4 -0.10
## Species* 5 150 2.00 0.82 2.00 2.00 1.48 1.0 3.0 2.0 0.00
## kurtosis se
## Sepal.Length -0.61 0.07
## Sepal.Width 0.14 0.04
## Petal.Length -1.42 0.14
## Petal.Width -1.36 0.06
## Species* -1.52 0.07
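One finding from the describe() output is that Sepal.Length and Sepal.Width are both positively skewed (skew = 0.31 for each), while Petal.Length and Petal.Width are both negatively skewed (skew = -0.27 and -0.10). The Species* row is the factor converted to numeric codes, so its summary statistics are not meaningful.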
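To see where the skew column comes from, the sketch below computes sample skewness by hand as the third central moment divided by the cubed standard deviation. The helper name skewness is hypothetical, and I am assuming this is the definition describe() uses by default, so small rounding differences are possible.

```r
library(datasets)

# Sample skewness: third central moment divided by the cubed standard deviation.
# Assumption: this matches the default skew definition in psych::describe().
skewness <- function(x) {
  x <- x[!is.na(x)]
  mean((x - mean(x))^3) / sd(x)^3
}

# Apply the helper to the numeric columns only (Species is a factor).
numeric_cols <- iris[sapply(iris, is.numeric)]
skew_values <- sapply(numeric_cols, skewness)
round(skew_values, 2)
```

A positive value means a longer right tail and a negative value means a longer left tail, which is the pattern described above for the sepal and petal measurements.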
Referencing page 48 in IPSUR: the scatterplot below is similar to Figure 3.10, but the points are colored by species.
plot(iris$Petal.Width ~ iris$Petal.Length,
xlab = "Petal Length",
ylab = "Petal Width",
main = "Petal Length vs Petal Width",
col = iris$Species)
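Because col = iris$Species colors the points by the factor's integer codes (1, 2, 3), the plot does not say which color belongs to which species. A possible way to add a legend is sketched below. The name species_levels and the "topleft" position are my own choices, not from IPSUR.

```r
library(datasets)

# The factor levels, in the same order as the integer codes used for col.
species_levels <- levels(iris$Species)

plot(Petal.Width ~ Petal.Length, data = iris,
     xlab = "Petal Length",
     ylab = "Petal Width",
     main = "Petal Length vs Petal Width",
     col = iris$Species)

# Colors 1, 2, 3 of the current palette correspond to the three levels.
legend("topleft",
       legend = species_levels,
       col = seq_along(species_levels),
       pch = 1)
```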
One example of time series data is the AirPassengers dataset.
The number of airline passengers clearly increases over time. It is also interesting that the plot below has very regular peaks and troughs within each year, which I assume reflect seasonality in air travel.
data("AirPassengers")
Referencing https://rpubs.com/vivekkashyap043/airpassengers:
plot(AirPassengers,
main = "Airline Passengers Over Time",
xlab = "Year-Month",
ylab = "Number of Passengers")
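The trend and the seasonal pattern noted above can also be checked numerically. The sketch below averages the series by calendar month and totals it by year using base R. The names monthly_mean and yearly_total are only illustrative.

```r
library(datasets)

# Average passengers for each calendar month across all years in the series.
monthly_mean <- tapply(AirPassengers, cycle(AirPassengers), mean)
names(monthly_mean) <- month.abb
round(monthly_mean)

# Total passengers per year, to view the trend without the seasonal swings.
yearly_total <- aggregate(AirPassengers, FUN = sum)
yearly_total
```

If the peaks are seasonal, the summer months should have the highest averages, and the yearly totals should rise from one year to the next if the upward trend holds.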
Referencing help(“AirPassengers”):
(fit <- StructTS(AirPassengers, type = "BSM"))
##
## Call:
## StructTS(x = AirPassengers, type = "BSM")
##
## Variances:
## level slope seas epsilon
## 0.00 160.98 29.85 0.00
plot(cbind(AirPassengers,fitted(fit)), plot.type = "single")