library(readxl)
mydata <- read_xlsx("./ObesityDataSet.xlsx")
mydata <- as.data.frame(mydata)
head(mydata)
## Age Gender Height Weight Alcohol CaloricFood MealsDay Smoke Activity ObesityLevel
## 1 21 2 1.62 64.0 no no 3 no 0 Normal_Weight
## 2 21 2 1.52 56.0 Sometimes no 3 yes 3 Normal_Weight
## 3 23 1 1.80 77.0 Frequently no 3 no 2 Normal_Weight
## 4 27 1 1.80 87.0 Frequently no 3 no 2 Overweight_Level_I
## 5 22 1 1.78 89.8 Sometimes no 1 no 0 Overweight_Level_II
## 6 29 1 1.62 53.0 Sometimes yes 3 no 0 Normal_Weight
This dataset includes data for the estimation of obesity level in individuals from the countries of Mexico, Peru and Colombia, based on their eating habits and physical condition.
Variables:
-Age: age of participant
-Gender: gender of participant (1=Male, 2=Female)
-Height: height of participant in meters
-Weight: weight of participant in kgs
-Alcohol: frequency of consuming alcohol (no, Sometimes, Frequently, Always)
-CaloricFood: whether they eat caloric food every day (yes, no)
-MealsDay: number of meals in a typical day
-Smoke: whether they smoke (yes, no)
-Activity: number of days they have a physical activity per typical week
-ObesityLevel: obesity level (Insufficient_Weight, Normal_Weight, Overweight_Level_I, Overweight_Level_II, Obesity_Type_I, Obesity_Type_II, Obesity_Type_III )
# Creating factor for variable Gender
mydata$GenderFactor <- factor(mydata$Gender,
levels = c(1, 2),
labels = c("Male", "Female"))
str (mydata)
## 'data.frame': 498 obs. of 11 variables:
## $ Age : num 21 21 23 27 22 29 23 22 24 22 ...
## $ Gender : num 2 2 1 1 1 1 2 1 1 1 ...
## $ Height : num 1.62 1.52 1.8 1.8 1.78 1.62 1.5 1.64 1.78 1.72 ...
## $ Weight : num 64 56 77 87 89.8 53 55 53 64 68 ...
## $ Alcohol : chr "no" "Sometimes" "Frequently" "Frequently" ...
## $ CaloricFood : chr "no" "no" "no" "no" ...
## $ MealsDay : num 3 3 3 3 1 3 3 3 3 3 ...
## $ Smoke : chr "no" "yes" "no" "no" ...
## $ Activity : num 0 3 2 2 0 0 1 3 1 1 ...
## $ ObesityLevel: chr "Normal_Weight" "Normal_Weight" "Normal_Weight" "Overweight_Level_I" ...
## $ GenderFactor: Factor w/ 2 levels "Male","Female": 2 2 1 1 1 1 2 1 1 1 ...
# Including only first four columns in a new data frame
mydata2 <- mydata[ , c(1,2,3,4)]
head(mydata2)
## Age Gender Height Weight
## 1 21 2 1.62 64.0
## 2 21 2 1.52 56.0
## 3 23 1 1.80 77.0
## 4 27 1 1.80 87.0
## 5 22 1 1.78 89.8
## 6 29 1 1.62 53.0
# Renaming variable ObesityLevel into ObesityLvl, and showing first 6 rows of last 3 columns
names(mydata)[10] <- "ObesityLvl"
head(mydata[, c(9, 10, 11)])
## Activity ObesityLvl GenderFactor
## 1 0 Normal_Weight Female
## 2 3 Normal_Weight Female
## 3 2 Normal_Weight Male
## 4 2 Overweight_Level_I Male
## 5 0 Overweight_Level_II Male
## 6 0 Normal_Weight Male
# Ordering data by age in ascending order
head(mydata[order(mydata$Age), ], 10)
## Age Gender Height Weight Alcohol CaloricFood MealsDay Smoke Activity ObesityLvl
## 416 14 1 1.71 72 no yes 3 no 2 Normal_Weight
## 117 15 2 1.65 86 no yes 3 no 3 Obesity_Type_I
## 277 16 1 1.67 50 no yes 1 no 1 Insufficient_Weight
## 291 16 1 1.82 71 Sometimes yes 3 no 2 Normal_Weight
## 296 16 2 1.66 58 no no 1 no 0 Normal_Weight
## 303 16 1 1.84 45 Sometimes yes 3 no 3 Insufficient_Weight
## 304 16 2 1.57 49 Sometimes yes 4 no 0 Normal_Weight
## 310 16 2 1.66 58 no no 1 no 0 Normal_Weight
## 313 16 2 1.60 57 no yes 3 no 3 Normal_Weight
## 386 16 2 1.55 45 no yes 3 no 1 Normal_Weight
## GenderFactor
## 416 Male
## 117 Female
## 277 Male
## 291 Male
## 296 Female
## 303 Male
## 304 Female
## 310 Female
## 313 Female
## 386 Female
Since I did not have any missing values in my data set, I did not use functions drop_na and replace_with_na.
# Descriptive statistics for Age, Height, Weight
summary(mydata[ , c(1,3,4)])
## Age Height Weight
## Min. :14.00 Min. :1.450 Min. : 39.00
## 1st Qu.:19.00 1st Qu.:1.613 1st Qu.: 58.00
## Median :21.00 Median :1.680 Median : 67.00
## Mean :23.15 Mean :1.686 Mean : 69.57
## 3rd Qu.:24.00 3rd Qu.:1.750 3rd Qu.: 80.00
## Max. :61.00 Max. :1.980 Max. :173.00
-MEAN: The average age of participants is 23.15, the average height of participants is 1.686m, the average weight of participants is 69.57kg.
-MEDIAN: Half of the participants are up to 21 years old, the others are older. Half of the participants are up to 1.680m tall, the others are taller. Half of the participants weigh up to 67kg, the others weigh more.
-MINIMUM: The youngest participant is 14 years old. The shortest participant is 1.450m tall. The lightest participant weighs 39kg.
-Q1: 25% of participants are up to 19 years old, the others are older. 25% of the participants are up to 1.613m tall, the others are higher. 25% of the participants weigh up to 58kg, the others weigh more.
library(ggplot2)
hist(mydata$Age,
ylab = "Frequency",
xlab = "Age",
main = "Distribution of age",
col = "skyblue",
breaks = seq(from = 10, to = 70, by = 10))
Most participants are between 20 and 30 years old.
library(car)
## Loading required package: carData
scatterplot(y = mydata$MealsDay,
x = mydata$Activity,
ylab = "Meals per day",
xlab = "Active days per week",
smooth = FALSE)
Relationship between variables is positive, meaning that, on average, the more active the participant is, the more meals per day he/she eats.
library(ggplot2)
ggplot(mydata, aes(y = Height, x = GenderFactor)) +
geom_boxplot()
On average, males are taller than females.
Half of the males are up to 1.75m tall, the others are taller.
75% of females are up to 1.65m tall, the others are taller.
I would not characterize the dots as outliers, because it is possible that a male is almost 2m tall and a female 1.8m tall.
library(readxl)
task2 <- read_xlsx("C:/Users/Tamara/Desktop/R data/R Take Home Exam 2024/Task 2/BusinessSchool.xlsx")
task2 <- as.data.frame(task2)
head(task2)
## Student ID Undergrad Degree Undergrad Grade MBA Grade Work Experience Employability (Before)
## 1 1 Business 68.4 90.2 No 252
## 2 2 Computer Science 70.2 68.7 Yes 101
## 3 3 Finance 76.4 83.3 No 401
## 4 4 Business 82.6 88.7 No 287
## 5 5 Finance 76.9 75.4 No 275
## 6 6 Computer Science 83.3 82.1 No 254
## Employability (After) Status Annual Salary
## 1 276 Placed 111000
## 2 119 Placed 107000
## 3 462 Placed 109000
## 4 342 Placed 148000
## 5 347 Placed 255500
## 6 313 Placed 103500
library(ggplot2)
ggplot(task2, aes(x = `Undergrad Degree`)) +
geom_bar(colour = "deeppink2", fill="thistle1") +
labs(title = "Distribution of Undergraduate Degrees",
x = "Undergraduate Degrees",
y = "Frequency")
The most common undergraduate degree is business. There is approximately 35 business graduates.
library(psych)
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
describe(task2$`Annual Salary`)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 100 109058 41501.49 103500 104600.2 25945.5 20000 340000 320000 2.22 9.41 4150.15
library(ggplot2)
ggplot(task2, aes(x = `Annual Salary` )) +
geom_histogram(colour="seagreen", fill="lawngreen")+
labs(y="Frequency", x="Annual Salary", title="Annual Salary Distribution")+
scale_x_continuous(breaks = seq(20000, 340000, 64000))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The distribution is right-skewed, meaning that the lower salaries are more frequent compared to the higher ones; it is also unimodal.
t.test(task2$`MBA Grade`,
mu=74,
alternative="two.sided")
##
## One Sample t-test
##
## data: task2$`MBA Grade`
## t = 2.6587, df = 99, p-value = 0.00915
## alternative hypothesis: true mean is not equal to 74
## 95 percent confidence interval:
## 74.51764 77.56346
## sample estimates:
## mean of x
## 76.04055
We can reject null hypothesis at p=0.009. We can conclude that the average grade is different from 74. We can be 95% confident that the true average grade is between 74.52 and 77.56.
# Effect size
effectsize::cohens_d(task2$`MBA Grade`, mu=74)
## Cohen's d | 95% CI
## ------------------------
## 0.27 | [0.07, 0.46]
##
## - Deviation from a difference of 74.
effectsize::interpret_cohens_d(0.27, rules="sawilowsky2009")
## [1] "small"
## (Rules: sawilowsky2009)
The effect size is small.
library(readxl)
task3 <- read_xlsx("R Take Home Exam 2024/Task 3/Apartments.xlsx")
task3 <- as.data.frame(task3)
head(task3)
## Age Distance Price Parking Balcony
## 1 7 28 1640 0 1
## 2 18 1 2800 1 0
## 3 7 28 1660 0 0
## 4 28 29 1850 0 1
## 5 18 18 1640 1 1
## 6 28 12 1770 0 1
Description:
task3$Parking <- factor(task3$Parking,
levels = c(0, 1),
labels = c("No", "Yes"))
task3$Balcony <- factor(task3$Balcony,
levels = c(0, 1),
labels = c("No", "Yes"))
head(task3)
## Age Distance Price Parking Balcony
## 1 7 28 1640 No Yes
## 2 18 1 2800 Yes No
## 3 7 28 1660 No No
## 4 28 29 1850 No Yes
## 5 18 18 1640 Yes Yes
## 6 28 12 1770 No Yes
t.test(task3$Price,
mu=1900,
alternative="two.sided")
##
## One Sample t-test
##
## data: task3$Price
## t = 2.9022, df = 84, p-value = 0.004731
## alternative hypothesis: true mean is not equal to 1900
## 95 percent confidence interval:
## 1937.443 2100.440
## sample estimates:
## mean of x
## 2018.941
We can reject null hypothesis at p=0.005. We can conclude that the average price is different from 1900. We can be 95% confident that the true average price is between 1937.443 and 2100.440.
fit1 <- lm(Price ~ Age,
data = task3)
summary(fit1)
##
## Call:
## lm(formula = Price ~ Age, data = task3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -623.9 -278.0 -69.8 243.5 776.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2185.455 87.043 25.108 <2e-16 ***
## Age -8.975 4.164 -2.156 0.034 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 369.9 on 83 degrees of freedom
## Multiple R-squared: 0.05302, Adjusted R-squared: 0.04161
## F-statistic: 4.647 on 1 and 83 DF, p-value: 0.03401
-Regression coefficient: If age of the apartment increases by 1 year, the price per m2 decreases by 8.975 eur on average (p=0.034)
-Coefficient of determination: 5.30% of variability in price per m2 is affected by linear effect of age of the apartment.
-Intercept: If age of the apartment is 0, you would expect the price of 2185.45 eur per m2.
cor(task3$Price, task3$Age)
## [1] -0.230255
-Pearson correlation coefficient: Since the value is between 0.1 and 0.3 in absolute terms, the linear relationship between price per m2 and age of the apartment is weak. Since the value is negative, the relationship is negative.
library(car)
scatterplotMatrix(task3[c("Price", "Age", "Distance")],
smooth = FALSE)
Based on the matrix, I believe that there is not a significant problem with multicolinearity, since the values are dispersed and do not tightly follow the line in the graphs showing relationship between age and distance (row 2, column 3). The line is also closer to horizontal than to steep, again indicating that correlation is not strong.
fit2 <- lm(Price ~ Age + Distance,
data = task3)
summary(fit2)
##
## Call:
## lm(formula = Price ~ Age + Distance, data = task3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -603.23 -219.94 -85.68 211.31 689.58
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2460.101 76.632 32.10 < 2e-16 ***
## Age -7.934 3.225 -2.46 0.016 *
## Distance -20.667 2.748 -7.52 6.18e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 286.3 on 82 degrees of freedom
## Multiple R-squared: 0.4396, Adjusted R-squared: 0.4259
## F-statistic: 32.16 on 2 and 82 DF, p-value: 4.896e-11
vif(fit2)
## Age Distance
## 1.001845 1.001845
mean(vif(fit2))
## [1] 1.001845
Since the VIF statistics’ value is lower than 5 and its average is close to 1, the multicolinearity is not an issue and we should not remove any of the variables.
task3$StdResid <- round(rstandard(fit2), 3)
task3$CooksD <- round(cooks.distance(fit2), 3)
hist(task3$StdResid,
xlab = "Standardized residuals",
ylab = "Frequency",
main = "Histogram of standardized residuals")
head(task3[order(-task3$StdResid),],5)
## Age Distance Price Parking Balcony StdResid CooksD
## 38 5 45 2180 Yes Yes 2.577 0.320
## 33 2 11 2790 Yes No 2.051 0.069
## 2 18 1 2800 Yes No 1.783 0.030
## 61 18 1 2800 Yes Yes 1.783 0.030
## 58 8 2 2820 Yes No 1.655 0.037
head(task3[order(task3$StdResid),],5)
## Age Distance Price Parking Balcony StdResid CooksD
## 53 7 2 1760 No Yes -2.152 0.066
## 13 12 14 1650 No Yes -1.499 0.013
## 72 12 14 1650 No No -1.499 0.013
## 20 13 8 1800 No No -1.381 0.012
## 35 14 16 1660 No Yes -1.261 0.008
There aren’t any outliers (no units with standardized residuals bigger than 3 in absolute value).
hist(task3$CooksD,
xlab = "Cooks distance",
ylab = "Frequency",
main = "Histogram of Cooks distance")
There is a large gap between 0.15 and 0.30, indicating that the units appearing after this gap are possibly units with high impact.
head(task3[order(-task3$CooksD),])
## Age Distance Price Parking Balcony StdResid CooksD
## 38 5 45 2180 Yes Yes 2.577 0.320
## 55 43 37 1740 No No 1.445 0.104
## 33 2 11 2790 Yes No 2.051 0.069
## 53 7 2 1760 No Yes -2.152 0.066
## 22 37 3 2540 Yes Yes 1.576 0.061
## 39 40 2 2400 No Yes 1.091 0.038
summary(task3[ ,c(1,2,3)])
## Age Distance Price
## Min. : 1.00 Min. : 1.00 Min. :1400
## 1st Qu.:12.00 1st Qu.: 4.00 1st Qu.:1710
## Median :18.00 Median :12.00 Median :1950
## Mean :18.55 Mean :14.22 Mean :2019
## 3rd Qu.:24.00 3rd Qu.:20.00 3rd Qu.:2290
## Max. :45.00 Max. :45.00 Max. :2820
Since the value of the row 38 has a large value of Cooks distance, it might be a high impact value. Moreover, this unit has the combination of the largest possible distance from the city centre (45 km) and higher than average price, which is also strange. Therefore, I will consider that this unit is with high impact and remove it.
task3NEW <- task3[-38, ]
# Repeating to check
hist(task3NEW$CooksD,
xlab = "Cooks distance",
ylab = "Frequency",
main = "Histogram of Cooks distance")
There is still a relatively large gap between 0.07 and 0.10.
head(task3NEW[order(-task3NEW$CooksD),])
## Age Distance Price Parking Balcony StdResid CooksD
## 55 43 37 1740 No No 1.445 0.104
## 33 2 11 2790 Yes No 2.051 0.069
## 53 7 2 1760 No Yes -2.152 0.066
## 22 37 3 2540 Yes Yes 1.576 0.061
## 39 40 2 2400 No Yes 1.091 0.038
## 58 8 2 2820 Yes No 1.655 0.037
summary(task3NEW[ ,c(1,2,3)])
## Age Distance Price
## Min. : 1.00 Min. : 1.00 Min. :1400
## 1st Qu.:12.00 1st Qu.: 4.00 1st Qu.:1710
## Median :18.00 Median :12.00 Median :1930
## Mean :18.71 Mean :13.86 Mean :2017
## 3rd Qu.:24.25 3rd Qu.:19.25 3rd Qu.:2290
## Max. :45.00 Max. :40.00 Max. :2820
The value of the row 55 now has the largest value of Cooks distance. However, I do not believe it has a strange combination of values: it is almost the oldest apartment, it is relatively far away from the city centre compared to others, and its price is a little bit above the minimum (maybe due to its size). Thus, I do not consider it as a high impact unit and I will not remove it.
fit2 <- lm(Price ~ Age + Distance, data = task3NEW)
task3NEW$StdResid <- round(rstandard(fit2), 3)
task3NEW$StdFitted <- scale(fit2$fitted.values)
library(car)
scatterplot(y = task3NEW$StdResid, x = task3NEW$StdFitted,
ylab = "Standardized residuals",
xlab = "Standardized fitted values",
boxplots = FALSE,
regLine = FALSE,
smooth = FALSE)
As I do not see any curves evidently, I would conclude that there is no problem with non-linearity, but there might be a problem with heteroskedasticity because variance seems to be a bit higher on the right side of the graph. I will check for heteroskedasticity in the following chunk.
library(olsrr)
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
## rivers
ols_test_breusch_pagan(fit2)
##
## Breusch Pagan Test for Heteroskedasticity
## -----------------------------------------
## Ho: the variance is constant
## Ha: the variance is not constant
##
## Data
## ---------------------------------
## Response : Price
## Variables: fitted values of Price
##
## Test Summary
## -----------------------------
## DF = 1
## Chi2 = 2.927455
## Prob > Chi2 = 0.08708469
We can’t reject the null hypothesis (p=0.08 which is >0.05). Therefore, we can assume homoskedasticity.
task3NEW$StdResid <- round(rstandard(fit2), 3)
task3NEW$CooksD <- round(cooks.distance(fit2), 3)
hist(task3NEW$StdResid,
xlab = "Standardized residuals",
ylab = "Frequency",
main = "Histogram of standardized residuals")
shapiro.test(task3NEW$StdResid)
##
## Shapiro-Wilk normality test
##
## data: task3NEW$StdResid
## W = 0.95649, p-value = 0.006355
The graph seems to be slightly right-skewed. Also, based on Shapiro-Wilk test, we can reject the null hypothesis (p=0.006) and conclude that the standardized residuals are not distributed normally. However, our sample size is large (>30), so this should not be a problem.
fit2 <- lm(Price ~ Age + Distance,
data = task3NEW)
summary(fit2)
##
## Call:
## lm(formula = Price ~ Age + Distance, data = task3NEW)
##
## Residuals:
## Min 1Q Median 3Q Max
## -604.92 -229.63 -56.49 192.97 599.35
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2456.076 73.931 33.221 < 2e-16 ***
## Age -6.464 3.159 -2.046 0.044 *
## Distance -22.955 2.786 -8.240 2.52e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 276.1 on 81 degrees of freedom
## Multiple R-squared: 0.4838, Adjusted R-squared: 0.4711
## F-statistic: 37.96 on 2 and 81 DF, p-value: 2.339e-12
-Regression coefficient:
If age of the apartment increases by 1 year, the price per m2 decreases by 6.46 eur on average (p=0.044), assuming distance is constant.
If distance from the city centre increases by 1km, the price per m2 decreases by 22.96 eur on average (p<0.001), assuming age is constant.
-Coefficient of determination: 48.38% of variability in price per m2 is affected by linear effect of age and distance.
-Intercept: If age and distance of the apartment from the city centre are both 0, you would expect the price of 2456.076 eur per m2.
sqrt(summary(fit2)$r.squared)
## [1] 0.6955609
-Multiple correlation coefficient: Since the value is between 0.3 and 0.7 in absolute terms, the linear relationship of price with age and distance is semi-strong.
library(Hmisc)
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:psych':
##
## describe
## The following objects are masked from 'package:base':
##
## format.pval, units
rcorr(as.matrix(task3NEW[ ,c(1,2,3)]))
## Age Distance Price
## Age 1.00 0.09 -0.23
## Distance 0.09 1.00 -0.68
## Price -0.23 -0.68 1.00
##
## n= 84
##
##
## P
## Age Distance Price
## Age 0.3963 0.0387
## Distance 0.3963 0.0000
## Price 0.0387 0.0000
-Pearson correlation coefficient:
There is a weak negative linear relationship between price and age.
There is a semi-strong negative linear relationship between price and distance.
fit3 <- lm(Price ~ Age + Distance + Parking + Balcony,
data = task3NEW)
anova(fit2, fit3)
## Analysis of Variance Table
##
## Model 1: Price ~ Age + Distance
## Model 2: Price ~ Age + Distance + Parking + Balcony
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 81 6176767
## 2 79 5654480 2 522287 3.6485 0.03051 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can reject the null hypothesis (p=0.03). We can conclude, with 95% confidence, that fit3 fits data better than fit2.
summary(fit3)
##
## Call:
## lm(formula = Price ~ Age + Distance + Parking + Balcony, data = task3NEW)
##
## Residuals:
## Min 1Q Median 3Q Max
## -473.21 -192.37 -28.89 204.17 558.77
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2329.724 93.066 25.033 < 2e-16 ***
## Age -5.821 3.074 -1.894 0.06190 .
## Distance -20.279 2.886 -7.026 6.66e-10 ***
## ParkingYes 167.531 62.864 2.665 0.00933 **
## BalconyYes -15.207 59.201 -0.257 0.79795
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 267.5 on 79 degrees of freedom
## Multiple R-squared: 0.5275, Adjusted R-squared: 0.5035
## F-statistic: 22.04 on 4 and 79 DF, p-value: 3.018e-12
Given the values of the other independent variables, the apartments with parking are on average by 167.53 eur per m2 more expensive in comparison to the apartments without parking (p=0.009).
Since the p-value for the variable Balcony is not significant, it does not make sense to interpret it. There is not enough evidence to conclude that the presence of balcony has a meaningful effect on price.
-F-statistic:
H0:βAge=βDistance=βParkingYes=βBalconyYes=0
H1:At least one βj is different from 0.
We can conclude that at least one of the independent variables significantly affects price per m2 (p<0.001).
task3NEW$FittedValues <- fitted(fit2)
task3NEW$Residuals <- residuals(fit2)
residual_ID2 <- task3NEW$Residuals[2]
print(residual_ID2)
## [1] 483.2291