This exercise uses the diabetes dataset from the faraway package. In the .Rmd file, do include the command to load the package into your environment.
# install.packages("faraway")
library(faraway)
x = diabetes$hdl
mean(x, na.rm = TRUE)
## [1] 50.44527
mean(subset(diabetes, gender == "female")$hdl)
## [1] 52.11111
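The overall and female-only means above can also be computed per group in a single call. The following is a minimal sketch, assuming the built-in `airquality` data as a stand-in for `diabetes` (so it runs without installing faraway); `Ozone` plays the role of `hdl` and `Month` the role of `gender`.

```r
# Stand-in data: airquality ships with base R and has missing Ozone values
data(airquality)

# Mean Ozone per Month; na.rm = TRUE is passed through to mean()
group_means = tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
group_means

# Same idea with the formula interface; rows with NA are dropped by default
agg_means = aggregate(Ozone ~ Month, data = airquality, FUN = mean)
agg_means
```

Under these assumptions, the equivalent call for the homework data would be `tapply(diabetes$hdl, diabetes$gender, mean, na.rm = TRUE)`. Passing `na.rm = TRUE` guards against missing values in any group.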
Create a scatter plot of total cholesterol (y-axis) vs weight (x-axis). Use a non-default color for the points.
plot(chol ~ weight, data = diabetes,
     xlab = "Weight",
     ylab = "Total Cholesterol",
     main = "Total Cholesterol vs Weight",
     pch = 20,
     cex = 2,
     col = "lightpink")
boxplot(hdl ~ gender, data = diabetes,
        xlab = "Gender",
        ylab = "High-Density Lipoprotein",
        main = "HDL vs Gender",
        pch = 20,
        cex = 2,
        col = "lightpink",
        border = "darkgreen")
The file nutrition.csv is provided with the homework handle. It contains the nutritional values per serving size for a large variety of foods as calculated by the USDA.
nutrition <- read.csv("~/Downloads/nutrition.csv")
summary(nutrition)
##        ID            Desc               Water           Calories
##  Min.   : 1001   Length:5138        Min.   :  0.00   Min.   :  0.0
##  1st Qu.: 7925   Class :character   1st Qu.: 20.59   1st Qu.: 75.0
##  Median :11800   Mode  :character   Median : 64.58   Median :177.0
##  Mean   :14271                      Mean   : 54.20   Mean   :223.1
##  3rd Qu.:18968                      3rd Qu.: 81.30   3rd Qu.:347.0
##  Max.   :93600                      Max.   :100.00   Max.   :902.0
##     Protein            Fat              Carbs            Fiber
##  Min.   : 0.000   Min.   :  0.000   Min.   :  0.00   Min.   : 0.000
##  1st Qu.: 1.710   1st Qu.:  0.590   1st Qu.:  1.55   1st Qu.: 0.000
##  Median : 6.700   Median :  3.955   Median : 11.15   Median : 0.800
##  Mean   : 9.961   Mean   : 10.313   Mean   : 23.59   Mean   : 2.342
##  3rd Qu.:16.500   3rd Qu.: 12.280   3rd Qu.: 38.35   3rd Qu.: 2.775
##  Max.   :88.320   Max.   :100.000   Max.   :100.00   Max.   :79.000
##      Sugar           Calcium          Potassium           Sodium
##  Min.   : 0.000   Min.   :   0.00   Min.   :    0.0   Min.   :    0.0
##  1st Qu.: 0.000   1st Qu.:  10.00   1st Qu.:  118.0   1st Qu.:   27.0
##  Median : 2.330   Median :  24.00   Median :  210.0   Median :  113.0
##  Mean   : 9.000   Mean   :  85.41   Mean   :  275.6   Mean   :  343.9
##  3rd Qu.: 9.795   3rd Qu.:  75.00   3rd Qu.:  328.0   3rd Qu.:  440.8
##  Max.   :99.800   Max.   :7364.00   Max.   :16500.0   Max.   :38758.0
##     VitaminC            Chol           Portion
##  Min.   :   0.00   Min.   :   0.00   Length:5138
##  1st Qu.:   0.00   1st Qu.:   0.00   Class :character
##  Median :   0.10   Median :   0.00   Mode  :character
##  Mean   :  10.38   Mean   :  32.44
##  3rd Qu.:   4.20   3rd Qu.:  53.00
##  Max.   :2732.00   Max.   :3100.00
NROW(nutrition)
## [1] 5138
Create a histogram of Calories. Do not modify R’s default bin selection. Make the plot presentable.
library(readr)
nutrition = read_csv("nutrition.csv")
## Rows: 5138 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Desc, Portion
## dbl (13): ID, Water, Calories, Protein, Fat, Carbs, Fiber, Sugar, Calcium, P...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hist(nutrition$Calories,
     xlab = "Calories",
     main = "Histogram of Calories for Various Foods",
     border = "darkgreen",
     col = "lightpink")
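The histogram of Calories is right-skewed and approximately unimodal.
# The x-axis is the weighted sum of nutrients, so the labels name that sum
plot(Calories ~ I(4 * Protein + 4 * Carbs + 9 * Fat + 2 * Fiber), data = nutrition,
     xlab = "4 * Protein + 4 * Carbs + 9 * Fat + 2 * Fiber",
     ylab = "Calories",
     main = "Calories vs 4 * Protein + 4 * Carbs + 9 * Fat + 2 * Fiber",
     pch = 20,
     cex = 1,
     col = "lightpink")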
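In the plot call, `I()` makes R evaluate the arithmetic literally instead of reading `+` and `*` as formula operators. The weighted sum can also be stored as its own column first. This is a minimal sketch using a hypothetical `foods` data frame with made-up values; it is not taken from nutrition.csv, and the column name `est` is likewise only illustrative.

```r
# Hypothetical foods with made-up values; NOT rows from nutrition.csv
foods = data.frame(
  Protein = c(10, 2, 25),
  Carbs   = c(30, 15, 0),
  Fat     = c(5, 0, 12),
  Fiber   = c(4, 1, 0)
)

# Weighted sum used on the x-axis of the plot: 4, 4, 9, and 2 per unit
foods$est = with(foods, 4 * Protein + 4 * Carbs + 9 * Fat + 2 * Fiber)
foods$est
```

With such a column in place, `plot(Calories ~ est, data = ...)` would draw the same kind of scatter plot without needing `I()`.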
HW-Q 3: For the following questions, we will use the Advertising data from the ISLR2 package, with sales as the response \((y_i)\) and TV as the predictor \((x_i)\).
\(\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2\)
For the given data, please calculate the above quantity.
Advertising <- read.csv("~/Downloads/Advertising.csv")
yi = Advertising$sales
xi = Advertising$TV
model1 = lm(yi ~ xi)
summary(model1)
##
## Call:
## lm(formula = yi ~ xi)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.3860 -1.9545 -0.1913 2.0671 7.2124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.032594 0.457843 15.36 <2e-16 ***
## xi 0.047537 0.002691 17.67 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.259 on 198 degrees of freedom
## Multiple R-squared: 0.6119, Adjusted R-squared: 0.6099
## F-statistic: 312.1 on 1 and 198 DF, p-value: < 2.2e-16
RSS = deviance(model1)
RSS
## [1] 2102.531
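As a cross-check on `deviance()`, the RSS can be computed straight from its definition as the sum of squared residuals. This is a minimal sketch, assuming the built-in `cars` data (`dist` as response, `speed` as predictor) as a stand-in for Advertising so that it runs without the CSV file.

```r
# Stand-in data: cars ships with base R
fit = lm(dist ~ speed, data = cars)

# RSS from the definition: sum over i of (y_i - yhat_i)^2
rss_manual = sum((cars$dist - fitted(fit))^2)

# For a model fitted with lm(), deviance() returns the same quantity
rss_deviance = deviance(fit)

all.equal(rss_manual, rss_deviance)
```

The same pattern applied to `model1` would be `sum((yi - fitted(model1))^2)`.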
The quantity
\(\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2\)
is also known as the total sum of squares, and it is used to calculate \(R^2\). For the given data, calculate the TSS.
R_sqr = 0.6119 # rounded R-squared copied from summary(model1), so the TSS below is approximate
TSS = RSS / (1 - R_sqr)
TSS
## [1] 5417.497
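The TSS can also be computed directly from its definition, which avoids relying on a rounded \(R^2\) value. This is a minimal sketch, again assuming the built-in `cars` data as a stand-in for Advertising; it also checks the identity \(R^2 = 1 - \mathrm{RSS}/\mathrm{TSS}\) that the rearranged formula depends on.

```r
# Stand-in data: cars ships with base R
fit = lm(dist ~ speed, data = cars)
y = cars$dist

# TSS from the definition: sum over i of (y_i - ybar)^2
tss = sum((y - mean(y))^2)

# RSS of the fitted model
rss = deviance(fit)

# R^2 = 1 - RSS / TSS should match the value reported by summary()
r_sq = 1 - rss / tss
all.equal(r_sq, summary(fit)$r.squared)

# Rearranged: TSS = RSS / (1 - R^2), using the unrounded R^2
all.equal(tss, rss / (1 - summary(fit)$r.squared))
```

For the homework data, the direct version would be `sum((yi - mean(yi))^2)`.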
\(r^2 = (\mathrm{Cor}(x, y))^2 = \left(\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}\right)^2\)
Using the data given in this exercise, please calculate the above quantity.
R_sqr = summary(model1)$r.squared
R_sqr
## [1] 0.6118751
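For simple linear regression, the squared correlation equals the model's \(R^2\), so the formula can be verified term by term. This is a minimal sketch, assuming the built-in `cars` data as a stand-in for Advertising; the variable names are illustrative.

```r
# Stand-in data: cars ships with base R
x = cars$speed
y = cars$dist

# Numerator: sum of cross-deviations from the means
num = sum((x - mean(x)) * (y - mean(y)))

# Denominator: product of the root sums of squared deviations
den = sqrt(sum((x - mean(x))^2)) * sqrt(sum((y - mean(y))^2))

r_sq_manual = (num / den)^2                 # formula written out
r_sq_cor    = cor(x, y)^2                   # built-in correlation, squared
r_sq_lm     = summary(lm(y ~ x))$r.squared  # R^2 reported by the model

c(r_sq_manual, r_sq_cor, r_sq_lm)
```

On the homework data, `cor(xi, yi)^2` should agree with `summary(model1)$r.squared`.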