| title: “Data Science 100 Assignment 2” |
| author: “Ross Ciancio” |
| date: “28/01/2020” |
| output: html_document |
Below you will find a list of functions and what they do. Use any number of these functions to perform a descriptive analysis of the mtcars data set.
# Work on a copy of the built-in mtcars data set (the outputs below match mtcars)
cars <- mtcars
carsmat <- as.matrix(cars)
length(cars)
## [1] 11
mean(cars[, c("wt")])
## [1] 3.21725
min(cars[, c("mpg")])
## [1] 10.4
range(cars[, c("cyl")])
## [1] 4 8
mean(cars[, c("mpg")])
## [1] 20.09062
sd(cars[, c("disp")])
## [1] 123.9387
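The individual calls above can also be collected into a single table. The following is a minimal sketch, assuming `cars` is a copy of the built-in `mtcars` data frame; the choice of columns and the names `cols` and `stats` are illustrative and not part of the original assignment.

```r
# Assumption: `cars` is a copy of the built-in mtcars data frame.
cars <- mtcars

# Columns of interest (chosen here only for illustration).
cols <- c("mpg", "wt", "hp", "disp")

# Apply several summary functions to each column and collect the results.
stats <- sapply(cars[, cols], function(v) {
  c(mean = mean(v), sd = sd(v), min = min(v), max = max(v))
})

# Rows are statistics, columns are variables.
print(round(stats, 3))
```

The built-in `summary(cars[, cols])` gives a similar overview (quartiles instead of the standard deviation) with less typing.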
Use the command plot(cars[, c(___)]) to create a bunch of plots. Note that the c(___) indicates a list of the columns you wish to include. Select 4 or 5 of the most interesting values to investigate.
plot(cars[, c("mpg")])
plot(cars[, c("hp")])
plot(cars[, c("mpg","wt","hp","drat")])
We can use the command “plot(A,B)” to make a scatterplot of column A versus column B. Do an exploratory analysis on the weight of the car versus the miles per gallon of the car. Form a hypothesis that you could test.
plot(cars[, c("wt","mpg")])
After plotting the two columns, a plausible hypothesis is that the weight of a car and its miles per gallon are negatively correlated: heavier cars tend to get fewer miles per gallon.
Test the hypothesis that you formed in question 3. You may need to consult your instructor about which R commands to use.
cor(cars[, c("wt","mpg")])
##             wt        mpg
## wt   1.0000000 -0.8676594
## mpg -0.8676594  1.0000000
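The correlation matrix gives the strength of the relationship but not a formal test. One possible way to test the hypothesis is `cor.test()` from base R, sketched below under the assumption that `cars` is a copy of `mtcars`; the variable name `ct` is illustrative.

```r
# Assumption: `cars` is a copy of the built-in mtcars data frame.
cars <- mtcars

# Pearson correlation test of weight against miles per gallon.
# Null hypothesis: the true correlation is zero.
ct <- cor.test(cars$wt, cars$mpg,
               alternative = "two.sided", method = "pearson")

print(ct$estimate)  # sample correlation coefficient
print(ct$p.value)   # p-value of the test
print(ct$conf.int)  # 95% confidence interval for the correlation
```

A small p-value together with a confidence interval that lies entirely below zero would support the hypothesis of a negative correlation.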
min(cars[, c("mpg")])
## [1] 10.4
min(cars[, c("wt")])
## [1] 1.513
max(cars[, c("mpg")])
## [1] 33.9
max(cars[, c("wt")])
## [1] 5.424
plot(cars[, c("wt","mpg")])
The commands above were only exploratory trials.
x <- (cars[, c("wt")])
y <- (cars[, c("mpg")])
LinearEquation <- lm(y~x)
#Summary statistics
summary(LinearEquation)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## x            -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
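# Generate a sequence of 30 weights from 0 to 6 (in 1000 lbs), covering the observed range
XSequence <- seq(0, 6, length.out = 30)
# Plot; axis limits are wide enough that the heaviest car (wt 5.424) and the most efficient car (mpg 33.9) are not clipped
plot(x, y, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon", main = "Regression of mpg on weight", ylim = c(0, 35), xlim = c(0, 6), pch = 15, col = "blue")
# Add the fitted regression line to the plot
lines(XSequence, predict(LinearEquation, data.frame(x = XSequence)), col = "red")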
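As shown above, the fitted regression line has a negative slope: each additional 1000 lbs of weight is associated with a decrease of about 5.34 miles per gallon (p = 1.29e-10), and weight alone accounts for about 75% of the variance in miles per gallon (R-squared = 0.7528). This supports the hypothesis that the two variables are negatively correlated. That makes sense; as the weight of a car increases, its fuel efficiency worsens.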
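To make the interpretation concrete, the fitted model can be used to predict fuel efficiency at specific weights. The following is a sketch only, assuming `cars` is a copy of `mtcars`; the names `fit`, `new_cars`, and `pred` and the example weights are illustrative, and the model is refit with the formula interface so that the predictor keeps its column name.

```r
# Assumption: `cars` is a copy of the built-in mtcars data frame.
cars <- mtcars

# Same regression as above, written with column names instead of x and y.
fit <- lm(mpg ~ wt, data = cars)

# Hypothetical example weights, in units of 1000 lbs.
new_cars <- data.frame(wt = c(2.5, 3.0, 4.0))

# Predicted mpg with 95% confidence intervals for the mean response.
pred <- predict(fit, newdata = new_cars, interval = "confidence")
print(cbind(new_cars, pred))

# With a single predictor, R-squared equals the squared correlation.
print(all.equal(summary(fit)$r.squared, cor(cars$wt, cars$mpg)^2))
```

Predictions are only meaningful within the observed weight range (about 1.5 to 5.4); extrapolating the line far beyond it would eventually give negative miles per gallon.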