We’ll set up the main lines of code needed to begin the assignment.
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
?mtcars()
## starting httpd help server ... done
data(mtcars)
summary(mtcars)
## mpg cyl disp hp
## Min. :10.40 Min. :4.000 Min. : 71.1 Min. : 52.0
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.:120.8 1st Qu.: 96.5
## Median :19.20 Median :6.000 Median :196.3 Median :123.0
## Mean :20.09 Mean :6.188 Mean :230.7 Mean :146.7
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:326.0 3rd Qu.:180.0
## Max. :33.90 Max. :8.000 Max. :472.0 Max. :335.0
## drat wt qsec vs
## Min. :2.760 Min. :1.513 Min. :14.50 Min. :0.0000
## 1st Qu.:3.080 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000
## Median :3.695 Median :3.325 Median :17.71 Median :0.0000
## Mean :3.597 Mean :3.217 Mean :17.85 Mean :0.4375
## 3rd Qu.:3.920 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000
## Max. :4.930 Max. :5.424 Max. :22.90 Max. :1.0000
## am gear carb
## Min. :0.0000 Min. :3.000 Min. :1.000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:2.000
## Median :0.0000 Median :4.000 Median :2.000
## Mean :0.4062 Mean :3.688 Mean :2.812
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :1.0000 Max. :5.000 Max. :8.000
class(mtcars)
## [1] "data.frame"
dim(mtcars)
## [1] 32 11
Setting up the variables needed for continuation. While also graphing plots for deeper research:
## corrplot 0.95 loaded
## [1] 2.520227
## mpg cyl disp hp drat wt
## mpg 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.68117191 -0.8676594
## cyl -0.8521620 1.0000000 0.9020329 0.8324475 -0.69993811 0.7824958
## disp -0.8475514 0.9020329 1.0000000 0.7909486 -0.71021393 0.8879799
## hp -0.7761684 0.8324475 0.7909486 1.0000000 -0.44875912 0.6587479
## drat 0.6811719 -0.6999381 -0.7102139 -0.4487591 1.00000000 -0.7124406
## wt -0.8676594 0.7824958 0.8879799 0.6587479 -0.71244065 1.0000000
## qsec 0.4186840 -0.5912421 -0.4336979 -0.7082234 0.09120476 -0.1747159
## vs 0.6640389 -0.8108118 -0.7104159 -0.7230967 0.44027846 -0.5549157
## am 0.5998324 -0.5226070 -0.5912270 -0.2432043 0.71271113 -0.6924953
## gear 0.4802848 -0.4926866 -0.5555692 -0.1257043 0.69961013 -0.5832870
## carb -0.5509251 0.5269883 0.3949769 0.7498125 -0.09078980 0.4276059
## qsec vs am gear carb
## mpg 0.41868403 0.6640389 0.59983243 0.4802848 -0.55092507
## cyl -0.59124207 -0.8108118 -0.52260705 -0.4926866 0.52698829
## disp -0.43369788 -0.7104159 -0.59122704 -0.5555692 0.39497686
## hp -0.70822339 -0.7230967 -0.24320426 -0.1257043 0.74981247
## drat 0.09120476 0.4402785 0.71271113 0.6996101 -0.09078980
## wt -0.17471588 -0.5549157 -0.69249526 -0.5832870 0.42760594
## qsec 1.00000000 0.7445354 -0.22986086 -0.2126822 -0.65624923
## vs 0.74453544 1.0000000 0.16834512 0.2060233 -0.56960714
## am -0.22986086 0.1683451 1.00000000 0.7940588 0.05753435
## gear -0.21268223 0.2060233 0.79405876 1.0000000 0.27407284
## carb -0.65624923 -0.5696071 0.05753435 0.2740728 1.00000000
Now we’ll check if any data is invalid, while providing proof of observation:
## mpg cyl disp hp drat wt qsec vs am gear
## Mazda RX4 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Mazda RX4 Wag FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Datsun 710 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Hornet 4 Drive FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Hornet Sportabout FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Valiant FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Duster 360 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Merc 240D FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Merc 230 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Merc 280 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Merc 280C FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Merc 450SE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Merc 450SL FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Merc 450SLC FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Cadillac Fleetwood FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Lincoln Continental FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Chrysler Imperial FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Fiat 128 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Honda Civic FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Toyota Corolla FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Toyota Corona FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Dodge Challenger FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## AMC Javelin FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Camaro Z28 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Pontiac Firebird FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Fiat X1-9 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Porsche 914-2 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Lotus Europa FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Ford Pantera L FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Ferrari Dino FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Maserati Bora FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## Volvo 142E FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## carb
## Mazda RX4 FALSE
## Mazda RX4 Wag FALSE
## Datsun 710 FALSE
## Hornet 4 Drive FALSE
## Hornet Sportabout FALSE
## Valiant FALSE
## Duster 360 FALSE
## Merc 240D FALSE
## Merc 230 FALSE
## Merc 280 FALSE
## Merc 280C FALSE
## Merc 450SE FALSE
## Merc 450SL FALSE
## Merc 450SLC FALSE
## Cadillac Fleetwood FALSE
## Lincoln Continental FALSE
## Chrysler Imperial FALSE
## Fiat 128 FALSE
## Honda Civic FALSE
## Toyota Corolla FALSE
## Toyota Corona FALSE
## Dodge Challenger FALSE
## AMC Javelin FALSE
## Camaro Z28 FALSE
## Pontiac Firebird FALSE
## Fiat X1-9 FALSE
## Porsche 914-2 FALSE
## Lotus Europa FALSE
## Ford Pantera L FALSE
## Ferrari Dino FALSE
## Maserati Bora FALSE
## Volvo 142E FALSE
## [1] 0
## mpg cyl disp hp drat wt qsec vs am gear carb
## 0 0 0 0 0 0 0 0 0 0 0
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [31] TRUE TRUE
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## [1] mpg cyl disp hp drat wt qsec vs am gear carb
## <0 rows> (or 0-length row.names)
Let’s build a linear regression, and write down our results:
## [1] 4.609201
## Warning in geom_histogram(binwidth = 5, fill = "black", alpha = 0.7): All aesthetics have length 1, but the data has 32 rows.
## ℹ Please consider using `annotate()` or provide this layer with data containing
## a single row.
## [1] 6 11
## mpg cyl disp hp drat
## Min. :10.40 Min. :8 Min. :301.0 Min. :205.0 Min. :2.930
## 1st Qu.:11.12 1st Qu.:8 1st Qu.:352.5 1st Qu.:218.8 1st Qu.:3.053
## Median :13.80 Median :8 Median :400.0 Median :237.5 Median :3.220
## Mean :13.02 Mean :8 Mean :397.2 Mean :245.8 Mean :3.273
## 3rd Qu.:14.60 3rd Qu.:8 3rd Qu.:455.0 3rd Qu.:245.0 3rd Qu.:3.462
## Max. :15.00 Max. :8 Max. :472.0 Max. :335.0 Max. :3.730
## wt qsec vs am gear
## Min. :3.570 Min. :14.60 Min. :0 Min. :0.0000 Min. :3.000
## 1st Qu.:3.638 1st Qu.:15.52 1st Qu.:0 1st Qu.:0.0000 1st Qu.:3.000
## Median :4.545 Median :16.63 Median :0 Median :0.0000 Median :3.000
## Mean :4.500 Mean :16.51 Mean :0 Mean :0.1667 Mean :3.333
## 3rd Qu.:5.321 3rd Qu.:17.72 3rd Qu.:0 3rd Qu.:0.0000 3rd Qu.:3.000
## Max. :5.424 Max. :17.98 Max. :0 Max. :1.0000 Max. :5.000
## carb
## Min. :4.000
## 1st Qu.:4.000
## Median :4.000
## Mean :4.667
## 3rd Qu.:4.000
## Max. :8.000
Question 3.3: Looking at our ggplot, we can see a lot from it. To begin we are only looking at mpg via the X axis. The charts determine the MPG gallons compared to gallon amount. MPG gallons are huge in the 20 gallons margin. Second place being 15 gallons. And third place being 25 gallons. Down the middle if you will. Looking at the dataset mtcars, the numbers add up according to mpg. Since that’s our main factor. I added another ggplot, this time with disp as our main factor. The numbers add up, but the difference is uncanny. Of course due to difference in what each factor is, yet shows how each variable is different.
Question 3.4: Now let’s do the same thing, but use mpg_pred (predicts mpg) instead of just mpg on the data set. There is a difference via the y axis. 20 gallons stayed the same, as well as the other Top 3 rankings. But what you might notice, is all 3 changed in y. From the original ggplot, 20 gallons was at a good 10.5 in gallon amount. After the predict, it’s now at just 10. 15 gallons was at a high 9.5. One predict later, it’s down 1.5 aka 8 gallon amount. And 25 saw an increase since, it was at 4 gallon amount before the predict. Now sits at almost 5.5 gallon amount. What we can see from this, is 25 gallons grew in the amount people put in their cars. While our Top2 place holders decreased.
Question 3.5: My third ggplot uses a coefficient to see an interaction. I choose am since it’s close to mpg based on data searching. When running the code, it takes up most of the screen. From the way I interupt this, the data is high in all numbers. Hence the full screen.
Question 3.6: According to my data, no outliers are in the dataset.
Question 3.7: I do say yes. It does improve the predictability thanks to R^2.