---
title: "HOMEWORK"
author: "Vita Balmazovic"
date: "2023-01-09"
output:
  html_document: default
  pdf_document: default
---
data("mtcars")
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The mtcars dataset has 32 observations (one car per row). Two variables, vs and am, are categorical; both are coded as 0/1. All other variables in mtcars are numeric. The variables and their units of measurement are:
- mpg: Miles per gallon.
- cyl: Number of cylinders.
- disp: Displacement, in cubic inches.
- hp: Horsepower.
- drat: Rear axle ratio.
- wt: Weight, in thousands of pounds.
- qsec: 1/4 mile time, in seconds.
- vs: V/S (0 = V-engine, 1 = straight engine).
- am: Transmission (0 = automatic, 1 = manual).
- gear: Number of gears.
- carb: Number of carburetors.
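The description above can be checked directly against the data. The following is a minimal sketch, not part of the original analysis: it works on a copy named `cars` (a hypothetical name) and assumes that vs and am should be treated as factors, since R stores every mtcars column as numeric by default.

```r
# Work on a copy so the built-in mtcars object is left untouched
cars <- datasets::mtcars

# Number of observations (rows) and variables (columns)
dim(cars)

# Storage type of each column: all are numeric in the built-in data
sapply(cars, class)

# Assumption: treat vs and am as categorical by converting them to factors
cars$vs <- factor(cars$vs, levels = c(0, 1), labels = c("V-engine", "straight"))
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Cross-tabulation of the two categorical variables
table(cars$vs, cars$am)
```

Converting to factors is optional here, but it keeps later summaries from reporting a mean for what is really a 0/1 label.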
The dataset is built into R (it ships with the datasets package, so it is available in RStudio without any download) and was retrieved on January 7th, 2023.
The main goal of this analysis is to see which type of car has the best statistics on average.
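One possible way to approach this goal is to compare group averages. The sketch below is illustrative only and rests on an assumption: "type of car" is read as transmission type (am) or number of cylinders (cyl), which the text does not specify. The names `cars`, `by_transmission` and `by_cylinders` are hypothetical.

```r
# Work on a copy of the built-in data
cars <- datasets::mtcars

# Assumption: "type of car" is read here as transmission type (0 = automatic, 1 = manual)
by_transmission <- aggregate(cbind(mpg, hp, qsec) ~ am, data = cars, FUN = mean)
by_transmission

# The same idea with number of cylinders as the grouping variable
by_cylinders <- aggregate(cbind(mpg, hp, qsec) ~ cyl, data = cars, FUN = mean)
by_cylinders
```

Which group counts as "best" depends on the variable: a higher mpg is better for fuel economy, while a lower qsec means faster acceleration.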
# Let's say that I want to add a new variable called efficiency: miles per gallon per unit of horsepower.
mtcars$efficiency <- mtcars$mpg / mtcars$hp
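# Let's say I want to correct a value that was mistakenly recorded for the 10th car (row 10 is Merc 280, column 1 is mpg).
# Note: efficiency was computed above from the old mpg, so that row's efficiency is not updated by this line.
mtcars[10, 1] <- 20.8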
print(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
## Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
## Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
## Merc 280 20.8 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
## Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
## Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
## Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
## Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
## Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
## Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
## Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2
## Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
## Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1
## Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2
## AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2
## Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
## Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
## Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1
## Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2
## Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2
## Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
## Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
## Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
## Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
## efficiency
## Mazda RX4 0.19090909
## Mazda RX4 Wag 0.19090909
## Datsun 710 0.24516129
## Hornet 4 Drive 0.19454545
## Hornet Sportabout 0.10685714
## Valiant 0.17238095
## Duster 360 0.05836735
## Merc 240D 0.39354839
## Merc 230 0.24000000
## Merc 280 0.15609756
## Merc 280C 0.14471545
## Merc 450SE 0.09111111
## Merc 450SL 0.09611111
## Merc 450SLC 0.08444444
## Cadillac Fleetwood 0.05073171
## Lincoln Continental 0.04837209
## Chrysler Imperial 0.06391304
## Fiat 128 0.49090909
## Honda Civic 0.58461538
## Toyota Corolla 0.52153846
## Toyota Corona 0.22164948
## Dodge Challenger 0.10333333
## AMC Javelin 0.10133333
## Camaro Z28 0.05428571
## Pontiac Firebird 0.10971429
## Fiat X1-9 0.41363636
## Porsche 914-2 0.28571429
## Lotus Europa 0.26902655
## Ford Pantera L 0.05984848
## Ferrari Dino 0.11257143
## Maserati Bora 0.04477612
## Volvo 142E 0.19633028
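The two edits above (a derived column and a single-cell correction) can also be written with row and column names instead of positions. This is a sketch under the assumption that the value to correct is the mpg of Merc 280, which is row 10 in the built-in data; `cars` is a hypothetical copy.

```r
# Work on a copy of the built-in data
cars <- datasets::mtcars

# Address the cell by row name and column name instead of [10, 1]
cars["Merc 280", "mpg"] <- 20.8

# Compute the derived column after the correction so it uses the new mpg
cars$efficiency <- cars$mpg / cars$hp

# Inspect the corrected row
cars["Merc 280", c("mpg", "hp", "efficiency")]
```

Computing efficiency after the correction avoids a stale value: in the printed output above, the efficiency of Merc 280 still reflects the mpg it had before the fix.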
# Let's say that, due to a timing mistake, every qsec measurement needs one additional second.
mtcars$`corrected qsec` <- mtcars$qsec + 1
# Let's say I want to remove a variable that was recorded incorrectly: disp, the third column.
mtcars1 <- mtcars[, -3]
head(mtcars1)
## mpg cyl hp drat wt qsec vs am gear carb efficiency
## Mazda RX4 21.0 6 110 3.90 2.620 16.46 0 1 4 4 0.1909091
## Mazda RX4 Wag 21.0 6 110 3.90 2.875 17.02 0 1 4 4 0.1909091
## Datsun 710 22.8 4 93 3.85 2.320 18.61 1 1 4 1 0.2451613
## Hornet 4 Drive 21.4 6 110 3.08 3.215 19.44 1 0 3 1 0.1945455
## Hornet Sportabout 18.7 8 175 3.15 3.440 17.02 0 0 3 2 0.1068571
## Valiant 18.1 6 105 2.76 3.460 20.22 1 0 3 1 0.1723810
## corrected qsec
## Mazda RX4 17.46
## Mazda RX4 Wag 18.02
## Datsun 710 19.61
## Hornet 4 Drive 20.44
## Hornet Sportabout 18.02
## Valiant 21.22
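Dropping a column by position depends on the column order staying the same. A minimal alternative sketch drops it by name instead. The names `cars`, `cars_no_disp` and `qsec_corrected` are hypothetical; the last one is chosen only because a name without a space needs no backticks.

```r
# Work on a copy of the built-in data
cars <- datasets::mtcars

# Add the shifted timing as a new column with a syntactic name
cars$qsec_corrected <- cars$qsec + 1

# Drop disp by name; this still works if the column order changes
cars_no_disp <- cars[, setdiff(names(cars), "disp")]

names(cars_no_disp)
```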
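# Let's say I want to rename a column to rar. Note: after disp was dropped, column 5 of mtcars1 is wt (drat is now column 4), so this line renames wt rather than drat, as the output below shows.
colnames(mtcars1)[5] <- "rar"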
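Renaming by position is easy to get wrong once columns have been removed. The sketch below shows one way to rename by matching the current column name instead, assuming the intended target is drat; `cars` is a hypothetical copy.

```r
# Rebuild the same starting point: the built-in data without disp
cars <- datasets::mtcars
cars <- cars[, setdiff(names(cars), "disp")]

# After dropping disp, position 4 holds drat and position 5 holds wt
names(cars)[4:5]

# Rename by matching the current name, so the position does not matter
names(cars)[names(cars) == "drat"] <- "rar"

names(cars)
```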
#DESCRIPTIVE STATISTICS
library(psych)
round(describe(mtcars1),1)
## vars n mean sd median trimmed mad min max range skew
## mpg 1 32 20.1 6.0 19.4 19.8 5.6 10.4 33.9 23.5 0.6
## cyl 2 32 6.2 1.8 6.0 6.2 3.0 4.0 8.0 4.0 -0.2
## hp 3 32 146.7 68.6 123.0 141.2 77.1 52.0 335.0 283.0 0.7
## drat 4 32 3.6 0.5 3.7 3.6 0.7 2.8 4.9 2.2 0.3
## rar 5 32 3.2 1.0 3.3 3.2 0.8 1.5 5.4 3.9 0.4
## qsec 6 32 17.8 1.8 17.7 17.8 1.4 14.5 22.9 8.4 0.4
## vs 7 32 0.4 0.5 0.0 0.4 0.0 0.0 1.0 1.0 0.2
## am 8 32 0.4 0.5 0.0 0.4 0.0 0.0 1.0 1.0 0.4
## gear 9 32 3.7 0.7 4.0 3.6 1.5 3.0 5.0 2.0 0.5
## carb 10 32 2.8 1.6 2.0 2.7 1.5 1.0 8.0 7.0 1.1
## efficiency 11 32 0.2 0.1 0.2 0.2 0.1 0.0 0.6 0.5 1.2
## corrected qsec 12 32 18.8 1.8 18.7 18.8 1.4 15.5 23.9 8.4 0.4
## kurtosis se
## mpg -0.4 1.1
## cyl -1.8 0.3
## hp -0.1 12.1
## drat -0.7 0.1
## rar 0.0 0.2
## qsec 0.3 0.3
## vs -2.0 0.1
## am -1.9 0.1
## gear -1.1 0.1
## carb 1.3 0.3
## efficiency 0.5 0.0
## corrected qsec 0.3 0.3
summary(mtcars1)
## mpg cyl hp drat
## Min. :10.40 Min. :4.000 Min. : 52.0 Min. :2.760
## 1st Qu.:15.43 1st Qu.:4.000 1st Qu.: 96.5 1st Qu.:3.080
## Median :19.45 Median :6.000 Median :123.0 Median :3.695
## Mean :20.14 Mean :6.188 Mean :146.7 Mean :3.597
## 3rd Qu.:22.80 3rd Qu.:8.000 3rd Qu.:180.0 3rd Qu.:3.920
## Max. :33.90 Max. :8.000 Max. :335.0 Max. :4.930
## rar qsec vs am
## Min. :1.513 Min. :14.50 Min. :0.0000 Min. :0.0000
## 1st Qu.:2.581 1st Qu.:16.89 1st Qu.:0.0000 1st Qu.:0.0000
## Median :3.325 Median :17.71 Median :0.0000 Median :0.0000
## Mean :3.217 Mean :17.85 Mean :0.4375 Mean :0.4062
## 3rd Qu.:3.610 3rd Qu.:18.90 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :5.424 Max. :22.90 Max. :1.0000 Max. :1.0000
## gear carb efficiency corrected qsec
## Min. :3.000 Min. :1.000 Min. :0.04478 Min. :15.50
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:0.08944 1st Qu.:17.89
## Median :4.000 Median :2.000 Median :0.15041 Median :18.71
## Mean :3.688 Mean :2.812 Mean :0.19055 Mean :18.85
## 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:0.24129 3rd Qu.:19.90
## Max. :5.000 Max. :8.000 Max. :0.58462 Max. :23.90
## MEANS:
### Mean of mpg: 20.14
### Mean of cyl: 6.19
### Mean of hp: 146.69
### Mean of drat: 3.60
### Mean of rar (the renamed wt column): 3.22
### Mean of qsec: 17.85
### Mean of vs: 0.44
### Mean of am: 0.41
### Mean of gear: 3.69
### Mean of carb: 2.81
### Mean of efficiency: 0.19
### Mean of corrected qsec: 18.85
## RANGES:
### Range of mpg: 10.4 to 33.9
### Range of cyl: 4.0 to 8.0
### Range of hp: 52.0 to 335.0
### Range of drat: 2.76 to 4.93
### Range of rar (the renamed wt column): 1.513 to 5.424
### Range of qsec: 14.5 to 22.9
### Range of vs: 0.0 to 1.0
### Range of am: 0.0 to 1.0
### Range of gear: 3.0 to 5.0
### Range of carb: 1.0 to 8.0
### Range of efficiency: 0.04478 to 0.58462
### Range of corrected qsec: 15.5 to 23.9
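The means and ranges listed above can be produced in one step rather than read off the summary tables. This sketch uses the unmodified built-in data through a hypothetical copy named `cars`, so its mpg mean is 20.09 rather than the 20.14 obtained after the correction to the 10th car.

```r
# Work on a copy of the built-in data
cars <- datasets::mtcars

# Mean of every column, rounded to two decimals
round(sapply(cars, mean), 2)

# Minimum and maximum of every column; range() returns c(min, max)
ranges <- t(sapply(cars, range))
colnames(ranges) <- c("min", "max")
ranges
```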
library(ggplot2)
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
library(corrplot)
## corrplot 0.92 loaded
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(bins = 20)
#### The x-axis of the histogram shows the values of the mpg variable.
#### The y-axis shows the frequency, that is, the number of cars in each bin.
#### Each bar covers one bin, a range of mpg values (bins = 20 splits the data into 20 bins), and its height is the number of cars that fall in that range.
#### From the shape of the histogram alone we cannot say that mpg is normally distributed.
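A histogram alone rarely settles the question of normality. As a hedged sketch that is not part of the original analysis, a Shapiro-Wilk test and a normal Q-Q plot from base R can complement the visual impression; `cars` and `sw` are hypothetical names.

```r
# Work on a copy of the built-in data
cars <- datasets::mtcars

# Shapiro-Wilk test: the null hypothesis is that mpg is normally distributed
sw <- shapiro.test(cars$mpg)
sw

# Normal Q-Q plot: points close to the reference line are consistent with normality
qqnorm(cars$mpg)
qqline(cars$mpg)
```

With only 32 observations the test has limited power, so a large p-value should be read as "no evidence against normality", not as proof of it.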
mtcars_corr <- cor(mtcars)
corrplot(mtcars_corr, method = "ellipse")
#### The x-axis and y-axis show the names of the variables in the mtcars dataset, including the added efficiency and corrected qsec columns.
#### Each ellipse represents the correlation between two variables. The color and shape of the ellipse indicate the direction and strength of the correlation: red indicates a negative correlation and blue a positive one, and the narrower and more saturated the ellipse, the stronger the correlation.
#### For example, mpg is negatively correlated with disp (displacement) and hp (horsepower), indicating that cars with higher miles per gallon tend to have lower displacement and horsepower.
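The pattern read from the plot can be confirmed numerically. The following sketch uses the unmodified built-in data through a hypothetical copy named `cars` and lists the correlation of every other variable with mpg.

```r
# Work on a copy of the built-in data
cars <- datasets::mtcars

# Pearson correlation matrix (the default method of cor())
cars_corr <- cor(cars)

# Correlations of every other variable with mpg, most negative first
mpg_corr <- sort(cars_corr["mpg", colnames(cars_corr) != "mpg"])
round(mpg_corr, 2)
```

Sorting puts the strongest negative relationships first, which makes it easy to compare disp and hp with the remaining variables.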