library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
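data(mpg)
# Engine displacement (litres) vs. highway mileage (mpg)
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))
# Same plot, colored by vehicle class
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()
# Highlight cars with highway mileage above 40 mpg
ggplot(mpg, aes(x = displ, y = hwy, color = hwy > 40)) +
  geom_point()
# Highlight cars with engine displacement below 6 litres
ggplot(mpg, aes(x = displ, y = hwy, color = displ < 6)) +
  geom_point()
# Facet by model year (the data cover only 1999 and 2008)
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  facet_wrap(~ year)
# Drive train vs. highway mileage, colored by class
ggplot(mpg, aes(x = drv, y = hwy, color = class)) +
  geom_point()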
Using the tidyverse library, the ggplot2 package and the mpg dataset, I have created various scatter plots that show the relationships between several variables. The plots also help to identify the outliers and their causes, which I have highlighted in the plots created for that purpose. The only thing I could not work out was how to color the outliers red.
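One possible way to color the outliers red is to draw all cars first and then add a second point layer containing only the outlying cars with a fixed red color. The sketch below assumes the hwy > 40 condition from the earlier plot as the outlier definition; that cut-off is illustrative, and `outliers` and `p` are hypothetical names.

```r
library(tidyverse)

# Illustrative cut-off (assumption): treat cars with hwy > 40 as outliers,
# matching the hwy > 40 condition used in the earlier scatter plot
outliers <- filter(mpg, hwy > 40)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  # First layer: every car, drawn in grey
  geom_point(color = "grey50") +
  # Second layer: only the outlying cars, redrawn in red and slightly larger
  geom_point(data = outliers, color = "red", size = 2.5)

print(p)
```

Because `color = "red"` is set outside `aes()`, it is a fixed color rather than a mapped variable, so no legend is produced. If a legend is wanted, keep `color = hwy > 40` inside `aes()` and add `scale_color_manual(values = c("grey50", "red"))`; the first color is used for FALSE and the second for TRUE.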
The data source for the mpg dataset is https://fueleconomy.gov/, the official US government fuel economy website, which collects data per car model, class, mileage, etc. The website is also helpful if you want to find the MPG of your own car. This particular dataset is available there along with an updated one: https://www.fueleconomy.gov/feg/download.shtml. The mpg dataset only has data for two years, 1999 and 2008, so its title is misleading: at first glance it gives the impression that data for every year from 1999 to 2008 is available.
The mpg dataset is a data frame with 234 rows and 11 variables.
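Both the size of the data frame and the fact that only two model years are present can be checked directly. This is a minimal sketch; `dims` and `years` are hypothetical names.

```r
library(tidyverse)

# Number of rows and columns in the mpg data frame (234 rows, 11 variables)
dims <- dim(mpg)
print(dims)

# Model years present in the data, with the number of cars in each year
years <- count(mpg, year)
print(years)
```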
The variables in the dataset are described below:
manufacturer - manufacturer name
model - model name
displ - engine displacement, in litres
year - year of manufacture
cyl - number of cylinders
trans - type of transmission
drv - the type of drive train, where f = front-wheel drive, r = rear-wheel drive, 4 = 4wd
cty - city miles per gallon
hwy - highway miles per gallon
fl - fuel type
class - “type” of car
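The codes used by a categorical variable such as drv can be listed together with how often each one occurs. This is a minimal sketch; `drv_counts` is a hypothetical name.

```r
library(tidyverse)

# Each drive-train code (f, r, 4) and the number of cars that use it
drv_counts <- count(mpg, drv)
print(drv_counts)

# The official variable descriptions are in R's help system:
# ?mpg
```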
Categorical variables in mpg:
manufacturer, model, trans, drv, fl, class
Continuous (numeric) variables in mpg:
displ, year, cyl, cty, hwy
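The same split can be derived from the column types: character columns give the categorical list and numeric columns give the numeric list. This sketch assumes dplyr 1.0 or later for `where()`; note that year and cyl are stored as integers, so they count as numeric here even though each takes only a few distinct values. `categorical_vars` and `numeric_vars` are hypothetical names.

```r
library(tidyverse)

# Character columns: the categorical variables
categorical_vars <- names(select(mpg, where(is.character)))

# Numeric (double or integer) columns: the continuous variables
numeric_vars <- names(select(mpg, where(is.numeric)))

print(categorical_vars)
print(numeric_vars)
```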
$ manufacturer: chr
$ model       : chr
$ displ       : num
$ year        : int
$ cyl         : int
$ trans       : chr
$ drv         : chr
$ cty         : int
$ hwy         : int
$ fl          : chr
$ class       : chr
I have analysed the variables hwy and displ. My findings are as follows:
1. When I faceted the hwy vs. displ plot by year, the outliers differed between the two years: in 1999 the outliers were subcompact cars, whereas in 2008 they were 2seater (two-seater) cars.
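The outliers in the faceted plot were identified visually. As an optional numerical cross-check, one could fit a straight-line trend of hwy on displ separately for each year and list the cars that sit far from it. This is a swapped-in technique, not the method used above: the linear fit and the |standardised residual| > 2 cut-off are illustrative assumptions, and `flag_outliers()` is a hypothetical helper.

```r
library(tidyverse)

# Hypothetical helper: fit hwy ~ displ within one year's data and flag cars
# whose standardised residual exceeds 2 in absolute value (assumed cut-off)
flag_outliers <- function(df) {
  fit <- lm(hwy ~ displ, data = df)
  df %>%
    mutate(resid_std = rstandard(fit),
           outlier = abs(resid_std) > 2)
}

outliers_by_year <- mpg %>%
  group_by(year) %>%
  group_modify(~ flag_outliers(.x)) %>%  # one fit per model year
  ungroup() %>%
  filter(outlier) %>%
  select(year, manufacturer, model, class, displ, hwy, resid_std)

print(outliers_by_year, n = Inf)
```

Counting `outliers_by_year` by year and class then shows which vehicle classes dominate the unusual points in each year.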
If the dataset covered more years, the extrapolation and analysis could have been more detailed. Overall, it gives a good perspective on how mileage and engine displacement changed from 1999 to 2008.
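A simple way to quantify the change between the two years is to compare per-year averages of displacement and mileage. This is a minimal sketch; means are only one possible summary, and `year_summary` is a hypothetical name.

```r
library(tidyverse)

# Average engine displacement and fuel economy for each model year
year_summary <- mpg %>%
  group_by(year) %>%
  summarise(
    mean_displ = mean(displ),
    mean_cty   = mean(cty),
    mean_hwy   = mean(hwy),
    .groups = "drop"
  )

print(year_summary)
```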
# References
?mpg (R help page with the description of the data)
https://www.fueleconomy.gov/feg/download.shtml
https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R1_GettingStarted/R1_GettingStarted8.html
https://jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html#exercise-3.3.1
RStudio Primers
tidyverse.org/packages/