library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
data(mpg)

Scatter Plot of ‘hwy’ vs ‘displ’

In the graph below, the two points above 40 on the y axis (hwy) and the five points above 6 on the x axis (displ) stand out as outliers.

ggplot(data = mpg)+
  geom_point(mapping=aes(x=displ,y=hwy))
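To see exactly which cars lie beyond those values, the rows can be listed with dplyr's filter(). This is a minimal sketch: the cut-offs (hwy above 40, displ above 6) are the ones read off the plot above, not a formal outlier rule.

```r
library(tidyverse)  # attaches ggplot2 (which provides mpg) and dplyr

# Rows beyond the cut-offs read off the scatter plot:
# highway mileage above 40 mpg or displacement above 6 litres
outliers <- mpg %>%
  filter(hwy > 40 | displ > 6) %>%
  select(manufacturer, model, year, displ, hwy, class) %>%
  arrange(desc(hwy))

outliers
```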

ggplot(mpg,aes(displ,hwy,color=class))+
  geom_point()

The blue dots are the outliers on the y axis. These cars could be hybrids, which would explain why their highway mileage (hwy) is so much higher than that of the other cars.

ggplot(mpg, aes(x=displ,y=hwy,color=hwy>40))+
  geom_point()
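The hybrid guess can be checked against the data rather than the plot. A minimal sketch: list the cars with hwy above 40 together with their fuel type (fl) and transmission, then count the fuel types among them. The ?mpg help page documents fl only as "fuel type", so the single-letter codes need some care when interpreting them.

```r
library(tidyverse)

# Cars with highway mileage above 40 mpg, with fuel type (fl)
# and transmission, to see what they have in common
high_hwy <- mpg %>%
  filter(hwy > 40) %>%
  select(manufacturer, model, year, displ, trans, fl, cty, hwy, class)

high_hwy

# How many of these cars use each fuel type
count(high_hwy, fl)
```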

The blue dots are the outliers on the x axis: their engine displacement (displ) is much larger than that of the other cars.

ggplot(mpg, aes(x = displ, y = hwy, color = displ > 6)) +
  geom_point()
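The highlighting above relies on ggplot2's default palette, which draws the TRUE group in blue. To color the outliers red instead, the colors can be set explicitly with scale_colour_manual(), mapping each logical value to a named color. A minimal sketch that flags both kinds of outlier at once; the grey used for the remaining points is an arbitrary choice.

```r
library(tidyverse)

# Flag both kinds of outlier discussed above, then draw the
# flagged points in red and all other points in grey
p <- mpg %>%
  mutate(outlier = hwy > 40 | displ > 6) %>%
  ggplot(aes(x = displ, y = hwy, colour = outlier)) +
  geom_point() +
  scale_colour_manual(values = c(`FALSE` = "grey50", `TRUE` = "red"))

p
```

Writing colour = "red" directly inside geom_point() would instead paint every point red, because it sets a fixed aesthetic rather than mapping a variable.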

Using facet_wrap(), I split the plot into two side-by-side panels, one for each year.

In 1999 the outliers are 2 seaters and subcompact cars, and SUVs vary in displ while their hwy stays roughly constant at about 20. In 2008 only the 2 seaters are outliers; there are no subcompact outliers.

ggplot(data = mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  facet_wrap(~ year)
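The observation about SUVs can also be checked numerically by summarising hwy for the suv class in each year. A minimal sketch; the choice of minimum, median and maximum as summary statistics is ours, not taken from the plots.

```r
library(tidyverse)

# Spread of highway mileage for SUVs, split by model year
suv_hwy <- mpg %>%
  filter(class == "suv") %>%
  group_by(year) %>%
  summarise(
    n          = n(),
    min_hwy    = min(hwy),
    median_hwy = median(hwy),
    max_hwy    = max(hwy),
    .groups    = "drop"
  )

suv_hwy
```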

To examine the relationship between two other variables, I asked whether fuel economy depends on the type of drive, and created a scatter plot of hwy against drv, colored by vehicle class. Subcompact cars have different mileage depending on the drive type. SUV highway mileage is mostly 20 mpg or less for both drive types that SUVs come in, 4 and r.

ggplot(mpg, aes(x=drv,y=hwy,color=class))+
  geom_point()
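Because drv is categorical, many points in this plot sit on top of each other. A sketch of two possible follow-ups: geom_jitter() spreads the points sideways so overlapping cars become visible, and a grouped summary gives SUV highway mileage per drive type. The jitter width of 0.2 is an arbitrary choice; height = 0 keeps the hwy values exact.

```r
library(tidyverse)

# Jitter the points horizontally only, so cars sharing the same
# drv and hwy no longer hide one another
p <- ggplot(mpg, aes(x = drv, y = hwy, colour = class)) +
  geom_jitter(width = 0.2, height = 0)
p

# Highway mileage of SUVs for each drive type present in the data
suv_by_drv <- mpg %>%
  filter(class == "suv") %>%
  group_by(drv) %>%
  summarise(n = n(), median_hwy = median(hwy), max_hwy = max(hwy),
            .groups = "drop")

suv_by_drv
```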

Abstract

Using the tidyverse library, the ggplot2 package and the mpg dataset, I have created various scatter plots that show the relationships between several variables. The graphs also help identify the outliers and their possible causes, which I describe alongside the graphs that highlight them. The only thing I could not work out was how to color the outliers red.

Introduction

The mpg dataset comes from https://fueleconomy.gov/, the official US government fuel economy website, which collects data by car model, class, mileage and so on. The site is also handy for looking up the MPG of your own car. This dataset, along with more recent data, can be downloaded from https://www.fueleconomy.gov/feg/download.shtml. Note that mpg contains only two years of data, 1999 and 2008. The dataset's title on its ?mpg help page is misleading, because at first glance it suggests that every year from 1999 to 2008 is included.
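The two-year coverage is easy to confirm directly. A minimal sketch counting the cars recorded for each model year:

```r
library(tidyverse)

# Number of cars recorded for each model year
years <- count(mpg, year)
years
```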

Description of the data

The mpg dataset is a data frame with 234 rows and 11 variables.

The variables are described below:

manufacturer - manufacturer name

model - model name

displ - engine displacement, in litres

year - year of manufacture

cyl - number of cylinders

trans - type of transmission

drv - the type of drive train, where f = front-wheel drive, r = rear-wheel drive, 4 = 4wd

cty - city miles per gallon

hwy - highway miles per gallon

fl - fuel type

class - “type” of car

categorical variables in mpg:

manufacturer model trans drv fl class

continuous variables in mpg:

displ year cyl cty hwy
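The same split can be recovered from the column types. A minimal sketch using tidyselect's where() helper; it is a type-based approximation, since year and cyl are stored as integers and therefore land in the numeric group even though each takes only a handful of values.

```r
library(tidyverse)

# Character columns correspond to the categorical variables above;
# numeric columns (double or integer) to the continuous ones
categorical <- names(select(mpg, where(is.character)))
continuous  <- names(select(mpg, where(is.numeric)))

categorical
continuous
```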

$ manufacturer: chr
$ model       : chr
$ displ       : num
$ year        : int
$ cyl         : int
$ trans       : chr
$ drv         : chr
$ cty         : int
$ hwy         : int
$ fl          : chr
$ class       : chr
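A compact way to reproduce this column overview and check the size of the data is dplyr's glimpse() together with dim(). Note that glimpse() prints dbl where str() prints num; both mean double-precision numbers.

```r
library(tidyverse)

# One line per column: name, type and the first few values
glimpse(mpg)

# Number of rows and columns
dim(mpg)
```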

Data Analysis

I analysed the variables hwy and displ. My findings are as follows:

1. When I faceted the scatter plot by year, the outliers differed between the two years: in 1999 they included subcompact cars as well as 2 seaters, while in 2008 only the 2 seaters were outliers.

2. I created a scatter plot to see whether the fuel economy of each vehicle class depends on the type of drive. Subcompact cars have different mileage depending on their drive type, and SUV highway mileage is mostly 20 mpg or less for both drive types that SUVs come in, 4 and r.

Conclusion

If the dataset covered more years, the analysis and any extrapolation could have been more detailed. Overall, it gives a good perspective on how mileage and displacement changed between 1999 and 2008.

References

?mpg - R help page describing the dataset

https://www.fueleconomy.gov/feg/download.shtml

https://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/R/R1_GettingStarted/R1_GettingStarted8.html

https://jrnold.github.io/r4ds-exercise-solutions/data-visualisation.html#exercise-3.3.1

https://r4ds.had.co.nz

RStudio Primers

tidyverse.org/packages/