Display an R program to quickly explore a given dataset, including categorical analysis using the group by command and visualize the findings using ggplot2 features.
Step 1: Load the required library
we load ggplot2 and tidyverse library required for this program.
library (ggplot2)
library (tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ lubridate 1.9.4 ✔ tibble 3.2.1
✔ purrr 1.0.4 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Step 2: load data set
we will use built-in mtcars dataset.
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
disp
Mazda RX4 160.0
Mazda RX4 Wag 160.0
Datsun 710 108.0
Hornet 4 Drive 258.0
Hornet Sportabout 360.0
Valiant 225.0
Duster 360 360.0
Merc 240D 146.7
Merc 230 140.8
Merc 280 167.6
Merc 280C 167.6
Merc 450SE 275.8
Merc 450SL 275.8
Merc 450SLC 275.8
Cadillac Fleetwood 472.0
Lincoln Continental 460.0
Chrysler Imperial 440.0
Fiat 128 78.7
Honda Civic 75.7
Toyota Corolla 71.1
Toyota Corona 120.1
Dodge Challenger 318.0
AMC Javelin 304.0
Camaro Z28 350.0
Pontiac Firebird 400.0
Fiat X1-9 79.0
Porsche 914-2 120.3
Lotus Europa 95.1
Ford Pantera L 351.0
Ferrari Dino 145.0
Maserati Bora 301.0
Volvo 142E 121.0
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
[1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
temp$ cyl<- as.factor (temp$ cyl)
str (temp)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Step 3: Group by categorical values
summary_data<- temp %>% group_by (cyl) %>%
summarise (avg_mpg = mean (mpg), .groups = 'drop' )
print (summary_data)
# A tibble: 3 × 2
cyl avg_mpg
<fct> <dbl>
1 4 26.7
2 6 19.7
3 8 15.1
Step 4: Visualizing the findings
ggplot (summary_data, aes (x= cyl,y= avg_mpg,fill= cyl
))+
geom_bar (stat= "identity" )+
labs (title= "Average MPG by cylinder count" ,
x= "Number of cylinders" ,
y= "Average MPG" )+
theme_minimal ()