Program 1

Author

1NT23IS244 - SECTION D - VARSHA.S

Develop an R program to quickly explore a given dataset, including categorical analysis using dplyr's group_by() function, and visualize the findings using ggplot2.

Step 1: Load the required libraries

library(ggplot2)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.4     ✔ tibble    3.2.1
✔ purrr     1.0.4     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Step 2: Load dataset

# Copy the built-in mtcars dataset so the original stays unchanged
temp <- mtcars
temp$cyl
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
class(temp)
[1] "data.frame"
class(temp$cyl)
[1] "numeric"
mtcars[3]
                     disp
Mazda RX4           160.0
Mazda RX4 Wag       160.0
Datsun 710          108.0
Hornet 4 Drive      258.0
Hornet Sportabout   360.0
Valiant             225.0
Duster 360          360.0
Merc 240D           146.7
Merc 230            140.8
Merc 280            167.6
Merc 280C           167.6
Merc 450SE          275.8
Merc 450SL          275.8
Merc 450SLC         275.8
Cadillac Fleetwood  472.0
Lincoln Continental 460.0
Chrysler Imperial   440.0
Fiat 128             78.7
Honda Civic          75.7
Toyota Corolla       71.1
Toyota Corona       120.1
Dodge Challenger    318.0
AMC Javelin         304.0
Camaro Z28          350.0
Pontiac Firebird    400.0
Fiat X1-9            79.0
Porsche 914-2       120.3
Lotus Europa         95.1
Ford Pantera L      351.0
Ferrari Dino        145.0
Maserati Bora       301.0
Volvo 142E          121.0
str(temp)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
(temp$cyl)
 [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
# Convert cyl from numeric to a factor so it is treated as a category
temp$cyl <- as.factor(temp$cyl)
str(temp)
'data.frame':   32 obs. of  11 variables:
 $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
 $ disp: num  160 160 108 258 360 ...
 $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num  16.5 17 18.6 19.4 17 ...
 $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
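
After the conversion, cyl is a categorical variable with three levels. As a minimal sketch (not part of the original program), the levels and the number of cars in each category can be checked with base R. The name `cars` is introduced here only for illustration, so the block runs on its own.

```r
# Work on a fresh copy of the built-in mtcars data (`cars` is a hypothetical name)
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# Levels of the factor, in sorted order
cyl_levels <- levels(cars$cyl)

# Number of cars in each cylinder category
cyl_counts <- table(cars$cyl)

print(cyl_levels)
print(cyl_counts)
```

Knowing the group sizes helps when reading the averages in the next step, since a mean over few cars is less stable than one over many.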

Step 3: Group by categorical values

library(dplyr)
# Average mpg for each cylinder category
summary_data <- temp %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), .groups = "drop")
summary_data
# A tibble: 3 × 2
  cyl   avg_mpg
  <fct>   <dbl>
1 4        26.7
2 6        19.7
3 8        15.1
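
The same group_by() and summarise() pattern can return several statistics at once. The sketch below is an assumption about how the summary might be extended, not part of the original program: the names `cars`, `cyl_summary`, `n_cars` and `sd_mpg` are hypothetical.

```r
library(dplyr)

# Fresh copy of mtcars with cyl as a factor (hypothetical name `cars`)
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# One row per cylinder category: group size, mean and standard deviation of mpg
cyl_summary <- cars %>%
  group_by(cyl) %>%
  summarise(
    n_cars  = n(),
    avg_mpg = mean(mpg),
    sd_mpg  = sd(mpg),
    .groups = "drop"
  )

print(cyl_summary)
```

The `.groups = "drop"` argument returns an ungrouped tibble, so later steps do not inherit the grouping by accident.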

Step 4: Visualizing the findings

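# Bar chart of average mpg per cylinder category;
# geom_col() is equivalent to geom_bar(stat = "identity")
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_col() +
  labs(title = "Average MPG by cylinder count",
       x = "Number of cylinders",
       y = "Average MPG") +
  theme_minimal()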
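
A bar chart of averages hides the spread within each group. As a possible complement (a sketch, not part of the original program), a box plot of the raw mpg values per cylinder category shows the distribution behind each mean. The names `cars` and `p` are hypothetical.

```r
library(ggplot2)

# Fresh copy of mtcars with cyl as a factor (hypothetical name `cars`)
cars <- mtcars
cars$cyl <- as.factor(cars$cyl)

# Box plot of the raw mpg values for each cylinder category
p <- ggplot(cars, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  labs(title = "MPG distribution by cylinder count",
       x = "Number of cylinders",
       y = "MPG") +
  theme_minimal()

print(p)
```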