Lab 1

Author

harshitha

Develop an R program to quickly explore a given dataset, including categorical analysis using the ‘group_by()’ command, and visualize the findings using ‘ggplot2’.

## What we will do

In this program, we will:

1. Load the required libraries and dataset.
2. Explore the structure of the dataset.
3. Convert a numeric variable into a categorical variable.
4. Perform a categorical analysis using ‘group_by()’ and ‘summarise()’.
5. Visualize the results using ‘ggplot2’.


## Step 1: Load required libraries and dataset

‘tidyverse’ is a collection of packages for data science. ‘dplyr’, one of its core packages, is used for grouping and summarizing data. The dataset is the built-in ‘mtcars’ data frame.


library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)  # already attached by tidyverse; loaded explicitly for clarity
# Use the built-in mtcars dataset
data <- mtcars

## Step 2: Explore the dataset

Before performing any analysis, we should understand the dataset.

We will check:

- number of rows and columns
- column names
- data types
- summary statistics
- first few rows

# number of rows and columns
dim(data)

# column names
names(data)

# structure of the dataset (column types)
str(data)

# summary statistics
summary(data)

# first six rows
head(data)
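
The checks above can also be condensed. The following is a minimal sketch, assuming ‘dplyr’ is installed; the name ‘missing_counts’ is introduced here for illustration and is not part of the original lab. It uses ‘glimpse()’ for a one-call overview and adds a missing-value count, which the list above does not cover.

```r
library(dplyr)

data <- mtcars

# glimpse() prints the dimensions, column types, and first values in one call
glimpse(data)

# Count missing values in each column (illustrative extra check)
missing_counts <- colSums(is.na(data))
missing_counts
```

A column with a non-zero count would need attention before computing means, since ‘mean()’ returns ‘NA’ when its input contains missing values unless ‘na.rm = TRUE’ is set.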

## Step 3: Convert a numeric variable to a categorical variable

The variable ‘cyl’ represents the number of cylinders in a car. Although it is numeric (4, 6, 8), it represents categories. For categorical analysis, we convert it into a factor.

#Convert 'cyl' to factor
data$cyl <- as.factor(data$cyl)

#Confirm conversion
str(data$cyl)
 Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
levels(data$cyl)
[1] "4" "6" "8"
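
The same conversion can be written inside a ‘dplyr’ pipeline. This is a sketch of one possible alternative, not the method used in this lab; the names ‘cars_cat’, ‘cyl_codes’, and ‘cyl_values’ are made up for illustration. It also shows a common pitfall when converting a factor back to numbers.

```r
library(dplyr)

# Convert 'cyl' to a factor with explicit level order, leaving mtcars untouched
cars_cat <- mtcars %>%
  mutate(cyl = factor(cyl, levels = c(4, 6, 8)))

levels(cars_cat$cyl)

# Pitfall: as.integer() on a factor returns the internal level codes (1, 2, 3)
cyl_codes <- as.integer(cars_cat$cyl)

# To recover the original numbers, go through character first
cyl_values <- as.numeric(as.character(cars_cat$cyl))
```

Working on a copy such as ‘cars_cat’ keeps the original numeric column available if it is needed later.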

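## Step 4: Perform categorical analysis

We calculate the average miles per gallon (‘mpg’) for each cylinder category.

## How these functions work together

‘group_by(cyl)’ splits the rows into one group per cylinder category, and ‘summarise()’ then collapses each group into a single row holding its mean ‘mpg’.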

summary_data <- data %>%
  group_by(cyl) %>%
  summarise(
    avg_mpg = mean(mpg),
    # the argument is '.groups' (plural); '.group' would create an extra column
    .groups = "drop"
  )
summary_data
# A tibble: 3 × 2
  cyl   avg_mpg
  <fct>   <dbl>
1 4        26.7
2 6        19.7
3 8        15.1
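
‘summarise()’ can return several statistics per group at once. The sketch below is an illustrative extension rather than part of the original lab; the names ‘cyl_summary’, ‘n_cars’, and ‘sd_mpg’ are assumptions chosen for this example. It adds the group size and the standard deviation next to the mean.

```r
library(dplyr)

cyl_summary <- mtcars %>%
  mutate(cyl = as.factor(cyl)) %>%
  group_by(cyl) %>%
  summarise(
    n_cars  = n(),        # number of cars in each cylinder group
    avg_mpg = mean(mpg),  # average miles per gallon
    sd_mpg  = sd(mpg),    # spread of mpg within the group
    .groups = "drop"      # return an ungrouped tibble
  )

cyl_summary
```

Reporting the group size alongside the mean helps judge how much weight each average deserves, since the groups in ‘mtcars’ are not equally large.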

## Step 5: Visualize using a bar plot

# geom_col() draws bars whose heights are the values in the data
# (equivalent to geom_bar(stat = "identity"))
ggplot(summary_data, aes(x = cyl, y = avg_mpg, fill = cyl)) +
  geom_col() +
  labs(
    title = "Average MPG by Cylinder Count",
    x = "Number of cylinders",
    y = "Average MPG"
  ) +
  theme_minimal()
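
A bar plot of averages hides how much ‘mpg’ varies within each group. As a complementary sketch (an addition for illustration, not part of the original lab; the object name ‘p’ is arbitrary), a box plot of the raw values shows the spread for each cylinder category.

```r
library(ggplot2)

# Box plot of raw mpg values, one box per cylinder category
p <- ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot() +
  labs(
    title = "MPG Distribution by Cylinder Count",
    x = "Number of cylinders",
    y = "Miles per gallon",
    fill = "Cylinders"
  ) +
  theme_minimal()

p
```

Here ‘factor(cyl)’ is applied inside ‘aes()’, so the plot works directly on ‘mtcars’ without a separate conversion step.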