So to begin our data manipulation, we have to import our dataset and the libraries we’ll need.
data <- read.csv("C:/Users/schou/Downloads/dataset.csv", sep = ",", header = TRUE)
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
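Since `sep = ","` and `header = TRUE` are already the defaults of `read.csv()`, the import call could be written more briefly. A minimal sketch illustrating this, assuming a small made-up inline CSV (passed through the `text` argument) as a stand-in for the real `dataset.csv`:

```r
# Minimal sketch: read.csv() defaults to header = TRUE and sep = ",",
# so both calls below should return the same data frame.
# csv_text is a made-up stand-in for the real dataset.csv.
csv_text <- "id,score\n1,10\n2,20\n"

explicit <- read.csv(text = csv_text, sep = ",", header = TRUE)
defaults <- read.csv(text = csv_text)

# TRUE if the explicit arguments change nothing
print(identical(explicit, defaults))
```

Keeping the explicit arguments is harmless and arguably documents intent; dropping them is purely a matter of style.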
Checking whether there are any NA values in the dataset:
sum(is.na(data))
## [1] 0
Thankfully, this data has no NA values and is already clean! All the column names are accurate as well.
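`sum(is.na(data))` returns a single total. A per-column breakdown with `colSums(is.na(...))` shows where any missing values sit, and one extra check can be worthwhile: by default `read.csv()` treats blank fields as `NA` only in logical, integer, numeric and complex columns, so blank text fields stay as empty strings `""` and are not counted by `is.na()`. The sketch below uses a tiny made-up CSV (the name `toy` is hypothetical), not the real dataset; on the actual data you would pass `data` instead.

```r
# Minimal sketch on a made-up CSV, not the real dataset.
# Row 2 has a blank text field and a blank numeric field.
toy <- read.csv(text = "name,value\na,1\n,\nc,3\n",
                stringsAsFactors = FALSE)

# Per-column NA counts; their sum equals sum(is.na(toy))
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Per-column counts of empty strings, checked only in character columns,
# because blank text fields are read as "" rather than NA
blank_per_column <- sapply(toy, function(col) {
  if (is.character(col)) sum(col == "", na.rm = TRUE) else 0L
})
print(blank_per_column)
```

Here the blank numeric field shows up as an `NA` in `value`, while the blank text field in `name` is only caught by the empty-string check.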