In Prof. Paulsons’ course, R for Data Science: Analysis and Visualization, I learned several useful applications of RStudio, including:
- Data wrangling: filtering, sorting, transforming, and summarizing data sets
- Data visualization with graphs, charts, and histograms
- Data import: reading CSV files from external sources and analyzing them in RStudio
- Reproducible analysis with R Notebooks and R Markdown
- Tidying data sets with `tidyr`
- Exploring real-world data sets in RStudio
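As a small illustration of the `tidyr` item above, the sketch below reshapes a wide table into tidy (long) form with `pivot_longer()`, so that each row holds a single observation. The `scores` table, its column names, and its values are made up for illustration and are not from the course material.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: one row per student, one column per quiz
scores <- tibble(
  student = c("A", "B"),
  quiz1   = c(8, 6),
  quiz2   = c(9, 7)
)

# Tidy form: one row per (student, quiz) observation
scores_long <- pivot_longer(
  scores,
  cols      = starts_with("quiz"),
  names_to  = "quiz",
  values_to = "score"
)

print(scores_long)
```

The quiz names move into a `quiz` column and the values into `score`, which is the shape `dplyr` verbs and plotting functions generally expect.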
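The following example uses the built-in `mtcars` data set to group cars by number of cylinders and compute the average fuel economy (mpg) and weight for each group:

```r
library(dplyr)

data("mtcars")

# Group cars by number of cylinders, then average mpg and weight (1000 lbs)
mtcars_summary <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    avg_mpg = mean(mpg),
    avg_wt  = mean(wt)
  )

print(mtcars_summary)
## # A tibble: 3 × 3
##     cyl avg_mpg avg_wt
##   <dbl>   <dbl>  <dbl>
## 1     4    26.7   2.29
## 2     6    19.7   3.12
## 3     8    15.1   4.00
```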
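Besides summarizing, the wrangling item in the list mentions filtering, sorting, and changing data; those verbs chain the same way. A minimal sketch on the same built-in `mtcars` data, assuming `dplyr` is installed; the 20 mpg cutoff and the `wt_lbs` column are arbitrary choices for illustration.

```r
library(dplyr)

# Keep cars above 20 mpg, add weight in pounds (wt is in 1000 lbs),
# sort from highest to lowest mpg, and keep a few columns
efficient_cars <- mtcars %>%
  filter(mpg > 20) %>%
  mutate(wt_lbs = wt * 1000) %>%
  arrange(desc(mpg)) %>%
  select(mpg, cyl, wt_lbs)

head(efficient_cars)
```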
After locating electric vehicle (EV) registration data, I converted the dataset from Google Sheets to CSV format, using AI tools to streamline preprocessing. The EV registration counts were then merged with 2024 state-level population data from the U.S. Census Bureau (via Statista), producing a unified dataset that allows proportional comparisons across states.
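The merge step described above can be sketched with `dplyr::left_join()` keyed on the state name, followed by a per-capita rate. This is a minimal sketch, not the original preprocessing: the `Population` column name, the `per_1000` rate, and the toy rows ("State A", "State B") are hypothetical placeholders, not real registration or Census figures.

```r
library(dplyr)
library(tibble)

# Toy inputs (hypothetical values, not real data)
ev  <- tibble(State = c("State A", "State B"),
              `Registration Count` = c(5000, 1200))
pop <- tibble(State = c("State A", "State B"),
              Population = c(1000000, 400000))

# States with no matching population row (should be empty; a non-empty
# result usually means the state names are spelled differently)
unmatched <- anti_join(ev, pop, by = "State")

# Join on State, then express registrations per 1,000 residents
ev_pop <- ev %>%
  left_join(pop, by = "State") %>%
  mutate(per_1000 = `Registration Count` / Population * 1000)

print(ev_pop)
```

Checking `unmatched` before joining matters because `left_join()` silently fills `NA` for any state whose name does not match exactly, and those rows would then drop out of per-capita comparisons.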
This merged dataset was imported into RStudio for analysis and visualization. A pie chart was created to show each state's share of national EV registrations, making regional patterns easier to identify. The chart highlights that roughly half of the country’s EV registrations are concentrated in just three states.
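To back the "roughly half" observation with a number rather than reading it off the pie, the share held by the top three states can be computed directly. A minimal sketch in base R; `top_n_share()` is a hypothetical helper, and the toy counts are placeholders, not the real registration data.

```r
# Fraction of the national total held by the n states with the most registrations
top_n_share <- function(df, n = 3) {
  counts <- sort(df$`Registration Count`, decreasing = TRUE)
  sum(head(counts, n)) / sum(counts)
}

# Toy data (hypothetical values); check.names = FALSE keeps the space in the name
toy <- data.frame(
  State = c("A", "B", "C", "D", "E"),
  `Registration Count` = c(40, 30, 10, 10, 10),
  check.names = FALSE
)

round(100 * top_n_share(toy, n = 3), 1)
```

Applied to the imported `ev_data`, `top_n_share(ev_data)` would return the fraction represented by the chart's three named slices.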
The population data merged with the EV registration counts was sourced from:
- U.S. Census Bureau (2024), via Statista: Resident population of the U.S. by state
```r
library(tidyverse)  # readr, dplyr, tibble (add_row), and the %>% pipe
library(here)       # builds file paths relative to the project root

ev_data <- readr::read_csv(here("data", "EV_Cleaned.csv"))
## Rows: 51 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): State
## dbl (1): Registration Count
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```
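As the message above suggests, declaring the column types up front makes the import explicit and silences the column-specification message. A minimal sketch with `readr::cols()`; so that it runs on its own, it writes a tiny hypothetical CSV to a temporary file, whereas the real call would read `here("data", "EV_Cleaned.csv")`.

```r
library(readr)

# Tiny stand-in file with the same two columns as EV_Cleaned.csv (toy values)
path <- tempfile(fileext = ".csv")
writeLines(c("State,Registration Count", "State A,5000", "State B,1200"), path)

# Declare the column types instead of letting readr guess them
ev_spec <- cols(
  State = col_character(),
  `Registration Count` = col_double()
)
ev_check <- read_csv(path, col_types = ev_spec)

str(ev_check)
```

An explicit spec also fails loudly if a future export changes a column's type, which guessing would hide.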
```r
# Get the top 3 states by registration count
top3 <- ev_data %>%
  arrange(desc(`Registration Count`)) %>%
  slice(1:3)

# Calculate total national registrations
total_national <- sum(ev_data$`Registration Count`)

# Calculate the combined total for all other states
others_total <- total_national - sum(top3$`Registration Count`)

# Create final data frame for plotting: top 3 states plus an "Other States" row
pie_data <- top3 %>%
  select(State, `Registration Count`) %>%
  add_row(State = "Other States", `Registration Count` = others_total)

# Percentage share of each slice, used in the labels
pct <- round(100 * pie_data$`Registration Count` / sum(pie_data$`Registration Count`), 1)

# Create pie chart
pie(pie_data$`Registration Count`,
    labels = paste0(pie_data$State, " (", pct, "%)"),
    main = "Top 3 States vs Rest of US - EV Registrations (2023)",
    col = rainbow(4))
```