Project #2 DATA Tidying

Author

Michael Mayne

Project #2

Project #2 Precoding Approach

The goal of this project is to work with and clean untidy data. It consists of cleaning three data sets and then producing a useful analysis for each one.

The project requires me to take untidy data from my classmates. The three data sets I chose cover Religions, Waste Management, and Disney Parks Attendance.

My goals are as follows:

Import all data sets as CSV files and load them into the R console.
All of the data sets are wide, so I will convert each one from wide to long format and organize it with the tidyverse. Where needed, I will use na.rm to handle missing values, since these data sets don’t require a uniform average the way the chess ELO data set from earlier in the semester did.
Perform the analysis and use ggplot to create charts for each data set. For Waste Management, show the distribution of facilities by county. For Disney Parks Attendance, compare attendance growth between the Florida and California parks. Finally, for the Religions data, compare income levels across religious groups.
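As a minimal sketch of the wide-to-long step with missing values dropped, the example below uses a small made-up table (the `wide` tibble and its column names are hypothetical, not one of the project data sets). It assumes that dropping missing cells during the reshape, via the `values_drop_na` argument of `pivot_longer()`, is acceptable; passing `na.rm = TRUE` to a later `sum()` or `mean()` is another option.

```r
library(tidyverse)

# Hypothetical toy data: one row per group, one column per category
wide <- tibble(
  group = c("A", "B"),
  cat1  = c(10, NA),
  cat2  = c(5, 7)
)

# Reshape to long format and drop the rows whose value is missing
long <- wide %>%
  pivot_longer(
    cols = -group,
    names_to = "category",
    values_to = "count",
    values_drop_na = TRUE
  )

print(long)
```

Dropping the missing cells at this stage means later summaries do not need their own missing-value handling, at the cost of losing the record that a cell was empty.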

library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   4.0.0     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Project #2 Code Base

Data Set #1: Household Income by Religion

Loading the Data Set

Religion_donation <- read_csv("https://raw.githubusercontent.com/Mayneman000/DATA607Assignment/refs/heads/DATA/data%20reliligion.csv")

Rows: 14 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Religion
dbl (10): <$10k, $10-20k, $20-30k, $30-40k, $40-50k, $50-75k, $75-100k, $100...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
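The message above suggests declaring the column types. A minimal sketch of that is shown below, using two rows and two income columns copied from the data and read from an inline string rather than the URL, so it runs offline. The `csv_text` and `sample_rows` names are illustrative only.

```r
library(readr)

# Inline CSV text; I() tells read_csv to treat the string as literal data
csv_text <- I("Religion,<$10k,$10-20k\nAgnostic,27,34\nAtheist,12,27\n")

# Declare Religion as character and every other column as double,
# which also silences the column specification message
sample_rows <- read_csv(
  csv_text,
  col_types = cols(Religion = col_character(), .default = col_double())
)

print(sample_rows)
```

The same `col_types` argument could be passed to the `read_csv()` call on the full file, assuming every income column really is numeric.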

glimpse(Religion_donation)

Rows: 14
Columns: 11
$ Religion             <chr> "Agnostic", "Atheist", "Buddhist", "Catholic", "E…
$ `<$10k`              <dbl> 27, 12, 27, 418, 575, 9, 228, 20, 19, 289, 34, 23…
$ `$10-20k`            <dbl> 34, 27, 21, 617, 869, 7, 244, 27, 19, 495, 42, 23…
$ `$20-30k`            <dbl> 60, 37, 30, 732, 1064, 9, 236, 24, 25, 619, 37, 1…
$ `$30-40k`            <dbl> 81, 52, 34, 670, 982, 11, 238, 24, 25, 655, 48, 1…
$ `$40-50k`            <dbl> 76, 39, 33, 638, 881, 13, 197, 21, 30, 651, 51, 1…
$ `$50-75k`            <dbl> 137, 81, 58, 1116, 1486, 34, 212, 30, 73, 1107, 1…
$ `$75-100k`           <dbl> 102, 76, 62, 949, 949, 47, 156, 15, 59, 939, 87, …
$ `$100-150k`          <dbl> 109, 59, 39, 792, 723, 48, 156, 11, 87, 792, 96, …
$ `>$150k`             <dbl> 84, 74, 53, 792, 414, 54, 78, 6, 151, 753, 64, 41…
$ `Don't know/refused` <dbl> 96, 76, 54, 1163, 1529, 37, 339, 37, 87, 1096, 10…

#Convert wide to long format

Religion_donationLong <- Religion_donation %>%
  pivot_longer(
    cols = -Religion,
    names_to = "income",
    values_to = "frequencies"
  ) 

print(Religion_donationLong)

# A tibble: 140 × 3
   Religion income             frequencies
   <chr>    <chr>                    <dbl>
 1 Agnostic <$10k                       27
 2 Agnostic $10-20k                     34
 3 Agnostic $20-30k                     60
 4 Agnostic $30-40k                     81
 5 Agnostic $40-50k                     76
 6 Agnostic $50-75k                    137
 7 Agnostic $75-100k                   102
 8 Agnostic $100-150k                  109
 9 Agnostic >$150k                      84
10 Agnostic Don't know/refused          96
# ℹ 130 more rows
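One quick way to confirm that a reshape like this did not lose or duplicate anything is to compare row counts and totals before and after. The sketch below does this on a three-row, three-column excerpt taken from the glimpse() output above; the `sample_wide` and `sample_long` names are illustrative.

```r
library(tidyverse)

# Excerpt of the wide table: first three religions, first three income brackets
sample_wide <- tribble(
  ~Religion,  ~`<$10k`, ~`$10-20k`, ~`$20-30k`,
  "Agnostic", 27,       34,         60,
  "Atheist",  12,       27,         37,
  "Buddhist", 27,       21,         30
)

sample_long <- sample_wide %>%
  pivot_longer(
    cols = -Religion,
    names_to = "income",
    values_to = "frequencies"
  )

# The long table should have one row per religion and income bracket
expected_rows <- nrow(sample_wide) * (ncol(sample_wide) - 1)
stopifnot(nrow(sample_long) == expected_rows)

# The total number of respondents should be unchanged by the reshape
wide_total <- sum(unlist(select(sample_wide, -Religion)))
stopifnot(sum(sample_long$frequencies) == wide_total)
```

Applied to the full table, the same check corresponds to 14 religions times 10 income columns, which matches the 140 rows printed above.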

Conversion & Analysis

Comparison of the income distribution across religions

ggplot(Religion_donationLong, aes(x = Religion, y = frequencies, fill = income)) +
  geom_bar(stat = "identity", position = "fill") +
  geom_text(aes(label = frequencies),
            position = position_fill(vjust = 0.5),
            size = 3) +
  coord_flip() +
  labs(title = "Income Spread by Religion", y = "Proportion of Believers")
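Because `income` is a character column, ggplot orders the fill categories by sorting the labels as text rather than by bracket size. One possible adjustment, sketched below on a few Agnostic rows from the data, is to convert `income` to a factor whose levels follow the order in which the brackets appear, using `fct_inorder()` from forcats. The `sample_long` and `p` names are illustrative.

```r
library(tidyverse)

# A few rows in the same shape as the long table, in the original column order
sample_long <- tribble(
  ~Religion,  ~income,     ~frequencies,
  "Agnostic", "<$10k",     27,
  "Agnostic", "$10-20k",   34,
  "Agnostic", "$100-150k", 109,
  "Agnostic", ">$150k",    84
) %>%
  # Keep the income brackets in their order of first appearance
  mutate(income = fct_inorder(income))

# The factor levels now follow the bracket order instead of text sorting
print(levels(sample_long$income))

p <- ggplot(sample_long, aes(x = Religion, y = frequencies, fill = income)) +
  geom_col(position = "fill") +
  coord_flip()
```

This relies on `pivot_longer()` emitting the brackets in the original column order, which is the case when the wide columns are already sorted from lowest to highest income.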

Which religious group had the highest number of high-income members?

High_Earn <- Religion_donationLong %>%
  filter(income == ">$150k") %>%
  arrange(desc(frequencies)) %>%
  # prop is each group's >$150k count as a share of all survey responses
  mutate(prop = frequencies / sum(Religion_donationLong$frequencies))

print(High_Earn)

# A tibble: 14 × 4
   Religion                      income frequencies     prop
   <chr>                         <chr>        <dbl>    <dbl>
 1 Catholic                      >$150k         792 0.0223  
 2 Mainline Protestant           >$150k         753 0.0212  
 3 Unaffiliated                  >$150k         597 0.0168  
 4 Evangelical                   >$150k         414 0.0116  
 5 Jewish                        >$150k         151 0.00425 
 6 Agnostic                      >$150k          84 0.00236 
 7 Historically Black Protestant >$150k          78 0.00219 
 8 Atheist                       >$150k          74 0.00208 
 9 Mormon                        >$150k          64 0.00180 
10 Hindu                         >$150k          54 0.00152 
11 Buddhist                      >$150k          53 0.00149 
12 Orthodox                      >$150k          46 0.00129 
13 Muslim                        >$150k          41 0.00115 
14 Jehovah's Witness             >$150k           6 0.000169

ggplot(High_Earn, aes(x = reorder(Religion, frequencies), y = frequencies)) +
  geom_col() +
  coord_flip() +
  labs(title = "Number of 150k+ Earners by Religion", x = "Religion", y = "Believers") +
  theme_linedraw()

Based on both the raw counts and each group’s share of all survey responses, we can safely say that Catholics have the highest number of high-income earners among the individuals surveyed, followed by Mainline Protestants and those who claim no affiliation. Meanwhile, Jehovah’s Witnesses have the lowest number of high-income members.
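The `prop` column above divides by all survey responses, so it tracks group size as much as income. A complementary view, sketched below as an assumption about what might be useful rather than as part of the original analysis, is the share of >$150k earners within each religion. The example uses the first three religions, with values copied from the glimpse() output; the `sample_wide` and `within_share` names are illustrative.

```r
library(tidyverse)

# First three religions with all ten income columns, copied from the data
sample_wide <- tribble(
  ~Religion,  ~`<$10k`, ~`$10-20k`, ~`$20-30k`, ~`$30-40k`, ~`$40-50k`,
  ~`$50-75k`, ~`$75-100k`, ~`$100-150k`, ~`>$150k`, ~`Don't know/refused`,
  "Agnostic", 27, 34, 60, 81, 76, 137, 102, 109, 84, 96,
  "Atheist",  12, 27, 37, 52, 39,  81,  76,  59, 74, 76,
  "Buddhist", 27, 21, 30, 34, 33,  58,  62,  39, 53, 54
)

within_share <- sample_wide %>%
  pivot_longer(cols = -Religion, names_to = "income", values_to = "frequencies") %>%
  group_by(Religion) %>%
  # Share of each bracket within its own religion
  # (the denominator includes "Don't know/refused" responses)
  mutate(share_within = frequencies / sum(frequencies)) %>%
  ungroup() %>%
  filter(income == ">$150k") %>%
  arrange(desc(share_within))

print(within_share)
```

Running the same pipeline on the full long table would show whether the ranking by raw counts still holds once group size is taken into account.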

Continuation of Report in Part 2