For this week’s project, I read in three different datasets, cleaned and tidied them, and then performed some analysis. The three datasets are: World Population, MTA Daily Ridership, and Candy Hierarchy. For each dataset, the analysis centers on visualization.
The world population dataset includes the populations of countries/territories, along with their capitals and continents, spanning several decades, specifically 1970 to 2022. It also includes ‘Area (km²)’, ‘Density (per km²)’, ‘Growth Rate’, and ‘World Population Percentage.’
world_pop <- read.csv("C:\\Users\\nakes\\OneDrive\\Desktop\\607\\world_population.csv")
head(world_pop, 4)
## Rank CCA3 Country.Territory Capital Continent X2022.Population
## 1 36 AFG Afghanistan Kabul Asia 41128771
## 2 138 ALB Albania Tirana Europe 2842321
## 3 34 DZA Algeria Algiers Africa 44903225
## 4 213 ASM American Samoa Pago Pago Oceania 44273
## X2020.Population X2015.Population X2010.Population X2000.Population
## 1 38972230 33753499 28189672 19542982
## 2 2866849 2882481 2913399 3182021
## 3 43451666 39543154 35856344 30774621
## 4 46189 51368 54849 58230
## X1990.Population X1980.Population X1970.Population Area..km..
## 1 10694796 12486631 10752971 652230
## 2 3295066 2941651 2324731 28748
## 3 25518074 18739378 13795915 2381741
## 4 47818 32886 27075 199
## Density..per.km.. Growth.Rate World.Population.Percentage
## 1 63.0587 1.0257 0.52
## 2 98.8702 0.9957 0.04
## 3 18.8531 1.0164 0.56
## 4 222.4774 0.9831 0.00
Tidy Data
In order to tidy this dataset, I pivoted it to long format because the populations for each year were stored as separate variables. Now there is one variable for year and another for the population in that year. Pivoting the dataset longer lets us look at the world's population by year and examine growth.
world_pop2 <- pivot_longer(data = world_pop,
                           cols = starts_with('X'),           # the X1970...X2022 population columns
                           names_to = "Year",
                           values_to = "Population per year",
                           names_pattern = '([0-9]+)') %>%    # keep only the digits of each year
  janitor::clean_names()
head(world_pop2, 4)
## # A tibble: 4 × 11
## rank cca3 country_territory capital continent area_km density_per_km
## <int> <chr> <chr> <chr> <chr> <int> <dbl>
## 1 36 AFG Afghanistan Kabul Asia 652230 63.1
## 2 36 AFG Afghanistan Kabul Asia 652230 63.1
## 3 36 AFG Afghanistan Kabul Asia 652230 63.1
## 4 36 AFG Afghanistan Kabul Asia 652230 63.1
## # ℹ 4 more variables: growth_rate <dbl>, world_population_percentage <dbl>,
## # year <chr>, population_per_year <int>
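As a quick sanity check (a minimal sketch, not part of the original pipeline), each country should now contribute one row per census year:
# Each country should appear once per year (8 rows each)
world_pop2 %>%
  dplyr::count(country_territory) %>%
  head(3)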
Analysis
In my analysis, the first plot shows how much of the entire world’s population each continent makes up. In the second graph, I plotted the population per year by continent.
# World population percentage by continent
ggplot(data = world_pop2, aes(x = continent, y = world_population_percentage, fill = continent)) +
  ggtitle("World Population of Each Continent") +
  ylab("World Population Percentage") + xlab("Continent") +
  scale_fill_brewer(palette = "Set3") +   # fill must be mapped for the brewer palette to apply
  geom_col(position = "dodge")
# Population per year
ggplot(data = world_pop2, aes(x = continent, y = population_per_year, fill = year)) +
  ggtitle("Population of Each Continent by Year") +
  ylab("Population") + xlab("Continent") +
  scale_fill_brewer(palette = "Set3") +
  geom_col(position = "dodge")
For the world population, we can see that Asia makes up most of the population of the entire world, followed by Africa. Oceania makes up the smallest percentage.
For the population by year, we can see that the continents with an increase in population size were Africa, Asia, South America, and North America, while Europe and Oceania mostly stayed the same.
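As a rough numeric check on the first plot (a sketch; filtering to a single year keeps one row per country so each share is counted once):
# Continent shares of world population, largest first
world_pop2 %>%
  filter(year == "2022") %>%
  group_by(continent) %>%
  summarize(share = sum(world_population_percentage)) %>%
  arrange(desc(share))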
The MTA daily ridership dataset includes daily ridership for different modes of transportation from 2020 to 2024. The modes of transportation include subways, buses, the LIRR, Metro-North, Access-A-Ride, bridges and tunnels, and the Staten Island Railway. The dataset also includes each day's ridership as a percent of a comparable pre-pandemic day.
ride <- read_xlsx("C:\\Users\\nakes\\OneDrive\\Desktop\\607\\MTA_Daily_Ridership_Data__Beginning_2020_20240930.xlsx")
head(ride, 3)
## # A tibble: 3 × 15
## Date Subways: Total Estimated Ridershi…¹ Subways: % of Compar…²
## <dttm> <dbl> <dbl>
## 1 2020-03-01 00:00:00 2212965 97
## 2 2020-03-02 00:00:00 5329915 96
## 3 2020-03-03 00:00:00 5481103 98
## # ℹ abbreviated names: ¹`Subways: Total Estimated Ridership`,
## # ²`Subways: % of Comparable Pre-Pandemic Day`
## # ℹ 12 more variables: `Buses: Total Estimated Ridership` <dbl>,
## # `Buses: % of Comparable Pre-Pandemic Day` <dbl>,
## # `LIRR: Total Estimated Ridership` <dbl>,
## # `LIRR: % of Comparable Pre-Pandemic Day` <dbl>,
## # `Metro-North: Total Estimated Ridership` <dbl>, …
Tidy Data
In order to tidy this dataset, I pivoted it to long format because the modes of transportation could be one variable instead of several. I pivoted the two sets of variables separately into two different datasets: the first with date, mode of transport, and total ridership; the second with date, mode of transport, and percent of comparable pre-pandemic ridership. Then I merged them together by date and mode of transport. Pivoting the dataset longer lets us look at each mode of transportation by year and see how the trends changed.
ride <- ride %>% clean_names()
ride2 <- ride %>%
  pivot_longer(cols = c('subways_total_estimated_ridership',
                        'buses_total_estimated_ridership',
                        'lirr_total_estimated_ridership',
                        'metro_north_total_estimated_ridership',
                        'access_a_ride_total_scheduled_trips',
                        'bridges_and_tunnels_total_traffic',
                        'staten_island_railway_total_estimated_ridership'),
               names_to = "mode_of_transport",
               values_to = "total_ride_trip",
               names_pattern = '([A-Za-z]+)') %>%   # keep only the leading word (e.g. 'subways')
  select(date, mode_of_transport, total_ride_trip)
ride3 <- ride %>%
  pivot_longer(cols = c('subways_percent_of_comparable_pre_pandemic_day',
                        'buses_percent_of_comparable_pre_pandemic_day',
                        'lirr_percent_of_comparable_pre_pandemic_day',
                        'metro_north_percent_of_comparable_pre_pandemic_day',
                        'access_a_ride_percent_of_comparable_pre_pandemic_day',
                        'bridges_and_tunnels_percent_of_comparable_pre_pandemic_day',
                        'staten_island_railway_percent_of_comparable_pre_pandemic_day'),
               names_to = "mode_of_transport",
               values_to = "pre_pandemic_percent",
               names_pattern = '([A-Za-z]+)') %>%   # same leading-word pattern as above
  select(date, mode_of_transport, pre_pandemic_percent)
ride4 <- merge(ride3, ride2, by = c('date', 'mode_of_transport'))   # 'by' joins the keys; by.ride2/by.ride3 are not merge() arguments
head(ride4, 7)
## date mode_of_transport pre_pandemic_percent total_ride_trip
## 1 2020-03-01 access 113 19922
## 2 2020-03-01 bridges 98 786960
## 3 2020-03-01 buses 99 984908
## 4 2020-03-01 lirr 100 86790
## 5 2020-03-01 metro 59 55825
## 6 2020-03-01 staten 52 1636
## 7 2020-03-01 subways 97 2212965
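To confirm the merge behaved as intended (a minimal check, not part of the original pipeline), the long table should have one row per date per mode:
# 7 modes per date, so ride4 should have 7 times the rows of the wide table
nrow(ride4) == 7 * nrow(ride)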
Analysis
For the analysis, the goal is to look at ridership over the years by mode of transportation. I extracted the year from the date variable, then computed the mean of both total ridership and pre-pandemic percent by year and mode of transport. Then I plotted each in its own graph.
ride4 <- ride4 %>%
  mutate(year = year(date))   # lubridate::year() extracts the calendar year
ride5 <- ride4 %>%
group_by(year, mode_of_transport) %>%
summarize(mean_total_ride = mean(total_ride_trip))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
ggplot(data=ride5, aes(x=year, y=mean_total_ride, fill=mode_of_transport)) +
ggtitle("Estimated Average Ridership by Year ") +
ylab("Estimated AVerage Count of Rides or Trips") + xlab("Year") +
scale_fill_brewer(palette = "Set1") +
geom_col(position = "dodge")
ride6 <- ride4 %>%
group_by(year, mode_of_transport) %>%
summarize(mean_prepand = mean(pre_pandemic_percent))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
ggplot(data=ride6, aes(x=year, y=mean_prepand, fill=mode_of_transport)) +
ggtitle("Average Percent Comparable Ridership Pre Pandemic by Year ") +
ylab("Average Comparable Riderships Percent") + xlab("Year") +
scale_fill_brewer(palette = "Dark2") +
geom_col(position = "dodge")
From the Estimated Average Ridership by Year plot, we can see that subways were the most popular mode of transportation from 2020 to 2024, increasing every year except from 2023 to 2024. Most of the modes of transportation increased over the past few years. Buses moved from third to second place between 2020 and 2021 and stayed in second. The least used mode of transportation is the Staten Island Railway.
For the Average Percent Comparable Ridership Pre Pandemic by Year plot, Access-A-Ride increased the most. Most of the other modes of transportation also increased; buses, however, appear to have dropped.
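To back up the rankings read off the graphs, the summarized table can be sorted directly (a sketch using ride5 from above):
# Modes ordered by estimated average ridership in the most recent year
ride5 %>%
  filter(year == 2024) %>%
  arrange(desc(mean_total_ride))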
The last dataset I chose was the Candy Hierarchy from 2017, a survey about people’s preferences for the candies they received on Halloween. The original dataset includes questions about whether or not the participant was going out for Halloween, along with gender, age, country, and state/province/county. It then lists many different candies, and participants chose how they felt about receiving each one on Halloween: MEH for indifference, JOY for happiness, and DESPAIR for candy they probably did not want.
candy <- read.csv("C:\\Users\\nakes\\OneDrive\\Desktop\\607\\candyhierarchy2017.csv", header = TRUE, check.names = F)
head(candy, 2)
## Internal ID Q1: GOING OUT? Q2: GENDER Q3: AGE Q4: COUNTRY
## 1 90258773
## 2 90272821 No Male 44 USA
## Q5: STATE, PROVINCE, COUNTY, ETC Q6 | 100 Grand Bar
## 1
## 2 NM MEH
## Q6 | Anonymous brown globs that come in black and orange wrappers\t(a.k.a. Mary Janes)
## 1
## 2 DESPAIR
## Q6 | Any full-sized candy bar Q6 | Black Jacks Q6 | Bonkers (the candy)
## 1
## 2 JOY MEH DESPAIR
## Q6 | Bonkers (the board game) Q6 | Bottle Caps Q6 | Box'o'Raisins
## 1
## 2 DESPAIR DESPAIR DESPAIR
## Q6 | Broken glow stick Q6 | Butterfinger Q6 | Cadbury Creme Eggs
## 1
## 2 DESPAIR DESPAIR MEH
## Q6 | Candy Corn
## 1
## 2 MEH
## Q6 | Candy that is clearly just the stuff given out for free at restaurants
## 1
## 2 DESPAIR
## Q6 | Caramellos Q6 | Cash, or other forms of legal tender Q6 | Chardonnay
## 1
## 2 MEH JOY MEH
## Q6 | Chick-o-Sticks (we don\xd5t know what that is) Q6 | Chiclets
## 1
## 2 DESPAIR DESPAIR
## Q6 | Coffee Crisp Q6 | Creepy Religious comics/Chick Tracts
## 1
## 2 DESPAIR DESPAIR
## Q6 | Dental paraphenalia Q6 | Dots Q6 | Dove Bars Q6 | Fuzzy Peaches
## 1
## 2 DESPAIR MEH JOY DESPAIR
## Q6 | Generic Brand Acetaminophen Q6 | Glow sticks Q6 | Goo Goo Clusters
## 1
## 2 DESPAIR DESPAIR DESPAIR
## Q6 | Good N' Plenty Q6 | Gum from baseball cards Q6 | Gummy Bears straight up
## 1
## 2 MEH DESPAIR MEH
## Q6 | Hard Candy Q6 | Healthy Fruit Q6 | Heath Bar
## 1
## 2 MEH DESPAIR MEH
## Q6 | Hershey's Dark Chocolate Q6 | Hershey\xd5s Milk Chocolate
## 1
## 2 JOY JOY
## Q6 | Hershey's Kisses Q6 | Hugs (actual physical hugs)
## 1
## 2 MEH DESPAIR
## Q6 | Jolly Rancher (bad flavor) Q6 | Jolly Ranchers (good flavor)
## 1
## 2 DESPAIR MEH
## Q6 | JoyJoy (Mit Iodine!) Q6 | Junior Mints Q6 | Senior Mints
## 1
## 2 DESPAIR DESPAIR DESPAIR
## Q6 | Kale smoothie Q6 | Kinder Happy Hippo Q6 | Kit Kat Q6 | LaffyTaffy
## 1
## 2 DESPAIR DESPAIR JOY DESPAIR
## Q6 | LemonHeads Q6 | Licorice (not black) Q6 | Licorice (yes black)
## 1
## 2 MEH MEH JOY
## Q6 | Lindt Truffle Q6 | Lollipops Q6 | Mars Q6 | Maynards Q6 | Mike and Ike
## 1
## 2 MEH DESPAIR DESPAIR DESPAIR MEH
## Q6 | Milk Duds Q6 | Milky Way Q6 | Regular M&Ms Q6 | Peanut M&M\xd5s
## 1
## 2 MEH JOY JOY MEH
## Q6 | Blue M&M's Q6 | Red M&M's Q6 | Green Party M&M's Q6 | Independent M&M's
## 1
## 2 JOY JOY JOY JOY
## Q6 | Abstained from M&M'ing. Q6 | Minibags of chips Q6 | Mint Kisses
## 1
## 2 DESPAIR DESPAIR MEH
## Q6 | Mint Juleps Q6 | Mr. Goodbar Q6 | Necco Wafers Q6 | Nerds
## 1
## 2 DESPAIR DESPAIR DESPAIR DESPAIR
## Q6 | Nestle Crunch Q6 | Now'n'Laters Q6 | Peeps Q6 | Pencils Q6 | Pixy Stix
## 1
## 2 JOY DESPAIR DESPAIR DESPAIR DESPAIR
## Q6 | Real Housewives of Orange County Season 9 Blue-Ray
## 1
## 2 DESPAIR
## Q6 | Reese\xd5s Peanut Butter Cups Q6 | Reese's Pieces
## 1
## 2 JOY JOY
## Q6 | Reggie Jackson Bar Q6 | Rolos
## 1
## 2 DESPAIR JOY
## Q6 | Sandwich-sized bags filled with BooBerry Crunch Q6 | Skittles
## 1
## 2 DESPAIR DESPAIR
## Q6 | Smarties (American) Q6 | Smarties (Commonwealth) Q6 | Snickers
## 1
## 2 DESPAIR DESPAIR MEH
## Q6 | Sourpatch Kids (i.e. abominations of nature) Q6 | Spotted Dick
## 1
## 2 DESPAIR DESPAIR
## Q6 | Starburst Q6 | Sweet Tarts Q6 | Swedish Fish
## 1
## 2 MEH DESPAIR MEH
## Q6 | Sweetums (a friend to diabetes) Q6 | Take 5 Q6 | Tic Tacs
## 1
## 2 DESPAIR DESPAIR DESPAIR
## Q6 | Those odd marshmallow circus peanut things Q6 | Three Musketeers
## 1
## 2 DESPAIR JOY
## Q6 | Tolberone something or other Q6 | Trail Mix Q6 | Twix
## 1
## 2 JOY DESPAIR JOY
## Q6 | Vials of pure high fructose corn syrup, for main-lining into your vein
## 1
## 2 DESPAIR
## Q6 | Vicodin Q6 | Whatchamacallit Bars Q6 | White Bread
## 1
## 2 DESPAIR DESPAIR DESPAIR
## Q6 | Whole Wheat anything Q6 | York Peppermint Patties Q7: JOY OTHER
## 1
## 2 DESPAIR DESPAIR Mounds
## Q8: DESPAIR OTHER Q9: OTHER COMMENTS
## 1
## 2 Bottom line is Twix is really the only candy worth eating.
## Q10: DRESS Q11: DAY Q12: MEDIA [Daily Dish] Q12: MEDIA [Science]
## 1 NA NA
## 2 White and gold Sunday NA 1
## Q12: MEDIA [ESPN] Q12: MEDIA [Yahoo] Click Coordinates (x, y)
## 1 NA NA
## 2 NA NA (84, 25)
Tidy Data
This dataset was a little difficult to work with because it was a survey not intended for research, where participants could input almost any response, and some of the “candies” were not candy at all.
I decided to pose my own question: how do people feel about popular chocolate bars and candies? To tidy the dataset, I first renamed all the chocolate bars I considered popular, then pivoted these candies longer into one variable since they were stored as separate variables. Then I filtered out missing responses, since participants had the option to skip questions. Pivoting the dataset longer lets us look at the ratings of each popular candy bar, to assess which is preferred over the others.
candy <- candy %>%
clean_names() %>%
rename(
"100 Grand Bar" = q6_100_grand_bar,
"Butterfinger" = q6_butterfinger,
"Heath Bar" = q6_heath_bar,
"Kit Kat" = q6_kit_kat,
"Nestle Crunch" = q6_nestle_crunch,
"Reeses Pieces" = q6_reeses_pieces,
"Snickers" = q6_snickers,
"Twix" = q6_twix,
"Milky Way" = q6_milky_way,
"Peanut M&MS"= q6_peanut_m_m_s,
"Hersheys Milk Chocolate" = q6_hershey_s_milk_chocolate)
## Warning in grepl(x = string, pattern = current_unicode, fixed = TRUE): input
## string 23 is invalid UTF-8
## (the same warning repeats for input strings 41, 64, and 82, once per renamed column)
candy2 <- candy %>%
  pivot_longer(cols = c('100 Grand Bar', 'Butterfinger', 'Heath Bar', 'Kit Kat',
                        'Nestle Crunch', 'Reeses Pieces', 'Snickers', 'Twix',
                        'Milky Way', 'Peanut M&MS', 'Hersheys Milk Chocolate'),
               names_to = "candy",
               values_to = "candy_rating") %>%
  select(internal_id, candy, candy_rating)
candy3 <- candy2 %>%
filter(candy_rating!="")
head(candy3, 10)
## # A tibble: 10 × 3
## internal_id candy candy_rating
## <int> <chr> <chr>
## 1 90272821 100 Grand Bar MEH
## 2 90272821 Butterfinger DESPAIR
## 3 90272821 Heath Bar MEH
## 4 90272821 Kit Kat JOY
## 5 90272821 Nestle Crunch JOY
## 6 90272821 Reeses Pieces JOY
## 7 90272821 Snickers MEH
## 8 90272821 Twix JOY
## 9 90272821 Milky Way JOY
## 10 90272821 Peanut M&MS MEH
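As a quick check that the filter left only the three survey options (a minimal sketch):
# Ratings should be limited to JOY, MEH, and DESPAIR after dropping blanks
unique(candy3$candy_rating)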
Analysis
For my analysis, I counted the candy_rating responses while grouping by candy bar and rating. Then I plotted the results to see how participants generally feel about popular candy bars.
candy5 <- candy3 %>%
  group_by(candy, candy_rating) %>%
  summarize(count_rating = n())   # a plain count; reframe() with table() produced a <table> column that ggplot cannot scale
ggplot(data=candy5, aes(x=candy, y=count_rating, fill = candy_rating)) +
ggtitle("Ratings of Popular Chocolate Candy and Bars in 2017 ") +
ylab("Number of Ratings") + xlab("Chocolate Candy/Bar") +
scale_fill_brewer(palette = "Set1") +
geom_col(position = "dodge") +
theme(axis.text.x = element_text(angle=60, hjust=1))
From my plot, we can see that for JOY, the most popular chocolate candy bar is Kit Kat, followed by Snickers and Twix. Hershey’s Milk Chocolate received the most DESPAIR ratings, and 100 Grand Bars received the most MEH ratings.
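The same ranking can be pulled out numerically (a sketch using candy5 from above):
# Top three candies by number of JOY ratings
candy5 %>%
  filter(candy_rating == "JOY") %>%
  arrange(desc(count_rating)) %>%
  head(3)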
In Project 2, three very different datasets were imported, tidied, cleaned, and analyzed. The first dataset was easier to tidy and clean than the last two. In the future, I would try a different kind of analysis, possibly looking at correlations.