Project_2_Data

Choose any three of the “wide” datasets identified in the Week 6 Discussion items. (You may use your own dataset; please don’t use my Sample Post dataset, since that was used in your Week 6 assignment!) For each of the three chosen datasets:

Create a .CSV file (or optionally, a MySQL database!) that includes all of the information included in the dataset. You’re encouraged to use a “wide” structure similar to how the information appears in the discussion item, so that you can practice tidying and transformations as described below.

Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data. [Most of your grade will be based on this step!] Perform the analysis requested in the discussion item.

Your code should be in an R Markdown file, posted to rpubs.com, and should include narrative descriptions of your data cleanup work, analysis, and conclusions.

Please include in your homework submission, for each of the three chosen datasets:

The URL to the .Rmd file in your GitHub repository, and The URL for your rpubs.com web page.

Dataset 1 - US Census: Population growth 2015 - 2020 of Southern States

This dataset contains population data for 4 states in the span of 6 years (2015 - 2020) and is sourced from the US Census Bureau.

We will analyse this data set in order to look for a things related to yearly population growth:

Descriptive statistics of the data
Compare yearly growth between the states
See which year had the largest population growth

#load the data
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.2.3

## Warning: package 'ggplot2' was built under R version 4.2.3

## Warning: package 'tibble' was built under R version 4.2.3

## Warning: package 'tidyr' was built under R version 4.2.3

## Warning: package 'readr' was built under R version 4.2.3

## Warning: package 'dplyr' was built under R version 4.2.3

## Warning: package 'lubridate' was built under R version 4.2.3

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors

library(openintro)

## Loading required package: airports
## Loading required package: cherryblossom
## Loading required package: usdata

library(dplyr)
library(ggplot2)

population_data <- read.csv('https://raw.githubusercontent.com/nk014914/Data-607/main/Population_data.csv')

population_data

##             Name FIPS Abbreviations X2015.Population X2016.Population
## 1        Georgia   14            GA         10067378         10189016
## 2 North Carolina   37            NC          9932862         10036881
## 3                  NA                             NA               NA
## 4 South Carolina   45            SC          4817440          4879157
## 5       Virginia   51            VA          8254218          8312400
##   X2017.Population X2018.Population X2019.Population X2020.Population
## 1         10325943         10471428         10604413         10722092
## 2         10151700         10291929         10455811         10598314
## 3               NA               NA               NA               NA
## 4          4954035          5036155          5118397          5196026
## 5          8370206          8420184          8483598          8539322

#tidying the data

#Removing blank rows and renaming the columns to shorten for just the year
population_data <- population_data %>% 
  drop_na() %>%
  rename(state = Name, x2015 = X2015.Population, x2016 = X2016.Population, x2017 = X2017.Population, x2018 = X2018.Population, x2019 = X2019.Population, x2020 = X2020.Population)

#remove unused columns
population_data_subset <- subset(population_data, select= -c(FIPS,Abbreviations))

#transforming data - putting years into columns to make long data
population_data_subset <- population_data_subset %>% 
  gather(key = 'year', value = 'population', 2:7)

population_data_subset

##             state  year population
## 1         Georgia x2015   10067378
## 2  North Carolina x2015    9932862
## 3  South Carolina x2015    4817440
## 4        Virginia x2015    8254218
## 5         Georgia x2016   10189016
## 6  North Carolina x2016   10036881
## 7  South Carolina x2016    4879157
## 8        Virginia x2016    8312400
## 9         Georgia x2017   10325943
## 10 North Carolina x2017   10151700
## 11 South Carolina x2017    4954035
## 12       Virginia x2017    8370206
## 13        Georgia x2018   10471428
## 14 North Carolina x2018   10291929
## 15 South Carolina x2018    5036155
## 16       Virginia x2018    8420184
## 17        Georgia x2019   10604413
## 18 North Carolina x2019   10455811
## 19 South Carolina x2019    5118397
## 20       Virginia x2019    8483598
## 21        Georgia x2020   10722092
## 22 North Carolina x2020   10598314
## 23 South Carolina x2020    5196026
## 24       Virginia x2020    8539322

#remove x's in year column
population_data_final <- population_data_subset %>%
  mutate(across(c('year'), substr, 2, nchar(year)))

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(c("year"), substr, 2, nchar(year))`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
## 
##   # Previously
##   across(a:b, mean, na.rm = TRUE)
## 
##   # Now
##   across(a:b, \(x) mean(x, na.rm = TRUE))

population_data_final

##             state year population
## 1         Georgia 2015   10067378
## 2  North Carolina 2015    9932862
## 3  South Carolina 2015    4817440
## 4        Virginia 2015    8254218
## 5         Georgia 2016   10189016
## 6  North Carolina 2016   10036881
## 7  South Carolina 2016    4879157
## 8        Virginia 2016    8312400
## 9         Georgia 2017   10325943
## 10 North Carolina 2017   10151700
## 11 South Carolina 2017    4954035
## 12       Virginia 2017    8370206
## 13        Georgia 2018   10471428
## 14 North Carolina 2018   10291929
## 15 South Carolina 2018    5036155
## 16       Virginia 2018    8420184
## 17        Georgia 2019   10604413
## 18 North Carolina 2019   10455811
## 19 South Carolina 2019    5118397
## 20       Virginia 2019    8483598
## 21        Georgia 2020   10722092
## 22 North Carolina 2020   10598314
## 23 South Carolina 2020    5196026
## 24       Virginia 2020    8539322

Analysis

#First I need to make data as a numeric vector because my y axis was not in increasing order
transform(population_data_final, population = as.numeric(population))

##             state year population
## 1         Georgia 2015   10067378
## 2  North Carolina 2015    9932862
## 3  South Carolina 2015    4817440
## 4        Virginia 2015    8254218
## 5         Georgia 2016   10189016
## 6  North Carolina 2016   10036881
## 7  South Carolina 2016    4879157
## 8        Virginia 2016    8312400
## 9         Georgia 2017   10325943
## 10 North Carolina 2017   10151700
## 11 South Carolina 2017    4954035
## 12       Virginia 2017    8370206
## 13        Georgia 2018   10471428
## 14 North Carolina 2018   10291929
## 15 South Carolina 2018    5036155
## 16       Virginia 2018    8420184
## 17        Georgia 2019   10604413
## 18 North Carolina 2019   10455811
## 19 South Carolina 2019    5118397
## 20       Virginia 2019    8483598
## 21        Georgia 2020   10722092
## 22 North Carolina 2020   10598314
## 23 South Carolina 2020    5196026
## 24       Virginia 2020    8539322

#checking if column is now numeric
class(population_data_final$population)

## [1] "numeric"

#Determine the summary statistics on population per year of the four states
population_summary <- population_data_final %>%
  group_by(year) %>%
  summarize(mean = mean(population), median = median(population), min = min(population), max = max(population), na.rm = T) %>%
  arrange(mean)

population_summary

## # A tibble: 6 × 6
##   year      mean   median     min      max na.rm
##   <chr>    <dbl>    <dbl>   <dbl>    <dbl> <lgl>
## 1 2015  8267974. 9093540  4817440 10067378 TRUE 
## 2 2016  8354364. 9174640. 4879157 10189016 TRUE 
## 3 2017  8450471  9260953  4954035 10325943 TRUE 
## 4 2018  8554924  9356056. 5036155 10471428 TRUE 
## 5 2019  8665555. 9469704. 5118397 10604413 TRUE 
## 6 2020  8763938. 9568818  5196026 10722092 TRUE

#visualize growth of population per state
ggplot(data=population_data_final, aes(x=year, y= population, group = state)) +
  geom_line(aes(color=state))+
  geom_point(aes(color=state))

Viewing the yearly state populations plotted, we can see a few things: - Georgia has the largest overall population every year, while South Carolina has the smallest (by a large difference in comparison to the other states.)

Georgia and North Carolina are closer in yearly population than the other two states as their lines are very close to one another.
Virginia seems to have the smallest yearly population growth as their line has the least amount of steepness in comparison to Virginia, North Carolina and Georgia.

#Determining which year has the largest population growth

#sum of population per year
sum2015 = sum( population_data_final$population [ population_data_final$year==2015] )
sum2016 = sum( population_data_final$population [ population_data_final$year==2016] )
sum2017 = sum( population_data_final$population [ population_data_final$year==2017] )
sum2018 = sum( population_data_final$population [ population_data_final$year==2018] )
sum2019 = sum( population_data_final$population [ population_data_final$year==2019] )
sum2020 = sum( population_data_final$population [ population_data_final$year==2020] )

population_total = c(sum2015,sum2016,sum2017,sum2018,sum2019,sum2020)

#add column with years and create dataframe
year = c('2015','2016','2017','2018','2019','2020')
sum_population = as.data.frame(year)

sum_population$population_total <- population_total

sum_population

##   year population_total
## 1 2015         33071898
## 2 2016         33417454
## 3 2017         33801884
## 4 2018         34219696
## 5 2019         34662219
## 6 2020         35055754

#calculate percent change per year over year, we will use the collapse package
library(collapse)

## Warning: package 'collapse' was built under R version 4.2.3

## collapse 2.0.10, see ?`collapse-package` or ?`collapse-documentation`

## 
## Attaching package: 'collapse'

## The following object is masked from 'package:lubridate':
## 
##     is.Date

## The following object is masked from 'package:tidyr':
## 
##     replace_na

## The following object is masked from 'package:stats':
## 
##     D

sum_population |> fmutate(growth = fgrowth(population_total))

##   year population_total   growth
## 1 2015         33071898       NA
## 2 2016         33417454 1.044863
## 3 2017         33801884 1.150387
## 4 2018         34219696 1.236061
## 5 2019         34662219 1.293182
## 6 2020         35055754 1.135343

Conclusion

Based on the analysis, we can see the largest population growth of the four states combined is between 2018 and 2019 with a growth of around 1.29. The smallest growth was from 2015 to 2016 at 1.04. An interesting find I saw was that the growth percentage was steadily increasing each year until 2019-2020 where it dropped to 1.13 the second lowest population growth of all 6 years.The recent drop in population growth could be attributed to economical/financial issue pertaining to the year, or the impeding issues cause by the corona virus, however more research will be needed to confirm or deny these assumptions.

Dataset 2 - Global Inflation Data

For this data set, we will look into the following:

Which region had the highest nad lowest CPI
Look at a single country (US) and any changes we may see per year along with its min/max

As suggested by Anna Moy and Mohammed Rahman, I used a csv called region in order to classify each country to a region for simpler data categorization.

#Load the csv
global <- read.csv('https://raw.githubusercontent.com/nk014914/Data-607/main/global_inflation_data.csv')
region <- read.csv('https://raw.githubusercontent.com/nk014914/Data-607/main/region.csv')

colnames(region) <- c("country_name", "region")
global_inflation <- merge(global, region, by = "country_name")

head(global_inflation)

##          country_name                                  indicator_name X1980
## 1         Afghanistan Annual average inflation (consumer prices) rate  13.4
## 2             Albania Annual average inflation (consumer prices) rate    NA
## 3             Algeria Annual average inflation (consumer prices) rate   9.7
## 4             Andorra Annual average inflation (consumer prices) rate    NA
## 5              Angola Annual average inflation (consumer prices) rate  46.7
## 6 Antigua and Barbuda Annual average inflation (consumer prices) rate  19.0
##   X1981 X1982 X1983 X1984 X1985 X1986 X1987 X1988 X1989 X1990 X1991  X1992
## 1  22.2  18.2  15.9  20.4   8.7  -2.1  18.4  27.5  71.5  47.4  43.8  58.19
## 2    NA    NA    NA    NA    NA    NA    NA    NA    NA  -0.2  35.7 226.00
## 3  14.6   6.6   7.8   6.3  10.4  14.0   5.9   5.9   9.2   9.3  25.9  31.70
## 4    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA    NA     NA
## 5   1.4   1.8   1.8   1.8   1.8   1.8   1.8   1.8   1.8   1.8  85.3 299.10
## 6  11.5   4.2   2.3   3.8   1.0   0.5   3.6   6.8   4.4   6.6   4.5   3.00
##     X1993  X1994  X1995   X1996  X1997  X1998  X1999 X2000 X2001  X2002 X2003
## 1   33.99  20.01   14.0   14.01  14.01  14.01  14.01   0.0 -43.4  51.93 35.66
## 2   85.00  22.60    7.8   12.70  33.20  20.60   0.40   0.0   3.1   5.20  2.40
## 3   20.50  29.00   29.8   18.70   5.70   5.00   2.60   0.3   4.2   1.40  4.30
## 4      NA     NA     NA      NA     NA     NA     NA    NA    NA   3.10  3.10
## 5 1379.50 949.80 2672.2 4146.00 221.50 107.40 248.20 325.0 152.6 108.90 98.20
## 6    3.10   6.50    2.7    3.00   0.40   3.30   1.10  -0.2   1.9   2.40  2.00
##   X2004 X2005 X2006 X2007 X2008 X2009 X2010 X2011 X2012 X2013 X2014 X2015 X2016
## 1 16.36 10.57  6.78  8.68 26.42 -6.81  2.18  11.8  6.44  7.39  4.67 -0.66  4.38
## 2  2.90  2.40  2.40  3.00  3.30  2.20  3.60   3.4  2.00  1.90  1.60  1.90  1.30
## 3  4.00  1.40  2.30  3.70  4.90  5.70  3.90   4.5  8.90  3.30  2.90  4.80  6.40
## 4  2.90  3.50  3.70  2.70  4.30 -1.20  1.70   2.6  1.50  0.50 -0.10 -1.10 -0.40
## 5 43.50 23.00 13.30 12.20 12.50 13.70 14.50  13.5 10.30  8.80  7.30  9.20 30.70
## 6  2.00  2.10  1.80  1.40  5.30 -0.60  3.40   3.5  3.40  1.10  1.10  1.00 -0.50
##   X2017 X2018 X2019 X2020 X2021 X2022 X2023 X2024              region
## 1  4.98  0.63   2.3  5.44  5.06 13.71   9.1    NA      Asia & Pacific
## 2  2.00  2.00   1.4  1.60  2.00  6.70   4.8   4.0              Europe
## 3  5.60  4.30   2.0  2.40  7.20  9.30   9.0   6.8         Arab States
## 4  2.60  1.00   0.5  0.10  1.70  6.20   5.2   3.5              Europe
## 5 29.80 19.60  17.1 22.30 25.80 21.40  13.1  22.3              Africa
## 6  2.40  1.20   1.4  1.10  1.60  7.50   5.0   2.9 South/Latin America

#Tidying the data

#drop unnecessary column and NAs
global_inflation <- subset(global_inflation, select = -c(indicator_name))
global_inflation <- global_inflation %>%
  select(country_name, region,X1980,X1980,X1981,X1982,X1983,X1984,X1985,X1986,X1987,X1988,X1989,X1990,X1991,X1992,X1993,X1994,X1995,X1996,X1997,X1998,X1999,X2000,X2001,X2002,X2003,X2004,X2005,X2006,X2007,X2008,X2009,X2010,X2011,X2012,X2013,X2014,X2015,X2016,X2017,X2018,X2019,X2020,X2021,X2022,X2023,X2024) %>%
  pivot_longer(`X1980`:`X2024`,names_to = "year", values_to = "CPI_inflation", values_drop_na = TRUE) 


#remove x's in year column
global_inflation_final <- global_inflation %>%
  mutate(across(c('year'), substr, 2, nchar(year)))

global_inflation_final

## # A tibble: 7,952 × 4
##    country_name region         year  CPI_inflation
##    <chr>        <chr>          <chr>         <dbl>
##  1 Afghanistan  Asia & Pacific 1980           13.4
##  2 Afghanistan  Asia & Pacific 1981           22.2
##  3 Afghanistan  Asia & Pacific 1982           18.2
##  4 Afghanistan  Asia & Pacific 1983           15.9
##  5 Afghanistan  Asia & Pacific 1984           20.4
##  6 Afghanistan  Asia & Pacific 1985            8.7
##  7 Afghanistan  Asia & Pacific 1986           -2.1
##  8 Afghanistan  Asia & Pacific 1987           18.4
##  9 Afghanistan  Asia & Pacific 1988           27.5
## 10 Afghanistan  Asia & Pacific 1989           71.5
## # ℹ 7,942 more rows

Analysis

#Determine min and max of the regions, sorting by desc
global_max <- global_inflation_final %>%
  group_by(region) %>%
  summarize(max_cpi = max(CPI_inflation)) %>%
  arrange(desc(max_cpi))

global_min <- global_inflation_final %>%
  group_by(region) %>%
  summarize(min_cpi = min(CPI_inflation)) %>%
  arrange(min_cpi)

global_min

## # A tibble: 9 × 2
##   region                  min_cpi
##   <chr>                     <dbl>
## 1 "Africa"                  -72.7
## 2 "Asia & Pacific"          -71.3
## 3 "South/Latin America"     -44.4
## 4 "Europe"                  -11.7
## 5 "Arab States"             -11.3
## 6 "Middle east"              -9.9
## 7 ""                         -2.4
## 8 "North America"            -0.3
## 9 "South/Central America"     3.5

global_max

## # A tibble: 9 × 2
##   region                  max_cpi
##   <chr>                     <dbl>
## 1 "South/Latin America"   65374. 
## 2 "Africa"                23773. 
## 3 "South/Central America" 13110. 
## 4 "Europe"                 5273. 
## 5 "Asia & Pacific"         3102. 
## 6 "Middle east"             487. 
## 7 "Arab States"             359. 
## 8 ""                        111. 
## 9 "North America"            13.5

#Filter out the US to look at its min/max and determine its lowes CPI avg by sorting

#Highest/lowest CPI year
us_CPI <- global_inflation_final %>%
  filter(country_name == "United States") %>%
  mutate(min = min(CPI_inflation), max = max(CPI_inflation)) %>%
  arrange(desc(CPI_inflation))
us_CPI

## # A tibble: 45 × 6
##    country_name  region        year  CPI_inflation   min   max
##    <chr>         <chr>         <chr>         <dbl> <dbl> <dbl>
##  1 United States North America 1980           13.5  -0.3  13.5
##  2 United States North America 1981           10.4  -0.3  13.5
##  3 United States North America 2022            8    -0.3  13.5
##  4 United States North America 1982            6.2  -0.3  13.5
##  5 United States North America 1990            5.4  -0.3  13.5
##  6 United States North America 1989            4.8  -0.3  13.5
##  7 United States North America 2021            4.7  -0.3  13.5
##  8 United States North America 1984            4.4  -0.3  13.5
##  9 United States North America 1991            4.2  -0.3  13.5
## 10 United States North America 1988            4.1  -0.3  13.5
## # ℹ 35 more rows

#CPI by year
us_CPI_year <- global_inflation_final %>%
  filter(country_name == "United States") %>%
  mutate(min = min(CPI_inflation), max = max(CPI_inflation)) %>%
  arrange(desc(year))

us_CPI_year

## # A tibble: 45 × 6
##    country_name  region        year  CPI_inflation   min   max
##    <chr>         <chr>         <chr>         <dbl> <dbl> <dbl>
##  1 United States North America 2024            2.8  -0.3  13.5
##  2 United States North America 2023            4.1  -0.3  13.5
##  3 United States North America 2022            8    -0.3  13.5
##  4 United States North America 2021            4.7  -0.3  13.5
##  5 United States North America 2020            1.3  -0.3  13.5
##  6 United States North America 2019            1.8  -0.3  13.5
##  7 United States North America 2018            2.4  -0.3  13.5
##  8 United States North America 2017            2.1  -0.3  13.5
##  9 United States North America 2016            1.3  -0.3  13.5
## 10 United States North America 2015            0.1  -0.3  13.5
## # ℹ 35 more rows

Conclusion

After tidying and analyzing the data, the highest CPI for all regions is 65374 in the South/Latin America region. The lowest CPI is seen in Africa at -72.7. The drastic difference between the CPIs can be attributed to outliers. When looking at just the US in CPI descending order, we see 2009 had the lowest CPI (min) at -0.3, while 1980 has the highest CPI (max) at 13.5. Looking at year over year and outside 1981 and 1980, the US CPI did not fluctuate much and remained relatively constant.

Dataset #3 - CPI data for January 2024

Dataset: “Bureau of Labor Statistics - CPI Report for Jan2024”

# Load the data
library(tidyverse)

CPI_Jan <- read.csv("https://raw.githubusercontent.com/sokkarbishoy/DATA607/main/news-release-table1-202401.csv")

head(CPI_Jan)

##              X
## 1             
## 2             
## 3 Indent Level
## 4             
## 5             
## 6            0
##   Table.1..Consumer.Price.Index.for.All.Urban.Consumers..CPI.U...U.S..city.average..by.expenditure.category..January.2024
## 1                                                                                   [1982-84=100, unless otherwise noted]
## 2                                                                                                                        
## 3                                                                                                    Expenditure category
## 4                                                                                                                        
## 5                                                                                                                        
## 6                                                                                                               All items
##                                X.1                X.2                X.3
## 1                                                                       
## 2                                                                       
## 3 Relative\nimportance\nDec.\n2023 Unadjusted indexes Unadjusted indexes
## 4                                          Jan.\n2023         Dec.\n2023
## 5                                                                       
## 6                          100.000            299.170            306.746
##                  X.4                       X.5                       X.6
## 1                                                                       
## 2                                                                       
## 3 Unadjusted indexes Unadjusted percent change Unadjusted percent change
## 4         Jan.\n2024   Jan.\n2023-\nJan.\n2024   Dec.\n2023-\nJan.\n2024
## 5                                                                       
## 6            308.417                       3.1                       0.5
##                                  X.7                                X.8
## 1                                                                      
## 2                                                                      
## 3 Seasonally adjusted percent change Seasonally adjusted percent change
## 4            Oct.\n2023-\nNov.\n2023            Nov.\n2023-\nDec.\n2023
## 5                                                                      
## 6                                0.2                                0.2
##                                  X.9
## 1                                   
## 2                                   
## 3 Seasonally adjusted percent change
## 4            Dec.\n2023-\nJan.\n2024
## 5                                   
## 6                                0.3

#We start tidying the data removing the first two rows, last three rows and empty rows

CPI_Jan <- CPI_Jan[-c(1:5, 16, 25, 45:49), ]

tail(CPI_Jan)

##    X
## 39 4
## 40 4
## 41 3
## 42 4
## 43 4
## 44 4
##    Table.1..Consumer.Price.Index.for.All.Urban.Consumers..CPI.U...U.S..city.average..by.expenditure.category..January.2024
## 39                                                                                                 Physicians' services(1)
## 40                                                                                                 Hospital services(1)(3)
## 41                                                                                                 Transportation services
## 42                                                                                 Motor vehicle maintenance and repair(1)
## 43                                                                                                 Motor vehicle insurance
## 44                                                                                                           Airline fares
##      X.1     X.2     X.3     X.4  X.5 X.6  X.7  X.8 X.9
## 39 1.828 415.197 412.930 415.427  0.1 0.6  0.6  0.2 0.6
## 40 1.987 385.064 404.407 410.695  6.7 1.6  0.1  0.5 1.6
## 41 6.294 376.743 409.749 412.643  9.5 0.7  1.0  0.1 1.0
## 42 1.233 371.780 392.897 396.004  6.5 0.8  0.3 -0.3 0.8
## 43 2.794 658.513 780.284 794.142 20.6 1.8  1.2  1.7 1.4
## 44 0.751 264.629 243.348 247.606 -6.4 1.7 -0.2  0.9 1.4

#change column names

new_columns <- c("Indent_Level",

                 "Expenditure_category",

                 "Relative_importance_Dec_2023",

                 "Jan_2023_Unadjusted_indexes", 

                 "Dec_2023_Unadjusted_indexes", 

                 "Jan_2024_Unadjusted_indexes",

                 "Jan_2023_Jan_2024_Unadjusted_percent_change",

                 "Dec_2023_Jan_2024_Unadjusted_percent_change", 

                 "Oct_2023_Nov_2023_Seasonally_adjusted_percent_change",

                 "Nov_2023_Dec_2023_Seasonally_adjusted_percent_change", 

                 "Dec_2023_Jan_2024_Seasonally_adjusted_percent_change")

colnames(CPI_Jan) <- new_columns

 

#change the type into INT

CPI_Jan$Indent_Level <- as.integer(CPI_Jan$Indent_Level)

head(CPI_Jan)

##    Indent_Level           Expenditure_category Relative_importance_Dec_2023
## 6             0                      All items                      100.000
## 7             1                           Food                       13.555
## 8             2                   Food at home                        8.167
## 9             3    Cereals and bakery products                        1.066
## 10            3 Meats, poultry, fish, and eggs                        1.722
## 11            3     Dairy and related products                        0.748
##    Jan_2023_Unadjusted_indexes Dec_2023_Unadjusted_indexes
## 6                      299.170                     306.746
## 7                      319.136                     325.409
## 8                      301.435                     303.005
## 9                      349.294                     353.844
## 10                     322.737                     320.143
## 11                     272.040                     267.889
##    Jan_2024_Unadjusted_indexes Jan_2023_Jan_2024_Unadjusted_percent_change
## 6                      308.417                                         3.1
## 7                      327.327                                         2.6
## 8                      305.037                                         1.2
## 9                      354.532                                         1.5
## 10                     319.752                                        -0.9
## 11                     268.941                                        -1.1
##    Dec_2023_Jan_2024_Unadjusted_percent_change
## 6                                          0.5
## 7                                          0.6
## 8                                          0.7
## 9                                          0.2
## 10                                        -0.1
## 11                                         0.4
##    Oct_2023_Nov_2023_Seasonally_adjusted_percent_change
## 6                                                   0.2
## 7                                                   0.2
## 8                                                   0.0
## 9                                                   0.3
## 10                                                 -0.2
## 11                                                  0.0
##    Nov_2023_Dec_2023_Seasonally_adjusted_percent_change
## 6                                                   0.2
## 7                                                   0.2
## 8                                                   0.1
## 9                                                  -0.1
## 10                                                  0.3
## 11                                                  0.1
##    Dec_2023_Jan_2024_Seasonally_adjusted_percent_change
## 6                                                   0.3
## 7                                                   0.4
## 8                                                   0.4
## 9                                                  -0.2
## 10                                                  0.0
## 11                                                  0.2

Finally we change the type of each columns to INT and Numeric

CPI_Jan <- transform(CPI_Jan, 

          Indent_Level = as.integer(Indent_Level),

          Relative_importance_Dec_2023 = as.numeric(Relative_importance_Dec_2023),

          Jan_2023_Unadjusted_indexes = as.numeric(Jan_2023_Unadjusted_indexes),

          Dec_2023_Unadjusted_indexes = as.numeric(Dec_2023_Unadjusted_indexes),

          Jan_2024_Unadjusted_indexes = as.numeric(Jan_2024_Unadjusted_indexes),

          Jan_2023_Jan_2024_Unadjusted_percent_change = as.numeric(Jan_2023_Jan_2024_Unadjusted_percent_change),

          Dec_2023_Jan_2024_Unadjusted_percent_change = as.numeric(Dec_2023_Jan_2024_Unadjusted_percent_change),

          Oct_2023_Nov_2023_Seasonally_adjusted_percent_change = as.numeric(Oct_2023_Nov_2023_Seasonally_adjusted_percent_change),

          Nov_2023_Dec_2023_Seasonally_adjusted_percent_change = as.numeric(Nov_2023_Dec_2023_Seasonally_adjusted_percent_change),

          Dec_2023_Jan_2024_Seasonally_adjusted_percent_change = as.numeric(Dec_2023_Jan_2024_Seasonally_adjusted_percent_change))

Analyzing the data:

in the next lines, we will answer the question: which categories saw the biggest from December to Jan. Amother part of tidying the data will be using the filter function to analyse the subgroups only which will give us accurate data.

Data shows that commodities expenditures such as Cereals, bakery products, as well as Energy and gasoline saw the least change.

CPI_Jan %>%

  filter(Indent_Level == 3) %>%

  arrange(Dec_2023_Jan_2024_Seasonally_adjusted_percent_change)

##    Indent_Level                          Expenditure_category
## 1             3                                      Fuel oil
## 2             3                          Used cars and trucks
## 3             3                                    Motor fuel
## 4             3                                       Apparel
## 5             3                   Medical care commodities(1)
## 6             3                   Cereals and bakery products
## 7             3                Meats, poultry, fish, and eggs
## 8             3                                  New vehicles
## 9             3                    Dairy and related products
## 10            3                           Alcoholic beverages
## 11            3               Tobacco and smoking products(1)
## 12            3                         Fruits and vegetables
## 13            3                            Other food at home
## 14            3                                       Shelter
## 15            3                         Medical care services
## 16            3                       Transportation services
## 17            3 Nonalcoholic beverages and beverage materials
## 18            3                                   Electricity
## 19            3                   Utility (piped) gas service
##    Relative_importance_Dec_2023 Jan_2023_Unadjusted_indexes
## 1                         0.084                     455.595
## 2                         2.012                     185.857
## 3                         3.372                     297.413
## 4                         2.512                     127.875
## 5                         1.489                     395.981
## 6                         1.066                     349.294
## 7                         1.722                     322.737
## 8                         3.684                     177.276
## 9                         0.748                     272.040
## 10                        0.854                     282.286
## 11                        0.542                    1388.790
## 12                        1.410                     351.029
## 13                        2.193                     264.746
## 14                       36.191                     369.585
## 15                        6.515                     601.551
## 16                        6.294                     376.743
## 17                        1.027                     213.359
## 18                        2.428                     266.528
## 19                        0.688                     285.407
##    Dec_2023_Unadjusted_indexes Jan_2024_Unadjusted_indexes
## 1                      393.189                     390.877
## 2                      186.383                     179.410
## 3                      282.246                     277.709
## 4                      125.794                     127.946
## 5                      410.365                     407.879
## 6                      353.844                     354.532
## 7                      320.143                     319.752
## 8                      178.269                     178.595
## 9                      267.889                     268.941
## 10                     287.873                     288.758
## 11                    1486.900                    1491.538
## 12                     350.250                     354.798
## 13                     270.223                     271.600
## 14                     389.433                     391.896
## 15                     599.464                     605.257
## 16                     409.749                     412.643
## 17                     215.872                     220.573
## 18                     269.170                     276.698
## 19                     230.862                     234.515
##    Jan_2023_Jan_2024_Unadjusted_percent_change
## 1                                        -14.2
## 2                                         -3.5
## 3                                         -6.6
## 4                                          0.1
## 5                                          3.0
## 6                                          1.5
## 7                                         -0.9
## 8                                          0.7
## 9                                         -1.1
## 10                                         2.3
## 11                                         7.4
## 12                                         1.1
## 13                                         2.6
## 14                                         6.0
## 15                                         0.6
## 16                                         9.5
## 17                                         3.4
## 18                                         3.8
## 19                                       -17.8
##    Dec_2023_Jan_2024_Unadjusted_percent_change
## 1                                         -0.6
## 2                                         -3.7
## 3                                         -1.6
## 4                                          1.7
## 5                                         -0.6
## 6                                          0.2
## 7                                         -0.1
## 8                                          0.2
## 9                                          0.4
## 10                                         0.3
## 11                                         0.3
## 12                                         1.3
## 13                                         0.5
## 14                                         0.6
## 15                                         1.0
## 16                                         0.7
## 17                                         2.2
## 18                                         2.8
## 19                                         1.6
##    Oct_2023_Nov_2023_Seasonally_adjusted_percent_change
## 1                                                  -1.1
## 2                                                   1.4
## 3                                                  -4.0
## 4                                                  -0.6
## 5                                                   0.5
## 6                                                   0.3
## 7                                                  -0.2
## 8                                                   0.0
## 9                                                   0.0
## 10                                                 -0.1
## 11                                                  1.1
## 12                                                  0.1
## 13                                                 -0.2
## 14                                                  0.4
## 15                                                  0.5
## 16                                                  1.0
## 17                                                  0.4
## 18                                                  1.0
## 19                                                  1.2
##    Nov_2023_Dec_2023_Seasonally_adjusted_percent_change
## 1                                                  -3.3
## 2                                                   0.6
## 3                                                  -0.6
## 4                                                   0.0
## 5                                                  -0.1
## 6                                                  -0.1
## 7                                                   0.3
## 8                                                   0.2
## 9                                                   0.1
## 10                                                  0.1
## 11                                                 -0.1
## 12                                                  0.0
## 13                                                  0.2
## 14                                                  0.4
## 15                                                  0.5
## 16                                                  0.1
## 17                                                  0.2
## 18                                                  0.6
## 19                                                 -0.6
##    Dec_2023_Jan_2024_Seasonally_adjusted_percent_change
## 1                                                  -4.5
## 2                                                  -3.4
## 3                                                  -3.3
## 4                                                  -0.7
## 5                                                  -0.6
## 6                                                  -0.2
## 7                                                   0.0
## 8                                                   0.0
## 9                                                   0.2
## 10                                                  0.3
## 11                                                  0.3
## 12                                                  0.4
## 13                                                  0.6
## 14                                                  0.6
## 15                                                  0.7
## 16                                                  1.0
## 17                                                  1.2
## 18                                                  1.2
## 19                                                  2.0

To view expenditures that saw the most change we can use the following code.

CPI_Jan %>%

  filter(Indent_Level == 3) %>%

  arrange(desc(Dec_2023_Jan_2024_Seasonally_adjusted_percent_change))

##    Indent_Level                          Expenditure_category
## 1             3                   Utility (piped) gas service
## 2             3 Nonalcoholic beverages and beverage materials
## 3             3                                   Electricity
## 4             3                       Transportation services
## 5             3                         Medical care services
## 6             3                            Other food at home
## 7             3                                       Shelter
## 8             3                         Fruits and vegetables
## 9             3                           Alcoholic beverages
## 10            3               Tobacco and smoking products(1)
## 11            3                    Dairy and related products
## 12            3                Meats, poultry, fish, and eggs
## 13            3                                  New vehicles
## 14            3                   Cereals and bakery products
## 15            3                   Medical care commodities(1)
## 16            3                                       Apparel
## 17            3                                    Motor fuel
## 18            3                          Used cars and trucks
## 19            3                                      Fuel oil
##    Relative_importance_Dec_2023 Jan_2023_Unadjusted_indexes
## 1                         0.688                     285.407
## 2                         1.027                     213.359
## 3                         2.428                     266.528
## 4                         6.294                     376.743
## 5                         6.515                     601.551
## 6                         2.193                     264.746
## 7                        36.191                     369.585
## 8                         1.410                     351.029
## 9                         0.854                     282.286
## 10                        0.542                    1388.790
## 11                        0.748                     272.040
## 12                        1.722                     322.737
## 13                        3.684                     177.276
## 14                        1.066                     349.294
## 15                        1.489                     395.981
## 16                        2.512                     127.875
## 17                        3.372                     297.413
## 18                        2.012                     185.857
## 19                        0.084                     455.595
##    Dec_2023_Unadjusted_indexes Jan_2024_Unadjusted_indexes
## 1                      230.862                     234.515
## 2                      215.872                     220.573
## 3                      269.170                     276.698
## 4                      409.749                     412.643
## 5                      599.464                     605.257
## 6                      270.223                     271.600
## 7                      389.433                     391.896
## 8                      350.250                     354.798
## 9                      287.873                     288.758
## 10                    1486.900                    1491.538
## 11                     267.889                     268.941
## 12                     320.143                     319.752
## 13                     178.269                     178.595
## 14                     353.844                     354.532
## 15                     410.365                     407.879
## 16                     125.794                     127.946
## 17                     282.246                     277.709
## 18                     186.383                     179.410
## 19                     393.189                     390.877
##    Jan_2023_Jan_2024_Unadjusted_percent_change
## 1                                        -17.8
## 2                                          3.4
## 3                                          3.8
## 4                                          9.5
## 5                                          0.6
## 6                                          2.6
## 7                                          6.0
## 8                                          1.1
## 9                                          2.3
## 10                                         7.4
## 11                                        -1.1
## 12                                        -0.9
## 13                                         0.7
## 14                                         1.5
## 15                                         3.0
## 16                                         0.1
## 17                                        -6.6
## 18                                        -3.5
## 19                                       -14.2
##    Dec_2023_Jan_2024_Unadjusted_percent_change
## 1                                          1.6
## 2                                          2.2
## 3                                          2.8
## 4                                          0.7
## 5                                          1.0
## 6                                          0.5
## 7                                          0.6
## 8                                          1.3
## 9                                          0.3
## 10                                         0.3
## 11                                         0.4
## 12                                        -0.1
## 13                                         0.2
## 14                                         0.2
## 15                                        -0.6
## 16                                         1.7
## 17                                        -1.6
## 18                                        -3.7
## 19                                        -0.6
##    Oct_2023_Nov_2023_Seasonally_adjusted_percent_change
## 1                                                   1.2
## 2                                                   0.4
## 3                                                   1.0
## 4                                                   1.0
## 5                                                   0.5
## 6                                                  -0.2
## 7                                                   0.4
## 8                                                   0.1
## 9                                                  -0.1
## 10                                                  1.1
## 11                                                  0.0
## 12                                                 -0.2
## 13                                                  0.0
## 14                                                  0.3
## 15                                                  0.5
## 16                                                 -0.6
## 17                                                 -4.0
## 18                                                  1.4
## 19                                                 -1.1
##    Nov_2023_Dec_2023_Seasonally_adjusted_percent_change
## 1                                                  -0.6
## 2                                                   0.2
## 3                                                   0.6
## 4                                                   0.1
## 5                                                   0.5
## 6                                                   0.2
## 7                                                   0.4
## 8                                                   0.0
## 9                                                   0.1
## 10                                                 -0.1
## 11                                                  0.1
## 12                                                  0.3
## 13                                                  0.2
## 14                                                 -0.1
## 15                                                 -0.1
## 16                                                  0.0
## 17                                                 -0.6
## 18                                                  0.6
## 19                                                 -3.3
##    Dec_2023_Jan_2024_Seasonally_adjusted_percent_change
## 1                                                   2.0
## 2                                                   1.2
## 3                                                   1.2
## 4                                                   1.0
## 5                                                   0.7
## 6                                                   0.6
## 7                                                   0.6
## 8                                                   0.4
## 9                                                   0.3
## 10                                                  0.3
## 11                                                  0.2
## 12                                                  0.0
## 13                                                  0.0
## 14                                                 -0.2
## 15                                                 -0.6
## 16                                                 -0.7
## 17                                                 -3.3
## 18                                                 -3.4
## 19                                                 -4.5

Below we analyze the major categories (indent level = 3);

CPI_Jan %>% 

  filter(Indent_Level == 3) %>%

  arrange(desc(Relative_importance_Dec_2023))

##    Indent_Level                          Expenditure_category
## 1             3                                       Shelter
## 2             3                         Medical care services
## 3             3                       Transportation services
## 4             3                                  New vehicles
## 5             3                                    Motor fuel
## 6             3                                       Apparel
## 7             3                                   Electricity
## 8             3                            Other food at home
## 9             3                          Used cars and trucks
## 10            3                Meats, poultry, fish, and eggs
## 11            3                   Medical care commodities(1)
## 12            3                         Fruits and vegetables
## 13            3                   Cereals and bakery products
## 14            3 Nonalcoholic beverages and beverage materials
## 15            3                           Alcoholic beverages
## 16            3                    Dairy and related products
## 17            3                   Utility (piped) gas service
## 18            3               Tobacco and smoking products(1)
## 19            3                                      Fuel oil
##    Relative_importance_Dec_2023 Jan_2023_Unadjusted_indexes
## 1                        36.191                     369.585
## 2                         6.515                     601.551
## 3                         6.294                     376.743
## 4                         3.684                     177.276
## 5                         3.372                     297.413
## 6                         2.512                     127.875
## 7                         2.428                     266.528
## 8                         2.193                     264.746
## 9                         2.012                     185.857
## 10                        1.722                     322.737
## 11                        1.489                     395.981
## 12                        1.410                     351.029
## 13                        1.066                     349.294
## 14                        1.027                     213.359
## 15                        0.854                     282.286
## 16                        0.748                     272.040
## 17                        0.688                     285.407
## 18                        0.542                    1388.790
## 19                        0.084                     455.595
##    Dec_2023_Unadjusted_indexes Jan_2024_Unadjusted_indexes
## 1                      389.433                     391.896
## 2                      599.464                     605.257
## 3                      409.749                     412.643
## 4                      178.269                     178.595
## 5                      282.246                     277.709
## 6                      125.794                     127.946
## 7                      269.170                     276.698
## 8                      270.223                     271.600
## 9                      186.383                     179.410
## 10                     320.143                     319.752
## 11                     410.365                     407.879
## 12                     350.250                     354.798
## 13                     353.844                     354.532
## 14                     215.872                     220.573
## 15                     287.873                     288.758
## 16                     267.889                     268.941
## 17                     230.862                     234.515
## 18                    1486.900                    1491.538
## 19                     393.189                     390.877
##    Jan_2023_Jan_2024_Unadjusted_percent_change
## 1                                          6.0
## 2                                          0.6
## 3                                          9.5
## 4                                          0.7
## 5                                         -6.6
## 6                                          0.1
## 7                                          3.8
## 8                                          2.6
## 9                                         -3.5
## 10                                        -0.9
## 11                                         3.0
## 12                                         1.1
## 13                                         1.5
## 14                                         3.4
## 15                                         2.3
## 16                                        -1.1
## 17                                       -17.8
## 18                                         7.4
## 19                                       -14.2
##    Dec_2023_Jan_2024_Unadjusted_percent_change
## 1                                          0.6
## 2                                          1.0
## 3                                          0.7
## 4                                          0.2
## 5                                         -1.6
## 6                                          1.7
## 7                                          2.8
## 8                                          0.5
## 9                                         -3.7
## 10                                        -0.1
## 11                                        -0.6
## 12                                         1.3
## 13                                         0.2
## 14                                         2.2
## 15                                         0.3
## 16                                         0.4
## 17                                         1.6
## 18                                         0.3
## 19                                        -0.6
##    Oct_2023_Nov_2023_Seasonally_adjusted_percent_change
## 1                                                   0.4
## 2                                                   0.5
## 3                                                   1.0
## 4                                                   0.0
## 5                                                  -4.0
## 6                                                  -0.6
## 7                                                   1.0
## 8                                                  -0.2
## 9                                                   1.4
## 10                                                 -0.2
## 11                                                  0.5
## 12                                                  0.1
## 13                                                  0.3
## 14                                                  0.4
## 15                                                 -0.1
## 16                                                  0.0
## 17                                                  1.2
## 18                                                  1.1
## 19                                                 -1.1
##    Nov_2023_Dec_2023_Seasonally_adjusted_percent_change
## 1                                                   0.4
## 2                                                   0.5
## 3                                                   0.1
## 4                                                   0.2
## 5                                                  -0.6
## 6                                                   0.0
## 7                                                   0.6
## 8                                                   0.2
## 9                                                   0.6
## 10                                                  0.3
## 11                                                 -0.1
## 12                                                  0.0
## 13                                                 -0.1
## 14                                                  0.2
## 15                                                  0.1
## 16                                                  0.1
## 17                                                 -0.6
## 18                                                 -0.1
## 19                                                 -3.3
##    Dec_2023_Jan_2024_Seasonally_adjusted_percent_change
## 1                                                   0.6
## 2                                                   0.7
## 3                                                   1.0
## 4                                                   0.0
## 5                                                  -3.3
## 6                                                  -0.7
## 7                                                   1.2
## 8                                                   0.6
## 9                                                  -3.4
## 10                                                  0.0
## 11                                                 -0.6
## 12                                                  0.4
## 13                                                 -0.2
## 14                                                  1.2
## 15                                                  0.3
## 16                                                  0.2
## 17                                                  2.0
## 18                                                  0.3
## 19                                                 -4.5

Conclusion

Conclusion: CPI data contains many more insights that I can be analyzed. Here are the questions we answered in the analysis above:

Which category saw the least change from Dec 2023 - Jan 2024? Ans: Fuel oil, Used cars and trucks, and Motor fuel. A: Fuel oil, Used cars and trucks, and Motor fuel.

Which category saw the biggest change from Dec 2023 - Jan 2024? Ans: Utility Gas services, electricity, and transportation services. A:Utility Gas services, electricity, and transportation services.

Which category saw the biggest change from Jan 2023 - Jan 2024? Ans: Transportation, Tobacco and smoking products, and Shelter. A: Transportation, Tobacco and smoking products, and Shelter.

Project_2_Data_607

Natalie Kalukeerthie and Bishoy Sokkar

2024-02-27

Dataset 1 - US Census: Population growth 2015 - 2020 of Southern States

Analysis

Conclusion

Dataset 2 - Global Inflation Data

Analysis

Conclusion

Dataset #3 - CPI data for January 2024

Conclusion