Dataset 1 - Bureau of Labor Statistics - CPI Report for Jan2024

First: Tidying the data.

to get started, we load the tidyverse package, which includes all the packages we need for the assignment and load the first dataset: CPI data for January 2024.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
CPI_Jan <- read.csv("https://raw.githubusercontent.com/sokkarbishoy/DATA607/main/news-release-table1-202401.csv")
head(CPI_Jan)
##              X
## 1             
## 2             
## 3 Indent Level
## 4             
## 5             
## 6            0
##   Table.1..Consumer.Price.Index.for.All.Urban.Consumers..CPI.U...U.S..city.average..by.expenditure.category..January.2024
## 1                                                                                   [1982-84=100, unless otherwise noted]
## 2                                                                                                                        
## 3                                                                                                    Expenditure category
## 4                                                                                                                        
## 5                                                                                                                        
## 6                                                                                                               All items
##                                X.1                X.2                X.3
## 1                                                                       
## 2                                                                       
## 3 Relative\nimportance\nDec.\n2023 Unadjusted indexes Unadjusted indexes
## 4                                          Jan.\n2023         Dec.\n2023
## 5                                                                       
## 6                          100.000            299.170            306.746
##                  X.4                       X.5                       X.6
## 1                                                                       
## 2                                                                       
## 3 Unadjusted indexes Unadjusted percent change Unadjusted percent change
## 4         Jan.\n2024   Jan.\n2023-\nJan.\n2024   Dec.\n2023-\nJan.\n2024
## 5                                                                       
## 6            308.417                       3.1                       0.5
##                                  X.7                                X.8
## 1                                                                      
## 2                                                                      
## 3 Seasonally adjusted percent change Seasonally adjusted percent change
## 4            Oct.\n2023-\nNov.\n2023            Nov.\n2023-\nDec.\n2023
## 5                                                                      
## 6                                0.2                                0.2
##                                  X.9
## 1                                   
## 2                                   
## 3 Seasonally adjusted percent change
## 4            Dec.\n2023-\nJan.\n2024
## 5                                   
## 6                                0.3

Tidying the data

We start tidying the data removing the first two rows, last three rows and empty rows;

CPI_Jan <- CPI_Jan[-c(1:5, 16, 25, 45:49), ]

tail(CPI_Jan)
##    X
## 39 4
## 40 4
## 41 3
## 42 4
## 43 4
## 44 4
##    Table.1..Consumer.Price.Index.for.All.Urban.Consumers..CPI.U...U.S..city.average..by.expenditure.category..January.2024
## 39                                                                                                 Physicians' services(1)
## 40                                                                                                 Hospital services(1)(3)
## 41                                                                                                 Transportation services
## 42                                                                                 Motor vehicle maintenance and repair(1)
## 43                                                                                                 Motor vehicle insurance
## 44                                                                                                           Airline fares
##      X.1     X.2     X.3     X.4  X.5 X.6  X.7  X.8 X.9
## 39 1.828 415.197 412.930 415.427  0.1 0.6  0.6  0.2 0.6
## 40 1.987 385.064 404.407 410.695  6.7 1.6  0.1  0.5 1.6
## 41 6.294 376.743 409.749 412.643  9.5 0.7  1.0  0.1 1.0
## 42 1.233 371.780 392.897 396.004  6.5 0.8  0.3 -0.3 0.8
## 43 2.794 658.513 780.284 794.142 20.6 1.8  1.2  1.7 1.4
## 44 0.751 264.629 243.348 247.606 -6.4 1.7 -0.2  0.9 1.4

Next we change the columns names to be easier to identify

new_columns <- c("Indent_Level",
                 "Expenditure_category",
                 "Relative_importance_Dec_2023",
                 "Jan_2023_Unadjusted_indexes", 
                 "Dec_2023_Unadjusted_indexes", 
                 "Jan_2024_Unadjusted_indexes",
                 "Jan_2023_Jan_2024_Unadjusted_percent_change",
                 "Dec_2023_Jan_2024_Unadjusted_percent_change", 
                 "Oct_2023_Nov_2023_Seasonally_adjusted_percent_change",
                 "Nov_2023_Dec_2023_Seasonally_adjusted_percent_change", 
                 "Dec_2023_Jan_2024_Seasonally_adjusted_percent_change")

colnames(CPI_Jan) <- new_columns
 
#change the type into INT
CPI_Jan$Indent_Level <- as.integer(CPI_Jan$Indent_Level)
head(CPI_Jan)
##    Indent_Level           Expenditure_category Relative_importance_Dec_2023
## 6             0                      All items                      100.000
## 7             1                           Food                       13.555
## 8             2                   Food at home                        8.167
## 9             3    Cereals and bakery products                        1.066
## 10            3 Meats, poultry, fish, and eggs                        1.722
## 11            3     Dairy and related products                        0.748
##    Jan_2023_Unadjusted_indexes Dec_2023_Unadjusted_indexes
## 6                      299.170                     306.746
## 7                      319.136                     325.409
## 8                      301.435                     303.005
## 9                      349.294                     353.844
## 10                     322.737                     320.143
## 11                     272.040                     267.889
##    Jan_2024_Unadjusted_indexes Jan_2023_Jan_2024_Unadjusted_percent_change
## 6                      308.417                                         3.1
## 7                      327.327                                         2.6
## 8                      305.037                                         1.2
## 9                      354.532                                         1.5
## 10                     319.752                                        -0.9
## 11                     268.941                                        -1.1
##    Dec_2023_Jan_2024_Unadjusted_percent_change
## 6                                          0.5
## 7                                          0.6
## 8                                          0.7
## 9                                          0.2
## 10                                        -0.1
## 11                                         0.4
##    Oct_2023_Nov_2023_Seasonally_adjusted_percent_change
## 6                                                   0.2
## 7                                                   0.2
## 8                                                   0.0
## 9                                                   0.3
## 10                                                 -0.2
## 11                                                  0.0
##    Nov_2023_Dec_2023_Seasonally_adjusted_percent_change
## 6                                                   0.2
## 7                                                   0.2
## 8                                                   0.1
## 9                                                  -0.1
## 10                                                  0.3
## 11                                                  0.1
##    Dec_2023_Jan_2024_Seasonally_adjusted_percent_change
## 6                                                   0.3
## 7                                                   0.4
## 8                                                   0.4
## 9                                                  -0.2
## 10                                                  0.0
## 11                                                  0.2

Finally we change the type of each columns to INT and Numeric

CPI_Jan <- transform(CPI_Jan, 
          Indent_Level = as.integer(Indent_Level),
          Relative_importance_Dec_2023 = as.numeric(Relative_importance_Dec_2023),
          Jan_2023_Unadjusted_indexes = as.numeric(Jan_2023_Unadjusted_indexes),
          Dec_2023_Unadjusted_indexes = as.numeric(Dec_2023_Unadjusted_indexes),
          Jan_2024_Unadjusted_indexes = as.numeric(Jan_2024_Unadjusted_indexes),
          Jan_2023_Jan_2024_Unadjusted_percent_change = as.numeric(Jan_2023_Jan_2024_Unadjusted_percent_change),
          Dec_2023_Jan_2024_Unadjusted_percent_change = as.numeric(Dec_2023_Jan_2024_Unadjusted_percent_change),
          Oct_2023_Nov_2023_Seasonally_adjusted_percent_change = as.numeric(Oct_2023_Nov_2023_Seasonally_adjusted_percent_change),
          Nov_2023_Dec_2023_Seasonally_adjusted_percent_change = as.numeric(Nov_2023_Dec_2023_Seasonally_adjusted_percent_change),
          Dec_2023_Jan_2024_Seasonally_adjusted_percent_change = as.numeric(Dec_2023_Jan_2024_Seasonally_adjusted_percent_change))

Analysis

in the next lines, we will answer the question: which categories saw the biggest from December to Jan. Amother part of tidying the data will be using the filter function to analyse the subgroups only which will give us accurate data.

Data shows that commodities expenditures such as Cereals, bakery products, as well as Energy and gasoline saw the least change.

CPI_Jan %>%
  filter(Indent_Level == 3) %>%
  arrange(Dec_2023_Jan_2024_Seasonally_adjusted_percent_change)
##    Indent_Level                          Expenditure_category
## 1             3                                      Fuel oil
## 2             3                          Used cars and trucks
## 3             3                                    Motor fuel
## 4             3                                       Apparel
## 5             3                   Medical care commodities(1)
## 6             3                   Cereals and bakery products
## 7             3                Meats, poultry, fish, and eggs
## 8             3                                  New vehicles
## 9             3                    Dairy and related products
## 10            3                           Alcoholic beverages
## 11            3               Tobacco and smoking products(1)
## 12            3                         Fruits and vegetables
## 13            3                            Other food at home
## 14            3                                       Shelter
## 15            3                         Medical care services
## 16            3                       Transportation services
## 17            3 Nonalcoholic beverages and beverage materials
## 18            3                                   Electricity
## 19            3                   Utility (piped) gas service
##    Relative_importance_Dec_2023 Jan_2023_Unadjusted_indexes
## 1                         0.084                     455.595
## 2                         2.012                     185.857
## 3                         3.372                     297.413
## 4                         2.512                     127.875
## 5                         1.489                     395.981
## 6                         1.066                     349.294
## 7                         1.722                     322.737
## 8                         3.684                     177.276
## 9                         0.748                     272.040
## 10                        0.854                     282.286
## 11                        0.542                    1388.790
## 12                        1.410                     351.029
## 13                        2.193                     264.746
## 14                       36.191                     369.585
## 15                        6.515                     601.551
## 16                        6.294                     376.743
## 17                        1.027                     213.359
## 18                        2.428                     266.528
## 19                        0.688                     285.407
##    Dec_2023_Unadjusted_indexes Jan_2024_Unadjusted_indexes
## 1                      393.189                     390.877
## 2                      186.383                     179.410
## 3                      282.246                     277.709
## 4                      125.794                     127.946
## 5                      410.365                     407.879
## 6                      353.844                     354.532
## 7                      320.143                     319.752
## 8                      178.269                     178.595
## 9                      267.889                     268.941
## 10                     287.873                     288.758
## 11                    1486.900                    1491.538
## 12                     350.250                     354.798
## 13                     270.223                     271.600
## 14                     389.433                     391.896
## 15                     599.464                     605.257
## 16                     409.749                     412.643
## 17                     215.872                     220.573
## 18                     269.170                     276.698
## 19                     230.862                     234.515
##    Jan_2023_Jan_2024_Unadjusted_percent_change
## 1                                        -14.2
## 2                                         -3.5
## 3                                         -6.6
## 4                                          0.1
## 5                                          3.0
## 6                                          1.5
## 7                                         -0.9
## 8                                          0.7
## 9                                         -1.1
## 10                                         2.3
## 11                                         7.4
## 12                                         1.1
## 13                                         2.6
## 14                                         6.0
## 15                                         0.6
## 16                                         9.5
## 17                                         3.4
## 18                                         3.8
## 19                                       -17.8
##    Dec_2023_Jan_2024_Unadjusted_percent_change
## 1                                         -0.6
## 2                                         -3.7
## 3                                         -1.6
## 4                                          1.7
## 5                                         -0.6
## 6                                          0.2
## 7                                         -0.1
## 8                                          0.2
## 9                                          0.4
## 10                                         0.3
## 11                                         0.3
## 12                                         1.3
## 13                                         0.5
## 14                                         0.6
## 15                                         1.0
## 16                                         0.7
## 17                                         2.2
## 18                                         2.8
## 19                                         1.6
##    Oct_2023_Nov_2023_Seasonally_adjusted_percent_change
## 1                                                  -1.1
## 2                                                   1.4
## 3                                                  -4.0
## 4                                                  -0.6
## 5                                                   0.5
## 6                                                   0.3
## 7                                                  -0.2
## 8                                                   0.0
## 9                                                   0.0
## 10                                                 -0.1
## 11                                                  1.1
## 12                                                  0.1
## 13                                                 -0.2
## 14                                                  0.4
## 15                                                  0.5
## 16                                                  1.0
## 17                                                  0.4
## 18                                                  1.0
## 19                                                  1.2
##    Nov_2023_Dec_2023_Seasonally_adjusted_percent_change
## 1                                                  -3.3
## 2                                                   0.6
## 3                                                  -0.6
## 4                                                   0.0
## 5                                                  -0.1
## 6                                                  -0.1
## 7                                                   0.3
## 8                                                   0.2
## 9                                                   0.1
## 10                                                  0.1
## 11                                                 -0.1
## 12                                                  0.0
## 13                                                  0.2
## 14                                                  0.4
## 15                                                  0.5
## 16                                                  0.1
## 17                                                  0.2
## 18                                                  0.6
## 19                                                 -0.6
##    Dec_2023_Jan_2024_Seasonally_adjusted_percent_change
## 1                                                  -4.5
## 2                                                  -3.4
## 3                                                  -3.3
## 4                                                  -0.7
## 5                                                  -0.6
## 6                                                  -0.2
## 7                                                   0.0
## 8                                                   0.0
## 9                                                   0.2
## 10                                                  0.3
## 11                                                  0.3
## 12                                                  0.4
## 13                                                  0.6
## 14                                                  0.6
## 15                                                  0.7
## 16                                                  1.0
## 17                                                  1.2
## 18                                                  1.2
## 19                                                  2.0

To view expenditures that saw the most change we can use the following code.

CPI_Jan %>%
  filter(Indent_Level == 3) %>%
  arrange(desc(Dec_2023_Jan_2024_Seasonally_adjusted_percent_change))
##    Indent_Level                          Expenditure_category
## 1             3                   Utility (piped) gas service
## 2             3 Nonalcoholic beverages and beverage materials
## 3             3                                   Electricity
## 4             3                       Transportation services
## 5             3                         Medical care services
## 6             3                            Other food at home
## 7             3                                       Shelter
## 8             3                         Fruits and vegetables
## 9             3                           Alcoholic beverages
## 10            3               Tobacco and smoking products(1)
## 11            3                    Dairy and related products
## 12            3                Meats, poultry, fish, and eggs
## 13            3                                  New vehicles
## 14            3                   Cereals and bakery products
## 15            3                   Medical care commodities(1)
## 16            3                                       Apparel
## 17            3                                    Motor fuel
## 18            3                          Used cars and trucks
## 19            3                                      Fuel oil
##    Relative_importance_Dec_2023 Jan_2023_Unadjusted_indexes
## 1                         0.688                     285.407
## 2                         1.027                     213.359
## 3                         2.428                     266.528
## 4                         6.294                     376.743
## 5                         6.515                     601.551
## 6                         2.193                     264.746
## 7                        36.191                     369.585
## 8                         1.410                     351.029
## 9                         0.854                     282.286
## 10                        0.542                    1388.790
## 11                        0.748                     272.040
## 12                        1.722                     322.737
## 13                        3.684                     177.276
## 14                        1.066                     349.294
## 15                        1.489                     395.981
## 16                        2.512                     127.875
## 17                        3.372                     297.413
## 18                        2.012                     185.857
## 19                        0.084                     455.595
##    Dec_2023_Unadjusted_indexes Jan_2024_Unadjusted_indexes
## 1                      230.862                     234.515
## 2                      215.872                     220.573
## 3                      269.170                     276.698
## 4                      409.749                     412.643
## 5                      599.464                     605.257
## 6                      270.223                     271.600
## 7                      389.433                     391.896
## 8                      350.250                     354.798
## 9                      287.873                     288.758
## 10                    1486.900                    1491.538
## 11                     267.889                     268.941
## 12                     320.143                     319.752
## 13                     178.269                     178.595
## 14                     353.844                     354.532
## 15                     410.365                     407.879
## 16                     125.794                     127.946
## 17                     282.246                     277.709
## 18                     186.383                     179.410
## 19                     393.189                     390.877
##    Jan_2023_Jan_2024_Unadjusted_percent_change
## 1                                        -17.8
## 2                                          3.4
## 3                                          3.8
## 4                                          9.5
## 5                                          0.6
## 6                                          2.6
## 7                                          6.0
## 8                                          1.1
## 9                                          2.3
## 10                                         7.4
## 11                                        -1.1
## 12                                        -0.9
## 13                                         0.7
## 14                                         1.5
## 15                                         3.0
## 16                                         0.1
## 17                                        -6.6
## 18                                        -3.5
## 19                                       -14.2
##    Dec_2023_Jan_2024_Unadjusted_percent_change
## 1                                          1.6
## 2                                          2.2
## 3                                          2.8
## 4                                          0.7
## 5                                          1.0
## 6                                          0.5
## 7                                          0.6
## 8                                          1.3
## 9                                          0.3
## 10                                         0.3
## 11                                         0.4
## 12                                        -0.1
## 13                                         0.2
## 14                                         0.2
## 15                                        -0.6
## 16                                         1.7
## 17                                        -1.6
## 18                                        -3.7
## 19                                        -0.6
##    Oct_2023_Nov_2023_Seasonally_adjusted_percent_change
## 1                                                   1.2
## 2                                                   0.4
## 3                                                   1.0
## 4                                                   1.0
## 5                                                   0.5
## 6                                                  -0.2
## 7                                                   0.4
## 8                                                   0.1
## 9                                                  -0.1
## 10                                                  1.1
## 11                                                  0.0
## 12                                                 -0.2
## 13                                                  0.0
## 14                                                  0.3
## 15                                                  0.5
## 16                                                 -0.6
## 17                                                 -4.0
## 18                                                  1.4
## 19                                                 -1.1
##    Nov_2023_Dec_2023_Seasonally_adjusted_percent_change
## 1                                                  -0.6
## 2                                                   0.2
## 3                                                   0.6
## 4                                                   0.1
## 5                                                   0.5
## 6                                                   0.2
## 7                                                   0.4
## 8                                                   0.0
## 9                                                   0.1
## 10                                                 -0.1
## 11                                                  0.1
## 12                                                  0.3
## 13                                                  0.2
## 14                                                 -0.1
## 15                                                 -0.1
## 16                                                  0.0
## 17                                                 -0.6
## 18                                                  0.6
## 19                                                 -3.3
##    Dec_2023_Jan_2024_Seasonally_adjusted_percent_change
## 1                                                   2.0
## 2                                                   1.2
## 3                                                   1.2
## 4                                                   1.0
## 5                                                   0.7
## 6                                                   0.6
## 7                                                   0.6
## 8                                                   0.4
## 9                                                   0.3
## 10                                                  0.3
## 11                                                  0.2
## 12                                                  0.0
## 13                                                  0.0
## 14                                                 -0.2
## 15                                                 -0.6
## 16                                                 -0.7
## 17                                                 -3.3
## 18                                                 -3.4
## 19                                                 -4.5

Below we analyze the major categories (indent level = 3);

CPI_Jan %>% 
  filter(Indent_Level == 3) %>%
  arrange(desc(Jan_2023_Jan_2024_Unadjusted_percent_change))
##    Indent_Level                          Expenditure_category
## 1             3                       Transportation services
## 2             3               Tobacco and smoking products(1)
## 3             3                                       Shelter
## 4             3                                   Electricity
## 5             3 Nonalcoholic beverages and beverage materials
## 6             3                   Medical care commodities(1)
## 7             3                            Other food at home
## 8             3                           Alcoholic beverages
## 9             3                   Cereals and bakery products
## 10            3                         Fruits and vegetables
## 11            3                                  New vehicles
## 12            3                         Medical care services
## 13            3                                       Apparel
## 14            3                Meats, poultry, fish, and eggs
## 15            3                    Dairy and related products
## 16            3                          Used cars and trucks
## 17            3                                    Motor fuel
## 18            3                                      Fuel oil
## 19            3                   Utility (piped) gas service
##    Relative_importance_Dec_2023 Jan_2023_Unadjusted_indexes
## 1                         6.294                     376.743
## 2                         0.542                    1388.790
## 3                        36.191                     369.585
## 4                         2.428                     266.528
## 5                         1.027                     213.359
## 6                         1.489                     395.981
## 7                         2.193                     264.746
## 8                         0.854                     282.286
## 9                         1.066                     349.294
## 10                        1.410                     351.029
## 11                        3.684                     177.276
## 12                        6.515                     601.551
## 13                        2.512                     127.875
## 14                        1.722                     322.737
## 15                        0.748                     272.040
## 16                        2.012                     185.857
## 17                        3.372                     297.413
## 18                        0.084                     455.595
## 19                        0.688                     285.407
##    Dec_2023_Unadjusted_indexes Jan_2024_Unadjusted_indexes
## 1                      409.749                     412.643
## 2                     1486.900                    1491.538
## 3                      389.433                     391.896
## 4                      269.170                     276.698
## 5                      215.872                     220.573
## 6                      410.365                     407.879
## 7                      270.223                     271.600
## 8                      287.873                     288.758
## 9                      353.844                     354.532
## 10                     350.250                     354.798
## 11                     178.269                     178.595
## 12                     599.464                     605.257
## 13                     125.794                     127.946
## 14                     320.143                     319.752
## 15                     267.889                     268.941
## 16                     186.383                     179.410
## 17                     282.246                     277.709
## 18                     393.189                     390.877
## 19                     230.862                     234.515
##    Jan_2023_Jan_2024_Unadjusted_percent_change
## 1                                          9.5
## 2                                          7.4
## 3                                          6.0
## 4                                          3.8
## 5                                          3.4
## 6                                          3.0
## 7                                          2.6
## 8                                          2.3
## 9                                          1.5
## 10                                         1.1
## 11                                         0.7
## 12                                         0.6
## 13                                         0.1
## 14                                        -0.9
## 15                                        -1.1
## 16                                        -3.5
## 17                                        -6.6
## 18                                       -14.2
## 19                                       -17.8
##    Dec_2023_Jan_2024_Unadjusted_percent_change
## 1                                          0.7
## 2                                          0.3
## 3                                          0.6
## 4                                          2.8
## 5                                          2.2
## 6                                         -0.6
## 7                                          0.5
## 8                                          0.3
## 9                                          0.2
## 10                                         1.3
## 11                                         0.2
## 12                                         1.0
## 13                                         1.7
## 14                                        -0.1
## 15                                         0.4
## 16                                        -3.7
## 17                                        -1.6
## 18                                        -0.6
## 19                                         1.6
##    Oct_2023_Nov_2023_Seasonally_adjusted_percent_change
## 1                                                   1.0
## 2                                                   1.1
## 3                                                   0.4
## 4                                                   1.0
## 5                                                   0.4
## 6                                                   0.5
## 7                                                  -0.2
## 8                                                  -0.1
## 9                                                   0.3
## 10                                                  0.1
## 11                                                  0.0
## 12                                                  0.5
## 13                                                 -0.6
## 14                                                 -0.2
## 15                                                  0.0
## 16                                                  1.4
## 17                                                 -4.0
## 18                                                 -1.1
## 19                                                  1.2
##    Nov_2023_Dec_2023_Seasonally_adjusted_percent_change
## 1                                                   0.1
## 2                                                  -0.1
## 3                                                   0.4
## 4                                                   0.6
## 5                                                   0.2
## 6                                                  -0.1
## 7                                                   0.2
## 8                                                   0.1
## 9                                                  -0.1
## 10                                                  0.0
## 11                                                  0.2
## 12                                                  0.5
## 13                                                  0.0
## 14                                                  0.3
## 15                                                  0.1
## 16                                                  0.6
## 17                                                 -0.6
## 18                                                 -3.3
## 19                                                 -0.6
##    Dec_2023_Jan_2024_Seasonally_adjusted_percent_change
## 1                                                   1.0
## 2                                                   0.3
## 3                                                   0.6
## 4                                                   1.2
## 5                                                   1.2
## 6                                                  -0.6
## 7                                                   0.6
## 8                                                   0.3
## 9                                                  -0.2
## 10                                                  0.4
## 11                                                  0.0
## 12                                                  0.7
## 13                                                 -0.7
## 14                                                  0.0
## 15                                                  0.2
## 16                                                 -3.4
## 17                                                 -3.3
## 18                                                 -4.5
## 19                                                  2.0

###Conclusion: CPI data contains many more insights that I can be analyzed. here are the questions we answered in the analysis above:

Dataset: “Bureau of Labor Statistics - CPI Report for Jan2024”
First: Tidying the data.

Dataset 2 - US Census: Population growth 2015 - 2020 of Southern States by Natalie Kalukeerthie.

population_data <- read.csv('https://raw.githubusercontent.com/nk014914/Data-607/main/Population_data.csv')

population_data
##             Name FIPS Abbreviations X2015.Population X2016.Population
## 1        Georgia   14            GA         10067378         10189016
## 2 North Carolina   37            NC          9932862         10036881
## 3                  NA                             NA               NA
## 4 South Carolina   45            SC          4817440          4879157
## 5       Virginia   51            VA          8254218          8312400
##   X2017.Population X2018.Population X2019.Population X2020.Population
## 1         10325943         10471428         10604413         10722092
## 2         10151700         10291929         10455811         10598314
## 3               NA               NA               NA               NA
## 4          4954035          5036155          5118397          5196026
## 5          8370206          8420184          8483598          8539322
#cleaning the data

#Removing blank rows and renaming the columns to shorten for just the year
population_data <- population_data %>% 
  drop_na() %>%
  rename(state = Name, x2015 = X2015.Population, x2016 = X2016.Population, x2017 = X2017.Population, x2018 = X2018.Population, x2019 = X2019.Population, x2020 = X2020.Population)

#remove unused columns
population_data_subset <- subset(population_data, select= -c(FIPS,Abbreviations))

#transforming data - putting years into columns to make long data
population_data_subset <- population_data_subset %>% 
  gather(key = 'year', value = 'population', 2:7)

population_data_subset
##             state  year population
## 1         Georgia x2015   10067378
## 2  North Carolina x2015    9932862
## 3  South Carolina x2015    4817440
## 4        Virginia x2015    8254218
## 5         Georgia x2016   10189016
## 6  North Carolina x2016   10036881
## 7  South Carolina x2016    4879157
## 8        Virginia x2016    8312400
## 9         Georgia x2017   10325943
## 10 North Carolina x2017   10151700
## 11 South Carolina x2017    4954035
## 12       Virginia x2017    8370206
## 13        Georgia x2018   10471428
## 14 North Carolina x2018   10291929
## 15 South Carolina x2018    5036155
## 16       Virginia x2018    8420184
## 17        Georgia x2019   10604413
## 18 North Carolina x2019   10455811
## 19 South Carolina x2019    5118397
## 20       Virginia x2019    8483598
## 21        Georgia x2020   10722092
## 22 North Carolina x2020   10598314
## 23 South Carolina x2020    5196026
## 24       Virginia x2020    8539322
#remove x's in year column
population_data_final <- population_data_subset %>%
  mutate(across(c('year'), substr, 2, nchar(year)))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(c("year"), substr, 2, nchar(year))`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
## 
##   # Previously
##   across(a:b, mean, na.rm = TRUE)
## 
##   # Now
##   across(a:b, \(x) mean(x, na.rm = TRUE))
population_data_final
##             state year population
## 1         Georgia 2015   10067378
## 2  North Carolina 2015    9932862
## 3  South Carolina 2015    4817440
## 4        Virginia 2015    8254218
## 5         Georgia 2016   10189016
## 6  North Carolina 2016   10036881
## 7  South Carolina 2016    4879157
## 8        Virginia 2016    8312400
## 9         Georgia 2017   10325943
## 10 North Carolina 2017   10151700
## 11 South Carolina 2017    4954035
## 12       Virginia 2017    8370206
## 13        Georgia 2018   10471428
## 14 North Carolina 2018   10291929
## 15 South Carolina 2018    5036155
## 16       Virginia 2018    8420184
## 17        Georgia 2019   10604413
## 18 North Carolina 2019   10455811
## 19 South Carolina 2019    5118397
## 20       Virginia 2019    8483598
## 21        Georgia 2020   10722092
## 22 North Carolina 2020   10598314
## 23 South Carolina 2020    5196026
## 24       Virginia 2020    8539322

Analysis

We will create a few visuals in order to dig into the raw data a bit deeper, some things I will be looking for are:

  1. Descriptive statistics of the data
  2. Compare yearly growth between the states
  3. See which year had the largest population growth
#First I need to make data as a numeric vector because my y axis was not in increasing order
transform(population_data_final, population = as.numeric(population))
##             state year population
## 1         Georgia 2015   10067378
## 2  North Carolina 2015    9932862
## 3  South Carolina 2015    4817440
## 4        Virginia 2015    8254218
## 5         Georgia 2016   10189016
## 6  North Carolina 2016   10036881
## 7  South Carolina 2016    4879157
## 8        Virginia 2016    8312400
## 9         Georgia 2017   10325943
## 10 North Carolina 2017   10151700
## 11 South Carolina 2017    4954035
## 12       Virginia 2017    8370206
## 13        Georgia 2018   10471428
## 14 North Carolina 2018   10291929
## 15 South Carolina 2018    5036155
## 16       Virginia 2018    8420184
## 17        Georgia 2019   10604413
## 18 North Carolina 2019   10455811
## 19 South Carolina 2019    5118397
## 20       Virginia 2019    8483598
## 21        Georgia 2020   10722092
## 22 North Carolina 2020   10598314
## 23 South Carolina 2020    5196026
## 24       Virginia 2020    8539322
#checking if column is now numeric
class(population_data_final$population)
## [1] "numeric"
#Determine the summary statistics on population per year of the four states
population_summary <- population_data_final %>%
  group_by(year) %>%
  summarize(mean = mean(population), median = median(population), min = min(population), max = max(population), na.rm = T) %>%
  arrange(mean)

population_summary 
## # A tibble: 6 × 6
##   year      mean   median     min      max na.rm
##   <chr>    <dbl>    <dbl>   <dbl>    <dbl> <lgl>
## 1 2015  8267974. 9093540  4817440 10067378 TRUE 
## 2 2016  8354364. 9174640. 4879157 10189016 TRUE 
## 3 2017  8450471  9260953  4954035 10325943 TRUE 
## 4 2018  8554924  9356056. 5036155 10471428 TRUE 
## 5 2019  8665555. 9469704. 5118397 10604413 TRUE 
## 6 2020  8763938. 9568818  5196026 10722092 TRUE
#visualize growth of population per state
ggplot(data=population_data_final, aes(x=year, y= population, group = state)) +
  geom_line(aes(color=state))+
  geom_point(aes(color=state))

Viewing the yearly state populations plotted, we can see a few things: - Georgia has the largest overall population every year, while South Carolina has the smallest (by a large difference in comparison to the other states.)

#Determining which year has the largest population growth

#sum of population per year
sum2015 = sum( population_data_final$population [ population_data_final$year==2015] )
sum2016 = sum( population_data_final$population [ population_data_final$year==2016] )
sum2017 = sum( population_data_final$population [ population_data_final$year==2017] )
sum2018 = sum( population_data_final$population [ population_data_final$year==2018] )
sum2019 = sum( population_data_final$population [ population_data_final$year==2019] )
sum2020 = sum( population_data_final$population [ population_data_final$year==2020] )

population_total = c(sum2015,sum2016,sum2017,sum2018,sum2019,sum2020)

#add column with years and create dataframe
year = c('2015','2016','2017','2018','2019','2020')
sum_population = as.data.frame(year)

sum_population$population_total <- population_total

sum_population
##   year population_total
## 1 2015         33071898
## 2 2016         33417454
## 3 2017         33801884
## 4 2018         34219696
## 5 2019         34662219
## 6 2020         35055754
#calculate percent change per year over year, we will use th collapse package
library(collapse)
## collapse 2.0.10, see ?`collapse-package` or ?`collapse-documentation`
## 
## Attaching package: 'collapse'
## The following object is masked from 'package:lubridate':
## 
##     is.Date
## The following object is masked from 'package:tidyr':
## 
##     replace_na
## The following object is masked from 'package:stats':
## 
##     D
sum_population |> fmutate(growth = fgrowth(population_total))
##   year population_total   growth
## 1 2015         33071898       NA
## 2 2016         33417454 1.044863
## 3 2017         33801884 1.150387
## 4 2018         34219696 1.236061
## 5 2019         34662219 1.293182
## 6 2020         35055754 1.135343

Conclusion

Based on the analysis, we can see the largest population growth of the four states combined is between 2018 o 2019 with a growth of around 1.29. The smallest growth was from 2015 to 2016 at 1.04. An interesting find I saw was that the growth percentage was steadily increasing each year until 2019-2020 we it dropped to 1.13 the second lowest population growth of all 6 years.

—— Please Help with below if possible, below is a draft and will continued and submitted later ———–——————————————————————————————————————————————

Dataset 3 - Nobel Prize Winners.

First we load the data from the stored location on my Github.

nobel <- read.csv("https://raw.githubusercontent.com/sokkarbishoy/DATA607/main/complete(1).csv")

Tidying the data

Questions to answer with the analysis:
Which field has received the largest prize amount? Which field has received the lowest?
What proportion of the awards are given to women vs men? How has this changed over time?

Looking at the data, there were some empty values in the Gender category. Upon closer look it appears to be organizations. So in the code below we will replace empty values in the gender variable to Organization, And then compare the proportion of awards given to Males, Females, and organizations.

nobel$gender[is.na(nobel$gender)] <- "Organization"

###Analysis

gender_freq <- table(nobel$gender)

gender_data <- data.frame(gender = names(gender_freq), frequency = as.numeric(gender_freq))

ggplot(gender_data, aes(x = gender, y = frequency, fill = gender)) +
  geom_bar(stat = "identity") +
  labs(title = "Frequency of Gender/Organization in Nobel Dataset",
       x = "Gender/Organization", y = "Frequency") +
  scale_fill_manual(values = c("male" = "lightblue", "female" = "lightpink", "organization" = "lightgreen"))

As of now I am working on my last dataset and will be finished before the lecture.
My work will be in tidying and removing the NA values with actual values to answer the two questions above. Please advice of the best way if possible.