The source of this data set is from the website https://github.com/kwstat/untidydata2/blob/main/inst/messydata/gdp_by_county.xlsx. In this dataset I have used Current Dollar GDP sheet.
The purpose of this dataset is to analyse how GDP of different county’s in USA is varying from 2012 to 2014.
Variable description is as shown below:
Characteristics of Dataset There are numeric and char data types. Via summary of dataset we found there are null values in linecode and year columns.
Loading all the required libraries and packages
readxl - to read xlsx file
tidyverse - to transform and manipulate data
ggplot2 - for visualisation plots
library("readxl")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(kableExtra)
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
Data summary - str(), summary()
colnames - to change the column names
rm(list = ls())
library("readxl")
gdp <- read_excel("Assignment1-Kandru-gdp-by-county-Data-Wrangling-and-Visualization.xlsx",col_names=TRUE,skip=3,sheet=1, na = "(D)")
## New names:
## • `` -> `...1`
## • `` -> `...2`
## • `` -> `...3`
## • `` -> `...4`
## • `` -> `...5`
colnames(gdp)[c(1,2,3,4,5)] = c('FIPS','Countyname','postalcode','linecode','industry_name')
str(gdp)
## tibble [12,456 × 9] (S3: tbl_df/tbl/data.frame)
## $ FIPS : chr [1:12456] "01001" "01001" "01001" "01001" ...
## $ Countyname : chr [1:12456] "Autauga" "Autauga" "Autauga" "Autauga" ...
## $ postalcode : chr [1:12456] "AL" "AL" "AL" "AL" ...
## $ linecode : num [1:12456] 1 2 3 4 1 2 3 4 1 2 ...
## $ industry_name: chr [1:12456] "All Industries" "Private goods-producing industries" "Private services-providing industries" "Government and government enterprises" ...
## $ 2012 : num [1:12456] 1383941 286396 948490 149055 5599194 ...
## $ 2013 : num [1:12456] 1363368 310468 904599 148301 6365080 ...
## $ 2014 : num [1:12456] 1402516 323582 928438 150496 6547396 ...
## $ 2015 : num [1:12456] 1539406 346355 1037309 155742 6436107 ...
summary(gdp)
## FIPS Countyname postalcode linecode
## Length:12456 Length:12456 Length:12456 Min. :1.00
## Class :character Class :character Class :character 1st Qu.:1.75
## Mode :character Mode :character Mode :character Median :2.50
## Mean :2.50
## 3rd Qu.:3.25
## Max. :4.00
## NA's :4
## industry_name 2012 2013 2014
## Length:12456 Min. : 735 Min. : 1069 Min. : 1088
## Class :character 1st Qu.: 99618 1st Qu.: 107671 1st Qu.: 107457
## Mode :character Median : 300890 Median : 323052 Median : 329138
## Mean : 2590566 Mean : 2716786 Mean : 2833624
## 3rd Qu.: 1038078 3rd Qu.: 1101818 3rd Qu.: 1134311
## Max. :610107941 Max. :619538778 Max. :657009064
## NA's :46 NA's :202 NA's :192
## 2015
## Min. : 366
## 1st Qu.: 106921
## Median : 326972
## Mean : 2932933
## 3rd Qu.: 1136410
## Max. :691948578
## NA's :124
kbl(gdp[1:30,]) %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| FIPS | Countyname | postalcode | linecode | industry_name | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|---|---|---|
| 01001 | Autauga | AL | 1 | All Industries | 1383941 | 1363368 | 1402516 | 1539406 |
| 01001 | Autauga | AL | 2 | Private goods-producing industries | 286396 | 310468 | 323582 | 346355 |
| 01001 | Autauga | AL | 3 | Private services-providing industries | 948490 | 904599 | 928438 | 1037309 |
| 01001 | Autauga | AL | 4 | Government and government enterprises | 149055 | 148301 | 150496 | 155742 |
| 01003 | Baldwin | AL | 1 | All Industries | 5599194 | 6365080 | 6547396 | 6436107 |
| 01003 | Baldwin | AL | 2 | Private goods-producing industries | 681871 | 698500 | 711443 | 735432 |
| 01003 | Baldwin | AL | 3 | Private services-providing industries | 4306173 | 5037325 | 5194273 | 5023903 |
| 01003 | Baldwin | AL | 4 | Government and government enterprises | 611151 | 629255 | 641681 | 676772 |
| 01005 | Barbour | AL | 1 | All Industries | 639833 | 701750 | 689212 | 743779 |
| 01005 | Barbour | AL | 2 | Private goods-producing industries | 253232 | 303099 | 286467 | 307299 |
| 01005 | Barbour | AL | 3 | Private services-providing industries | 274894 | 288270 | 291688 | 324052 |
| 01005 | Barbour | AL | 4 | Government and government enterprises | 111707 | 110381 | 111057 | 112428 |
| 01007 | Bibb | AL | 1 | All Industries | 297560 | 325906 | 329087 | 322307 |
| 01007 | Bibb | AL | 2 | Private goods-producing industries | 98048 | 113691 | 108170 | 102984 |
| 01007 | Bibb | AL | 3 | Private services-providing industries | 116767 | 129718 | 138327 | 133371 |
| 01007 | Bibb | AL | 4 | Government and government enterprises | 82746 | 82498 | 82591 | 85952 |
| 01009 | Blount | AL | 1 | All Industries | 632761 | 701145 | 688525 | 819608 |
| 01009 | Blount | AL | 2 | Private goods-producing industries | 187947 | 235705 | 206791 | 201373 |
| 01009 | Blount | AL | 3 | Private services-providing industries | 310042 | 330581 | 347601 | 481303 |
| 01009 | Blount | AL | 4 | Government and government enterprises | 134773 | 134859 | 134133 | 136932 |
| 01011 | Bullock | AL | 1 | All Industries | 191052 | 190103 | 178408 | 178902 |
| 01011 | Bullock | AL | 2 | Private goods-producing industries | 67063 | 71650 | 60743 | 61008 |
| 01011 | Bullock | AL | 3 | Private services-providing industries | 78889 | 72767 | 71542 | 71217 |
| 01011 | Bullock | AL | 4 | Government and government enterprises | 45101 | 45686 | 46123 | 46678 |
| 01013 | Butler | AL | 1 | All Industries | 514216 | 522547 | 512826 | 542129 |
| 01013 | Butler | AL | 2 | Private goods-producing industries | 189427 | 166962 | 155406 | 156906 |
| 01013 | Butler | AL | 3 | Private services-providing industries | 263801 | 294015 | 294944 | 321026 |
| 01013 | Butler | AL | 4 | Government and government enterprises | 60988 | 61569 | 62476 | 64197 |
| 01015 | Calhoun | AL | 1 | All Industries | 4073476 | 3988587 | 3943024 | 3999134 |
| 01015 | Calhoun | AL | 2 | Private goods-producing industries | 1051376 | 990407 | 868396 | 849069 |
gdp <- na.omit(gdp)
colSums(is.na(gdp))
## FIPS Countyname postalcode linecode industry_name
## 0 0 0 0 0
## 2012 2013 2014 2015
## 0 0 0 0
gdp <- subset(gdp, select = -c(linecode))
kbl(gdp[1:30,]) %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| FIPS | Countyname | postalcode | industry_name | 2012 | 2013 | 2014 | 2015 |
|---|---|---|---|---|---|---|---|
| 01001 | Autauga | AL | All Industries | 1383941 | 1363368 | 1402516 | 1539406 |
| 01001 | Autauga | AL | Private goods-producing industries | 286396 | 310468 | 323582 | 346355 |
| 01001 | Autauga | AL | Private services-providing industries | 948490 | 904599 | 928438 | 1037309 |
| 01001 | Autauga | AL | Government and government enterprises | 149055 | 148301 | 150496 | 155742 |
| 01003 | Baldwin | AL | All Industries | 5599194 | 6365080 | 6547396 | 6436107 |
| 01003 | Baldwin | AL | Private goods-producing industries | 681871 | 698500 | 711443 | 735432 |
| 01003 | Baldwin | AL | Private services-providing industries | 4306173 | 5037325 | 5194273 | 5023903 |
| 01003 | Baldwin | AL | Government and government enterprises | 611151 | 629255 | 641681 | 676772 |
| 01005 | Barbour | AL | All Industries | 639833 | 701750 | 689212 | 743779 |
| 01005 | Barbour | AL | Private goods-producing industries | 253232 | 303099 | 286467 | 307299 |
| 01005 | Barbour | AL | Private services-providing industries | 274894 | 288270 | 291688 | 324052 |
| 01005 | Barbour | AL | Government and government enterprises | 111707 | 110381 | 111057 | 112428 |
| 01007 | Bibb | AL | All Industries | 297560 | 325906 | 329087 | 322307 |
| 01007 | Bibb | AL | Private goods-producing industries | 98048 | 113691 | 108170 | 102984 |
| 01007 | Bibb | AL | Private services-providing industries | 116767 | 129718 | 138327 | 133371 |
| 01007 | Bibb | AL | Government and government enterprises | 82746 | 82498 | 82591 | 85952 |
| 01009 | Blount | AL | All Industries | 632761 | 701145 | 688525 | 819608 |
| 01009 | Blount | AL | Private goods-producing industries | 187947 | 235705 | 206791 | 201373 |
| 01009 | Blount | AL | Private services-providing industries | 310042 | 330581 | 347601 | 481303 |
| 01009 | Blount | AL | Government and government enterprises | 134773 | 134859 | 134133 | 136932 |
| 01011 | Bullock | AL | All Industries | 191052 | 190103 | 178408 | 178902 |
| 01011 | Bullock | AL | Private goods-producing industries | 67063 | 71650 | 60743 | 61008 |
| 01011 | Bullock | AL | Private services-providing industries | 78889 | 72767 | 71542 | 71217 |
| 01011 | Bullock | AL | Government and government enterprises | 45101 | 45686 | 46123 | 46678 |
| 01013 | Butler | AL | All Industries | 514216 | 522547 | 512826 | 542129 |
| 01013 | Butler | AL | Private goods-producing industries | 189427 | 166962 | 155406 | 156906 |
| 01013 | Butler | AL | Private services-providing industries | 263801 | 294015 | 294944 | 321026 |
| 01013 | Butler | AL | Government and government enterprises | 60988 | 61569 | 62476 | 64197 |
| 01015 | Calhoun | AL | All Industries | 4073476 | 3988587 | 3943024 | 3999134 |
| 01015 | Calhoun | AL | Private goods-producing industries | 1051376 | 990407 | 868396 | 849069 |
This is untidy data as it violates below mentioned tidy principles
Not Each variable have its own column.
Not Each observation have its own row.
Using Pivot_longer
Here in this dataset, we have a set of columns(5,6,7,8) with names (2012,2013,2014,2015) which are values but not variables.
So applying Pivot_longer function, converting values to variables by creating another new columns named ‘Years’ and ‘GDP’
gdp_updated = gdp %>% pivot_longer(cols=c(starts_with('2012'),starts_with('2013'),starts_with('2014'),starts_with('2015')), names_to = "year", values_to = "GDP")
kbl(gdp_updated[1:35,]) %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| FIPS | Countyname | postalcode | industry_name | year | GDP |
|---|---|---|---|---|---|
| 01001 | Autauga | AL | All Industries | 2012 | 1383941 |
| 01001 | Autauga | AL | All Industries | 2013 | 1363368 |
| 01001 | Autauga | AL | All Industries | 2014 | 1402516 |
| 01001 | Autauga | AL | All Industries | 2015 | 1539406 |
| 01001 | Autauga | AL | Private goods-producing industries | 2012 | 286396 |
| 01001 | Autauga | AL | Private goods-producing industries | 2013 | 310468 |
| 01001 | Autauga | AL | Private goods-producing industries | 2014 | 323582 |
| 01001 | Autauga | AL | Private goods-producing industries | 2015 | 346355 |
| 01001 | Autauga | AL | Private services-providing industries | 2012 | 948490 |
| 01001 | Autauga | AL | Private services-providing industries | 2013 | 904599 |
| 01001 | Autauga | AL | Private services-providing industries | 2014 | 928438 |
| 01001 | Autauga | AL | Private services-providing industries | 2015 | 1037309 |
| 01001 | Autauga | AL | Government and government enterprises | 2012 | 149055 |
| 01001 | Autauga | AL | Government and government enterprises | 2013 | 148301 |
| 01001 | Autauga | AL | Government and government enterprises | 2014 | 150496 |
| 01001 | Autauga | AL | Government and government enterprises | 2015 | 155742 |
| 01003 | Baldwin | AL | All Industries | 2012 | 5599194 |
| 01003 | Baldwin | AL | All Industries | 2013 | 6365080 |
| 01003 | Baldwin | AL | All Industries | 2014 | 6547396 |
| 01003 | Baldwin | AL | All Industries | 2015 | 6436107 |
| 01003 | Baldwin | AL | Private goods-producing industries | 2012 | 681871 |
| 01003 | Baldwin | AL | Private goods-producing industries | 2013 | 698500 |
| 01003 | Baldwin | AL | Private goods-producing industries | 2014 | 711443 |
| 01003 | Baldwin | AL | Private goods-producing industries | 2015 | 735432 |
| 01003 | Baldwin | AL | Private services-providing industries | 2012 | 4306173 |
| 01003 | Baldwin | AL | Private services-providing industries | 2013 | 5037325 |
| 01003 | Baldwin | AL | Private services-providing industries | 2014 | 5194273 |
| 01003 | Baldwin | AL | Private services-providing industries | 2015 | 5023903 |
| 01003 | Baldwin | AL | Government and government enterprises | 2012 | 611151 |
| 01003 | Baldwin | AL | Government and government enterprises | 2013 | 629255 |
| 01003 | Baldwin | AL | Government and government enterprises | 2014 | 641681 |
| 01003 | Baldwin | AL | Government and government enterprises | 2015 | 676772 |
| 01005 | Barbour | AL | All Industries | 2012 | 639833 |
| 01005 | Barbour | AL | All Industries | 2013 | 701750 |
| 01005 | Barbour | AL | All Industries | 2014 | 689212 |
##Using Pivot_wider
Here in this dataset, different industry names are scattered accross multiple rows(which violates tidy priciple).
So by applying pivot_wider function to scatter all four types of industry names(column)
gdp_final = gdp_updated %>% pivot_wider(names_from = "industry_name", values_from = "GDP")
kbl(gdp_final[1:35,]) %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| FIPS | Countyname | postalcode | year | All Industries | Private goods-producing industries | Private services-providing industries | Government and government enterprises |
|---|---|---|---|---|---|---|---|
| 01001 | Autauga | AL | 2012 | 1383941 | 286396 | 948490 | 149055 |
| 01001 | Autauga | AL | 2013 | 1363368 | 310468 | 904599 | 148301 |
| 01001 | Autauga | AL | 2014 | 1402516 | 323582 | 928438 | 150496 |
| 01001 | Autauga | AL | 2015 | 1539406 | 346355 | 1037309 | 155742 |
| 01003 | Baldwin | AL | 2012 | 5599194 | 681871 | 4306173 | 611151 |
| 01003 | Baldwin | AL | 2013 | 6365080 | 698500 | 5037325 | 629255 |
| 01003 | Baldwin | AL | 2014 | 6547396 | 711443 | 5194273 | 641681 |
| 01003 | Baldwin | AL | 2015 | 6436107 | 735432 | 5023903 | 676772 |
| 01005 | Barbour | AL | 2012 | 639833 | 253232 | 274894 | 111707 |
| 01005 | Barbour | AL | 2013 | 701750 | 303099 | 288270 | 110381 |
| 01005 | Barbour | AL | 2014 | 689212 | 286467 | 291688 | 111057 |
| 01005 | Barbour | AL | 2015 | 743779 | 307299 | 324052 | 112428 |
| 01007 | Bibb | AL | 2012 | 297560 | 98048 | 116767 | 82746 |
| 01007 | Bibb | AL | 2013 | 325906 | 113691 | 129718 | 82498 |
| 01007 | Bibb | AL | 2014 | 329087 | 108170 | 138327 | 82591 |
| 01007 | Bibb | AL | 2015 | 322307 | 102984 | 133371 | 85952 |
| 01009 | Blount | AL | 2012 | 632761 | 187947 | 310042 | 134773 |
| 01009 | Blount | AL | 2013 | 701145 | 235705 | 330581 | 134859 |
| 01009 | Blount | AL | 2014 | 688525 | 206791 | 347601 | 134133 |
| 01009 | Blount | AL | 2015 | 819608 | 201373 | 481303 | 136932 |
| 01011 | Bullock | AL | 2012 | 191052 | 67063 | 78889 | 45101 |
| 01011 | Bullock | AL | 2013 | 190103 | 71650 | 72767 | 45686 |
| 01011 | Bullock | AL | 2014 | 178408 | 60743 | 71542 | 46123 |
| 01011 | Bullock | AL | 2015 | 178902 | 61008 | 71217 | 46678 |
| 01013 | Butler | AL | 2012 | 514216 | 189427 | 263801 | 60988 |
| 01013 | Butler | AL | 2013 | 522547 | 166962 | 294015 | 61569 |
| 01013 | Butler | AL | 2014 | 512826 | 155406 | 294944 | 62476 |
| 01013 | Butler | AL | 2015 | 542129 | 156906 | 321026 | 64197 |
| 01015 | Calhoun | AL | 2012 | 4073476 | 1051376 | 1951870 | 1070229 |
| 01015 | Calhoun | AL | 2013 | 3988587 | 990407 | 1995846 | 1002334 |
| 01015 | Calhoun | AL | 2014 | 3943024 | 868396 | 2036779 | 1037850 |
| 01015 | Calhoun | AL | 2015 | 3999134 | 849069 | 2086608 | 1063457 |
| 01017 | Chambers | AL | 2012 | 567650 | 177977 | 301425 | 88248 |
| 01017 | Chambers | AL | 2013 | 629384 | 214956 | 325399 | 89029 |
| 01017 | Chambers | AL | 2014 | 687210 | 220871 | 346159 | 120180 |
Each variable have its own column (achieved by using pivot_longer)
Each observation must have its own row (acheived by using pivot_wider)
No cell has has two values
colnames(gdp_final)[c(5,6,7,8)] = c('All_Industries','Private_goods_producing_industries', 'Private_services_producing_industries','Government_and_government_enterprises')
kbl(gdp_final[1:35,]) %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| FIPS | Countyname | postalcode | year | All_Industries | Private_goods_producing_industries | Private_services_producing_industries | Government_and_government_enterprises |
|---|---|---|---|---|---|---|---|
| 01001 | Autauga | AL | 2012 | 1383941 | 286396 | 948490 | 149055 |
| 01001 | Autauga | AL | 2013 | 1363368 | 310468 | 904599 | 148301 |
| 01001 | Autauga | AL | 2014 | 1402516 | 323582 | 928438 | 150496 |
| 01001 | Autauga | AL | 2015 | 1539406 | 346355 | 1037309 | 155742 |
| 01003 | Baldwin | AL | 2012 | 5599194 | 681871 | 4306173 | 611151 |
| 01003 | Baldwin | AL | 2013 | 6365080 | 698500 | 5037325 | 629255 |
| 01003 | Baldwin | AL | 2014 | 6547396 | 711443 | 5194273 | 641681 |
| 01003 | Baldwin | AL | 2015 | 6436107 | 735432 | 5023903 | 676772 |
| 01005 | Barbour | AL | 2012 | 639833 | 253232 | 274894 | 111707 |
| 01005 | Barbour | AL | 2013 | 701750 | 303099 | 288270 | 110381 |
| 01005 | Barbour | AL | 2014 | 689212 | 286467 | 291688 | 111057 |
| 01005 | Barbour | AL | 2015 | 743779 | 307299 | 324052 | 112428 |
| 01007 | Bibb | AL | 2012 | 297560 | 98048 | 116767 | 82746 |
| 01007 | Bibb | AL | 2013 | 325906 | 113691 | 129718 | 82498 |
| 01007 | Bibb | AL | 2014 | 329087 | 108170 | 138327 | 82591 |
| 01007 | Bibb | AL | 2015 | 322307 | 102984 | 133371 | 85952 |
| 01009 | Blount | AL | 2012 | 632761 | 187947 | 310042 | 134773 |
| 01009 | Blount | AL | 2013 | 701145 | 235705 | 330581 | 134859 |
| 01009 | Blount | AL | 2014 | 688525 | 206791 | 347601 | 134133 |
| 01009 | Blount | AL | 2015 | 819608 | 201373 | 481303 | 136932 |
| 01011 | Bullock | AL | 2012 | 191052 | 67063 | 78889 | 45101 |
| 01011 | Bullock | AL | 2013 | 190103 | 71650 | 72767 | 45686 |
| 01011 | Bullock | AL | 2014 | 178408 | 60743 | 71542 | 46123 |
| 01011 | Bullock | AL | 2015 | 178902 | 61008 | 71217 | 46678 |
| 01013 | Butler | AL | 2012 | 514216 | 189427 | 263801 | 60988 |
| 01013 | Butler | AL | 2013 | 522547 | 166962 | 294015 | 61569 |
| 01013 | Butler | AL | 2014 | 512826 | 155406 | 294944 | 62476 |
| 01013 | Butler | AL | 2015 | 542129 | 156906 | 321026 | 64197 |
| 01015 | Calhoun | AL | 2012 | 4073476 | 1051376 | 1951870 | 1070229 |
| 01015 | Calhoun | AL | 2013 | 3988587 | 990407 | 1995846 | 1002334 |
| 01015 | Calhoun | AL | 2014 | 3943024 | 868396 | 2036779 | 1037850 |
| 01015 | Calhoun | AL | 2015 | 3999134 | 849069 | 2086608 | 1063457 |
| 01017 | Chambers | AL | 2012 | 567650 | 177977 | 301425 | 88248 |
| 01017 | Chambers | AL | 2013 | 629384 | 214956 | 325399 | 89029 |
| 01017 | Chambers | AL | 2014 | 687210 | 220871 | 346159 | 120180 |
major_county <- filter(gdp_final,Countyname %in% c("Los Angeles", "New York", "Harris", "Santa Clara") & postalcode %in% c("CA", "NY", "TX", "CA"))
kbl(major_county[1:15,]) %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| FIPS | Countyname | postalcode | year | All_Industries | Private_goods_producing_industries | Private_services_producing_industries | Government_and_government_enterprises |
|---|---|---|---|---|---|---|---|
| 06037 | Los Angeles | CA | 2012 | 593671075 | 70641406 | 464547602 | 58482067 |
| 06037 | Los Angeles | CA | 2013 | 617139519 | 74129717 | 482219285 | 60790517 |
| 06037 | Los Angeles | CA | 2014 | 654038890 | 75338855 | 514271617 | 64428418 |
| 06037 | Los Angeles | CA | 2015 | 691948578 | 75466445 | 547957509 | 68524623 |
| 06085 | Santa Clara | CA | 2012 | 195550832 | 55682457 | 130284946 | 9583429 |
| 06085 | Santa Clara | CA | 2013 | 213384895 | 58899838 | 144409875 | 10075181 |
| 06085 | Santa Clara | CA | 2014 | 233442273 | 62954143 | 159775321 | 10712810 |
| 06085 | Santa Clara | CA | 2015 | 262053352 | 69535372 | 181130859 | 11387121 |
| 36061 | New York | NY | 2012 | 610107941 | 7877198 | 574476082 | 27754660 |
| 36061 | New York | NY | 2013 | 619538778 | 8217717 | 581904074 | 29416987 |
| 36061 | New York | NY | 2014 | 657009064 | 8988338 | 617415093 | 30605633 |
| 36061 | New York | NY | 2015 | 690040650 | 10101652 | 648976531 | 30962467 |
| 48201 | Harris | TX | 2012 | 377095884 | 147166060 | 207419397 | 22510427 |
| 48201 | Harris | TX | 2013 | 395892886 | 158361188 | 213813227 | 23718471 |
| 48201 | Harris | TX | 2014 | 402388140 | 151076820 | 226394473 | 24916846 |
ggplot(data = major_county) +
geom_point(mapping = aes(x = year, y = All_Industries,color=Countyname))
Santa Clara county has low GDP from 2012 to 2015. However, it is increasing gradually.
New York and Los Angeles county are very competitive and have almost negligible gdp difference.
major_county_mutated = major_county %>%
mutate(
avg_gdp = All_Industries+Private_goods_producing_industries+Private_services_producing_industries+Government_and_government_enterprises
)
kbl(major_county_mutated[1:15,]) %>%
kable_paper(bootstrap_options = "striped", full_width = F)
| FIPS | Countyname | postalcode | year | All_Industries | Private_goods_producing_industries | Private_services_producing_industries | Government_and_government_enterprises | avg_gdp |
|---|---|---|---|---|---|---|---|---|
| 06037 | Los Angeles | CA | 2012 | 593671075 | 70641406 | 464547602 | 58482067 | 1187342150 |
| 06037 | Los Angeles | CA | 2013 | 617139519 | 74129717 | 482219285 | 60790517 | 1234279038 |
| 06037 | Los Angeles | CA | 2014 | 654038890 | 75338855 | 514271617 | 64428418 | 1308077780 |
| 06037 | Los Angeles | CA | 2015 | 691948578 | 75466445 | 547957509 | 68524623 | 1383897155 |
| 06085 | Santa Clara | CA | 2012 | 195550832 | 55682457 | 130284946 | 9583429 | 391101664 |
| 06085 | Santa Clara | CA | 2013 | 213384895 | 58899838 | 144409875 | 10075181 | 426769789 |
| 06085 | Santa Clara | CA | 2014 | 233442273 | 62954143 | 159775321 | 10712810 | 466884547 |
| 06085 | Santa Clara | CA | 2015 | 262053352 | 69535372 | 181130859 | 11387121 | 524106704 |
| 36061 | New York | NY | 2012 | 610107941 | 7877198 | 574476082 | 27754660 | 1220215881 |
| 36061 | New York | NY | 2013 | 619538778 | 8217717 | 581904074 | 29416987 | 1239077556 |
| 36061 | New York | NY | 2014 | 657009064 | 8988338 | 617415093 | 30605633 | 1314018128 |
| 36061 | New York | NY | 2015 | 690040650 | 10101652 | 648976531 | 30962467 | 1380081300 |
| 48201 | Harris | TX | 2012 | 377095884 | 147166060 | 207419397 | 22510427 | 754191768 |
| 48201 | Harris | TX | 2013 | 395892886 | 158361188 | 213813227 | 23718471 | 791785772 |
| 48201 | Harris | TX | 2014 | 402388140 | 151076820 | 226394473 | 24916846 | 804776279 |
theme_set(theme_bw())
ggplot(data = major_county_mutated, mapping = aes(x = year, y = avg_gdp)) +
geom_boxplot()+ labs(x="Year", y="avg_adp",title = "Box Plot of average gdp by year")
From the plots we can conclude that USA has been increased its GDP from 2012 to 2015. Apparently, New York and Los Angles are one of the top countys which have high GDP growth.