For this project, I used three datasets from the Week 5 Discussion 5A post. I prepared each one by creating a .csv file and importing the data, then tidied the data and performed an analysis on it. I also made sure that the code in the Quarto Markdown file is reproducible in a clean environment. The process follows what we did in Assignment 5A with the Airline Delays data, since that assignment is very similar to this one.
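As a rough illustration of the import-and-tidy workflow described above, the sketch below writes a tiny wide-format CSV to a temporary file, reads it back with readr, and reshapes it to one row per country and year. The wide layout (one column per year), the temporary file, and the names `wide_df` and `long_df` are assumptions for illustration, not the actual project files.

```r
library(readr)
library(tidyr)

# Hypothetical wide-format file: one column per year (assumed layout).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("Country,Country Code,1960,1961",
    "United States,USA,543300000000,563300000000"),
  csv_path
)

# Import the file, then reshape to one row per country and year.
wide_df <- read_csv(csv_path, show_col_types = FALSE)
long_df <- wide_df |>
  pivot_longer(
    cols = -c(Country, `Country Code`),
    names_to = "Year",
    values_to = "GDP"
  )

long_df
```

Because the file is created inside the chunk, the sketch runs in a clean environment without depending on a local path.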
Dataset 3:
World GDP by Country: 1960-2022 posted by Sinem Kilicdere
Next, I filtered the data down to the US-only rows; the filtered tibble is printed below. I then converted the GDP values to billions by dividing each value by 1,000,000,000, using the following code chunk.
# A tibble: 63 × 4
   Country       `Country Code` Year            GDP
   <chr>         <chr>          <chr>         <dbl>
 1 United States USA            1960   543300000000
 2 United States USA            1961   563300000000
 3 United States USA            1962   605100000000
 4 United States USA            1963   638600000000
 5 United States USA            1964   685800000000
 6 United States USA            1965   743700000000
 7 United States USA            1966   815000000000
 8 United States USA            1967   861700000000
 9 United States USA            1968   942500000000
10 United States USA            1969  1019900000000
# ℹ 53 more rows
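The filter step that produced the output above is not shown, so here is a minimal sketch of how it might look with dplyr. The data frame name `GDP_long_df` and the `"CHN"` row are assumptions added for illustration; the two US rows reuse values from the printed tibble.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the tidied data (name and China row are assumed).
GDP_long_df <- tribble(
  ~Country,        ~`Country Code`, ~Year,  ~GDP,
  "United States", "USA",           "1960", 543300000000,
  "United States", "USA",           "1961", 563300000000,
  "China",         "CHN",           "1960", 59716250000
)

# Keep only the rows whose country code is USA.
US_only_GDP_df <- GDP_long_df |>
  filter(`Country Code` == "USA")

US_only_GDP_df
```

Filtering on the country code rather than the country name avoids mismatches caused by alternate spellings of the name.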
US_only_GDP_df <- US_only_GDP_df |>
  mutate(GDP = as.numeric(GDP) / 1000000000)

US_only_GDP_df |>
  gt() |>
  cols_hide(columns = c(`Country`, `Country Code`)) |>
  tab_header(title = "US GDP by Year (billions)")
US GDP by Year (billions)
Year         GDP
1960     543.300
1961     563.300
1962     605.100
1963     638.600
1964     685.800
1965     743.700
1966     815.000
1967     861.700
1968     942.500
1969    1019.900
1970    1073.303
1971    1164.850
1972    1279.110
1973    1425.376
1974    1545.243
1975    1684.904
1976    1873.412
1977    2081.826
1978    2351.599
1979    2627.333
1980    2857.307
1981    3207.041
1982    3343.789
1983    3634.038
1984    4037.613
1985    4338.979
1986    4579.631
1987    4855.215
1988    5236.438
1989    5641.580
1990    5963.144
1991    6158.129
1992    6520.327
1993    6858.559
1994    7287.236
1995    7639.749
1996    8073.122
1997    8577.554
1998    9062.818
1999    9631.174
2000   10250.948
2001   10581.930
2002   10929.113
2003   11456.442
2004   12217.193
2005   13039.199
2006   13815.587
2007   14474.227
2008   14769.858
2009   14478.065
2010   15048.964
2011   15599.728
2012   16253.972
2013   16843.191
2014   17550.680
2015   18206.021
2016   18695.111
2017   19477.337
2018   20533.057
2019   21380.976
2020   21060.474
2021   23315.081
2022   25462.700
I created a line chart of the US GDP data using the following code chunk, since trends are easier to see in a chart than in the table.
library(ggplot2)

# Year is stored as character, so convert it for a continuous x axis.
# GDP was divided by 1e9 earlier, so the y axis is in billions.
ggplot(US_only_GDP_df, aes(x = as.numeric(Year), y = GDP)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_x_continuous(breaks = seq(1960, 2022, by = 10)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(
    title = "US GDP Over Time",
    subtitle = "GDP (billions)",
    x = "Year",
    y = "GDP (billions)"
  )
The plot shows US GDP trending upward between 1960 and 2022, from about 543 billion to about 25,463 billion, with year-over-year declines only in 2009 and 2020.
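To back up the visual read of the trend, one option is to compute year-over-year changes and look for negative values. The sketch below does this for four US values copied from the table above; the names `us_sample`, `yoy_change`, and `declines` are illustrative and not part of the original analysis.

```r
library(dplyr)
library(tibble)

# A few US values (billions) copied from the table above.
us_sample <- tibble(
  Year = c("2007", "2008", "2009", "2010"),
  GDP  = c(14474.227, 14769.858, 14478.065, 15048.964)
)

# Year-over-year change; the first row has no previous year, so it is NA
# and is dropped by filter().
declines <- us_sample |>
  mutate(yoy_change = GDP - lag(GDP)) |>
  filter(yoy_change < 0)

declines
```

Running the same pipeline on the full `US_only_GDP_df` would list every year in which GDP fell.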
Next, I created a data frame of China’s GDP data by using a filter. I also converted China’s GDP values to billions by dividing each value by 1,000,000,000.
# A tibble: 63 × 4
   Country       `Country Code` Year    GDP
   <chr>         <chr>          <chr> <dbl>
 1 United States USA            1960   543.
 2 United States USA            1961   563.
 3 United States USA            1962   605.
 4 United States USA            1963   639.
 5 United States USA            1964   686.
 6 United States USA            1965   744.
 7 United States USA            1966   815
 8 United States USA            1967   862.
 9 United States USA            1968   942.
10 United States USA            1969  1020.
# ℹ 53 more rows
China_only_GDP_df <- China_only_GDP_df |>
  mutate(GDP = as.numeric(GDP) / 1000000000)

China_only_GDP_df |>
  gt() |>
  cols_hide(columns = c(`Country`, `Country Code`)) |>
  tab_header(title = "China GDP by Year (billions)")
China GDP by Year (billions)
Year           GDP
1960      59.71625
1961      50.05669
1962      47.20919
1963      50.70662
1964      59.70813
1965      70.43601
1966      76.72001
1967      72.88137
1968      70.84628
1969      79.70562
1970      92.60264
1971      99.80060
1972     113.68929
1973     138.54320
1974     144.18896
1975     163.42950
1976     153.93924
1977     174.93590
1978     149.54075
1979     178.28059
1980     191.14921
1981     195.86638
1982     205.08970
1983     230.68675
1984     259.94651
1985     309.48803
1986     300.75810
1987     272.97297
1988     312.35363
1989     347.76805
1990     360.85791
1991     383.37332
1992     426.91571
1993     444.73128
1994     564.32188
1995     734.48486
1996     863.74931
1997     961.60202
1998    1029.06071
1999    1094.01048
2000    1211.33163
2001    1339.40084
2002    1470.55757
2003    1660.28061
2004    1955.34681
2005    2285.96124
2006    2752.11854
2007    3550.32757
2008    4594.33679
2009    5101.69109
2010    6087.19172
2011    7551.54532
2012    8532.18562
2013    9570.47058
2014   10475.62478
2015   11061.57320
2016   11233.31402
2017   12310.49118
2018   13894.90749
2019   14279.96849
2020   14687.74356
2021   17820.45934
2022   17963.17052
I created a line chart of China’s GDP data using the following code chunk, since trends are easier to see in a chart than in the table.
library(ggplot2)

# Year is stored as character, so convert it for a continuous x axis.
ggplot(China_only_GDP_df, aes(x = as.numeric(Year), y = GDP)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_x_continuous(breaks = seq(1960, 2022, by = 10)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(
    title = "China GDP Over Time (billions)",
    x = "Year",
    y = "GDP (billions)"
  )
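The plot for China’s GDP also trends upward between 1960 and 2022, from about 60 billion to about 17,963 billion, with several year-over-year dips through 1987 and much steeper growth after 2000.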
Finally, I plotted the US and China GDP data together to make the two countries easier to compare.
# Label each data frame so the combined plot can color by country.
US_only_GDP_df <- US_only_GDP_df |>
  mutate(Source = "US")
China_only_GDP_df <- China_only_GDP_df |>
  mutate(Source = "China")

combined_df <- bind_rows(US_only_GDP_df, China_only_GDP_df)

ggplot(combined_df, aes(x = as.numeric(Year), y = GDP, color = Source, group = Source)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_x_continuous(breaks = seq(1960, 2022, by = 10)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  labs(
    title = "US and China GDP Over Time (billions)",
    x = "Year",
    y = "GDP (billions)",
    color = "Region"
  )
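Combining the plots for the US and China shows that US GDP stays above China’s throughout 1960 to 2022 and grew more in absolute terms (about 24,900 billion versus about 17,900 billion). In relative terms, however, China grew faster: its GDP in 2022 is roughly 300 times its 1960 value, compared with roughly 47 times for the US.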
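One way to make the growth comparison concrete is to compute the absolute change, the fold change, and the compound annual growth rate from the first and last years. The sketch below uses the 1960 and 2022 values from the two tables above; the names `endpoints` and `growth` and the choice of these three measures are assumptions for illustration, not part of the original analysis.

```r
library(dplyr)
library(tibble)

# First and last values (billions) copied from the two tables above.
endpoints <- tibble(
  Source   = c("US", "China"),
  GDP_1960 = c(543.300, 59.71625),
  GDP_2022 = c(25462.700, 17963.17052)
)

growth <- endpoints |>
  mutate(
    abs_change  = GDP_2022 - GDP_1960,                   # increase in billions
    fold_change = GDP_2022 / GDP_1960,                   # ratio of 2022 to 1960
    cagr        = fold_change^(1 / (2022 - 1960)) - 1    # compound annual growth rate
  )

growth
```

Which country "grows faster" depends on whether the absolute or the relative measure is used, so it helps to report both.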
Conclusion
The approach for each of the three datasets was broadly similar, but each dataset had its own factors to account for, such as the scale of the values. The GDP numbers were very large, so I divided them by a billion, while the takeout spending values were easy to manage as they were.