Enrollment Number: M2024ANLT033
NAME: YASH DUBEY
The goal is to explore global economic indicators using the Penn World Table (PWT) dataset. The objectives are:
Examine trends in Gross Domestic Product (GDP), the distribution of GDP across countries over time, and the correlation between capital and GDP.
Analyze the relationship between capital and labour inputs and economic output.
Visualize key patterns with graphs built using modern visualization libraries.
## Data Wrangling
### Load Necessary Libraries
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
library(readxl)
## Warning: package 'readxl' was built under R version 4.4.2
pwt <- read_excel("C:\\Users\\Shambhavi Dubey\\Downloads\\pwt.xlsx", sheet="Data")
head(pwt)
## # A tibble: 6 × 52
## countrycode country currency_unit year rgdpe rgdpo pop emp avh hc
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ABW Aruba Aruban Guilder 1950 NA NA NA NA NA NA
## 2 ABW Aruba Aruban Guilder 1951 NA NA NA NA NA NA
## 3 ABW Aruba Aruban Guilder 1952 NA NA NA NA NA NA
## 4 ABW Aruba Aruban Guilder 1953 NA NA NA NA NA NA
## 5 ABW Aruba Aruban Guilder 1954 NA NA NA NA NA NA
## 6 ABW Aruba Aruban Guilder 1955 NA NA NA NA NA NA
## # ℹ 42 more variables: ccon <dbl>, cda <dbl>, cgdpe <dbl>, cgdpo <dbl>,
## # cn <dbl>, ck <dbl>, ctfp <dbl>, cwtfp <dbl>, rgdpna <dbl>, rconna <dbl>,
## # rdana <dbl>, rnna <dbl>, rkna <dbl>, rtfpna <dbl>, rwtfpna <dbl>,
## # labsh <dbl>, irr <dbl>, delta <dbl>, xr <dbl>, pl_con <dbl>, pl_da <dbl>,
## # pl_gdpo <dbl>, i_cig <chr>, i_xm <chr>, i_xr <chr>, i_outlier <chr>,
## # i_irr <chr>, cor_exp <dbl>, statcap <dbl>, csh_c <dbl>, csh_i <dbl>,
## # csh_g <dbl>, csh_x <dbl>, csh_m <dbl>, csh_r <dbl>, pl_c <dbl>, …
We keep three key variables, real GDP (rgdpna), employment (emp), and capital stock (rnna), and restrict the data to the years from 2000 onward.
data_filtered <- pwt %>%
  select(country, year, rgdpna, emp, rnna) %>%
  filter(year >= 2000)
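The `head(pwt)` output above shows many missing values in the early rows, so it may be worth counting missing values per column before computing averages. The following is a minimal sketch of that check. It assumes a small made-up data frame, `toy`, standing in for `data_filtered`; its values are illustrative and do not come from the Penn World Table.

```r
library(dplyr)

# Hypothetical stand-in for data_filtered; values are made up, not PWT figures.
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2000, 2001, 2000, 2001),
  rgdpna  = c(100, NA, 50, 55),
  emp     = c(10, 11, NA, NA),
  rnna    = c(300, 310, 150, 160)
)

# Count the missing values in each indicator column.
na_counts <- toy %>%
  summarise(across(c(rgdpna, emp, rnna), ~ sum(is.na(.x))))

na_counts
```

Replacing `toy` with `data_filtered` would apply the same check to the real data. The later calls to `mean(..., na.rm = TRUE)` silently skip these gaps, so the counts indicate how many observations each average actually rests on.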
# Select the 10 countries with the highest average real GDP since 2000
top_10_countries <- data_filtered %>%
  group_by(country) %>%
  summarise(avg_gdp = mean(rgdpna, na.rm = TRUE)) %>%
  arrange(desc(avg_gdp)) %>%
  slice(1:10) %>%
  pull(country)
data_top10 <- data_filtered %>%
  filter(country %in% top_10_countries)
# Named gdp_summary to avoid shadowing the base function summary()
gdp_summary <- data_top10 %>%
  group_by(country) %>%
  summarise(
    avg_gdp = mean(rgdpna, na.rm = TRUE),
    avg_employment = mean(emp, na.rm = TRUE),
    avg_capital = mean(rnna, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_gdp))
gdp_summary
## # A tibble: 10 × 4
## country avg_gdp avg_employment avg_capital
## <chr> <dbl> <dbl> <dbl>
## 1 United States 17071575. 145. 60020724.
## 2 China 13268370. 776. 46257026.
## 3 India 5346138. 464. 20411299.
## 4 Japan 4727847. 66.6 25584686.
## 5 Germany 3791036. 41.0 19029880.
## 6 Russian Federation 3364024. 69.9 18577700.
## 7 Brazil 2656801. 83.5 11487776.
## 8 France 2643988. 27.0 16007275.
## 9 United Kingdom 2620262. 29.8 13433031.
## 10 Italy 2437963. 24.7 17882868.
data_top10 %>%
  group_by(country, year) %>%
  summarise(mean_gdp = mean(rgdpna, na.rm = TRUE)) %>%
  ggplot(aes(x = year, y = mean_gdp, color = country)) +
  geom_line() +
  labs(
    title = "Gross Domestic Product Trends for Top 10 Countries",
    x = "Year",
    y = "Average Real GDP",
    color = "Country"
  ) +
  theme_minimal()
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
### Capital vs. Employment (Bubble Chart)
ggplot(data_top10, aes(x = rnna, y = emp, size = rgdpna, color = country)) +
  geom_point(alpha = 0.7) +
  scale_size(range = c(2, 10)) +
  labs(
    title = "Capital vs Employment for Top 10 Countries",
    x = "Capital Stock",
    y = "Employment",
    size = "Gross Domestic Product",
    color = "Country"
  ) +
  theme_minimal()
ggplot(data_top10, aes(x = country, y = rgdpna, fill = country)) +
  geom_boxplot() +
  coord_flip() +
  labs(
    title = "Distribution of Gross Domestic Product Across Top 10 Countries",
    x = "Country",
    y = "Real GDP",
    fill = "Country"
  ) +
  theme_minimal()
ggplot(data_top10, aes(x = rnna, y = rgdpna, color = country)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(
    title = "Correlation Between Capital and GDP",
    x = "Capital Stock",
    y = "Real Gross Domestic Product (GDP)",
    color = "Country"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
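The scatter plot shows the capital-GDP relationship visually but does not put a number on it. One way to do so is sketched below: a Pearson correlation plus a log-log linear fit, whose slope can be read as an approximate elasticity of GDP with respect to capital. This is a sketch under stated assumptions, not the analysis performed in this report. The data frame `toy` is a made-up stand-in for `data_top10`, and its values are not PWT figures.

```r
# Hypothetical stand-in for data_top10; values are made up, not PWT figures.
toy <- data.frame(
  rnna   = c(100, 200, 400, 800, 1600),
  rgdpna = c(40, 75, 160, 310, 650)
)

# Pearson correlation between capital stock and real GDP,
# using only rows where both values are present.
r <- cor(toy$rnna, toy$rgdpna, use = "complete.obs")

# Log-log linear fit: the slope approximates the percentage change in GDP
# associated with a one percent change in capital stock.
fit <- lm(log(rgdpna) ~ log(rnna), data = toy)
elasticity <- unname(coef(fit)[2])  # coefficient on log(rnna)

r
elasticity
```

Applied to `data_top10`, a pooled fit like this mixes differences between countries with changes within a country over time, so the slope should be read as descriptive rather than causal.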
This analysis provides a foundational view of global economic trends using the Penn World Table dataset. Future work could incorporate additional variables, such as human capital and productivity measures, for a deeper exploration of economic dynamics.