Enrollment Number: M2024ANLT033

NAME: YASH DUBEY

Problem Definition

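The goal is to explore global economic indicators using the Penn World Table (PWT) dataset. The objectives are to compare real GDP, employment, and capital stock across the ten largest economies from 2000 onward, and to examine the relationship between capital stock and GDP.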

Data Wrangling

Load Necessary Libraries

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.4.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
library(readxl)
## Warning: package 'readxl' was built under R version 4.4.2

Load Data

pwt <- read_excel("C:\\Users\\Shambhavi Dubey\\Downloads\\pwt.xlsx", sheet="Data")
head(pwt)
## # A tibble: 6 × 52
##   countrycode country currency_unit   year rgdpe rgdpo   pop   emp   avh    hc
##   <chr>       <chr>   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ABW         Aruba   Aruban Guilder  1950    NA    NA    NA    NA    NA    NA
## 2 ABW         Aruba   Aruban Guilder  1951    NA    NA    NA    NA    NA    NA
## 3 ABW         Aruba   Aruban Guilder  1952    NA    NA    NA    NA    NA    NA
## 4 ABW         Aruba   Aruban Guilder  1953    NA    NA    NA    NA    NA    NA
## 5 ABW         Aruba   Aruban Guilder  1954    NA    NA    NA    NA    NA    NA
## 6 ABW         Aruba   Aruban Guilder  1955    NA    NA    NA    NA    NA    NA
## # ℹ 42 more variables: ccon <dbl>, cda <dbl>, cgdpe <dbl>, cgdpo <dbl>,
## #   cn <dbl>, ck <dbl>, ctfp <dbl>, cwtfp <dbl>, rgdpna <dbl>, rconna <dbl>,
## #   rdana <dbl>, rnna <dbl>, rkna <dbl>, rtfpna <dbl>, rwtfpna <dbl>,
## #   labsh <dbl>, irr <dbl>, delta <dbl>, xr <dbl>, pl_con <dbl>, pl_da <dbl>,
## #   pl_gdpo <dbl>, i_cig <chr>, i_xm <chr>, i_xr <chr>, i_outlier <chr>,
## #   i_irr <chr>, cor_exp <dbl>, statcap <dbl>, csh_c <dbl>, csh_i <dbl>,
## #   csh_g <dbl>, csh_x <dbl>, csh_m <dbl>, csh_r <dbl>, pl_c <dbl>, …

Selecting and Filtering Relevant Data

We keep three key variables, real GDP at constant national prices (rgdpna), employment (emp), and capital stock at constant national prices (rnna), and restrict the data to the years from 2000 onward.

data_filtered <- pwt %>%
  select(country, year, rgdpna, emp, rnna) %>%
  filter(year >= 2000)

# Restrict the data to the 10 countries with the highest average real GDP since 2000

top_10_countries <- data_filtered %>%
  group_by(country) %>%
  summarise(avg_gdp = mean(rgdpna, na.rm = TRUE)) %>%
  arrange(desc(avg_gdp)) %>%
  slice(1:10) %>%
  pull(country)

data_top10 <- data_filtered %>%
  filter(country %in% top_10_countries)
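
The rank-then-filter step above can be checked on a small example. The sketch below is a base R illustration on a hypothetical toy data frame (the countries `A` to `D` and their values are made up, not taken from PWT), written without dplyr so it runs with no extra packages. It averages GDP per country while ignoring missing years, keeps the top two countries, and then filters the original rows.

```r
# Toy data: hypothetical countries and values, not taken from PWT
toy <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 3),
  year    = rep(2000:2002, times = 4),
  rgdpna  = c(10, 11, 12,  50, NA, 54,  30, 31, 32,  5, 6, 7)
)

# Average GDP per country; the formula interface drops rows with NA,
# which plays the role of na.rm = TRUE in the dplyr version
avg <- aggregate(rgdpna ~ country, data = toy, FUN = mean)

# Sort by average GDP (descending) and keep the names of the top 2
top_n <- head(avg[order(-avg$rgdpna), "country"], 2)

# Keep every row, including rows with missing GDP, for the selected countries
toy_top <- toy[toy$country %in% top_n, ]

print(top_n)
print(toy_top)
```

Note that the filter keeps the row with a missing GDP value for the selected country: only the ranking ignores missing values, not the final subset.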

Table Output

Summary of Gross Domestic Product by Country

summary <- data_top10 %>%
  group_by(country) %>%
  summarise(
    avg_gdp = mean(rgdpna, na.rm = TRUE),
    avg_employment = mean(emp, na.rm = TRUE),
    avg_capital = mean(rnna, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_gdp))
summary
## # A tibble: 10 × 4
##    country              avg_gdp avg_employment avg_capital
##    <chr>                  <dbl>          <dbl>       <dbl>
##  1 United States      17071575.          145.    60020724.
##  2 China              13268370.          776.    46257026.
##  3 India               5346138.          464.    20411299.
##  4 Japan               4727847.           66.6   25584686.
##  5 Germany             3791036.           41.0   19029880.
##  6 Russian Federation  3364024.           69.9   18577700.
##  7 Brazil              2656801.           83.5   11487776.
##  8 France              2643988.           27.0   16007275.
##  9 United Kingdom      2620262.           29.8   13433031.
## 10 Italy               2437963.           24.7   17882868.
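
To make the averages easier to compare across countries of very different size, they can be expressed per person employed. The sketch below is a base R illustration that re-enters the first five rows of the table above by hand (rounded as printed) and divides GDP and capital by employment. The column names `gdp_per_worker` and `capital_per_worker` are introduced here for illustration, and reading the ratios as currency units per person assumes that GDP and capital are recorded in millions of currency units and employment in millions of persons.

```r
# First five rows of the summary table, copied from the printed output
summary_tbl <- data.frame(
  country        = c("United States", "China", "India", "Japan", "Germany"),
  avg_gdp        = c(17071575, 13268370, 5346138, 4727847, 3791036),
  avg_employment = c(145, 776, 464, 66.6, 41.0),
  avg_capital    = c(60020724, 46257026, 20411299, 25584686, 19029880)
)

# Per-worker ratios (illustrative column names, not part of PWT)
summary_tbl$gdp_per_worker     <- summary_tbl$avg_gdp / summary_tbl$avg_employment
summary_tbl$capital_per_worker <- summary_tbl$avg_capital / summary_tbl$avg_employment

# Order by capital per worker, highest first
ranked <- summary_tbl[order(-summary_tbl$capital_per_worker), ]
print(ranked[, c("country", "gdp_per_worker", "capital_per_worker")])
```

Because the inputs are the rounded values printed in the table, the ratios are approximate; computing them directly from `data_top10` would avoid the rounding.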

Visualization

Distribution of GDP Across Countries (Boxplot)

ggplot(data_top10, aes(x = country, y = rgdpna, fill = country)) +
  geom_boxplot() +
  coord_flip() +
  labs(
    title = "Distribution of Gross Domestic Product Across Top 10 Countries",
    x = "Country",
    y = "Real GDP",
    fill = "Country"
  ) +
  theme_minimal()

Correlation Between Capital and GDP (Scatterplot)

ggplot(data_top10, aes(x = rnna, y = rgdpna, color = country)) +
  geom_point(size = 3, alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  labs(
    title = "Correlation Between Capital and GDP",
    x = "Capital Stock",
    y = "Real Gross Domestic Product (GDP)",
    color = "Country"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

Summary and Interpretation of Analysis

Summary

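  1. Average real GDP varies widely among the top 10 countries: the United States averages about 17.1 million (in the dataset's units) against about 2.4 million for Italy, a roughly sevenfold gap.
  2. Comparing average capital stock with average employment highlights differences in capital intensity: China and India employ far more people than the United States or Japan but hold much less capital per person employed.
  3. The boxplots show how each country's real GDP is distributed over the period: a wide box indicates a large change in GDP since 2000, while a narrow box indicates relative stability.
  4. The scatterplot shows a positive association between capital stock and real GDP. This is consistent with capital investment mattering for output, although a correlation alone does not establish causation.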
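
The positive association in point 4 can be put into numbers. The sketch below is a base R illustration that re-enters the ten country averages from the summary table by hand and computes the Pearson correlation and a simple linear fit. It uses one averaged point per country rather than the country-year observations behind the scatterplot, so the figures it prints are not the same as those of the plotted regression line.

```r
# Country averages copied from the summary table (rounded as printed)
avg <- data.frame(
  country = c("United States", "China", "India", "Japan", "Germany",
              "Russian Federation", "Brazil", "France", "United Kingdom", "Italy"),
  avg_gdp = c(17071575, 13268370, 5346138, 4727847, 3791036,
              3364024, 2656801, 2643988, 2620262, 2437963),
  avg_capital = c(60020724, 46257026, 20411299, 25584686, 19029880,
                  18577700, 11487776, 16007275, 13433031, 17882868)
)

# Pearson correlation between average capital stock and average GDP
r <- cor(avg$avg_capital, avg$avg_gdp)

# Simple linear regression of GDP on capital; the slope is the change in
# average GDP associated with one extra unit of average capital stock
fit   <- lm(avg_gdp ~ avg_capital, data = avg)
slope <- unname(coef(fit)["avg_capital"])

print(r)
print(slope)
```

With only ten points, two of which (the United States and China) are far larger than the rest, the correlation is driven heavily by those two countries and should be read with that in mind.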

Interpretation

This analysis provides a foundational understanding of global economic trends using the Penn World Table dataset. Future research can incorporate additional variables such as human capital and productivity measures for a deeper exploration of economic dynamics.