Data Preparation

library(dplyr)   # select(), mutate() and the %>% pipe
library(ggplot2) # ggplot(); here it was only attached as a dependency of Hmisc
library(Hmisc)
library(psych)   # describe() and describeBy()
# Note: psych is attached last, so psych::describe masks Hmisc::describe,
# and Hmisc masks dplyr's summarize (use summarise or dplyr::summarize instead).
# Data downloaded from the World Bank and uploaded to GitHub;
# import the CSV from its raw GitHub URL
issued_bond <- "https://raw.githubusercontent.com/Benson90/606-Project/main/World_Bank__IBRD__Bonds__1947-Present_.csv"
issued_bond_ds <- read.csv(issued_bond)

# Keep only the variables needed for the analysis
issued_bond_nf <- issued_bond_ds %>%
  select("Currency","Volume","Coupon","Settlement.Date","Maturity","USD.Equivalent")

# Rename column Maturity to Maturity.Year
names(issued_bond_nf) <- c("Currency", "Volume", "Coupon","Settlement.Date","Maturity.Year","USD.Equivalent")
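The selection and rename above can also be done in one dplyr step, because `select()` accepts `new_name = old_name` pairs. The sketch below also parses `Settlement.Date`, which `read.csv` leaves as text such as "02/28/2022 12:00:00 AM", into an R `Date`; date parsing is an optional extra step, not part of the original preparation. It runs on a hypothetical two-row data frame (`raw_rows`) copied from the `head()` output below, standing in for `issued_bond_ds`.

```r
library(dplyr)

# Hypothetical stand-in for issued_bond_ds: two rows copied from head(),
# with the column names that read.csv produces
raw_rows <- data.frame(
  Currency = c("HKD", "USD"),
  Volume = c(2.0e8, 4.5e8),
  Coupon = c(0.01055, 0.0175),
  Settlement.Date = c("02/28/2022 12:00:00 AM", "02/17/2022 12:00:00 AM"),
  Maturity = c(2, 7),
  USD.Equivalent = c(25637081, 450000000)
)

prepared <- raw_rows %>%
  # select() can rename while selecting: new_name = old_name
  select(Currency, Volume, Coupon, Settlement.Date,
         Maturity.Year = Maturity, USD.Equivalent) %>%
  # as.Date() parses the month/day/year part and ignores the trailing time text
  mutate(Settlement.Date = as.Date(Settlement.Date, format = "%m/%d/%Y"))

str(prepared)
```

On the real data, the same pipeline would start from `issued_bond_ds` instead of `raw_rows`.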

head(issued_bond_nf)
##   Currency  Volume  Coupon        Settlement.Date Maturity.Year USD.Equivalent
## 1      HKD 2.0e+08 0.01055 02/28/2022 12:00:00 AM             2       25637081
## 2      IDR 2.9e+10 0.04040 02/24/2022 12:00:00 AM             5        2021610
## 3      EUR 8.0e+07 0.01300 02/18/2022 12:00:00 AM            20       91408000
## 4      CLP 4.5e+09 0.05700 02/18/2022 12:00:00 AM             3        5590062
## 5      USD 4.5e+08 0.01750 02/17/2022 12:00:00 AM             7      450000000
## 6      MXN 1.1e+09 0.06750 02/11/2022 12:00:00 AM             5       53011537

Research question

You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.

Is currency predictive of a bond's coupon rate?
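One way to make "predictive" concrete is a linear regression of the coupon on currency (entered as dummy variables) and issue size. This is an assumed modelling approach, not the analysis carried out in this proposal. The sketch fits it to a small simulated data frame (`sim`, entirely made up) so that it runs without downloading the data; on the real data the call would be `lm(Coupon ~ Currency + USD.Equivalent, data = issued_bond_nf)`.

```r
# Hypothetical simulated data: three currencies with different mean coupons
set.seed(606)
sim <- data.frame(
  Currency = factor(rep(c("EUR", "MXN", "USD"), each = 20)),
  USD.Equivalent = runif(60, min = 1e7, max = 5e8)
)
currency_effect <- c(EUR = 0.01, MXN = 0.06, USD = 0.02)
sim$Coupon <- currency_effect[as.character(sim$Currency)] +
  rnorm(60, mean = 0, sd = 0.005)

# Coupon regressed on a qualitative and a quantitative predictor
fit <- lm(Coupon ~ Currency + USD.Equivalent, data = sim)
summary(fit)

# Nested-model F-test: does Currency add predictive power beyond issue size?
anova(lm(Coupon ~ USD.Equivalent, data = sim), fit)
```

Because USD.Equivalent is heavily right-skewed, a log transform of issue size may be worth trying on the real data.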

Cases

What are the cases, and how many are there?

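Each case is a bond issued by the World Bank's International Bank for Reconstruction and Development (IBRD) from 1947 to the present. There are 12,247 observations in the data set, of which 12,149 have a recorded coupon.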
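The case count can be checked right after import. The helper below (`count_cases`, a hypothetical name) returns the total number of rows and the number with a non-missing coupon; it is demonstrated on the six rows printed by `head()` above. Applied to `issued_bond_nf`, the second number should match the `n` that `describe()` reports for Coupon.

```r
# Total cases and cases with a non-missing coupon (hypothetical helper)
count_cases <- function(df) {
  c(total = nrow(df), with_coupon = sum(!is.na(df$Coupon)))
}

# The six rows shown by head(issued_bond_nf)
head_rows <- data.frame(
  Currency = c("HKD", "IDR", "EUR", "CLP", "USD", "MXN"),
  Coupon = c(0.01055, 0.04040, 0.01300, 0.05700, 0.01750, 0.06750)
)
count_cases(head_rows)
# For the full data: count_cases(issued_bond_nf)
```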

Data collection

Describe the method of data collection.

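The data are published by the World Bank Group on its World Bank Group Finances open-data site; the CSV was downloaded from there and re-hosted on GitHub for this project.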

Type of study

What type of study is this (observational/experiment)?

This is an observational study.

Data Source

If you collected the data, state self-collected. If not, provide a citation/link.

https://finances.worldbank.org/Other/World-Bank-IBRD-Bonds-1947-Present-/3fps-tcuv

The dataset owner is World Bank Group Finances.

Dependent Variable

What is the response variable? Is it quantitative or qualitative?

The response variable is the coupon rate (Coupon), which is quantitative. It is recorded as a decimal fraction, e.g. 0.0175 for a 1.75% coupon.

Independent Variable

You should have two independent variables, one quantitative and one qualitative.

There are two independent variables: Currency, which is qualitative (categorical), and USD.Equivalent, the issue size converted to US dollars, which is quantitative.
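Since `read.csv` reads Currency as text, it is worth converting it to a factor before modelling. USD.Equivalent can also be sanity-checked against Volume: their ratio is the exchange rate implied at issue, about 7.8 HKD per USD for the first row of `head()`. A minimal sketch on rows copied from that output; `bonds` and `implied_fx` are illustrative names, not part of the dataset.

```r
library(dplyr)

# Hypothetical subset: rows copied from head(issued_bond_nf)
bonds <- data.frame(
  Currency = c("HKD", "IDR", "USD"),
  Volume = c(2.0e8, 2.9e10, 4.5e8),
  USD.Equivalent = c(25637081, 2021610, 450000000)
)

bonds <- bonds %>%
  mutate(
    Currency = factor(Currency),          # qualitative predictor as a factor
    implied_fx = Volume / USD.Equivalent  # units of issue currency per USD
  )

bonds
levels(bonds$Currency)
```

USD-denominated bonds should have an implied rate of exactly 1, which makes a quick consistency check on the full data.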

Relevant summary statistics

Provide summary statistics for each of the variables. Also include appropriate visualizations related to your research question (e.g. scatter plot, boxplots, etc). This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.

psych::describe(issued_bond_nf$Coupon)
##    vars     n mean   sd median trimmed  mad min max range  skew kurtosis se
## X1    1 12149 0.05 0.21   0.04    0.04 0.06   0 9.5   9.5 36.53  1470.04  0
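The output shows a maximum coupon of 9.5 against a median of 0.04, with skew 36.53 and kurtosis above 1,400, so a few extreme values drive the mean and standard deviation. A minimal sketch of a robust summary plus an outlier flag follows; the 1.5 × IQR rule and the `flag_outliers` name are our assumptions, not from the source, and `coupon` is a hypothetical vector made of the six head() coupons, one missing value, and the observed maximum.

```r
# Flag values outside [Q1 - k*IQR, Q3 + k*IQR]; missing values are not flagged
flag_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  spread <- q[[2]] - q[[1]]
  !is.na(x) & (x < q[[1]] - k * spread | x > q[[2]] + k * spread)
}

# Hypothetical coupons: typical values, one NA, one extreme value
coupon <- c(0.01055, 0.0404, 0.013, 0.057, 0.0175, 0.0675, NA, 9.5)

c(median = median(coupon, na.rm = TRUE), IQR = IQR(coupon, na.rm = TRUE))
coupon[flag_outliers(coupon)]
```

On the real data, `issued_bond_nf[flag_outliers(issued_bond_nf$Coupon), ]` would list the rows worth inspecting before modelling.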
# One row per currency (column group1) with n, mean, sd, median, etc.
coupon_bond_group <- describeBy(issued_bond_nf$Coupon,
                                group = issued_bond_nf$Currency, mat = TRUE)
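A similar per-currency table can be built in dplyr with `group_by()` and `summarise()`. Because Hmisc masks dplyr's `summarize`, the sketch calls `dplyr::summarise` explicitly. The column names `n` and `mean` are chosen to mirror the `describeBy()` output, so the plot below would work with `group1` replaced by `Currency`. It is shown on the six rows from `head()`; on the full data the input would be `issued_bond_nf`.

```r
library(dplyr)

# Rows copied from head(issued_bond_nf)
head_rows <- data.frame(
  Currency = c("HKD", "IDR", "EUR", "CLP", "USD", "MXN"),
  Coupon = c(0.01055, 0.04040, 0.01300, 0.05700, 0.01750, 0.06750)
)

# Per-currency count and coupon statistics, dropping missing coupons first
coupon_by_currency <- head_rows %>%
  filter(!is.na(Coupon)) %>%
  group_by(Currency) %>%
  dplyr::summarise(
    n = n(),
    mean = mean(Coupon),
    median = median(Coupon),
    sd = sd(Coupon),
    .groups = "drop"
  )

coupon_by_currency
```

With one bond per currency in this toy input, `sd` is `NA`; on the full data it is defined wherever a currency has at least two bonds.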

ggplot(coupon_bond_group, aes(x = mean, y = n)) +
  geom_point(aes(col = group1)) +
  ggtitle("Average coupon rate and number of bonds by currency") +
  xlab("Average coupon rate") +
  ylab("Number of bonds") +
  labs(color = 'Currency')
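The scatter plot summarizes each currency by a single mean, which hides within-currency spread and is sensitive to the extreme coupons seen in the `describe()` output. A boxplot of Coupon by Currency, one of the visualizations the prompt suggests, shows the whole distribution per currency. A minimal sketch on simulated data (`sim_bonds` is hypothetical); on the real data, pass `issued_bond_nf` instead, and consider keeping only the most common currencies (for example with `forcats::fct_lump_n`) so the axis stays readable.

```r
library(ggplot2)

# Hypothetical simulated coupons for three currencies
set.seed(1)
sim_bonds <- data.frame(
  Currency = rep(c("EUR", "MXN", "USD"), times = c(15, 10, 25)),
  Coupon = c(rnorm(15, 0.015, 0.005), rnorm(10, 0.065, 0.01),
             rnorm(25, 0.025, 0.008))
)

# Order currencies by median coupon so the boxes read from low to high
p <- ggplot(sim_bonds,
            aes(x = reorder(Currency, Coupon, FUN = median), y = Coupon)) +
  geom_boxplot() +
  coord_flip() +   # horizontal boxes keep long currency lists legible
  ggtitle("Coupon rate by currency") +
  xlab("Currency") +
  ylab("Coupon rate")

print(p)
```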