Introduction

Your company wants to expand its operations into other countries, and your division is responsible for choosing the new location. Your manager wants to explore several economic variables within a country to determine whether it is a good fit. He has picked out three main indicators: GDP per capita (measuring average income), the employment ratio (measuring the percent of the population with a job), and the average price level.

  • GDP per capita: average income
  • Employment ratio: percent of the population with a job
  • Price level: average price level

The data can be loaded using the read.csv function below, which accepts either a file path on your computer or the URL of a .csv file. The header=TRUE argument tells R that the first row contains the variable names, while stringsAsFactors=FALSE tells R to keep text columns as character vectors instead of converting them to factors.

## Read Original .csv file from website
pwt0 <- read.csv(
  "https://github.com/cmann3/bsad391/raw/main/pwt.csv", 
  header=TRUE, stringsAsFactors = FALSE
)
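The effect of the two arguments can be checked without downloading anything by passing a small inline CSV to read.csv() through its text argument. The two rows are a hypothetical stand-in built from the rounded Aruba and Angola values in the str() output in Section 1, not the real file, and csv_demo is an illustration name.

```r
# Hypothetical stand-in for pwt.csv, read with the same arguments as above
csv_text <- "country,currency,pop,gdp,emp,price
Aruba,Aruban Guilder,0.106,3073,0.0476,0.889
Angola,Kwanza,31.825,222008,16.645,0.441"

csv_demo <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# header = TRUE: the first line supplies the column names
names(csv_demo)            # "country" "currency" "pop" "gdp" "emp" "price"

# stringsAsFactors = FALSE: text columns stay character, not factor
class(csv_demo$currency)   # "character"

# numeric text is still parsed as numbers
class(csv_demo$pop)        # "numeric"
```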

Since the data should not be re-downloaded every time you make a mistake, make a backup copy of the data set, and knit the document as few times as possible.

## Create a copy of the original. Use `pwt` for your work.
## Rerun this line to 'reset' the data set.
pwt  <- pwt0
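Resetting works because R copies on modify: after pwt <- pwt0, changes made to pwt leave pwt0 untouched. A minimal check on a hypothetical two-row data frame (pwt0_demo and pwt_demo are illustration names, not part of the assignment):

```r
# Hypothetical stand-in for the downloaded data set
pwt0_demo <- data.frame(country = c("Aruba", "Angola"), price = c(0.889, 0.441))

# Work on a copy, as with pwt <- pwt0
pwt_demo <- pwt0_demo
pwt_demo$price <- pwt_demo$price * 100   # a "mistake" made only on the copy

pwt0_demo$price                          # still 0.889 0.441

# Reset the working copy from the original, with no new download
pwt_demo <- pwt0_demo
identical(pwt_demo, pwt0_demo)           # TRUE
```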

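The data set contains the following variables from 2019:

  • country - country name
  • currency - name of the currency
  • pop - population
  • gdp - GDP
  • emp - employment
  • price - price level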

1. Add New Countries (rows)

The data set you were given is missing data for three new countries. After searching, you were able to find the following information about the three countries. (Notice that no GDP data for Venezuela was found.)

  • Germany
    • currency: Euro
    • population: 83.51705
    • GDP: 4,312,886
    • Employment: 44.7952
    • Price: 0.824545
  • Nigeria
    • currency: Naira
    • population: 200.9636
    • GDP: 1,006,573
    • Employment: 73.02055
    • Price: 0.421534
  • Venezuela
    • currency: Bolivar Fuerte
    • population: 28.51583
    • GDP: NA
    • Employment: 11.6944
    • Price: 18.2746

Add the data as new rows in the pwt data set. Show the bottom of the data set using the tail function.

(See 2.4, don’t forget to wrap the data into a list!)


## 1 Adding New Countries

NewCountries <- list(
  c("Germany", "Euro", 83.51705, 4312886, 44.7952, 0.824545),
  c("Nigeria", "Naira", 200.9636, 1006573, 73.02055, 0.421534),
  c("Venezuela", "Bolivar Fuerte", 28.51583, NA, 11.6944, 18.2746)
)


## Adding the New Countries to pwt
# I had to delete the code I had here so the document would knit, but it gave a
# "subscript out of bounds" error message. I am unsure what this means.
# After checking, it looks like I need to use a data frame.



## Re-adding the New Countries to pwt
# I had to delete this one as well, but it gave this error message:
# Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match
# The str() and names() output below shows why: as.data.frame(NewCountries) has
# 3 columns (one per country) while pwt has 6, so I checked the structure of both.

# Side note: I couldn't work out the right function from my notes, so I searched Google
# and used a lot of information from Reddit; I am unsure whether it is correct. If this
# runs correctly, note to self: add it to this section of my notes.


## 3rd try: checking the structure of pwt and NewCountries
str(pwt)
## 'data.frame':    174 obs. of  6 variables:
##  $ country : chr  "Aruba" "Angola" "Albania" "United Arab Emirates" ...
##  $ currency: chr  "Aruban Guilder" "Kwanza" "Lek" "UAE Dirham" ...
##  $ pop     : num  0.106 31.825 2.881 9.771 44.781 ...
##  $ gdp     : num  3073 222008 37189 648055 975024 ...
##  $ emp     : num  0.0476 16.645 1.0759 5.8088 20.6432 ...
##  $ price   : num  0.889 0.441 0.476 0.73 0.48 ...
str(as.data.frame(NewCountries))
## 'data.frame':    6 obs. of  3 variables:
##  $ c..Germany....Euro....83.51705....4312886....44.7952....0.824545.    : chr  "Germany" "Euro" "83.51705" "4312886" ...
##  $ c..Nigeria....Naira....200.9636....1006573....73.02055....0.421534.  : chr  "Nigeria" "Naira" "200.9636" "1006573" ...
##  $ c..Venezuela....Bolivar.Fuerte....28.51583...NA...11.6944....18.2746.: chr  "Venezuela" "Bolivar Fuerte" "28.51583" NA ...
names(pwt)
## [1] "country"  "currency" "pop"      "gdp"      "emp"      "price"
names(as.data.frame(NewCountries))
## [1] "c..Germany....Euro....83.51705....4312886....44.7952....0.824545."    
## [2] "c..Nigeria....Naira....200.9636....1006573....73.02055....0.421534."  
## [3] "c..Venezuela....Bolivar.Fuerte....28.51583...NA...11.6944....18.2746."
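The str() output above shows why the earlier attempts failed: c() coerces every value in a vector to a single type, so each country became a character vector, and as.data.frame() turned each of those vectors into a column, giving 3 columns of 6 values instead of 3 rows of 6 columns. A sketch of one fix is to supply the data column by column, using the same names as pwt, so that rbind() can match the columns by name. To stay runnable without the download, it uses a hypothetical two-row stand-in, pwt_demo, built from the rounded values in the str() output; in the assignment, the same rbind() call would be applied to pwt.

```r
# Hypothetical stand-in with the same six columns as pwt (rounded values from str())
pwt_demo <- data.frame(
  country  = c("Aruba", "Angola"),
  currency = c("Aruban Guilder", "Kwanza"),
  pop      = c(0.106, 31.825),
  gdp      = c(3073, 222008),
  emp      = c(0.0476, 16.645),
  price    = c(0.889, 0.441),
  stringsAsFactors = FALSE
)

# c() mixes types, so every value becomes character
class(c("Germany", "Euro", 83.51705))   # "character"

# One vector per variable keeps numbers numeric; names must match pwt's columns
NewCountries <- data.frame(
  country  = c("Germany", "Nigeria", "Venezuela"),
  currency = c("Euro", "Naira", "Bolivar Fuerte"),
  pop      = c(83.51705, 200.9636, 28.51583),
  gdp      = c(4312886, 1006573, NA),   # no GDP data was found for Venezuela
  emp      = c(44.7952, 73.02055, 11.6944),
  price    = c(0.824545, 0.421534, 18.2746),
  stringsAsFactors = FALSE
)

# rbind() on data frames matches columns by name
pwt_demo <- rbind(pwt_demo, NewCountries)
tail(pwt_demo)

# Alternative in the spirit of "wrap the data into a list": add one row at a time
# pwt[nrow(pwt) + 1, ] <- list("Germany", "Euro", 83.51705, 4312886, 44.7952, 0.824545)
```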

2. Create New Variables (columns)

Create the following columns in the pwt data set:

  • gdp_capita - GDP per capita (GDP divided by the population)
  • emp_ratio - Employment ratio (Employment divided by the population)

Calculate the mean and median for each of the two new variables. Also calculate the mean price. Place the values into a named vector and show the results.

(See 2.4 for column creation, 2.3 for mean and median, 2.2 for named vector creation.)


## 2 Create New Variables - adding columns

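# creating the new variables
# note: these lines store GDP per capita and the employment ratio in the existing gdp and
# emp columns (replacing the original values); Sections 3 and 5 rely on these per-capita values
pwt$gdp <- pwt$gdp / pwt$pop
pwt$emp <- pwt$emp / pwt$pop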

# calculate mean and median
MeanGdp <- mean(pwt$gdp, na.rm = TRUE)
MedianGdp <- median(pwt$gdp, na.rm = TRUE)
MeanEmp <- mean(pwt$emp, na.rm =  TRUE)
MedianEmp <- median(pwt$emp, na.rm = TRUE)


# Mean price
MeanPrice <- mean(pwt$price, na.rm = TRUE)


# Create a vector
CountriesVector <- c(
  MeanGdp = MeanGdp,
  MedianGdp = MedianGdp,
  MeanEmp = MeanEmp,
  MedianEmp = MedianEmp,
  MeanPrice = MeanPrice
)


# Show vector results
CountriesVector
##      MeanGdp    MedianGdp      MeanEmp    MedianEmp    MeanPrice 
## 2.199787e+04 1.373444e+04 4.286017e-01 4.267963e-01 5.778350e-01
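The assignment names the new columns gdp_capita and emp_ratio. A sketch of that version, which adds them as new columns and leaves the original gdp and emp values intact, is below. It runs on a hypothetical three-row stand-in (pwt_demo) whose NA mimics Venezuela's missing GDP; summary_stats is an illustration name.

```r
# Hypothetical stand-in; the NA mimics Venezuela's missing GDP
pwt_demo <- data.frame(
  country = c("Aruba", "Angola", "Venezuela"),
  pop     = c(0.106, 31.825, 28.51583),
  gdp     = c(3073, 222008, NA),
  emp     = c(0.0476, 16.645, 11.6944),
  price   = c(0.889, 0.441, 18.2746)
)

# New columns; the original gdp and emp columns are left unchanged
pwt_demo$gdp_capita <- pwt_demo$gdp / pwt_demo$pop
pwt_demo$emp_ratio  <- pwt_demo$emp / pwt_demo$pop

# na.rm = TRUE drops the NA that the missing GDP produces in gdp_capita
summary_stats <- c(
  mean_gdp_capita   = mean(pwt_demo$gdp_capita, na.rm = TRUE),
  median_gdp_capita = median(pwt_demo$gdp_capita, na.rm = TRUE),
  mean_emp_ratio    = mean(pwt_demo$emp_ratio, na.rm = TRUE),
  median_emp_ratio  = median(pwt_demo$emp_ratio, na.rm = TRUE),
  mean_price        = mean(pwt_demo$price, na.rm = TRUE)
)
summary_stats
```

Applied to pwt, the later sections would then refer to pwt$gdp_capita and pwt$emp_ratio instead of pwt$gdp and pwt$emp.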

3. Which Countries are the Top

Your manager argues that the company should only invest in countries that are doing well economically or that have very high prices.

Which countries have prices more than twice the average?

Which countries have a GDP per capita that is more than twice the average and an employment ratio that is more than 1.5 times the average?

Which countries satisfy either high prices or economic strength?

(See 2.2 for sub-setting variable based on a condition, 2.1 for logical comparisons)


## 3 Which Countries are the Top

# Countries with prices more than twice the average price
MoreThanTwiceAvgCountries <- pwt[
  pwt$price > 2 * mean(pwt$price, na.rm = TRUE), "country"
  ]

MoreThanTwiceAvgCountries
## [1] "Bermuda"        "Switzerland"    "Cayman Islands" "Denmark"       
## [5] "Iceland"        "Norway"
# Venezuela (price 18.2746) would also appear here once the new countries are added as numeric rows



# Countries with more than twice average GDP AND 1.5 times the average employment ratio
MoreThanTwiceGdpAnd1.5Emp <- pwt[
  pwt$gdp > 2 * mean(pwt$gdp, na.rm = TRUE) &
  pwt$emp > 1.5 * mean(pwt$emp, na.rm = TRUE),
  "country"
]

MoreThanTwiceGdpAnd1.5Emp
## [1] "Cayman Islands" "Luxembourg"     "Qatar"          "Singapore"
#results gave Cayman Islands, Luxembourg, Qatar, and Singapore


# Countries that satisfy either high prices or economic strength
CountriesThatSatisfy <- pwt[
  pwt$price > 2 * mean(pwt$price, na.rm = TRUE) |
  (pwt$gdp > 2 * mean(pwt$gdp, na.rm = TRUE) &
  pwt$emp > 1.5 * mean(pwt$emp, na.rm = TRUE)),
  "country"
]

CountriesThatSatisfy
## [1] "Bermuda"        "Switzerland"    "Cayman Islands" "Denmark"       
## [5] "Iceland"        "Luxembourg"     "Norway"         "Qatar"         
## [9] "Singapore"
# results gave: Bermuda, Switzerland, Cayman Islands, Denmark, Iceland, Luxembourg, Norway, Qatar, Singapore
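Logical indexing needs care once missing values such as Venezuela's GDP are in the data: an NA in the condition returns NA in the result instead of dropping the row. R's three-valued logic resolves some cases (NA & FALSE is FALSE, TRUE | NA is TRUE), but wrapping the condition in which() keeps only the TRUE positions in every case. A sketch on a hypothetical stand-in; apart from Venezuela's missing GDP and its price, all values and the "Country A/B/C" names are made up for illustration.

```r
# Hypothetical values; only Venezuela's missing GDP and its price come from the assignment
pwt_demo <- data.frame(
  country = c("Country A", "Country B", "Country C", "Venezuela"),
  gdp     = c(100000, 5000, 7000, NA),
  price   = c(0.9, 0.4, 0.5, 18.2746),
  stringsAsFactors = FALSE
)

high_gdp   <- pwt_demo$gdp > 2 * mean(pwt_demo$gdp, na.rm = TRUE)
high_price <- pwt_demo$price > 2 * mean(pwt_demo$price, na.rm = TRUE)

high_gdp                                      # TRUE FALSE FALSE NA

# An NA in a logical index returns NA in the result
pwt_demo[high_gdp, "country"]                 # "Country A" NA

# which() keeps only the positions that are TRUE
pwt_demo[which(high_gdp), "country"]          # "Country A"

# TRUE | NA is TRUE, so Venezuela still passes the "either" condition on price alone
pwt_demo[which(high_price | high_gdp), "country"]   # "Country A" "Venezuela"
```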

4. Country Currencies

Your manager also has another theory: countries that use a type of “Dollar” are economically stronger on average. Use the stringr package to detect patterns in the currency variable.

library(stringr)

Determine which countries use the Euro, which countries use the US Dollar, and which countries use any type of “Dollar”.

(See 2.6 for help with string patterns, and 2.1 for testing equality)


## 4. Country Currencies

# loading the stringr package
library(stringr)


# Euro Using Countries
EuroCountries <- pwt[str_detect(pwt$currency, "Euro"), "country"]

EuroCountries
##  [1] "Austria"     "Belgium"     "Cyprus"      "Spain"       "Estonia"    
##  [6] "Finland"     "France"      "Greece"      "Ireland"     "Italy"      
## [11] "Lithuania"   "Luxembourg"  "Latvia"      "Malta"       "Montenegro" 
## [16] "Netherlands" "Portugal"    "Slovakia"    "Slovenia"
# results gave: Austria, Belgium, Cyprus, Spain, Estonia, Finland, France, Greece, Ireland, Italy, Lithuania, Luxembourg, Latvia, Malta, Montenegro, Netherlands, Portugal, Slovakia, Slovenia 

# US Dollar Using Countries
# Testing for equality keeps only the US Dollar; str_detect(pwt$currency, "Dollar")
# also matches other dollars (Australia, Canada, and so on).
# Assumes the label is spelled "US Dollar"; unique(pwt$currency) shows the exact labels.
UsDollarCountries <- pwt[pwt$currency == "US Dollar", "country"]

UsDollarCountries


## side note: I don't know if I was actually supposed to type out all of the country results, but I couldn't tell whether they would show up in the knitted output, so I included them just in case.


# any type of "Dollar" countries
AnyDollarCountries <- pwt[str_detect(pwt$currency, "Dollar"), "country"]

AnyDollarCountries
##  [1] "Australia"                "Bahamas"                 
##  [3] "Belize"                   "Bermuda"                 
##  [5] "Barbados"                 "Brunei Darussalam"       
##  [7] "Canada"                   "Cayman Islands"          
##  [9] "Ecuador"                  "Fiji"                    
## [11] "Grenada"                  "Guyana"                  
## [13] "China, Hong Kong SAR"     "Jamaica"                 
## [15] "Liberia"                  "Saint Lucia"             
## [17] "Montserrat"               "Namibia"                 
## [19] "New Zealand"              "State of Palestine"      
## [21] "Singapore"                "El Salvador"             
## [23] "Suriname"                 "Trinidad and Tobago"     
## [25] "Taiwan"                   "United States of America"
## [27] "St. Vincent & Grenadines" "British Virgin Islands"  
## [29] "Zimbabwe"
# Results gave (another long list): Australia, Bahamas, Belize, Bermuda, Barbados, Brunei Darussalam, Canada, Cayman Islands, Ecuador, Fiji, Grenada, Guyana, China Hong Kong SAR, Jamaica, Liberia, Saint Lucia, Montserrat, Namibia, New Zealand, State of Palestine, Singapore, El Salvador, Suriname, Trinidad and Tobago, Taiwan, USA, St. Vincent & Grenadines, British Virgin Islands, Zimbabwe
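str_detect() reports a match if the pattern appears anywhere in the string, so "Dollar" picks up every dollar currency, while an exact comparison with == (or an anchored regex) picks out a single currency. A sketch on a hypothetical vector of labels in the style of the currency column; the exact spellings in pwt are an assumption and can be listed with unique(pwt$currency).

```r
library(stringr)

# Hypothetical labels in the style of the pwt currency column
currency <- c("US Dollar", "Canadian Dollar", "Euro", "Kwanza", "Bahamian Dollar")

# Pattern anywhere in the string: any type of dollar
str_detect(currency, "Dollar")        # TRUE TRUE FALSE FALSE TRUE

# Exact equality: only the US Dollar
currency == "US Dollar"               # TRUE FALSE FALSE FALSE FALSE

# The same exact match written as an anchored regular expression
str_detect(currency, "^US Dollar$")   # TRUE FALSE FALSE FALSE FALSE
```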

5. Currency Strength?

Calculate and show the mean GDP per capita for countries that use the Euro, the US Dollar, and any Dollar. How does that compare to the mean? Should we use this as a condition for expanding our company?

Calculate and show the mean price for countries that use the different types. How does that compare to the mean?


## 5. Currency Strength

# Mean GDP per capita for Euro-using countries
# EuroCountries holds country names, so it is matched against pwt$country
MeanGdpEuroCountries <- mean(pwt$gdp[pwt$country %in% EuroCountries], na.rm = TRUE)

MeanGdpEuroCountries
# Mean GDP per capita for US Dollar-using countries
MeanGdpUsdCountries <- mean(pwt$gdp[pwt$country %in% UsDollarCountries], na.rm = TRUE)

MeanGdpUsdCountries
# Mean GDP per capita for countries using any type of Dollar
MeanGdpAnyDollarCountries <- mean(pwt$gdp[pwt$country %in% AnyDollarCountries], na.rm = TRUE)

MeanGdpAnyDollarCountries
# Compare each result with the overall mean GDP per capita (MeanGdp, 2.199787e+04 in Section 2).
# A NaN result would only mean that the filter selected no rows (for example, when pwt$currency is
# matched against these country-name vectors); mean() of an empty vector is NaN, so it says nothing
# about whether a group is richer or poorer. Even with valid means, currency alone is a weak condition
# for expanding: a group mean can be pulled up by a few rich countries, such as Luxembourg (Euro) or the
# Cayman Islands and Singapore (dollar currencies), all of which appear in the high GDP per capita
# and employment list in Section 3.
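Section 5 also asks for the mean price of each currency group, and each group mean is easiest to judge next to the overall mean. A sketch that computes mean GDP per capita and mean price for every group in one table, on a hypothetical stand-in (pwt_demo, groups, and group_means are illustration names, and all values are made up); as in Section 2, gdp is assumed to already hold GDP per capita. The groups are built from the currency column itself, which avoids matching currencies against country names.

```r
library(stringr)

# Hypothetical stand-in; gdp already holds GDP per capita, as after Section 2
pwt_demo <- data.frame(
  country  = c("Country A", "Country B", "Country C", "Country D", "Country E"),
  currency = c("US Dollar", "Canadian Dollar", "Euro", "Naira", "US Dollar"),
  gdp      = c(60000, 45000, 50000, 5000, NA),
  price    = c(1.00, 0.80, 0.80, 0.40, 0.50),
  stringsAsFactors = FALSE
)

# One logical vector per currency group
groups <- list(
  Euro       = pwt_demo$currency == "Euro",
  US_Dollar  = pwt_demo$currency == "US Dollar",
  Any_Dollar = str_detect(pwt_demo$currency, "Dollar")
)

# Group means (one row per group) followed by the overall means for comparison
group_means <- rbind(
  t(sapply(groups, function(rows) c(
    gdp   = mean(pwt_demo$gdp[rows], na.rm = TRUE),
    price = mean(pwt_demo$price[rows], na.rm = TRUE)
  ))),
  All = c(gdp   = mean(pwt_demo$gdp, na.rm = TRUE),
          price = mean(pwt_demo$price, na.rm = TRUE))
)
group_means
```

Because a few very rich or very expensive countries can pull a group mean up, running the same table with median() is a reasonable cross-check.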