Your company wants to expand its operations into other countries and your division is responsible for determining the new location. Your manager wants to explore different economic variables within a country to determine whether it is a good fit. He has picked out three main indicators - GDP per capita (measuring average income), Employment Ratio (measuring percent of population with a job), and the average price level.
The data can be loaded using the read.csv function
below, which accepts either a location on your computer or a website
containing a .csv file. The header=TRUE argument tells R
that the first row contains the variable names, while
stringsAsFactors=FALSE tells R to leave character types as
characters.
## Read Original .csv file from website
pwt0 <- read.csv(
"https://github.com/cmann3/bsad391/raw/main/pwt.csv",
header=TRUE, stringsAsFactors = FALSE
)
Since the data should not be re-downloaded every time you make a mistake, make a backup copy of the data set, and knit the document as few times as possible.
## Create a copy of the original. Use `pwt` for your work.
## Rerun this line to 'reset' the data set.
pwt <- pwt0
The data set contains the following variables from 2019:
country - country name (character vector)
currency - currency name (character vector)
pop - population in millions (numeric vector)
gdp - real GDP in millions of 2017 US dollars (numeric vector)
emp - number of persons with a job (numeric vector)
price - price level relative to the US (numeric vector)
The data set you were given is missing data for three new countries. After searching, you were able to find the following information about the three countries. (Notice that no GDP data for Venezuela was found.)
Add the data as new rows in the pwt data set. Show the
bottom of the data set using the tail function.
(See 2.4, don’t forget to wrap the data into a list!)
## 1 Adding New Countries
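# Note: c() coerces mixed types to a single type, so every value below
# (including the numbers) is stored as a character string.
NewCountries <- list(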
c("Germany", "Euro", 83.51705, 4312886, 44.7952, 0.824545),
c("Nigeria", "Naira", 200.9636, 1006573, 73.02055, 0.421534),
c("Venezuela", "Bolivar Fuerte", 28.51583, NA, 11.6944, 18.2746)
)
## Adding the New Countries to pwt
# I had to delete the code I had here so the document would knit. It gave a
# "subscript out of bounds" error message, and I am unsure what that means.
# After checking, it looks like I need to use a data frame.
## Re-adding the New Countries to pwt
# I had to delete this attempt as well. It gave:
# Error in rbind(deparse.level, ...) : numbers of columns of arguments do not match
# I am checking again for what else to use (a structure?).
# Side note: I searched Google for the correct function to use here because I couldn't work it out from my notes. I used a lot of information from Reddit, so I am unsure whether it is correct. If this runs correctly, this is a note to self to add it to this section of my notes.
## 3rd try adding New Countries
str(pwt)
## 'data.frame': 174 obs. of 6 variables:
## $ country : chr "Aruba" "Angola" "Albania" "United Arab Emirates" ...
## $ currency: chr "Aruban Guilder" "Kwanza" "Lek" "UAE Dirham" ...
## $ pop : num 0.106 31.825 2.881 9.771 44.781 ...
## $ gdp : num 3073 222008 37189 648055 975024 ...
## $ emp : num 0.0476 16.645 1.0759 5.8088 20.6432 ...
## $ price : num 0.889 0.441 0.476 0.73 0.48 ...
str(as.data.frame(NewCountries))
## 'data.frame': 6 obs. of 3 variables:
## $ c..Germany....Euro....83.51705....4312886....44.7952....0.824545. : chr "Germany" "Euro" "83.51705" "4312886" ...
## $ c..Nigeria....Naira....200.9636....1006573....73.02055....0.421534. : chr "Nigeria" "Naira" "200.9636" "1006573" ...
## $ c..Venezuela....Bolivar.Fuerte....28.51583...NA...11.6944....18.2746.: chr "Venezuela" "Bolivar Fuerte" "28.51583" NA ...
names(pwt)
## [1] "country" "currency" "pop" "gdp" "emp" "price"
names(as.data.frame(NewCountries))
## [1] "c..Germany....Euro....83.51705....4312886....44.7952....0.824545."
## [2] "c..Nigeria....Naira....200.9636....1006573....73.02055....0.421534."
## [3] "c..Venezuela....Bolivar.Fuerte....28.51583...NA...11.6944....18.2746."
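The str() output above suggests why rbind() complained: as.data.frame(NewCountries) turns each country into a column (6 obs. of 3 variables, all character), while pwt has 6 columns. One possible way to add the rows is sketched below on a small stand-in data frame rather than the real pwt. The names `toy` and its two starting rows are illustrative assumptions (values rounded from the str() output above); each new row is wrapped in list() so that numbers stay numeric.

```r
# Toy stand-in for pwt with the same six columns (hypothetical, two rows only)
toy <- data.frame(
  country  = c("Aruba", "Angola"),
  currency = c("Aruban Guilder", "Kwanza"),
  pop      = c(0.106, 31.825),
  gdp      = c(3073, 222008),
  emp      = c(0.0476, 16.645),
  price    = c(0.889, 0.441),
  stringsAsFactors = FALSE
)

# list() keeps each value's own type; c() would turn everything into character
toy <- rbind(toy, list("Germany", "Euro", 83.51705, 4312886, 44.7952, 0.824545))
toy <- rbind(toy, list("Nigeria", "Naira", 200.9636, 1006573, 73.02055, 0.421534))
# NA_real_ marks the missing GDP value as a numeric NA
toy <- rbind(toy, list("Venezuela", "Bolivar Fuerte", 28.51583, NA_real_, 11.6944, 18.2746))

tail(toy, 3)  # show the newly added rows
str(toy)      # pop, gdp, emp and price should still be numeric
```

The same pattern should carry over to pwt, provided the values in each list are in the same order as names(pwt).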
Create the following columns in the pwt data set:
gdp_capita - GDP per capita (GDP divided by the population)
emp_ratio - Employment ratio (Employment divided by the population)
Calculate the mean and median for each of the two new variables. Also calculate the mean price. Place the values into a named vector and show the results.
(See 2.4 for column creation, 2.3 for mean and median, 2.2 for named vector creation.)
## 2 Create New Variables - adding columns
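# creating the new variables
# Note: these lines overwrite gdp and emp in place instead of creating
# gdp_capita and emp_ratio, so from here on pwt$gdp is GDP per capita
# and pwt$emp is the employment ratio. Rerunning them divides by pop again.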
pwt$gdp <- pwt$gdp / pwt$pop
pwt$emp <- pwt$emp / pwt$pop
# calculate mean and median
MeanGdp <- mean(pwt$gdp, na.rm = TRUE)
MedianGdp <- median(pwt$gdp, na.rm = TRUE)
MeanEmp <- mean(pwt$emp, na.rm = TRUE)
MedianEmp <- median(pwt$emp, na.rm = TRUE)
# Mean price
MeanPrice <- mean(pwt$price, na.rm = TRUE)
# Create a vector
CountriesVector <- c(
MeanGdp = MeanGdp,
MedianGdp = MedianGdp,
MeanEmp = MeanEmp,
MedianEmp = MedianEmp,
MeanPrice = MeanPrice
)
# Show vector results
CountriesVector
## MeanGdp MedianGdp MeanEmp MedianEmp MeanPrice
## 2.199787e+04 1.373444e+04 4.286017e-01 4.267963e-01 5.778350e-01
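As a hedged sketch of what the task asks for, the new variables can be stored under their own names so that the original gdp and emp columns stay intact. The data frame `toy` and the vector name `summary_stats` are made up for illustration and are not part of the assignment data.

```r
# Toy data (hypothetical values); country C has no GDP figure
toy <- data.frame(
  country = c("A", "B", "C"),
  pop     = c(2, 4, 10),
  gdp     = c(100, 400, NA),
  emp     = c(1, 1, 5),
  stringsAsFactors = FALSE
)

# New columns leave gdp and emp untouched, so the chunk can be rerun safely
toy$gdp_capita <- toy$gdp / toy$pop
toy$emp_ratio  <- toy$emp / toy$pop

# Named vector of summary statistics; na.rm = TRUE skips the missing GDP
summary_stats <- c(
  mean_gdp_capita   = mean(toy$gdp_capita, na.rm = TRUE),
  median_gdp_capita = median(toy$gdp_capita, na.rm = TRUE),
  mean_emp_ratio    = mean(toy$emp_ratio, na.rm = TRUE),
  median_emp_ratio  = median(toy$emp_ratio, na.rm = TRUE)
)
summary_stats
```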
Your manager argues that the company should only invest in countries that are doing well or have very high prices.
Which countries have prices more than twice the average?
Which countries have a GDP per capita that is more than twice the average and an employment ratio that is more than 1.5 times the average?
Which countries satisfy either high prices or economic strength?
(See 2.2 for sub-setting variable based on a condition, 2.1 for logical comparisons)
## 3 Which Countries are the Top
# Countries with prices more than twice the average price
MoreThanTwiceAvgCountries <- pwt[
pwt$price > 2 * mean(pwt$price, na.rm = TRUE), "country"
]
MoreThanTwiceAvgCountries
## [1] "Bermuda" "Switzerland" "Cayman Islands" "Denmark"
## [5] "Iceland" "Norway"
# Venezuela
# Countries with more than twice average GDP AND 1.5 times the average employment ratio
MoreThanTwiceGdpAnd1.5Emp <- pwt[
pwt$gdp > 2 * mean(pwt$gdp, na.rm = TRUE) &
pwt$emp > 1.5 * mean(pwt$emp, na.rm = TRUE),
"country"
]
MoreThanTwiceGdpAnd1.5Emp
## [1] "Cayman Islands" "Luxembourg" "Qatar" "Singapore"
#results gave Cayman Islands, Luxembourg, Qatar, and Singapore
# Countries that satisfy either high prices or economic strength
CountriesThatSatisfy <- pwt[
pwt$price > 2 * mean(pwt$price, na.rm = TRUE) |
(pwt$gdp > 2 * mean(pwt$gdp, na.rm = TRUE) &
pwt$emp > 1.5 * mean(pwt$emp, na.rm = TRUE)),
"country"
]
CountriesThatSatisfy
## [1] "Bermuda" "Switzerland" "Cayman Islands" "Denmark"
## [5] "Iceland" "Luxembourg" "Norway" "Qatar"
## [9] "Singapore"
# results gave: Bermuda, Switzerland, Cayman Islands, Denmark, Iceland, Luxembourg, Norway, Qatar, Singapore
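The three questions reuse the same two conditions, so one option is to store each condition as a logical vector and combine them with & and |. The sketch below uses made-up data (`toy`, `high_price`, `strong`, `either` are hypothetical names) and also shows how a missing value, like Venezuela's GDP, turns a comparison into NA. Wrapping the condition in which() drops those NA entries instead of returning NA rows.

```r
# Toy data (hypothetical values); country D has no GDP per capita
toy <- data.frame(
  country    = c("A", "B", "C", "D"),
  price      = c(0.5, 3.0, 0.5, 0.8),
  gdp_capita = c(10, 20, 100, NA),
  emp_ratio  = c(0.2, 0.2, 0.9, 0.9),
  stringsAsFactors = FALSE
)

# Each condition is computed once and reused
high_price <- toy$price > 2 * mean(toy$price, na.rm = TRUE)
strong     <- toy$gdp_capita > 2 * mean(toy$gdp_capita, na.rm = TRUE) &
              toy$emp_ratio  > 1.5 * mean(toy$emp_ratio, na.rm = TRUE)
either     <- high_price | strong

either                       # the last entry is NA because D's GDP is missing
toy$country[either]          # subsetting with NA returns an NA element
toy$country[which(either)]   # which() keeps only the TRUE positions
```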
Your manager also has another theory - countries that use a type of
“Dollar” are economically stronger on average. Use the
stringr package to detect patterns in the currency
variable.
library(stringr)
Determine which countries use the Euro, which countries use the US Dollar, and which countries use any type of “Dollar”.
(See 2.6 for help with string patterns, and 2.1 for testing equality)
## 4. Country Currencies
# loading the stringr package
library(stringr)
# Euro Using Countries
EuroCountries <- pwt[str_detect(pwt$currency, "Euro"), "country"]
EuroCountries
## [1] "Austria" "Belgium" "Cyprus" "Spain" "Estonia"
## [6] "Finland" "France" "Greece" "Ireland" "Italy"
## [11] "Lithuania" "Luxembourg" "Latvia" "Malta" "Montenegro"
## [16] "Netherlands" "Portugal" "Slovakia" "Slovenia"
# results gave: Austria, Belgium, Cyprus, Spain, Estonia, Finland, France, Greece, Ireland, Italy, Lithuania, Luxembourg, Latvia, Malta, Montenegro, Netherlands, Portugal, Slovakia, Slovenia
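# US Dollar Using Countries
# Note: the pattern "Dollar|USD" matches any currency name containing "Dollar",
# so this returns the same 29 countries as the any-"Dollar" search below,
# not only the countries that use the US Dollar.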
UsDollarCountries <- pwt[str_detect(pwt$currency, "Dollar|USD"), "country"]
UsDollarCountries
## [1] "Australia" "Bahamas"
## [3] "Belize" "Bermuda"
## [5] "Barbados" "Brunei Darussalam"
## [7] "Canada" "Cayman Islands"
## [9] "Ecuador" "Fiji"
## [11] "Grenada" "Guyana"
## [13] "China, Hong Kong SAR" "Jamaica"
## [15] "Liberia" "Saint Lucia"
## [17] "Montserrat" "Namibia"
## [19] "New Zealand" "State of Palestine"
## [21] "Singapore" "El Salvador"
## [23] "Suriname" "Trinidad and Tobago"
## [25] "Taiwan" "United States of America"
## [27] "St. Vincent & Grenadines" "British Virgin Islands"
## [29] "Zimbabwe"
# results gave: Australia, Bahamas, Belize, Bermuda, Barbados, Brunei Darussalam, Canada, Cayman Islands, Ecuador, Fiji, Grenada, Guyana, China Hong Kong SAR, Jamaica, Liberia, Saint Lucia, Montserrat, Namibia, New Zealand, State of Palestine, Singapore, El Salvador, Suriname, Trinidad and Tobago, Taiwan, USA, St. Vincent & Grenadines, British Virgin Islands, Zimbabwe
## side note: I don't know if I was actually supposed to type out all of the country results, but I couldn't tell if it was going to pop up in the code or not so I included the results just in case.
# any type of "Dollar" countries
AnyDollarCountries <- pwt[str_detect(pwt$currency, "Dollar"), "country"]
AnyDollarCountries
## [1] "Australia" "Bahamas"
## [3] "Belize" "Bermuda"
## [5] "Barbados" "Brunei Darussalam"
## [7] "Canada" "Cayman Islands"
## [9] "Ecuador" "Fiji"
## [11] "Grenada" "Guyana"
## [13] "China, Hong Kong SAR" "Jamaica"
## [15] "Liberia" "Saint Lucia"
## [17] "Montserrat" "Namibia"
## [19] "New Zealand" "State of Palestine"
## [21] "Singapore" "El Salvador"
## [23] "Suriname" "Trinidad and Tobago"
## [25] "Taiwan" "United States of America"
## [27] "St. Vincent & Grenadines" "British Virgin Islands"
## [29] "Zimbabwe"
# Results gave (another long list): Australia, Bahamas, Belize, Bermuda, Barbados, Brunei Darussalam, Canada, Cayman Islands, Ecuador, Fiji, Grenada, Guyana, China Hong Kong SAR, Jamaica, Liberia, Saint Lucia, Montserrat, Namibia, New Zealand, State of Palestine, Singapore, El Salvador, Suriname, Trinidad and Tobago, Taiwan, USA, St. Vincent & Grenadines, British Virgin Islands, Zimbabwe
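The difference between "the US Dollar" and "any Dollar" comes down to the pattern passed to str_detect(). The sketch below uses a made-up currency vector. The label "US Dollar" is an assumption about how the data set spells it, so it would be worth checking unique(pwt$currency) before relying on it.

```r
library(stringr)

# Hypothetical currency labels; the real spellings in pwt$currency may differ
currency <- c("US Dollar", "Canadian Dollar", "Euro", "East Caribbean Dollar")

# Any currency whose name contains "Dollar"
any_dollar <- str_detect(currency, "Dollar")

# Only the US Dollar: fixed() matches the literal text "US Dollar"
us_dollar <- str_detect(currency, fixed("US Dollar"))

# "Dollar|USD" means "Dollar" OR "USD", so it matches every Dollar currency
dollar_or_usd <- str_detect(currency, "Dollar|USD")

any_dollar
us_dollar
identical(dollar_or_usd, any_dollar)
```

If the label is known exactly, a plain equality test such as currency == "US Dollar" would also work and avoids pattern matching altogether.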
Calculate and show the mean GDP per capita for countries that use the Euro, the US Dollar, and any Dollar. How does that compare to the mean? Should we use this as a condition for expanding our company?
Calculate and show the mean price for countries that use the different types. How does that compare to the mean?
## 5. Currency Strength
# Mean GDP for Euro using Countries
MeanGdpEuroCountries <- mean(pwt$gdp[pwt$currency %in% EuroCountries], na.rm = TRUE)
MeanGdpEuroCountries
## [1] NaN
# Mean GDP for US Dollar using Countries
MeanGdpUsdCountries <- mean(pwt$gdp[pwt$currency %in% UsDollarCountries], na.rm = TRUE)
MeanGdpUsdCountries
## [1] NaN
# Mean GDP for Any Dollar using Countries
MeanGdpAnyDollarCountries <- mean(pwt$gdp[pwt$currency %in% AnyDollarCountries], na.rm = TRUE)
MeanGdpAnyDollarCountries
## [1] NaN
# The results gave NaN (not a number). NaN cannot be compared with the mean, so it is neither higher nor lower. It appears here because pwt$currency (currency names) is being matched against vectors of country names, so no rows are selected and mean() is taken over an empty vector. As it stands, the result is inconclusive and is not a sound basis for deciding whether to expand the company. Matching on pwt$country instead would be needed to get usable group means.
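A minimal sketch of the matching issue, using made-up data (`toy`, `euro_countries`, `wrong` and `right` are hypothetical names): the vector of Euro countries holds country names, so it has to be compared against the country column, not the currency column.

```r
# Toy data (hypothetical values)
toy <- data.frame(
  country    = c("A", "B", "C", "D"),
  currency   = c("Euro", "US Dollar", "Euro", "Kwanza"),
  gdp_capita = c(40, 60, 20, 10),
  stringsAsFactors = FALSE
)

# Country names of the Euro users: "A" and "C"
euro_countries <- toy[toy$currency == "Euro", "country"]

# No currency is called "A" or "C", so nothing is selected and the mean is NaN
wrong <- mean(toy$gdp_capita[toy$currency %in% euro_countries], na.rm = TRUE)

# Matching country names against the country column selects rows A and C
right <- mean(toy$gdp_capita[toy$country %in% euro_countries], na.rm = TRUE)

wrong
right
mean(toy$gdp_capita, na.rm = TRUE)  # overall mean, for comparison
```

The same comparison could be repeated for the price column to answer the second question.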