---
title: "Lab 2: Merging and Analyzing Data"
author: "Grayson Grabois"
output: html_document
editor_options:
  chunk_output_type: console
---

Question 1: Loading and Exploring the Dataset

First, we will load both datasets. Because they are CSV files, I will use the read.csv function.

ag_output <- read.csv("ag_output.csv")
gdppc <- read.csv("gdppc.csv")

# Show the first few rows of each dataset to see what the data look like
head(ag_output)
##         Country Country.Code   Outall_Q     Land_Q   Labor_Q
## 1       Nigeria          NGA 67238499.9 65431.1311 26809.343
## 2         Benin          BEN  4644196.3  4658.0306  1341.492
## 3 Cote d'Ivoire          CIV 13861494.7 14790.5359  4715.421
## 4         Ghana          GHA 15922946.4  9405.3255  5578.694
## 5        Guinea          GIN  5451421.8  6555.7681  2432.623
## 6 Guinea-Bissau          GNB   519897.2   699.9943   343.297
head(gdppc)
##                  Country.Name Country.Code GDPPerCapita
## 1                       Aruba          ABW   30559.5335
## 2 Africa Eastern and Southern          AFE    1628.0245
## 3                 Afghanistan          AFG     357.2612
## 4  Africa Western and Central          AFW    1777.2350
## 5                      Angola          AGO    2929.6945
## 6                     Albania          ALB    6846.4261
# Check the dimensions of each dataset: each observation is a row and each variable is a column
dim(ag_output)
## [1] 179   5
dim(gdppc)
## [1] 266   3
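
The column name Country.Code contains a dot. By default read.csv passes header names through make.names(), which replaces spaces with dots. The sketch below assumes a header spelled "Country Code" (the actual header in the lab files is not shown here) and reads a two-row toy CSV from a string instead of a file; toy_csv and toy are hypothetical names.

```r
# Two rows in CSV form, with an ASSUMED header containing a space
toy_csv <- "Country,Country Code,Outall_Q
Nigeria,NGA,67238499.9
Benin,BEN,4644196.3"

# text = reads from a string instead of a file on disk
toy <- read.csv(text = toy_csv)

# check.names = TRUE (the default) turns "Country Code" into "Country.Code"
names(toy)

# str() shows the type of each column, dim() gives rows and columns
str(toy)
dim(toy)
```

Running str() after loading is a quick way to confirm that numeric columns were read as numbers rather than text.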

Question 2: Merging Data

# Merges datasets by their common Country.Code
ag_merge <- merge(ag_output, gdppc, by = "Country.Code")

# Check the dimensions of the merged dataset to confirm the merge worked as expected
dim(ag_merge)
## [1] 174   7
# Attach the merged dataset so its columns can be referenced directly by name.
attach(ag_merge)

head(ag_merge)
##   Country.Code              Country Outall_Q     Land_Q  Labor_Q
## 1          AFG          Afghanistan  6736355 11677.0378 3523.311
## 2          AGO               Angola  7820210  6755.0958 7102.524
## 3          ALB              Albania  2219637   932.8246  443.723
## 4          ARE United Arab Emirates  1302831   247.4919   88.776
## 5          ARG            Argentina 76191086 48593.2519 1421.964
## 6          ARM              Armenia  1675969   872.1348  750.752
##           Country.Name GDPPerCapita
## 1          Afghanistan     357.2612
## 2               Angola    2929.6945
## 3              Albania    6846.4261
## 4 United Arab Emirates   49899.0653
## 5            Argentina   13935.6811
## 6              Armenia    6571.9745
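
The merged data has 174 rows while ag_output has 179, so a few country codes presumably had no match in gdppc. By default merge() keeps only codes found in both data frames. The sketch below shows one way to list unmatched codes, using small made-up data frames (toy_ag and toy_gdp are hypothetical stand-ins, not the lab files).

```r
# Toy stand-ins for the two lab files (values are invented for illustration)
toy_ag <- data.frame(Country.Code = c("AAA", "BBB", "CCC"),
                     Outall_Q = c(10, 20, 30))
toy_gdp <- data.frame(Country.Code = c("AAA", "CCC", "DDD"),
                      GDPPerCapita = c(1000, 3000, 4000))

# Default merge() is an inner join: only codes present in both are kept
inner <- merge(toy_ag, toy_gdp, by = "Country.Code")
nrow(inner)

# Codes in the first data frame that have no match in the second
unmatched <- setdiff(toy_ag$Country.Code, toy_gdp$Country.Code)
unmatched

# all.x = TRUE keeps every row of toy_ag and fills the missing GDP with NA
left <- merge(toy_ag, toy_gdp, by = "Country.Code", all.x = TRUE)
left
```

Applied to the lab data, the same setdiff() call on the two Country.Code columns would name the rows that were dropped.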

Question 3: Examining Association Between Variables

# This creates a scatterplot with Outall_Q (agricultural output) on the vertical axis and Land_Q (total agricultural land) on the horizontal axis.
# I notice that most countries have only a small amount of agricultural land and similar, low outputs. For countries with more than 50000 units of land, output is much more scattered.

plot(Outall_Q ~ Land_Q, 
     main = "Agricultural Output vs. Land Area",
     xlab = "Agricultural Land", 
     ylab = "Agricultural Output",
     col = "blue")

Question 4: Creating New Variables

# This creates a new variable called OutPerHec, which divides agricultural output by agricultural land to give the output per hectare.
OutPerHec <- Outall_Q / Land_Q

#shows first few values of this new output variable
head(OutPerHec)
## [1]  576.8891 1157.6756 2379.4799 5264.1367 1567.9355 1921.6855
# Creates a histogram of the output variable.
hist(OutPerHec, 
     main = "Agricultural Output per Hectare",
     xlab = "Output per Hectare", 
     col = "blue", 
     border = "black")

# I observe that the histogram is right-skewed, meaning most of the data is clustered on the left side with a long tail to the right. This indicates that most countries have a similar, relatively low output per hectare. This is consistent with the scatterplot above, where most countries have smaller amounts of land and output than the few countries that have more.
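
A right skew can also be checked numerically: in a right-skewed distribution the mean is usually pulled above the median by the large values in the tail. This is a small sketch with invented values (toy_ratio is hypothetical, not the lab data).

```r
# Invented ratios: mostly small values plus a couple of large ones
toy_ratio <- c(500, 600, 800, 1200, 1600, 1900, 2400, 5300, 12000)

# Compare the two measures of center
mean(toy_ratio)
median(toy_ratio)

# TRUE suggests a right skew
mean(toy_ratio) > median(toy_ratio)

# summary() reports min, quartiles, median, mean, and max in one call
summary(toy_ratio)
```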

Question 5: Associations Between Variables

# This creates a scatterplot with output per hectare on the y axis and GDP per capita on the x axis.
plot(OutPerHec ~ GDPPerCapita,
     main = "Output per Hectare vs. GDP Per Capita",
     xlab = "GDP Per Capita",
     ylab = "Output per Hectare",
     col = "blue")

# This plot is more spread out but still mostly clustered near the origin. It suggests that countries with a high output per hectare also tend to have a high GDP per capita, so the two variables appear to be positively associated.
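
The visual impression of a positive association could be backed up with a correlation coefficient. The sketch below uses invented values (toy_gdppc and toy_perhec are hypothetical). Using Spearman's rank correlation alongside Pearson's is my own assumption, chosen because it is less sensitive to a few extreme points.

```r
# Invented values standing in for GDP per capita and output per hectare
toy_gdppc  <- c(400, 1600, 2900, 6800, 13900, 49900)
toy_perhec <- c(580, 1150, 1570, 2380, 1920, 5260)

# Pearson correlation measures linear association
pearson <- cor(toy_gdppc, toy_perhec)
pearson

# Spearman correlation measures monotonic association using ranks
spearman <- cor(toy_gdppc, toy_perhec, method = "spearman")
spearman

# use = "complete.obs" drops pairs where either value is missing
cor(c(toy_gdppc, NA), c(toy_perhec, 100), use = "complete.obs")
```

A value near 1 would support the claim that the two variables rise together. A value near 0 would not.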

Question 6: More Associations Between Variables

# This creates a new variable that divides total output by the amount of labor involved in production, showing how much output each laborer produces.
OutPerLab <- Outall_Q / Labor_Q

# This shows the first few values of OutPerLab
head(OutPerLab)
## [1]  1911.939  1101.047  5002.304 14675.489 53581.586  2232.387
# Create histogram of OutPerLab
hist(OutPerLab, 
     main = "Agricultural Output per Laborer",
     xlab = "Output per Laborer", 
     col = "blue")

# This is probably the most right-skewed of all the graphs, showing that a few countries are far more productive per laborer than the rest.

# This creates a scatterplot comparing the agricultural output per laborer on the y axis and the GDP per capita on the x axis.
plot(OutPerLab ~ GDPPerCapita, 
     main = "Output per Laborer vs. GDP Per Capita",
     xlab = "GDP Per Capita ($)", 
     ylab = "Output per Laborer ($1000s per 1000 workers)", 
     col = "blue")

# Again, this is mostly clustered towards the origin but is more spread out than the other plots. This could indicate that countries with higher GDP per capita are able to get more output from each worker.

Question 7: What about the United States?

# This finds the agricultural output per hectare for the USA
OutPerHec[Country.Code == "USA"]
## [1] 1734.344
# This finds the agricultural output per laborer for the USA
OutPerLab[Country.Code == "USA"]
## [1] 155777.8
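
attach() makes columns available by name, but vectors such as OutPerHec then live outside the data frame. An alternative is to store the ratio as a column and look a country up with explicit indexing. This is a sketch only: toy_df and its values are invented for illustration.

```r
# Invented data frame with the same column names as the merged data
toy_df <- data.frame(Country.Code = c("AAA", "BBB", "USA"),
                     Outall_Q = c(100, 400, 900),
                     Land_Q = c(10, 20, 30))

# Store the ratio as a column so it stays aligned with its country
toy_df$OutPerHec <- toy_df$Outall_Q / toy_df$Land_Q

# Look up one country by its code
usa_value <- toy_df$OutPerHec[toy_df$Country.Code == "USA"]
usa_value

# subset() returns the matching row with only the chosen columns
subset(toy_df, Country.Code == "USA", select = c(Country.Code, OutPerHec))
```

Keeping derived variables inside the data frame means they are sorted, filtered, and merged together with the rest of the row.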

Question 8: What countries maximize their land?

# This returns the countries where OutPerHec > 10,000
Country.Name[OutPerHec > 10000]
## [1] "Bahrain"           "Brunei Darussalam" "Ireland"          
## [4] "Kuwait"            "Malta"             "Netherlands"      
## [7] "Norway"
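
If any ratio were missing, a logical filter like the one above would return NA entries alongside the country names. The sketch below uses invented values (toy_names and toy_vals are hypothetical). It wraps the condition in which() to drop missing cases and then ranks the remaining countries from highest to lowest.

```r
# Invented names and ratios, one of them missing
toy_names <- c("A", "B", "C", "D")
toy_vals  <- c(12000, NA, 800, 15000)

# A plain logical index carries the NA through to the result
toy_names[toy_vals > 10000]

# which() returns only the positions where the condition is TRUE
keep <- which(toy_vals > 10000)
toy_names[keep]

# Rank the selected countries from highest to lowest ratio
ranked <- toy_names[keep][order(toy_vals[keep], decreasing = TRUE)]
ranked
```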
# This makes a new variable that sums the total global output and divides it by the total amount of agricultural land, giving the global output per hectare
global_outperhec <- sum(Outall_Q) / sum(Land_Q)

# This makes a new variable that sums the total global output and divides it by the total labor, giving the global output per laborer
global_outperlab <- sum(Outall_Q) / sum(Labor_Q)

# Print the results
global_outperhec
## [1] 2038.442
global_outperlab
## [1] 5028.743
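
Dividing total output by total land is not the same as averaging the per-country ratios. It equals a land-weighted mean of those ratios, so countries with more land count for more. The sketch below uses invented numbers (toy_out and toy_land are hypothetical).

```r
# Invented outputs and land areas for three countries
toy_out  <- c(100, 400, 9000)
toy_land <- c(10, 20, 3000)

# Pooled ratio: total output over total land
pooled <- sum(toy_out) / sum(toy_land)

# The same number, written as a land-weighted mean of per-country ratios
weighted <- weighted.mean(toy_out / toy_land, w = toy_land)

# Simple unweighted mean of per-country ratios, which treats all countries equally
simple <- mean(toy_out / toy_land)

all.equal(pooled, weighted)
c(pooled = pooled, simple = simple)
```

If the lab data contained missing values, sum() would return NA unless na.rm = TRUE were passed.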