Here I am tasked with creating three vectors – x, y, and z – as shown below.
x<-c(5,10,15,20,25,30)
y<-c(-1,NA,75,3,5,8)
z<-5
Now I multiply the first two vectors (x & y) by z and store the resulting products in two new objects, which I then print.
new_x<-x*z
new_y<-y*z
print(new_x)
## [1] 25 50 75 100 125 150
print(new_y)
## [1] -5 NA 375 15 25 40
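The multiplication above relies on two base R behaviours: the length-one vector z is recycled across every element of the longer vector, and any arithmetic involving NA yields NA. Here is a minimal sketch that reuses the same values under hypothetical names (`vals`, `scale_by`, `scaled`), so nothing in the original workspace is overwritten:

```r
# Hypothetical names; the values mirror y and z from the text.
vals <- c(-1, NA, 75, 3, 5, 8)
scale_by <- 5

# scale_by has length 1, so R recycles it across all six elements.
scaled <- vals * scale_by

# Arithmetic with NA propagates NA, so the missing element stays missing.
is.na(scaled)

# Summaries also return NA unless missing values are dropped explicitly.
mean(scaled)                # NA
mean(scaled, na.rm = TRUE)  # mean of the five observed values
```

This is why the NA has to be handled deliberately before any summary statistics are computed.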
Here, I will replace the missing element of the vector “new_y” with the value 2.5, using the ifelse() function. I will also print this modified vector.
new_y<-ifelse(test = is.na(new_y), yes = 2.5, no = new_y)
print(new_y)
## [1] -5.0 2.5 375.0 15.0 25.0 40.0
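As you can see, R now prints every number in “new_y” with one decimal place. This is because print() shows all elements of a numeric vector with a common number of decimal places, and one of the elements is now 2.5. Only the display has changed; the stored values have not.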
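The replacement step and the display behaviour can be checked with a short sketch. It uses logical indexing as an alternative to ifelse() and a hypothetical object name (`filled`). The point it illustrates is that the extra decimal place belongs to how print() displays the vector, not to the values themselves.

```r
# Hypothetical copy of new_y as it looked before the replacement.
filled <- c(-5, NA, 375, 15, 25, 40)

# Logical indexing: assign 2.5 only where the element is missing.
filled[is.na(filled)] <- 2.5

# print() formats a numeric vector with a common number of decimals,
# so whole numbers are shown as -5.0, 375.0, and so on.
print(filled)

# The underlying values are unchanged: 15 is still exactly 15.
filled[4] == 15

# Printing a single whole-number element shows no decimal place.
print(filled[4])
```

Logical indexing modifies the vector in place, which some people find easier to read than ifelse() when only a few elements change.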
Here, I will load PRB data from my professor’s GitHub account.
#using the excellent readr package
library(readr)
mydata<-read_csv(file = "https://raw.githubusercontent.com/coreysparks/data/master/PRB2008_All.csv")
## Parsed with column specification:
## cols(
## .default = col_integer(),
## Country = col_character(),
## Continent = col_character(),
## Region = col_character(),
## Population. = col_double(),
## Rate.of.natural.increase = col_double(),
## ProjectedPopMid2025 = col_double(),
## ProjectedPopMid2050 = col_double(),
## IMR = col_double(),
## TFR = col_double(),
## PercPop1549HIVAIDS2001 = col_double(),
## PercPop1549HIVAIDS2007 = col_double(),
## PercPpUnderNourished0204 = col_double(),
## PopDensPerSqMile = col_double()
## )
## See spec(...) for full column specifications.
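The column specification above is the result of readr guessing a type for each column. The same idea can be illustrated offline with a small sketch. It substitutes base R's read.csv() for read_csv() and uses a made-up three-row table (the object names `csv_text` and `toy_csv` and all values are invented for illustration, not taken from the PRB file):

```r
# Invented toy data, not taken from the PRB file.
csv_text <- "Country,TFR,Population\nA,2.5,10\nB,1.8,20\nC,NA,30"

# read.csv() is base R; it is used here instead of readr::read_csv()
# so the sketch runs without a network connection or extra packages.
toy_csv <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Inspect the type guessed for each column.
sapply(toy_csv, class)

# The literal string NA in the file is read as a missing value.
is.na(toy_csv$TFR)
```

Checking the guessed types right after import is a cheap way to catch a numeric column that was accidentally read as text.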
Here, I will print the first ten country names from our new data frame.
print(mydata$Country[1:10])
## [1] "Afghanistan" "Albania" "Algeria"
## [4] "Andorra" "Angola" "Antigua and Barbuda"
## [7] "Argentina" "Armenia" "Australia"
## [10] "Austria"
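Indexing with [1:10] works here because the vector has at least ten elements. This sketch compares it with head(), using the built-in `letters` vector and a hypothetical short vector (`short`) to show where the two differ:

```r
# letters is a built-in character vector of length 26.
first_ten <- letters[1:10]

# head() returns the same elements when at least ten exist.
identical(first_ten, head(letters, 10))

# With fewer elements than requested, the two approaches differ:
short <- c("a", "b")   # hypothetical short vector
short[1:3]             # pads the missing position with NA
head(short, 3)         # returns only the elements that exist
```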
There are lots of ways we can count data in R. Here are two of them:
#Assuming there are no missing countries, we can ask for a summary of mydata$Country
summary(mydata$Country)
## Length Class Mode
## 209 character character
#Or, we can ask for the number of rows in the data frame (assuming each observation is a country, with no missing countries)
nrow(mydata)
## [1] 209
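Both counts rest on the assumption that each row is a distinct country. Here is a minimal sketch of how that assumption could be checked, using an invented toy data frame (`toy_countries`) rather than the PRB data:

```r
# Invented toy data frame; the second and fourth rows repeat a name.
toy_countries <- data.frame(
  Country = c("A", "B", "C", "B"),
  stringsAsFactors = FALSE
)

nrow(toy_countries)                    # number of rows: 4
length(unique(toy_countries$Country))  # number of distinct names: 3

# anyDuplicated() returns 0 when there are no repeats, otherwise the
# index of the first duplicated element.
anyDuplicated(toy_countries$Country)

# Show which names occur more than once.
repeated <- unique(toy_countries$Country[duplicated(toy_countries$Country)])
repeated
```

If the row count and the distinct count agree, counting rows is a safe way to count countries.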
If we wanted to make extra-super-sure that no countries are missing (that is, no countries are “NA”), we could run the following expression, which counts only those observations that are NOT NA:
length(which(!is.na(mydata$Country)))
## [1] 209
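An equivalent and somewhat shorter way to count non-missing values is to sum the logical vector directly, since TRUE counts as 1 and FALSE as 0. Here is a sketch on an invented vector (`names_vec` is hypothetical):

```r
names_vec <- c("A", NA, "C", "D")  # invented example values

# Approach used in the text: find non-missing positions, then count them.
n_which <- length(which(!is.na(names_vec)))

# Alternative: sum the logical vector.
n_sum <- sum(!is.na(names_vec))

n_which == n_sum   # both give 3

# anyNA() answers the yes/no question "is anything missing?" directly.
anyNA(names_vec)
```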
Here, I will discover how many countries are missing the e0Total (life expectancy) variable.
#Similar to the last exercise, we just ask R to count the NAs in e0Total
length(which(is.na(mydata$e0Total)))
## [1] 2
So, we see that there are two countries missing e0Total.
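When several variables need the same check, the number of missing values can be computed for every column at once. Here is a sketch on an invented data frame (the names `toy_life`, `e0` and `imr`, and all values, are hypothetical):

```r
# Invented toy data frame with missing values in two columns.
toy_life <- data.frame(
  Country = c("A", "B", "C", "D"),
  e0      = c(70, NA, 65, NA),
  imr     = c(12, 30, NA, 8),
  stringsAsFactors = FALSE
)

# is.na() on a data frame returns a logical matrix; colSums() then
# counts the TRUE values in each column.
missing_per_column <- colSums(is.na(toy_life))
missing_per_column
```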
Finally, I will see specifically which countries are missing the e0Total variable:
#Here, I will call for a subset of country names where e0Total is NA
subset(mydata$Country,is.na(mydata$e0Total))
## [1] "Andorra" "Monaco"
We see that Andorra and Monaco are missing e0Total (life expectancy) data – probably on account of them being very small countries.
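The check has to go through is.na() because a comparison with == NA never returns TRUE. This short sketch on invented values (`country` and `e0` are hypothetical vectors) also shows bracket indexing as an alternative to subset():

```r
country <- c("A", "B", "C")   # invented names
e0      <- c(71, NA, 80)      # invented life expectancies

# Comparing against NA yields NA for every element, not TRUE/FALSE.
e0 == NA

# is.na() returns a proper logical vector.
is.na(e0)

# Bracket indexing selects the same elements as subset().
missing_a <- country[is.na(e0)]
missing_b <- subset(country, is.na(e0))
identical(missing_a, missing_b)
```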