Rithika Kumar
September 12, 2019
Download country_profile_variables.csv from Canvas and place it in your working directory.
Use read.csv() to load the dataset
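A minimal sketch of the loading step, assuming a small stand-in file so that it runs anywhere. The file name `toy_country_profile.csv`, the object `toy.profile`, and the rows in it are made-up placeholders, not part of the real dataset. With the real file in your working directory, the call would look like `country.profile <- read.csv("country_profile_variables.csv")`.

```r
# Write a tiny stand-in CSV to a temporary folder (placeholder rows, not real data)
toy_path <- file.path(tempdir(), "toy_country_profile.csv")
writeLines(c(
  "country,Region,Sex.ratio",
  "Country A,WesternAsia,101.5",
  "Country B,SouthernAsia,-99"
), toy_path)

# Since R 4.0.0, read.csv() keeps text columns as character by default.
# stringsAsFactors = TRUE reproduces the older behaviour, where text
# columns are read in as factors.
toy.profile <- read.csv(toy_path, stringsAsFactors = TRUE)

dim(toy.profile)            # number of rows, number of columns
class(toy.profile$country)  # "factor" because of stringsAsFactors = TRUE
```

If `read.csv()` reports that it cannot open the file, `getwd()` shows which directory R is currently looking in.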
Now we have a data set to explore!
The first thing we want to know when we start working with a new DF is:
Each column name shown here is the name of a variable in the dataset.
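One possible way to get a first look at a new data frame is sketched below on a toy example. The object `toy.df` and its contents are hypothetical; the same functions can be applied to `country.profile`.

```r
# Toy data frame with placeholder values
toy.df <- data.frame(
  country = c("Country A", "Country B", "Country C"),
  Region  = c("WesternAsia", "WesternAsia", "SouthernAsia")
)

names(toy.df)  # the column names, i.e. the variable names
dim(toy.df)    # number of rows (observations) and columns (variables)
str(toy.df)    # compact overview: each variable, its class, first values
```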
#Displaying the top 2 rows within our dataset
head(country.profile,2)
## country Region Surface.area..km2. Population.in.thousands..2017.
## 1 Afghanistan SouthernAsia 652864 35530
TIP: Use ? to get help on a function, e.g. ?class
(The $ sign lets you call a column from the df.)
class(country.profile$country)
## [1] "factor"
class(country.profile$Sex.ratio..m.per.100.f..2017.)
## [1] "numeric"
Class of Objects (Source: adapted from Evelyne Brie)
summary(country.profile$country)
summary(country.profile$Sex.ratio..m.per.100.f..2017.)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -99.0 96.4 99.0 100.2 101.7 301.2
If you call country.profile$country on its own, without any function, R displays the whole column. Give it a try.
Strangely, the summary shows a country with a sex ratio of -99. We don’t want this value in our df, so let’s try to deal with it.
Before going ahead, let’s remind ourselves of indexing, i.e. the square brackets.
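As a quick refresher, here is a small sketch of square-bracket indexing on a toy data frame. The object `toy.df`, the column `sex.ratio`, and the values are made up for illustration.

```r
# Toy data frame with placeholder values; -99 plays the role of the odd code
toy.df <- data.frame(
  country   = c("Country A", "Country B", "Country C"),
  sex.ratio = c(101.5, -99, 98.2)
)

toy.df[2, ]            # row 2, all columns: df[row, column]
toy.df[, "sex.ratio"]  # all rows, one column

# A logical condition inside the brackets keeps only the TRUE positions
is.coded <- toy.df$sex.ratio == -99
toy.df$country[is.coded]            # the country carrying the -99 code

# The same idea selects whole rows of the data frame
valid <- toy.df[toy.df$sex.ratio > -99, ]
nrow(valid)
```

The comma matters for data frames: what comes before it selects rows, what comes after it selects columns.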
country.profile$SR_updated <- NA # Creating an empty variable
country.profile$SR_updated[country.profile$Sex.ratio..m.per.100.f..2017. == -99] <- 0
# All countries with SR == -99 are attributed a value of 0
country.profile$SR_updated[country.profile$Sex.ratio..m.per.100.f..2017. > -99] <- 1
# All countries with SR > -99 are attributed a value of 1
Saving our New Dataset as an .Rdata File Using save()
Let’s save our new “country2” dataset as an .Rdata file, a format designed for use with R.
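A minimal sketch of saving and reloading with `save()` and `load()`, assuming a toy object. The names `toy.df` and `toy_country.Rdata` are illustrative, and the sketch writes to a temporary folder rather than to the working directory. To save into the working directory instead, pass just a file name to `file =`.

```r
# Toy data frame with placeholder values
toy.df <- data.frame(
  country    = c("Country A", "Country B"),
  SR_updated = c(1, 0)
)

# Save the object to an .Rdata file (temporary folder used for this sketch)
out_path <- file.path(tempdir(), "toy_country.Rdata")
save(toy.df, file = out_path)
file.exists(out_path)  # check that the file was written

# load() restores the object under the name it was saved with
rm(toy.df)
load(out_path)
head(toy.df)
```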
Now go to your working directory and check that the file has been saved there.
Relevant function: class()
Relevant function: table()
Looking at the table that you just created, identify the region that contains the most countries.
Now create a subset called w.asia containing the countries that lie in Western Asia.
Relevant function: subset()
Relevant function: max() or summary()
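The functions named in these exercises can be tried out first on a toy data frame. This is only a sketch of how they behave: `toy.df`, `toy.w.asia`, the `population` column, and all values are made up, so the exercises still need to be done on `country.profile`.

```r
# Toy data frame with placeholder values
toy.df <- data.frame(
  country    = c("Country A", "Country B", "Country C", "Country D"),
  Region     = c("WesternAsia", "WesternAsia", "SouthernAsia", "WesternAsia"),
  population = c(10, 25, 40, 5)
)

class(toy.df$population)  # what kind of object a column is

# table() counts how many rows fall into each category
region.counts <- table(toy.df$Region)
region.counts
names(which.max(region.counts))  # the category with the highest count

# subset() keeps only the rows that satisfy a condition
toy.w.asia <- subset(toy.df, Region == "WesternAsia")

# max() and summary() describe a numeric column within that subset
max(toy.w.asia$population)
summary(toy.w.asia$population)
```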