Getting Started with R
You will familiarize yourself with the R computing language, R Studio
environment, and R Markdown language. If you are already familiar with
these, then lucky you, you are ahead of me! If not, read on.
There are two ways to complete this project: downloading and
installing R on your own machine (or using an already installed version
of R on a campus computer), or running this notebook in the cloud
version of R.
Installing R on Your Personal Machine
- Download R and RStudio
(recommended over using R alone)
- (Windows) Install Rtools. Note
that there are instructions on this page for creating a text file called
.Renviron that identifies a path for R to use to find Rtools. It seems
complicated but is fairly straightforward in practice.
Using R Studio in the Cloud - Preferred
- I have set up a Jupyter Notebook for use in this class to make it
really easy for you to run these programs in a cloud version of R
Studio. The link is in the class navigation pane.
R Markdown and Notebooks
This is the HTML document produced by an R Markdown Notebook. You can
download the code to run in R above (Code > Download Rmd). When you
execute code within the notebook, the results appear beneath the code.
You can download this file, experiment with it, and change it to suit
your needs. When you are ready to finalize your results, make sure that
you have run all of the relevant chunks of code and then click Preview
(do not knit!) above to create an HTML document of your own with code and
results included.
The grey box below is called a chunk. Try executing this chunk by
clicking the green arrow on the far left, the Run button within
the chunk, or by placing your cursor inside it and pressing
Ctrl+Shift+Enter.
## This is a comment and the # tells R not to read it. Run the command below to print out the message "Hello World."
print ("Hello World")
[1] "Hello World"
Once you’ve installed RStudio, you have to install your packages (but
only do this once). Open a script (File > New File > R Script) and
type the commands below (or run them in the notebook).
If you use the class JupyterHub, you may not need to install any
packages.
## Run this code if you haven't run it before. It only needs to be run once after you have installed R. Not needed in JupyterHub.
if (!require("pacman")) install.packages("pacman") #pacman is the package that installs packages nicely
pacman::p_load(rmarkdown, dplyr, gapminder, tidyverse, xml2, ggplot2, lifecycle) #these are the packages you may need
After installing your packages, you need to load the ones you will be
using in your program every time.
## Only needs to be run once per session.
library(ggplot2)
library(gapminder)
library(dplyr)
Attaching package: ‘dplyr’
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Hint: You can also include the commands above at the start of a
script if you decide to create and run your own code.
Introduction to Data Work in R
#Below, you can practice a calculation in R. If you run the code below, R will output the sum.
3+1
[1] 4
#But maybe you want to perform an operation and save the results in a variable so that you can use them later in other calculations?
#This code shows you how to assign a value to a variable x using a calculation, and then prints the value of x.
x<-3+1
x
[1] 4
#In fact, R thinks in vectors and matrices and works best when you assign a bunch of values to a variable.
x1<-c(1,2,3,4) #This makes a 1 dimensional vector
x2<-matrix(c(1,2,3,4), nrow=2, ncol=2) #This makes a 2x2 matrix
x1
x2
You can try modifying the numbers above yourself and making different
size matrices, etc.
Data Frames
Data frames are R’s standard way of organizing tabular data. They are
like matrices, except that each column has a name and can hold its own
type of data (numbers, text, or categories), which makes them flexible
and easy to work with. Later, you will import some actual data from the
internet into a data frame, but for now we will use the gapminder data
that is included in the package you loaded earlier.
The Gapminder (gapminder.org) data set records life expectancy,
population, and GDP per capita by country from 1952 to 2007, in
five-year steps. It can be used to calculate Preston Curves, as
discussed in class. For now, let’s just perform a few operations on
it.
#Some quick summary statistics
summary(gapminder)
country continent year lifeExp pop
Afghanistan: 12 Africa :624 Min. :1952 Min. :23.60 Min. :6.001e+04
Albania : 12 Americas:300 1st Qu.:1966 1st Qu.:48.20 1st Qu.:2.794e+06
Algeria : 12 Asia :396 Median :1980 Median :60.71 Median :7.024e+06
Angola : 12 Europe :360 Mean :1980 Mean :59.47 Mean :2.960e+07
Argentina : 12 Oceania : 24 3rd Qu.:1993 3rd Qu.:70.85 3rd Qu.:1.959e+07
Australia : 12 Max. :2007 Max. :82.60 Max. :1.319e+09
(Other) :1632
gdpPercap
Min. : 241.2
1st Qu.: 1202.1
Median : 3531.8
Mean : 7215.3
3rd Qu.: 9325.5
Max. :113523.1
# Load relevant data into data frame DF_2007 for the year 2007 and the continent of Africa.
## What is the median life expectancy in the continent in 2007?
DF<-data.frame(gapminder)
DF_2007<-filter(DF, year==2007 & continent=="Africa")
summary(DF_2007)
country continent year lifeExp pop
Algeria : 1 Africa :52 Min. :2007 Min. :39.61 Min. : 199579
Angola : 1 Americas: 0 1st Qu.:2007 1st Qu.:47.83 1st Qu.: 2909226
Benin : 1 Asia : 0 Median :2007 Median :52.93 Median : 10093310
Botswana : 1 Europe : 0 Mean :2007 Mean :54.81 Mean : 17875763
Burkina Faso: 1 Oceania : 0 3rd Qu.:2007 3rd Qu.:59.44 3rd Qu.: 19363654
Burundi : 1 Max. :2007 Max. :76.44 Max. :135031164
(Other) :46
gdpPercap
Min. : 277.6
1st Qu.: 863.0
Median : 1452.3
Mean : 3089.0
3rd Qu.: 3993.5
Max. :13206.5
Plot the relationship below. The plot represents the variation in
life expectancy (within a single continent) in a single period in time.
The level of health technology should be (approximately) the same across
all points and so the plotted relationship is like a snapshot of the
variation in life expectancy with income at that particular time and
state of the world.
## Create a scatter plot of life expectancy and GDP per capita in 2007 for the continent of Africa.
ggplot(DF_2007, aes(x=gdpPercap, y=lifeExp, color=continent))+geom_point()

Now let’s look at life expectancy as a function of income for a
single country at different periods of time.
# Load relevant data into data frame DF_country for a country of your choosing for all years of data.
DF_country<-filter(gapminder, country=="Japan")
summary(DF_country)
country continent year lifeExp pop
Japan :12 Africa : 0 Min. :1952 Min. :63.03 Min. : 86459025
Afghanistan: 0 Americas: 0 1st Qu.:1966 1st Qu.:70.75 1st Qu.: 99576898
Albania : 0 Asia :12 Median :1980 Median :76.25 Median :116163724
Algeria : 0 Europe : 0 Mean :1980 Mean :74.83 Mean :111758808
Angola : 0 Oceania : 0 3rd Qu.:1993 3rd Qu.:79.69 3rd Qu.:124736076
Argentina : 0 Max. :2007 Max. :82.60 Max. :127467972
(Other) : 0
gdpPercap
Min. : 3217
1st Qu.: 9030
Median :17997
Mean :17751
3rd Qu.:27270
Max. :31656
Plot the relationship below. The plot represents the evolution of
life expectancy as the economy grows for one country, incorporating the
social, political, and health contexts of that particular country, as
well as changing health technology.
## Create a scatter plot of life expectancy and GDP per capita for the country of Japan.
ggplot(DF_country, aes(x=gdpPercap, y=lifeExp, color=country))+geom_point()

To Do
To complete this assignment, answer the following questions. You can
submit these results however you choose, but a simple option would be to
modify this notebook to answer the questions below, execute the code,
and hit “preview” in R Studio to create an HTML document that can be
submitted with code and results.
- What is the median life expectancy in the world in 1952? In
2007?
DF_1952 <- filter(gapminder, year == 1952)
DF_2007 <- filter(gapminder, year == 2007)
med_1952_LF <- median(DF_1952$lifeExp)
med_2007_LF <- median(DF_2007$lifeExp)
The median life expectancy in the world in 1952 is 45.1355, and in
2007 is 71.9355.
- Choose one continent other than Africa. What is the median life
expectancy in that continent in 1952 and 2007? Plot the life expectancy
against GDP per capita for every country in that continent for the year
2007.
DF_1952_Asia <- filter(DF_1952, continent == "Asia")
DF_2007_Asia <- filter(DF_2007, continent == "Asia")
med_1952_Asia_LF <- median(DF_1952_Asia$lifeExp)
med_2007_Asia_LF <- median(DF_2007_Asia$lifeExp)
I chose Asia. The median life expectancy in Asia in 1952 is 44.869
and in 2007 is 72.396.
The plot below shows life expectancy against GDP per capita for every
country in Asia in 2007.
ggplot(DF_2007_Asia, aes(x=gdpPercap, y=lifeExp, color=country))+geom_point()

- Choose a country other than Japan. Plot the life expectancy against
GDP per capita from 1952 to 2007.
I chose Vietnam. Here is the plot of life expectancy against GDP
per capita from 1952 to 2007 in Vietnam.
DF_Vietnam <- filter(gapminder, country == "Vietnam")
ggplot(DF_Vietnam, aes(x=gdpPercap, y=lifeExp, color=country))+geom_point()

- What is your experience with R? If you haven’t used R before, do you
have experience with any other programming languages or statistical
software (SPSS, SAS, Stata, etc.) or Excel?
I have taken some statistics courses and learned how to use R in
those courses, so I have a basic knowledge of R and can write simple
code in it. In addition, I took some CS courses and learned Python, Java,
and MATLAB, although I have since forgotten how to code in those
languages.
---
title: "Econ 448 First Week Exercise"
author: "Melissa Knox"
output: html_notebook
---
## Getting Started with R
You will familiarize yourself with the R computing language, R Studio environment, and R Markdown language. If you are already familiar with these, then lucky you, you are ahead of me! If not, read on.

There are two ways to complete this project: downloading and installing R on your own machine (or using an already installed version of R on a campus computer), or running this notebook in the cloud version of R.

### Installing R on Your Personal Machine
* Download [R](https://cran.r-project.org/) and [RStudio](https://www.rstudio.com/products/rstudio/download/) (recommended over using R alone)
* (Windows) Install [Rtools](https://cran.r-project.org/bin/windows/Rtools/).  Note that there are instructions on this page for creating a text file called .Renviron that identifies a path for R to use to find Rtools.  It seems complicated but is fairly straightforward in practice.

### Using R Studio in the Cloud - Preferred
* I have set up a Jupyter Notebook for use in this class to make it really easy for you to run these programs in a cloud version of R Studio.  The link is in the class navigation pane.  

## R Markdown and Notebooks
This is the HTML document produced by an [R Markdown](http://rmarkdown.rstudio.com) Notebook. You can download the code to run in R above (Code > Download Rmd). When you execute code within the notebook, the results appear beneath the code. You can download this file, experiment with it, and change it to suit your needs. When you are ready to finalize your results, make sure that you have run all of the relevant chunks of code and then click Preview (do not knit!) above to create an HTML document of your own with code and results included.

The grey box below is called a chunk.  Try executing this chunk by clicking the green arrow on the far left, the *Run* button within the chunk, or by placing your cursor inside it and pressing *Ctrl+Shift+Enter*.

```{r}
## This is a comment and the # tells R not to read it.  Run the command below to print out the message "Hello World."
print ("Hello World")
```

Once you've installed RStudio, you have to install your packages (but only do this once). Open a script (File > New File > R Script) and type the commands below (or run them in the notebook).

If you use the class JupyterHub, you may not need to install any packages.
```{r}
## Run this code if you haven't run it before. It only needs to be run once after you have installed R. Not needed in JupyterHub.
if (!require("pacman")) install.packages("pacman") #pacman is the package that installs packages nicely
pacman::p_load(rmarkdown, dplyr, gapminder, tidyverse, xml2, ggplot2, lifecycle) #these are the packages you may need
```
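If you would rather not depend on pacman, here is a minimal sketch of the same "install only once" idea in base R. It assumes only that `system.file()` returns an empty string for a package that is not installed; the helper name `missing_packages()` is hypothetical and not part of pacman or base R.

```{r}
# Hypothetical helper (not part of pacman or base R): names of packages not yet installed
missing_packages <- function(pkgs) {
  # system.file(package = p) returns "" when package p is not installed
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

needed <- c("rmarkdown", "dplyr", "gapminder", "ggplot2")
to_install <- missing_packages(needed)
to_install  # character(0) means everything is already installed

# Install only what is missing, and only in an interactive session
# (e.g. the RStudio console), so rendering the notebook never triggers a download
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```

Because `to_install` is empty once everything is installed, the chunk is safe to re-run.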

After installing your packages, you need to load the ones you will be using in your program every time.  
```{r, warning=FALSE}
## Only needs to be run once per session.
library(ggplot2)
library(gapminder)
library(dplyr)
```

Hint: You can also include the commands above at the start of a script if you decide to create and run your own code.

## Introduction to Data Work in R
 
```{r}
#Below, you can practice a calculation in R.  If you run the code below, R will output the sum.
3+1
```

```{r}
#But maybe you want to perform an operation and save the results in a variable so that you can use them later in other calculations?
#This code shows you how to assign a value to a variable x using a calculation, and then prints the value of x.
x<-3+1
x
```
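A short sketch of why storing results is useful: once `x` holds a value, later calculations can refer to it by name. The names `y` and `z` are only illustrative.

```{r}
x <- 3 + 1   # store the result of a calculation
y <- x * 2   # reuse the stored value of x
z <- x + y
c(x = x, y = y, z = z)

# Assignment copies the value at that moment: changing x later does not update y
x <- 10
y
```

The last line still prints 8, because `y` was computed while `x` was 4.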


```{r}
#In fact, R thinks in vectors and matrices and works best when you assign a bunch of values to a variable.  
x1<-c(1,2,3,4) #This makes a 1 dimensional vector
x2<-matrix(c(1,2,3,4), nrow=2, ncol=2) #This makes a 2x2 matrix
x1
x2
```
You can try modifying the numbers above yourself and making different size matrices, etc.
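As a sketch of a few more things to try, the chunk below re-creates `x1` and `x2` so it runs on its own; `x3` is a new, illustrative name. Note that `matrix()` fills its values column by column unless you pass `byrow = TRUE`.

```{r}
x1 <- c(1, 2, 3, 4)
x1 * 2                     # arithmetic applies to every element
x1 + c(10, 20, 30, 40)     # two vectors of equal length combine element by element
sum(x1)

x2 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)  # columns are (1, 2) and (3, 4)
x2[1, 2]                   # row 1, column 2
x2 * x2                    # element-by-element product
x2 %*% x2                  # matrix multiplication

x3 <- matrix(1:6, nrow = 2, byrow = TRUE)        # filled row by row instead
dim(x3)                    # number of rows and columns
```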

## Data Frames
Data frames are R's standard way of organizing tabular data.  They are like matrices, except that each column has a name and can hold its own type of data (numbers, text, or categories), which makes them flexible and easy to work with.  Later, you will import some actual data from the internet into a data frame, but for now we will use the gapminder data that is included in the package you loaded earlier.  

The Gapminder (gapminder.org) data set records life expectancy, population, and GDP per capita by country from 1952 to 2007, in five-year steps.  It can be used to calculate Preston Curves, as discussed in class.  For now, let's just perform a few operations on it.
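To see how a data frame differs from a matrix, here is a small sketch built by hand. The rows are hypothetical toy values, not real Gapminder figures; only the column names are borrowed from the gapminder data.

```{r}
# Toy data for illustration only (not real Gapminder values)
df <- data.frame(
  country   = c("A", "B", "C"),
  year      = c(2007L, 2007L, 2007L),
  lifeExp   = c(50, 60, 70),
  gdpPercap = c(1000, 5000, 20000),
  stringsAsFactors = FALSE
)
str(df)                          # each column keeps its own type
df$lifeExp                       # one column, extracted as a vector
df[df$lifeExp > 55, ]            # rows meeting a condition (base R)
df[, c("country", "lifeExp")]    # a subset of columns
```

The same `$` and `[rows, columns]` notation also works on the gapminder data.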

```{r}
#Some quick summary statistics
summary(gapminder)
```

```{r}
# Load relevant data into data frame DF_2007 for the year 2007 and the continent of Africa.
## What is the median life expectancy in the continent in 2007?
DF<-data.frame(gapminder)
DF_2007<-filter(DF, year==2007 & continent=="Africa")
summary(DF_2007)
```
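`summary()` answers the question in the comment, but it rounds for display. As a sketch, you can also compute the median directly from the `lifeExp` column. This version uses base R's `subset()` in place of dplyr's `filter()` so that it needs only the gapminder package; `africa_2007` is an illustrative name.

```{r}
library(gapminder)

# subset() is base R's counterpart to dplyr::filter()
africa_2007 <- subset(gapminder, year == 2007 & continent == "Africa")
nrow(africa_2007)               # one row per African country in 2007
median(africa_2007$lifeExp)     # unrounded median life expectancy
```

The result should agree with the `Median` entry for `lifeExp` in the summary above, up to rounding.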

Plot the relationship below.  The plot represents the variation in life expectancy (within a single continent) in a single period in time.  The level of health technology should be (approximately) the same across all points and so the plotted relationship is like a snapshot of the variation in life expectancy with income at that particular time and state of the world.
```{r}
## Create a scatter plot of life expectancy and GDP per capita in 2007 for the continent of Africa.
ggplot(DF_2007, aes(x=gdpPercap, y=lifeExp, color=continent))+geom_point()

```
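GDP per capita is strongly right-skewed in this subset (in the 2007 Africa summary, the mean of 3089.0 is about twice the median of 1452.3), so most points bunch up on the left. A common refinement, sketched below, is a log-scaled x-axis plus readable axis labels. The log scale is a presentation choice of this sketch, not something the assignment requires.

```{r}
library(ggplot2)
library(gapminder)

africa_2007 <- subset(gapminder, year == 2007 & continent == "Africa")

p <- ggplot(africa_2007, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +  # equal distances along x now represent equal ratios of income
  labs(x = "GDP per capita (log scale)",
       y = "Life expectancy (years)",
       title = "Africa, 2007")
p
```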

Now let's look at life expectancy as a function of income for a single country at different periods of time.  
```{r}
# Load relevant data into data frame DF_country for a country of your choosing for all years of data.
DF_country<-filter(gapminder, country=="Japan")
summary(DF_country)
```

Plot the relationship below.  The plot represents the evolution of life expectancy as the economy grows for one country, incorporating the social, political, and health contexts of that particular country, as well as changing health technology.
```{r}
## Create a scatter plot of life expectancy and GDP per capita for the country of Japan.  
ggplot(DF_country, aes(x=gdpPercap, y=lifeExp, color=country))+geom_point()
```
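Because the Japan points come from successive years, joining them in time order makes the path of development easier to read. A sketch, assuming the rows are sorted by `year` first: `geom_path()` connects points in row order, whereas `geom_line()` would connect them in order of the x variable. The name `japan` is illustrative.

```{r}
library(ggplot2)
library(gapminder)

japan <- subset(gapminder, country == "Japan")
japan <- japan[order(japan$year), ]   # make sure rows run from 1952 to 2007

p <- ggplot(japan, aes(x = gdpPercap, y = lifeExp)) +
  geom_path() +                                           # connect points in year order
  geom_point() +
  geom_text(aes(label = year), vjust = -0.7, size = 3) +  # label each point with its year
  labs(x = "GDP per capita", y = "Life expectancy (years)")
p
```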
****
## To Do
To complete this assignment, answer the following questions. You can submit these results however you choose, but a simple option would be to modify this notebook to answer the questions below, execute the code, and hit "preview" in R Studio to create an HTML document that can be submitted with code and results.

1. What is the median life expectancy in the world in 1952? In 2007?
```{r}
DF_1952 <- filter(gapminder, year == 1952)
DF_2007 <- filter(gapminder, year == 2007)
med_1952_LF <- median(DF_1952$lifeExp)
med_2007_LF <- median(DF_2007$lifeExp)
```
The median life expectancy in the world in 1952 is `r med_1952_LF`, and in 2007 is `r med_2007_LF`.
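The chunk above computes the two medians one year at a time. As a sketch of an alternative, base R's `tapply()` returns the median for every year in the data in a single call, which also makes the change over time easy to compute; `med_by_year` is an illustrative name.

```{r}
library(gapminder)

# Median life expectancy for each year, grouped by the year column
med_by_year <- tapply(gapminder$lifeExp, gapminder$year, median)
med_by_year[c("1952", "2007")]

# Increase in the world median between 1952 and 2007, in years
med_by_year[["2007"]] - med_by_year[["1952"]]
```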

2. Choose one continent other than Africa.  What is the median life expectancy in that continent in 1952 and 2007? Plot the life expectancy against GDP per capita for every country in that continent for the year 2007.
```{r}
DF_1952_Asia <- filter(DF_1952, continent == "Asia")
DF_2007_Asia <- filter(DF_2007, continent == "Asia")
med_1952_Asia_LF <- median(DF_1952_Asia$lifeExp)
med_2007_Asia_LF <- median(DF_2007_Asia$lifeExp)
```
I chose Asia. 
The median life expectancy in Asia in 1952 is `r med_1952_Asia_LF` and in 2007 is `r med_2007_Asia_LF`.

The plot below shows life expectancy against GDP per capita for every country in Asia in 2007.
```{r}
ggplot(DF_2007_Asia, aes(x=gdpPercap, y=lifeExp, color=country))+geom_point()
```
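To compare Asia with the other continents, a sketch using base R's `aggregate()` computes the median for every continent in both years at once. The dplyr equivalent would combine `group_by()` and `summarise()`; `two_years` and `med_by_continent` are illustrative names.

```{r}
library(gapminder)

two_years <- subset(gapminder, year %in% c(1952, 2007))
# One row per continent-year pair, holding the median of lifeExp
med_by_continent <- aggregate(lifeExp ~ continent + year, data = two_years, FUN = median)
med_by_continent
```

The Asia rows should match the medians computed for Asia earlier.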


3. Choose a country other than Japan.  Plot the life expectancy against GDP per capita from 1952 to 2007.

I chose Vietnam. Here is the plot of life expectancy against GDP per capita from 1952 to 2007 in Vietnam.

```{r}
DF_Vietnam <- filter(gapminder, country == "Vietnam")
ggplot(DF_Vietnam, aes(x=gdpPercap, y=lifeExp, color=country))+geom_point()
```
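With a single country, `color = country` produces just one color. As a sketch, plotting Vietnam alongside Japan (the example country from earlier) makes the color mapping meaningful and lets you compare the two paths directly. The log scale and the choice of Japan as the comparison are assumptions of this example, and `two_countries` is an illustrative name.

```{r}
library(ggplot2)
library(gapminder)

two_countries <- subset(gapminder, country %in% c("Japan", "Vietnam"))
two_countries <- two_countries[order(two_countries$country, two_countries$year), ]

p <- ggplot(two_countries, aes(x = gdpPercap, y = lifeExp, color = country)) +
  geom_path() +      # one path per country, because the discrete color also groups the data
  geom_point() +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Life expectancy (years)")
p
```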


4. What is your experience with R? If you haven't used R before, do you have experience with any other programming languages or statistical software (SPSS, SAS, Stata, etc.) or Excel?

I have taken some statistics courses and learned how to use R in those courses, so I have a basic knowledge of R and can write simple code in it. In addition, I took some CS courses and learned Python, Java, and MATLAB, although I have since forgotten how to code in those languages. 

****
