R vs. Packages
We have already seen data types and some of their basic properties. We will now look more closely at data handling and the related tools. Please note that everything related to data can be done either with base R or with packages.
When it comes to packages, some are used more commonly than others. We will try to use both approaches. I encourage you to check alternative packages. There is always something cool!
Getting Data III
Another way of getting data into R, where available, is to use packages. Luckily, there are many useful packages for this purpose. Just to name a few:
- WDI allows accessing World Bank data.
- NHSRdatasets allows accessing NHS data.
- fredr allows accessing St. Louis Fed (FRED) data.
- ecb allows accessing ECB data.
- pdfetch allows accessing public economic and financial data.
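As a rough illustration of the package route, the sketch below wraps a call to the WDI package. The helper name fetch_gdp_per_capita, the indicator code NY.GDP.PCAP.KD (GDP per capita in constant US dollars), the countries, and the years are illustrative assumptions, not part of the original notes. The download itself needs the package to be installed and an internet connection.

```r
# Hypothetical helper: download GDP per capita from the World Bank via WDI.
fetch_gdp_per_capita <- function(countries, start_year, end_year) {
  # Fail early with a clear message if the package is not installed
  if (!requireNamespace("WDI", quietly = TRUE)) {
    stop("Install the package first: install.packages('WDI')")
  }
  WDI::WDI(
    country   = countries,          # ISO-2 country codes
    indicator = "NY.GDP.PCAP.KD",   # assumed indicator, chosen for illustration
    start     = start_year,
    end       = end_year
  )
}

# Example call; it only runs in an interactive session because it needs internet access
if (interactive()) {
  gdp <- fetch_gdp_per_capita(c("US", "DE"), 2010, 2020)
  head(gdp)
}
```

The other packages in the list follow the same idea: a function call returns a data frame that can be used like any other R object. Some of them, such as fredr, may also require an API key.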
Why does visualization matter?
Group 1 Data
| Obs. | x | y |
|---|---|---|
| 1 | 10 | 8.04 |
| 2 | 8 | 6.95 |
| 3 | 13 | 7.58 |
| 4 | 9 | 8.81 |
| 5 | 11 | 8.33 |
| 6 | 14 | 9.96 |
| 7 | 6 | 7.24 |
| 8 | 4 | 4.26 |
| 9 | 12 | 10.84 |
| 10 | 7 | 4.82 |
| 11 | 5 | 5.68 |
Group 2 Data
| Obs. | x | y |
|---|---|---|
| 1 | 10 | 9.14 |
| 2 | 8 | 8.14 |
| 3 | 13 | 8.74 |
| 4 | 9 | 8.77 |
| 5 | 11 | 9.26 |
| 6 | 14 | 8.10 |
| 7 | 6 | 6.13 |
| 8 | 4 | 3.10 |
| 9 | 12 | 9.13 |
| 10 | 7 | 7.26 |
| 11 | 5 | 4.74 |
Group 3 Data
| Obs. | x | y |
|---|---|---|
| 1 | 10 | 7.46 |
| 2 | 8 | 6.77 |
| 3 | 13 | 12.74 |
| 4 | 9 | 7.11 |
| 5 | 11 | 7.81 |
| 6 | 14 | 8.84 |
| 7 | 6 | 6.08 |
| 8 | 4 | 5.39 |
| 9 | 12 | 8.15 |
| 10 | 7 | 6.42 |
| 11 | 5 | 5.73 |
Group 4 Data
| Obs. | x | y |
|---|---|---|
| 1 | 8 | 6.58 |
| 2 | 8 | 5.76 |
| 3 | 8 | 7.71 |
| 4 | 8 | 8.84 |
| 5 | 8 | 8.47 |
| 6 | 8 | 7.04 |
| 7 | 8 | 5.25 |
| 8 | 19 | 12.50 |
| 9 | 8 | 5.56 |
| 10 | 8 | 7.91 |
| 11 | 8 | 6.89 |
Visualization: Anscombe’s quartet
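The point of the quartet can be checked directly in R. The sketch below assumes that the built-in anscombe data set (columns x1 to x4 and y1 to y4) holds the same values as the four groups listed above. It first computes a few summary statistics per group and then draws the four scatter plots with a fitted regression line.

```r
# Built-in data set with columns x1..x4 and y1..y4
data(anscombe)

# Summary statistics for each of the four groups
stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y),
    var_x  = var(x),  var_y  = var(y),
    cor_xy = cor(x, y))
})
colnames(stats) <- paste("Group", 1:4)
print(round(stats, 2))

# 2 x 2 grid of scatter plots on common axes, each with its regression line
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Group", i), xlim = c(3, 20), ylim = c(2, 14))
  abline(lm(y ~ x), col = "red")
}
par(op)  # restore the previous plotting layout
```

The summary statistics are almost identical across the groups, yet the plots look very different. This is why it pays to look at the data before relying on summary numbers.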
Practice
- Using the built-in data set airquality, create a scatter plot comparing the Temp and Ozone variables. What can you say about the graph?
- Create a histogram of the Temp variable. Try to adjust bins so that there are (approximately) 20 bins.
- Plot the frequency of observations in each Month. What can you see?
- Create a boxplot to view the distribution of Ozone for each month.
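One possible starting point for these exercises, using base R graphics only, is sketched below. The axis labels and titles are arbitrary choices, and the interpretation of each plot is left to you.

```r
# airquality ships with base R (datasets package); Ozone contains missing values
data(airquality)

# 1. Scatter plot of Ozone against Temp (points with missing Ozone are not drawn)
plot(airquality$Temp, airquality$Ozone,
     xlab = "Temp", ylab = "Ozone", main = "Ozone vs. Temp")

# 2. Histogram of Temp; breaks = 20 is only a suggestion, R rounds to "pretty" cut points
hist(airquality$Temp, breaks = 20,
     xlab = "Temp", main = "Distribution of Temp")

# 3. Number of observations in each month
month_counts <- table(airquality$Month)
barplot(month_counts, xlab = "Month", ylab = "Frequency")

# 4. Distribution of Ozone for each month
boxplot(Ozone ~ Month, data = airquality, xlab = "Month", ylab = "Ozone")
```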
ggplot
ggplot2 (commonly called ggplot) is a package developed to bring structure to visualization. The official cheat sheet is here.
Plot = data + Aesthetics + Geometry
- Geometries: Visual elements used for our data e.g., point, line, histogram etc.
- Aesthetics: Define which data columns affect various aspects of the geom, e.g., x, y, color, fill, size etc.
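The formula above maps onto code quite directly. The following is a minimal sketch, assuming the ggplot2 package is installed and using the built-in airquality data set. The choice of columns and the color mapping are illustrative only.

```r
library(ggplot2)

# Plot = data + Aesthetics + Geometry
p <- ggplot(
  data = airquality,                                  # data
  aes(x = Temp, y = Ozone, color = factor(Month))     # aesthetics: columns mapped to x, y, color
) +
  geom_point(na.rm = TRUE) +                          # geometry: points; rows with missing Ozone are dropped silently
  labs(color = "Month")                               # readable legend title

print(p)
```

Changing the geometry (for example, replacing geom_point() with geom_line()) changes the type of plot while the data and aesthetics stay the same.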