Graphing with R

This file will show you how to create graphs for qualitative and quantitative data.

First, read in the Data Frame. Many Data Frames are available for you to read into RStudio; just copy the read.csv command for the Data Frame from Canvas and paste it into RStudio. Here is an example of one we will use in this help document. The command head(Data Frame) shows you the names of the variables and the first few rows of the Data Frame.

Favcolor <- read.csv("https://krkozak.github.io/MAT160/fav_color_data.csv")
head(Favcolor)
##   color  X X.1 X.2 X.3 X.4 X.5
## 1   red NA  NA  NA  NA  NA  NA
## 2  blue NA  NA  NA  NA  NA  NA
## 3  blue NA  NA  NA  NA  NA  NA
## 4 other NA  NA  NA  NA  NA  NA
## 5 green NA  NA  NA  NA  NA  NA
## 6   red NA  NA  NA  NA  NA  NA
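The gf_ graphing commands in this document come from the ggformula package, and filter() (used later) comes from the dplyr package. The sketch below assumes both packages are installed; if your course loads the mosaic package, that should attach both of them as well. It also shows names(), which lists only the variable names, a quick way to check the spelling of a variable before you use it in a graph.

```r
# Load the packages that provide the gf_ commands and filter()
library(ggformula)
library(dplyr)

# Read in the favorite color Data Frame (same URL as above)
Favcolor <- read.csv("https://krkozak.github.io/MAT160/fav_color_data.csv")

# List just the variable names
names(Favcolor)
```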

Qualitative Plots

Bar Plot

To create a bar plot: gf_bar(~variable, data=Data Frame, …)

All bar graphs are created with the same command. The … means that there are options that you can add. You replace the word “variable” with the variable in your Data Frame that you are interested in, and you replace the word “Data Frame” with the name of the Data Frame that you are working with. Using the Data Frame Favcolor, you can create a bar graph using the command

gf_bar(~color, data=Favcolor)
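The height of each bar is the number of times that category appears. As a quick check (a sketch using base R's table() function, which is not part of the original instructions), you can print those counts next to the graph:

```r
library(ggformula)
Favcolor <- read.csv("https://krkozak.github.io/MAT160/fav_color_data.csv")

# Count how many people chose each color; these counts are the bar heights
color_counts <- table(Favcolor$color)
color_counts

# The basic bar graph of the same variable
gf_bar(~color, data = Favcolor)
```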

This command creates a basic graph. If you want to add a title, customize the labels on the x-axis and y-axis, or change the default dark gray color of the bars, there are options that let you do this. To add a title, use title=“title you want”; for a label on the x-axis, use xlab=“label you want”; for a label on the y-axis, use ylab=“label you want”; and for a bar color, use fill=“color you want the bars to be”. The following graph was created using all of these options, but you don’t need to use all of them. For example, R automatically labels the y-axis as count, so you don’t need the ylab option.

gf_bar(~color, data=Favcolor, title="Favorite color", xlab="Favorite color", ylab="Count", fill="blue")
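In ggformula, fill= sets the inside color of the bars, while color= sets the outline of the bars. A sketch that uses both options; the color names "skyblue" and "black" are just example choices:

```r
library(ggformula)
Favcolor <- read.csv("https://krkozak.github.io/MAT160/fav_color_data.csv")

# fill colors the inside of each bar, color draws the outline
p <- gf_bar(~color, data = Favcolor, title = "Favorite color",
            fill = "skyblue", color = "black")
p
```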

Dot Plot

A dot plot is a graph that is similar to a bar graph, but you can see the actual data values. The command for a dot plot is gf_dotplot(~variable, data=Data Frame, title=“title you want”).

As an example, for the favorite color Data Frame:

gf_dotplot(~color, data=Favcolor, title="Favorite Color")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.

Useful plots for quantitative data

There are several ways in R to tell a command which Data Frame to use. One way is to type the name of the Data Frame followed by %>% before pressing Enter. The %>% is called the pipe operator: it passes the result of one line into the command on the next line, and on that next line you create the type of graph you want. If you want to put two graphs on the same plot, put %>% at the end of the second line and type the command for the second graph on the next line, so the pipe is most useful for overlaying graphs. A second way is to put data=Data Frame in the command. The third way is to use Data Frame$variable in the command. In this document we will use data=Data Frame in the command.
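The three ways described above can be compared side by side. This sketch assumes ggformula is loaded for the graphs and dplyr for %>%; the first two commands should draw the same histogram, and the last line shows the Data Frame$variable form pulling out a single column.

```r
library(ggformula)
library(dplyr)   # provides %>%
Recurance <- read.csv("https://krkozak.github.io/MAT160/cancer_recurance.csv")

# Way 1: pipe the Data Frame into the graph command
p1 <- Recurance %>% gf_histogram(~tummor)

# Way 2: name the Data Frame with data=
p2 <- gf_histogram(~tummor, data = Recurance)

# Way 3: Data Frame$variable picks out one column
head(Recurance$tummor)

p1
p2
```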

Dot plot

You can also make a dot plot for quantitative data. Again, the command is gf_dotplot(~variable, data=Data Frame, title=“name of the graph”)

Recurance <- read.csv("https://krkozak.github.io/MAT160/cancer_recurance.csv")
head(Recurance)
##   tummor
## 1     19
## 2     18
## 3     17
## 4      1
## 5     21
## 6     22
gf_dotplot(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
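The message about `bins = 30` means R picked the bin width for you. A sketch that sets binwidth= yourself; binwidth = 1 is an assumed choice that gives each whole-number value its own stack, since the tumor values shown by head() are whole numbers:

```r
library(ggformula)
Recurance <- read.csv("https://krkozak.github.io/MAT160/cancer_recurance.csv")

# binwidth = 1: one stack of dots per whole-number value
p <- gf_dotplot(~tummor, data = Recurance, binwidth = 1,
                title = "Recurrence of tumor after chemotherapy")
p
```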

Histogram

To create a histogram: gf_histogram(~variable, data=Data Frame, title=“type the title you want”, xlab=“type the label you want for the horizontal axis”)

gf_histogram(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")
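gf_histogram() also accepts bins= (how many bars) or binwidth= (how wide each bar is). A sketch with an assumed binwidth of 2; trying a few different values shows how much the shape of a histogram can depend on this choice:

```r
library(ggformula)
Recurance <- read.csv("https://krkozak.github.io/MAT160/cancer_recurance.csv")

# Each bar covers a range of 2 units of the tummor variable
p <- gf_histogram(~tummor, data = Recurance, binwidth = 2,
                  title = "Recurrence of tumor after chemotherapy",
                  xlab = "tummor")
p
```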

Frequency Polygon

To create a frequency polygon: gf_freqpoly(~variable, data=Data Frame, title=“type the title you want”, xlab=“type the label you want for the horizontal axis”)

gf_freqpoly(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
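As with the dot plot, you can avoid the message about `bins = 30` by choosing binwidth= yourself. This sketch (binwidth = 2 is an assumed choice) also uses %>% to draw the frequency polygon over a histogram with the same bin width, so the polygon should pass through the middle of the top of each bar.

```r
library(ggformula)
library(dplyr)   # provides %>%
Recurance <- read.csv("https://krkozak.github.io/MAT160/cancer_recurance.csv")

# Same binwidth in both layers so the bins line up
p <- gf_histogram(~tummor, data = Recurance, binwidth = 2,
                  title = "Recurrence of tumor after chemotherapy") %>%
  gf_freqpoly(~tummor, data = Recurance, binwidth = 2)
p
```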

Density Plot

To produce a density plot: gf_density(~variable, data=Data Frame, title=“type the title you want”, xlab=“type the label you want for the horizontal axis”). As an example:

gf_density(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")

Overlapping two or more graphs

You can draw a histogram and a density plot on the same graph. This is the advantage of using the %>% notation. Because a density plot is measured on a density scale instead of in counts, use gf_dhistogram(), which draws the histogram on the density scale, so that the two graphs line up. As an example:

gf_dhistogram(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy") %>%
gf_density(~tummor, data=Recurance)

Separating a Quantitative variable by a Qualitative variable

To look at a dot plot of a quantitative variable separated by the values of a qualitative variable, use gf_point(quantitative variable~qualitative variable, data=Data Frame). As an example, for a Data Frame that has cancer types and the survival times for each type, you can do:

Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")
gf_point(survival~organ, data=Cancer)

Jitter Plot

A jitter plot will move the dots so they are not on top of each other and you can see the data for each individual. To create a jitter plot of a quantitative variable separated by a qualitative variable, use: gf_jitter(quantitative variable~qualitative variable, data=Data Frame)

As an example:

gf_jitter(survival~organ, data=Cancer)
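By default, gf_jitter() moves the points both sideways and up and down, which slightly changes the survival values you see. A sketch that sets height = 0 so that only the sideways position is jittered; width = 0.2 is an assumed amount:

```r
library(ggformula)
Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")

# width: sideways spread within each organ; height = 0 keeps survival values exact
p <- gf_jitter(survival ~ organ, data = Cancer, width = 0.2, height = 0)
p
```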

Density plots and Histograms separated by a qualitative variable

If you want to create a density plot separated by a qualitative variable, you use the command:

gf_density(~variable|qualitative variable, data=Data Frame)

Note that between variable and qualitative variable is the character |, which is a vertical line.

As an example:

gf_density(~survival|organ, data=Cancer)
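The same | notation works for histograms, which the heading above also mentions. A sketch using the Cancer Data Frame, which draws one histogram panel for each organ:

```r
library(ggformula)
Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")

# One histogram panel per value of organ
p <- gf_histogram(~survival | organ, data = Cancer)
p
```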

Creating a new Data Frame that is filtered for a particular qualitative variable

If you want a new Data Frame that contains only the rows of the original Data Frame with a particular value of a qualitative variable, you can create one. This is called filtering for a particular variable value. The command is newData Frame <- Data Frame %>% filter(variable == “value”), where filter() comes from the dplyr package.

As an example, suppose you are using the Data Frame Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv") and you want to filter it for breast cancer. Then the command would be

Breast <-
Cancer%>%
filter(organ == "Breast")

Now you can use any of the commands above on this new Data Frame, which contains only the breast cancer values.

gf_histogram(~survival, data=Breast, title="Survival Time for Breast Cancer", xlab="survival time (days)", fill="blue")
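Two related steps may help when filtering. This sketch (assuming dplyr is loaded for filter() and %>%) first lists the values that organ actually takes, because the value in quotes must match exactly, including capital letters. It then uses != to keep every organ except breast. The name NotBreast is just an example.

```r
library(dplyr)
Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")

# See the exact spelling of each organ value
unique(Cancer$organ)

# Keep only breast cancer rows
Breast <- Cancer %>% filter(organ == "Breast")

# Keep every row except breast cancer
NotBreast <- Cancer %>% filter(organ != "Breast")

nrow(Breast)
nrow(NotBreast)
```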

Violin Plots

Another useful plot is a violin plot. This will also show you how the data is distributed. To create a violin plot, use

gf_violin(quantitative variable ~ qualitative variable, data=Data Frame)

As an example:

gf_violin(survival~organ, data=Cancer)
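Because each gf_ command adds a layer, you can use %>% to draw the individual data values on top of each violin. A sketch; width = 0.1 and height = 0 are assumed choices that keep the points inside each violin without changing their survival values:

```r
library(ggformula)
library(dplyr)   # provides %>%
Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")

# Violin for the shape, jittered points for the individual values
p <- gf_violin(survival ~ organ, data = Cancer) %>%
  gf_jitter(survival ~ organ, data = Cancer, width = 0.1, height = 0)
p
```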

Scatter Plot

To create a scatter plot:

gf_point(dependent variable ~ independent variable, data=Data Frame)

Example: Create a scatter plot of temperature vs elevation. Include a title and x and y labels.

Elevation <-read.csv("https://krkozak.github.io/MAT160/elevation_temperature.txt")
gf_point(Temperature ~ Elevation, data=Elevation, title="Temperature vs Elevation", xlab="Elevation (ft)", ylab="Temperature (degrees F)")

Time Series Plot

To create a time-series graph where the y-axis starts at 0: gf_line(dependent variable ~ independent variable, data=Data Frame, ylim=c(0, number over max y), title=“type in a title you want”, xlab=“type in a label for the horizontal axis”, ylab=“type in a label for the vertical axis”)

The ylim=c(0, number over max y) option sets the limits on the y-axis: the lower limit is 0, and the upper limit is a number a little larger than the largest y value.

Example: Create a time series plot for number of rooms rented by day.

Dates <-read.csv("https://krkozak.github.io/MAT160/hotel_rooms.txt")
gf_line(Rooms ~ Time, data=Dates, title="Number of Rooms versus Day", ylim=c(0,90), xlab="Time (Days since day 1)", ylab="Number of Rooms")
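Rather than guessing a number over the maximum, you can let R find the largest y value first. This sketch computes it with max(), adds about 10% headroom (the 1.1 factor and the names max_rooms and upper are arbitrary choices), and then sets the y-axis limits with gf_lims(), ggformula's chaining version of ggplot2's lims(), as an alternative to typing ylim= inside gf_line().

```r
library(ggformula)
library(dplyr)   # provides %>%
Dates <- read.csv("https://krkozak.github.io/MAT160/hotel_rooms.txt")

# Largest number of rooms rented on any day
max_rooms <- max(Dates$Rooms, na.rm = TRUE)

# A limit a little above the maximum, rounded up to a whole number
upper <- ceiling(1.1 * max_rooms)

p <- gf_line(Rooms ~ Time, data = Dates, title = "Number of Rooms versus Day",
             xlab = "Time (Days since day 1)", ylab = "Number of Rooms") %>%
  gf_lims(y = c(0, upper))
p
```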