This file will show you how to create graphs for qualitative and quantitative data.
First, read in the Data Frame. Many Data Frames are available for you to read into RStudio; just copy and paste the Data Frame command from Canvas into RStudio. Here is an example of one we will use in this help document. The command head(Data Frame) shows the names of the variables and the first few rows of the Data Frame.
Favcolor <- read.csv("https://krkozak.github.io/MAT160/fav_color_data.csv")
head(Favcolor)
## color X X.1 X.2 X.3 X.4 X.5
## 1 red NA NA NA NA NA NA
## 2 blue NA NA NA NA NA NA
## 3 blue NA NA NA NA NA NA
## 4 other NA NA NA NA NA NA
## 5 green NA NA NA NA NA NA
## 6 red NA NA NA NA NA NA
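The output above shows one real variable, color, followed by columns X through X.5 that contain only NA in the rows shown. A minimal sketch of how to inspect a Data Frame like this and keep just the column of interest is given below. It rebuilds the six rows printed above by hand so it runs without a network connection; the name favcolor_head is hypothetical, and only two of the empty columns are reproduced.

```r
# Rebuild the rows printed by head(Favcolor); favcolor_head is a
# hypothetical stand-in, not the full Data Frame from the URL.
favcolor_head <- data.frame(
  color = c("red", "blue", "blue", "other", "green", "red"),
  X = NA,    # empty column, as in the printed output
  X.1 = NA   # (only two of the six empty columns are reproduced here)
)

var_names <- names(favcolor_head)        # names of the variables
only_color <- favcolor_head["color"]     # keep only the color column
color_counts <- table(only_color$color)  # how often each color appears

var_names
color_counts
```

The graph commands in this document name the variable they use (for example ~color), so the empty columns do not need to be removed before plotting.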
To create a barplot: gf_bar(~variable, data=Data Frame, …)
All bar graphs are created with the same command. The … means that there are options that you can add. You replace the word “variable” with the variable in your Data Frame that you are interested in, and you replace the word “Data Frame” with the name of the Data Frame that you are working with. Using the Data Frame Favcolor, you can create a bar graph using the command
gf_bar(~color, data=Favcolor)
This command creates a basic graph. If you want to add a title, customize the labels on the x- and y-axes, or change the dark default color of the bars, there are options that allow you to title the graph, pick new labels, and pick a color. To add a title: title="title you want"; a label on the x-axis: xlab="label you want"; a label on the y-axis: ylab="label you want"; and a color: fill="color you want the bars to be". The following graph was created using all of these options. You don't need to use all of them. As an example, R automatically labels the y-axis as count, so you don't need that option.
gf_bar(~color, data=Favcolor, title="Favorite color", xlab="Favorite color", ylab="Count", fill="blue")
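The gf_ commands used in this document come from the ggformula package, which has to be loaded once per R session before they will run. The sketch below assumes the package is already installed and uses a small hand-typed Data Frame (the hypothetical colors_seen, copied from the six rows shown by head(Favcolor)) so that it runs without downloading anything.

```r
library(ggformula)  # assumption: ggformula is already installed

# Hypothetical stand-in for Favcolor, built from the six rows shown earlier.
colors_seen <- data.frame(
  color = c("red", "blue", "blue", "other", "green", "red")
)

# Same options as the command above; ylab is left out because
# the y-axis is labeled "count" automatically.
bar_plot <- gf_bar(~color, data = colors_seen,
                   title = "Favorite color",
                   xlab = "Favorite color",
                   fill = "blue")

bar_plot  # typing the name of the stored graph draws it
```

Storing the graph in a name is optional; typing the gf_bar command by itself draws the graph right away.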
A dot plot is a graph that is similar to a bar graph, but you can see the actual data values. The command for a dot plot is gf_dotplot(~variable, data=Data Frame, title="title you want").
As an example, for the favorite color Data Frame:
gf_dotplot(~color, data=Favcolor, title="Favorite Color")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
There are many ways in R to access a Data Frame. One way is to type the name of the Data Frame followed by %>% and then type the graph command on the next line. The %>% is called a piping command; it links one line to the next, so the result of the first line is used in the second line. If you want to put two graphs on the same plot, you can put %>% at the end of the first graph command and then type the second graph command on the next line. This is most useful for overlaying graphs. Another way is to put data=Data Frame in the command. The third way is to use Data Frame$variable in the command. We will use data=Data Frame in the command.
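The ways of accessing a Data Frame described above can be compared side by side. This is a minimal sketch, assuming ggformula is installed and that loading it makes %>% available; colors_seen is a hypothetical hand-typed Data Frame based on the rows shown by head(Favcolor). The $ form is illustrated with the base R table() command rather than a graph.

```r
library(ggformula)  # assumption: installed; also provides %>%

# Hypothetical stand-in for Favcolor.
colors_seen <- data.frame(
  color = c("red", "blue", "blue", "other", "green", "red")
)

# Way 1: pipe the Data Frame into the graph command.
plot_piped <- colors_seen %>%
  gf_bar(~color)

# Way 2: name the Data Frame with data= (the form used in this document).
plot_data_arg <- gf_bar(~color, data = colors_seen)

# Way 3: reach a single variable with Data Frame$variable.
color_counts <- table(colors_seen$color)

plot_piped
plot_data_arg
color_counts
```

The first two forms should draw the same bar graph; which one to use is a matter of preference.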
You can also create a dot plot for quantitative data. Again the command is gf_dotplot(~variable, data=Data Frame, title="name of the graph")
Recurance <- read.csv("https://krkozak.github.io/MAT160/cancer_recurance.csv")
head(Recurance)
## tummor
## 1 19
## 2 18
## 3 17
## 4 1
## 5 21
## 6 22
gf_dotplot(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
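The message above says that R chose a default of 30 bins and suggests picking a better value with binwidth. A minimal sketch of doing so follows, assuming ggformula is installed. It uses only the six tummor values printed by head(Recurance), stored in the hypothetical Data Frame tumor_head, and the bin width of 1 is an illustrative choice rather than a recommendation for the full data.

```r
library(ggformula)  # assumption: ggformula is already installed

# The six values shown by head(Recurance); tumor_head is hypothetical.
tumor_head <- data.frame(tummor = c(19, 18, 17, 1, 21, 22))

# binwidth = 1 gives each whole number its own stack of dots,
# and supplying it is what the message asks for.
dot_plot <- gf_dotplot(~tummor, data = tumor_head,
                       binwidth = 1,
                       title = "Recurrence of tumor after chemotherapy")

dot_plot
```

The same binwidth option can be added to gf_histogram and gf_freqpoly, which print a similar message.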
To create a histogram: gf_histogram(~variable, data=Data Frame, title="type the title you want", xlab="type the label you want for the horizontal axis")
gf_histogram(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")
To create a frequency polygon: gf_freqpoly(~variable, data=Data Frame, title="type the title you want", xlab="type the label you want for the horizontal axis")
gf_freqpoly(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
To produce a density plot: gf_density(~variable, data=Data Frame, title="type the title you want", xlab="type the label you want for the horizontal axis")
As an example:
gf_density(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")
You can create a density plot with a histogram plot over it. This is the advantage of using the %>% notation. As an example:
gf_density(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy") %>%
gf_histogram(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy")
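One thing to watch when overlaying these two graphs is that gf_histogram puts counts on the y-axis while gf_density puts density there, so the two layers may end up on very different scales. The sketch below swaps in gf_dhistogram, a ggformula command that draws the histogram on the density scale; this is a suggested variation, not the command used above. It assumes ggformula is installed and uses the hypothetical Data Frame tumor_head with the six values printed earlier; the bin width of 5 is only illustrative.

```r
library(ggformula)  # assumption: installed; also provides %>%

# The six values shown by head(Recurance); tumor_head is hypothetical.
tumor_head <- data.frame(tummor = c(19, 18, 17, 1, 21, 22))

# Histogram on the density scale, with the density curve drawn over it.
overlay_plot <- gf_dhistogram(~tummor, data = tumor_head,
                              binwidth = 5,
                              title = "Recurrence of tumor after chemotherapy") %>%
  gf_density(~tummor, data = tumor_head, alpha = 0.4)

overlay_plot
```

The alpha option makes the density layer partly transparent so the histogram bars stay visible underneath.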
To separate a quantitative variable by the values of a qualitative variable: if you want to look at a plot of a quantitative variable for each value of a qualitative variable, put the quantitative variable to the left of the ~ and the qualitative variable to the right. As an example, for a Data Frame that has cancer types and the survival times for each type, you can use: gf_point(quantitative variable~qualitative variable, data=Data Frame)
Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")
gf_point(survival~organ, data=Cancer)
A jitter plot will move the dots so they are not on top of each other and you can see the data for each individual. To create a jitter plot of a quantitative variable separated by a qualitative variable, use: gf_jitter(quantitative variable~qualitative variable, data=Data Frame)
As an example
gf_jitter(survival~organ, data=Cancer)
If you want to create a density plot separated by a qualitative variable, you use the command:
gf_density(~variable|qualitative variable, data=Data Frame)
Note that between variable and qualitative variable is the character |, which is a vertical line.
As an example:
gf_density(~survival|organ, data=Cancer)
If you want a Data Frame that contains only part of the original Data Frame, namely the rows with a particular value of a variable, you can create a new Data Frame. This is called filtering for a particular variable value, and you give the result a new name using the command newDataFrame <- Data Frame %>% filter(variable == "value"). Note that filter is written in lowercase and that the comparison uses two equal signs.
As an example, if you are using the Data Frame Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")
and you want to filter it for breast cancer, then the command would be
Breast <-
Cancer%>%
filter(organ == "Breast")
Now you can use all of the commands on this new Data Frame, which contains just the breast cancer values.
gf_histogram(~survival, data=Breast, title="Survival Time for Breast Cancer", xlab="survival time (days)", fill="blue")
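To check that a filter kept the rows you expected, it can help to count the rows before and after. The sketch below assumes the filter command comes from the dplyr package, which therefore has to be loaded first; without it, R may pick up a different command that is also named filter and give a confusing error. The Data Frame cancer_demo and its survival numbers are made up for illustration and are not taken from the real cancer.csv file.

```r
library(dplyr)  # assumption: dplyr is installed; provides filter() and %>%

# Made-up illustration data, NOT the real cancer.csv values.
cancer_demo <- data.frame(
  organ    = c("Breast", "Stomach", "Breast", "Colon"),
  survival = c(1200, 120, 800, 400)
)

# Keep only the rows where organ is exactly "Breast".
breast_demo <- cancer_demo %>%
  filter(organ == "Breast")

nrow(cancer_demo)   # rows before filtering
nrow(breast_demo)   # rows after filtering

# Base R alternative that does not need dplyr.
breast_base <- subset(cancer_demo, organ == "Breast")
```

The value in quotes must match the spelling and capitalization in the Data Frame exactly, so "breast" would return no rows here.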
Another useful plot is a violin plot. This will also show you how the data is distributed. To create a violin plot, use
gf_violin(quantitative variable~qualitative variable, data=Data Frame)
As an example:
gf_violin(survival~organ, data=Cancer)
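Because graphs can be overlaid with %>%, a violin plot and a jitter plot can be combined so that both the overall shape and the individual values are visible. This is a minimal sketch, assuming ggformula is installed; cancer_demo and all of its numbers are made up for illustration, and the width, height, and alpha values are arbitrary choices.

```r
library(ggformula)  # assumption: installed; also provides %>%

# Made-up illustration data, NOT the real cancer.csv values.
cancer_demo <- data.frame(
  organ    = rep(c("Breast", "Colon"), each = 5),
  survival = c(1200, 800, 1500, 700, 1000,
               400, 250, 600, 180, 450)
)

# Violin shows the shape; jitter adds one dot per individual.
# height = 0 keeps each dot at its true survival value.
violin_jitter <- gf_violin(survival ~ organ, data = cancer_demo) %>%
  gf_jitter(survival ~ organ, data = cancer_demo,
            width = 0.1, height = 0, alpha = 0.6)

violin_jitter
```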
To create a scatter plot:
gf_point(dependent variable ~ independent variable, data=Data Frame)
Example: Create a scatter plot of temperature vs elevation. Include a title and x and y labels.
Elevation <-read.csv("https://krkozak.github.io/MAT160/elevation_temperature.txt")
gf_point(Temperature ~ Elevation, data=Elevation, title='Temperature vs Elevation', xlab="Elevation (ft)", ylab="temperature (degree F)")
To create a time-series graph where you start the y-axis at 0: gf_line(dependent variable ~ independent variable, data=Data Frame, ylim=c(0, number over max), title="type in a title you want", xlab="type in a label for the horizontal axis", ylab="type in a label for the vertical axis")
The ylim=c(0, number over max) option sets the limits on the y-axis. Replace "number over max" with a number slightly larger than the largest y value in your data.
Example: Create a time series plot for number of rooms rented by day.
Dates <-read.csv("https://krkozak.github.io/MAT160/hotel_rooms.txt")
gf_line(Rooms~ Time, data=Dates, title="Number of Rooms versus Day", ylim=c(0,90), xlab=" Time (Days since day 1)", ylab="Number of Rooms")
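Instead of reading the largest value off the data by eye, the upper limit for the y-axis can be computed. The sketch below assumes ggformula is installed and sets the limits with gf_lims, a separate ggformula command, rather than the ylim option used above. The Data Frame rooms_demo and its values are made up for illustration, and adding 5% above the maximum is an arbitrary choice.

```r
library(ggformula)  # assumption: installed; also provides %>%

# Made-up illustration data, NOT the real hotel_rooms.txt values.
rooms_demo <- data.frame(
  Time  = 1:6,
  Rooms = c(60, 72, 85, 80, 66, 70)
)

# A number a little over the maximum: 5% above, rounded up.
y_top <- ceiling(max(rooms_demo$Rooms) * 1.05)

rooms_plot <- gf_line(Rooms ~ Time, data = rooms_demo,
                      title = "Number of Rooms versus Day",
                      xlab = "Time (Days since day 1)",
                      ylab = "Number of Rooms") %>%
  gf_lims(y = c(0, y_top))  # start the y-axis at 0

y_top
rooms_plot
```

Starting the y-axis at 0 keeps small day-to-day changes from looking larger than they are.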