library(ggformula)  # provides the gf_ graphing commands used below
library(dplyr)      # provides glimpse() and filter()
Favcolor <- read.csv("https://krkozak.github.io/MAT160/fav_color_data.csv")
Recurance <- read.csv("https://krkozak.github.io/MAT160/cancer_recurance.csv")
Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")
Elevation <- read.csv("https://krkozak.github.io/MAT160/elevation_temperature.txt")
Dates <- read.csv("https://krkozak.github.io/MAT160/hotel_rooms.txt")
This file will show you how to create graphs for categorical and quantitative data.
First read in the Data Frame. Many Data Frames are available for you to read into RStudio. See RStudio Basics for how best to do this. The command head(Data_Frame) shows the names of the variables and the first six rows of the Data Frame. The command glimpse(Data_Frame) lists each variable with its type and its first few values.
head(Favcolor)
## color X X.1 X.2 X.3 X.4 X.5
## 1 red NA NA NA NA NA NA
## 2 blue NA NA NA NA NA NA
## 3 blue NA NA NA NA NA NA
## 4 other NA NA NA NA NA NA
## 5 green NA NA NA NA NA NA
## 6 red NA NA NA NA NA NA
glimpse(Favcolor)
## Rows: 27
## Columns: 7
## $ color <chr> "red", "blue", "blue", "other", "green", "red", "yellow", "red",…
## $ X <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.1 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.2 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.3 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.4 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.5 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
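The output above shows six extra columns (X through X.5) that contain only NA. If they get in the way, one option is to keep only the columns that hold data. The sketch below uses base R on a small made-up Data Frame; the names Small and Small_clean and the values are illustrative, not part of the course data.

```r
# Made-up stand-in for Favcolor: one real column and two all-NA columns
Small <- data.frame(color = c("red", "blue", "blue"), X = NA, X.1 = NA)

# Keep only the columns that have at least one non-missing value
has_data <- colSums(!is.na(Small)) > 0
Small_clean <- Small[, has_data, drop = FALSE]

names(Small_clean)
```

The same two lines should work on Favcolor itself, assuming the extra columns there are entirely NA as the glimpse output suggests.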
To create a bar graph: gf_bar(~variable, data=Data_Frame, …)
All bar graphs are created with the same command. The … means that there are options that you can add. Replace the word "variable" with the variable in your Data Frame that you are interested in, and replace "Data_Frame" with the name of the Data Frame that you are working with. Using the Data Frame Favcolor, you can create a bar graph using the command
gf_bar(~color, data=Favcolor)
This command creates a basic graph. To add a title, customize the labels on the x- and y-axes, or change the default color of the bars, add options to the command. To add a title: title="title you want"; a label on the x-axis: xlab="label you want"; a label on the y-axis: ylab="label you want"; and a color: fill="color you want the bars to be". The following graph was created using all of these options. You don't need to use them all. As an example, R automatically labels the y-axis as count, so you don't need the ylab option if you like the label count.
gf_bar(~color, data=Favcolor, title="Favorite color of CCC Statistics Students", xlab="Favorite color", ylab="Number of Students", fill="blue")
A dot plot is similar to a bar graph, but you can see the individual data values. The command is gf_dotplot(~variable, data=Data_Frame, title="title you want").
As an example, for the favorite color Data Frame:
gf_dotplot(~color, data=Favcolor, title="Favorite Color of CCC Students", xlab="Favorite color", ylab="Number of Students", fill="blue")
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.
Dot plots are not that useful and they are not suggested.
To create a histogram: gf_histogram(~variable, data=Data_Frame, title="type the title you want", xlab="type the label you want for the horizontal axis", ylab="type the label you want for the vertical axis")
gf_histogram(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy", xlab="Time to Recurrence (months)", ylab="Number of People", fill="blue")
To produce a density plot: gf_density(~variable, data=Data_Frame, title="type the title you want", xlab="type the label you want for the horizontal axis"). As an example:
gf_density(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy", xlab="Time to Recurrence (months)", fill="blue")
You can also do a dot plot for quantitative data. Again the command is gf_dotplot(~variable, data=Data_Frame, title="name of the graph"). The binwidth option sets how wide each stack of dots is.
gf_dotplot(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy", xlab="Time to Recurrence (months)", ylab="Number of People", binwidth=3, fill="blue")
There are many ways in R to access a Data Frame.
One way is to type the name of the Data Frame and then, before pressing enter, type |> or %>%. These symbols are called pipes, and they link lines together: the result of the first line is passed to the command on the second line. Then you can create the type of graph you want. With this method the command can use the variable names directly, since R knows which Data Frame is being used.
Another way is to put data=Data_Frame in the command.
The third way is to use Data_Frame$variable in the command.
The examples in this help file use data=Data_Frame in the command.
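The three ways of accessing a Data Frame can be sketched side by side. This is a minimal illustration on a made-up Data Frame (the name Colors and its values are assumptions, not the course data). The third way is shown with the base R command table() rather than a gf_ command, to keep the sketch to calls that certainly accept Data_Frame$variable.

```r
library(ggformula)

# Made-up stand-in for Favcolor (values are illustrative)
Colors <- data.frame(color = c("red", "blue", "blue", "green"))

# 1. Pipe the Data Frame into the graphing command
p_pipe <- Colors |> gf_bar(~color)

# 2. Name the Data Frame with data=
p_data <- gf_bar(~color, data = Colors)

# 3. Pull one variable out with Data_Frame$variable (here passed to base R table())
counts <- table(Colors$color)
counts
```

The first two approaches should produce the same bar graph; which one to use is mostly a matter of taste.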
If you want to put two graphs on the same plot, put |> at the end of the first line and then on the second line type the command for the next graph you want to create. This is most useful for overlaying graphs. For instance, you can create a density plot with a histogram over it. This is the advantage of using the piping notation. As an example:
gf_density(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy", xlab="Time to Recurrence (months)", fill="green") |>
  gf_histogram(~tummor, data=Recurance, fill="blue")
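One thing to watch when overlaying: gf_histogram plots counts while gf_density plots densities, so the two layers may end up on very different vertical scales. A possible alternative, sketched here with made-up data (the Data Frame Times and its values are illustrative), is gf_dhistogram from ggformula, which draws the histogram on the density scale.

```r
library(ggformula)

# Made-up recurrence times in months (illustrative values only)
Times <- data.frame(months = c(2, 5, 7, 8, 12, 15, 15, 20, 24, 30))

# Histogram on the density scale, with a density curve layered on top
p_overlay <- gf_dhistogram(~months, data = Times, binwidth = 5, fill = "blue") |>
  gf_density(~months, data = Times, fill = "green")

p_overlay
```

Because both layers now use density on the vertical axis, the curve and the bars should be directly comparable.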
To facet (separate) a graph by the values of a qualitative variable: suppose you want to look at the density plot of a quantitative variable for each value of a categorical variable. As an example, if you have a Data Frame that has cancer types and the survival times for each type of cancer, you can do: gf_density(~quantitative_variable|categorical_variable, data=Data_Frame)
or
gf_density(~quantitative_variable, data=Data_Frame, fill=~categorical_variable)
Note that the character between quantitative_variable and categorical_variable in the first command is |, which is a vertical line.
As an example, the first graph has been faceted by the type of organ the cancer is in, and the second overlays the organs on one plot using fill.
gf_density(~survival|organ, data=Cancer, title="Survival Time by Organ", xlab="Survival time (days)", fill=~organ)
gf_density(~survival, data=Cancer, title="Survival Time by Organ", xlab="Survival time (days)", fill=~organ)
A jitter plot moves the dots slightly so they are not on top of each other, letting you see the data for each individual. To create a jitter plot of a quantitative variable separated by a categorical variable, use: gf_jitter(quantitative_variable~categorical_variable, data=Data_Frame)
As an example:
gf_jitter(survival~organ, data=Cancer, title="Survival Time by Organ", xlab="Organ", ylab="Survival time (days)", color="blue")
To create a new Data Frame that is filtered for a particular value of a qualitative variable: this means you want a Data Frame that contains only some of the units of observation in the original Data Frame. This is called filtering for a particular variable value, and you save the result as a new Data Frame using the command
newData_Frame <-
Data_Frame |>
filter(variable == "value")
As an example, if you are using the Data Frame Cancer and you want to filter it for breast cancer, then it would be
Breast <-
Cancer |>
filter(organ == "Breast")
Now you can use any command on this new Data Frame, which contains just the breast cancer values.
gf_density(~survival, data=Breast, title="Survival Time for Breast Cancer", xlab="survival time (days)", fill="blue")
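To check that a filter kept what you expected, it can help to count the rows afterward. The sketch below uses a made-up Data Frame (Cancer_small and its values are illustrative, not the course data). Note that R is case-sensitive: filter() comes from the dplyr package, while Filter() with a capital F is a different base R function.

```r
library(dplyr)

# Made-up stand-in for Cancer (variable names match the text; values are illustrative)
Cancer_small <- data.frame(
  organ = c("Breast", "Stomach", "Breast", "Colon"),
  survival = c(1235, 124, 727, 248)
)

# Keep only the rows where organ is exactly "Breast" (the match is case-sensitive)
Breast_small <- Cancer_small |>
  filter(organ == "Breast")

nrow(Breast_small)  # number of rows kept
```

If the count is zero, the most likely cause is a spelling or capitalization mismatch between the value in quotes and the values in the Data Frame.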
To create a scatter plot:
gf_point(response_variable ~ explanatory_variable, data=Data_Frame)
Example: create a scatter plot of temperature vs elevation. Include a title and x and y labels.
gf_point(Temperature ~ Elevation, data=Elevation, title="Temperature vs Elevation", xlab="Elevation (ft)", ylab="Temperature (degrees F)", color="blue")
To create a time-series graph: gf_line(dependent_variable ~ independent_variable, data=Data_Frame, title="type in a title you want", xlab="type in a label for the horizontal axis", ylab="type in a label for the vertical axis")
Example: Create a time-series plot of the number of rooms rented by day. The data is in Dates.
head(Dates)
## Day Time Rooms
## 1 Mon. 1 1 65
## 2 Tues. 2 2 67
## 3 Wed. 3 3 63
## 4 Thurs. 4 4 54
## 5 Fri. 5 5 68
## 6 Sat. 6 6 72
gf_line(Rooms ~ Time, data=Dates, title="Number of Rooms versus Day", xlab="Time (Days since day 1)", ylab="Number of Rooms", color="blue")
If you want to change the limits on the y-axis, use this as part of your command: ylim=c(lower y value,upper y value). Similarly, one can change the limits on the x-axis using xlim=c(lower x value, upper x value).
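Another way to set axis limits, sketched below, is to pipe the graph into gf_lims from ggformula. The Data Frame Rooms_small is a stand-in built from the six rows of Dates shown above, and the limits 0 and 100 are arbitrary choices for illustration.

```r
library(ggformula)

# First six rows of the hotel data, as shown by head(Dates)
Rooms_small <- data.frame(Time = 1:6, Rooms = c(65, 67, 63, 54, 68, 72))

# Draw the time-series graph, then fix the y-axis to run from 0 to 100
p_limits <- gf_line(Rooms ~ Time, data = Rooms_small, color = "blue") |>
  gf_lims(y = c(0, 100))

p_limits
```

Starting the y-axis at 0 makes the day-to-day changes look smaller than they do with the default limits, so pick limits that suit the comparison you want readers to make.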