# Read in the Data Frames used in this file
Favcolor <- read.csv("https://krkozak.github.io/MAT160/fav_color_data.csv")
Recurance <- read.csv("https://krkozak.github.io/MAT160/cancer_recurance.csv")
Cancer <- read.csv("https://krkozak.github.io/MAT160/cancer.csv")
Elevation <- read.csv("https://krkozak.github.io/MAT160/elevation_temperature.txt")
Dates <- read.csv("https://krkozak.github.io/MAT160/hotel_rooms.txt")

Graphing with R

This file will show you how to create graphs for categorical and quantitative data.

Head or Glimpse command

First, read in the Data Frame. Many Data Frames are available for you to read into RStudio; see RStudio Basics for how best to do this. The command head(Data_Frame) shows the names of the variables and the first six rows of the Data Frame. The command glimpse(Data_Frame) shows every variable with its type and its first several values.

head(Favcolor)
##   color  X X.1 X.2 X.3 X.4 X.5
## 1   red NA  NA  NA  NA  NA  NA
## 2  blue NA  NA  NA  NA  NA  NA
## 3  blue NA  NA  NA  NA  NA  NA
## 4 other NA  NA  NA  NA  NA  NA
## 5 green NA  NA  NA  NA  NA  NA
## 6   red NA  NA  NA  NA  NA  NA
glimpse(Favcolor)
## Rows: 27
## Columns: 7
## $ color <chr> "red", "blue", "blue", "other", "green", "red", "yellow", "red",…
## $ X     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.1   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.2   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.3   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.4   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ X.5   <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
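The head() command is part of base R, while glimpse() comes from the dplyr package, and the gf_ graphing commands used throughout this file come from the ggformula package. A minimal sketch, assuming both packages are installed and using a small made-up Data Frame (colors_demo, hypothetical values) so it runs without downloading any data:

```r
# Assumes ggformula (gf_ graphs) and dplyr (glimpse, filter) are installed
library(ggformula)
library(dplyr)

# Hypothetical Data Frame with made-up values, for illustration only
colors_demo <- data.frame(
  color = c("red", "blue", "blue", "other", "green", "red", "yellow", "red")
)

head(colors_demo)     # shows the first six rows
glimpse(colors_demo)  # shows each variable, its type, and its first values
```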

Categorical Plots

Bar Plot

To create a bar plot: gf_bar(~variable, data=Data_Frame, …)

All bar graphs are created with the same command. The … stands for optional arguments that you can add. Replace the word “variable” with the variable in your Data Frame that you are interested in (keep the ~ in front of it), and replace “Data_Frame” with the name of the Data Frame that you are working with. Using the Data Frame Favcolor, you can create a bar graph with the command

gf_bar(~color, data=Favcolor)

bar graph with no fill or title

This command creates a basic graph. If you want to add a title, customize the labels on the x- and y-axes, or change the default dark gray color of the bars, add these options to the command:

  • title: title="title you want"
  • label on the x-axis: xlab="label you want"
  • label on the y-axis: ylab="label you want"
  • color of the bars: fill="color you want the bars to be"

The following graph uses all of these options, but you do not need to use all of them. For example, R automatically labels the y-axis “count”, so you don’t need ylab if you like that label.

gf_bar(~color, data=Favcolor, title="Favorite color of CCC Statistics Students", xlab="Favorite color", ylab="Number of Students", fill="blue")

bar graph with fill, title, and axes labels

Dot Plot

A dot plot is similar to a bar graph, but each observation is drawn as its own dot, so you can see the individual data values. To create a dot plot: gf_dotplot(~variable, data=Data_Frame, title="title you want").

As an example, using the favorite color Data Frame:

gf_dotplot(~color, data=Favcolor, title="Favorite Color of CCC Students", xlab="Favorite color", ylab="Number of Students", fill = "blue")
## Bin width defaults to 1/30 of the range of the data. Pick better value with
## `binwidth`.

dot plot with fill and title

Dot plots of categorical data are not very useful, so they are not recommended.

Quantitative data

Histogram

To create a histogram: gf_histogram(~variable, data=Data_Frame, title="type the title you want", xlab="type the label you want for the horizontal axis", ylab="type the label you want for the vertical axis").

gf_histogram(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy", xlab="Time to Recurrence (months)", ylab="Number of People", fill = "blue")

histogram with fill and title
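The width of the bars in a histogram matters: too many narrow bars make a small Data Frame look ragged, and too few wide bars hide its shape. A sketch, assuming the ggformula package and a hypothetical Data Frame recur_demo with made-up recurrence times, that sets binwidth so each bar covers a 10-month range; the bins option (the number of bars) is the other way to control this.

```r
library(ggformula)

# Hypothetical recurrence times in months (made-up values, not the Recurance data)
recur_demo <- data.frame(tummor = c(2, 5, 8, 12, 14, 19, 21, 25, 30, 33, 41, 47, 52))

# binwidth = 10 makes each bar cover a 10-month range
p <- gf_histogram(~tummor, data = recur_demo, binwidth = 10,
                  title = "Time to Recurrence (hypothetical data)",
                  xlab = "Time to Recurrence (months)", ylab = "Number of People",
                  fill = "blue")
p
```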

Density Plot:

To produce a density plot: gf_density(~variable, data=Data_Frame, title="type the title you want", xlab="type the label you want for the horizontal axis"). As an example:

gf_density(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy", xlab="Time to Recurrence (months)", fill = "blue")

density graph with title, fill, and labels

Dot plot

You can also make a dot plot for quantitative data. Again the command is gf_dotplot(~variable, data=Data_Frame, title="name of the graph"). The binwidth option sets the width of each bin; without it, the bin width defaults to 1/30 of the range of the data.

gf_dotplot(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy", xlab="Time to Recurrence (months)", ylab="Number of People", binwidth = 3, fill="blue")

dot plot with title, fill, and label

Accessing Datasets

There are several ways in R to tell a command which Data Frame to use.

  • One way is to type the name of the Data Frame followed by |> or %>% and then press Enter. These symbols are called pipes: they send the result of one line into the command on the next line. On the next line, type the graph command you want. Because R already knows which Data Frame is being used, you can refer to the variables by name without data=. (|> is built into R 4.1 and later; %>% comes from the magrittr package and is also available when dplyr is loaded.)

  • Another way is to put data=Data_Frame in the command.

  • The third way is to use Data_Frame$variable in the command.

  • The examples in this help file use data=Data_Frame in the command.
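A short sketch of the three ways to access a Data Frame, assuming the ggformula package and a hypothetical Data Frame recur_demo with made-up values. The $ form is shown with mean(), which takes a single column of values.

```r
library(ggformula)

# Hypothetical Data Frame (made-up values) standing in for Recurance
recur_demo <- data.frame(tummor = c(2, 5, 8, 12, 14, 19, 21, 25, 30, 33))

# Way 1: pipe the Data Frame into the graph command, so data= is not needed
p1 <- recur_demo |>
  gf_histogram(~tummor, binwidth = 10)

# Way 2: name the Data Frame with data= inside the command
p2 <- gf_histogram(~tummor, data = recur_demo, binwidth = 10)

# Way 3: pull out one variable with $ and pass it to a command
avg_time <- mean(recur_demo$tummor)
avg_time
```

Ways 1 and 2 produce the same graph; which one you use is a matter of style.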

Overlapping two or more graphs

If you want to put two graphs on the same plot, put |> at the end of the first graph command and type the command for the next graph on the second line. This is most useful for overlaying graphs, for example drawing a histogram over a density plot. Overlaying is an advantage of the piping notation. As an example:

gf_density(~tummor, data=Recurance, title="Recurrence of tumor after chemotherapy", xlab="Time to Recurrence (months)", fill="green") |>
gf_histogram(~tummor, data=Recurance, fill="blue")

density plot and histogram plotted together with title and label
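gf_histogram() puts counts on the vertical axis, while gf_density() puts densities there, which are much smaller numbers, so the density curve can end up flattened near zero when the two are overlaid. A sketch of one way to put both on the same scale, assuming ggformula's gf_dhistogram() (a histogram drawn on the density scale) and a hypothetical Data Frame recur_demo with made-up values:

```r
library(ggformula)

# Hypothetical recurrence times in months (made-up values)
recur_demo <- data.frame(tummor = c(2, 5, 8, 12, 14, 19, 21, 25, 30, 33, 41, 47, 52))

# Histogram on the density scale, then a density curve piped on top of it;
# the second layer reuses the Data Frame from the first, so data= is not repeated
p <- gf_dhistogram(~tummor, data = recur_demo, binwidth = 10, fill = "blue", alpha = 0.5,
                   title = "Time to Recurrence (hypothetical data)",
                   xlab = "Time to Recurrence (months)") |>
  gf_density(~tummor, fill = "green", alpha = 0.3)
p
```

The alpha values make the fills partly transparent so both layers stay visible.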

Faceting by a categorical variable

Faceting (separating) a graph by a categorical variable lets you compare the distribution of a quantitative variable across the values of the categorical variable. For example, if a Data Frame contains the type of cancer and the survival time for each patient, you can draw a separate density plot for each type with: gf_density(~quantitative_variable|categorical_variable, data=Data_Frame)

or draw all of the density plots on one graph, with a different fill color for each value of the categorical variable, with:

gf_density(~quantitative_variable, data=Data_Frame, fill=~categorical_variable)

Note that the character between quantitative_variable and categorical_variable in the first command is |, a vertical bar.

As an example, this graph has been faceted by the organ in which the cancer occurred.

gf_density(~survival|organ, data=Cancer, title="Survival Time by Organ", xlab="Survival time (days)", fill = ~organ)

density plot faceted by organ, with title, label, and fill

gf_density(~survival, data=Cancer, title="Survival Time by Organ", xlab="Survival time (days)", fill=~organ)

density plots for each organ on one graph, with fill by organ

Jitter Plot

A jitter plot moves the dots slightly at random so that they do not sit on top of each other, which lets you see the data value for each individual. To create a jitter plot of a quantitative variable separated by a categorical variable, use: gf_jitter(quantitative_variable~categorical_variable, data=Data_Frame)

As an example:

gf_jitter(survival~organ, data=Cancer, title="Survival Time by Organ", xlab="Organ", ylab="Survival time (days)", color = "blue")

jitter plot with title, label, and color

Filtering

Filtering creates a new Data Frame that contains only the units of observation that have a particular value of a categorical variable. To filter a Data Frame and save the result under a new name, use the command

newData_Frame <-
  Data_Frame |>
  filter(variable == "value")

As an example, to filter the Data Frame Cancer for breast cancer:

Breast <-
  Cancer|>
  filter(organ == "Breast")

Now you can use any of the commands in this file on the new Data Frame, which contains only the breast cancer values.

gf_density(~survival, data=Breast, title="Survival Time for Breast Cancer", xlab="survival time (days)", fill="blue")

density plot with title, labels, and fill
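A self-contained sketch of the same filtering step, assuming the dplyr package (which provides filter()) and a small hypothetical Data Frame cancer_demo with made-up survival times, so you can check how many rows remain. The comparison is case-sensitive, so filtering for "breast" would match nothing here.

```r
library(dplyr)

# Hypothetical Data Frame with made-up survival times (days) for two organs
cancer_demo <- data.frame(
  organ    = c("Breast", "Stomach", "Breast", "Stomach", "Breast"),
  survival = c(1235, 124, 24, 42, 1581)
)

# Keep only the rows where organ is exactly "Breast" (note the double equals ==)
breast_demo <- cancer_demo |>
  filter(organ == "Breast")

breast_demo
nrow(breast_demo)  # number of rows that remain after filtering
```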

Scatter Plot

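To create a scatter plot, with the response variable on the vertical axis and the explanatory variable on the horizontal axis:

gf_point(response_variable ~ explanatory_variable, data=Data_Frame)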

Example: create a scatter plot of temperature vs elevation. Include title and x and y labels.

gf_point(Temperature ~ Elevation, data=Elevation, title="Temperature vs Elevation", xlab="Elevation (ft)", ylab="Temperature (degrees F)", color="blue")

scatter plot with title, labels, and color

Time Series Plot

To create a time-series graph: gf_line(dependent_variable ~ independent_variable, data=Data_Frame, title="type in a title you want", xlab="type in a label for the horizontal axis", ylab="type in a label for the vertical axis")

Example: create a time series plot of the number of rooms rented each day. The data are in the Data Frame Dates.

head(Dates)
##        Day Time Rooms
## 1   Mon. 1    1    65
## 2  Tues. 2    2    67
## 3   Wed. 3    3    63
## 4 Thurs. 4    4    54
## 5   Fri. 5    5    68
## 6   Sat. 6    6    72
gf_line(Rooms ~ Time, data=Dates, title="Number of Rooms versus Day", xlab="Time (day number)", ylab="Number of Rooms", color="blue")

time series graph with title, labels, and color

Changing the limits on the axes

If you want to change the limits on the y-axis, use this as part of your command: ylim=c(lower y value,upper y value). Similarly, one can change the limits on the x-axis using xlim=c(lower x value, upper x value).
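Axis limits can also be set as a separate step in a pipe. A sketch, assuming ggformula's gf_lims() (a wrapper around ggplot2's lims()) and a hypothetical Data Frame elev_demo with made-up values. Limits set this way drop any points that fall outside the range instead of just zooming in.

```r
library(ggformula)

# Hypothetical elevation (ft) and temperature (degrees F) values, made up for illustration
elev_demo <- data.frame(
  Elevation   = c(1000, 2500, 4000, 5500, 7000),
  Temperature = c(85, 80, 72, 66, 58)
)

# Draw the scatter plot, then set the x-axis and y-axis limits
p <- gf_point(Temperature ~ Elevation, data = elev_demo, color = "blue") |>
  gf_lims(x = c(0, 8000), y = c(40, 100))
p
```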