Chapter 2 Graphing functions
In this lesson, you will learn how to use R to graph mathematical functions.
It’s important to point out at the beginning that much of what you will be learning – much of what will be new to you here – actually has to do with the mathematical structure of functions and not R.
2.1 Graphing mathematical functions
Recall that a function is a transformation from an input to an output. Functions are used to represent the relationship between quantities. In evaluating a function, you specify what the input will be and the function translates it into the output.
In much of the traditional mathematics notation you have used, functions have names like f or g or y, and the input is notated as x. Other letters are used to represent parameters. For instance, it’s common to write the equation of a line this way
y=mx+b
In order to apply mathematical concepts to realistic settings in the world, it’s important to recognize three things that a notation like y=mx+b does not support well:
For these reasons, the notation that you will use needs to be more general than the notation commonly used in high-school algebra. At first, this will seem odd, but the oddness has less to do with the fact that the notation is used by the computer than with the mathematical reasons given above.
But there is one aspect of the notation that stems directly from the use of the keyboard to communicate with the computer. In writing mathematical operations, you’ll use expressions like a * b and 2 ^ n and a / b rather than the traditional juxtaposition ab, the superscript 2ⁿ, or a fraction with a written over b, and you will use parentheses both for grouping expressions and for applying functions to their inputs.
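As a small illustration of the keyboard notation just described, here is a sketch in plain R. The names a, b, and n and their values are made up for the example.

```r
# Illustrative values only; any numbers would do.
a <- 6
b <- 3
n <- 4

a * b        # multiplication is written explicitly with *, never as ab
2 ^ n        # exponentiation uses the caret
a / b        # division uses the slash
(a + b) * n  # parentheses used for grouping
sqrt(a + b)  # parentheses used to apply a function to its input
```

The last two lines show the two roles of parentheses: in the first they group a + b before the multiplication, and in the second they hand a + b to sqrt() as its input.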
In plotting a function, you need to specify several things:
library(mosaicCalc)
slice_plot(3 * x - 2 ~ x, domain(x = range(0, 10)))
Often, it’s natural to write such relationships with the parameters
represented by symbols. (This can help you remember which parameter is
which, e.g., which is the slope and which is the intercept.) When you do
this, remember to give a specific numerical value for each parameter,
like this:
library(mosaicCalc)
m = -3
b = -2
slice_plot(m * x + b ~ x, domain(x = range(0, 10)))
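One way to check which parameter is which, without drawing anything, is to evaluate the expression at a few inputs. The sketch below uses plain R arithmetic and the same values of m and b as above; the input values in x_values are chosen only for illustration.

```r
m <- -3
b <- -2

# A few inputs at which to evaluate m * x + b (illustrative choices)
x_values <- c(0, 1, 10)
y_values <- m * x_values + b
y_values

# The output at x = 0 is the intercept b, and increasing x by 1
# changes the output by the slope m.
y_values[1]
y_values[2] - y_values[1]
```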
Try these examples:
library(mosaicCalc)
A = 100
slice_plot( A * x ^ 2 ~ x, domain(x = range(-2, 3)))
A = 5
slice_plot( A * x ^ 2 ~ x, domain(x = range(0, 3)), color="red" )
slice_plot( cos(t) ~ t, domain(t = range(0,4*pi) ))
You can use makeFun( ) to give a name to the function. For instance:
library(mosaicCalc)
g <- makeFun(2*x^2 - 5*x + 2 ~ x)
slice_plot(g(x) ~ x , domain(x = range(-2, 2)))
Once the function is named, you can evaluate it by giving an input. For
instance:
g(x = 2)
## [1] 0
g(x = 5)
## [1] 27
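A named function can also be evaluated at several inputs in one call. The sketch below defines g with base R's function keyword so that it runs without any packages; that is a stand-in for makeFun() used only for this illustration, and the vector name inputs is made up.

```r
# Base R stand-in for makeFun(2*x^2 - 5*x + 2 ~ x)
g <- function(x) 2 * x^2 - 5 * x + 2

# Evaluate at several inputs at once
inputs <- c(0.5, 2, 5)
outputs <- g(inputs)
outputs
```

The outputs at 0.5 and 2 are both zero, which is where the graph of g crosses the horizontal axis.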
Of course, you can also construct new expressions from the function you have created. Try this somewhat complicated expression:
library(mosaicCalc)
slice_plot(sqrt(abs(g(x))) ~ x, domain(x = range(-5,5)))
2.1.1 Exercises
2.1.1.1 Exercise 1
Try out this command:
library(mosaicCalc)
x <- 10
slice_plot(A * x ^ 2 ~ A, domain(A = range(-2, 3)))
Explain why the graph doesn’t look like a parabola, even though it’s a
graph of Ax².
ANSWER: Notice that the input to the function is A, not x. The value of x has been set to 10 — the graph is being made over the range of A from −2 to 3.
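The same point can be checked numerically. In this sketch, x is fixed at 10 as in the exercise, and the vector A_values is an illustrative choice of inputs within the plotted range.

```r
x <- 10

# Treat A as the input, with x held fixed
A_values <- c(-2, 0, 1, 3)
outputs <- A_values * x ^ 2
outputs

# Equal steps in A give equal steps in the output: a straight line in A
outputs / A_values[c(1, 1, 3, 4)] * c(1, 0, 1, 1)
```

With x fixed, the expression is just 100 times A, so each unit increase in A raises the output by 100.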
2.1.1.2 Exercise 2
Translate each of these expressions in traditional math notation into a plot. Hand in the command that you gave to make the plot (not the plot itself).
a. 4x − 7 in the window x from 0 to 10.
ANSWER:
library(mosaicCalc)
slice_plot( 4 * x - 7 ~ x, domain(x = range(0, 10) ))
b. cos 5x in the window x from −1 to 1.
ANSWER:
library(mosaicCalc)
slice_plot( cos(5 * x) ~ x, domain(x = range(-1, 1)))
c. cos 2t in the window t from 0 to 5.
ANSWER:
library(mosaicCalc)
slice_plot( cos(2 * t) ~ t, domain(t = range(0,5) ))
d. √t cos 5t in the window t from 0 to 5. (Hint: √t is sqrt(t).)
ANSWER:
library(mosaicCalc)
slice_plot( sqrt(t) * cos(5 * t) ~ t, domain(t = range(0, 5) ))
2.1.1.3 Exercise 3
Find the value of each of the functions above at x =
10.543 or at t = 10.543. (Hint: Give the function a name and compute the
value using an expression like g(x = 10.543) or f(t = 10.543).)
Pick the closest numerical value
2.1.1.4 Exercise 4
Reproduce each of these plots. Hand in the command you used to make the identical plot:
a.
library(mosaicCalc)
slice_plot(2*x - 3 ~ x, domain(x = range(0, 5)))
b.
library(mosaicCalc)
slice_plot(t^2 ~ t, domain(t = range(-2, 2)))
2.1.1.5 Exercise 5
What happens when you use a symbolic parameter (e.g., m in m*x + b ~ x) but try to make a plot without selecting a specific numerical value for the parameter?
ANSWER: You get an error message saying that the “object is not found”.
2.1.1.6 Exercise 6
What happens when you don’t specify a range for an input, but just a single number, or leave out the domain entirely, as in the second and third of these three commands:
library(mosaicCalc)
slice_plot(3 * x ~ x, domain(x = range(1, 4)))
slice_plot(3 * x ~ x, domain(x = 14))
slice_plot(3 * x ~ x)
Give a description of what happened and speculate on why.
ANSWER: If no domain is specified, or if the domain has only one number rather than a range, slice_plot() generates an error message.
2.2 Making scatterplots
Often, the mathematical models that you will create will be motivated by data. For a deep appreciation of the relationship between data and models, you will want to study statistical modeling. Here, though, we will take a first cut at the subject in the form of curve fitting, the process of setting parameters of a mathematical function to make the function a close representation of some data.
This means that you will have to learn something about how to access data in computer files, how data are stored, and how to visualize the data. Fortunately, R and the mosaic package make this straightforward.
The data files you will be using are stored as spreadsheets on the Internet. Typically, the spreadsheet will have multiple variables; each variable is stored as one column. (The rows are “cases,” sometimes called “data points.”) To read the data into R, you need to know the name of the file and its location. Often, the location will be an address on the Internet.
Here, we’ll work with “Income-Housing.csv”, which is located at “http://www.mosaic-web.org/go/datasets/Income-Housing.csv”. This file gives information from a survey on housing conditions for people in different income brackets in the US. (Source: Susan E. Mayer (1997) What money can’t buy: Family income and children’s life chances Harvard Univ. Press p. 102.)
Here’s how to read it into R:
Housing = read.csv("http://www.mosaic-web.org/go/datasets/Income-Housing.csv")
There are two important things to notice about the above statement. First, the read.csv() function is returning a value that is being stored in an object called Housing. The choice of Housing as a name is arbitrary; you could have stored it as x or Equador or whatever. It’s convenient to pick names that help you remember what’s being stored where.
Second, the name “http://www.mosaic-web.org/go/datasets/Income-Housing.csv”
is surrounded by quotation marks. These are the single-character double
quotes, that is, " and not repeated single quotes ' ' or the backquote `.
Whenever you are reading data from a file, the name of the file should be in such single-character double quotes. That way, R knows to treat the characters literally and not as the name of an object such as Housing.
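To see how the rows and columns of a CSV file become a data object, here is a sketch that reads a tiny spreadsheet from a quoted character string rather than from the Internet. The text argument of read.csv() is used for this, and the name Sample and the exact layout of the text are assumptions made for illustration; the numbers are the first three rows of two of the Housing variables.

```r
# A miniature CSV "file" held in a character string (layout assumed)
csv_text <- "Income,CrimeProblem
3914,39.6
10817,32.4
21097,26.7"

# Quoted text is taken literally; the unquoted name Sample refers to an object
Sample <- read.csv(text = csv_text)
Sample
names(Sample)
nrow(Sample)
```

The first line of the text supplies the variable names, and each later line becomes one case.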
Once the data are read in, you can look at the data just by typing the name of the object (without quotes!) that is holding the data. For instance,
Housing
## Income IncomePercentile CrimeProblem AbandonedBuildings IncompleteBathroom
## 1 3914 5 39.6 12.6 2.6
## 2 10817 15 32.4 10.0 3.3
## 3 21097 30 26.7 7.1 2.3
## 4 34548 50 23.9 4.1 2.1
## 5 51941 70 21.4 2.3 2.4
## 6 72079 90 19.9 1.2 2.0
## NoCentralHeat ExposedWires AirConditioning TwoBathrooms MotorVehicle
## 1 32.3 5.5 52.3 13.9 57.3
## 2 34.7 5.0 55.4 16.9 82.1
## 3 28.1 2.4 61.7 24.8 91.7
## 4 21.4 2.1 69.8 39.6 97.0
## 5 14.9 1.4 73.9 51.2 98.0
## 6 9.6 1.0 76.7 73.2 99.0
## TwoVehicles ClothesWasher ClothesDryer Dishwasher Telephone
## 1 17.3 57.8 37.5 16.5 68.7
## 2 34.3 61.4 38.0 16.0 79.7
## 3 56.4 78.6 62.0 25.8 90.8
## 4 75.3 84.4 75.2 41.6 96.5
## 5 86.6 92.8 88.9 58.2 98.3
## 6 92.9 97.1 95.6 79.7 99.5
## DoctorVisitsUnder7 DoctorVisits7To18 NoDoctorVisitUnder7 NoDoctorVisit7To18
## 1 3.6 2.6 13.7 31.2
## 2 3.7 2.6 14.9 32.0
## 3 3.6 2.1 13.8 31.4
## 4 4.0 2.3 10.4 27.3
## 5 4.0 2.5 7.7 23.9
## 6 4.7 3.1 5.3 17.5
All 19 variables in the data set are shown. Because they do not fit on one line, the printout wraps them into several blocks of columns, each block repeating the row numbers 1 to 6.
You can see the names of all of the variables in a compact format with the names( ) command:
names(Housing)
## [1] "Income" "IncomePercentile" "CrimeProblem"
## [4] "AbandonedBuildings" "IncompleteBathroom" "NoCentralHeat"
## [7] "ExposedWires" "AirConditioning" "TwoBathrooms"
## [10] "MotorVehicle" "TwoVehicles" "ClothesWasher"
## [13] "ClothesDryer" "Dishwasher" "Telephone"
## [16] "DoctorVisitsUnder7" "DoctorVisits7To18" "NoDoctorVisitUnder7"
## [19] "NoDoctorVisit7To18"
When you want to access one of the variables, you give the name of the whole data set followed by the name of the variable, with the two names separated by a $ sign, like this:
Housing$Income
## [1] 3914 10817 21097 34548 51941 72079
Housing$CrimeProblem
## [1] 39.6 32.4 26.7 23.9 21.4 19.9
Even though the output from names( ) shows the variable names in quotation marks, you won’t use quotations around the variable names.
Spelling and capitalization are important. If you make a mistake, no matter how trifling to a human reader, R will not figure out what you want. For instance, here’s a misspelling of a variable name, which results in nothing (NULL) being returned.
Housing$crim
## NULL
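The behavior of the $ sign can be tried on a small data frame built by hand. In this sketch the name Mini is made up, and the values are the first three rows of two Housing variables, typed in with data.frame() so that no download is needed.

```r
# A hand-built stand-in for part of the Housing data
Mini <- data.frame(
  Income = c(3914, 10817, 21097),
  CrimeProblem = c(39.6, 32.4, 26.7)
)

Mini$CrimeProblem   # correct spelling and capitalization: the column
Mini$crim           # misspelled and lower case: NULL

# Checking a name against names() before using it
"CrimeProblem" %in% names(Mini)
"crim" %in% names(Mini)
```

Testing a name with %in% is one way to catch a misspelling before a silent NULL causes trouble later on.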
Usually the most informative presentation of data is graphical. One of the most familiar graphical forms is the scatter-plot, a format in which each “case” or “data point” is plotted as a dot at the coordinate location given by two variables. For instance, here’s a scatter plot of the fraction of households that regard their neighborhood as having a crime problem, versus the median income in their bracket.
gf_point(CrimeProblem ~ Income, data = Housing )
The R statement closely follows the English equivalent: “plot as points
CrimeProblem versus (or, as a function of) Income, using the data from
the Housing object.”
Graphics are constructed in layers. If you want to plot a mathematical function over the data, you’ll need to use a plotting function to make another layer. Then, to display the two layers in the same plot, connect them with the %>% symbol (called a “pipe”). Note that %>% can never go at the start of a new line.
gf_point(
CrimeProblem ~ Income, data=Housing ) %>%
slice_plot(
40 - Income/2000 ~ Income, color = "red")
The mathematical function drawn is not a very good match to the data,
but this reading is about how to draw graphs, not how to choose a family
of functions or find parameters!
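How poor the match is can be seen by evaluating the function at the six incomes in the data and comparing with the recorded values. This sketch types in the Income and CrimeProblem columns from the printout above; the names model and gap are made up for the illustration.

```r
# Values copied from the Housing printout
Income <- c(3914, 10817, 21097, 34548, 51941, 72079)
CrimeProblem <- c(39.6, 32.4, 26.7, 23.9, 21.4, 19.9)

# The function drawn in red, evaluated at each income
model <- 40 - Income / 2000

# Data minus function: positive where the line runs below the data
gap <- CrimeProblem - model
round(gap, 1)
```

The gap is small in the low income brackets and grows large in the highest ones, where the straight line falls well below the data.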
If, when plotting your data, you prefer to set the limits of the axes to something of your own choice, you can do this. For instance:
gf_point(
CrimeProblem ~ Income, data = Housing) %>%
slice_plot(
40 - Income / 2000 ~ Income, color = "blue") %>%
gf_lims(
x = range(0,100000),
y=range(0,50))
Properly made scientific graphics should have informative axis names.
You can set the axis names directly using gf_labs:
gf_point(
CrimeProblem ~ Income, data=Housing) %>%
gf_labs(x = "Income Bracket ($US per household per year)",
y = "Fraction of Households",
title = "Crime Problem") %>%
gf_lims(x = range(0,100000), y = range(0,50))
Notice the use of double-quotes to delimit the character strings, and
how x and y are being used to refer to the horizontal and vertical axes
respectively.
2.2.1 Exercises
2.2.1.1 Exercise 1
Make each of these plots:
s = read.csv(
"http://www.mosaic-web.org/go/datasets/stan-data.csv")
gf_point(temp ~ time, data=s)
- Describe in everyday English the pattern you see in coffee cooling:
b. Here’s a record of the tide level in Hawaii over about 100 hours:
h = read.csv(
"http://www.mosaic-web.org/go/datasets/hawaii.csv")
gf_point(water ~ time, data=h)
- Describe in everyday English the pattern you see in the tide data:
2.2.1.2 Exercise 2
Construct the R commands to duplicate each of these
plots. Hand in your commands (not the plot):
Utilities <- read.csv(
"http://www.mosaic-web.org/go/datasets/utilities.csv")
gf_point(
temp ~ month, data=Utilities) %>%
gf_labs(x = "Month (Jan=1, Dec=12)",
y = "Temperature (F)",
title = "Ave. Monthly Temp.")
b. From the “utilities.csv” data file, make this plot of household
monthly bill for natural gas versus average temperature. The line has
slope −5 USD/degree and intercept 300 USD.
ANSWER:
gf_point(
gasbill ~ temp, data=Utilities) %>%
gf_labs(x = "Temperature (F)",
y = "Expenditures ($US)",
title = "Natural Gas Use") %>%
slice_plot( 300 - 5*temp ~ temp, color="blue")
2.3 Graphing functions of two variables
You’ve already seen how to plot a graph of a function of one variable, for instance:
slice_plot(
95 - 73*exp(-.2*t) ~ t,
domain(t = 0:20) )
This lesson is about plotting functions of two variables. For the most
part, the format used will be a contour plot.
You use contour_plot() to plot with two input variables. You need to list the two variables on the right of the ~ sign, separated by &, and you need to give a range for each of the variables. For example:
contour_plot(
sin(2*pi*t/10)*exp(-.2*x) ~ t & x,
domain(t = range(0,20), x = range(0,10)))
Each of the contours is labeled, and by default the plot is filled with
color to help guide the eye. If you prefer just to see the contours,
without the color fill, use the tile=FALSE argument.
contour_plot(
sin(2*pi*t/10)*exp(-.2*x) ~ t & x,
domain(t=0:20, x=0:10))
Occasionally, people want to see the function as a surface, plotted in 3
dimensions. You can get the computer to display a perspective
3-dimensional plot by using the interactive_plot() function. As you’ll
see by mousing around the plot, it is interactive.
interactive_plot(
sin(2*pi*t/10)*exp(-.5*x) ~ t & x,
domain(t = 0:20, x = 0:10))
It’s very hard to read quantitative values from a surface plot — the contour plots are much more useful for that. On the other hand, people seem to have a strong intuition about shapes of surfaces. Being able to translate in your mind from contours to surfaces (and vice versa) is a valuable skill.
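One way to practice that translation is to look at the raw numbers both kinds of plot summarize: the value of the function at each point of a grid of inputs. The sketch below uses base R's outer() on a coarse grid. The names f, t_vals, x_vals, and heights and the grid spacing are illustrative assumptions, and this is not a description of how contour_plot() works internally.

```r
# The two-variable function from the contour plot, as a base R function
f <- function(t, x) sin(2 * pi * t / 10) * exp(-0.2 * x)

# A coarse grid of inputs (spacing chosen only for illustration)
t_vals <- seq(0, 20, by = 2.5)
x_vals <- seq(0, 10, by = 5)

# Rows correspond to values of t, columns to values of x
heights <- outer(t_vals, x_vals, f)
dim(heights)
round(heights, 2)
```

Reading down a column shows the oscillation in t, and reading across a row shows the decay as x increases. A contour joins the grid locations with equal values, and a surface plot raises each location to its value.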
To create a function that you can evaluate numerically, construct the function with makeFun(). For example:
g <- makeFun(
sin(2*pi*t/10)*exp(-.2*x) ~ t & x)
contour_plot(
g(t, x) ~ t + x,
domain(t=0:20, x=0:10))
g(x = 4, t = 7)
## [1] -0.4273372
Make sure to name the arguments explicitly when inputting values. That way you will be sure that you haven’t reversed them by accident. For instance, note that this statement gives a different value than the above:
g(4, 7)
## [1] 0.1449461
The reason for the discrepancy is that when the arguments are given without names, it’s the position in the argument sequence that matters. So, in the above, 4 is being used for the value of t and 7 for the value of x. It’s very easy to be confused by this situation, so a good practice is to identify the arguments explicitly by name:
g(t = 7, x = 4)
## [1] -0.4273372
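The sketch below repeats the comparison with g written as a base R function, a stand-in for makeFun() used only so the example runs without packages. The argument order t, then x, is set here by the function definition.

```r
# Base R stand-in for the two-input function g
g <- function(t, x) sin(2 * pi * t / 10) * exp(-0.2 * x)

# Named arguments: the order in which they are written does not matter
named_a <- g(t = 7, x = 4)
named_b <- g(x = 4, t = 7)
identical(named_a, named_b)

# Unnamed arguments: position decides, so these two calls differ
by_position <- g(7, 4)   # t = 7, x = 4
swapped <- g(4, 7)       # t = 4, x = 7
c(by_position, swapped)
```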