Mienie Roberts
August 27th, 2018
Week 1 Day 1:
We will be using R and RStudio in this class. Please download both to your computer.
Check out the following opinions: Opinion 1. Opinion 2. Opinion 3. Opinion 4.
R’s capabilities are simply amazing. Check out:
RStudio is a graphical user interface for R which includes a set of integrated tools designed to help you be more productive with R. It includes:
Note: Once R and RStudio are installed, you do not need to start R separately, because RStudio starts it for you.
This notebook covers the following topics:
Introduction to GeoGebra.
GeoGebra is dynamic mathematics software for all levels of education that brings together geometry, algebra, spreadsheets, graphing, statistics and calculus in one easy-to-use package. GeoGebra is a rapidly expanding community of millions of users located in just about every country. GeoGebra has become the leading provider of dynamic mathematics software, supporting science, technology, engineering and mathematics (STEM) education and innovations in teaching and learning worldwide.
Please navigate to the website:
Then click on “Start Graphing”.
The following window will appear:
Figure 1
Create new objects:
You may create new objects (e.g. points, lines, functions) by either using the Graphics Tools provided in the Toolbar, or by entering their equations and coordinates into the Algebra Input and pressing the Enter key.
Draw the graph \(y=x\)
The output should look as follows:
Straight line
Type the following in the input bar:
C = (5, 4)
s = Segment[A, C]
D = Midpoint[s]
d = Circle[D, C]
Intersect[c, d]
Line[C, E]
Exploring Parameters of a Quadratic Polynomial
In this activity you will explore the impact of parameters on a quadratic polynomial. You will experience how GeoGebra could be integrated into a ‘traditional’ teaching environment and used for active and student-centered learning.
Follow the construction steps of this activity:
Task: Which shape does the function graph have?
The output should look as follows:
Parabola
Use the ↑ up and ↓ down arrow keys.
Again click on the polynomial in the Algebra View.
Use the ← left and → right arrow keys.
Task: How does this impact the graph and the equation of the polynomial?
Double-click the equation of the polynomial. Use the keyboard to change the equation to \(f(x) = 3 x^2\).
Task: How does the function graph change?
Repeat changing the equation by typing in different values for the parameter.
Let’s try out a more dynamic way of exploring the impact of a parameter on a polynomial \(f(x) = a x^2\) by using sliders to modify the parameter values.
Create a variable \(a = 1\).
Display the variable a as a slider in the Graphics View. Hint: Click on the symbol next to number a in the Algebra View. Change the slider value by dragging the appearing point on the line with the mouse.
Enter the quadratic polynomial \(f(x) = a * x^2\). Hint: Don’t forget to enter an asterisk * or space between \(a\) and \(x^2\).
Create a slider \(b\) using the Slider tool.
Hint: Activate the tool and click on the Graphics View. Use the default settings and click Apply.
Hint: GeoGebra will overwrite the old function f with the new definition.
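The effect of the parameter \(a\) on \(f(x) = a x^2\) explored above can also be checked numerically. The following is a minimal sketch in R rather than GeoGebra; the helper name `quadratic` and the chosen values of `a` are illustrative assumptions and not part of the activity.

```r
# Hypothetical helper: evaluates f(x) = a * x^2 for a given parameter a.
quadratic <- function(x, a) a * x^2

x <- seq(-3, 3, by = 0.5)

# Evaluate the polynomial for a few illustrative values of a.
a_values <- c(-1, 0.5, 1, 3)
curves <- sapply(a_values, function(a) quadratic(x, a))
colnames(curves) <- paste0("a=", a_values)

# Plot all curves together: a larger |a| gives a narrower parabola,
# and a negative a reflects the parabola across the x-axis.
matplot(x, curves, type = "l", lty = 1, xlab = "x", ylab = "f(x)",
        main = "f(x) = a * x^2")
legend("bottom", legend = colnames(curves), col = seq_along(a_values), lty = 1)
```

Dragging the slider in GeoGebra corresponds to moving continuously between curves like these.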
A histogram is a visual representation of the distribution of a dataset. The shape of a histogram allows you to easily see where most of the data is situated. In particular, you can see where the middle of the distribution is located, how closely the data lie around the middle, and where possible outliers are to be found. As shown in the figures below, a histogram consists of an x-axis, a y-axis and bars of different heights. The x-axis is divided into intervals (called “bins”), and on each bin a vertical bar is constructed whose height represents the number of data values within that bin. Note that histograms (unlike bar charts) don’t have gaps between the bars (if it looks like there’s a gap, that’s because that particular bin has no data in it).
Example: Suppose you are interested in the distribution of ages for employees working in a certain office. The following data is available: 36, 25, 38, 46, 55, 68, 72, 55, 36, 38, 67, 45, 22, 48, 91, 46, 52, 61, 58, 55. We use R to construct a histogram to represent the distribution of the data.
age <- c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38, 67, 45, 22, 48, 91, 46, 52, 61, 58, 55)
hist(age)
The output appears under the ‘Plots’ tab.
The ‘hist’ command has many options that enable the user to change the display. For example, the ‘breaks’ option controls the number of bins (R treats a single number as a suggestion and chooses nearby round break points), the ‘main’ option sets the title of the histogram, and the ‘xlab’ and ‘ylab’ options set the x- and y-axis labels.
Example: The following command creates a histogram with 7 nonempty bins, with title “Age of Employees” and x label “Employee ages”:
hist(age, breaks=7, main="Age of Employees", xlab="Employee ages")
For other available options, type ‘help("hist")’ in the console and read the information in the ‘Help’ window.
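Because a single number passed to ‘breaks’ is only a suggestion, it can help to inspect the bins R actually chose. The sketch below uses the same age data; the explicit break points in the second call (20 to 100 in steps of 10) are an assumption chosen for illustration.

```r
age <- c(36, 25, 38, 46, 55, 68, 72, 55, 36, 38, 67, 45, 22, 48, 91, 46, 52, 61, 58, 55)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(age, breaks = 7, plot = FALSE)

h$breaks   # bin boundaries R actually chose
h$counts   # number of ages falling in each bin

# Passing a vector of break points gives full control over the bins.
h10 <- hist(age, breaks = seq(20, 100, by = 10),
            main = "Age of Employees", xlab = "Employee ages")
```

An empty bin shows up as a zero in `h$counts`, which is what produces an apparent gap in the plot.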
In this activity you are going to use the following tools, algebraic input and commands. Make sure you know how to use them before you begin with the actual construction.
Slider
Segment
Intersect
Slope
Move
Delete
Enter: \(y = 0.8 x + 3.2\)
This is the equation of a straight line with slope \(m = 0.8\) and \(y\)-intercept \(3.2\).
Construction steps for straight line
Please download Kaltura if you need software to record your screen. Here are the steps:
Kaltura is a video platform service. You can download it for free from CANVAS.
Log into Canvas
Go to Math 3315
Click on “My media”:
Figure 2.1
Figure 2.2
Download and start using the Kaltura software to record audio and your screen.
You should be able to upload the video to the media library on Canvas and I will grade it from there.
R has strong graphics capabilities. We looked at histograms last week. Today we will consider xy-plots.
The command ‘xyplot’ can be used to plot one variable against another. The command is part of the ‘lattice’ package, so before using it you must load the package.
You can install new packages as follows:
Install a package
3. Click on “Install”.
The package should be available in the library after R has installed it:
4. Make sure to select the package if you want to use it:
Select the lattice package
You can also follow the next steps to accomplish the same thing:
Example: Load a new package called ‘lattice’.
library(lattice)
If you get an error message, it probably means you haven’t installed ‘lattice’.
To demonstrate ‘xyplot’ we will be using data from the ‘mosaicData’ package, so you must load this package as well.
Example: Load the package called ‘mosaicData’.
library(mosaicData)
You will find ‘mosaicData’ listed in the ‘Packages’ window.
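Installing and loading can also be done from the console instead of the ‘Packages’ window. The following is a minimal sketch; the helper name `load_or_install` is hypothetical, and it assumes an internet connection and a configured CRAN mirror whenever a package actually has to be installed.

```r
# Hypothetical helper: load a package, installing it first only if it is missing.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # downloads the package from CRAN
  }
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}
```

For example, `load_or_install("lattice")` followed by `load_or_install("mosaicData")` would prepare both packages used in this section.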
Click on the blue word “mosaicData” to see the different data sets:
There are many datasets available. You can just click on the dataset and obtain more information about the dataset:
Datasets in the mosaicData-package
Click on “Births78” to see a description of that particular dataset:
A complete description of the dataset is given:
Description of the dataset Births78
Example: Create an xyplot to display the number of births in the United States for each day in 1978.
xyplot(births~day_of_year, data=Births78)
The ‘xyplot’ command also has many options that enable the user to change the display. The options ‘main’, ‘xlab’, and ‘ylab’ give the title, x-label, and y-label (note these are exactly the same as for the ‘hist’ command).
Example: The following command creates an xy-plot with title “Births per Day”, x-label “Day of Year”, and y-label “Births”:
xyplot(births~day_of_year, data=Births78, xlab="Day of Year", ylab="Births", main="Births per Day")
Apart from polynomials there are different types of functions available in GeoGebra (e.g. trigonometric functions, absolute value function, exponential function). Functions are treated as objects and can be used in combination with geometric constructions.
Enter the absolute value function \(f(x) = abs(x)\).
Enter the constant function \(g(x) = 3\).
Intersect both functions.
Hint: Make sure to intersect the two functions and then specify the starting x-value and the end x-value (I chose -4 and 4 in this case):
Absolute value function
y=sin(x)
It is clear that \(y=\sin(x)\) is a periodic function.
Settings
Settings within settings
Algebra window
radians
It is easy to see that the period of \(y=\sin(x)\) is \(2\pi\).
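The period can also be checked numerically. The sketch below is an illustration in R rather than GeoGebra; the grid of 200 points is an arbitrary choice.

```r
x <- seq(-2 * pi, 2 * pi, length.out = 200)

# Shifting the input by 2*pi leaves the sine values unchanged (up to rounding).
all.equal(sin(x + 2 * pi), sin(x))

# Shifting by pi does not reproduce the function: sin(x + pi) equals -sin(x).
all.equal(sin(x + pi), -sin(x))

curve(sin, from = -2 * pi, to = 2 * pi, xlab = "x (radians)", ylab = "sin(x)")
abline(v = c(-2 * pi, 0, 2 * pi), lty = 2)  # dashed lines are one period apart
```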
Please complete Presentation 1.
Due on September 6th.
100 points
The student is required to create a lecture video explaining all the steps mentioned below to demonstrate understanding of basic concepts and sliders in GeoGebra and understanding of basic functions and histograms in R.
Please make sure to use screen-capturing software (for example QuickTime, Kaltura, or any other software). Post the recording to Google Docs, Dropbox, or any other cloud-based platform and submit the link on CANVAS.
Presentation 1 Part 1 (50 points). Please complete the following by using GeoGebra:
Question 1: (10 points)
Graph a parabola.
Perform a horizontal shift on the parent function (parabola).
Perform a vertical shift on the parabola.
Reflect the function around the x-axis.
Stretch and compress the function.
Question 2: (10 points)
Draw a parabola with two roots.
Draw a parabola with zero roots.
Draw a parabola with one root.
Question 3: (30 points)
Consider the parabola \(y = ax^2 + bx + c\). Graph this parabola.
Create sliders for the parameters a, b and c.
Explain how a change in each of the parameters will affect the parabola by moving the sliders.
Basic commands
Further basic commands
Sequences
x= (4, 5, 6, 6, 7, 7, 7, 9, 12).
Create a histogram to denote the frequency distribution of the data points.
LaTeX is a high-quality typesetting system; it includes features designed for the production of technical and scientific documentation. LaTeX is the de facto standard for the communication and publication of scientific documents. LaTeX is available as free software.
Navigate to the following website:
https://latexbase.com/d/5985b068-faf7-4e23-b803-aaaaa4432364
To the left is a LaTeX document and to the right is its output (in PDF format).
The student can make changes to the LaTeX document and see the results of the changes in the output on the right.
Basic LaTeX program
Change the title
Author
Changed date
Mathematical expression
fraction
The output should look as follows:
Pie charts are created with the function
pie(x, labels=)
where x is a non-negative numeric vector indicating the area of each slice and labels= is a character vector of names for the slices.
slices <- c(10, 12, 4, 16, 8)
lbls <- c("US", "UK", "Australia", "Germany", "France")
pie(slices, labels=lbls, main="Pie Chart of Countries")
There are many ways to create a scatterplot in R. The basic function is plot(x, y), where \(x\) and \(y\) are numeric vectors denoting the \((x,y)\) points to plot. We will use the dataset “mtcars” that comes with R.
You can access the dataset by clicking on “Packages” and then on “datasets”:
Then click on “mtcars”, a dataset on Motor Trend Car Road Tests.
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).
A data frame with 32 observations on 11 (numeric) variables.
Scatter plots are similar to line graphs in that they use horizontal and vertical axes to plot data points. However, they have a very specific purpose. Scatter plots show how much one variable is affected by another. The relationship between two variables is called their correlation.
Suppose we want to investigate the correlation between the weight and miles per gallon of a car.
First make sure to attach the dataset:
attach(mtcars)
For the plot, we use the following code:
plot(wt, mpg, main="Scatterplot Example", xlab="Car Weight", ylab="Miles per Gallon", pch=19)
Recall that the title can be displayed by using the option “main” and we can label the x and y axes with the “xlab” and “ylab” options.
There seems to be a negative correlation between the weight and the mpg of a car.
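The visual impression can be backed up with a number. The sketch below computes the Pearson correlation coefficient with `cor()`; it refers to the columns as `mtcars$wt` and `mtcars$mpg` so that it does not rely on `attach()`.

```r
# mtcars ships with R in the datasets package.
r <- cor(mtcars$wt, mtcars$mpg)

# A value below zero means mpg tends to fall as weight rises;
# values close to -1 indicate a strong linear relationship.
r
```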
We can fit a regression line:
require(stats)
linearmodel <- lm(mpg~wt)
plot(mpg~wt, col="red")
abline(linearmodel)
www.geogebra.org
\(f(x)=x^2\)
4. Create tangent \(t\) to function \(f\) through point \(A\).
Create the slope of tangent \(t\) using \(m=Slope(t)\).
Define point \(S=(x(A), m)\).
Connect points \(A\) and \(S\) using a segment.
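The point \(S = (x(A), m)\) records the tangent slope at each position of \(A\). The following sketch imitates this numerically in R; the helper `slope_at` and its step size `h` are assumptions for illustration, and a central difference stands in for GeoGebra's `Slope` command.

```r
f <- function(x) x^2

# Central-difference approximation of the tangent slope at x (h is an assumed step size).
slope_at <- function(f, x, h = 1e-5) (f(x + h) - f(x - h)) / (2 * h)

xs <- seq(-2, 2, by = 0.5)
slopes <- slope_at(f, xs)

# For f(x) = x^2 the points (x, slope) lie on the line y = 2x.
cbind(x = xs, slope = slopes)
```

Moving \(A\) along the parabola therefore makes \(S\) trace the graph of the derivative.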
Complete Presentation 2 posted on CANVAS.
Boxplots can be created for individual variables or for variables by group. The format is ‘boxplot(x, data=y)’, where ‘x’ is a formula and ‘data=’ denotes the data frame providing the data.
To illustrate this command, we will use the ‘Dimes’ dataset in the ‘mosaicData’ package:
We click on the ‘Dimes’ dataset to access a description of the data frame.
It’s clear that the data frame contains 2 variables: ‘mass’ and ‘year’. We are interested in the ‘mass’ of dimes.
Example: We can access the ‘mass’ column by using the “$” sign as follows:
library(mosaicData)
Dimes$mass
## [1] 2.259 2.247 2.254 2.298 2.287 2.254 2.268 2.214 2.268 2.274 2.271
## [12] 2.268 2.298 2.292 2.274 2.234 2.238 2.252 2.249 2.234 2.275 2.230
## [23] 2.236 2.233 2.255 2.277 2.256 2.282 2.235 2.235
Example: We are interested in the median weight and draw a box plot as follows:
boxplot(Dimes$mass)
We can also use the ‘summary’ function as follows:
summary(Dimes$mass)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   2.214   2.236   2.256   2.258   2.274   2.298
It is clear that the minimum weight is 2.214 grams and the maximum weight is 2.298 grams. The median weight is 2.256 grams, etc.
In descriptive statistics, the interquartile range (IQR), also called the midspread or middle 50%, or technically H-spread, is a measure of statistical dispersion, being equal to the difference between 75th and 25th percentiles, or between upper and lower quartiles, IQR = \(Q_3\)-\(Q_1\).
In this case the IQR is:
\(Q_3\)-\(Q_1\) = \(2.274-2.236=0.038\) grams.
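The same quantity can be computed directly in R. The sketch below retypes the 30 masses printed by `Dimes$mass` above so that it runs without the ‘mosaicData’ package. Since `summary()` rounds its output, the exact result may differ slightly from the hand calculation.

```r
# The 30 dime masses (grams) as printed by Dimes$mass above.
mass <- c(2.259, 2.247, 2.254, 2.298, 2.287, 2.254, 2.268, 2.214, 2.268, 2.274,
          2.271, 2.268, 2.298, 2.292, 2.274, 2.234, 2.238, 2.252, 2.249, 2.234,
          2.275, 2.230, 2.236, 2.233, 2.255, 2.277, 2.256, 2.282, 2.235, 2.235)

q <- quantile(mass, probs = c(0.25, 0.75))  # Q1 and Q3
unname(q[2] - q[1])                          # IQR computed by hand
IQR(mass)                                    # built-in equivalent
```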
This exercise shows how to compare dime masses before 2000 to dime masses after 2000 by creating a pair of box plots side by side.
plot(Dimes$year)
less.than.2000=c(Dimes$year<2000)
less.than.2000
## [1] FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE TRUE
## [12] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
## [23] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Now, we can change our TRUE and FALSE value names to what we want them to actually be labeled as on our graphs.
less.than.2000[less.than.2000==TRUE]=("Before 2000")
less.than.2000[less.than.2000==FALSE]=("After 2000")
less.than.2000
## [1] "After 2000" "After 2000" "Before 2000" "Before 2000" "Before 2000"
## [6] "After 2000" "After 2000" "Before 2000" "After 2000" "After 2000"
## [11] "Before 2000" "Before 2000" "Before 2000" "Before 2000" "Before 2000"
## [16] "Before 2000" "Before 2000" "After 2000" "After 2000" "After 2000"
## [21] "Before 2000" "Before 2000" "Before 2000" "Before 2000" "After 2000"
## [26] "After 2000" "After 2000" "After 2000" "After 2000" "After 2000"
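The relabelling step can also be written in a single call. The sketch below is one possible alternative using `ifelse()` and `factor()`; the `year` vector is made up for illustration and stands in for `Dimes$year`.

```r
# Hypothetical years for illustration; in the exercise this would be Dimes$year.
year <- c(1996, 2003, 1999, 2005, 2000)

# ifelse() picks a label per element; factor() fixes the order of the groups.
period <- factor(ifelse(year < 2000, "Before 2000", "After 2000"),
                 levels = c("Before 2000", "After 2000"))
period
table(period)
```

Setting the factor levels explicitly puts the “Before 2000” box on the left of a side-by-side box plot instead of relying on alphabetical order.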
Now, we can view our data in a comparison format with a box plot.
boxplot(mass~less.than.2000, data=Dimes)
Suppose you want to import a dataset called “itemdata.csv” into R. Here are the steps:
Fig. 4.1
Click on “File” > “Import Dataset” > “From CSV”.
Fig 4.2
Click on “Browse” and navigate to your file.
Fig 4.3
Click on “Open” and “Import”.
Fig 4.4
The data should be imported into RStudio now.
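The same import can be done from the console with `read.csv()`. The sketch below is self-contained: it first writes a small stand-in file, so the column names and values are made up and are not the contents of the real “itemdata.csv”.

```r
# Stand-in for "itemdata.csv": a small temporary file with made-up columns.
csv_path <- file.path(tempdir(), "itemdata.csv")
writeLines(c("item,price", "pen,1.5", "book,12", "bag,30"), csv_path)

# read.csv() returns a data frame; header = TRUE uses the first row as column names.
itemdata <- read.csv(csv_path, header = TRUE)
str(itemdata)
head(itemdata)
```

With a real file, `csv_path` would be replaced by the path you selected via “Browse”.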
Please complete Presentation 3 posted on CANVAS.
Bar graphs are produced with the ‘barplot(height)’ function, where height is a vector or matrix. If you want to create a bar graph for a single categorical variable, you must first create a frequency table for the values. To show this, we use the ‘HELPrct’ dataset, which you may view under the ‘mosaicData’ package:
We are interested in the ‘substance’ variable:
Example: Creating a bar plot. First we create a table of frequencies.
freq=table(HELPrct$substance)
freq
##
## alcohol cocaine heroin
## 177 152 124
Then, we can create the bar plot:
barplot(freq)
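To see what `table()` does without loading ‘mosaicData’, here is a small sketch. The vector `substance` is rebuilt from the three counts shown above, so it is an illustration rather than the real ‘HELPrct’ data, and the name `freq_demo` is hypothetical.

```r
# Illustrative vector built from the counts shown above (not the real HELPrct data).
substance <- rep(c("alcohol", "cocaine", "heroin"), times = c(177, 152, 124))

freq_demo <- table(substance)     # counts per category
prop <- prop.table(freq_demo)     # the same table expressed as proportions
round(prop, 3)

barplot(prop, ylab = "Proportion", main = "Relative frequencies")
```

Plotting proportions instead of counts changes only the scale of the y-axis, not the shape of the bar plot.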
Again, there are settings we can change to make this look nicer. The option ‘col’ changes the bar colors. The ‘main’ option changes the title of the plot. The ‘xlab’ and ‘ylab’ options change the x-axis and y-axis titles. The ‘las’ option changes the orientation of the axis labels.
Example: Changing settings in a bar plot.
barplot(freq, col="deepskyblue", main="Frequencies", xlab="Substance", ylab="Count", las=1)
box()
For horizontal bar graphs, we have to add the horiz=TRUE option. The option ‘las’ changes the x- and y-axis label orientations. The option ‘cex.axis’ changes the size of the x-axis labels and ‘cex.names’ changes the size of the y-axis labels. Another new option is ‘xlim’, which sets the x-axis limits.
Example: Creating horizontal bar plots and changing more options.
barplot(freq, horiz=TRUE, col=rainbow(3), main="Frequencies", xlab="Count", ylab="Substance", xlim=c(0,200), cex.axis=.8, cex.names=0.6, las=1)
box()
See
for a list of colors available. You can also go to ‘help(par)’ for more information on parameters in graphics. For example you can search ‘par’ for ‘las’ and it will tell you the settings for label orientation.
Exercise 5.1:
You can also create bar graphs using multiple variables for each category.
Example: Creating a bar plot with multiple variables for each category.
Using the data set ‘AirPassengers’ we need to divide the information by month. For our example we’ll just do Jan, Feb, and Mar.
Jan=AirPassengers[seq(1,144,by=12)]
Feb=AirPassengers[seq(2,144,by=12)]
Mar=AirPassengers[seq(3,144,by=12)]
Next, we’ll create a data frame using our months.
AirPassengersByMonth=data.frame(Jan,Feb,Mar)
Create a sequence ‘Years’ and make these the row names for your data frame.
Years=seq(1949,1960)
row.names(AirPassengersByMonth)=Years
Now, we want to plot the first 5 years of data in a bar plot. Using the ‘barplot’ command we plot a transposed matrix of the data.
barplot(t(data.matrix(AirPassengersByMonth[1:5,])), beside=TRUE, ylim=c(0,300), col=gray.colors(3), main="Passengers by Month")
legend("topleft", legend=colnames(AirPassengersByMonth), fill=c(gray.colors(3)), title="Months")
GeoGebra’s Graphics View can be exported as a picture to your computer’s clipboard. Thus, it can be easily inserted into text processing or presentation documents.
Navigate to the following website and follow the instructions:
https://www.geogebra.org/m/jnAcctVg
Save the following image to your computer:
Build the dynamic application:
Make sure to download the following image:
Follow the steps:
Move point A with the mouse. How does this affect the picture?
Move the picture with the mouse and observe how this affects its image.
Move the line of reflection by dragging the two points with the mouse. How does this affect the image?
Complete Presentation 5 (posted on CANVAS).
A heat map is a graphical representation of data where the individual values contained in a matrix are represented as colors.
The data for this example are the total player stats to date (current as of 2.5.2013) for the Utah Jazz. The data is already prepared for import and is available here.
Load the data into R using read.csv:
jazz <- read.csv("http://emptymind.org/r/datasets/jazz.csv", sep=",")
If you did this correctly, a new variable called ‘jazz’ will appear in your Workspace.
The dataset is currently sorted by Total Points scored, but let’s say we wanted to sort it by Games Played instead. You can sort the data in any fashion you see fit; simply reference the column name you wish to sort by.
jazz <- jazz[order(jazz$G),]
The file has already been prepared with the appropriate column names. However, our heatmap is going to make a lot more sense if we use the Player name to name the rows rather than a number.
row.names(jazz) <- jazz$Player
Now that we have used the Player name to name the rows, we can remove that column from our dataset.
jazz1=subset(jazz,select=-Player)
When we imported the data, it was imported as a data frame. In order to use the built-in heatmap function, we need to convert our data frame to a numeric matrix.
jazz_matrix <- data.matrix(jazz1)
Now we will generate the heatmap visual display:
jazz_heatmap <- heatmap(jazz_matrix, Rowv=NA, Colv=NA, col = cm.colors(256), scale="column", margins=c(3,3))
Make sure to install the “plotly” package.
Then load the “plotly” package.
library(plotly)
## Loading required package: ggplot2
##
## Attaching package: 'ggplot2'
## The following object is masked from 'mtcars':
##
## mpg
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
By default, Plotly for R runs locally in your web browser or in the RStudio viewer.
We use the “midwest” dataset (bundled with ggplot2, which plotly loads) and consider the “percollege” variable, which denotes the percentage of the population that is college educated, and create boxplots:
library(plotly)
p <- plot_ly(midwest, x = ~percollege, color = ~state, type = "box")
p
Plotly graphs are interactive. Click on legend entries to toggle traces, click-and-drag on the chart to zoom, double-click to autoscale, shift-and-drag to pan.
We use the built-in dataset “volcano”, which provides topographic information on Auckland’s Maunga Whau Volcano. It is a matrix with 87 rows and 61 columns, with rows corresponding to grid lines running east to west and columns to grid lines running south to north. It is digitized from a topographic map. We create a heatmap from the data.
library(plotly)
plot_ly(z = ~volcano)
## No trace type specified:
## Based on info supplied, a 'heatmap' trace seems appropriate.
## Read more about this trace type -> https://plot.ly/r/reference/#heatmap
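If ‘plotly’ is not available, the same matrix can be inspected and drawn with base R. This sketch swaps in the base-graphics function `image()` for plotly's heatmap trace; the color palette and title are arbitrary choices.

```r
# volcano is a numeric matrix in the datasets package that ships with R.
dim(volcano)     # rows and columns of the elevation grid
range(volcano)   # lowest and highest elevation values

# Base-graphics heat map: each cell is colored by its elevation.
image(volcano, col = terrain.colors(50), axes = FALSE,
      main = "Maunga Whau elevation")
```

Unlike the plotly version, this plot is static, so it has no zooming or hover information.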
The plotly package depends on ggplot2, which bundles a data set on monthly housing sales in Texan cities acquired from the TAMU real estate center. After loading the package, the data is “lazily loaded” into your session, so you may reference it by name:
library(plotly)
txhousing
## # A tibble: 8,602 x 9
##    city     year month sales   volume median listings inventory  date
##    <chr>   <int> <int> <dbl>    <dbl>  <dbl>    <dbl>     <dbl> <dbl>
##  1 Abilene  2000     1    72  5380000  71400      701       6.3 2000
##  2 Abilene  2000     2    98  6505000  58700      746       6.6 2000.
##  3 Abilene  2000     3   130  9285000  58100      784       6.8 2000.
##  4 Abilene  2000     4    98  9730000  68600      785       6.9 2000.
##  5 Abilene  2000     5   141 10590000  67300      794       6.8 2000.
##  6 Abilene  2000     6   156 13910000  66900      780       6.6 2000.
##  7 Abilene  2000     7   152 12635000  73500      742       6.2 2000.
##  8 Abilene  2000     8   131 10710000  75000      765       6.4 2001.
##  9 Abilene  2000     9   104  7615000  64500      771       6.5 2001.
## 10 Abilene  2000    10   101  7040000  59300      764       6.6 2001.
## # ... with 8,592 more rows
In an attempt to understand house price behavior over time, we could plot date on x, median on y, and group the lines connecting these x/y pairs by city. Using ggplot2, we can initiate a ggplot object with the ggplot() function, which accepts a data frame and a mapping from data variables to visual aesthetics. After just initiating the object, ggplot2 does not know how to geometrically represent the mapping until we add a layer to the plot via one of the geom_*() (or stat_*()) functions (in this case, we want geom_line()). It is also a good idea to specify alpha transparency so that 5 lines plotted on top of each other appear as solid black, to help avoid overplotting.
p <- ggplot(txhousing, aes(date, median)) +
  geom_line(aes(group = city), alpha = 0.2)
p
## Warning: Removed 446 rows containing missing values (geom_path).
Install the “mapview” package. Let’s map the dataset “breweries”, which consists of selected breweries in Franconia, Germany:
library(mapview)
mapview(breweries)
Make sure to click on the map for more information.
Complete Project 5 posted on CANVAS.
Insert a heading into the Graphics View of GeoGebra:
Activate the Text tool (ABC drop-down menu) and click on the upper part of the Graphics View.
Type the following text into the appearing window:
Reflecting a point at the coordinate axes
Change the properties of the text using the Stylebar (e.g. wording, font style, font size, formatting).
Adjust the position of the text using the Move tool.
Fix the position of the text so it can’t be moved accidentally (Properties dialog – tab Basic – Fix object).
Dynamic text refers to existing objects and adapts automatically to modifications, for example in
\(A = (3, 1)\)
the coordinates change whenever point \(A\) is moved.
Activate the ABC Text tool and click on the Graphics View.
Type A = into the appearing window.
Hint: This will be the static part of the text and won’t change if point A is moved.
3. Insert the dynamic part of this text by selecting point A from the Objects drop-down list.
Click OK.
Click on the “Move” tool and move the point around. The coordinates should update to the new position.