Harold Nelson
1/26/2022
Create a function my_range() that returns the value of the range of a numeric vector.
## [1] 6.688954
## [1] -3.012472 3.676481
Note that the built-in range() function does not do the arithmetic. It returns the minimum and the maximum as a two-element vector.
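A minimal sketch of one possible my_range(), assuming "range" means the maximum minus the minimum. The vector rn and the seed are illustrative choices, so the numbers will not match the output shown above.

```r
# One possible solution: subtract the smallest value from the largest.
my_range = function(x){
  return(max(x) - min(x))
}

set.seed(1)      # illustrative seed, not from the original
rn = rnorm(1000)
my_range(rn)     # a single number
range(rn)        # built-in: returns c(min, max) without subtracting
```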
Create a function range_95() that returns the difference between the 95th percentile and the 5th percentile of a numeric vector.
## 95%
## 3.278338
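One way range_95() might be written, sketched here with two calls to quantile(). The test vector 0:100 is an illustrative choice: its 95th and 5th percentiles are 95 and 5, so the result should be 90.

```r
# One possible solution: 95th percentile minus 5th percentile.
range_95 = function(x){
  top = quantile(x,0.95)
  bottom = quantile(x,0.05)
  return(top - bottom)   # the result keeps the "95%" label of the first operand
}

range_95(0:100)
```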
Create a function range_85() that returns the difference between the 85th percentile and the 15th percentile of a numeric vector.
## 85%
## 2.082558
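The same pattern gives a possible range_85(). This is a sketch rather than the original solution. On the illustrative vector 0:100 the 85th and 15th percentiles are 85 and 15, so the result should be 70.

```r
# One possible solution: 85th percentile minus 15th percentile.
range_85 = function(x){
  top = quantile(x,0.85)
  bottom = quantile(x,0.15)
  return(top - bottom)
}

range_85(0:100)
```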
We’ve created separate functions range_85() and range_95(). In addition we have the built-in function IQR(), which is essentially range_75(). Create a function gen_range(x,pct), where the parameter pct takes the place of the 75, 85, and 95 in our examples.
gen_range = function(x,pct){
  top = quantile(x,pct/100)
  bottom = quantile(x,1 - pct/100)
  return(top - bottom)
}
rn = rnorm(1000)
gen_range(rn,85)
## 85%
## 1.981619
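The claim that IQR() is essentially range_75() can be checked directly. The sketch below repeats gen_range() so that it runs on its own. The seed is an illustrative assumption.

```r
gen_range = function(x,pct){
  top = quantile(x,pct/100)
  bottom = quantile(x,1 - pct/100)
  return(top - bottom)
}

set.seed(1)      # illustrative seed, not from the original
rn = rnorm(1000)

# unname() drops the "75%" label so the two results compare cleanly
all.equal(unname(gen_range(rn,75)),IQR(rn))
```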
Create a function rmsd(x,y) which returns the square root of the mean of the squares of the differences between x and y.
rmsd = function(x,y){
  diffs = x - y
  diffs_sq = diffs^2
  mdiffs_sq = mean(diffs_sq)
  return(sqrt(mdiffs_sq))
}
x = rnorm(1000)
y = rnorm(1000)
rmsd(x,y)
## [1] 1.451501
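A small hand-checkable case can confirm the arithmetic. The sketch below uses a compact one-line form of rmsd() and made-up inputs.

```r
# Compact equivalent of the step-by-step version
rmsd = function(x,y){
  return(sqrt(mean((x - y)^2)))
}

# Differences are 0, 0, -3; squares are 0, 0, 9; their mean is 3
rmsd(c(1,2,3),c(1,2,6))   # equals sqrt(3)
```

For two independent standard normal samples, the differences have variance 2, so values near sqrt(2), about 1.41, are expected, which agrees with the output above.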
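Create a function mad(x,y) which returns the mean of the absolute values of the differences between x and y. Note that this name masks the built-in stats::mad(), which computes the median absolute deviation of a single vector.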
mad = function(x,y){
  diffs = x - y
  abs_diffs = abs(diffs)
  return(mean(abs_diffs))
}
x = c(1,2,3,4)
y = c(2,1,4,3)
mad(x,y)
## [1] 1
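Because base R already has a function called mad(), one option is to give the new function a different name. The name mean_abs_diff() below is hypothetical, chosen only for illustration.

```r
# Hypothetical name chosen to avoid masking stats::mad()
mean_abs_diff = function(x,y){
  return(mean(abs(x - y)))
}

x = c(1,2,3,4)
y = c(2,1,4,3)
mean_abs_diff(x,y)   # every difference is 1 in absolute value, so the mean is 1
stats::mad(x)        # built-in: scaled median absolute deviation, a different statistic
```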
You probably noticed that the quantile() function produces a named vector as a result. You may want to know why. The answer is that its second argument may be a vector of percentiles. In that case, the labels would be important.
## 10% 25% 50% 75% 90%
## -1.26904475 -0.66170522 0.05865024 0.78821800 1.40129157
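Output of this shape comes from passing a vector of probabilities as the second argument. A sketch, with an illustrative seed, so the values will differ from those above:

```r
set.seed(1)      # illustrative seed, not from the original
rn = rnorm(1000)

# One call, five percentiles; the names identify each element
q = quantile(rn,c(0.10,0.25,0.50,0.75,0.90))
q
names(q)
```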
Create an inverse of the quantile() function, qinv(x,val). The parameter x is a numeric vector. The parameter val is a single number. The function returns the fraction of the values of x that are less than val.
## [1] 0.982
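One possible qinv(), sketched using the fact that the mean of a logical vector is the fraction of TRUE values. The inputs in the example are made up.

```r
qinv = function(x,val){
  return(mean(x < val))   # mean of TRUE/FALSE is the fraction of TRUEs
}

qinv(1:10,6)   # 1 through 5 are below 6, so the result is 0.5
```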
When we apply the summary function to a numeric vector like county$pop2017, we get some useful results.
Example
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 88 10976 25857 103763 67756 10163507 3
## 'summaryDefault' Named num [1:7] 88 10976 25857 103763 67756 ...
## - attr(*, "names")= chr [1:7] "Min." "1st Qu." "Median" "Mean" ...
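Output like this is consistent with storing the summary in a variable and inspecting it with str(). The sketch below uses a short made-up vector called pop as a stand-in for county$pop2017, so it runs without the county data.

```r
# Hypothetical stand-in for county$pop2017, including one missing value
pop = c(88,10976,25857,67756,NA)

res = summary(pop)
res          # the six summary statistics plus the NA count
str(res)     # shows the class and the names attribute
names(res)
```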
The result of summary(), stored here as res, is a named vector. We can wrap summary() inside another function and add to the output vector before producing a final result.
tb_summary = function(x){
  res = summary(x)
  # Append the standard deviation, ignoring missing values
  out = c(res,sd(x,na.rm = TRUE))
  names(out) = c(names(res),"SD")
  return(out)
}
tb_summary(county$pop2017)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 88.0 10975.5 25857.0 103763.4 67756.0 10163507.0 3.0
## SD
## 333194.5
The base R function hist() is easy to use. Get a histogram of the weight variable in cdc2. You need to load the data first.
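A minimal sketch of the base R call. The real cdc2 data must be loaded separately, so a made-up data frame with a weight column stands in for it here.

```r
# Hypothetical stand-in for cdc2; values are simulated, not real
set.seed(1)
cdc2 = data.frame(weight = rnorm(500,mean = 170,sd = 40))

h = hist(cdc2$weight)   # draws the plot; h also holds the bin counts
```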
Do this using ggplot2.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
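The ggplot2 version might look like the following sketch, again with a made-up stand-in for cdc2. Printing the plot produces the stat_bin() message about bins.

```r
library(ggplot2)

# Hypothetical stand-in for cdc2; values are simulated, not real
set.seed(1)
cdc2 = data.frame(weight = rnorm(500,mean = 170,sd = 40))

p = ggplot(data = cdc2,aes(x = weight)) +
  geom_histogram()
p
```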
Compared with the base R hist(), the ggplot2 version is more complex.
Let’s write an R function that uses ggplot2 to do the histogram but is no more complex to use than the base R function.
Here’s our first try.
gg_hist = function(df,var) {
  ggplot(data = df,aes(x = var)) +
    geom_histogram()
}
gg_hist(cdc2,weight)
## Error in FUN(X[[i]], ...): object 'weight' not found
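That failed. Inside aes(), var was not replaced by the column name, so R went looking for an ordinary object called weight. We need to “embrace” var by wrapping it in double curly braces.
gg_hist = function(df,var) {
  ggplot(data = df,aes(x = {{var}})) +
    geom_histogram()
}
gg_hist(cdc2,weight)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.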
Use this method to create the function gg_density().
gg_density = function(df,var) {
  ggplot(data = df,aes(x = {{var}})) +
    geom_density()
}
gg_density(cdc2,weight)
Create a function gg_hist_wrap() that produces a histogram of var and facets it by a categorical variable cat.
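gg_hist_wrap = function(df,var,cat){
  ggplot(data = df,aes(x = {{var}})) +
    geom_histogram() +
    # Embrace cat inside vars() so the facet follows the argument
    facet_wrap(vars({{cat}}))
}
gg_hist_wrap(cdc2,weight,ageCat)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
This actually failed the first time I tried it. I googled for a solution and found this.
https://community.rstudio.com/t/problem-with-facet-wrap-and-curly-curly/36975
Embracing does not work inside a facet formula such as ~{{cat}}. Hardcoding ~{{ageCat}} makes the error go away, but only because the column happens to be named ageCat, and the cat parameter is then ignored. Wrapping the embraced argument in vars(), as above, lets cat be any categorical column of df.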
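A small check that the facet follows whichever column is passed as cat, assuming a version of gg_hist_wrap() that embraces cat inside vars(). The data frame, its column names, and its values are made up. Setting bins = 30 explicitly is an addition that silences the stat_bin() message.

```r
library(ggplot2)

gg_hist_wrap = function(df,var,cat){
  ggplot(data = df,aes(x = {{var}})) +
    geom_histogram(bins = 30) +
    facet_wrap(vars({{cat}}))
}

# Hypothetical data: column names and values are illustrative only
set.seed(1)
df = data.frame(
  weight = rnorm(300,mean = 170,sd = 40),
  ageCat = rep(c("18-31","32-43","44-57"),each = 100),
  gender = rep(c("m","f"),times = 150)
)

p_age = gg_hist_wrap(df,weight,ageCat)   # three age groups, so three panels
p_gen = gg_hist_wrap(df,weight,gender)   # two genders, so two panels

# Count the panels actually drawn for each plot
n_age = length(unique(layer_data(p_age)$PANEL))
n_gen = length(unique(layer_data(p_gen)$PANEL))
c(n_age,n_gen)
```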
First, let’s get immersed in a 19th-century method of computing probabilities using a normal curve table.
Watch https://www.youtube.com/watch?v=xI9ZHGOSaCg
It’s only 11 minutes.