Maj Jason Freels
Air Force Institute of Technology
Introduce the R programming language and the RStudio IDE
Discuss the advantages and disadvantages of R
Outline the basic structure of R
Show how to interact with the R environment
Detail a few help resources
An open-source, interpreted computer language and suite of statistical operators for calculations on vectors and matrices
A coherent, integrated collection of tools and graphical facilities for data analysis and reproducible research
A well-developed programming language that includes conditionals, loops, user-defined recursive functions, and input-output facilities
One of the fastest-growing technical programming languages in the world
The S programming language
Computing language originally developed at Bell Labs (1976) for data analysis
Licensed by AT&T/Lucent to Insightful Corp. under the product name: S-Plus.
The R programming language
Written and released as open-source software by Ross Ihaka and Robert Gentleman at the University of Auckland
The name "R" plays on the name "S"
Since 1997: international R-core team (~15 people) and thousands of code writers and statisticians share their work via R packages
Most programming languages are designed for machine efficiency
Machine efficiency is concerned with producing numerical results as fast as possible
Pre-compiled languages (C, C++, FORTRAN) can produce numerical results extremely fast
However, pre-compiled code can be difficult to read for non-developers
Compiled languages were developed to make it easier to collaborate
Several compiled languages are proprietary - making it hard to connect with outside resources
R was built for human efficiency
Human efficiency is concerned with how fast a language can produce a final product
As an interpreted language, R may not compute as fast as pre-compiled languages
R might cost you
R can save you
Plus, R makes it easy to speed up your analysis by pulling in code from many other languages
Suppose we're interested in estimating which automobile design factors have a statistically significant effect on fuel efficiency
This type of analysis is often done using a technique called statistical linear regression
Right now, the how's and why's of linear regression aren't important
The focus of this example is how quickly R allows the user to go from a data set to a polished document showing the results of an analysis
This example uses the Motor Trend Car Road Tests data set that installs with R; to view this data set, type mtcars
Many R packages come with data sets pre-loaded; to view the names of the data sets, type data(package = "datasets")
A new RStudio tab should open showing the names of the data sets in the datasets package
To read the documentation on a data set, type ?mtcars or help(mtcars)
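The commands on this slide can be tried together. The following is a minimal sketch using only base R; the object names dataset.info and dataset.names are illustrative and not part of the original example.

```r
# Look at the first rows and the size of the Motor Trend data set
head(mtcars)
dim(mtcars)   # number of rows, then number of columns

# data() returns an object whose "results" matrix holds one row per data set
dataset.info  <- data(package = "datasets")
dataset.names <- dataset.info$results[, "Item"]
"mtcars" %in% dataset.names

# Either of these opens the documentation page for mtcars
# ?mtcars
# help(mtcars)
```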
You may need to present your results to someone...what might they want to see?
The data set itself
Plots to illustrate and support your conclusions
A table of results from your analysis
R/RStudio simplifies the process by bringing everything together
Data
Analysis (code, plots & results)
Presentation & Visualization
\[
\begin{array}{rrrrr}
\hline
 & \text{Estimate} & \text{Std. Error} & \text{t value} & \text{Pr(>|t|)} \\
\hline
\text{(Intercept)} & 12.30 & 18.72 & 0.66 & 0.52 \\
\text{cyl} & -0.11 & 1.05 & -0.11 & 0.92 \\
\text{disp} & 0.01 & 0.02 & 0.75 & 0.46 \\
\text{hp} & -0.02 & 0.02 & -0.99 & 0.33 \\
\text{drat} & 0.79 & 1.64 & 0.48 & 0.64 \\
\text{wt} & -3.72 & 1.89 & -1.96 & 0.06 \\
\text{qsec} & 0.82 & 0.73 & 1.12 & 0.27 \\
\text{vs} & 0.32 & 2.10 & 0.15 & 0.88 \\
\text{am} & 2.52 & 2.06 & 1.23 & 0.23 \\
\text{gear} & 0.66 & 1.49 & 0.44 & 0.67 \\
\text{carb} & -0.20 & 0.83 & -0.24 & 0.81 \\
\hline
\end{array}
\]
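A coefficient table like the one above can be produced in a few lines of base R. This sketch assumes the model regresses mpg on every other column of mtcars, which is consistent with the eleven rows shown; the names fit and coef.table are illustrative.

```r
# Fit a linear regression of fuel efficiency (mpg) on all other columns
fit <- lm(mpg ~ ., data = mtcars)

# Pull the coefficient table out of the model summary and round for display
coef.table <- round(summary(fit)$coefficients, 2)
print(coef.table)
```

If the xtable package is installed, passing fit to xtable::xtable( ) is one way to typeset the same table for a report.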
Only open-source language with a standardized set of development tools to extend the environment
Gives full access to the programming environment, making it easy to see what functions are doing, fix bugs, and extend the software
Promotes reproducible research by providing open and accessible tools for data analysis and reporting
Provides a forum allowing researchers to explore and expand the methods used to analyze data
Scientists around the world are the co-owners of the software tools needed to carry out research
Over 7,000 R packages available (as of: 8/2015)
The product of thousands of experts in many fields - R is CUTTING EDGE
Many perceive the R learning curve to be steep, with a minimal GUI
Addressed by RStudio
In reality, R's learning curve is no steeper than that of any other language
The real issue relates to how R is structured compared to other languages
This is explained in more detail below
Little commercial support; new users complain of hostility on R help sites
Figuring out correct methods to use a function on your own can be frustrating
Working with large datasets is limited by RAM
Data prep can be messier and more mistake-prone in R vs. SPSS or SAS
In MATLAB and JMP, for example, updates are released every 6-12 months
In preparing these updates, effort is made to ensure that the user experience is consistent across every function
This means that there is essentially a single learning curve to becoming proficient at using MATLAB or JMP
A potential drawback of this development model is that users may not have access to newer methods until the next update is released
R is more like the wild west - updates and new capabilities are released every day
Some effort is made by the core R team to ensure a consistent user experience
But, R is purposefully focused on letting researchers from around the world share their work with the world quickly
A potential drawback of this development model is that it's NOT uncommon for multiple packages to have functions with the exact same name that produce completely different results
Thus, R does not have a steeper learning curve, but CAN, in some cases, have multiple learning curves for different packages
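When two functions share a name, the package::function form states which one is meant. The sketch below uses only base R; the locally defined filter( ) is a hypothetical stand-in for a same-named function from some other package, not a real package's code.

```r
# Hypothetical stand-in for a function that another package might export
filter <- function(x, keep) x[keep]

values <- c(1, 2, 3, 4, 5)

# The unqualified name now finds the definition above
kept <- filter(values, values > 2)

# The package::function form always reaches the version in that package;
# stats::filter computes a three-point moving average here
smoothed <- stats::filter(values, rep(1 / 3, 3))

kept
as.numeric(smoothed)
```

Same name, completely different results: one call subsets the vector and the other smooths it.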
R began as a functional programming language and has taken on attributes of object-oriented programming during its development
To understand computations in R, remember these five things
In cooking, simple methods are applied to combine ingredients into finished products
At the most basic level these methods are:
Heating stuff up
Cooling stuff down
Breaking stuff apart
Mixing stuff together
Specialized tools can adjust how a method is applied to certain "classes" of ingredients
Double boiler pan
Waffle maker
Microwave
Bread maker
The process of scripting with R is similar to cooking
At the most basic level some of the methods used in R scripting are:
plot - basic method for plotting simple numeric objects
print - basic method for printing simple numeric objects
summary - basic method for summarizing simple numeric objects
There are several others
Specialized tools (functions) adjust how a method is applied to certain "classes" of objects
plot.function( ) - applies the plot method to function objects
plot.stepfun( ) - applies the plot method to step-function objects
plot.ts( ) - applies the plot method to time-series objects
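The link between a basic method and its specialized tools can be inspected directly. This is a minimal sketch with base R only; the object names sales and plot.for.ts are illustrative.

```r
# A small time series; ts() gives it the class "ts"
sales <- ts(c(3, 5, 2, 8), start = 2000)
class(sales)

# Look up the function that plot() would hand this object to
plot.for.ts <- getS3method("plot", "ts")
is.function(plot.for.ts)

# The other specialized tools listed above can be found the same way
is.function(getS3method("plot", "function"))
is.function(getS3method("plot", "stepfun"))

# methods() lists the class-specific versions of plot that R can currently see
methods("plot")
```

Calling plot(sales) would then draw the series using the time-series version without the user naming it.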
my.table <- xtable::xtable(mtcars)
print(my.table)
Line 1 - xtable( ) first checks class(mtcars) = data.frame
xtable( ) then searches for a method to convert a data.frame-class object to an xtable-class object
In this case, the function used is xtable.data.frame( )
R then stores the converted object under the name my.table
Line 2 - print( ) then checks class(my.table) = xtable, data.frame
print then searches for a method to print an xtable-class object
In this case, the function used is print.xtable( )
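The same two-step lookup can be reproduced without installing any package. In this sketch the class name "recipe" and the functions as.recipe( ), as.recipe.data.frame( ), and print.recipe( ) are all hypothetical, invented only to mirror the xtable steps above.

```r
# A generic: it only decides which class-specific function to call
as.recipe <- function(x, ...) UseMethod("as.recipe")

# The version used when class(x) is data.frame
as.recipe.data.frame <- function(x, ...) {
  structure(list(n.rows = nrow(x), columns = names(x)), class = "recipe")
}

# The version of print used when class(x) is recipe
print.recipe <- function(x, ...) {
  cat("recipe with", x$n.rows, "rows and", length(x$columns), "columns\n")
  invisible(x)
}

my.recipe <- as.recipe(mtcars)   # dispatches to as.recipe.data.frame()
print(my.recipe)                 # dispatches to print.recipe()
```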
Many R packages create new, complex objects with new classes
The survival package creates Surv-class objects
The xtable package creates xtable-class objects
The SMRD package creates life.data-class objects
Along with the new classes of objects, packages also define new methods for manipulating these objects
Downloading packages expands our "kitchen", providing more sophisticated tools for creating and manipulating complex objects, SO...
...we go from this...
...to this...
Functions, vectors, datasets, character strings... are stored as objects
Graphics are written out and are NOT stored as objects
Objects are classified by two criteria:
MODE: how objects are stored in R - character, numeric, complex, logical, list, & function
CLASS: how objects are treated by functions - vector, list, matrix, array, factor, data frame & hundreds of other classes created by specific functions
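The difference between the two criteria is easiest to see on objects where they disagree. This is a minimal sketch with base R only; cyl.factor is an illustrative name.

```r
# A data frame is stored as a list of columns but treated as a data.frame
mode(mtcars)
class(mtcars)

# A factor is stored as numeric codes but treated as a factor
cyl.factor <- factor(mtcars$cyl)
mode(cyl.factor)
class(cyl.factor)

# For simple objects the two often agree
mode(mean)
class(mean)
```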
Scripting is the process of matching objects and function calls such that the desired output objects (e.g., statistical results) and graphics are returned.
R has three methods to assign names to objects
Left assignment
Right assignment
Equals sign
R treats each of these methods the same (in most cases)
Object naming rules
Variable names can contain only letters, numbers, "." and "_"
Variable names CANNOT begin with a number or "_"
var <- 5                    ### Left assignment
200 -> Var                  ### Right assignment
meaning.of_life = 42        ### Or equal sign
var; Var; meaning.of_life
[1] 5
[1] 200
[1] 42
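One case where the assignment methods are not treated the same is inside a function call. The sketch below wraps the comparison in a hypothetical helper, compare.assignment( ), so that it does not touch objects in the working environment.

```r
compare.assignment <- function() {
  # "=" inside a call only names the argument x of mean(); no object is created
  result.equals <- mean(x = c(1, 2, 3))
  x.created <- exists("x", envir = environment(), inherits = FALSE)

  # "<-" inside a call first creates y, then passes its value to mean()
  result.arrow <- mean(y <- c(1, 2, 3))
  y.created <- exists("y", envir = environment(), inherits = FALSE)

  c(x.created = x.created, y.created = y.created)
}

compare.assignment()
```

Using "<-" for assignment and "=" for naming arguments keeps the two roles visibly separate.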
R scripting errors result when an object's properties don't match what was expected by the function call
These functions are helpful to find what properties are assigned to an object
mode(object.name)          ### Returns an object's mode
class(object.name)         ### Returns an object's class
attributes(object.name)    ### Returns the attributes associated with an object
str(object.name)           ### Returns the complete list of properties assigned to an object
Sometimes errors result if you accidentally overwrite an object in the current environment
These functions are helpful for managing objects defined in the current working environment
ls( )              ### Returns a list of active objects in the current working environment
rm( )              ### Removes an object from the current working environment
rm(list = ls())    ### Removes all objects from the current working environment (use carefully)
Errors can result if a script, saved in one file location, calls a file in another location
These functions are helpful for managing your local file structure
Additionally, creating projects in RStudio can help avoid these errors
getwd( )                     ### Returns the location of the current working directory
setwd("C:/Users/Desktop")    ### Reassigns the location of the working directory
file.choose( )               ### Opens a new file explorer window to choose a file
?cos                         ### Searches the loaded packages for the help page on "cos" - also help(cos)
??xyz                        ### Searches the installed R documentation for "xyz" - also help.search("xyz")
vignette()                   ### Lists the "how-to" vignettes available for each package in the library
help.search("t.test")        ### Provides a categorized search of R documentation for "t.test"
Quick R http://www.statmethods.net/
R-bloggers http://www.r-bloggers.com/
StackOverflow http://stackoverflow.com/questions/tagged/r/
Next, go beyond the basics and use R and RStudio to create and manipulate objects
Check out these presentations