R can be regarded as an implementation of the S language, which was developed at Bell Laboratories by Rick Becker, John Chambers and Allan Wilks, and which also forms the basis of the S-Plus systems. The evolution of the S language is characterized by four books by John Chambers and coauthors. For R, the basic reference is The New S Language: A Programming Environment for Data Analysis and Graphics by Richard A. Becker, John M. Chambers and Allan R. Wilks. The new features of the 1991 release of S are covered in Statistical Models in S, edited by John M. Chambers and Trevor J. Hastie. The formal methods and classes of the methods package are based on those described in Programming with Data by John M. Chambers.
R is an integrated suite of software facilities for data manipulation, calculation and graphical display.
The term “environment” is intended to characterize it as a fully planned and coherent system, rather than an incremental accretion of very specific and inflexible tools, as is frequently the case with other data analysis software. R is very much a vehicle for newly developing methods of interactive data analysis. It has developed rapidly, and has been extended by a large collection of packages. However, most programs written in R are essentially ephemeral, written for a single piece of data analysis.
To start programming in R, you need to download and install the R statistical programming language. The engine is open source; however, the base installation on its own is not very user friendly, so it helps to install an integrated development environment (IDE) as well.
There are several flavours to choose from:
* RStudio
* Visual Studio for R
* Eclipse
* Microsoft Visual Studio Community Edition
* Power BI
* IntelliJ
* Tinn-R or RKWard
These are just a few of the IDEs I have come across, although there are many more. My personal favourite is RStudio.
A package is a suitable way to organize your own work and, if you want to, share it with others. Typically, a package will include code (not only R code!), documentation for the package and for the functions inside, some tests to check everything works as it should, and data sets.
The basic information about a package is provided in the DESCRIPTION file, where you can find out what the package does, who the author is, what version the documentation belongs to, the date, the type of license it uses, and the package dependencies.
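The fields of a DESCRIPTION file can also be read from inside R. The following is a minimal sketch, assuming you only want to inspect an installed package; the stats package is used purely because it ships with every R installation.

```r
# Read the DESCRIPTION metadata of an installed package.
# "stats" is chosen only because it is always available.
desc <- utils::packageDescription("stats")

desc$Package   # package name
desc$Version   # version string
desc$License   # license field
desc$Depends   # dependencies (NULL when the field is absent)

# packageVersion() returns the version as an object that can be compared.
# The threshold below is only an illustration.
utils::packageVersion("stats") >= "3.0.0"
```

The same calls work for any installed package, for example a package you installed from CRAN.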
R packages are collections of functions and data sets developed by the community. They increase the power of R by improving existing base R functionalities, or by adding new ones. R has a vast array of packages available at the CRAN and Bioconductor repositories. In the last few years, the number of packages has grown exponentially!
The packages are meant to help you accomplish specific tasks. It is important to note that some of these packages accomplish similar tasks. Which package to use is left to the discretion of the user and depends on the environment and the complexity of the task at hand.
A repository is a place where packages are located so you can install them from it. Although you or your organization might have a local repository, repositories are typically online and accessible to everyone. Three of the most popular repositories for R packages are:
CRAN: the official repository. It is a network of FTP and web servers maintained by the R community around the world. It is coordinated by the R Foundation, and for a package to be published here it needs to pass several tests that ensure the package follows CRAN policies. More details are available on the CRAN website.
Bioconductor: this is a topic-specific repository, intended for open source software for bioinformatics. Like CRAN, it has its own submission and review processes, and its community is very active, holding several conferences and meetings per year.
GitHub: although it is not R specific, GitHub is probably the most popular repository for open source projects. Its popularity comes from the unlimited space for open source projects, the integration with git (a version control system), and the ease of sharing and collaborating with others. But be aware that there is no review process associated with it.
The most common way is to use the CRAN repository. In that case you just need the name of the package and the command install.packages("package"). For example, the oldest package published on CRAN that is still online and being updated is the vioplot package, from Daniel Adler. To install it from CRAN, you just need to use:
# install.packages("vioplot")
After running this, you will receive some messages on screen. They will depend on which operating system you are using, the dependencies, and whether the package was successfully installed.
CRAN is a network of servers (each of them called a mirror), so you can specify which one you would like to use. If you are using R through the RGui interface, you can select it from the list which appears just after you use the install.packages() command. In RStudio, a mirror is already selected by default. You can also select your mirror with chooseCRANmirror(), or directly inside the install.packages() function by using the repos parameter. You can see the list of available mirrors with getCRANmirrors() or on the CRAN website.
# install.packages("vioplot", repos = "https://lib.ugent.be/CRAN/")
The line above installs the vioplot package using the Ghent University Library mirror (Belgium).
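The mirror functions mentioned above can be explored without installing anything. This is a minimal sketch, assuming the mirror table bundled with your R installation is good enough (local.only = TRUE avoids a network request) and using the cloud.r-project.org mirror only as an example of a default.

```r
# List the mirrors recorded in the local R installation (no network access).
mirrors <- utils::getCRANmirrors(local.only = TRUE)
head(mirrors[, c("Name", "Country", "URL")])

# Mirrors located in Belgium, as an example of filtering the table.
mirrors[mirrors$Country == "Belgium", c("Name", "URL")]

# Set a default mirror for the current session, so that install.packages()
# no longer needs an explicit 'repos' argument.
old_repos <- getOption("repos")   # kept so the setting can be restored
options(repos = c(CRAN = "https://cloud.r-project.org"))
getOption("repos")
```

To make the choice permanent, the same options() call can be placed in your .Rprofile file.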
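In the case of Bioconductor, the standard way of installing a package has been to first execute the following script (Bioconductor 3.8 and later replace this script with the BiocManager package from CRAN and its BiocManager::install() function):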
# source("https://bioconductor.org/biocLite.R")
This will install some basic functions needed to install Bioconductor packages, such as the biocLite() function. If you want to install the core packages of Bioconductor, just call it without further arguments:
# biocLite()
If, however, you are interested in just a few particular packages from this repository, you can pass their names directly as a character vector:
# biocLite(c("GenomicFeatures", "AnnotationDbi"))
Each repository has its own way of installing packages, so if you regularly use packages from different sources this behavior can be a bit frustrating. A more efficient way is probably to use the devtools package to simplify this process, because it contains specific functions for each repository, including CRAN.
You can install devtools as usual with install.packages("devtools"), but you might also need to install Rtools on Windows, the Xcode command line tools on Mac, or r-base-dev or r-devel on Linux, depending on the distribution.
Having installed devtools, you will be able to use its utility functions to install other packages. The options are:
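As a rough sketch, the installer functions can be summarised as below. The list is written from memory of the devtools documentation, so treat it as an assumption and check help(package = "devtools") for what your version provides. The repository name "user/repository" is a placeholder, not a real project.

```r
# Installer functions that devtools provides, one per package source.
installers <- c(
  install_cran      = "a package from CRAN",
  install_bioc      = "a package from Bioconductor",
  install_github    = "a package from a GitHub repository",
  install_bitbucket = "a package from a Bitbucket repository",
  install_git       = "a package from any git repository",
  install_svn       = "a package from a Subversion repository",
  install_url       = "a package archive from a URL",
  install_local     = "a package from a local file or folder",
  install_version   = "a specific version of a CRAN package"
)

# If devtools is installed, report which of these it actually exports.
if (requireNamespace("devtools", quietly = TRUE)) {
  available <- names(installers) %in% getNamespaceExports("devtools")
  print(data.frame(installs = installers, available = available))
}

# Typical calls (commented out because they need network access):
# devtools::install_cran("vioplot")
# devtools::install_github("user/repository")
```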
Once you have spent more time with R, it is normal to use install.packages() a few times per week or even per day, and given the speed at which R packages are developed, it is likely that sooner rather than later you will need to update or replace your beloved packages. In this section, you will find a few functions that can help you manage your collection.
To check which packages are installed on your computer, use:
# installed.packages()
To uninstall a package, use remove.packages(), in your case:
# remove.packages("vioplot")
To check which packages need an update, use:
# old.packages()
To update all packages, use:
# update.packages()
But for a specific package, just use once again:
# install.packages("vioplot")
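The result of installed.packages() is easier to work with once it is stored in a variable. The sketch below is only an illustration; the helper name is_installed is made up for this example and is not part of base R.

```r
# installed.packages() returns a character matrix with one row per package.
pkgs <- installed.packages()
head(pkgs[, c("Package", "Version", "LibPath")])

# Small helper (hypothetical name) to test whether packages are installed.
is_installed <- function(pkg) {
  pkg %in% rownames(installed.packages())
}

# Returns one TRUE/FALSE value per package name.
is_installed(c("stats", "vioplot"))
```

A check like this is handy at the top of a script, before any code that depends on a package runs.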
If you prefer a graphical user interface (that is, pointing and clicking) to install packages, both RStudio and the RGui include one. In RStudio you will find it at Tools -> Install Packages, where a pop-up window lets you type the name of the package you want to install. In the RGui you will find the utilities under the Packages menu.
After a package is installed, you are ready to use its functionalities. If you only need a few functions or data sets from a package occasionally, you can access them with the notation packagename::functionname().
If you will make more intensive use of the package, then it may be worth loading it into memory. The simplest way to do this is with the library() command. Please note that the input of install.packages() is a character vector and requires the name to be in quotes, while library() accepts either a character string or a name, which makes it possible to write the name of the package without quotes.
After this, you no longer need the package::function() notation, and you can access its functionalities directly, like any other base R functions or data.
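Both ways of accessing a package can be tried with the tools package, used here only because it ships with R and needs no installation.

```r
# Occasional use: qualify the function with its package; nothing is attached.
tools::file_ext("results.csv")

# Intensive use: attach the package once, then call its functions directly.
library(tools)      # unquoted name
library("tools")    # quoted name, equivalent
file_ext("results.csv")
toTitleCase("managing r packages")
```

The qualified form also makes it explicit where a function comes from, which helps when two attached packages export functions with the same name.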
You may have heard about the require() function: it is indeed possible to load a package with it, but the difference is that it will not throw an error if the package is not installed. Instead, it issues a warning and returns FALSE, so code that relies on the package only fails later on.
So use this function carefully!
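The difference can be seen with a package name that does not exist. This is a minimal sketch; "notARealPackage123" is a made-up name assumed not to be installed.

```r
missing_pkg <- "notARealPackage123"  # hypothetical name, assumed not installed

# require() reports failure through its return value (plus a warning).
loaded <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)
loaded

# library() stops with an error, caught here so the script can continue.
result <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)
result

# A common defensive pattern: check the return value and fail early.
# if (!require(missing_pkg, character.only = TRUE)) stop("Please install ", missing_pkg)
```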
Strictly speaking, library() is the command used to load a package, and its name refers to the library: the place where packages are stored, usually a folder on your computer. A package, in turn, is the collection of functions bundled conveniently.
A quote from Hadley Wickham, Chief data scientist at RStudio, may help here:
a package is a like a book, a library is like a library; you use library() to check a package out of the library
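To see the distinction on your own machine, you can ask R where its libraries are and where a given package lives. The sketch below uses the stats package only because it is always installed.

```r
# The libraries (folders) that R searches for packages, in order.
.libPaths()

# The folder where one particular package lives.
pkg_dir <- find.package("stats")
pkg_dir

# The library is the parent folder of the package folder.
dirname(pkg_dir)
```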
Another very useful source of help included in most of the packages is the vignettes, which are documents where the authors show some functionalities of their package in a more detailed way. Following vignettes is a great way to get your hands dirty with the common uses of the package, so it’s a perfect way to start working with it before doing your own analysis.
If you prefer to stay in the command line, vignette() will show you the list of all available vignettes, and vignette(package = "packagename") the ones included in a given package. Once you have located the one you want to explore, just use vignette("vignettename").
For example, one of the most popular packages for visualization is ggplot2. You have probably installed it on your computer already, but if not, this is your chance to do it and test your new install.packages() skills.
Assuming you have already installed ggplot2, you can check which vignettes are included in it:
vignette(package = "ggplot2")
Two vignettes are available for ggplot2, ggplot2-specs and extending-ggplot2. You can check the first one with:
vignette("ggplot2-specs")
In RStudio, this will be displayed in the Help tab on the right, while in the RGui or on the command line it will open a browser window with the vignette.
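The list of vignettes can also be handled as data. The following sketch uses the grid package instead of ggplot2, on the assumption that it is present because it ships with R; the same calls should work for any installed package.

```r
# List the vignettes of one package; the result has class "packageIQR".
v <- vignette(package = "grid")

# The 'results' component is a character matrix, one row per vignette.
v$results[, c("Item", "Title"), drop = FALSE]

# Open one of them by name (commented out because it launches a viewer).
# vignette("grid", package = "grid")
```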
At this point, you should be able to install and get the most from your R packages, but there is still one final question in the air: where do you find the packages you need?
The typical way of discovering packages is simply by learning R: many tutorials and courses mention the most popular packages. The DataCamp courses are a good example: “Cleaning Data in R” teaches all about tidyr, “Data Analysis in R, the data.table Way” deals with the data.table package, and so on.
For each topic you would like to cover in R, there is probably an interesting package you can find.
But what if you have a specific problem and you don’t know where to start? For example, as I stated in the introduction of this post, what if you are interested in analyzing some Korean texts? Or what if you would like to collect some weather data? Or estimate evapotranspiration?
You have reviewed several repositories, and yes, you know you could check the full list of CRAN packages, but with more than 10000 options, it is very easy to get lost.
Let’s look at some alternatives!
One alternative can be to browse categories of CRAN packages, thanks to the CRAN task views. That’s right! CRAN, the official repository, also gives you the option to browse through packages. The task views are basically themes or categories that group packages based on their functionality.
R and RStudio are open source software. The installation files are maintained and updated regularly by a team of community contributors, so it is important to check for updates online and install them regularly.
In RStudio, type and run the code below to clear all variables and stored objects from the R workspace.
rm(list = ls(all.names = TRUE)) # clear everything, including hidden objects
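The following is a small sketch of what the command does, run in a separate scratch environment so that your own workspace is left untouched. The environment and object names are made up for illustration.

```r
# Create a scratch environment with one ordinary and one hidden object.
scratch <- new.env()
assign("x", 1:10, envir = scratch)
assign(".hidden", "name starts with a dot", envir = scratch)

ls(scratch)                    # lists only "x"
ls(scratch, all.names = TRUE)  # also lists ".hidden"

# all.names = TRUE is the full spelling of the 'all' argument;
# with it, rm() removes the hidden objects as well.
rm(list = ls(scratch, all.names = TRUE), envir = scratch)
remaining <- ls(scratch, all.names = TRUE)
length(remaining)              # 0
```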
To check whether your R installation is up to date, you need to install and load a package called installr (available for Windows only).
if (!require(installr)) {
  install.packages("installr")
  require(installr)
} # load installr, installing it first if needed
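The same "load, or install and then load" idea can be wrapped in a small function. This is only a sketch under stated assumptions: the name ensure_package and its install argument are hypothetical, and installation is switched off by default so that the example runs without network access.

```r
# Return TRUE if 'pkg' can be loaded, optionally installing it first.
# Function and argument names are made up for this example.
ensure_package <- function(pkg, install = FALSE) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    if (!install) {
      return(FALSE)            # missing, and we were asked not to install
    }
    install.packages(pkg)      # needs a configured CRAN mirror
  }
  requireNamespace(pkg, quietly = TRUE)
}

ensure_package("stats")               # shipped with R
ensure_package("notARealPackage123")  # made-up name, assumed not installed
```

Using requireNamespace() rather than require() checks availability without attaching the package, which keeps the search path unchanged.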
This package provides functions that help you with the update by checking for the latest installation files on the internet.
# updateR()
The function above will start the updating process of your R installation. It will check for newer versions, and if one is available, it will guide you through the decisions you need to make. If the running R installation is already up to date, the function returns FALSE and there is nothing to update. Otherwise, when prompted to press any key, do so, and the function will proceed with copying all of your packages from your old (well, current) R installation into your newer R installation. You can then erase all of the packages in your old R installation. Next comes an option to update all of your packages in the new version of R.
# updateR(F, T, T, F, T, F, T) # install, move, update.package, quit R.
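Before running the updater, it can help to confirm which version of R is currently running. The calls below are base R; the version threshold is only an illustration.

```r
# Full version banner of the running R session.
R.version.string

# Version as an object that supports comparisons.
current <- getRversion()
current

# Compare against a minimum version (illustrative threshold).
current >= "4.0.0"
```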
This is a step by step guide on the functions:
# check.for.updates.R() # tells you if there is a new version of R or not.
The next function downloads the latest R installer onto your machine and runs it.
# install.R() # download and run the latest R installer
The next function comes in handy especially after major updates to the base R installation. The new version creates a new packages folder on your machine, and R is linked to it. By default this folder is normally empty apart from the packages that ship with R, so it is important to transfer your packages from the old folder to the new one.
# copy.packages.between.libraries() # copy your packages to the newest R installation
By default, the packages are copied from the version just before the newest one (if ask=T, it will ask you between which two versions to perform the copying).
# updateR(F, T, F, F, F, F, T) # only install R (if there is a newer version), and quits it.
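For the copying step, it can be useful to know which packages exist in an old library folder but not in a new one. The sketch below uses base R only; the function name missing_in_new and its arguments are hypothetical, and the demonstration compares the system library with itself because the real folder paths depend on your machine.

```r
# Packages present in 'old_lib' but missing from 'new_lib'.
# Function and argument names are made up for this example.
missing_in_new <- function(old_lib, new_lib) {
  old_pkgs <- rownames(installed.packages(lib.loc = old_lib))
  new_pkgs <- rownames(installed.packages(lib.loc = new_lib))
  setdiff(old_pkgs, new_pkgs)
}

# Comparing a library with itself always gives an empty result;
# in practice the two arguments would be the old and new library folders.
missing_in_new(.Library, .Library)

# The missing packages could then be reinstalled into the new library:
# install.packages(missing_in_new(old_lib, new_lib), lib = new_lib)
```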
Sometimes packages become obsolete and are no longer available for newer versions of R. To overcome this challenge, it is important to work with projects under version control.
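One simple habit that fits with version control is to record the package versions a project was developed with. This is a minimal sketch, assuming a plain CSV file is enough; the file name is made up, and it is written to a temporary folder here, whereas in a real project it would live in the project folder and be committed along with the code.

```r
# Record the name and version of every installed package.
snapshot <- as.data.frame(
  installed.packages()[, c("Package", "Version"), drop = FALSE],
  stringsAsFactors = FALSE
)

# Write the snapshot to a file (hypothetical file name).
snapshot_file <- file.path(tempdir(), "package-versions.csv")
write.csv(snapshot, snapshot_file, row.names = FALSE)

# Later, the file can be read back to see what the project relied on.
restored <- read.csv(snapshot_file, stringsAsFactors = FALSE)
head(restored)
```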
THE END