With over 2 million users worldwide, R is rapidly becoming the leading programming language in statistics and data science. Every year, the number of R users grows by 40%, and an increasing number of organizations use it in their day-to-day activities. R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and for data analysis. R is a GNU package: the source code for the R software environment is written primarily in C, Fortran and R itself, and is freely available under the GNU General Public License. Pre-compiled binary versions are provided for various operating systems. Although R has a command-line interface, several graphical user interfaces are available, such as RStudio, an integrated development environment.
R and its libraries implement a wide variety of statistical and graphical techniques, including linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, and others. R is easily extensible through functions and through user-submitted packages for specific functions or specific areas of study, and the R community is noted for its active package contributions. Many of R’s standard functions are written in R itself, which makes it easy for users to follow the algorithmic choices made. For computationally intensive tasks, C, C++, and Fortran code can be linked and called at run time. Advanced users can write C, C++, Java, .NET or Python code to manipulate R objects directly. Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages. Another strength of R is static graphics: it can produce publication-quality graphs, including mathematical symbols. Dynamic and interactive graphics are available through additional packages.
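As a minimal sketch of the linear modelling and object-oriented facilities described above, the following fits a simple regression with lm() and inspects the result. The built-in mtcars dataset and the variables mpg and wt are assumptions chosen purely for illustration; any data frame with two numeric columns would work the same way.

```r
# Illustrative only: mtcars ships with base R, and mpg ~ wt is an arbitrary choice.
fit <- lm(mpg ~ wt, data = mtcars)

# The fitted model is an ordinary R object carrying a class attribute ("lm").
print(class(fit))

# Generic functions such as coef() and summary() dispatch on that class.
coefs <- coef(fit)               # named vector: intercept and slope for wt
print(coefs)
print(summary(fit)$r.squared)    # proportion of variance explained
```

Because lm() is itself written in R, typing lm at the prompt without parentheses prints its source, which is one way to follow the algorithmic choices it makes.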
When a simple expression, for example 2 + 2, is typed at the R prompt, R prints the result as [1] 4. The calculation is interpreted as the sum of two single-element vectors, resulting in a single-element vector. The prefix [1] indicates that the list of elements following it on the same line starts with the first element of the vector (a feature that is useful when the output extends over multiple lines). Like similar languages such as APL and MATLAB, R supports matrix arithmetic. R’s data structures include vectors, matrices, arrays, data frames (similar to tables in a relational database) and lists.
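The points above can be sketched in a few lines of R. The variable names (x, m, df, lst) and the values used are hypothetical and chosen only for illustration.

```r
# A "scalar" in R is simply a vector of length one.
x <- 2 + 2
print(x)             # [1] 4
print(length(x))     # [1] 1

# For a longer vector, each printed line starts with the index of its first element.
print(seq_len(30))

# Matrix arithmetic: matrices are filled column by column.
m <- matrix(1:4, nrow = 2)
elementwise <- m * m      # element-by-element product
product <- m %*% m        # true matrix product

# A data frame holds columns of equal length, like a database table.
df <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# A list can hold objects of different types and sizes.
lst <- list(label = "example", value = x, table = df)
str(lst)
```

The distinction between * and %*% is worth noting: the first multiplies matching elements, while the second performs the matrix product familiar from linear algebra.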
The capabilities of R are extended through user-created packages, which provide specialized statistical techniques, graphical devices, import/export capabilities, reporting tools (knitr, Sweave), and more. These packages are developed primarily in R, and sometimes in Java, C, C++, and Fortran. Researchers also use the R packaging system to create compendia that organize research data, code and report files in a systematic way for sharing and public archiving.
A list of changes in R releases is maintained in various “news” files at CRAN.
The most commonly used graphical integrated development environment for R is RStudio. A similar development interface is R Tools for Visual Studio. Interfaces with more of a point-and-click approach include Rattle GUI, R Commander, and RKWard.
R has vibrant and active local communities worldwide where users can network, share ideas and learn. There are regular R-user meetups, as well as the more focused R-Ladies groups, which promote gender diversity.
R is comparable to popular commercial statistical packages, such as SAS, SPSS, and Stata, but R is available to users at no charge under a free software license.
R is not the only language that can be used for data analysis. Why choose R rather than another? Here are some reasons:
Interactive language: Data analysis is inherently an interactive process, and R lets you work interactively.
Data structures: R has a fantastic mechanism for creating data structures.
Graphics: It is easy to produce graphs for exploring data. The default graphs can be tweaked to get publication-quality graphs.
Missing values: Missing values are an integral part of the R language. Many functions have arguments that control how missing values are to be handled.
Functions as first-class objects: Functions, like mean and median, are objects that you can use like data.
Packages: R has a package system that makes it extremely easy for people to add their own functionality, so that it is indistinguishable from the central part of R.
Community: R has a large and active community of users.
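Two of the reasons above, missing values and functions as first-class objects, can be illustrated with a short sketch. The sample vector and the names x, summaries and results are hypothetical; na.rm is the standard argument of mean() and median() that controls whether missing values are dropped.

```r
# A numeric vector containing one missing value (NA).
x <- c(2, 1, NA, 10, 3)

# By default the missing value propagates to the result.
print(mean(x))                  # [1] NA

# The na.rm argument tells the function to drop missing values first.
print(mean(x, na.rm = TRUE))    # [1] 4

# Functions are ordinary objects: store them in a list and apply each in turn.
summaries <- list(mean = mean, median = median)
results <- sapply(summaries, function(f) f(x, na.rm = TRUE))
print(results)
```

Storing functions in a list like this makes it straightforward to add further summaries without changing the code that applies them.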