R is a programming language popularly used for statistical computing and graphical presentation.
R is commonly used for data analysis and visualization.
The advantages of R lie in its wide range of resources for data analysis, data visualization, data science, and machine learning.
Other advantages of R:
R offers many statistical techniques such as tests, classification, clustering, and data reduction.
R makes it easy for users to create graphics, such as pie charts, histograms, box plots, scatter plots, and more.
R is cross-platform and can be used on Windows, Mac, and Linux.
R is free and open-source, meaning anyone can contribute to its development.
R has a large user community.
R provides a wide range of functions and packages to solve various problems.
R stands for Ridwan (just kidding, mate! No worries!)
To get started, there are two things you need to install. First, R itself. Second, RStudio, an editor that makes running R more convenient. Technically, R can be run on its own from the command prompt, but that is much less comfortable for beginners.
To install R, download it from https://cloud.r-project.org/. RStudio Desktop is available from https://posit.co/download/rstudio-desktop/. Alternatively, just Google “R install” and it will likely appear at the top of the search results.
Once downloaded, simply install R by following the instructions.
To write text, use single or double quotes.
'Hello, my name is Ridwan and I love R'
## [1] "Hello, my name is Ridwan and I love R"
For numbers, no quotes are needed. Just write them directly.
1
## [1] 1
For simple calculations, you can directly add the numbers together.
1 + 1
## [1] 2
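In R, typing an expression at the top level displays its value without any command, but R also provides print() to display output explicitly. Inside a for loop, values are not printed automatically, so print() is needed there.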
x <- "Ridwan"
print(x)
## [1] "Ridwan"
for (x in 1:10){
print(x)
}
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
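To see why print() matters inside a loop, here is a minimal sketch. The loop variable i and the expression i * 2 are only illustrative choices.

```r
# At the top level, typing an expression prints its value automatically.
1 + 1

# Inside a for loop nothing is auto-printed, so this loop shows no output.
for (i in 1:3) {
  i * 2
}

# Wrapping the expression in print() makes each value appear.
for (i in 1:3) {
  print(i * 2)
}
```

The second loop prints 2, 4, and 6, while the first one runs silently.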
Variables in R are containers used to store data. In R, a variable is
created as soon as you assign a value to it using the
<- symbol. You can also use = at the top level, but inside a
function call = names an argument instead of assigning a variable,
so <- is the safer habit. To display the value of a variable,
simply type the variable’s name.
In the example below, nama is a variable that contains
the text value Ridwan. Likewise, usia is a
variable that holds the value 17. If you type
nama and usia in R, the values stored in those
variables will be displayed.
nama <- "Ridwan"
usia <- 17
nama
## [1] "Ridwan"
usia
## [1] 17
Alternatively, you can also use the print() command.
print(nama)
## [1] "Ridwan"
print(usia)
## [1] 17
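The difference between <- and = is easiest to see inside a function call. The sketch below uses a hypothetical helper, kuadrat (with an argument called angka), that exists only for this illustration.

```r
# Hypothetical helper used only for this illustration.
kuadrat <- function(angka) {
  angka^2
}

# At the top level, = assigns just like <-.
usia = 17
usia

# Inside a call, = names the argument; it does not create a variable.
kuadrat(angka = 3)
exists("angka")   # FALSE in a fresh session (assuming angka was never defined)

# With <- inside the call, a variable IS created as a side effect.
kuadrat(angka <- 4)
angka
```

After the last call, angka holds 4 in your workspace, which is rarely what you want. Using <- for assignment and = for arguments keeps the two roles apart.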
A variable name must start with a letter (or a period that is not followed by a digit) and can then contain letters, numbers, periods, and underscores (_).
Variable names are case-sensitive. nama, Nama, and NAMA are three different variables.
You cannot use reserved words, such as TRUE, FALSE, NULL, if, else, for, and function.
Variable names cannot begin with a number or an underscore.
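If you are unsure whether a name is allowed, one possible check is the built-in make.names() function, which rewrites any string into a valid name. The candidate names below are made up for illustration.

```r
# Candidate names to check (chosen only for illustration).
calon <- c("nama", "nama_2", "data.baru", "2nama", "_nama", "TRUE", "nama saya")

# make.names() rewrites a string into a syntactically valid name,
# so a candidate that comes back unchanged was already valid.
valid <- make.names(calon) == calon
data.frame(calon, valid)

# Names are case-sensitive: these are two separate variables.
nama <- "Ridwan"
Nama <- "RIDWAN"
identical(nama, Nama)
```

Only the first three candidates come back unchanged. The others start with a digit or underscore, are a reserved word, or contain a space.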
In R, you can use common mathematical operators, such as multiplication (*), division (/), addition (+), subtraction (-), and exponentiation (^).
5+5
## [1] 10
5-9
## [1] -4
6/5
## [1] 1.2
7*5
## [1] 35
2^2
## [1] 4
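Beyond the five operators above, R also has integer division (%/%) and remainder (%%), and it follows the usual order of operations. A short sketch, with numbers picked only as examples:

```r
7 %/% 2       # integer division: 3
7 %% 2        # remainder (modulo): 1

2 + 3 * 4     # multiplication happens before addition: 14
(2 + 3) * 4   # parentheses change the order: 20
-2^2          # ^ is applied before the minus sign: -4
```

When in doubt about the order, add parentheses. They cost nothing and make the intent clear.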
R also has built-in mathematical functions, such as
mean(), max(), min(), and
others.
For example, if we have 10 data points stored in a variable called
data, we can use these built-in functions to find the
maximum, minimum, or mean values.
data <- c(1,6,4,5,7,2,3,6,8,6)
max(data)
## [1] 8
min(data)
## [1] 1
mean(data)
## [1] 4.8
We can also calculate the range by subtracting the minimum value from the maximum value.
max(data) - min(data)
## [1] 7
Or, you can store the result in a variable.
range_coba <- max(data) - min(data)
range_coba
## [1] 7
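As an alternative sketch, R's built-in range() function returns the minimum and maximum together, and diff() takes the difference between them. The data vector is repeated here so the snippet runs on its own.

```r
data <- c(1, 6, 4, 5, 7, 2, 3, 6, 8, 6)

# range() returns both extremes as a vector of length 2.
range(data)         # 1 8

# diff() subtracts the first element from the second, giving the range width.
diff(range(data))   # 7
```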
A function is a block of code that runs only when it is called. We can create functions according to our needs. For example, if I want to create a function that adds 5 to every number I input, I can do it like this:
func_ridwan <- function(x) {
x + 5
}
If we input the number 5, the result will be 10. The same function can also be written with an explicit return(), here under the name tambah_lima ("add five"):
tambah_lima <- function(x) {
return(x + 5)
}
# Call the function with input 5
tambah_lima(5)
When you input 5, the output will be 10.
func_ridwan(5)
## [1] 10
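Functions can also take more than one argument and give some of them default values. The sketch below is a hypothetical variation on the function above: tambah and its argument n are names invented for this example.

```r
# Hypothetical generalization: add n to x, where n defaults to 5.
tambah <- function(x, n = 5) {
  x + n
}

tambah(5)            # uses the default n = 5, so the result is 10
tambah(5, n = 2)     # overrides the default, so the result is 7
tambah(c(1, 2, 3))   # works on whole vectors too: 6 7 8
```

Because arithmetic in R is vectorized, the same function handles a single number and a vector without any loop.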
R has a built-in function to create plots, which is plot(x, y).
x represents the points on the x-axis, and y represents the points on the y-axis.
For example, if you want to plot a point at the coordinates x = 1 and y = 3, you can do it like this:
plot(1,3)
You can also plot multiple points. It is easier to store them in variables first.
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)
plot(x,y)
You can also connect the points with a line by adding the type argument.
plot(x,y, type = "l")
You can also customize the color and shape of the points, the line thickness, and other aspects through arguments such as col (color), pch (point shape), and lwd (line width). For example, this draws the line in red:
plot(x,y, type = "l", col= "red")
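Here is one possible combination of those arguments in a single call. The specific values (line width 2, point shape 19, the title and axis labels) are arbitrary choices for illustration.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(3, 7, 8, 9, 12)

plot(x, y,
     type = "b",           # "b" draws both points and connecting lines
     col  = "red",         # color of points and lines
     lwd  = 2,             # line width (the default is 1)
     pch  = 19,            # point shape: 19 is a solid circle
     main = "Example plot",
     xlab = "x values",
     ylab = "y values")
```

Try other pch numbers or type = "p" (points only) to see how the look changes.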
You can also create overlapping plots (multiple plots on the same graph) using the points() or lines() functions after the initial plot. Here’s an example:
line1 <- c(1,2,3,4,5,10)
line2 <- c(2,5,7,8,9,10)
plot(line1, type = "l", col = "blue")
lines(line2, type="l", col = "red")
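One thing to watch with overlaid plots: plot() sets the axis limits from the first series only, so a later line with larger values could be cut off. A minimal sketch of one way around this, where batas_y is a helper variable introduced just for this example:

```r
line1 <- c(1, 2, 3, 4, 5, 10)
line2 <- c(2, 5, 7, 8, 9, 10)

# Compute y-axis limits that cover both series.
batas_y <- range(line1, line2)

plot(line1, type = "l", col = "blue", ylim = batas_y, ylab = "value")
lines(line2, col = "red")

# A legend tells the reader which line is which.
legend("topleft", legend = c("line1", "line2"),
       col = c("blue", "red"), lty = 1)
```

In this data both series already share the same maximum, but setting ylim explicitly is a safe habit when the series differ.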
R can also create pie charts using the pie() function.
x <- c(10,20,30,40)
pie(x, init.angle = 90)
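A pie chart is easier to read with labels. The sketch below computes percentage labels from the same values; persen is a variable name made up for this example, and the colors and title are arbitrary.

```r
x <- c(10, 20, 30, 40)

# Turn each value into a percentage label such as "10%".
persen <- paste0(round(100 * x / sum(x)), "%")

pie(x,
    labels = persen,
    init.angle = 90,   # start drawing the first slice at 90 degrees
    col = c("tomato", "gold", "skyblue", "seagreen"),
    main = "Example pie chart")
```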
R can also create bar charts using the barplot()
function. You can customize it by coloring, shading, making it
horizontal, and adjusting it according to your needs.
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)
barplot(y, names.arg = x)
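As a rough sketch of those customizations, the calls below color the bars, draw them horizontally, and use shading lines. The color, the density value, and the variable posisi are illustrative choices.

```r
x <- c("A", "B", "C", "D")
y <- c(2, 4, 6, 8)

# Colored, horizontal bars. barplot() returns the bar midpoints,
# which is handy if you want to add text to the bars later.
posisi <- barplot(y, names.arg = x, col = "steelblue",
                  horiz = TRUE, xlab = "Value")

# Shaded bars: density sets the number of shading lines per inch.
barplot(y, names.arg = x, density = 20)
```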
R was originally created for statistical calculations. Some basic statistical calculations provided include:
With the data from earlier (variable “data”), we can create a
statistical summary using the summary() function.
summary(data)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    1.00    3.25    5.50    4.80    6.00    8.00
The summary() function provides the minimum, maximum, median, mean, and the first and third quartiles.
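For descriptive statistics, such as the trio of mean, median, and mode, R has built-in functions for the mean and median. There is no built-in function for the statistical mode (mode() returns an object's storage type instead), but you can get it by combining other built-in functions, as in the last line below.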
mean(data)
## [1] 4.8
median(data)
## [1] 5.5
names(sort(-table(data)))[1]
## [1] "6"
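If you need the mode often, you could wrap the idea in a small function. This is only one possible sketch. The name modus is invented here, and it assumes numeric input. Unlike the one-liner above, it returns every value that ties for the highest count.

```r
data <- c(1, 6, 4, 5, 7, 2, 3, 6, 8, 6)

# Hypothetical helper: return the most frequent value(s) of a numeric vector.
modus <- function(v) {
  frek <- table(v)                              # count each distinct value
  as.numeric(names(frek)[frek == max(frek)])    # keep those with the top count
}

modus(data)               # 6
modus(c(1, 1, 2, 2, 3))   # 1 2 (two values tie)
```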
For percentiles, we can find the value of a specific percentile using
the quantile() function. For example, if we want to find
the 60th percentile, we can do the following:
quantile(data, c(0.6))
## 60%
## 6
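quantile() also accepts several probabilities at once, which is a quick way to get all three quartiles. A small sketch, with the data vector repeated so it runs on its own:

```r
data <- c(1, 6, 4, 5, 7, 2, 3, 6, 8, 6)

# Ask for the 25th, 50th, and 75th percentiles in one call.
kuartil <- quantile(data, c(0.25, 0.5, 0.75))
kuartil   # 3.25, 5.50, 6.00
```

These match the 1st Qu., Median, and 3rd Qu. columns of summary(data). Note that quantile() interpolates between data points by default, so other software may report slightly different values.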
Remember the IQR, or interquartile range? It measures the spread of the middle 50% of the data, between the first and third quartiles, and is obtained by subtracting Q1 from Q3.
IQR(data)
## [1] 2.75
It’s also fine to calculate it manually, by subtracting Q1 from Q3.
quantile(data, c(0.75)) - quantile(data, c(0.25))
## 75%
## 2.75
To calculate data dispersion, such as standard deviation, R also has
built-in functions like sd(), or var() for
variance.
sd(data)
## [1] 2.250926
var(data)
## [1] 5.066667
If you need to calculate variance manually, you should remember that the standard deviation (sd) is the square root of the variance. So:
sd(data)^2
## [1] 5.066667
See, the result is the same as the built-in function.
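To go one step further, the variance can be computed from its definition: the sum of squared deviations from the mean, divided by n - 1 (the sample variance, which is what var() uses). A minimal sketch, where n and var_manual are names chosen for this example:

```r
data <- c(1, 6, 4, 5, 7, 2, 3, 6, 8, 6)

n <- length(data)

# Sum of squared deviations from the mean, divided by n - 1.
var_manual <- sum((data - mean(data))^2) / (n - 1)
var_manual

# Compare with the built-in functions.
all.equal(var_manual, var(data))       # TRUE
all.equal(sqrt(var_manual), sd(data))  # TRUE
```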
The following data comes from the built-in mtcars dataset, which we will use to analyze various aspects of car models and their fuel consumption (in miles per gallon, or mpg).
mtcars$mpg
## [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
## [16] 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 15.8 19.7
## [31] 15.0 21.4
Data: mtcars$mpg.
Find or create:
length() function.
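As a starting point, here is a short sketch of applying length() and the earlier summary functions to this column. The variable name mpg is just a convenience introduced here.

```r
# mtcars ships with R, so no download is needed.
mpg <- mtcars$mpg

# Number of observations (cars) in the column.
length(mpg)   # 32

# The same tools used earlier work here too.
summary(mpg)
sd(mpg)
```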