1 Set Up

  1. Install R, then RStudio: https://posit.co/download/rstudio-desktop/

  2. Read the help files on the official R website.

2 Coding Best Practices

As Hadley Wickham's Advanced R style guide puts it: "Good coding style is like using correct punctuation. You can manage without it, but it sure makes things easier to read. As with styles of punctuation, there are many possible variations. The following guide describes the style that I use (in this book and elsewhere). It is based on Google’s R style guide, with a few tweaks. You don’t have to use my style, but you really should use a consistent style."

http://adv-r.had.co.nz/Style.html

2.1 Notation and naming

  • File names should be meaningful/informative.

  • Variable and function names should be lowercase. Use underscore ( _ ) to separate words.

    • Where possible, avoid using names of existing functions and variables. 
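A minimal sketch of these naming rules. The names monthly_sales and average_sales are hypothetical and chosen only for illustration.

```r
# Hypothetical data: lowercase name, words separated by underscores.
monthly_sales <- c(120, 95, 143)

# The function name says what the function does.
average_sales <- function(sales) {
  sum(sales) / length(sales)
}

average_sales(monthly_sales)

# Avoid reusing names of existing objects such as c, mean, or T.
# For example, T <- FALSE is legal R, but it changes what T means
# for every line of code that runs afterwards.
```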

2.2 Syntax

  • Place spaces around all infix operators (=, +, -, <-, etc.).
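A short illustration of why the spacing matters. The variable names total and x are assumptions made for this sketch.

```r
# Harder to read: no spaces around the operators.
total<-2+3*4

# Easier to read: spaces around <-, +, and *.
total <- 2 + 3 * 4

# Spacing can also change the meaning.
x <- 5                   # assignment: x now holds 5
is_below <- x < -3       # comparison: is x less than -3?
is_below
```

Written without spaces, x<-3 is read by R as an assignment, not as a comparison with -3, so consistent spacing avoids that trap.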

2.3 Organisation - Commenting guidelines

Comment your code. Each line of a comment should begin with the comment symbol and a single space: #. Comments should explain the why, not the what. 

Use commented lines of - and = to break up your file into easily readable chunks.
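One way these guidelines might look in a script. The section titles and the names cars and typical_mpg are hypothetical.

```r
# Load data ===================================================

cars <- mtcars

# Summarise fuel economy --------------------------------------

# Use the median because a few very efficient cars pull the mean upward.
typical_mpg <- median(cars$mpg)
typical_mpg
```

The comment above median() explains why that statistic was chosen, which the code alone cannot tell the reader.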

3 Data Types vs Data Structures

  • Data Type: In R, “data type” refers to the classification of variables or values, such as numeric, integer, character, etc., which determine how data is stored in memory and what operations can be performed on them.

  • Data Structure: A “data structure” refers to the way data is organized and stored in R objects, such as vectors, lists, matrices, and data frames.
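A small sketch of the distinction, using base R only. The object names x and m are assumptions for illustration: the same values keep their data type while the data structure changes.

```r
x <- c(1.5, 3, 0.25)         # three numeric values in a vector
type_of_x <- typeof(x)       # data type of the values: "double"
is.vector(x)                 # data structure check: TRUE

m <- matrix(x, nrow = 1)     # same values arranged as a 1 x 3 matrix
type_of_m <- typeof(m)       # data type is unchanged: "double"
is.matrix(m)                 # the structure is now a matrix: TRUE
dim(m)                       # 1 3
```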

4 Basic Data Types

In R, data types refer to the classification of variables or values that determine how they are stored in memory and what operations can be performed on them. Here are the primary data types in R:

  1. Numeric: Used for numeric (real) values.

    • Examples: 1.5, 3, 0.25
  2. Integer: Used for integer values.

    • Examples: 1L, 2L, 100L
  3. Logical: Used for Boolean values (TRUE or FALSE).

    • Example: TRUE, FALSE
  4. Character: Used for strings of text.

    • Example: "Hello", "R programming"
  5. Factor: A class for categorical variables, stored internally as integer codes with labels (levels).

    • Example: factor(c("low", "medium", "high"))
  6. Date: A class for calendar dates, stored as the number of days since 1970-01-01.

    • Example: as.Date("2023-01-01")
  7. POSIXct and POSIXlt: Used for date-time values (seconds since the Unix epoch and date-time components respectively).

    • Example: as.POSIXct("2023-01-01 12:00:00")
  8. Raw: Used for storing raw bytes.

    • Example: as.raw(c(0x01, 0x02, 0x03))
  9. Complex: Used for complex numbers with real and imaginary parts.

    • Example: 1 + 2i

These data types help R manage different kinds of information and determine how operations such as arithmetic, comparisons, and transformations are performed on them. Understanding these types is crucial for effective data manipulation, analysis, and programming in R.
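The types listed above can be checked with typeof() and class(). This is a minimal sketch; the names f and d are hypothetical.

```r
typeof(1.5)          # "double" (what R calls numeric)
typeof(1L)           # "integer"
typeof(TRUE)         # "logical"
typeof("Hello")      # "character"
typeof(1 + 2i)       # "complex"
typeof(as.raw(1))    # "raw"

# Factors and dates are classes layered on top of a basic type.
f <- factor(c("low", "medium", "high"))
class(f)             # "factor"
typeof(f)            # "integer"

d <- as.Date("2023-01-01")
class(d)             # "Date"
typeof(d)            # "double"
```

typeof() reports how the values are stored, while class() reports how R treats the object, which is why a factor shows up as "integer" in one and "factor" in the other.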

  • R data types specify the kind of data a variable can hold.

  • Choosing the right data type keeps memory use low and computations precise.

  • Each R data type has its own properties and supported operations.

    • As in other programming languages, data types define and categorize the forms of data that can be stored and manipulated.

4.1 Comments

  • numeric - (10.5, 55, 787)

    • Decimals
  • integer - (1L, 55L, 100L, where the letter “L” declares this as an integer)

  • character (a.k.a. string) - (“k”, “R is exciting”, “FALSE”, “11.5”).

    • Addresses or names, states or countries.
  • logical (a.k.a. boolean) - (TRUE or FALSE)

    • Result of comparisons; in arithmetic, TRUE counts as 1 and FALSE counts as 0.
  • complex - (9 + 3i, where 9 is the real part and 3i is the imaginary part)

    • Popular in physics, but not in business/economics.
  • factor in R represents categorical data where the possible values (levels) are known and finite.

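# Browse the data sets that ship with R, then work on a copy of mtcars
help(package = "datasets")
df <- mtcars
?as.integer                      # open the help page for as.integer()
df$mpg2 <- as.integer(df$mpg)    # drops the decimal part (22.8 becomes 22)
df$mpg3 <- as.numeric(df$mpg2)   # numeric again, but the decimals are not recovered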


str(df)
## 'data.frame':    32 obs. of  13 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
##  $ mpg2: int  21 21 22 21 18 18 14 24 22 19 ...
##  $ mpg3: num  21 21 22 21 18 18 14 24 22 19 ...
2 > 3
## [1] FALSE
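
The str() output above shows mpg2 as 21, 21, 22, so as.integer() drops the decimal part instead of rounding. The sketch below illustrates that, along with how logical and character values convert. All object names are hypothetical.

```r
truncated <- as.integer(10.9)         # decimal part dropped: 10
rounded   <- as.integer(round(10.9))  # round first to get 11

flags <- c(TRUE, FALSE, TRUE, TRUE)
n_true     <- sum(flags)              # TRUE counts as 1, so this is 3
share_true <- mean(flags)             # share of TRUE values: 0.75

as_number <- as.numeric("11.5") + 1   # text converted to a number: 12.5

# A factor keeps its full set of levels, even those not yet observed.
size <- factor(c("low", "high", "low"), levels = c("low", "medium", "high"))
levels(size)                          # "low" "medium" "high"
```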

5 Data Structures

In R, data structures are fundamental components used to store and organize data efficiently. Each data structure has specific characteristics that determine how data is stored in memory and what operations can be performed on it.

Understanding these structures is crucial for effectively manipulating and analyzing data in R.

  1. Vectors:

    • Atomic Vectors: These are one-dimensional structures whose elements must all be of the same data type, such as numeric, character, or logical.

      • Example: c(1, 2, 3, 4) creates a numeric vector.
    • Lists: Lists are also one-dimensional but can hold elements of different types or structures.

      • Example: list(1, "a", TRUE) creates a list with numeric, character, and logical elements.
  2. Matrices:

    • Matrices are two-dimensional arrays where all elements are of the same data type (numeric, character, etc.).

    • Example: matrix(1:6, nrow=2, ncol=3) creates a 2x3 matrix.

  3. Arrays:

    • Arrays generalize matrices to multiple dimensions, where all elements are of the same data type.

    • Example: array(1:24, dim=c(2, 3, 4)) creates a 2x3x4 array.

  4. Data Frames:

    • Data frames are two-dimensional structures similar to tables in databases or spreadsheets.

    • Columns can be of different data types (numeric, character, etc.).

    • Example: data.frame(id=c(1, 2, 3), name=c("Alice", "Bob", "Charlie")) creates a data frame with columns “id” and “name”.

  5. Factors:

    • Factors are used to represent categorical data in R.

    • They are stored as integers with corresponding levels (categories).

    • Example: factor(c("low", "high", "medium", "low"), levels=c("low", "medium", "high")) creates a factor with levels “low”, “medium”, and “high”.

  6. Lists:

    • Lists in R are versatile data structures that can contain elements of different types and lengths.

    • Example: list(1, "a", c(1, 2, 3)) creates a list with numeric, character, and numeric vector elements.

  7. Data Tables (from data.table package):

    • Data tables are enhanced data frames optimized for large datasets and efficient operations.

    • Example: after loading the package with library(data.table), data.table(id=c(1, 2, 3), name=c("Alice", "Bob", "Charlie")) creates a data table with columns “id” and “name”.

Each structure offers different capabilities and efficiencies depending on the nature of the data and the tasks being performed. By leveraging the appropriate data structure, you can optimize your workflow and enhance your ability to work with data in R effectively.
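The base R structures listed above can be built and inspected as follows. This is a sketch with hypothetical object names; data.table is left out because it needs an extra package.

```r
v <- c(1, 2, 3, 4)                              # atomic vector
l <- list(1, "a", TRUE)                         # list with mixed types
m <- matrix(1:6, nrow = 2, ncol = 3)            # 2 x 3 matrix, filled by column
a <- array(1:24, dim = c(2, 3, 4))              # 2 x 3 x 4 array
people <- data.frame(
  id = c(1, 2, 3),
  name = c("Alice", "Bob", "Charlie"),
  stringsAsFactors = FALSE                      # keep names as character
)

length(v)                 # 4
l[[2]]                    # second list element: "a"
m[2, 3]                   # row 2, column 3: 6
dim(a)                    # 2 3 4
sapply(people, class)     # id is "numeric", name is "character"

# Mixing types in an atomic vector coerces everything to one type.
mixed <- c(1, "a", TRUE)
typeof(mixed)             # "character"
```

The last two lines show why a list or data frame is needed when values of different types must be kept side by side.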

https://swcarpentry.github.io/r-novice-inflammation/13-supp-data-structures.html

6 Arithmetic Operations

Let me begin by introducing basic math operations.

https://www.codecademy.com/resources/docs/r/operators

6.1 Addition Operation

2 + 2 # addition
## [1] 4

6.2 Subtraction Operation

5 - 4
## [1] 1

6.3 Multiplication Operation

2 * 3
## [1] 6

6.4 Division Operation

8 / 3
## [1] 2.666667
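
Beyond the four operations above, base R also provides exponentiation, integer division, and remainder operators. A brief sketch, with variable names chosen only for illustration:

```r
power     <- 2 ^ 3         # exponentiation: 8
quotient  <- 8 %/% 3       # integer division: 2
remainder <- 8 %% 3        # remainder (modulo): 2

# Multiplication is evaluated before addition unless parentheses say otherwise.
no_parens   <- 2 + 3 * 4   # 14
with_parens <- (2 + 3) * 4 # 20
```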


# Plot data ---------------------------

hist(mtcars$mpg)

7 Appendix

sessionInfo()
## R version 4.5.1 (2025-06-13)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.5
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.37     R6_2.6.1          fastmap_1.2.0     xfun_0.52        
##  [5] cachem_1.1.0      knitr_1.50        htmltools_0.5.8.1 rmarkdown_2.29   
##  [9] lifecycle_1.0.4   cli_3.6.5         sass_0.4.10       jquerylib_0.1.4  
## [13] compiler_4.5.1    rstudioapi_0.17.1 tools_4.5.1       evaluate_1.0.4   
## [17] bslib_0.9.0       yaml_2.3.10       rlang_1.1.6       jsonlite_2.0.0