The Basics

This cheat sheet collects commands that I either find myself googling frequently or that are just generally good functions to be familiar with for a variety of analyses! I will likely update it over time.




Other Helpful Data Commands:


Re-naming Variables in a Dataframe:

View Names in df

Re-name all names in dataset, generally speaking

Re-name only Row Names:

Re-Name only Columns
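
A minimal base R sketch of the renaming steps listed above. The data frame `df` and every name in it are made up for illustration.

```r
# Hypothetical example data frame (not from the original post)
df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5), c = c("x", "y", "z"))

# View names in df
names(df)

# Re-name all names in the dataset at once
names(df) <- c("id", "score", "group")

# Re-name only the row names
rownames(df) <- c("row1", "row2", "row3")

# Re-name only one column (here, the second one)
colnames(df)[2] <- "total_score"
```

For a data frame, names() and colnames() refer to the same thing, so either can be used for the column steps.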



Changing The Class of Variables in R:


Setting a Variable as Ordinal/Numeric/Categorical

Change the level names of a categorical variable

Change ALL variables to be all numeric:
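
One way the class changes above could look in base R. The data frames `df` and `num_df` and their contents are hypothetical.

```r
# Hypothetical example data (not from the original post)
df <- data.frame(
  rating = c("low", "high", "medium", "low"),
  score  = c("1", "2", "3", "4"),
  stringsAsFactors = FALSE
)

# Categorical (unordered factor)
df$rating_cat <- factor(df$rating)

# Ordinal (ordered factor with an explicit level order)
df$rating_ord <- factor(df$rating,
                        levels = c("low", "medium", "high"),
                        ordered = TRUE)

# Numeric
df$score <- as.numeric(df$score)

# Change the level names of a categorical variable
levels(df$rating_cat)                       # "high" "low" "medium"
levels(df$rating_cat) <- c("H", "L", "M")   # same order as above

# Change ALL variables to numeric. Factors go through as.character() first,
# otherwise as.numeric() would return the internal integer codes.
num_df <- data.frame(x = c("1", "2"), y = factor(c("10", "20")))
num_df[] <- lapply(num_df, function(col) as.numeric(as.character(col)))
```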






Missing Data Checks

View & Delete Missingness - (this does listwise deletion):
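
A small base R sketch of viewing missingness and then doing listwise deletion. The data frame is invented for the example.

```r
# Hypothetical data with some missing values
df <- data.frame(x = c(1, NA, 3, 4), y = c("a", "b", NA, "d"))

# View missingness
is.na(df)                   # logical matrix: TRUE where a value is missing
colSums(is.na(df))          # number of missing values per column
df[!complete.cases(df), ]   # rows with at least one missing value

# Listwise deletion: drop every row that has any NA
df_complete <- na.omit(df)
```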


Function to change missing values to ‘NA’ (useful if you forgot to tell R what your missing values were when you read in your data)
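
A sketch of what such a function might look like. The helper name `set_missing` and the missing-value codes -999 and -99 are assumptions for illustration, not the original function.

```r
# Hypothetical helper: replace the given missing-value codes with NA
set_missing <- function(data, codes = c(-999, -99)) {
  data[] <- lapply(data, function(col) {
    col[col %in% codes] <- NA
    col
  })
  data
}

# Made-up data where -999 and "-99" stand for missing
df <- data.frame(x = c(1, -999, 3), y = c("a", "b", "-99"))
df <- set_missing(df)
```

If the codes are known before reading the file, the na.strings argument of read.csv() handles this at import time instead.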



Other Missing Data Descriptives:

Look at the percent missing:
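
A quick base R sketch for percent missing, using made-up data.

```r
# Hypothetical data
df <- data.frame(x = c(1, NA, 3, 4), y = c(NA, NA, 7, 8))

# Percent missing per variable (x = 25, y = 50)
pct_by_var <- colMeans(is.na(df)) * 100

# Percent missing across the whole data set (37.5)
pct_overall <- mean(is.na(df)) * 100
```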


Compute the covariance coverage using md.pairs() from the mice package
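
A minimal sketch, assuming the mice package is installed. The data are invented; coverage here is taken to mean the proportion of cases in which both variables of a pair are observed.

```r
library(mice)  # assumes the mice package is installed

# Hypothetical data with scattered missing values
df <- data.frame(x = c(1, NA, 3, 4),
                 y = c(NA, 2, 3, 4),
                 z = c(1, 2, 3, NA))

# md.pairs() returns pairwise counts; 'rr' counts the cases where
# both variables in a pair are observed
pair_counts <- md.pairs(df)

# Covariance coverage: proportion of cases with both variables observed
coverage <- pair_counts$rr / nrow(df)
round(coverage, 2)
```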






Subsetting Data in R

Typical Ways to Subset Data:

1.) my_df[1:3] (no comma) will subset my_df, returning the first three columns as a data frame.

2.) my_df[1:3, ] (with comma, numbers to left of the comma) will subset my_df and return the first three rows as a data frame.

3.) my_df[, 1:3] (with comma, numbers to right of the comma) will subset my_df and return the first three columns as a data frame, the same as my_df[1:3].
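
The three forms above can be tried on a toy data frame. The contents of `my_df` here are made up.

```r
# Hypothetical data frame with 5 rows and 4 columns
my_df <- data.frame(a = 1:5, b = 6:10, c = 11:15, d = 16:20)

first_cols  <- my_df[1:3]     # 1.) first three columns
first_rows  <- my_df[1:3, ]   # 2.) first three rows
first_cols2 <- my_df[, 1:3]   # 3.) first three columns again

all.equal(first_cols, first_cols2)  # forms 1 and 3 select the same columns
dim(first_rows)                     # 3 rows, 4 columns
```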

Easy Example to Physically See How Subsetting Works in R:

For subsetting rows:

For subsetting columns:

Quick Subsetting of Rows & Columns Together:

##           [,1]      [,2]      [,3]      [,4]
## [1,]  9.254785 12.614388  9.346585 10.090685
## [2,]  9.318042  8.092890 10.899995 10.440230
## [3,] 11.330883  9.718729 12.053058  9.108824
## [4,] 10.699723  9.391665 11.243303  7.814188
## [5,] 10.043442  8.910559 10.507499 10.974342
## [6,]  9.537012  9.758896 10.937656  9.163288
  • We can see that we successfully selected rows 1-6 and columns 1-4.

  • If you want to select columns or rows that are not right next to each other, you can do this:

##           [,1]      [,2]      [,3]
## [1,]  9.254785 12.614388  9.346585
## [2,] 10.699723  9.391665 11.243303
## [3,]  9.612798  8.997450 11.177066
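
Output like the two blocks above could come from code along these lines. The matrix `mat`, its size, and the seed are assumptions, so the printed values will differ from those shown.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Hypothetical matrix: 10 rows x 5 columns of draws centered on 10
mat <- matrix(rnorm(50, mean = 10, sd = 1), nrow = 10, ncol = 5)

# Rows 1-6 and columns 1-4 together
block <- mat[1:6, 1:4]

# Non-adjacent rows (or columns): pass a vector of indices with c()
picked <- mat[c(1, 4, 9), c(1, 2, 3)]
```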



Alternative Methods for Quick Subsetting (using package ‘dplyr’):

dplyr is nice if you want to subset from your original dataset based on a set of variables that are similarly named. For example:
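
A minimal sketch, assuming dplyr is installed. The data frame and the `anx_`/`dep_` naming scheme are hypothetical.

```r
library(dplyr)  # assumes the dplyr package is installed

# Hypothetical survey with similarly named item columns
survey_df <- data.frame(
  id    = 1:3,
  anx_1 = c(2, 3, 1), anx_2 = c(4, 4, 2),
  dep_1 = c(1, 5, 3), dep_2 = c(2, 2, 2)
)

# Keep every column whose name starts with "anx"
anx_items <- select(survey_df, starts_with("anx"))

# Keep id plus every column whose name contains "dep"
dep_items <- select(survey_df, id, contains("dep"))
```

Other selection helpers such as ends_with() and matches() work the same way.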






Recoding & Re-Scaling Variables:

Quick Way to Manually Re-Code ALL Variables in a df (using the package ‘car’):

To demonstrate, making fake data

  • In ‘survey’, ‘var1’ draws 20 samples from the integers 1:5 with replacement

Now, let’s just change var1 to have different values (currently has values 1-5)

  • Brackets allow subsetting of columns…could also do ‘survey$var1’
  • This will change all ‘5’ in column 1 (var1) to be 6, and all ‘1s’ to be ‘1.5.’

If we wanted to recode all values in all columns of the entire survey dataset, we could do so like this:

##      var1 var2 age
## [1,]    6  1.5  45
## [2,]    4  1.5  19
## [3,]    2  1.5  19
## [4,]    2  1.5  43
## [5,]    3  2.0  21
## [6,]    3  2.0  20
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
## [1,]  6.0  4.0  2.0  2.0    3    3    3    3    2     3   1.5     3   1.5
## [2,]  1.5  1.5  1.5  1.5    2    2    2    2    3     3   3.0     3   4.0
## [3,] 45.0 19.0 19.0 43.0   21   20   21   32   24    17  13.0    15  35.0
##      [,14] [,15] [,16] [,17] [,18] [,19] [,20]
## [1,]   1.5     6     2     3     3     2     2
## [2,]   4.0     4     4     6     6     6     6
## [3,]  45.0    11    27    13    40    40    41
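
The recoding steps above can be sketched roughly as follows, assuming the car package is installed. The seed and the way `var2` and `age` are generated are guesses, so the values will not match the output shown.

```r
# Assumes the car package is installed; car::recode() is called with its
# namespace because other packages (e.g. dplyr) also export a recode()
set.seed(123)  # arbitrary seed

# Fake survey data (var2 and age are illustrative guesses)
survey <- data.frame(
  var1 = sample(1:5, 20, replace = TRUE),
  var2 = rep(1:5, each = 4),
  age  = sample(11:45, 20, replace = TRUE)
)

# Recode one column: every 5 becomes 6, every 1 becomes 1.5
survey[, 1] <- car::recode(survey[, 1], "5=6; 1=1.5")

# Recode every column of the data set the same way
recoded <- sapply(survey, function(x) car::recode(x, "5=6; 1=1.5"))
head(recoded)
```

Because sapply() returns a matrix here, wrap the result in as.data.frame() if a data frame is needed afterwards.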

Rescaling a Variable to be between 0-1 (using the ‘scales’ package):

  • Use same ‘survey’ dataset from above:
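
A minimal sketch, assuming the scales package is installed. A small stand-in `survey` data frame is defined here so the example runs on its own.

```r
# Assumes the scales package is installed
# Small stand-in for the survey data (values are made up)
survey <- data.frame(var1 = c(1, 3, 5, 2), age = c(20, 45, 30, 11))

# rescale() maps the observed range onto 0-1 by default
survey$age_01 <- scales::rescale(survey$age)
range(survey$age_01)

# The same thing by hand: (x - min) / (max - min)
by_hand <- (survey$age - min(survey$age)) /
  (max(survey$age) - min(survey$age))
```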

Plots


Tables & Output


Frequency Table:

## animal
##  cat  dog seal 
##    3    2    2
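
A frequency table like the one above can be produced with table(). The vector below is made up to match those counts.

```r
# Hypothetical data chosen to give 3 cats, 2 dogs, 2 seals
animal <- c("cat", "dog", "seal", "cat", "dog", "seal", "cat")

# Counts per category
freq <- table(animal)
freq

# Proportions instead of counts
prop.table(freq)
```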


2-Way Cross-Tabulation Using ‘gmodels’:

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## | Chi-square contribution |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  10 
## 
##  
##              | vary 
##         varx |         1 |         2 |         4 |         5 | Row Total | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##            1 |         1 |         1 |         0 |         0 |         2 | 
##              |     0.267 |     0.050 |     0.400 |     0.200 |           | 
##              |     0.500 |     0.500 |     0.000 |     0.000 |     0.200 | 
##              |     0.333 |     0.250 |     0.000 |     0.000 |           | 
##              |     0.100 |     0.100 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##            2 |         1 |         1 |         0 |         0 |         2 | 
##              |     0.267 |     0.050 |     0.400 |     0.200 |           | 
##              |     0.500 |     0.500 |     0.000 |     0.000 |     0.200 | 
##              |     0.333 |     0.250 |     0.000 |     0.000 |           | 
##              |     0.100 |     0.100 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##            3 |         1 |         1 |         0 |         0 |         2 | 
##              |     0.267 |     0.050 |     0.400 |     0.200 |           | 
##              |     0.500 |     0.500 |     0.000 |     0.000 |     0.200 | 
##              |     0.333 |     0.250 |     0.000 |     0.000 |           | 
##              |     0.100 |     0.100 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##            4 |         0 |         0 |         2 |         1 |         3 | 
##              |     0.900 |     1.200 |     3.267 |     1.633 |           | 
##              |     0.000 |     0.000 |     0.667 |     0.333 |     0.300 | 
##              |     0.000 |     0.000 |     1.000 |     1.000 |           | 
##              |     0.000 |     0.000 |     0.200 |     0.100 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
##            5 |         0 |         1 |         0 |         0 |         1 | 
##              |     0.300 |     0.900 |     0.200 |     0.100 |           | 
##              |     0.000 |     1.000 |     0.000 |     0.000 |     0.100 | 
##              |     0.000 |     0.250 |     0.000 |     0.000 |           | 
##              |     0.000 |     0.100 |     0.000 |     0.000 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
## Column Total |         3 |         4 |         2 |         1 |        10 | 
##              |     0.300 |     0.400 |     0.200 |     0.100 |           | 
## -------------|-----------|-----------|-----------|-----------|-----------|
## 
## 
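
A sketch of a call that gives a table of this shape, assuming the gmodels package is installed. The two vectors are reconstructed from the cell counts shown above, not taken from the original code.

```r
library(gmodels)  # assumes the gmodels package is installed

# Vectors chosen to reproduce the cell counts in the table above
varx <- c(1, 1, 2, 2, 3, 3, 4, 4, 4, 5)
vary <- c(1, 2, 1, 2, 1, 2, 4, 4, 5, 2)

# Prints the cross-tabulation and invisibly returns its components
ct <- CrossTable(varx, vary)
```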

Helpful Tricks (but no treats, sigh)



Paste and Paste0:

The difference between paste() and paste0() is the separator: paste() uses sep = " " (a single space) by default, while paste0() always uses "" (no separator), so paste0(...) is the same as paste(..., sep = "").

##  [1] "variable1"  "variable2"  "variable3"  "variable4"  "variable5" 
##  [6] "variable6"  "variable7"  "variable8"  "variable9"  "variable10"
##  [1] "variable1self.report"  "variable2self.report" 
##  [3] "variable3self.report"  "variable4self.report" 
##  [5] "variable5self.report"  "variable6self.report" 
##  [7] "variable7self.report"  "variable8self.report" 
##  [9] "variable9self.report"  "variable10self.report"
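
The two outputs above are consistent with calls like these (a reconstruction, since the original code is not shown).

```r
# Default separators
paste("variable", 1)             # "variable 1"
paste0("variable", 1)            # "variable1"
paste("variable", 1, sep = "")   # "variable1", same as paste0()

# Vectorised: build ten variable names at once
vars <- paste0("variable", 1:10)

# Append a suffix to every name
vars_sr <- paste0(vars, "self.report")
```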


Fun Sounds for Long Analyses:

Add a fun sound at the end of your code so you know your model is done!
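
One possible way to do this, assuming the beepr package (the original does not name a package, so this is a guess). The lm() call is just a stand-in for a long-running model.

```r
# Stand-in for a model that takes a long time to fit
fit <- lm(mpg ~ wt, data = mtcars)

# Play a sound once the line above has finished.
# Assumes the beepr package; skipped quietly if it is not installed.
if (requireNamespace("beepr", quietly = TRUE)) {
  beepr::beep()
}
```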


Change Max Print Default:

Long output? Change default print settings
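
A small sketch using the base R max.print option. The value 10000 is arbitrary.

```r
# Check the current limit
getOption("max.print")

# Raise it; options() returns the old settings so they can be restored
old_opts <- options(max.print = 10000)
new_limit <- getOption("max.print")

# Restore the previous setting when done
options(old_opts)
```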


Find Column Number w/ Vbl Name:

Finding column number of a particular variable
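
Two base R ways to look up a column's position by name, using a made-up data frame.

```r
# Hypothetical data frame
df <- data.frame(id = 1:3, age = c(20, 30, 40), score = c(5, 6, 7))

# Position of a column by exact name
age_col   <- which(colnames(df) == "age")    # 2
score_col <- match("score", colnames(df))    # 3

# Positions of all columns matching a pattern
s_cols <- grep("^s", colnames(df))           # 3
```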


Add Quotation Marks Easily:

Call/work with variable names without needing to type quotation marks by using the Hmisc package

  • I find this particularly helpful for when I need to subset something!!!!
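
A minimal sketch, assuming the Hmisc package is installed and that its Cs() function is what is meant here. The data frame is hypothetical.

```r
library(Hmisc)  # assumes the Hmisc package is installed

# Cs() turns unquoted names into a character vector
vars <- Cs(var1, var2, var3)   # same as c("var1", "var2", "var3")

# Handy for subsetting columns without typing the quotes
df <- data.frame(var1 = 1:2, var2 = 3:4, var3 = 5:6, var4 = 7:8)
subset_df <- df[, Cs(var1, var3)]
```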


noquote:

Conversely, get rid of quotations using ‘noquote’

## [1] var1 var2 var3
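
Output like the line above comes from wrapping a character vector in noquote():

```r
vars <- c("var1", "var2", "var3")

print(vars)             # printed with quotation marks
unquoted <- noquote(vars)
unquoted                # printed without quotation marks
```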


Positive Definite Check:

Checking if Matrix is Positive Definite! (comes in handy for SEM models)
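
A base R sketch of such a check using eigenvalues. The helper name `is_pos_def` and the tolerance are assumptions; the original may well have used a packaged function instead.

```r
# Hypothetical helper: a symmetric matrix is positive definite
# when all of its eigenvalues are greater than zero
is_pos_def <- function(m, tol = 1e-8) {
  if (!isSymmetric(m)) return(FALSE)
  ev <- eigen(m, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

ok_mat  <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)  # eigenvalues 1.5 and 0.5
bad_mat <- matrix(c(1, 2, 2, 1), nrow = 2)      # eigenvalues 3 and -1

is_pos_def(ok_mat)    # TRUE
is_pos_def(bad_mat)   # FALSE
```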



dplyr tricks:

dplyr has a ton of helpful commands. There are some really good tutorials out there that explore the many useful aspects of this package, such as this one here.


purrr tricks:

Also a really handy package for various things. Check out this tutorial on purrr here.


tidyr tricks:

Again, there are many great features that tidyr has. See the tutorial here or also here. The latter link has dplyr functions included too!

One of my favorites in tidyr, though, is the ‘complete’ function. The complete() function allows you to fill in the gaps for all observations that had no data. Essentially, you define the observations that you want to complete, and then tell R what value to plug into the missing gaps.
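
A minimal sketch of complete(), assuming tidyr is installed. The data frame and the choice of 0 as the fill value are made up for illustration.

```r
library(tidyr)  # assumes the tidyr package is installed

# Hypothetical long data: person 2 has no row for wave 2
visits <- data.frame(
  id    = c(1, 1, 2),
  wave  = c(1, 2, 1),
  score = c(10, 12, 9)
)

# Create every id x wave combination and fill the new gaps with 0
filled <- complete(visits, id, wave, fill = list(score = 0))
filled
```

Leaving out the fill argument keeps the new rows but sets their score to NA.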



more tips soon to come!