For this semester, you have two options: your own computer or the virtual machine.

Complete only the part of the assignment that matches your choice!

Your Own Computer

Install Software

##r chunk - do not change these
R.version
##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          0.2                         
## year           2020                        
## month          06                          
## day            22                          
## svn rev        78730                       
## language       R                           
## version.string R version 4.0.2 (2020-06-22)
## nickname       Taking Off Again
#RStudio.Version() run this line but it won't knit with it "on"

Answer: I am using R version 4.0.2 (the version reported by R.version above).

Install all the R Packages

  • Install the following packages, which you will need all semester:
  • markdown, knitr, reticulate, party, ca, FactoMineR, rgl, rms, visreg, pvclust, car, boot, psych, ngram, LSAfun, tm, topicmodels, tidytext, tidyverse, ggraph, igraph, widyr, stringdist
  • Run this: Sys.setenv(TZ = "America/New_York") if you see the TZ errors.
  • You may need to pay attention to the rlang package. I had to uninstall rlang and reinstall it.
  • Do not run the installation code when knitting.
install.packages("markdown")
install.packages("knitr")
install.packages("reticulate")
install.packages("party")
install.packages("ca")
install.packages("FactoMineR")
install.packages("rgl")
install.packages("rms")
install.packages("visreg")
install.packages("pvclust")
install.packages("car")
install.packages("boot")
install.packages("psych")
install.packages("ngram")
install.packages("LSAfun")
install.packages("tm")
install.packages("topicmodels")
install.packages("tidytext")
install.packages("tidyverse")
install.packages("ggraph")
install.packages("igraph")
install.packages("widyr")
install.packages("stringdist")

Install the special R Packages

  • Use the following code to install the special packages for R.
  • Change eval = TRUE to eval = FALSE once you have them installed.
install.packages("https://osf.io/ak7gq/download", repos = NULL, method = "libcurl", type = "source")
memnetpackage <- "https://cran.r-project.org/src/contrib/Archive/memnet/memnet_0.1.0.tar.gz"
install.packages(memnetpackage, repos=NULL, type="source")

Set up your python

  • Load the reticulate library.
library(reticulate)

Install Miniconda

Try typing py_config() below. You should get a prompt to install Miniconda. If not, use install_miniconda().

py_config()
## python:         /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/bin/python
## libpython:      /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/lib/libpython3.6m.dylib
## pythonhome:     /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate:/Users/hailunfeng/Library/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 18:53:43)  [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
## numpy:          /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version:  1.19.1

Show you’ve installed Python

Run py_config() in the R chunk below.

py_config()
## python:         /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/bin/python
## libpython:      /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/lib/libpython3.6m.dylib
## pythonhome:     /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate:/Users/hailunfeng/Library/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 18:53:43)  [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
## numpy:          /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version:  1.19.1
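
The same check can be made from the Python side. The following is a minimal sketch using only the standard library; `interpreter_summary` is a hypothetical helper name, not part of reticulate, and the value of `sys.executable` may differ when Python is embedded in another program.

```python
import sys


def interpreter_summary():
    """Collect the path, version, and prefix of the running Python."""
    return {
        "python": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "prefix": sys.prefix,
    }


# Print one "key: value" line per entry, similar in spirit to py_config().
for key, value in interpreter_summary().items():
    print(f"{key}: {value}")
```

If the printed path points somewhere unexpected, R and Python are probably not using the environment you installed packages into.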

Windows Machines

Windows machines need special programs to make all this work:

Install Python Packages

  • Install the Python packages by typing in R (the reticulate library must be loaded!): py_install("package_name", pip = TRUE)
  • Change eval = TRUE to eval = FALSE once you have them installed.

Packages: nltk, matplotlib, PyQt5, scikit-learn, numpy, pandas, prince, factor-analyzer, gensim, pyLDAvis, bs4

library(reticulate)

py_install("nltk", pip = TRUE)
py_install("matplotlib", pip = TRUE)
py_install("PyQt5", pip = TRUE)
py_install("scikit-learn", pip = TRUE)
py_install("numpy", pip = TRUE)
py_install("pandas", pip = TRUE)
py_install("prince", pip = TRUE)
py_install("factor-analyzer", pip = TRUE)
py_install("gensim", pip = TRUE)
py_install("pyLDAvis", pip = TRUE)
py_install("bs4", pip = TRUE)
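
To confirm that the installs worked, you could run something like the following in Python (for example after repl_python()). This is a sketch, not part of the assignment: `IMPORT_NAMES` and `missing_packages` are hypothetical names, and the mapping records the import name for each install name, since the two sometimes differ (scikit-learn is imported as sklearn, factor-analyzer as factor_analyzer).

```python
from importlib.util import find_spec

# Install name (as passed to py_install) -> import name (as used in Python).
IMPORT_NAMES = {
    "nltk": "nltk",
    "matplotlib": "matplotlib",
    "PyQt5": "PyQt5",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "pandas": "pandas",
    "prince": "prince",
    "factor-analyzer": "factor_analyzer",
    "gensim": "gensim",
    "pyLDAvis": "pyLDAvis",
    "bs4": "bs4",
}


def missing_packages(import_names):
    """Return the install names whose import name cannot be located."""
    return [pkg for pkg, mod in import_names.items() if find_spec(mod) is None]


# An empty list means every package can be found by this interpreter.
print(missing_packages(IMPORT_NAMES))
```

Any name that is printed can be reinstalled with py_install() from R.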

Special Python Extras

For nltk, you will need to add a few other pieces. Type the following into R console:

library(reticulate)
repl_python()

Here you should notice you have switched from > to >>> which indicates you are in Python:

import nltk
nltk.download("popular")
nltk.download("nps_chat")
nltk.download("webtext")
nltk.download("abc")

To leave the Python prompt (>>>), type exit or press the Esc key.
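
If you want to double-check that the individual corpora arrived, a sketch like the one below may help. It assumes nltk is installed and that each corpus is stored under "corpora/<name>"; `missing_corpora` is a hypothetical helper that takes the lookup function as an argument, so it works with nltk.data.find without importing nltk itself.

```python
def missing_corpora(names, find):
    """Return the corpus names for which find() raises LookupError."""
    missing = []
    for name in names:
        try:
            find(f"corpora/{name}")
        except LookupError:
            missing.append(name)
    return missing


try:
    import nltk
except ImportError:
    print("nltk is not installed in this Python environment")
else:
    # nltk.data.find raises LookupError when a resource is not present.
    print(missing_corpora(["nps_chat", "webtext", "abc"], nltk.data.find))
```

An empty list suggests the three corpora are in place. The "popular" download is a collection of several resources, so it is not checked here.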

Click on the Terminal tab and type in:

python -m spacy download en_core_web_sm
pip install lxml

This will download the English-language spaCy model (en_core_web_sm) and install lxml.

Virtual Machine

Go to: https://www.thedoomlab.com/rstudio/auth-sign-in

Your log in is:

Python Set Up

  • Click on terminal and run the following lines:

pip3 install -U spacy
python3 -m spacy download en_core_web_sm
pip3 install nltk
pip3 install gensim
python3 -m nltk.downloader popular
pip3 install prince
pip3 install factor-analyzer
pip3 install pyLDAvis
python3 -m nltk.downloader abc

Turn off Miniconda

When you run py_config() for the first time, it will ask you to install Miniconda. Say no! We already have python3 installed on the server.

library(reticulate)
py_config()
## python:         /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/bin/python
## libpython:      /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/lib/libpython3.6m.dylib
## pythonhome:     /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate:/Users/hailunfeng/Library/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 18:53:43)  [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
## numpy:          /Users/hailunfeng/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version:  1.19.1

Everyone

Let’s do some R

  • In this chunk, we will load a dataset - use data(rock) to load it.
  • Use the head() function to print out the first six rows of the dataset.
data(rock)
head(rock)
##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

  • First, load the sklearn library, which has several sample datasets. You load Python packages by using import PACKAGE. Note that this package is installed under one name and imported under another (scikit-learn = sklearn).
  • Next, import the datasets part of sklearn by doing from PACKAGE import FUNCTION. Therefore, you should use from sklearn import datasets.
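  • Then call the boston dataset by doing: dataset_boston = datasets.load_boston(). Note that load_boston() was removed in scikit-learn 1.2, so this call requires an older scikit-learn version.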
  • To print out the first five rows (the pandas default), use the .head() function: df_boston.head(), after converting the file with pandas (code included below).

import sklearn
from sklearn import datasets
dataset_boston = datasets.load_boston()

##convert to pandas
import pandas as pd
df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
df_boston.head()
##       CRIM    ZN  INDUS  CHAS    NOX  ...  RAD    TAX  PTRATIO       B  LSTAT
## 0  0.00632  18.0   2.31   0.0  0.538  ...  1.0  296.0     15.3  396.90   4.98
## 1  0.02731   0.0   7.07   0.0  0.469  ...  2.0  242.0     17.8  396.90   9.14
## 2  0.02729   0.0   7.07   0.0  0.469  ...  2.0  242.0     17.8  392.83   4.03
## 3  0.03237   0.0   2.18   0.0  0.458  ...  3.0  222.0     18.7  394.63   2.94
## 4  0.06905   0.0   2.18   0.0  0.458  ...  3.0  222.0     18.7  396.90   5.33
## 
## [5 rows x 13 columns]
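
The two import forms described in the bullets above can be compared side by side. This sketch uses the standard-library statistics module as a stand-in (an assumption for illustration; it is not one of the course packages), so it runs without any extra installs.

```python
# Form 1: import the whole package, then reach into it with dot notation.
import statistics

print(statistics.mean([6.3, 6.3, 17.1]))

# Form 2: import one name from the package and call it directly.
from statistics import mean

print(mean([6.3, 6.3, 17.1]))

# Both forms refer to the same function object.
print(statistics.mean is mean)
```

The same pattern applies to sklearn: import sklearn makes the package available, while from sklearn import datasets lets you write datasets.load_boston() without the sklearn. prefix.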

Question: Look in your environment window. What do you see?

Answer: In my Global Environment, I can see the dataset(s) just imported.

Get started with PyCharm!