For this semester, you have two options: work on your own computer or use the virtual machine.

Complete only the part of the assignment that matches your choice: your own computer or the virtual machine!

Your Own Computer

Install Software

##r chunk - do not change these
R.version
##                _                           
## platform       x86_64-w64-mingw32          
## arch           x86_64                      
## os             mingw32                     
## system         x86_64, mingw32             
## status                                     
## major          3                           
## minor          6.2                         
## year           2019                        
## month          12                          
## day            12                          
## svn rev        77560                       
## language       R                           
## version.string R version 3.6.2 (2019-12-12)
## nickname       Dark and Stormy Night
#RStudio.Version() # run this line, but comment it out again before knitting, because the document will not knit with it active

ANSWER: What version of RStudio are you using? Note that it should be the latest version. R version 3.6.2 (2019-12-12)

Install all the R Packages

  • Install the following packages, which you will need all semester:
  • markdown, knitr, reticulate, party, ca, FactoMineR, rgl, rms, visreg, pvclust, car, boot, psych, ngram, LSAfun, tm, topicmodels, tidytext, tidyverse, ggraph, igraph, memnet, widyr, stringdist
  • If you see TZ (time zone) errors, run this: Sys.setenv(TZ = "America/New_York")
  • Pay attention to the rlang package: I had to uninstall and reinstall it.
  • Do not include the installation code in your document, so that it knits.

Install the special R Packages

  • Use the following code to install the special packages for R.
  • Change eval = TRUE to eval = FALSE once you have them installed.
install.packages("https://osf.io/ak7gq/download", repos = NULL, method = "libcurl", type = "source")

Set up your Python

  • Load the reticulate library.
##r chunk
library(reticulate)
## Warning: package 'reticulate' was built under R version 3.6.3

Install Miniconda

Try typing py_config() below. You should get a prompt to install Miniconda. If not, use install_miniconda().

##r chunk
py_config()
## python:         C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/python.exe
## libpython:      C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/python36.dll
## pythonhome:     C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, May  7 2020, 19:46:08) [MSC v.1916 64 bit (AMD64)]
## Architecture:   64bit
## numpy:          C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/Lib/site-packages/numpy
## numpy_version:  1.18.4

Show you’ve installed Python

Run py_config() in the R chunk below.

##r chunk
py_config()
## python:         C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/python.exe
## libpython:      C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/python36.dll
## pythonhome:     C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, May  7 2020, 19:46:08) [MSC v.1916 64 bit (AMD64)]
## Architecture:   64bit
## numpy:          C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/Lib/site-packages/numpy
## numpy_version:  1.18.4
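To double-check from the Python side, the sketch below prints a few of the same details that py_config() reports. It is a minimal sketch and not part of reticulate. The helper name python_info is hypothetical, and numpy is reported only if it is installed.

```python
##python chunk
import importlib.util
import sys


def python_info():
    """Hypothetical helper: report the running interpreter, similar to part of py_config()."""
    info = {
        "python": sys.executable,            # path to the Python binary in use
        "version": sys.version.split()[0],   # e.g. "3.6.10"
        "numpy_version": None,
    }
    # Look up numpy only if it is installed, so this never raises ImportError
    if importlib.util.find_spec("numpy") is not None:
        import numpy
        info["numpy_version"] = numpy.__version__
    return info


for key, value in python_info().items():
    print(f"{key:15} {value}")
```

If the python path printed here differs from the one py_config() shows, R and your Python chunks are not using the same interpreter.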

Windows Machines

Windows machines need special programs to make all this work:

Install Python Packages

  • Install the Python packages by typing the following in R (the reticulate library must be loaded!): py_install("package_name", pip = TRUE)
  • Change eval = TRUE to eval = FALSE once you have them installed.

Packages: nltk, matplotlib, PyQt5, scikit-learn, numpy, pandas, prince, factor-analyzer, gensim, pyLDAvis, bs4
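Before moving on, you can check which of these packages Python can actually find. A few install names differ from the names you import. For example, scikit-learn is imported as sklearn. The sketch below assumes the install-to-import mapping shown in the dictionary. The helper name missing_packages is hypothetical.

```python
##python chunk
import importlib.util

# Assumed mapping: install name (used with py_install or pip) -> import name (used in code)
PACKAGES = {
    "nltk": "nltk",
    "matplotlib": "matplotlib",
    "PyQt5": "PyQt5",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "pandas": "pandas",
    "prince": "prince",
    "factor-analyzer": "factor_analyzer",
    "gensim": "gensim",
    "pyLDAvis": "pyLDAvis",
    "bs4": "bs4",
}


def missing_packages(packages=PACKAGES):
    """Hypothetical helper: return the install names whose import name cannot be found."""
    return [install for install, module in packages.items()
            if importlib.util.find_spec(module) is None]


print(missing_packages())  # an empty list means everything was found
```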

Special Python Extras

For nltk, you will need to add a few other pieces. Type the following into the R console:

  • library(reticulate)
  • repl_python()

The prompt should switch from > to >>>, which indicates you are now in Python. Then type:

  • import nltk
  • nltk.download("popular")
  • nltk.download("nps_chat")
  • nltk.download("webtext")
  • nltk.download("abc")

To leave the Python (>>>) prompt, type exit or press the Esc key.
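If you prefer, you can run the same NLTK downloads from a Python chunk instead of the >>> prompt. This is a minimal sketch. download_nltk_data is a hypothetical helper, and when you actually run it, it assumes nltk is installed and you are online.

```python
##python chunk
# The NLTK data ids listed above
NLTK_IDS = ["popular", "nps_chat", "webtext", "abc"]


def download_nltk_data(ids=NLTK_IDS, downloader=None):
    """Hypothetical helper: download each NLTK data id and return the ids that failed."""
    if downloader is None:
        import nltk  # imported here so the helper can be tested without nltk
        downloader = nltk.download
    failed = []
    for data_id in ids:
        # nltk.download() returns False when a download fails
        if not downloader(data_id):
            failed.append(data_id)
    return failed


# Uncomment to download everything (needs nltk and an internet connection):
# print(download_nltk_data())
```

The downloader argument exists only so the loop can be tried without nltk. In normal use, call download_nltk_data() with no arguments.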

Click on Terminal and type:

  • python -m spacy download en_core_web_sm (this downloads the English-language spaCy model)
  • pip install lxml

Virtual Machine

Go to: https://class.aggieerin.com/auth-sign-in

Your login is:

Python Set Up

  • Click on terminal and run the following lines:

    • pip3 install -U spacy
    • python3 -m spacy download en_core_web_sm
    • pip3 install nltk
    • pip3 install gensim
    • python3 -m nltk.downloader popular
    • pip3 install prince
    • pip3 install factor-analyzer
    • pip3 install pyLDAvis
    • python3 -m nltk.downloader abc
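You can also script the same setup from Python. This sketch assumes that running "-m pip", "-m spacy", and "-m nltk.downloader" with the current interpreter (sys.executable) is equivalent to the pip3/python3 commands above. That holds when python3 is the interpreter reticulate uses. run_setup is a hypothetical helper, and it only prints the commands unless dry_run=False.

```python
##python chunk
import subprocess
import sys

# The terminal commands above, in the same order.
# Assumption: "sys.executable -m pip" installs into the same Python as pip3 on the server.
SETUP_COMMANDS = [
    ["-m", "pip", "install", "-U", "spacy"],
    ["-m", "spacy", "download", "en_core_web_sm"],
    ["-m", "pip", "install", "nltk"],
    ["-m", "pip", "install", "gensim"],
    ["-m", "nltk.downloader", "popular"],
    ["-m", "pip", "install", "prince"],
    ["-m", "pip", "install", "factor-analyzer"],
    ["-m", "pip", "install", "pyLDAvis"],
    ["-m", "nltk.downloader", "abc"],
]


def run_setup(dry_run=True):
    """Hypothetical helper: print (dry run) or execute each setup command."""
    commands = [[sys.executable] + args for args in SETUP_COMMANDS]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing command
    return commands


run_setup()  # set dry_run=False to actually install
```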

Turn off Miniconda

When you run py_config() the first time, it will ask you to install Miniconda. Say no! Python 3 is already installed on the server.

##r chunk
library(reticulate)
py_config()
## python:         C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/python.exe
## libpython:      C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/python36.dll
## pythonhome:     C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, May  7 2020, 19:46:08) [MSC v.1916 64 bit (AMD64)]
## Architecture:   64bit
## numpy:          C:/Users/PC/AppData/Local/r-miniconda/envs/r-reticulate/Lib/site-packages/numpy
## numpy_version:  1.18.4
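If you said no to Miniconda, reticulate should be using the server's python3, not an r-miniconda environment. The sketch below gives a rough way to check from a Python chunk. It assumes that conda environments contain a conda-meta folder, which makes it a heuristic. The helper name running_in_conda is hypothetical.

```python
##python chunk
import os
import sys


def running_in_conda(prefix=sys.prefix):
    """Hypothetical heuristic: conda environments keep a 'conda-meta' folder in their prefix."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


print(sys.executable)
print("conda/Miniconda environment:", running_in_conda())
```

On the virtual machine, you would expect the second line to print False.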

Everyone

Let’s do some R

  • In this chunk, load the rock dataset with data(rock).
  • Use the head() function to print out the first six rows of the dataset.
##r chunk
data(rock)
head(rock)
##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

  • First, load the sklearn library, which has several sample datasets. You load Python packages with import PACKAGE. Note that this package is installed and imported under different names (install scikit-learn, import sklearn).
  • Next, import the datasets module of sklearn with from PACKAGE import NAME. Here, use from sklearn import datasets.
  • Then load the Boston dataset with dataset_boston = datasets.load_boston(). Note that load_boston() was removed in scikit-learn 1.2, so this line only works with older versions.
  • Convert the data to a pandas DataFrame (code included below), then print the first six rows with df_boston.head(6). By default, pandas' .head() shows only five rows.
##python chunk
# install name: scikit-learn; import name: sklearn
#import sklearn
#from sklearn import datasets
#dataset_boston = datasets.load_boston()
##convert to pandas
#import pandas as pd
#df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
#df_boston.head(6)  # first six rows
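One difference worth knowing when moving between the two languages: R's head() returns six rows by default, while pandas' DataFrame.head() returns five. The sketch below shows the two defaults using the six rows of rock printed earlier. The values are typed in by hand because rock is an R dataset. It assumes pandas is installed.

```python
##python chunk
import pandas as pd

# The six rows of R's rock dataset printed above, typed in by hand
rock_top = pd.DataFrame({
    "area": [4990, 7002, 7558, 7352, 7943, 7979],
    "peri": [2791.90, 3892.60, 3930.66, 3869.32, 3948.54, 4010.15],
    "shape": [0.0903296, 0.1486220, 0.1833120, 0.1170630, 0.1224170, 0.1670450],
    "perm": [6.3, 6.3, 6.3, 6.3, 17.1, 17.1],
})

# pandas .head() defaults to n=5, unlike R's head(), which defaults to 6
first_five = rock_top.head()
first_six = rock_top.head(6)
print(len(first_five), len(first_six))  # 5 6
```

Also note that the pandas row index starts at 0, so these rows are labeled 0 to 5 rather than R's 1 to 6.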

QUESTION: Look in your environment window. What do you see?

Get started with PyCharm!