Set Up Your Computer or Virtual Machine

For this semester, you have two options:

Use your own computer.
- If you have an older machine, you are likely to run into problems installing some of the packages. If you have a MacBook Air or other “mini” type machine, you may find you do not have enough disk space for the installed packages.
- I will help you troubleshoot while you complete this assignment, but any other problems (including those later in the semester) you will need to fix yourself using Google, Stack Overflow, etc.
Use the provided RStudio virtual machine.
- https://class.aggieerin.com/auth-sign-in
- Nearly all packages have been installed for you already!
- Be sure to save your work on your own machine as well; this server is reset at the end of each semester.

Only complete the part of the assignment that matches your choice of computer or virtual machine!

Your Own Computer

Install Software

Most recent version of R: https://cloud.r-project.org/
Most recent version of RStudio: https://rstudio.com/products/rstudio/download/#download
Java JDK: https://www.oracle.com/technetwork/java/javase/downloads/index.html
Most Recent Java: https://www.java.com/en/

##r chunk - do not change these
R.version

##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          0.2                         
## year           2020                        
## month          06                          
## day            22                          
## svn rev        78730                       
## language       R                           
## version.string R version 4.0.2 (2020-06-22)
## nickname       Taking Off Again

#RStudio.Version() run this line but it won't knit with it "on"

ANSWER: What version of RStudio are you using? Please note it should be the latest version! R version 3.6.1 (2019-07-05)

Install all the R Packages

Install the following packages, which you will need all semester:
markdown, knitr, reticulate, rvest, stringr, tokenizers, stringi, textclean, hunspell, tm, textstem, devtools, qdap, wordnet
If you see TZ (time zone) errors, run this: Sys.setenv(TZ = "America/New_York")
You may need to pay attention to the rlang package; I had to uninstall rlang and reinstall it.
Do not include the installation code in this document, so that it still knits.

Install the special R Packages

Use the following code to install the special packages for R.
Change eval = TRUE to eval = FALSE once you have them installed.

install.packages("https://osf.io/ak7gq/download", repos = NULL, method = "libcurl", type = "source")
devtools::install_github("trinker/termco")

## Skipping install of 'termco' from a github remote, the SHA1 (b246be55) has not changed since last install.
##   Use `force = TRUE` to force installation

devtools::install_github("trinker/coreNLPsetup")

## Skipping install of 'coreNLPsetup' from a github remote, the SHA1 (0fc06d43) has not changed since last install.
##   Use `force = TRUE` to force installation

devtools::install_github("trinker/tagger")

## Skipping install of 'tagger' from a github remote, the SHA1 (203c1ea5) has not changed since last install.
##   Use `force = TRUE` to force installation

devtools::install_github("bnosac/RDRPOSTagger")

## Skipping install of 'RDRPOSTagger' from a github remote, the SHA1 (af51e38f) has not changed since last install.
##   Use `force = TRUE` to force installation

devtools::install_github("bradleyboehmke/harrypotter")

## Skipping install of 'harrypotter' from a github remote, the SHA1 (51f71461) has not changed since last install.
##   Use `force = TRUE` to force installation

Set up your python

Load the reticulate library.

##r chunk

Install Miniconda

Try typing py_config() below. You should get a prompt to install Miniconda. If not, use install_miniconda().

##r chunk

Show you’ve installed Python

Run py_config() in the R chunk below.

##r chunk

Windows Machines

Windows machines need special programs to make all this work:

Rtools: https://cran.r-project.org/bin/windows/Rtools/
Visual Studio Build Tools 2019: https://visualstudio.microsoft.com/downloads/#build-tools
WordNet: https://wordnet.princeton.edu/download/current-version

Install Python Packages

Install the Python packages by typing the following in R (the reticulate library must be loaded!): py_install("package_name", pip = TRUE)
Change eval = TRUE to eval = FALSE once you have them installed.

Packages: nltk, matplotlib, PyQt5, scikit-learn, numpy, pandas, regex, requests, bs4, spacy, contractions, textblob, sip, gensim, afinn, pyLDAvis

py_install("pandas", pip = TRUE)
py_install("nltk", pip = TRUE)
py_install("matplotlib", pip = TRUE)
py_install("PyQt5", pip = TRUE)
py_install("scikit-learn", pip = TRUE)
py_install("numpy", pip = TRUE)
py_install("regex", pip = TRUE)
py_install("requests", pip = TRUE)
py_install("bs4", pip = TRUE)
py_install("spacy", pip = TRUE)
py_install("contractions", pip = TRUE)
py_install("textblob", pip = TRUE)
py_install("sip", pip = TRUE)
py_install("gensim", pip = TRUE)
py_install("afinn", pip = TRUE)
py_install("pyLDAvis", pip = TRUE)
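Once the installs finish, you may want to confirm that Python can actually find each package. The sketch below is one way to do that with only the standard library. The function name missing_packages and the name mapping are made up for this example, and the import names are assumptions based on how these packages are normally imported. sip is left out because its import name is not assumed here.

```python
from importlib.util import find_spec

# Install name -> import name. Most match; scikit-learn is the notable exception.
# These import names are assumptions for illustration.
IMPORT_NAMES = {
    "nltk": "nltk",
    "matplotlib": "matplotlib",
    "PyQt5": "PyQt5",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "pandas": "pandas",
    "regex": "regex",
    "requests": "requests",
    "bs4": "bs4",
    "spacy": "spacy",
    "contractions": "contractions",
    "textblob": "textblob",
    "gensim": "gensim",
    "afinn": "afinn",
    "pyLDAvis": "pyLDAvis",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the install names whose module cannot be located."""
    return [
        install_name
        for install_name, module_name in import_names.items()
        if find_spec(module_name) is None  # None means Python cannot find it
    ]


print("Still missing:", missing_packages())
```

Anything listed as missing can be installed again with py_install() from R. An empty list means every package in the mapping was found.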

Special Python Extras

For nltk, you will need to add a few other pieces. Type the following into the R console:
- library(reticulate)
- repl_python()
- Here you should notice you have switched from > to >>>, which indicates you are in Python:

import nltk
nltk.download("popular")
nltk.download("nps_chat")
nltk.download("webtext")
nltk.download("brown")
nltk.download("sentiwordnet")
nltk.download("vader_lexicon")

To get out of >>> python, type exit or hit the Esc key.
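If you would rather not type each download by hand, the same downloads can be written as a loop. This is only a sketch: download_all and NLTK_RESOURCES are names made up for this example, and it relies on nltk.download() returning True on success and False on failure.

```python
# The resources listed in this section.
NLTK_RESOURCES = [
    "popular",
    "nps_chat",
    "webtext",
    "brown",
    "sentiwordnet",
    "vader_lexicon",
]


def download_all(download, resources=NLTK_RESOURCES):
    """Call download(name) for each resource and return the names that failed.

    `download` is passed in so the function does not import nltk itself;
    in practice you would pass nltk.download.
    """
    failed = []
    for name in resources:
        if not download(name):  # a falsy result is treated as a failure
            failed.append(name)
    return failed
```

Inside repl_python(), after import nltk, you would call download_all(nltk.download) and check that the returned list is empty.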

Click on Terminal and type in:
- python -m spacy download en_core_web_sm
- This will download the English language spacy module.

Virtual Machine

Go to: https://class.aggieerin.com/auth-sign-in

Your log in is:

Username: firstname_lastname
Password: firstnameidnumber
Replace firstname and lastname with your own first and last name; idnumber is your HU ID number.

Python Set Up

Click on terminal and run the following lines:
- pip3 install -U spacy
- python3 -m spacy download en_core_web_sm
- pip3 install nltk
- pip3 install gensim
- python3 -m nltk.downloader popular
- pip3 install contractions
- pip3 install textblob
- python3 -m nltk.downloader nps_chat
- python3 -m nltk.downloader webtext
- python3 -m nltk.downloader brown

Special R Package

Run the following in the R console:

devtools::install_github("bnosac/RDRPOSTagger", force = TRUE)

Turn off Miniconda

When you run py_config() for the first time, it will ask you to install Miniconda. Say no! python3 is already installed on the server.

##r chunk
library(reticulate)
py_config()

## python:         /Users/emilyhuang/Library/r-miniconda/envs/r-reticulate/bin/python
## libpython:      /Users/emilyhuang/Library/r-miniconda/envs/r-reticulate/lib/libpython3.6m.dylib
## pythonhome:     /Users/emilyhuang/Library/r-miniconda/envs/r-reticulate:/Users/emilyhuang/Library/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, Mar 25 2020, 18:53:43)  [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
## numpy:          /Users/emilyhuang/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version:  1.19.1
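If you want to double-check from the Python side which interpreter is in use, a minimal sketch using only the standard library is below. It is not what py_config() runs internally, and the function name python_summary is made up for this example. The path it prints should correspond to the python: line that py_config() reports.

```python
import platform
import sys


def python_summary():
    """Collect a few facts about the running Python interpreter."""
    return {
        "python": sys.executable,                # path to the interpreter binary
        "prefix": sys.prefix,                    # root of the environment in use
        "version": platform.python_version(),    # version as "major.minor.patch"
    }


for key, value in python_summary().items():
    print(f"{key}: {value}")
```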

Everyone

Let’s do some R

In this chunk, we will load a dataset - use data(rock) to load it.
Use the head() function to print out the first six rows of the dataset.

##r chunk
# data() loads the dataset as a side effect, so no assignment is needed
data(rock)

head(rock)

##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

First, load the sklearn library; it has several sample datasets. You load Python packages by using import PACKAGE. Note that this package is installed under one name and imported under another (scikit-learn = sklearn).
Next, import the datasets part of sklearn with from PACKAGE import NAME. Therefore, you should use from sklearn import datasets.
Then load the boston dataset with: dataset_boston = datasets.load_boston().
To print out the first rows, use the .head() method: df_boston.head(), after converting the file with pandas (code included below). Unlike head() in R, .head() shows five rows by default; use df_boston.head(6) for six.

##python chunk
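import sklearn
from sklearn import datasets
# load_boston() was removed in scikit-learn 1.2, so this needs an older version
dataset_boston = datasets.load_boston()
##convert to pandas
import pandas as pd
df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
df_boston.head()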
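The two import forms described in this section can be tried without installing anything. The sketch below uses the standard library's statistics module as a stand-in for sklearn; that substitution is an assumption made here so the example runs on any Python install.

```python
# Form 1: import the whole package, then reach inside it with a dot.
import statistics

print(statistics.mean([1, 2, 3, 4]))  # prints 2.5

# Form 2: import one name from the package and use it directly.
from statistics import mean

print(mean([1, 2, 3, 4]))  # prints 2.5
```

Both forms give you the same function. The second just saves typing the package name each time, which is why from sklearn import datasets lets you write datasets.load_boston() rather than sklearn.datasets.load_boston().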

QUESTION: Look in your environment window. What do you see? same

Print out Python information in R

You can have the two environments interact. To print out information from Python in R: py$VARNAME.
Normally, to print out a column of an R dataset, you use DATAFRAME$COLUMN. Try to print out the CRIM column from your df_boston variable by combining the two: py$VARNAME$COLUMN.

##r chunk
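# print the CRIM column of the Python data frame from R
# (df_boston must already exist in the Python session)
py$df_boston$CRIM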

Print out R in Python

When using R in Python, instead of $, we use . like this: r.VARNAME.
To print out a single column, you use DATAFRAME["COLUMNNAME"]. Try printing out the shape column in the rock dataset.

##python chunk
# print the shape column of the R rock dataset from Python
r.rock["shape"]
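To see how selecting a column by name works on its own, here is a small sketch that runs outside of a knitted document. The dict named rock is a hypothetical stand-in for r.rock, filled with the first three rows from the head(rock) output earlier in this document. In a real session the object comes from reticulate and would normally be a pandas data frame, but the square-bracket lookup reads the same way.

```python
# Stand-in for r.rock: the first three rows of head(rock), one list per column.
rock = {
    "area": [4990, 7002, 7558],
    "peri": [2791.90, 3892.60, 3930.66],
    "shape": [0.0903296, 0.1486220, 0.1833120],
    "perm": [6.3, 6.3, 6.3],
}

# Square brackets with the column name in quotes select a single column.
shape = rock["shape"]
print(shape)
```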

Get started with PyCharm!

Great job! Here’s what you learned:
- You installed Python!
- You know how to install and load the libraries in both languages.
- You know how to load built in datasets in both languages.
- You know how to print out data from one language to another.
Turn this document in for credit: hit KNIT, then turn in the HTML file.
Be sure to fill in your name at the top!
Be sure to answer the embedded questions!

install.packages("rJava")