Set Up Your Computer or Virtual Machine

For this semester, you have two options:

Use your own computer.
- If you have an older machine, you are likely to run into problems installing some of the packages. If you have a MacBook Air or other “mini” type machine, you may find that you do not have enough space for the installed packages.
- I will help you troubleshoot while you complete this assignment, but for all other problems (including those later in the semester) you will need to use Google, Stack Overflow, etc. to fix them yourself.
Use the provided RStudio virtual machine.
- https://class.aggieerin.com/auth-sign-in
- Nearly all packages have been installed for you already!
- Be sure to save your work on your own machine as well, because this server is reset at the end of each semester.

Only complete the part of the assignment that matches your choice of computer or virtual machine!

Your Own Computer

Install Software

Most recent version of R: https://cloud.r-project.org/
Most recent version of RStudio: https://rstudio.com/products/rstudio/download/#download
Java JDK: https://www.oracle.com/technetwork/java/javase/downloads/index.html
Most Recent Java: https://www.java.com/en/

##r chunk - do not change these
R.version

##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          0.0                         
## year           2020                        
## month          04                          
## day            24                          
## svn rev        78286                       
## language       R                           
## version.string R version 4.0.0 (2020-04-24)
## nickname       Arbor Day

#RStudio.Version()  # run this line in the console; the document will not knit with it uncommented

ANSWER: What version of RStudio are you using? Please note it should be the latest version!

Install all the R Packages

Install the following packages, which you will need all semester:
markdown, knitr, reticulate, party, ca, FactoMineR, rgl, rms, visreg, pvclust, car, boot, psych, ngram, LSAfun, tm, topicmodels, tidytext, tidyverse, ggraph, igraph, memnet, widyr, stringdist
If you see TZ errors, run this: Sys.setenv(TZ = "America/New_York")
You may need to pay attention to the rlang package; I had to uninstall rlang and reinstall it.
Do not include the installation code in this document, so that it still knits.

Install the special R Packages

Use the following code to install the special packages for R.
Change eval = TRUE to eval = FALSE once you have them installed.

install.packages("https://osf.io/ak7gq/download", repos = NULL, method = "libcurl", type = "source")
Sys.setenv(TZ = "America/New_York")

Set up your python

Load the reticulate library.

##r chunk
library(reticulate)

Install Miniconda

Try typing py_config() below. You should get a prompt to install Miniconda. If not, use install_miniconda().

##r chunk
#install_miniconda()

Show you’ve installed Python

Run py_config() in the R chunk below.

##r chunk
py_config()

## python:         /Users/jiangzijun/Library/r-miniconda/envs/r-reticulate/bin/python
## libpython:      /Users/jiangzijun/Library/r-miniconda/envs/r-reticulate/lib/libpython3.6m.dylib
## pythonhome:     /Users/jiangzijun/Library/r-miniconda/envs/r-reticulate:/Users/jiangzijun/Library/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, May  7 2020, 23:06:31)  [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
## numpy:          /Users/jiangzijun/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version:  1.18.4
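
As a cross-check on the py_config() output, the same details can be read from inside Python itself. The sketch below is a minimal example that uses only the standard library. The function name interpreter_info is made up for illustration and is not part of reticulate, and the values printed will differ on your machine.

```python
import platform
import sys


def interpreter_info():
    """Return the path, version, and environment root of the running Python."""
    return {
        "python": sys.executable,               # path to the interpreter binary
        "version": platform.python_version(),   # version string, such as "3.6.10"
        "prefix": sys.prefix,                   # root folder of the active environment
    }


if __name__ == "__main__":
    for key, value in interpreter_info().items():
        print(f"{key}: {value}")
```

If the path printed here does not match the python line reported by py_config(), the code is probably running in a different Python installation than the one reticulate is using.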

Windows Machines

Windows machines need special programs to make all this work:

Rtools: https://cran.r-project.org/bin/windows/Rtools/
Visual Studio Build Tools 2019: https://visualstudio.microsoft.com/downloads/#build-tools
WordNet: https://wordnet.princeton.edu/download/current-version

Install Python Packages

Install the python packages by typing in R (the reticulate library must be loaded!): py_install("package_name", pip = T)
Change eval = TRUE to eval = FALSE once you have them installed.

Packages: nltk, matplotlib, PyQt5, scikit-learn, numpy, pandas, prince, factor-analyzer, gensim, pyLDAvis, bs4

#{r eval = FALSE}
#py_install("matplotlib", pip = T)
#py_install("PyQt5", pip = T)
#py_install("scikit-learn", pip = T)
#py_install("numpy", pip = T)
#py_install("pandas", pip = T)
#py_install("prince", pip = T)
#py_install("factor-analyzer", pip = T)
#py_install("gensim", pip = T)
#py_install("pyLDAvis", pip = T)
#py_install("bs4", pip = T)
##py_install("nltk", pip = T)  # "Error installing package(s): 'nltk'"
# sorry, I had to comment everything out, otherwise the document will not knit
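
One way to see which of the listed packages actually installed is to ask Python whether each one can be found. The sketch below is a hedged example, not part of the assignment: the names IMPORT_NAMES and missing_packages are made up for illustration, and the import names are the usual ones for these packages (note that some differ from the install names). It only looks the packages up and does not import or install anything.

```python
import importlib.util

# Install name (used with py_install or pip) -> import name (used in Python code).
IMPORT_NAMES = {
    "nltk": "nltk",
    "matplotlib": "matplotlib",
    "PyQt5": "PyQt5",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "pandas": "pandas",
    "prince": "prince",
    "factor-analyzer": "factor_analyzer",
    "gensim": "gensim",
    "pyLDAvis": "pyLDAvis",
    "bs4": "bs4",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the install names whose import name cannot be found."""
    return [
        install_name
        for install_name, module_name in import_names.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still missing:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

Run it inside repl_python() or a Python chunk so that it checks the same Python environment that reticulate uses.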

Special Python Extras

For nltk, you will need to add a few other pieces. Type the following into the R console:
- library(reticulate)
- repl_python()
- Here you should notice you have switched from > to >>>, which indicates you are in Python:

import nltk
nltk.download("popular")
nltk.download("nps_chat")
nltk.download("webtext")
nltk.download("abc")

To get out of >>> python, type exit or hit the Esc key.

Click on terminal and type in:
- python -m spacy download en_core_web_sm
- This will download the English language spacy module.
- pip install lxml

Virtual Machine

Go to: https://class.aggieerin.com/auth-sign-in

Your log in is:

Username: firstname_lastname
Password: firstnameidnumber
Replace firstname and lastname with your first and last name; idnumber is your HU ID number.
# having trouble logging in

Python Set Up

Click on terminal and run the following lines:
- pip3 install -U spacy
- python3 -m spacy download en_core_web_sm
- pip3 install nltk
- pip3 install gensim
- python3 -m nltk.downloader popular
- pip3 install prince
- pip3 install factor-analyzer
- pip3 install pyLDAvis
- python3 -m nltk.downloader abc

Turn off Miniconda

When you run py_config() the first time, it will ask you to install miniconda. Say no! We already have python3 installed on the server.

##r chunk
library(reticulate)
py_config()

## python:         /Users/jiangzijun/Library/r-miniconda/envs/r-reticulate/bin/python
## libpython:      /Users/jiangzijun/Library/r-miniconda/envs/r-reticulate/lib/libpython3.6m.dylib
## pythonhome:     /Users/jiangzijun/Library/r-miniconda/envs/r-reticulate:/Users/jiangzijun/Library/r-miniconda/envs/r-reticulate
## version:        3.6.10 |Anaconda, Inc.| (default, May  7 2020, 23:06:31)  [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
## numpy:          /Users/jiangzijun/Library/r-miniconda/envs/r-reticulate/lib/python3.6/site-packages/numpy
## numpy_version:  1.18.4

Everyone

Let’s do some R

In this chunk, we will load a dataset - use data(rock) to load it.
Use the head() function to print out the first six rows of the dataset.

##r chunk
data(rock)
head(rock)

##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

First, load the sklearn library, which has several sample datasets. You load Python packages by using import PACKAGE. Note that this package is installed and imported under different names (scikit-learn = sklearn).
Next, import the datasets part of sklearn by doing from PACKAGE import NAME. Therefore, you should use from sklearn import datasets.
Then call the boston dataset by doing: dataset_boston = datasets.load_boston(). Note that load_boston() was removed in scikit-learn 1.2, so this step requires an older version.
To print out the first rows, use the .head() method after converting the data with pandas (code included below): df_boston.head(). Unlike head() in R, pandas shows five rows by default; use df_boston.head(6) for six.
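
The two import forms described above work the same way for any package. As a small illustration that runs without scikit-learn installed, the sketch below uses the standard library's math module as a stand-in for sklearn, and sqrt as a stand-in for datasets.

```python
# Form 1: import PACKAGE, then reach its contents with a dot.
import math

root_a = math.sqrt(16)

# Form 2: from PACKAGE import NAME, then use NAME directly.
from math import sqrt

root_b = sqrt(16)

# Both forms call the same function, so the results match.
print(root_a, root_b)
```

With sklearn, the second form is what makes datasets.load_boston() available without typing sklearn. in front of it.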

##python chunk
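##TYPE HERE##
# library(reticulate) is R code, so running it in a Python chunk raises
# "NameError: name 'library' is not defined"; load reticulate in an R chunk instead.
from sklearn import datasets
# load_boston() was removed in scikit-learn 1.2, so this call needs an older version
dataset_boston = datasets.load_boston()
##convert to pandas
import pandas as pd
df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
df_boston.head()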

QUESTION: Look in your environment window. What do you see? sorry can’t run the code correctly

Print out Python information in R

You can have the two environments interact. To print out information from Python in R: py$VARNAME.
Normally, to print out R dataset columns, you do DATAFRAME$COLUMN. Try to print out the CRIM column from your df_boston variable.

##r chunk
# py$VARNAME reaches the Python variable, then $COLUMN selects the column
py$df_boston$CRIM

Print out R in Python

When using R in Python, instead of $, we use . like this: r.VARNAME.
To print out a single column, you use DATAFRAME["COLUMNNAME"]. Try printing out the shape column in the rock dataset.

##python chunk
# r.VARNAME reaches the R variable, then ["COLUMNNAME"] selects the column
r.rock["shape"]
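
To try the DATAFRAME["COLUMNNAME"] pattern outside of reticulate, the sketch below rebuilds the six rows shown in the head(rock) output as a pandas data frame. This is only a stand-in under stated assumptions: rock_head is a made-up name, pandas must be installed, and in the assignment itself the data comes from r.rock rather than being typed in by hand.

```python
import pandas as pd

# First six rows of the rock dataset, copied from the head(rock) output.
rock_head = pd.DataFrame(
    {
        "area": [4990, 7002, 7558, 7352, 7943, 7979],
        "peri": [2791.90, 3892.60, 3930.66, 3869.32, 3948.54, 4010.15],
        "shape": [0.0903296, 0.1486220, 0.1833120, 0.1170630, 0.1224170, 0.1670450],
        "perm": [6.3, 6.3, 6.3, 6.3, 17.1, 17.1],
    }
)

# Select one column by name; the result is a pandas Series.
shape_column = rock_head["shape"]
print(shape_column)

# head() returns five rows by default; pass 6 to match head() in R.
print(rock_head.head(6))
```

The bracket form matters for this particular column: rock_head.shape without brackets is a built-in pandas attribute that returns the number of rows and columns, not the shape column.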

Get started with PyCharm!

Great job! Here’s what you learned:
- Installed Python!
- You know how to install and load the libraries in both languages.
- You know how to load built in datasets in both languages.
- You know how to print out data from one language to another.
Turn this document in for credit –> hit KNIT –> turn in the HTML file.
Be sure to fill in your name at the top!
Be sure to answer the embedded questions!