Virtual Machine Set Up Guide

Go to: https://rstudio.thedoomlab.com/auth-sign-in

Your login is:

Python Set Up

  • Click on the Terminal tab and run the following lines:

    • pip3 install -U spacy==2.3.7
    • python3 -m spacy download en_core_web_sm
    • pip3 install nltk
    • pip3 install gensim
    • pip3 install contractions
    • pip3 install textblob
    • pip3 install afinn
    • pip3 install pyLDAvis
    • python3 -m nltk.downloader nps_chat
    • python3 -m nltk.downloader webtext
    • python3 -m nltk.downloader brown
    • python3 -m nltk.downloader sentiwordnet
    • python3 -m nltk.downloader vader_lexicon
    • python3 -m nltk.downloader popular
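
Once the installs above finish, a quick check like the following can confirm that python3 can find each package. This is a minimal sketch, not part of the required setup: the helper name check_packages is hypothetical, and the list assumes each package is imported under the same name it was installed with.

```python
##python chunk
# Sketch: report which of the packages installed above python3 can find.
import importlib.util

# Import names for the pip3 installs above (assumption: the import name
# matches the install name for each of these packages).
PACKAGES = ["spacy", "nltk", "gensim", "contractions",
            "textblob", "afinn", "pyLDAvis"]

def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

status = check_packages(PACKAGES)
for name, found in status.items():
    print(f"{name:15s} {'OK' if found else 'MISSING'}")
```

Any package reported as MISSING can be reinstalled by rerunning its pip3 line in the terminal.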

Special R Package

Run the following in the R console:

  • devtools::install_github("bnosac/RDRPOSTagger", force = TRUE)

Turn off Miniconda

IMPORTANT: BE SURE TO SAY NO

When you run py_config() for the first time, it will ask whether to install Miniconda. Say no! We already have python3 installed on the server.

##r chunk
library(reticulate)
py_config()
## python:         /usr/bin/python3
## libpython:      /usr/lib/python3.9/config-3.9-x86_64-linux-gnu/libpython3.9.so
## pythonhome:     //usr://usr
## version:        3.9.13 (main, May 23 2022, 21:57:12)  [GCC 11.2.0]
## numpy:          /home/yicheng_li/.local/lib/python3.9/site-packages/numpy
## numpy_version:  1.23.3
## 
## NOTE: Python version was forced by RETICULATE_PYTHON
#SAY NO SAY NO SAY NO
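
The same check can be made from the Python side. The sketch below prints which interpreter is running and flags a path that looks like a conda install. The "conda" substring test is only a heuristic assumed here for illustration, and the variable names are hypothetical.

```python
##python chunk
# Sketch: confirm which Python interpreter is actually running.
import sys

# Path of the running interpreter (may be empty in unusual embedded setups).
interpreter_path = sys.executable or ""
# Major, minor, and micro version numbers as a tuple.
version = tuple(sys.version_info[:3])

# Heuristic only: a conda-managed interpreter usually has "conda" in its path.
looks_like_conda = "conda" in interpreter_path.lower()

print("interpreter:", interpreter_path)
print("version:", ".".join(str(part) for part in version))
print("looks like conda:", looks_like_conda)
```

On the server this is expected to point at the system python3 shown in the py_config() output rather than a Miniconda path.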

Let’s do some R

  • In this chunk, we will load a dataset - use data(rock) to load it.
  • Use the head() function to print out the first six rows of the dataset.
##r chunk
data(rock)
head(rock)
##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

  • First, load the sklearn library, which has several sample datasets. You load Python packages by using import PACKAGE. Note that this package is installed and imported under different names (scikit-learn = sklearn).
  • Next, import a dataset loader from the datasets part of sklearn by doing from PACKAGE import FUNCTION. Therefore, you should use from sklearn.datasets import fetch_california_housing.
  • Then call the housing dataset by doing: housing = fetch_california_housing().
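  • To print out the first rows, use the .head() method after converting the data to a pandas DataFrame (code included below). Unlike head() in R, df_housing.head() shows five rows by default; the chunk below uses df_housing.head(10) to show the first ten.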
##python chunk
##TYPE HERE##
import sklearn
from sklearn.datasets import fetch_california_housing
housing = fetch_california_housing()

##convert to pandas
import pandas as pd
df_housing = pd.DataFrame(data = housing.data, 
                         columns = housing.feature_names)
df_housing.head(10)
##    MedInc  HouseAge  AveRooms  ...  AveOccup  Latitude  Longitude
## 0  8.3252      41.0  6.984127  ...  2.555556     37.88    -122.23
## 1  8.3014      21.0  6.238137  ...  2.109842     37.86    -122.22
## 2  7.2574      52.0  8.288136  ...  2.802260     37.85    -122.24
## 3  5.6431      52.0  5.817352  ...  2.547945     37.85    -122.25
## 4  3.8462      52.0  6.281853  ...  2.181467     37.85    -122.25
## 5  4.0368      52.0  4.761658  ...  2.139896     37.85    -122.25
## 6  3.6591      52.0  4.931907  ...  2.128405     37.84    -122.25
## 7  3.1200      52.0  4.797527  ...  1.788253     37.84    -122.25
## 8  2.0804      42.0  4.294118  ...  2.026891     37.84    -122.26
## 9  3.6912      52.0  4.970588  ...  2.172269     37.84    -122.25
## 
## [10 rows x 8 columns]
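
The two import styles used in this section (import PACKAGE and from PACKAGE import FUNCTION) behave the same way for any package. As a small illustration that needs no installed packages, the sketch below uses the standard-library math module as a stand-in for sklearn; the variable names are hypothetical.

```python
##python chunk
# Style 1: import PACKAGE, then reach into it as PACKAGE.NAME
import math
root_a = math.sqrt(16)

# Style 2: from PACKAGE import FUNCTION, then use the bare name
from math import sqrt
root_b = sqrt(16)

# Both styles call the same function, so the results match.
print(root_a, root_b)
```

The first style keeps it obvious where a function comes from, while the second saves typing when one function is used many times.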

QUESTION: Check out the environment windows - you should have one for R and one for Python. What differences do you see? ANSWER:

Get started with PyCharm!