Virtual Machine Setup Guide

Go to: https://rstudio.thedoomlab.com/auth-sign-in

Your login is:

Python Setup

  • Click on the Terminal tab and run the following lines:

    • pip3 install -U spacy
    • python3 -m spacy download en_core_web_sm
    • pip3 install nltk
    • pip3 install gensim
    • python3 -m nltk.downloader popular
    • pip3 install prince
    • pip3 install factor-analyzer
    • pip3 install pyLDAvis
    • python3 -m nltk.downloader abc
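
Once the installs finish, it can be useful to confirm that Python can find each package. The following is a minimal sketch using only the standard library. The helper name check_packages is hypothetical, and the import names are assumptions based on the pip commands above (for example, factor-analyzer is imported as factor_analyzer).

```python
# Sketch: report which of the packages installed above Python can find.
from importlib.util import find_spec

# Import names (assumed): pip's "factor-analyzer" is imported as "factor_analyzer".
IMPORT_NAMES = ["spacy", "nltk", "gensim", "prince", "factor_analyzer", "pyLDAvis"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be found."""
    return {name: find_spec(name) is not None for name in names}


status = check_packages(IMPORT_NAMES)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can be reinstalled by rerunning its pip3 line in the terminal.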

Turn off Miniconda

IMPORTANT: BE SURE TO SAY NO

When you run py_config() for the first time, it will ask whether you want to install Miniconda. Say no! Python 3 is already installed on the server.

##r chunk
library(reticulate)
py_config()
## python:         /usr/bin/python3
## libpython:      /usr/lib/python3.8/config-3.8-x86_64-linux-gnu/libpython3.8.so
## pythonhome:     //usr://usr
## version:        3.8.5 (default, Jul 28 2020, 12:59:40)  [GCC 9.3.0]
## numpy:          /home/yongting_tan/.local/lib/python3.8/site-packages/numpy
## numpy_version:  1.20.1
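
The same details can be checked from the Python side, for example in a python chunk or in the terminal. This is a minimal sketch using only the standard library, and the helper name interpreter_info is hypothetical. If the reported path is not the server's python3, Miniconda may have been installed by accident.

```python
# Sketch: show which Python interpreter is running and its version.
import platform
import sys


def interpreter_info():
    """Collect the interpreter path, version, and install prefix."""
    return {
        "python": sys.executable,              # path to the running interpreter
        "version": platform.python_version(),  # e.g. "3.8.5"
        "prefix": sys.prefix,                  # base directory of the installation
    }


info = interpreter_info()
for key, value in info.items():
    print(f"{key}: {value}")
```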

Let’s do some R

  • In this chunk, we will load a dataset - use data(rock) to load it.
  • Use the head() function to print out the first six rows of the dataset.
##r chunk
data(rock)
head(rock)
##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

  • First, load the sklearn library, which includes several sample datasets. You load Python packages with import PACKAGE. Note that this package is installed under one name and imported under another (scikit-learn = sklearn).
  • Next, import the datasets module of sklearn with from PACKAGE import NAME. Here, that is from sklearn import datasets.
  • Then load the Boston dataset with dataset_boston = datasets.load_boston(). Note that load_boston() was removed in scikit-learn 1.2, so this call requires an older version.
  • To print out the first five rows, use the .head() method: df_boston.head(), after converting the dataset to a pandas DataFrame (code included below). Unlike head() in R, which shows six rows, pandas .head() shows five rows by default.
##python chunk
from sklearn import datasets
dataset_boston = datasets.load_boston()
##convert to pandas
import pandas as pd
df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
df_boston.head()
##       CRIM    ZN  INDUS  CHAS    NOX  ...  RAD    TAX  PTRATIO       B  LSTAT
## 0  0.00632  18.0   2.31   0.0  0.538  ...  1.0  296.0     15.3  396.90   4.98
## 1  0.02731   0.0   7.07   0.0  0.469  ...  2.0  242.0     17.8  396.90   9.14
## 2  0.02729   0.0   7.07   0.0  0.469  ...  2.0  242.0     17.8  392.83   4.03
## 3  0.03237   0.0   2.18   0.0  0.458  ...  3.0  222.0     18.7  394.63   2.94
## 4  0.06905   0.0   2.18   0.0  0.458  ...  3.0  222.0     18.7  396.90   5.33
## 
## [5 rows x 13 columns]
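
The two import forms described in this section can be compared side by side. This is a small illustrative sketch that uses the standard-library math module as a stand-in for sklearn, so it runs without any extra packages.

```python
# Form 1: import the whole package, then reach into it with a dot.
import math
root_a = math.sqrt(16)

# Form 2: import one name from the package and use it directly.
from math import sqrt
root_b = sqrt(16)

# Both forms call the same function.
print(root_a, root_b)
```

With the second form, the imported name is used without the package prefix, which is why the chunk above calls datasets.load_boston() rather than sklearn.datasets.load_boston().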

QUESTION: Look in your environment window. What do you see?

ANSWER: The first six rows of the data

Get started with PyCharm!