Virtual Machine Set Up Guide

Go to: https://rstudio.thedoomlab.com/auth-sign-in

Your login is:

Python Set Up

  • Click on the Terminal tab and run the following commands:

    • pip3 install -U spacy
    • python3 -m spacy download en_core_web_sm
    • pip3 install nltk
    • pip3 install gensim
    • python3 -m nltk.downloader popular
    • pip3 install prince
    • pip3 install factor-analyzer
    • pip3 install pyLDAvis
    • python3 -m nltk.downloader abc
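Once the commands above finish, a quick check from Python can confirm that each package is visible to the interpreter. The sketch below is only an illustration: the helper name check_packages is hypothetical, and the import names are assumed from the pip names (factor-analyzer is imported as factor_analyzer). It does not check the spaCy model or the NLTK data downloads.

```python
import importlib.util

# Import names for the packages installed above (assumed; the pip name
# factor-analyzer is imported as factor_analyzer).
PACKAGES = ["spacy", "nltk", "gensim", "prince", "factor_analyzer", "pyLDAvis"]


def check_packages(names):
    """Return a dict mapping each import name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(PACKAGES)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with the matching pip3 command from the list above.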

Turn off Miniconda

IMPORTANT: BE SURE TO SAY NO

When you run py_config() (from the reticulate package) for the first time, it will ask whether you want to install Miniconda. Say no! Python 3 is already installed on the server.
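After declining Miniconda, it can be reassuring to confirm which interpreter is actually running. This is a minimal sketch, assuming it is run from a Python chunk or the python3 prompt on the server; the variable names are illustrative only.

```python
import sys

# Path of the Python interpreter that is executing this code.
interpreter_path = sys.executable

# Major and minor version of that interpreter as a string.
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"

print("Interpreter:", interpreter_path)
print("Version:", python_version)
```

If the printed path points into a Miniconda directory, the install prompt was probably answered with yes at some point.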

Let’s do some R

  • In this chunk, we will load a dataset - use data(rock) to load it.
  • Use the head() function to print out the first six rows of the dataset.
##r chunk
data(rock)
head(rock, n = 6)
##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

  • First, load the sklearn library, which includes several sample datasets. You load Python packages with import PACKAGE. Note that this package is installed under one name and imported under another (scikit-learn = sklearn).
  • Next, import the datasets part of sklearn with from PACKAGE import FUNCTION. Here, that means from sklearn import datasets.
  • Then load the boston dataset with dataset_boston = datasets.load_boston(). Note that load_boston() was removed in scikit-learn 1.2, so this call only works with older versions.
  • To print the first rows, convert the data with pandas (code included below) and call the .head() method: df_boston.head(). Unlike head() in R, pandas shows five rows by default, so use df_boston.head(6) for six.
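The two import patterns described above can be tried on a module from the Python standard library, which needs no installation. This is only an illustration with math standing in for sklearn; the variable names are hypothetical.

```python
# Pattern 1: import PACKAGE, then reach inside it with a dot.
import math

root_a = math.sqrt(16)

# Pattern 2: from PACKAGE import NAME, then use the name directly.
from math import sqrt

root_b = sqrt(16)

# Both patterns call the same function, so the results match.
print(root_a, root_b)
```

The same logic applies to from sklearn import datasets: afterwards, datasets can be used without the sklearn. prefix.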
##TYPE HERE##
#import sklearn
#from sklearn import datasets
##load the boston dataset (load_boston() needs scikit-learn older than 1.2)
#dataset_boston = datasets.load_boston()
##convert to pandas
#import pandas as pd
#df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
##print the first six rows (pandas shows five by default)
#df_boston.head(6)
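On servers with scikit-learn 1.2 or newer, load_boston() is no longer available. The sketch below follows the same steps with load_diabetes(), another dataset bundled with scikit-learn. The substitution is an assumption made for illustration, not part of the original assignment, and the helper name load_example_frame is hypothetical. The imports sit inside the function so that the script prints a hint instead of failing when a package is missing.

```python
import importlib.util


def load_example_frame():
    """Load a bundled scikit-learn dataset and return it as a pandas DataFrame."""
    from sklearn import datasets
    import pandas as pd

    # load_diabetes() stands in for load_boston(), which newer versions lack.
    dataset = datasets.load_diabetes()
    return pd.DataFrame(data=dataset.data, columns=dataset.feature_names)


# Only run the example when both packages are available.
if importlib.util.find_spec("sklearn") and importlib.util.find_spec("pandas"):
    df_example = load_example_frame()
    # Pass 6 explicitly, because pandas shows only five rows by default.
    print(df_example.head(6))
else:
    print("scikit-learn or pandas is not installed")
```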

QUESTION: Look in your environment window. What do you see? ANSWER: