Set Up Your Virtual Machine

Virtual Machine Set Up Guide

Go to: https://rstudio.thedoomlab.com/auth-sign-in

Your login is:

Username: firstname_lastname
Password: firstnameidnumber
Replace firstname and lastname with your first and last name, and idnumber with your HU ID number. If you have multiple first or last names, use only the first one of each. If your name contains a dash, leave the dash out of the username.
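The naming rules above can be sketched as a small Python helper. This is a hypothetical illustration, not an official tool. It assumes that names are separated by spaces, that letter case is kept exactly as typed, that "leave the dash out" means deleting the hyphen character, and that the password uses the same cleaned first name.

```python
def make_login(first_name: str, last_name: str, id_number: str) -> tuple[str, str]:
    """Hypothetical helper: build a username and password from the rules above."""
    # Keep only the first of multiple first or last names (assumes space-separated names)
    first = first_name.split()[0]
    last = last_name.split()[0]
    # Assumed reading of the dash rule: delete the hyphen, keep the rest of the name
    first = first.replace("-", "")
    last = last.replace("-", "")
    username = f"{first}_{last}"
    # Assumption: the password uses the same cleaned first name plus the HU ID number
    password = f"{first}{id_number}"
    return username, password


if __name__ == "__main__":
    print(make_login("Mary Ann", "Smith-Jones", "01234"))
```

The ID number is passed as a string so that any leading zeros are preserved.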

Python Set Up

Click on the Terminal tab and run the following commands:
- pip3 install -U spacy
- python3 -m spacy download en_core_web_sm
- pip3 install nltk
- pip3 install gensim
- python3 -m nltk.downloader popular
- pip3 install prince
- pip3 install factor-analyzer
- pip3 install pyLDAvis
- python3 -m nltk.downloader abc
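To confirm the installs worked, you can run a small check with python3. The following is a sketch, not part of the original setup. It assumes the import names listed in REQUIRED_MODULES, which differ from the pip names in one case (factor-analyzer is imported as factor_analyzer). It also assumes the spaCy model is installed as a module named en_core_web_sm. The check only confirms that the packages can be found. It does not verify that the NLTK data downloads succeeded.

```python
import importlib.util

# Import names for the packages installed above; factor-analyzer is imported
# as factor_analyzer, and the spaCy model installs as its own module.
REQUIRED_MODULES = [
    "spacy",
    "en_core_web_sm",
    "nltk",
    "gensim",
    "prince",
    "factor_analyzer",
    "pyLDAvis",
]


def missing_modules(names):
    """Return the top-level module names that this Python cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All packages found.")
```

If a module shows up as missing, rerun the matching pip3 or python3 -m command from the list.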

Turn off Miniconda

IMPORTANT: BE SURE TO SAY NO

The first time you run py_config() (from the reticulate package), it will ask whether you want to install Miniconda. Say no! Python 3 is already installed on the server.
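After declining, you can confirm from a python chunk which interpreter is running. The following is a minimal sketch. If the executable path mentions miniconda, reticulate is probably not using the server's python3.

```python
import sys


def python_info():
    """Report the running interpreter's path and version."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


if __name__ == "__main__":
    info = python_info()
    print("Python executable:", info["executable"])
    print("Python version:", info["version"])
```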

Let’s do some R

In this chunk, we will load the rock dataset with data(rock).
Then use the head() function to print out the first six rows of the dataset.

##r chunk
data(rock)
head(rock,n=6)

##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

First, load the sklearn library, which includes several sample datasets. You load Python packages with import PACKAGE. Note that this package is installed and imported under different names: you install scikit-learn but import sklearn.
Next, import the datasets module of sklearn with from PACKAGE import MODULE. Here, that is from sklearn import datasets.
Then load the Boston dataset with dataset_boston = datasets.load_boston(). Note that load_boston() was deprecated in scikit-learn 1.0 and removed in 1.2, so this call only works with older versions of scikit-learn.
After converting the dataset to a pandas DataFrame (code included below), print the first six rows with df_boston.head(6). Unlike R's head(), pandas' .head() shows five rows by default.

##TYPE HERE##
#import sklearn
#from sklearn import datasets
## load_boston() only exists in scikit-learn versions before 1.2
#dataset_boston = datasets.load_boston()
##convert to pandas
#import pandas as pd
#df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
## pass 6 because pandas' head() shows 5 rows by default
#df_boston.head(6)
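load_boston() is not available in scikit-learn 1.2 and later. The sketch below therefore shows the same Bunch-to-DataFrame pattern with a different dataset, load_iris(), which ships with scikit-learn and needs no download. The variable names dataset_iris and df_iris are illustrative only.

```python
import pandas as pd
from sklearn import datasets

# Swapped-in dataset: iris is bundled with scikit-learn, unlike the removed Boston data
dataset_iris = datasets.load_iris()

# A Bunch stores the feature matrix in .data and the column names in .feature_names
df_iris = pd.DataFrame(data=dataset_iris.data, columns=dataset_iris.feature_names)

# pandas' head() defaults to 5 rows; pass 6 to match R's head()
print(df_iris.head(6))
```

On an older scikit-learn, the same two steps work with datasets.load_boston() and dataset_boston.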

QUESTION: Look in your environment window. What do you see? ANSWER:

Print out Python information in R

The two environments can interact. To access a Python object from R, use py$VARNAME.
In R, you normally print a data frame column with DATAFRAME$COLUMN. Try printing the CRIM column of your df_boston variable.

##r chunk
#py$df_boston['CRIM']

Print out R information in Python

To access an R object from Python, use . instead of $, like this: r.VARNAME.
To print a single column, use DATAFRAME["COLUMNNAME"]. Try printing the shape column of the rock dataset.

##python chunk
r.rock

## {'area': [4990, 7002, 7558, 7352, 7943, 7979, 9333, 8209, 8393, 6425, 9364, 8624, 10651, 8868, 9417, 8874, 10962, 10743, 11878, 9867, 7838, 11876, 12212, 8233, 6360, 4193, 7416, 5246, 6509, 4895, 6775, 7894, 5980, 5318, 7392, 7894, 3469, 1468, 3524, 5267, 5048, 1016, 5605, 8793, 3475, 1651, 5514, 9718], 'peri': [2791.9, 3892.6, 3930.66, 3869.32, 3948.54, 4010.15, 4345.75, 4344.75, 3682.04, 3098.65, 4480.05, 3986.24, 4036.54, 3518.04, 3999.37, 3629.07, 4608.66, 4787.62, 4864.22, 4479.41, 3428.74, 4353.14, 4697.65, 3518.44, 1977.39, 1379.35, 1916.24, 1585.42, 1851.21, 1239.66, 1728.14, 1461.06, 1426.76, 990.388, 1350.76, 1461.06, 1376.7, 476.322, 1189.46, 1644.96, 941.543, 308.642, 1145.69, 2280.49, 1174.11, 597.808, 1455.88, 1485.58], 'shape': [0.0903296, 0.148622, 0.183312, 0.117063, 0.122417, 0.167045, 0.189651, 0.164127, 0.203654, 0.162394, 0.150944, 0.148141, 0.228595, 0.231623, 0.172567, 0.153481, 0.204314, 0.262727, 0.200071, 0.14481, 0.113852, 0.291029, 0.240077, 0.161865, 0.280887, 0.179455, 0.191802, 0.133083, 0.225214, 0.341273, 0.311646, 0.276016, 0.197653, 0.326635, 0.154192, 0.276016, 0.176969, 0.438712, 0.163586, 0.253832, 0.328641, 0.230081, 0.464125, 0.420477, 0.200744, 0.262651, 0.182453, 0.200447], 'perm': [6.3, 6.3, 6.3, 6.3, 17.1, 17.1, 17.1, 17.1, 119.0, 119.0, 119.0, 119.0, 82.4, 82.4, 82.4, 82.4, 58.6, 58.6, 58.6, 58.6, 142.0, 142.0, 142.0, 142.0, 740.0, 740.0, 740.0, 740.0, 890.0, 890.0, 890.0, 890.0, 950.0, 950.0, 950.0, 950.0, 100.0, 100.0, 100.0, 100.0, 1300.0, 1300.0, 1300.0, 1300.0, 580.0, 580.0, 580.0, 580.0]}
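Inside a python chunk, selecting the column would look like r.rock["shape"]. The r object only exists when reticulate runs the chunk. For that reason, this standalone sketch uses a stand-in dict holding the first six rows of rock, copied from the output above. The dict uses the same column-to-list layout that r.rock printed. Square-bracket selection by column name works on the dict and on a pandas DataFrame.

```python
import pandas as pd

# Stand-in for r.rock: the first six rows of rock, in the column -> list
# layout shown in the output above
rock = {
    "area": [4990, 7002, 7558, 7352, 7943, 7979],
    "peri": [2791.90, 3892.60, 3930.66, 3869.32, 3948.54, 4010.15],
    "shape": [0.0903296, 0.148622, 0.183312, 0.117063, 0.122417, 0.167045],
    "perm": [6.3, 6.3, 6.3, 6.3, 17.1, 17.1],
}

# On a dict, indexing by column name returns that column's list of values
shape_list = rock["shape"]
print(shape_list)

# On a pandas DataFrame, the same indexing returns a Series
df_rock = pd.DataFrame(rock)
shape_series = df_rock["shape"]
print(shape_series)
```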