Set Up Your Virtual Machine

Virtual Machine Set Up Guide

Go to: https://rstudio.thedoomlab.com/auth-sign-in

Your login is:

Username: firstname_lastname
Password: firstnameidnumber
Replace firstname and lastname with your own first and last name, and idnumber with your HU ID number. If you have multiple first or last names, use only the first one of each. If your name contains a dash, the dash is not included in the username.

Python Set Up

Click on terminal and run the following lines:
- pip3 install -U spacy
- python3 -m spacy download en_core_web_sm
- pip3 install nltk
- pip3 install gensim
- python3 -m nltk.downloader popular
- pip3 install prince
- pip3 install factor-analyzer
- pip3 install pyLDAvis
- python3 -m nltk.downloader abc
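
Once the installs finish, a quick check like the one below can confirm that each package is visible to Python. This is only a sketch: the PACKAGES mapping and the check_installed helper are names made up for this example, and the import names listed are the usual ones for these packages (note that factor-analyzer is imported as factor_analyzer).

```python
import importlib.util

# pip install name -> Python import name (the two can differ).
# This mapping is an assumption written for this sketch.
PACKAGES = {
    "spacy": "spacy",
    "nltk": "nltk",
    "gensim": "gensim",
    "prince": "prince",
    "factor-analyzer": "factor_analyzer",
    "pyLDAvis": "pyLDAvis",
}


def check_installed(packages):
    """Return {pip_name: True/False} for whether the import name can be located."""
    return {
        pip_name: importlib.util.find_spec(import_name) is not None
        for pip_name, import_name in packages.items()
    }


if __name__ == "__main__":
    for pip_name, found in check_installed(PACKAGES).items():
        print(f"{pip_name:16s} {'OK' if found else 'MISSING'}")
```

Any package reported as MISSING can be installed again with the matching pip3 line above.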

Turn off Miniconda

IMPORTANT: BE SURE TO SAY NO!

The first time you run py_config(), it will ask whether you want to install Miniconda. Say no! Python 3 is already installed on the server.

##r chunk
library(reticulate)
py_config()

## python:         /usr/bin/python3
## libpython:      /usr/lib/python3.8/config-3.8-x86_64-linux-gnu/libpython3.8.so
## pythonhome:     //usr://usr
## version:        3.8.5 (default, Jul 28 2020, 12:59:40)  [GCC 9.3.0]
## numpy:          /home/yongting_tan/.local/lib/python3.8/site-packages/numpy
## numpy_version:  1.20.1
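
The same check can be made from the Python side. The sketch below prints the path and version of whichever interpreter is running it, so you can compare them with the python and version lines that py_config() reports. The interpreter_info helper is a name invented for this example.

```python
import sys


def interpreter_info():
    """Collect the path and version of the Python that is running this code."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


info = interpreter_info()
print("python: ", info["executable"])
print("version:", info["version"])
```

If the path printed here points inside a miniconda folder rather than the server's python3, Miniconda was probably installed by accident.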

Let’s do some R

In this chunk, load the built-in rock dataset with data(rock).
Then use the head() function to print out the first six rows of the dataset.

##r chunk
data(rock)
head(rock)

##   area    peri     shape perm
## 1 4990 2791.90 0.0903296  6.3
## 2 7002 3892.60 0.1486220  6.3
## 3 7558 3930.66 0.1833120  6.3
## 4 7352 3869.32 0.1170630  6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1

Call a dataset in Python

First, load the sklearn library, which has several sample datasets. You load Python packages with import PACKAGE. Note that this package is installed under one name and imported under another (scikit-learn = sklearn).
Next, import the datasets module of sklearn with from PACKAGE import MODULE. Therefore, you should use from sklearn import datasets.
Then load the Boston dataset with dataset_boston = datasets.load_boston(). Note that load_boston() was removed in scikit-learn 1.2, so this call needs an older version of scikit-learn.
To print out the first rows, convert the data to a pandas DataFrame (code included below) and call the .head() method: df_boston.head(). Unlike head() in R, which shows six rows, .head() in pandas shows five rows by default; use df_boston.head(6) to get six.

##python chunk
##TYPE HERE##
from sklearn import datasets
dataset_boston = datasets.load_boston()
##convert to pandas
import pandas as pd
df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
df_boston.head()

##       CRIM    ZN  INDUS  CHAS    NOX  ...  RAD    TAX  PTRATIO       B  LSTAT
## 0  0.00632  18.0   2.31   0.0  0.538  ...  1.0  296.0     15.3  396.90   4.98
## 1  0.02731   0.0   7.07   0.0  0.469  ...  2.0  242.0     17.8  396.90   9.14
## 2  0.02729   0.0   7.07   0.0  0.469  ...  2.0  242.0     17.8  392.83   4.03
## 3  0.03237   0.0   2.18   0.0  0.458  ...  3.0  222.0     18.7  394.63   2.94
## 4  0.06905   0.0   2.18   0.0  0.458  ...  3.0  222.0     18.7  396.90   5.33
## 
## [5 rows x 13 columns]
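
The output above has five rows, not six, because .head() in pandas defaults to five. A minimal sketch of the difference, using a small made-up DataFrame (the df, x, and y names are placeholders for this example, not part of the Boston data):

```python
import pandas as pd

# Hypothetical stand-in data: ten rows, so the difference is visible.
df = pd.DataFrame({"x": range(10), "y": [v * 2 for v in range(10)]})

default_rows = df.head()   # pandas default is n=5
six_rows = df.head(6)      # ask for six rows, like head() in R

print(len(default_rows), len(six_rows))
```

Passing the number of rows explicitly keeps the R and Python outputs comparable.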

QUESTION: Look in your environment window. What do you see?
ANSWER: The first six rows of the data

Print out Python information in R

The two environments can interact. To print out information from Python in R, use py$VARNAME.
Normally, to print out R dataset columns, you do DATAFRAME$COLUMN. Try to print out the CRIM column from your df_boston variable.

##r chunk
py$df_boston['CRIM']

##         CRIM
## 1    0.00632
## 2    0.02731
## 3    0.02729
## 4    0.03237
## 5    0.06905
## 6    0.02985
## 7    0.08829
## 8    0.14455
## 9    0.21124
## 10   0.17004
## 11   0.22489
## 12   0.11747
## 13   0.09378
## 14   0.62976
## 15   0.63796
## 16   0.62739
## 17   1.05393
## 18   0.78420
## 19   0.80271
## 20   0.72580
## 21   1.25179
## 22   0.85204
## 23   1.23247
## 24   0.98843
## 25   0.75026
## 26   0.84054
## 27   0.67191
## 28   0.95577
## 29   0.77299
## 30   1.00245
## 31   1.13081
## 32   1.35472
## 33   1.38799
## 34   1.15172
## 35   1.61282
## 36   0.06417
## 37   0.09744
## 38   0.08014
## 39   0.17505
## 40   0.02763
## 41   0.03359
## 42   0.12744
## 43   0.14150
## 44   0.15936
## 45   0.12269
## 46   0.17142
## 47   0.18836
## 48   0.22927
## 49   0.25387
## 50   0.21977
## 51   0.08873
## 52   0.04337
## 53   0.05360
## 54   0.04981
## 55   0.01360
## 56   0.01311
## 57   0.02055
## 58   0.01432
## 59   0.15445
## 60   0.10328
## 61   0.14932
## 62   0.17171
## 63   0.11027
## 64   0.12650
## 65   0.01951
## 66   0.03584
## 67   0.04379
## 68   0.05789
## 69   0.13554
## 70   0.12816
## 71   0.08826
## 72   0.15876
## 73   0.09164
## 74   0.19539
## 75   0.07896
## 76   0.09512
## 77   0.10153
## 78   0.08707
## 79   0.05646
## 80   0.08387
## 81   0.04113
## 82   0.04462
## 83   0.03659
## 84   0.03551
## 85   0.05059
## 86   0.05735
## 87   0.05188
## 88   0.07151
## 89   0.05660
## 90   0.05302
## 91   0.04684
## 92   0.03932
## 93   0.04203
## 94   0.02875
## 95   0.04294
## 96   0.12204
## 97   0.11504
## 98   0.12083
## 99   0.08187
## 100  0.06860
## 101  0.14866
## 102  0.11432
## 103  0.22876
## 104  0.21161
## 105  0.13960
## 106  0.13262
## 107  0.17120
## 108  0.13117
## 109  0.12802
## 110  0.26363
## 111  0.10793
## 112  0.10084
## 113  0.12329
## 114  0.22212
## 115  0.14231
## 116  0.17134
## 117  0.13158
## 118  0.15098
## 119  0.13058
## 120  0.14476
## 121  0.06899
## 122  0.07165
## 123  0.09299
## 124  0.15038
## 125  0.09849
## 126  0.16902
## 127  0.38735
## 128  0.25915
## 129  0.32543
## 130  0.88125
## 131  0.34006
## 132  1.19294
## 133  0.59005
## 134  0.32982
## 135  0.97617
## 136  0.55778
## 137  0.32264
## 138  0.35233
## 139  0.24980
## 140  0.54452
## 141  0.29090
## 142  1.62864
## 143  3.32105
## 144  4.09740
## 145  2.77974
## 146  2.37934
## 147  2.15505
## 148  2.36862
## 149  2.33099
## 150  2.73397
## 151  1.65660
## 152  1.49632
## 153  1.12658
## 154  2.14918
## 155  1.41385
## 156  3.53501
## 157  2.44668
## 158  1.22358
## 159  1.34284
## 160  1.42502
## 161  1.27346
## 162  1.46336
## 163  1.83377
## 164  1.51902
## 165  2.24236
## 166  2.92400
## 167  2.01019
## 168  1.80028
## 169  2.30040
## 170  2.44953
## 171  1.20742
## 172  2.31390
## 173  0.13914
## 174  0.09178
## 175  0.08447
## 176  0.06664
## 177  0.07022
## 178  0.05425
## 179  0.06642
## 180  0.05780
## 181  0.06588
## 182  0.06888
## 183  0.09103
## 184  0.10008
## 185  0.08308
## 186  0.06047
## 187  0.05602
## 188  0.07875
## 189  0.12579
## 190  0.08370
## 191  0.09068
## 192  0.06911
## 193  0.08664
## 194  0.02187
## 195  0.01439
## 196  0.01381
## 197  0.04011
## 198  0.04666
## 199  0.03768
## 200  0.03150
## 201  0.01778
## 202  0.03445
## 203  0.02177
## 204  0.03510
## 205  0.02009
## 206  0.13642
## 207  0.22969
## 208  0.25199
## 209  0.13587
## 210  0.43571
## 211  0.17446
## 212  0.37578
## 213  0.21719
## 214  0.14052
## 215  0.28955
## 216  0.19802
## 217  0.04560
## 218  0.07013
## 219  0.11069
## 220  0.11425
## 221  0.35809
## 222  0.40771
## 223  0.62356
## 224  0.61470
## 225  0.31533
## 226  0.52693
## 227  0.38214
## 228  0.41238
## 229  0.29819
## 230  0.44178
## 231  0.53700
## 232  0.46296
## 233  0.57529
## 234  0.33147
## 235  0.44791
## 236  0.33045
## 237  0.52058
## 238  0.51183
## 239  0.08244
## 240  0.09252
## 241  0.11329
## 242  0.10612
## 243  0.10290
## 244  0.12757
## 245  0.20608
## 246  0.19133
## 247  0.33983
## 248  0.19657
## 249  0.16439
## 250  0.19073
## 251  0.14030
## 252  0.21409
## 253  0.08221
## 254  0.36894
## 255  0.04819
## 256  0.03548
## 257  0.01538
## 258  0.61154
## 259  0.66351
## 260  0.65665
## 261  0.54011
## 262  0.53412
## 263  0.52014
## 264  0.82526
## 265  0.55007
## 266  0.76162
## 267  0.78570
## 268  0.57834
## 269  0.54050
## 270  0.09065
## 271  0.29916
## 272  0.16211
## 273  0.11460
## 274  0.22188
## 275  0.05644
## 276  0.09604
## 277  0.10469
## 278  0.06127
## 279  0.07978
## 280  0.21038
## 281  0.03578
## 282  0.03705
## 283  0.06129
## 284  0.01501
## 285  0.00906
## 286  0.01096
## 287  0.01965
## 288  0.03871
## 289  0.04590
## 290  0.04297
## 291  0.03502
## 292  0.07886
## 293  0.03615
## 294  0.08265
## 295  0.08199
## 296  0.12932
## 297  0.05372
## 298  0.14103
## 299  0.06466
## 300  0.05561
## 301  0.04417
## 302  0.03537
## 303  0.09266
## 304  0.10000
## 305  0.05515
## 306  0.05479
## 307  0.07503
## 308  0.04932
## 309  0.49298
## 310  0.34940
## 311  2.63548
## 312  0.79041
## 313  0.26169
## 314  0.26938
## 315  0.36920
## 316  0.25356
## 317  0.31827
## 318  0.24522
## 319  0.40202
## 320  0.47547
## 321  0.16760
## 322  0.18159
## 323  0.35114
## 324  0.28392
## 325  0.34109
## 326  0.19186
## 327  0.30347
## 328  0.24103
## 329  0.06617
## 330  0.06724
## 331  0.04544
## 332  0.05023
## 333  0.03466
## 334  0.05083
## 335  0.03738
## 336  0.03961
## 337  0.03427
## 338  0.03041
## 339  0.03306
## 340  0.05497
## 341  0.06151
## 342  0.01301
## 343  0.02498
## 344  0.02543
## 345  0.03049
## 346  0.03113
## 347  0.06162
## 348  0.01870
## 349  0.01501
## 350  0.02899
## 351  0.06211
## 352  0.07950
## 353  0.07244
## 354  0.01709
## 355  0.04301
## 356  0.10659
## 357  8.98296
## 358  3.84970
## 359  5.20177
## 360  4.26131
## 361  4.54192
## 362  3.83684
## 363  3.67822
## 364  4.22239
## 365  3.47428
## 366  4.55587
## 367  3.69695
## 368 13.52220
## 369  4.89822
## 370  5.66998
## 371  6.53876
## 372  9.23230
## 373  8.26725
## 374 11.10810
## 375 18.49820
## 376 19.60910
## 377 15.28800
## 378  9.82349
## 379 23.64820
## 380 17.86670
## 381 88.97620
## 382 15.87440
## 383  9.18702
## 384  7.99248
## 385 20.08490
## 386 16.81180
## 387 24.39380
## 388 22.59710
## 389 14.33370
## 390  8.15174
## 391  6.96215
## 392  5.29305
## 393 11.57790
## 394  8.64476
## 395 13.35980
## 396  8.71675
## 397  5.87205
## 398  7.67202
## 399 38.35180
## 400  9.91655
## 401 25.04610
## 402 14.23620
## 403  9.59571
## 404 24.80170
## 405 41.52920
## 406 67.92080
## 407 20.71620
## 408 11.95110
## 409  7.40389
## 410 14.43830
## 411 51.13580
## 412 14.05070
## 413 18.81100
## 414 28.65580
## 415 45.74610
## 416 18.08460
## 417 10.83420
## 418 25.94060
## 419 73.53410
## 420 11.81230
## 421 11.08740
## 422  7.02259
## 423 12.04820
## 424  7.05042
## 425  8.79212
## 426 15.86030
## 427 12.24720
## 428 37.66190
## 429  7.36711
## 430  9.33889
## 431  8.49213
## 432 10.06230
## 433  6.44405
## 434  5.58107
## 435 13.91340
## 436 11.16040
## 437 14.42080
## 438 15.17720
## 439 13.67810
## 440  9.39063
## 441 22.05110
## 442  9.72418
## 443  5.66637
## 444  9.96654
## 445 12.80230
## 446 10.67180
## 447  6.28807
## 448  9.92485
## 449  9.32909
## 450  7.52601
## 451  6.71772
## 452  5.44114
## 453  5.09017
## 454  8.24809
## 455  9.51363
## 456  4.75237
## 457  4.66883
## 458  8.20058
## 459  7.75223
## 460  6.80117
## 461  4.81213
## 462  3.69311
## 463  6.65492
## 464  5.82115
## 465  7.83932
## 466  3.16360
## 467  3.77498
## 468  4.42228
## 469 15.57570
## 470 13.07510
## 471  4.34879
## 472  4.03841
## 473  3.56868
## 474  4.64689
## 475  8.05579
## 476  6.39312
## 477  4.87141
## 478 15.02340
## 479 10.23300
## 480 14.33370
## 481  5.82401
## 482  5.70818
## 483  5.73116
## 484  2.81838
## 485  2.37857
## 486  3.67367
## 487  5.69175
## 488  4.83567
## 489  0.15086
## 490  0.18337
## 491  0.20746
## 492  0.10574
## 493  0.11132
## 494  0.17331
## 495  0.27957
## 496  0.17899
## 497  0.28960
## 498  0.26838
## 499  0.23912
## 500  0.17783
## 501  0.22438
## 502  0.06263
## 503  0.04527
## 504  0.06076
## 505  0.10959
## 506  0.04741

Print out R information in Python

When using R objects in Python, use . instead of $, like this: r.VARNAME.
To print out a single column, you use DATAFRAME["COLUMNNAME"]. Try printing out the shape column in the rock dataset.

##python chunk
r.rock["shape"]

## 0     0.090330
## 1     0.148622
## 2     0.183312
## 3     0.117063
## 4     0.122417
## 5     0.167045
## 6     0.189651
## 7     0.164127
## 8     0.203654
## 9     0.162394
## 10    0.150944
## 11    0.148141
## 12    0.228595
## 13    0.231623
## 14    0.172567
## 15    0.153481
## 16    0.204314
## 17    0.262727
## 18    0.200071
## 19    0.144810
## 20    0.113852
## 21    0.291029
## 22    0.240077
## 23    0.161865
## 24    0.280887
## 25    0.179455
## 26    0.191802
## 27    0.133083
## 28    0.225214
## 29    0.341273
## 30    0.311646
## 31    0.276016
## 32    0.197653
## 33    0.326635
## 34    0.154192
## 35    0.276016
## 36    0.176969
## 37    0.438712
## 38    0.163586
## 39    0.253832
## 40    0.328641
## 41    0.230081
## 42    0.464125
## 43    0.420477
## 44    0.200744
## 45    0.262651
## 46    0.182453
## 47    0.200447
## Name: shape, dtype: float64
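
The r object only exists inside a reticulate session, so the sketch below uses a regular pandas DataFrame to show how column selection behaves. The values are the first six rows of rock, copied from the head(rock) output earlier in this guide; rock_head, shape_series, and shape_frame are names made up for this example.

```python
import pandas as pd

# First six rows of rock, copied from the head(rock) output in this guide.
rock_head = pd.DataFrame({
    "area": [4990, 7002, 7558, 7352, 7943, 7979],
    "peri": [2791.90, 3892.60, 3930.66, 3869.32, 3948.54, 4010.15],
    "shape": [0.0903296, 0.1486220, 0.1833120, 0.1170630, 0.1224170, 0.1670450],
    "perm": [6.3, 6.3, 6.3, 6.3, 17.1, 17.1],
})

shape_series = rock_head["shape"]     # single brackets give a pandas Series
shape_frame = rock_head[["shape"]]    # double brackets give a one-column DataFrame

print(type(shape_series).__name__, type(shape_frame).__name__)
```

A Series prints with a Name and dtype line at the bottom, which is the form of the r.rock["shape"] output above.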

Get started with PyCharm!

Great job! Here’s what you learned:
- You know how to install and load the libraries in both languages.
- You know how to load built-in datasets in both languages.
- You know how to print out data from one language to another.
Turn this document in for credit -> hit KNIT -> turn in the HTML file.
Be sure to fill in your name at the top!
Be sure to answer the embedded questions!