For this semester, you have two options: your own computer or the class virtual machine.
I will help you troubleshoot while you complete this assignment, but for all other problems (and later in the semester) you will need to use Google, Stack Overflow, etc. to fix them yourself.
Only complete the part of the assignment that matches your choice of computer or virtual machine!
## Your Own Computer
### Install Software
Most Recent Java: https://www.java.com/en/
##r chunk - do not change these
R.version
## _
## platform x86_64-pc-linux-gnu
## arch x86_64
## os linux-gnu
## system x86_64, linux-gnu
## status
## major 3
## minor 6.3
## year 2020
## month 02
## day 29
## svn rev 77875
## language R
## version.string R version 3.6.3 (2020-02-29)
## nickname Holding the Windsock
#RStudio.Version() run this line but it won't knit with it "on"
ANSWER: What version of RStudio are you using? Please note it should be the latest version!
### Install all the R Packages
Change eval = TRUE to eval = FALSE once you have them installed.
install.packages("https://osf.io/ak7gq/download", repos = NULL, method = "libcurl", type = "source")
## Installing package into '/home/rajan_patel/R/x86_64-pc-linux-gnu-library/3.6'
## (as 'lib' is unspecified)
reticulate library.
##r chunk
Try typing py_config() below. You should get a prompt to install Miniconda. If not, use install_miniconda().
##r chunk
Run py_config() in the R chunk below.
##r chunk
Windows machines need special programs to make all this work:
py_install("package_name", pip = T)
Change eval = TRUE to eval = FALSE once you have them installed.
Packages: nltk, matplotlib, PyQt5, scikit-learn, numpy, pandas, prince, factor-analyzer, gensim, pyLDAvis, bs4
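Several of these packages are installed under one name and imported under another, so a quick check from a Python chunk can confirm which ones are actually available. The sketch below uses only the standard library. The helper `check_packages` and the `IMPORT_NAMES` mapping are hypothetical names introduced for illustration, and the mapping assumes each package's usual import name.

```python
import importlib.util

# Install name (as passed to py_install) -> import name used in Python.
# Assumption: these are the usual import names for each package.
IMPORT_NAMES = {
    "nltk": "nltk",
    "matplotlib": "matplotlib",
    "PyQt5": "PyQt5",
    "scikit-learn": "sklearn",
    "numpy": "numpy",
    "pandas": "pandas",
    "prince": "prince",
    "factor-analyzer": "factor_analyzer",
    "gensim": "gensim",
    "pyLDAvis": "pyLDAvis",
    "bs4": "bs4",
}


def check_packages(names=IMPORT_NAMES):
    """Return {install_name: True/False}; True means the import name was found."""
    return {
        install_name: importlib.util.find_spec(import_name) is not None
        for install_name, import_name in names.items()
    }


if __name__ == "__main__":
    # Print one line per package so missing ones are easy to spot.
    for name, found in check_packages().items():
        print(f"{name}: {'installed' if found else 'MISSING'}")
```

Any package reported as MISSING can then be installed with py_install() from the R side.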
For nltk, you will need to add a few other pieces. Type the following into the R console:
- library(reticulate)
- repl_python()
- Here you should notice you have switched from > to >>>, which indicates you are in Python:
To get out of >>> python, type exit or hit the Esc key.
Click on terminal > type in:
- python -m spacy download en_core_web_sm
- This will download the English language spaCy module.
- pip install lxml
Go to: https://class.aggieerin.com/auth-sign-in
Your log in is:
Click on terminal and run the following lines:
python3 -m nltk.downloader abc
When you run py_config() the first time, it will ask you to install miniconda. Say no! We already have python3 installed on the server.
##r chunk
library(reticulate)
py_config()
## python: /usr/bin/python3
## libpython: /usr/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6.so
## pythonhome: //usr://usr
## version: 3.6.9 (default, Nov 7 2019, 10:44:02) [GCC 8.3.0]
## numpy: /home/rajan_patel/.local/lib/python3.6/site-packages/numpy
## numpy_version: 1.18.2
##
## python versions found:
## /usr/bin/python3
## /usr/bin/python
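The interpreter reported by py_config() can also be confirmed from the Python side. The following is a minimal sketch using only the standard library; `python_info` is a hypothetical helper name, not part of reticulate. Run in a Python chunk, it should report the same interpreter path and version that py_config() lists.

```python
import platform
import sys


def python_info():
    """Collect basic facts about the interpreter that is currently running."""
    return {
        "executable": sys.executable,          # path to the Python binary
        "version": platform.python_version(),  # version string such as "3.6.9"
        "prefix": sys.prefix,                  # installation root
    }


for key, value in python_info().items():
    print(f"{key}: {value}")
```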
Use data(rock) to load it.
Use the head() function to print out the first six rows of the dataset.
##r chunk
data(rock)
head(rock)
## area peri shape perm
## 1 4990 2791.90 0.0903296 6.3
## 2 7002 3892.60 0.1486220 6.3
## 3 7558 3930.66 0.1833120 6.3
## 4 7352 3869.32 0.1170630 6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1
- Load the sklearn library; it has several sample datasets. You load Python packages by using import PACKAGE. Note that you install and call this package by different names (scikit-learn = sklearn).
- To import one part of a package, use from PACKAGE import FUNCTION. Therefore, you should use from sklearn import datasets.
- Load the boston dataset by doing: dataset_boston = datasets.load_boston().
- Print the first rows with the .head() function: df_boston.head(), after converting the file with pandas (code included below).
##python chunk
import sklearn
from sklearn import datasets
##load_boston() was removed in scikit-learn 1.2, so this call needs an older version
dataset_boston = datasets.load_boston()
##convert to pandas
import pandas as pd
df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
df_boston.head()
## CRIM ZN INDUS CHAS NOX ... RAD TAX PTRATIO B LSTAT
## 0 0.00632 18.0 2.31 0.0 0.538 ... 1.0 296.0 15.3 396.90 4.98
## 1 0.02731 0.0 7.07 0.0 0.469 ... 2.0 242.0 17.8 396.90 9.14
## 2 0.02729 0.0 7.07 0.0 0.469 ... 2.0 242.0 17.8 392.83 4.03
## 3 0.03237 0.0 2.18 0.0 0.458 ... 3.0 222.0 18.7 394.63 2.94
## 4 0.06905 0.0 2.18 0.0 0.458 ... 3.0 222.0 18.7 396.90 5.33
##
## [5 rows x 13 columns]
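The two import forms described above behave the same way for any package. As a small illustration that does not depend on scikit-learn being installed, the sketch below uses the standard-library `math` module as a stand-in; the choice of `math` and `sqrt` is an assumption made purely for the example.

```python
# Form 1: import the whole package, then reach into it with a dot.
import math
print(math.sqrt(16))   # 4.0

# Form 2: import a single name from the package and use it directly.
from math import sqrt
print(sqrt(16))        # 4.0

# Both forms refer to the same function object.
same_function = math.sqrt is sqrt
print(same_function)   # True
```

The same pattern is what makes `datasets.load_boston()` available after `from sklearn import datasets`.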
QUESTION: Look in your environment window. What do you see?
## Print out Python information in R
In R, access a Python variable with py$VARNAME and a data frame column with py$DATAFRAME$COLUMN. Try to print out the CRIM column from your df_boston variable.
##r chunk
py$df_boston$CRIM
## [1] 0.00632 0.02731 0.02729 0.03237 0.06905 0.02985 0.08829 0.14455
## [9] 0.21124 0.17004 0.22489 0.11747 0.09378 0.62976 0.63796 0.62739
## [17] 1.05393 0.78420 0.80271 0.72580 1.25179 0.85204 1.23247 0.98843
## [25] 0.75026 0.84054 0.67191 0.95577 0.77299 1.00245 1.13081 1.35472
## [33] 1.38799 1.15172 1.61282 0.06417 0.09744 0.08014 0.17505 0.02763
## [41] 0.03359 0.12744 0.14150 0.15936 0.12269 0.17142 0.18836 0.22927
## [49] 0.25387 0.21977 0.08873 0.04337 0.05360 0.04981 0.01360 0.01311
## [57] 0.02055 0.01432 0.15445 0.10328 0.14932 0.17171 0.11027 0.12650
## [65] 0.01951 0.03584 0.04379 0.05789 0.13554 0.12816 0.08826 0.15876
## [73] 0.09164 0.19539 0.07896 0.09512 0.10153 0.08707 0.05646 0.08387
## [81] 0.04113 0.04462 0.03659 0.03551 0.05059 0.05735 0.05188 0.07151
## [89] 0.05660 0.05302 0.04684 0.03932 0.04203 0.02875 0.04294 0.12204
## [97] 0.11504 0.12083 0.08187 0.06860 0.14866 0.11432 0.22876 0.21161
## [105] 0.13960 0.13262 0.17120 0.13117 0.12802 0.26363 0.10793 0.10084
## [113] 0.12329 0.22212 0.14231 0.17134 0.13158 0.15098 0.13058 0.14476
## [121] 0.06899 0.07165 0.09299 0.15038 0.09849 0.16902 0.38735 0.25915
## [129] 0.32543 0.88125 0.34006 1.19294 0.59005 0.32982 0.97617 0.55778
## [137] 0.32264 0.35233 0.24980 0.54452 0.29090 1.62864 3.32105 4.09740
## [145] 2.77974 2.37934 2.15505 2.36862 2.33099 2.73397 1.65660 1.49632
## [153] 1.12658 2.14918 1.41385 3.53501 2.44668 1.22358 1.34284 1.42502
## [161] 1.27346 1.46336 1.83377 1.51902 2.24236 2.92400 2.01019 1.80028
## [169] 2.30040 2.44953 1.20742 2.31390 0.13914 0.09178 0.08447 0.06664
## [177] 0.07022 0.05425 0.06642 0.05780 0.06588 0.06888 0.09103 0.10008
## [185] 0.08308 0.06047 0.05602 0.07875 0.12579 0.08370 0.09068 0.06911
## [193] 0.08664 0.02187 0.01439 0.01381 0.04011 0.04666 0.03768 0.03150
## [201] 0.01778 0.03445 0.02177 0.03510 0.02009 0.13642 0.22969 0.25199
## [209] 0.13587 0.43571 0.17446 0.37578 0.21719 0.14052 0.28955 0.19802
## [217] 0.04560 0.07013 0.11069 0.11425 0.35809 0.40771 0.62356 0.61470
## [225] 0.31533 0.52693 0.38214 0.41238 0.29819 0.44178 0.53700 0.46296
## [233] 0.57529 0.33147 0.44791 0.33045 0.52058 0.51183 0.08244 0.09252
## [241] 0.11329 0.10612 0.10290 0.12757 0.20608 0.19133 0.33983 0.19657
## [249] 0.16439 0.19073 0.14030 0.21409 0.08221 0.36894 0.04819 0.03548
## [257] 0.01538 0.61154 0.66351 0.65665 0.54011 0.53412 0.52014 0.82526
## [265] 0.55007 0.76162 0.78570 0.57834 0.54050 0.09065 0.29916 0.16211
## [273] 0.11460 0.22188 0.05644 0.09604 0.10469 0.06127 0.07978 0.21038
## [281] 0.03578 0.03705 0.06129 0.01501 0.00906 0.01096 0.01965 0.03871
## [289] 0.04590 0.04297 0.03502 0.07886 0.03615 0.08265 0.08199 0.12932
## [297] 0.05372 0.14103 0.06466 0.05561 0.04417 0.03537 0.09266 0.10000
## [305] 0.05515 0.05479 0.07503 0.04932 0.49298 0.34940 2.63548 0.79041
## [313] 0.26169 0.26938 0.36920 0.25356 0.31827 0.24522 0.40202 0.47547
## [321] 0.16760 0.18159 0.35114 0.28392 0.34109 0.19186 0.30347 0.24103
## [329] 0.06617 0.06724 0.04544 0.05023 0.03466 0.05083 0.03738 0.03961
## [337] 0.03427 0.03041 0.03306 0.05497 0.06151 0.01301 0.02498 0.02543
## [345] 0.03049 0.03113 0.06162 0.01870 0.01501 0.02899 0.06211 0.07950
## [353] 0.07244 0.01709 0.04301 0.10659 8.98296 3.84970 5.20177 4.26131
## [361] 4.54192 3.83684 3.67822 4.22239 3.47428 4.55587 3.69695 13.52220
## [369] 4.89822 5.66998 6.53876 9.23230 8.26725 11.10810 18.49820 19.60910
## [377] 15.28800 9.82349 23.64820 17.86670 88.97620 15.87440 9.18702 7.99248
## [385] 20.08490 16.81180 24.39380 22.59710 14.33370 8.15174 6.96215 5.29305
## [393] 11.57790 8.64476 13.35980 8.71675 5.87205 7.67202 38.35180 9.91655
## [401] 25.04610 14.23620 9.59571 24.80170 41.52920 67.92080 20.71620 11.95110
## [409] 7.40389 14.43830 51.13580 14.05070 18.81100 28.65580 45.74610 18.08460
## [417] 10.83420 25.94060 73.53410 11.81230 11.08740 7.02259 12.04820 7.05042
## [425] 8.79212 15.86030 12.24720 37.66190 7.36711 9.33889 8.49213 10.06230
## [433] 6.44405 5.58107 13.91340 11.16040 14.42080 15.17720 13.67810 9.39063
## [441] 22.05110 9.72418 5.66637 9.96654 12.80230 10.67180 6.28807 9.92485
## [449] 9.32909 7.52601 6.71772 5.44114 5.09017 8.24809 9.51363 4.75237
## [457] 4.66883 8.20058 7.75223 6.80117 4.81213 3.69311 6.65492 5.82115
## [465] 7.83932 3.16360 3.77498 4.42228 15.57570 13.07510 4.34879 4.03841
## [473] 3.56868 4.64689 8.05579 6.39312 4.87141 15.02340 10.23300 14.33370
## [481] 5.82401 5.70818 5.73116 2.81838 2.37857 3.67367 5.69175 4.83567
## [489] 0.15086 0.18337 0.20746 0.10574 0.11132 0.17331 0.27957 0.17899
## [497] 0.28960 0.26838 0.23912 0.17783 0.22438 0.06263 0.04527 0.06076
## [505] 0.10959 0.04741
In Python, instead of $, we use . like this: r.VARNAME, or r.DATAFRAME["COLUMNNAME"] for a data frame column. Try printing out the shape column in the rock dataset.
##python chunk
r.rock["shape"]
## 0 0.090330
## 1 0.148622
## 2 0.183312
## 3 0.117063
## 4 0.122417
## 5 0.167045
## 6 0.189651
## 7 0.164127
## 8 0.203654
## 9 0.162394
## 10 0.150944
## 11 0.148141
## 12 0.228595
## 13 0.231623
## 14 0.172567
## 15 0.153481
## 16 0.204314
## 17 0.262727
## 18 0.200071
## 19 0.144810
## 20 0.113852
## 21 0.291029
## 22 0.240077
## 23 0.161865
## 24 0.280887
## 25 0.179455
## 26 0.191802
## 27 0.133083
## 28 0.225214
## 29 0.341273
## 30 0.311646
## 31 0.276016
## 32 0.197653
## 33 0.326635
## 34 0.154192
## 35 0.276016
## 36 0.176969
## 37 0.438712
## 38 0.163586
## 39 0.253832
## 40 0.328641
## 41 0.230081
## 42 0.464125
## 43 0.420477
## 44 0.200744
## 45 0.262651
## 46 0.182453
## 47 0.200447
## Name: shape, dtype: float64
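The access pattern in `r.rock["shape"]` combines two steps: a dot to pick the variable and square brackets to pick the column. The sketch below mimics that pattern outside of reticulate. `r_standin` is a hypothetical stand-in built from the standard library, and a plain dictionary replaces the pandas data frame that reticulate would normally provide, so only the syntax carries over.

```python
from types import SimpleNamespace

# Stand-in for reticulate's `r` object (illustration only): in a real
# Python chunk, `r` is supplied by reticulate and r.rock is a pandas DataFrame.
r_standin = SimpleNamespace(
    # First three shape values, copied from the head(rock) output.
    rock={"shape": [0.0903296, 0.148622, 0.183312]}
)

# Dot access picks the variable; square brackets pick the column.
shape_column = r_standin.rock["shape"]
print(shape_column)
```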