##r chunk - do not change these
R.version
## _
## platform x86_64-w64-mingw32
## arch x86_64
## os mingw32
## system x86_64, mingw32
## status
## major 3
## minor 6.2
## year 2019
## month 12
## day 12
## svn rev 77560
## language R
## version.string R version 3.6.2 (2019-12-12)
## nickname Dark and Stormy Night
#RStudio.Version() run this line but it won't knit with it "on"
- Install the reticulate package (do not include this code).
- Load the reticulate library.
##r chunk
Try typing py_config() below. You should get a prompt to install Miniconda. If not, use install_miniconda().
##r chunk
#py_config()
#commented out: it gives an error while knitting the document, but it is otherwise installed and working
Run py_config() in the R chunk below.
##r chunk
#py_config()
#commented out: it gives an error while knitting the document, but it is otherwise installed and working
- Load the rock dataset with data(rock).
- Use the head() function to print out the first six rows of the dataset.
##r chunk
data(rock)
head(rock, n = 6)
## area peri shape perm
## 1 4990 2791.90 0.0903296 6.3
## 2 7002 3892.60 0.1486220 6.3
## 3 7558 3930.66 0.1833120 6.3
## 4 7352 3869.32 0.1170630 6.3
## 5 7943 3948.54 0.1224170 17.1
## 6 7979 4010.15 0.1670450 17.1
- You will need numpy, nltk, spacy, scikit-learn and pandas for starters.
- Check whether each one is available with py_module_available("PACKAGE").
- scikit-learn is a special package: you check whether it is available by using sklearn, but you install it with scikit-learn.
##r chunk
#py_module_available("numpy")
#commented out: it gives an error while knitting the document, but it is otherwise installed and working
- If any of these return FALSE, install them using py_install("PACKAGE"). If you receive an error saying the package cannot be found on Miniconda, use py_install("PACKAGE", pip = TRUE).
#if they are all TRUE, leave this blank
##r chunk
#I used these only in the last step, even though I had already installed them separately in R before doing this assignment
#py_install("numpy", pip = TRUE)
#py_install("nltk", pip = TRUE)
#py_install("spacy", pip = TRUE)
#py_install("scikit-learn", pip = TRUE)
#py_install("pandas", pip = TRUE)
#commented out: it gives an error while knitting the document, but it is otherwise installed and working
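The availability check above runs from R. As a rough Python-side analogue, the following minimal sketch does the same kind of lookup with the standard library's importlib.util.find_spec. The helper name module_available is hypothetical, and this is not a claim about how reticulate implements py_module_available() internally. It mainly illustrates why the import name (sklearn), not the install name (scikit-learn), is the one to check.

```python
import importlib.util


def module_available(import_name):
    """Return True if a top-level module with this import name can be found."""
    return importlib.util.find_spec(import_name) is not None


# Use import names here, not install names: scikit-learn is imported as sklearn.
packages = ["numpy", "nltk", "spacy", "sklearn", "pandas"]

# Map each import name to True (found) or False (not found).
status = {name: module_available(name) for name in packages}

for name, found in status.items():
    print(f"{name}: {'available' if found else 'missing'}")
```

Any package reported as missing here would be the one to install, keeping in mind that the install name can differ from the import name.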
- Import the sklearn library; it has several sample datasets. You load Python packages by using import PACKAGE. Note that you install and call this package by different names (scikit-learn = sklearn).
- To import a single function or module from a package, use from PACKAGE import FUNCTION. Therefore, you should use from sklearn import datasets.
- Load the boston dataset by doing: dataset_boston = datasets.load_boston().
- Print the first rows with the .head() function: df_boston.head(), after converting the file with pandas (code included below).
##python chunk
#repl_python()
import sklearn
from sklearn import datasets
dataset_boston = datasets.load_boston()
sklearn.datasets.load_boston(return_X_y=False)
##convert to pandas
## {'data': array([[6.3200e-03, 1.8000e+01, 2.3100e+00, ..., 1.5300e+01, 3.9690e+02,
## 4.9800e+00],
## [2.7310e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9690e+02,
## 9.1400e+00],
## [2.7290e-02, 0.0000e+00, 7.0700e+00, ..., 1.7800e+01, 3.9283e+02,
## 4.0300e+00],
## ...,
## [6.0760e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
## 5.6400e+00],
## [1.0959e-01, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9345e+02,
## 6.4800e+00],
## [4.7410e-02, 0.0000e+00, 1.1930e+01, ..., 2.1000e+01, 3.9690e+02,
## 7.8800e+00]]), 'target': array([24. , 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15. ,
## 18.9, 21.7, 20.4, 18.2, 19.9, 23.1, 17.5, 20.2, 18.2, 13.6, 19.6,
## 15.2, 14.5, 15.6, 13.9, 16.6, 14.8, 18.4, 21. , 12.7, 14.5, 13.2,
## 13.1, 13.5, 18.9, 20. , 21. , 24.7, 30.8, 34.9, 26.6, 25.3, 24.7,
## 21.2, 19.3, 20. , 16.6, 14.4, 19.4, 19.7, 20.5, 25. , 23.4, 18.9,
## 35.4, 24.7, 31.6, 23.3, 19.6, 18.7, 16. , 22.2, 25. , 33. , 23.5,
## 19.4, 22. , 17.4, 20.9, 24.2, 21.7, 22.8, 23.4, 24.1, 21.4, 20. ,
## 20.8, 21.2, 20.3, 28. , 23.9, 24.8, 22.9, 23.9, 26.6, 22.5, 22.2,
## 23.6, 28.7, 22.6, 22. , 22.9, 25. , 20.6, 28.4, 21.4, 38.7, 43.8,
## 33.2, 27.5, 26.5, 18.6, 19.3, 20.1, 19.5, 19.5, 20.4, 19.8, 19.4,
## 21.7, 22.8, 18.8, 18.7, 18.5, 18.3, 21.2, 19.2, 20.4, 19.3, 22. ,
## 20.3, 20.5, 17.3, 18.8, 21.4, 15.7, 16.2, 18. , 14.3, 19.2, 19.6,
## 23. , 18.4, 15.6, 18.1, 17.4, 17.1, 13.3, 17.8, 14. , 14.4, 13.4,
## 15.6, 11.8, 13.8, 15.6, 14.6, 17.8, 15.4, 21.5, 19.6, 15.3, 19.4,
## 17. , 15.6, 13.1, 41.3, 24.3, 23.3, 27. , 50. , 50. , 50. , 22.7,
## 25. , 50. , 23.8, 23.8, 22.3, 17.4, 19.1, 23.1, 23.6, 22.6, 29.4,
## 23.2, 24.6, 29.9, 37.2, 39.8, 36.2, 37.9, 32.5, 26.4, 29.6, 50. ,
## 32. , 29.8, 34.9, 37. , 30.5, 36.4, 31.1, 29.1, 50. , 33.3, 30.3,
## 34.6, 34.9, 32.9, 24.1, 42.3, 48.5, 50. , 22.6, 24.4, 22.5, 24.4,
## 20. , 21.7, 19.3, 22.4, 28.1, 23.7, 25. , 23.3, 28.7, 21.5, 23. ,
## 26.7, 21.7, 27.5, 30.1, 44.8, 50. , 37.6, 31.6, 46.7, 31.5, 24.3,
## 31.7, 41.7, 48.3, 29. , 24. , 25.1, 31.5, 23.7, 23.3, 22. , 20.1,
## 22.2, 23.7, 17.6, 18.5, 24.3, 20.5, 24.5, 26.2, 24.4, 24.8, 29.6,
## 42.8, 21.9, 20.9, 44. , 50. , 36. , 30.1, 33.8, 43.1, 48.8, 31. ,
## 36.5, 22.8, 30.7, 50. , 43.5, 20.7, 21.1, 25.2, 24.4, 35.2, 32.4,
## 32. , 33.2, 33.1, 29.1, 35.1, 45.4, 35.4, 46. , 50. , 32.2, 22. ,
## 20.1, 23.2, 22.3, 24.8, 28.5, 37.3, 27.9, 23.9, 21.7, 28.6, 27.1,
## 20.3, 22.5, 29. , 24.8, 22. , 26.4, 33.1, 36.1, 28.4, 33.4, 28.2,
## 22.8, 20.3, 16.1, 22.1, 19.4, 21.6, 23.8, 16.2, 17.8, 19.8, 23.1,
## 21. , 23.8, 23.1, 20.4, 18.5, 25. , 24.6, 23. , 22.2, 19.3, 22.6,
## 19.8, 17.1, 19.4, 22.2, 20.7, 21.1, 19.5, 18.5, 20.6, 19. , 18.7,
## 32.7, 16.5, 23.9, 31.2, 17.5, 17.2, 23.1, 24.5, 26.6, 22.9, 24.1,
## 18.6, 30.1, 18.2, 20.6, 17.8, 21.7, 22.7, 22.6, 25. , 19.9, 20.8,
## 16.8, 21.9, 27.5, 21.9, 23.1, 50. , 50. , 50. , 50. , 50. , 13.8,
## 13.8, 15. , 13.9, 13.3, 13.1, 10.2, 10.4, 10.9, 11.3, 12.3, 8.8,
## 7.2, 10.5, 7.4, 10.2, 11.5, 15.1, 23.2, 9.7, 13.8, 12.7, 13.1,
## 12.5, 8.5, 5. , 6.3, 5.6, 7.2, 12.1, 8.3, 8.5, 5. , 11.9,
## 27.9, 17.2, 27.5, 15. , 17.2, 17.9, 16.3, 7. , 7.2, 7.5, 10.4,
## 8.8, 8.4, 16.7, 14.2, 20.8, 13.4, 11.7, 8.3, 10.2, 10.9, 11. ,
## 9.5, 14.5, 14.1, 16.1, 14.3, 11.7, 13.4, 9.6, 8.7, 8.4, 12.8,
## 10.5, 17.1, 18.4, 15.4, 10.8, 11.8, 14.9, 12.6, 14.1, 13. , 13.4,
## 15.2, 16.1, 17.8, 14.9, 14.1, 12.7, 13.5, 14.9, 20. , 16.4, 17.7,
## 19.5, 20.2, 21.4, 19.9, 19. , 19.1, 19.1, 20.1, 19.9, 19.6, 23.2,
## 29.8, 13.8, 13.3, 16.7, 12. , 14.6, 21.4, 23. , 23.7, 25. , 21.8,
## 20.6, 21.2, 19.1, 20.6, 15.2, 7. , 8.1, 13.6, 20.1, 21.8, 24.5,
## 23.1, 19.7, 18.3, 21.2, 17.5, 16.8, 22.4, 20.6, 23.9, 22. , 11.9]), 'feature_names': array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
## 'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7'), 'DESCR': ".. _boston_dataset:\n\nBoston house prices dataset\n---------------------------\n\n**Data Set Characteristics:** \n\n :Number of Instances: 506 \n\n :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.\n\n :Attribute Information (in order):\n - CRIM per capita crime rate by town\n - ZN proportion of residential land zoned for lots over 25,000 sq.ft.\n - INDUS proportion of non-retail business acres per town\n - CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)\n - NOX nitric oxides concentration (parts per 10 million)\n - RM average number of rooms per dwelling\n - AGE proportion of owner-occupied units built prior to 1940\n - DIS weighted distances to five Boston employment centres\n - RAD index of accessibility to radial highways\n - TAX full-value property-tax rate per $10,000\n - PTRATIO pupil-teacher ratio by town\n - B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town\n - LSTAT % lower status of the population\n - MEDV Median value of owner-occupied homes in $1000's\n\n :Missing Attribute Values: None\n\n :Creator: Harrison, D. and Rubinfeld, D.L.\n\nThis is a copy of UCI ML housing dataset.\nhttps://archive.ics.uci.edu/ml/machine-learning-databases/housing/\n\n\nThis dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.\n\nThe Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic\nprices and the demand for clean air', J. Environ. Economics & Management,\nvol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics\n...', Wiley, 1980. N.B. Various transformations are used in the table on\npages 244-261 of the latter.\n\nThe Boston house-price data has been used in many machine learning papers that address regression\nproblems. \n \n.. 
topic:: References\n\n - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.\n - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.\n", 'filename': 'C:\\Users\\Shashank\\AppData\\Local\\r-miniconda\\envs\\r-reticulate\\lib\\site-packages\\sklearn\\datasets\\data\\boston_house_prices.csv'}
import pandas as pd
df_boston = pd.DataFrame(data=dataset_boston.data, columns=dataset_boston.feature_names)
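The conversion step above can be tried without scikit-learn. Note that load_boston() was removed in scikit-learn 1.2, so the chunk above only runs on older versions. The sketch below is an illustration under stated assumptions: sample is a hypothetical stand-in dictionary, not the real dataset object, and it holds only the first three rows and the six columns whose values are fully visible in the printed output above. It uses key access (sample["data"]) where the real object also allows attribute access (dataset_boston.data).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the object returned by datasets.load_boston():
# first three rows, and only the columns not elided ("...") in the output above.
sample = {
    "data": np.array([
        [6.3200e-03, 18.0, 2.31, 15.3, 396.90, 4.98],
        [2.7310e-02, 0.0, 7.07, 17.8, 396.90, 9.14],
        [2.7290e-02, 0.0, 7.07, 17.8, 392.83, 4.03],
    ]),
    "feature_names": np.array(["CRIM", "ZN", "INDUS", "PTRATIO", "B", "LSTAT"]),
}

# Same pattern as the chunk above: the array supplies the values,
# feature_names supplies the column labels.
df_sample = pd.DataFrame(data=sample["data"], columns=sample["feature_names"])

# head() prints up to the first five rows; there are only three here.
print(df_sample.head())
```

With the real dataset, the equivalent final call would be df_boston.head().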
QUESTION: Look in your environment window. What do you see?
- To access Python variables in R, use py$VARNAME.
- To print out a single column, use DATAFRAME$COLUMN. Try to print out the CRIM column from your df_boston variable.
##r chunk
#py$VARNAME, then DATAFRAME$COLUMN:
#py$df_boston$CRIM
- When using R in Python, instead of `$`, we use `.` like this: `r.VARNAME`.
- To print out a single column, you use `DATAFRAME["COLUMNNAME"]`. Try printing out the `shape` column in the `rock` dataset.
##python chunk
r.rock
## area peri shape perm
## 0 4990 2791.900 0.090330 6.3
## 1 7002 3892.600 0.148622 6.3
## 2 7558 3930.660 0.183312 6.3
## 3 7352 3869.320 0.117063 6.3
## 4 7943 3948.540 0.122417 17.1
## 5 7979 4010.150 0.167045 17.1
## 6 9333 4345.750 0.189651 17.1
## 7 8209 4344.750 0.164127 17.1
## 8 8393 3682.040 0.203654 119.0
## 9 6425 3098.650 0.162394 119.0
## 10 9364 4480.050 0.150944 119.0
## 11 8624 3986.240 0.148141 119.0
## 12 10651 4036.540 0.228595 82.4
## 13 8868 3518.040 0.231623 82.4
## 14 9417 3999.370 0.172567 82.4
## 15 8874 3629.070 0.153481 82.4
## 16 10962 4608.660 0.204314 58.6
## 17 10743 4787.620 0.262727 58.6
## 18 11878 4864.220 0.200071 58.6
## 19 9867 4479.410 0.144810 58.6
## 20 7838 3428.740 0.113852 142.0
## 21 11876 4353.140 0.291029 142.0
## 22 12212 4697.650 0.240077 142.0
## 23 8233 3518.440 0.161865 142.0
## 24 6360 1977.390 0.280887 740.0
## 25 4193 1379.350 0.179455 740.0
## 26 7416 1916.240 0.191802 740.0
## 27 5246 1585.420 0.133083 740.0
## 28 6509 1851.210 0.225214 890.0
## 29 4895 1239.660 0.341273 890.0
## 30 6775 1728.140 0.311646 890.0
## 31 7894 1461.060 0.276016 890.0
## 32 5980 1426.760 0.197653 950.0
## 33 5318 990.388 0.326635 950.0
## 34 7392 1350.760 0.154192 950.0
## 35 7894 1461.060 0.276016 950.0
## 36 3469 1376.700 0.176969 100.0
## 37 1468 476.322 0.438712 100.0
## 38 3524 1189.460 0.163586 100.0
## 39 5267 1644.960 0.253832 100.0
## 40 5048 941.543 0.328641 1300.0
## 41 1016 308.642 0.230081 1300.0
## 42 5605 1145.690 0.464125 1300.0
## 43 8793 2280.490 0.420477 1300.0
## 44 3475 1174.110 0.200744 580.0
## 45 1651 597.808 0.262651 580.0
## 46 5514 1455.880 0.182453 580.0
## 47 9718 1485.580 0.200447 580.0
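The chunk above prints the whole data frame. Selecting a single column with bracket indexing can be sketched as follows. The rock variable here is a stand-in built by hand from the first six rows shown earlier, an assumption made so the example runs outside R Markdown. Inside the document itself, the equivalent expression would be r.rock["shape"].

```python
import pandas as pd

# Stand-in for r.rock: the first six rows of the rock dataset shown earlier.
rock = pd.DataFrame({
    "area": [4990, 7002, 7558, 7352, 7943, 7979],
    "peri": [2791.90, 3892.60, 3930.66, 3869.32, 3948.54, 4010.15],
    "shape": [0.0903296, 0.1486220, 0.1833120, 0.1170630, 0.1224170, 0.1670450],
    "perm": [6.3, 6.3, 6.3, 6.3, 17.1, 17.1],
})

# Bracket indexing returns the column as a pandas Series.
shape_column = rock["shape"]
print(shape_column)

# Attribute access does not work for this column: rock.shape is the
# (rows, columns) tuple that every DataFrame has, not the "shape" column.
print(rock.shape)
```

Bracket indexing is the safer habit in general, since it also works for column names that collide with DataFrame attributes, as shape does here.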