This exercise is similar to HW0 in KSB-999, which you were required to complete before starting this course. If you already did that, this should be an easy exercise and a good warm-up. If you didn’t, this is your opportunity to catch up. This course moves fast and assumes that you have some familiarity with R.
Download the R Quarto template for this exercise Ex1_R_YourLastName.Qmd and save it to your class project folder. Rename the file to replace YourLastName with your own last name.
Open the file in RStudio and complete the coding exercises and answer the interpretation questions. Run the code to ensure everything is working fine.
I will always display the exercise code output in the instructions on Canvas, so that you can compare your results against the solution. Technical Note: Your quantitative and visual outputs should look identical to the outputs shown in the homework. Now is a good time to compare your results and ask me questions if they differ.
When done, knit your R Quarto file into a Word document. On Canvas, please submit both the knitted Word document AND the .qmd file. If you have trouble knitting a Word file, you can knit to a PDF file, or to an HTML file and then save it as a PDF.
Preparing analytic reports using the {knitr} package is an important learning objective of this course. Your R Quarto file MUST have the attribute echo=T in the {r global options} so that we can see and grade your R code. You are required to knit your homework R Quarto files to a Word (preferred) or PDF file. The knitted document should adhere to proper business-like formatting and appearance. This means all interpretations should be in markdown sections, NOT code comments.
As such, inadequate or no knitting will carry point deductions of up to 30 points.
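For reference, a global-options chunk of the kind described above might look like the sketch below. It assumes the chunk sits near the top of the .qmd file and that the {knitr} package is installed; echo = TRUE is simply the long form of echo=T.

```r
# Global options chunk (assumed to be the first chunk in the .qmd file).
# echo = TRUE makes every later chunk show its R code in the knitted output,
# so the code can be seen and graded.
knitr::opts_chunk$set(echo = TRUE)
```

Individual chunks can still override this setting if a particular block of code should be hidden.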
1.1 Write a simple R function named area() that takes 2 values as parameters (x and y, representing the two sides of a rectangle) and returns the product of the two values (representing the rectangle’s area). Then use this function to display the area of a rectangle of sides 6x4. Then, use the functions paste(), print() and area() to output this result: The area of a rectangle of sides 6x4 is 24, where 24 is calculated with the area() function you just created
a = 6
b = 4
area = a*b
print(paste("The area of rectangle with sides",a,"x",b,"is",area))
[1] "The area of rectangle with sides 6 x 4 is 24"
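Task 1.1 asks for area() to be a function, whereas the snippet above stores the product in a variable named area. A minimal sketch of the function-based version could look like this. The variable name msg is an assumption introduced for illustration, and sep = "" is used so that the sides print as 6x4 without spaces.

```r
# Return the area of a rectangle with sides x and y.
area <- function(x, y) {
  x * y
}

# Display the area of a 6x4 rectangle.
area(6, 4)

# Build the sentence with paste(); sep = "" suppresses the default
# space separator, so the pieces are joined exactly as written.
msg <- paste("The area of a rectangle of sides ", 6, "x", 4,
             " is ", area(6, 4), sep = "")
print(msg)
```

Because area is now a function rather than a number, it can be reused for any pair of sides without retyping the multiplication.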
1.2 Write a simple for loop for i from 1 to 10. In each loop cycle, compute the area of a rectangle of sides i and i*2 using the function created in 1.1 (i.e., every rectangle has one side double the length of the other). For each of the 10 rectangles, display “The area of a 1 x 2 rectangle is 2” for i=1, “The area of a 2 x 4 rectangle is 8” for i=2, and so on.
for (i in 1:10) {
area <- i*(i*2)
print(paste("The area of rectangle",i,"x",i*2,"is",area))
}
[1] "The area of rectangle 1 x 2 is 2"
[1] "The area of rectangle 2 x 4 is 8"
[1] "The area of rectangle 3 x 6 is 18"
[1] "The area of rectangle 4 x 8 is 32"
[1] "The area of rectangle 5 x 10 is 50"
[1] "The area of rectangle 6 x 12 is 72"
[1] "The area of rectangle 7 x 14 is 98"
[1] "The area of rectangle 8 x 16 is 128"
[1] "The area of rectangle 9 x 18 is 162"
[1] "The area of rectangle 10 x 20 is 200"
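The loop above multiplies the sides inline. Since task 1.2 asks for the function from 1.1 to be reused, a sketch of that variant is shown here. It redefines area() so that the block runs on its own, and the message wording follows the task statement.

```r
# Same helper as in 1.1, repeated so this block is self-contained.
area <- function(x, y) {
  x * y
}

# For i = 1..10, the rectangle has sides i and 2*i.
for (i in 1:10) {
  print(paste("The area of a", i, "x", i * 2, "rectangle is", area(i, i * 2)))
}
```

Here paste() keeps its default single-space separator, which produces the "1 x 2" spacing requested in the task.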
2.1 Copy the Credit.csv data file to your working directory (if you haven’t done this yet). Then read the Credit.csv data file into a data frame object named Credit (Tip: use the read.table() function with the parameters header=T, sep=",", row.names=1). Then, list the first 5 columns of the top 5 rows (Tip: use Credit[1:5,1:5])
getwd()
[1] "C:/Users/auuser/Desktop/R Exercise 1"
Credit <- read.table("Credit.csv", header = TRUE, sep = ",", row.names = 1)
print(Credit[1:5,1:5])
   Income Limit Rating Cards Age
1  14.891  3606    283     2  34
2 106.025  6645    483     3  82
3 104.593  7075    514     4  71
4 148.924  9504    681     3  36
5  55.882  4897    357     2  68
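To see what header=T, sep="," and row.names=1 each do without needing the file on disk, the sketch below feeds read.table() a small CSV excerpt through its text argument. The excerpt contains only the five rows and five columns printed above. The quoting style and the unnamed first column are assumptions about how Credit.csv is laid out, and csv_lines and sample_credit are hypothetical names.

```r
# Hypothetical excerpt: the 5 rows x 5 columns shown above, written as CSV lines.
# The first, unnamed column holds the row labels (an assumed file layout).
csv_lines <- c(
  '"","Income","Limit","Rating","Cards","Age"',
  '"1",14.891,3606,283,2,34',
  '"2",106.025,6645,483,3,82',
  '"3",104.593,7075,514,4,71',
  '"4",148.924,9504,681,3,36',
  '"5",55.882,4897,357,2,68'
)

# header = TRUE : the first line supplies the column names
# sep = ","     : fields are comma-separated
# row.names = 1 : the first column becomes the row names instead of a variable
sample_credit <- read.table(text = csv_lines, header = TRUE, sep = ",",
                            row.names = 1)

print(sample_credit[1:5, 1:5])

# Column classes are inferred from the values that were read.
class(sample_credit$Income)
class(sample_credit$Cards)
```

With a real file, the text argument is replaced by the file name, as in the read.table() call above.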
2.2 Using the class() function, display the object class for the Credit dataset, and for Gender (i.e., Credit$Gender), Income and Cards
class(Credit)
[1] "data.frame"
class(Credit$Gender)
[1] "character"
class(Credit$Income)
[1] "numeric"
class(Credit$Cards)
[1] "integer"
2.3 Create a vector named income.vect with data from the Income column of the Credit dataframe. Then use the head() function to display the first 6 values of this vector.
income.vect <- Credit$Income
head(income.vect)
[1] 14.891 106.025 104.593 148.924 55.882 80.180
3.1 Compute the mean, minimum, maximum, standard deviation and variance for all the values in this income vector. Store the respective results in variables named mean.inc, min.inc, etc. Then, use the c() function to create a vector called income.stats with the 5 values you computed above. Then use the names() function to give these values the corresponding names “Mean”, “Min”, “Max”, “StDev”, and “Var”. Then display income.stats.
Technical Note: The names() function needs to be given a vector with the respective names above, in the same order as the values in income.stats. Therefore, you need to use the c() function to create a vector with these 5 names.
a = c(Credit$Income)
a
[1] 14.891 106.025 104.593 148.924 55.882 80.180 20.996 71.408 15.125
[10] 71.061 63.095 15.045 80.616 43.682 19.144 20.089 53.598 36.496
[19] 49.570 42.079 17.700 37.348 20.103 64.027 10.742 14.090 42.471
[28] 32.793 186.634 26.813 34.142 28.941 134.181 31.367 20.150 23.350
[37] 62.413 30.007 11.795 13.647 34.950 113.659 44.158 36.929 31.861
[46] 77.380 19.531 44.646 44.522 43.479 36.362 39.705 44.205 16.304
[55] 15.333 32.916 57.100 76.273 10.354 51.872 35.510 21.238 30.682
[64] 14.132 32.164 12.000 113.829 11.187 27.847 49.502 24.889 58.781
[73] 22.939 23.989 16.103 33.017 30.622 20.936 110.968 15.354 27.369
[82] 53.480 23.672 19.225 43.540 152.298 55.367 11.741 15.560 59.530
[91] 20.191 48.498 30.733 16.479 38.009 14.084 14.312 26.067 36.295
[100] 83.851 21.153 17.976 68.713 146.183 15.846 12.031 16.819 39.110
[109] 107.986 13.561 34.537 28.575 46.007 69.251 16.482 40.442 35.177
[118] 91.362 27.039 23.012 27.241 148.080 62.602 11.808 29.564 27.578
[127] 26.427 57.202 123.299 18.145 23.793 10.726 23.283 21.455 34.664
[136] 44.473 54.663 36.355 21.374 107.841 39.831 91.876 103.893 19.636
[145] 17.392 19.529 17.055 23.857 15.184 13.444 63.931 35.864 41.419
[154] 92.112 55.056 19.537 31.811 56.256 42.357 53.319 12.238 31.353
[163] 63.809 13.676 76.782 25.383 35.691 29.403 27.470 27.330 34.772
[172] 36.934 76.348 14.887 121.834 30.132 24.050 22.379 28.316 58.026
[181] 10.635 46.102 58.929 80.861 158.889 30.420 36.472 23.365 83.869
[190] 58.351 55.187 124.290 28.508 130.209 30.406 23.883 93.039 50.699
[199] 27.349 10.403 23.949 73.914 21.038 68.206 57.337 10.793 23.450
[208] 10.842 51.345 151.947 24.543 29.567 39.145 39.422 34.909 41.025
[217] 15.476 12.456 10.627 38.954 44.847 98.515 33.437 27.512 121.709
[226] 15.079 59.879 66.989 69.165 69.943 33.214 25.124 15.741 11.603
[235] 69.656 10.503 42.529 60.579 26.532 27.952 29.705 15.602 20.918
[244] 58.165 22.561 34.509 19.588 36.364 15.717 22.574 10.363 28.474
[253] 72.945 85.425 36.508 58.063 25.936 15.629 41.400 33.657 67.937
[262] 180.379 10.588 29.725 27.999 40.885 88.830 29.638 25.988 39.055
[271] 15.866 44.978 30.413 16.751 30.550 163.329 23.106 41.532 128.040
[280] 54.319 53.401 36.142 63.534 49.927 14.711 18.967 18.036 60.449
[289] 16.711 10.852 26.370 24.088 51.532 140.672 42.915 27.272 65.896
[298] 55.054 20.791 24.919 21.786 31.335 59.855 44.061 82.706 24.460
[307] 45.120 75.406 14.956 75.257 33.694 23.375 27.825 92.386 115.520
[316] 14.479 52.179 68.462 18.951 27.590 16.279 25.078 27.229 182.728
[325] 31.029 17.765 125.480 49.166 41.192 94.193 20.405 12.581 62.328
[334] 21.011 24.230 24.314 32.856 12.414 41.365 149.316 27.794 13.234
[343] 14.595 10.735 48.218 30.012 21.551 160.231 13.433 48.577 30.002
[352] 61.620 104.483 41.868 12.068 180.682 34.480 39.609 30.111 12.335
[361] 53.566 53.217 26.162 64.173 128.669 113.772 61.069 23.793 89.000
[370] 71.682 35.610 39.116 19.782 55.412 29.400 20.974 87.625 28.144
[379] 19.349 53.308 115.123 101.788 24.824 14.292 20.088 26.400 19.253
[388] 16.529 37.878 83.948 135.118 73.327 25.974 17.316 49.794 12.096
[397] 13.364 57.872 37.728 18.701
Avg = mean(a)
Min.= min(a)
Max.= max(a)
SD = sd(a)
Var = var(a)
income.stats = c(Avg, Min., Max., SD, Var)
names(income.stats) = c("Mean", "Min.", "Max.", "StDev", "Var")
income.stats
      Mean       Min.       Max.      StDev        Var
  45.21889   10.35400  186.63400   35.24427 1242.15879
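For comparison, the same steps can be written with the variable and label names that task 3.1 specifies. The sketch below is illustrative only: to stay self-contained it uses just the first six Income values shown earlier, not the full column, so its numbers will not match the output above. The names sd.inc and var.inc are assumed from the "mean.inc, min.inc, etc." pattern.

```r
# Illustration only: the first six Income values, not the full 400-row column.
income.vect <- c(14.891, 106.025, 104.593, 148.924, 55.882, 80.180)

# One summary statistic per variable, following the mean.inc / min.inc pattern.
mean.inc <- mean(income.vect)
min.inc  <- min(income.vect)
max.inc  <- max(income.vect)
sd.inc   <- sd(income.vect)    # assumed name
var.inc  <- var(income.vect)   # assumed name

# Combine the five values, then label them; the order of the names
# must match the order of the values.
income.stats <- c(mean.inc, min.inc, max.inc, sd.inc, var.inc)
names(income.stats) <- c("Mean", "Min", "Max", "StDev", "Var")

income.stats

# A named vector can be indexed by label as well as by position.
income.stats["Max"]
```

Avoiding a and c as variable names also keeps the built-in c() function from being masked by a data object.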
3.2 Display a boxplot for the predictor Income. Tip: you can do this 2 ways. First, you can attach() the Credit data set (which loads the data set in the work environment) and then do a boxplot() for Income. Or, do it without attaching, but using the table prefix (i.e., Credit$Income). Use the xlab attribute to name the label “Income”. Then display a similar boxplot, but this time broken down by Gender (i.e. Income ~ Gender).
boxplot(Credit$Income, ylab="Amount (USD)",xlab="Income")
boxplot(Credit$Income~Credit$Gender, ylab="Income Amount (USD)", xlab="Gender based Distribution")
3.3 Display a histogram for the variable Rating, with the main title “Credit Rating Histogram” (main=) and X label “Rating” (xlab=). Then draw a QQ Plot for Rating (Tip: use the qqnorm() function first to draw the data points and then use the qqline() function to layer the QQ Line on top).
Rating = c(Credit$Rating)
hist(Rating, main="Credit Rating Histogram", xlab="Rating", ylab="Frequency")
qqnorm(Rating)
qqline(Rating)
3.4 Briefly answer: Do you think that this data is somewhat normally distributed? Why or why not? In your answer, please refer to both the histogram and the QQ Plot.
Response: The data is right-skewed, as can be observed on both plots: the data is concentrated on the left of the histogram and below the zero value on the QQ plot.
4.1 First, enter the command options(scipen=4) to minimize the display of values in scientific notation. Then, create a simple linear regression model object with the lm() function to fit credit Rating as a function of Income and save the results in an object named lm.rating. Then display the model summary results with the summary() function. Tip: use the formula Rating ~ Income, data=Credit inside the lm() function.
options(scipen=4)
lm.rating=lm(Rating~Income, data=Credit)
summary(lm.rating)
Call:
lm(formula = Rating ~ Income, data = Credit)

Residuals:
     Min       1Q   Median       3Q      Max
-173.855  -79.417   -0.384   79.747  171.955

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 197.8411     7.7089   25.66   <2e-16 ***
Income        3.4742     0.1345   25.83   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 94.71 on 398 degrees of freedom
Multiple R-squared: 0.6263, Adjusted R-squared: 0.6253
F-statistic: 667 on 1 and 398 DF, p-value: < 2.2e-16
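To make the coefficient table concrete, the fitted line can be read as Rating = 197.8411 + 3.4742 × Income. The sketch below copies those two estimates from the output above and evaluates the line by hand. The helper predict_rating() and the Income value 50 are hypothetical, chosen only for illustration.

```r
# Estimates copied from the summary(lm.rating) output above.
intercept <- 197.8411
slope     <- 3.4742

# Hypothetical helper: predicted Rating for a given Income under the fitted line.
predict_rating <- function(income) {
  intercept + slope * income
}

# 197.8411 + 3.4742 * 50 = 371.5511
predict_rating(50)

# Raising Income by one unit raises the predicted Rating by the slope.
predict_rating(51) - predict_rating(50)
```

With the fitted model object available, predict(lm.rating, newdata = data.frame(Income = 50)) should give essentially the same value, up to the rounding of the printed coefficients.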
4.2 Now, plot Credit Rating (Y axis) against Income (X axis), with respective labels “Income” and “Credit Rating”. Tip: feed the same formula you used in the lm() function above, but using the plot() function instead. Then draw a regression line by feeding lm.rating into the abline() function.
plot(Rating~Income, data=Credit, xlab="Income", ylab="Credit Rating")
abline(lm.rating)
4.3 Write a simple linear model to predict credit ratings using these predictors: Income, Limit, Cards, Married and Balance. Name the resulting model lm.rating.5. Then display the regression using the summary() function. No need to answer, but what do you think are the most influential predictors of credit rating?
lm.rating.5=lm(Rating~Income+Limit+Cards+Married+Balance, data=Credit)
summary(lm.rating.5)
Call:
lm(formula = Rating ~ Income + Limit + Cards + Married + Balance,
    data = Credit)

Residuals:
     Min       1Q   Median       3Q      Max
-24.0051  -7.0024  -0.9291   6.3789  26.2751

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) 27.1070066  2.1867611  12.396  < 2e-16 ***
Income       0.0975008  0.0335195   2.909  0.00383 **
Limit        0.0641536  0.0009004  71.247  < 2e-16 ***
Cards        4.7108256  0.3762419  12.521  < 2e-16 ***
MarriedYes   2.1217503  1.0441007   2.032  0.04281 *
Balance      0.0084355  0.0031308   2.694  0.00735 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.14 on 394 degrees of freedom
Multiple R-squared: 0.9958, Adjusted R-squared: 0.9957
F-statistic: 1.85e+04 on 5 and 394 DF, p-value: < 2.2e-16
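One rough way to approach the closing question in 4.3 is to rank the predictors by the absolute size of their t values, since the raw estimates are on different scales and are not directly comparable. The sketch below re-enters the estimates and t values from the table above by hand. The data frame coefs and the use of |t| as a yardstick are assumptions made for illustration, not the only reasonable measure of influence.

```r
# Estimates and t values copied from the summary(lm.rating.5) output above
# (intercept omitted because it is not a predictor).
coefs <- data.frame(
  term     = c("Income", "Limit", "Cards", "MarriedYes", "Balance"),
  estimate = c(0.0975008, 0.0641536, 4.7108256, 2.1217503, 0.0084355),
  t.value  = c(2.909, 71.247, 12.521, 2.032, 2.694)
)

# Sort from the largest to the smallest absolute t value.
ranked <- coefs[order(abs(coefs$t.value), decreasing = TRUE), ]
print(ranked, row.names = FALSE)
```

On this yardstick Limit stands out by a wide margin, followed by Cards. With the model object available, coef(summary(lm.rating.5)) returns the same table as a matrix, which avoids retyping the numbers.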