ITEC 621 Exercise 1 - R Refresher

Author

Sohaib Anwar

Published

January 17, 2026

General Instructions

This exercise is similar to HW0 in KSB-999, which you were required to complete before starting this course. If you already did that, this should be an easy exercise and a good warm-up refresher. If you didn’t, this is your opportunity to catch up. This course moves fast and assumes that you have some familiarity with R.

Download the R Quarto template for this exercise Ex1_R_YourLastName.Qmd and save it to your class project folder. Rename the file to replace YourLastName with your own last name.
Open the file in RStudio and complete the coding exercises and answer the interpretation questions. Run the code to ensure everything is working fine.
I will always display the exercise code output in the instructions on Canvas so that you can compare your results against the solution. Technical Note: Your quantitative and visual outputs should look identical to the outputs shown in the homework. Now is a good time to compare your results and to ask me questions if they differ.
When done, knit your R Quarto file into a Word document. On Canvas, please submit both the knitted Word document AND the .qmd file. If you have trouble knitting a Word file, you can knit to a PDF file, or knit to an HTML file and then save it as a PDF.

Knitting

Preparing analytic reports using the {knitr} package is an important learning objective of this course. Your R Quarto file MUST have the attribute echo=T in the {r global options} so that we can see and grade your R code. You are required to knit your homework R Quarto files to a Word (preferred) or PDF file. The knitted document should adhere to proper business-like formatting and appearance. This means all interpretations should be in markdown sections, NOT code comments.

Inadequate or missing knitting will carry point deductions of up to 30 points.
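The global option mentioned above is typically set in a setup chunk at the top of the .qmd file. A minimal sketch using knitr's chunk-option setter, knitr::opts_chunk$set(); note that echo=T relies on T, R's shorthand for TRUE, and writing TRUE in full is safer because T is an ordinary variable that can be reassigned:

```r
# Setup chunk: show the R code of every chunk (echo = TRUE)
# in the knitted Word/PDF document so it can be graded.
knitr::opts_chunk$set(echo = TRUE)
```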

1. Basic R Concepts

1.1 Write a simple R function named area() that takes 2 values as parameters (x and y, representing the two sides of a rectangle) and returns the product of the two values (representing the rectangle’s area). Then use this function to display the area of a rectangle of sides 6x4. Then, use the functions paste(), print() and area() to output this result: The area of a rectangle of sides 6x4 is 24, where 24 is calculated with the area() function you just created.

area <- function(x, y) {
  x * y  # area of a rectangle = product of its two sides
}
area(6, 4)

[1] 24

print(paste("The area of a rectangle of sides 6x4 is", area(6, 4)))

[1] "The area of a rectangle of sides 6x4 is 24"

1.2 Write a simple for loop for i from 1 to 10. In each loop cycle, compute the area of a rectangle of sides i and i*2 using the area() function created in 1.1 (i.e., in every rectangle one side is twice as long as the other), and for each of the 10 rectangles display “The area of a 1 x 2 rectangle is 2” for i=1, “The area of a 2 x 4 rectangle is 8”, and so on.

for (i in 1:10) {
  # area() is the function defined in 1.1
  print(paste("The area of a", i, "x", i * 2, "rectangle is", area(i, i * 2)))
}

[1] "The area of a 1 x 2 rectangle is 2"
[1] "The area of a 2 x 4 rectangle is 8"
[1] "The area of a 3 x 6 rectangle is 18"
[1] "The area of a 4 x 8 rectangle is 32"
[1] "The area of a 5 x 10 rectangle is 50"
[1] "The area of a 6 x 12 rectangle is 72"
[1] "The area of a 7 x 14 rectangle is 98"
[1] "The area of a 8 x 16 rectangle is 128"
[1] "The area of a 9 x 18 rectangle is 162"
[1] "The area of a 10 x 20 rectangle is 200"
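Because * and paste() are vectorized in R, area() also accepts whole vectors, so the ten areas from 1.2 can be computed without a loop. A sketch; the names sides, areas and msgs are illustrative, and area() is redefined so the snippet runs on its own:

```r
# Same area() as in 1.1: product of the two sides
area <- function(x, y) {
  x * y
}

sides <- 1:10                    # i = 1, 2, ..., 10
areas <- area(sides, sides * 2)  # one side twice as long as the other

# paste() recycles element-wise, producing one message per rectangle
msgs <- paste("The area of a", sides, "x", sides * 2, "rectangle is", areas)
print(msgs)
```

The loop remains the point of the exercise; the vectorized form simply shows that R arithmetic and paste() work element by element.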

2. Data Manipulation

2.1 Copy the Credit.csv data file to your working directory (if you haven’t done this yet). Then read the Credit.csv data file into a data frame object named Credit (Tip: use the read.table() function with the parameters header=T, sep=",", row.names=1). Then, list the first 5 columns of the top 5 rows (Tip: use Credit[1:5,1:5])

getwd()

[1] "C:/Users/auuser/Desktop/R Exercise 1"

Credit <- read.table("Credit.csv", header = TRUE, sep = ",", row.names = 1)
print(Credit[1:5,1:5])

   Income Limit Rating Cards Age
1  14.891  3606    283     2  34
2 106.025  6645    483     3  82
3 104.593  7075    514     4  71
4 148.924  9504    681     3  36
5  55.882  4897    357     2  68
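read.csv() is read.table() with header = TRUE and sep = "," already set, so either call works here. The sketch below checks this on a small temporary file holding the five rows printed above; the file path comes from tempfile() and the object names are illustrative, so nothing depends on Credit.csv being present:

```r
# Rebuild the five rows shown above as a small data frame
credit.small <- data.frame(
  Income = c(14.891, 106.025, 104.593, 148.924, 55.882),
  Limit  = c(3606L, 6645L, 7075L, 9504L, 4897L),
  Rating = c(283L, 483L, 514L, 681L, 357L),
  Cards  = c(2L, 3L, 4L, 3L, 2L),
  Age    = c(34L, 82L, 71L, 36L, 68L)
)

# Write a temporary CSV; write.csv() stores the row names as the first column
path <- tempfile(fileext = ".csv")
write.csv(credit.small, path)

# row.names = 1 turns that first column back into row names
via.table <- read.table(path, header = TRUE, sep = ",", row.names = 1)
via.csv   <- read.csv(path, row.names = 1)

print(via.table[1:5, 1:5])
```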

2.2 Using the class() function, display the object class for the Credit dataset, and for Gender (i.e., Credit$Gender), Income and Cards.

class(Credit)

[1] "data.frame"

class(Credit$Gender)

[1] "character"

class(Credit$Income)

[1] "numeric"

class(Credit$Cards)

[1] "integer"
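Instead of calling class() once per column, sapply() applies it to every column of a data frame at once. The sketch rebuilds the five numeric columns shown in 2.1 so it runs without the file. Gender comes back as "character" rather than "factor" because, since R 4.0.0, read.table() defaults to stringsAsFactors = FALSE.

```r
# Five numeric columns from the rows printed in 2.1
credit.small <- data.frame(
  Income = c(14.891, 106.025, 104.593, 148.924, 55.882),
  Limit  = c(3606L, 6645L, 7075L, 9504L, 4897L),
  Rating = c(283L, 483L, 514L, 681L, 357L),
  Cards  = c(2L, 3L, 4L, 3L, 2L),
  Age    = c(34L, 82L, 71L, 36L, 68L)
)

# One class per column, returned as a named character vector
col.classes <- sapply(credit.small, class)
print(col.classes)
```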

2.3 Create a vector named income.vect with data from the Income column of the Credit dataframe. Then use the head() function to display the first 6 values of this vector.

income.vect <- Credit$Income
head(income.vect)

[1]  14.891 106.025 104.593 148.924  55.882  80.180
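The $ operator is one of three equivalent ways to pull a single column out of a data frame as a plain vector; [[ ]] and [ , "name"] return the same values. A sketch on the six Income values printed above, with a one-column data frame rebuilt so it runs on its own:

```r
# The first six Income values shown above, as a one-column data frame
credit.small <- data.frame(Income = c(14.891, 106.025, 104.593, 148.924, 55.882, 80.180))

v1 <- credit.small$Income        # $ with the column name
v2 <- credit.small[["Income"]]   # [[ ]] with the name as a string
v3 <- credit.small[, "Income"]   # [row, column] indexing, single column drops to a vector

head(v1, 3)  # head() takes an optional n; the default is 6
```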

3. Basic Descriptive Analytics

3.1 Compute the mean, minimum, maximum, standard deviation and variance for all the values in this income vector. Store the respective results in variables named mean.inc, min.inc, etc. Then, use the c() function to create a vector called income.stats with the 5 values you computed above. Then use the names() function to give them the corresponding names “Mean”, “Min”, “Max”, “StDev”, and “Var”. Then display income.stats.

Technical Note: names() must be assigned a vector of names that correspond, in order, to the values in income.stats. Therefore, you need to use the c() function to create a vector with these 5 names.

income.vect <- Credit$Income  # income vector from 2.3
mean.inc <- mean(income.vect)
min.inc <- min(income.vect)
max.inc <- max(income.vect)
sd.inc <- sd(income.vect)
var.inc <- var(income.vect)
income.stats <- c(mean.inc, min.inc, max.inc, sd.inc, var.inc)
names(income.stats) <- c("Mean", "Min", "Max", "StDev", "Var")
income.stats

      Mean        Min        Max      StDev        Var 
  45.21889   10.35400  186.63400   35.24427 1242.15879
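The five intermediate variables make each step explicit. The same named vector can also be built in one call by naming the elements inside c(), and the variance can be cross-checked as the squared standard deviation. A sketch on the six Income values from 2.3, so its numbers differ from the full-data results above:

```r
x <- c(14.891, 106.025, 104.593, 148.924, 55.882, 80.180)  # first six incomes

# Name the elements directly inside c()
stats <- c(Mean = mean(x), Min = min(x), Max = max(x), StDev = sd(x), Var = var(x))
print(round(stats, 3))

# var() and sd() both use the n - 1 denominator, so Var equals StDev squared
all.equal(stats[["Var"]], stats[["StDev"]]^2)
```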

3.2 Display a boxplot for the predictor Income. Tip: you can do this in 2 ways. You can attach() the Credit data set (which makes its columns available directly in the work environment) and then call boxplot() on Income, or you can skip attaching and use the data frame prefix (i.e., Credit$Income). Use the xlab attribute to set the label to “Income”. Then display a similar boxplot, but this time broken down by Gender (i.e., Income ~ Gender).

boxplot(Credit$Income, ylab = "Amount (USD)", xlab = "Income")

boxplot(Income ~ Gender, data = Credit, ylab = "Income Amount (USD)", xlab = "Gender based Distribution")
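A boxplot is drawn from a five-number summary: the median, the two hinges (close to the quartiles), and whiskers that reach the most extreme values within 1.5 box lengths of the hinges; anything beyond is drawn as an individual outlier point. boxplot.stats() returns these numbers without plotting. A sketch on the six Income values from 2.3:

```r
x <- c(14.891, 106.025, 104.593, 148.924, 55.882, 80.180)  # first six incomes

bs <- boxplot.stats(x)  # coef = 1.5 by default
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values drawn as individual outlier points (none for these six)
```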

3.3 Display a histogram for the variable Rating, with the main title “Credit Rating Histogram” (main=) and X label “Rating” (xlab=). Then draw a QQ Plot for Rating (Tip: use the qqnorm() function first to draw the data points and then use the qqline() function to layer the QQ Line on top).

Rating <- Credit$Rating
hist(Rating, main = "Credit Rating Histogram", xlab = "Rating", ylab = "Frequency")

qqnorm(Rating)
qqline(Rating)
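hist() and qqnorm() can also return the numbers behind the plots instead of drawing them (plot = FALSE and plot.it = FALSE). On a QQ plot, each sorted data value is paired with a theoretical standard normal quantile, so points lying close to the qqline() line suggest approximate normality. A sketch on the five Rating values printed in 2.1:

```r
rating <- c(283, 483, 514, 681, 357)  # Rating for the five rows shown in 2.1

h <- hist(rating, plot = FALSE)
h$breaks  # bin edges (default breaks = "Sturges")
h$counts  # number of observations per bin

qq <- qqnorm(rating, plot.it = FALSE)
# qq$x holds normal quantiles and qq$y the data in original order;
# sorting both pairs the smallest value with the lowest quantile.
cbind(theoretical = sort(qq$x), sample = sort(qq$y))
```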

3.4 Briefly answer: Do you think that this data is somewhat normally distributed? Why or why not? In your answer, please refer to both the histogram and the QQ plot.

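Response: No. The Rating data is right-skewed rather than normally distributed. The histogram shows most observations concentrated on the left with a longer tail extending to the right, and in the QQ plot the points bend away from the straight QQ line instead of following it, especially at the tails.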
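To back up the visual judgment with a number, sample skewness measures asymmetry: it is near 0 for symmetric data and positive when the right tail is longer. Base R has no skewness function, so the helper below is a hypothetical one written for illustration, using the moment formula g1 = m3 / m2^(3/2):

```r
# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2)
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^(3 / 2)
}

skewness(c(1, 2, 3, 4, 5))   # symmetric: 0
skewness(c(1, 1, 1, 2, 10))  # long right tail: positive
```

Applied to Credit$Rating, a clearly positive value would support the right-skew reading of the histogram.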

4. Basic Predictive Analytics

4.1 First, enter the command options(scipen=4) so that R prefers fixed notation over scientific notation when displaying values. Then, create a simple linear regression model object with the lm() function to fit credit Rating as a function of Income and save the results in an object named lm.rating. Then display the model summary results with the summary() function. Tip: use the formula Rating ~ Income, data=Credit inside the lm() function.

options(scipen = 4)
lm.rating <- lm(Rating ~ Income, data = Credit)
summary(lm.rating)


Call:
lm(formula = Rating ~ Income, data = Credit)

Residuals:
     Min       1Q   Median       3Q      Max 
-173.855  -79.417   -0.384   79.747  171.955 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 197.8411     7.7089   25.66   <2e-16 ***
Income        3.4742     0.1345   25.83   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 94.71 on 398 degrees of freedom
Multiple R-squared:  0.6263,    Adjusted R-squared:  0.6253 
F-statistic:   667 on 1 and 398 DF,  p-value: < 2.2e-16
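For a simple regression, the least-squares slope equals cov(x, y) / var(x), and the fitted line passes through the point of means, so the intercept is mean(y) minus slope times mean(x). The sketch checks this against coef() on the five rows printed in 2.1, so the estimates differ from the full-data output above; predict() is shown for a hypothetical Income value of 50:

```r
# Income and Rating for the five rows shown in 2.1
credit.small <- data.frame(
  Income = c(14.891, 106.025, 104.593, 148.924, 55.882),
  Rating = c(283, 483, 514, 681, 357)
)

fit <- lm(Rating ~ Income, data = credit.small)
b <- coef(fit)  # named vector: (Intercept), Income

# Closed-form least-squares estimates for a single predictor
slope <- cov(credit.small$Income, credit.small$Rating) / var(credit.small$Income)
intercept <- mean(credit.small$Rating) - slope * mean(credit.small$Income)
all.equal(unname(b), c(intercept, slope))

# Predicted rating for a hypothetical Income value of 50
predict(fit, newdata = data.frame(Income = 50))
```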

4.2 Now, plot Credit Rating (Y axis) against Income (X axis), with respective labels “Income” and “Credit Rating”. Tip: feed the same formula you used in the lm() function above, but using the plot() function instead. Then draw a regression line by feeding lm.rating into the abline() function.

plot(Rating ~ Income, data = Credit, xlab = "Income", ylab = "Credit Rating")
abline(lm.rating)
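abline(lm.rating) reads the intercept and slope from the model object, so it draws the same line as abline(a = intercept, b = slope). Every fitted value lies on that line, and the vertical gaps to the points are the residuals. The sketch checks this numerically on the five rows from 2.1 and draws to a null pdf device so it runs without a screen:

```r
credit.small <- data.frame(
  Income = c(14.891, 106.025, 104.593, 148.924, 55.882),
  Rating = c(283, 483, 514, 681, 357)
)
fit <- lm(Rating ~ Income, data = credit.small)
b <- coef(fit)

pdf(NULL)  # null device: nothing is written or displayed
plot(Rating ~ Income, data = credit.small, xlab = "Income", ylab = "Credit Rating")
abline(a = b[1], b = b[2])  # same line as abline(fit)
# Dashed vertical segments from each point to the line show the residuals
segments(credit.small$Income, credit.small$Rating,
         credit.small$Income, fitted(fit), lty = 2)
invisible(dev.off())

# Each fitted value is intercept + slope * Income
all.equal(unname(fitted(fit)), unname(b[1] + b[2] * credit.small$Income))
```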

4.3 Fit a multiple linear regression model to predict credit ratings using these predictors: Income, Limit, Cards, Married and Balance. Name the resulting model lm.rating.5. Then display the regression results using the summary() function. No need to answer, but which do you think are the most influential predictors of credit rating?

lm.rating.5 <- lm(Rating ~ Income + Limit + Cards + Married + Balance, data = Credit)
summary(lm.rating.5)


Call:
lm(formula = Rating ~ Income + Limit + Cards + Married + Balance, 
    data = Credit)

Residuals:
     Min       1Q   Median       3Q      Max 
-24.0051  -7.0024  -0.9291   6.3789  26.2751 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 27.1070066  2.1867611  12.396  < 2e-16 ***
Income       0.0975008  0.0335195   2.909  0.00383 ** 
Limit        0.0641536  0.0009004  71.247  < 2e-16 ***
Cards        4.7108256  0.3762419  12.521  < 2e-16 ***
MarriedYes   2.1217503  1.0441007   2.032  0.04281 *  
Balance      0.0084355  0.0031308   2.694  0.00735 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.14 on 394 degrees of freedom
Multiple R-squared:  0.9958,    Adjusted R-squared:  0.9957 
F-statistic: 1.85e+04 on 5 and 394 DF,  p-value: < 2.2e-16
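Two notes on this output, sketched on stand-in data since the snippet cannot assume Credit.csv is available. First, lm() turns a character predictor such as Married into a 0/1 dummy column named after the non-baseline level; with the default treatment coding the baseline is the first level alphabetically ("No"), which is why the summary shows a MarriedYes row. Second, one rough way to rank predictors is by absolute t value; this reflects statistical significance given the other predictors, not effect size, since the coefficients are on different scales. The toy Married vector is hypothetical, and R's built-in mtcars data stands in for Credit in the ranking step:

```r
# Hypothetical toy data, only to show how a character predictor is encoded
toy <- data.frame(Married = c("Yes", "No", "Yes"))
mm <- model.matrix(~ Married, data = toy)
print(mm)  # columns: (Intercept) and MarriedYes (1 = Yes, 0 = No)

# Rank predictors by |t value| in a multiple regression (mtcars as a stand-in)
fit <- lm(mpg ~ wt + hp + cyl, data = mtcars)
coefs <- coef(summary(fit))          # Estimate, Std. Error, t value, Pr(>|t|)
t.abs <- abs(coefs[-1, "t value"])   # drop the intercept row
sort(t.abs, decreasing = TRUE)
```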