Load the required packages and read in the data

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
credit <- read.table("http://nathanieldphillips.com/wp-content/uploads/2015/05/credit.txt",
                     sep = ",",
                     header = TRUE,
                     stringsAsFactors = FALSE)


Question 1: A: How many rows and columns are in the dataframe?

nrow(credit)
## [1] 1000
ncol(credit)
## [1] 17
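As a side note, both numbers can probably be obtained in one call with `dim()`. The sketch below uses a tiny made-up data frame (`toy` is hypothetical, not the credit data) just to show the shape of the result.

```r
# Hypothetical 3-row, 2-column data frame standing in for `credit`
toy <- data.frame(x = 1:3, y = c("a", "b", "c"))

# dim() returns c(number of rows, number of columns)
dims <- dim(toy)
print(dims)
```

On the real data this would be `dim(credit)`, which should agree with the `nrow()` and `ncol()` results above.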

B: What are the names of the columns in the dataframe?

names(credit)
##  [1] "checking_balance"     "months_loan_duration" "credit_history"      
##  [4] "purpose"              "amount"               "savings_balance"     
##  [7] "employment_duration"  "percent_of_income"    "years_at_residence"  
## [10] "age"                  "other_credit"         "housing"             
## [13] "existing_loans_count" "job"                  "dependents"          
## [16] "phone"                "default"

Question 2: The column credit_history is a character value indicating how good the credit history of the customer was.

What are the different values of the credit_history variable and how often did each occur?
table(credit$credit_history)
## 
##  critical      good   perfect      poor very good 
##       293       530        40        88        49
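If shares are easier to read than raw counts, `prop.table()` can be wrapped around the table. This is only a sketch on a short made-up vector (`history` is hypothetical and much smaller than the real column).

```r
# Hypothetical credit-history values, not taken from the credit data
history <- c("good", "critical", "good", "poor", "good")

# Count each value, then convert the counts to proportions
counts <- table(history)
shares <- prop.table(counts)
print(round(shares, 2))
```

With the real data the equivalent call would be `prop.table(table(credit$credit_history))`.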

What is the mean loan amount (column is called amount) for each level of credit_history?

aggregate(amount ~ credit_history, data = credit, FUN = mean, na.rm = TRUE)
##   credit_history   amount
## 1       critical 3088.038
## 2           good 3040.958
## 3        perfect 5305.675
## 4           poor 4302.602
## 5      very good 3344.878

What is the median age for each level of credit_history?

aggregate(age ~ credit_history, data = credit, FUN = median, na.rm = TRUE)
##   credit_history age
## 1       critical  36
## 2           good  31
## 3        perfect  32
## 4           poor  34
## 5      very good  34

Question 3: What was the purpose of the highest loan amount? (Hint: Start by answering the question: What was the maximum loan amount for each loan purpose?)

library(dplyr)
credit %>% group_by(purpose) %>% summarise(amount.max = max(amount)) 
## Source: local data frame [5 x 2]
## 
##                purpose amount.max
##                  (chr)      (int)
## 1             business      15945
## 2                  car      18424
## 3            education      12612
## 4 furniture/appliances      15653
## 5          renovations      11998

What was the purpose of the smallest loan amount?

aggregate(amount ~ purpose, data = credit, FUN = min, na.rm = TRUE)
##                purpose amount
## 1             business    609
## 2                  car    250
## 3            education    339
## 4 furniture/appliances    338
## 5          renovations    454
Sorry, neither my neighbors nor I know how to extract the minimum/maximum amount directly, so in real life I would simply look at the output and see that the purpose for the smallest amount is a car, and for the highest it's also a car.
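One possible way to pull the purpose out directly, instead of reading it off the table, is to use `which.max()` and `which.min()` to find the row of the largest or smallest amount and then index the purpose column with it. The data frame `toy` below is hypothetical (made-up values with the same two column names), so this is a sketch of the idea rather than the answer itself.

```r
# Hypothetical stand-in for `credit`; these amounts are made up
toy <- data.frame(
  purpose = c("car", "business", "education", "car"),
  amount  = c(500, 9000, 1200, 12000),
  stringsAsFactors = FALSE
)

# which.max() / which.min() return the row index of the largest / smallest
# amount; that index then picks out the matching purpose
purpose_of_max <- toy$purpose[which.max(toy$amount)]
purpose_of_min <- toy$purpose[which.min(toy$amount)]

print(purpose_of_max)
print(purpose_of_min)
```

On the real data the same pattern would be `credit$purpose[which.max(credit$amount)]` and `credit$purpose[which.min(credit$amount)]`. Note that both functions return only the first matching row if there are ties.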

Question 4: Does it look like there is a relationship between a person’s housing status and their age?
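A rough way to approach this question would be to compare the median age across housing groups and then look at a boxplot. The sketch below runs on a small hypothetical data frame; the housing labels and ages in `toy` are assumptions for illustration and say nothing about the real credit data.

```r
# Hypothetical housing/age values, not taken from the credit data
toy <- data.frame(
  housing = c("own", "rent", "own", "other", "rent", "other"),
  age     = c(45, 24, 38, 52, 29, 60)
)

# Median age per housing group
age_by_housing <- aggregate(age ~ housing, data = toy, FUN = median)
print(age_by_housing)

# Side-by-side boxplots show how much the age distributions overlap
boxplot(age ~ housing, data = toy,
        xlab = "Housing status", ylab = "Age")
```

For the actual answer, `toy` would be swapped for `credit`, using its `housing` and `age` columns.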
