Assignment 1

We have a customer churn dataset from the financial industry, loaded and previewed below:

library(readr)
churn <- read_csv("https://raw.githubusercontent.com/ywchiu/rcathaybk/master/data/Churn_Modelling.csv")
## Parsed with column specification:
## cols(
##   RowNumber = col_integer(),
##   CustomerId = col_integer(),
##   Surname = col_character(),
##   CreditScore = col_integer(),
##   Geography = col_character(),
##   Gender = col_character(),
##   Age = col_integer(),
##   Tenure = col_integer(),
##   Balance = col_double(),
##   NumOfProducts = col_integer(),
##   HasCrCard = col_integer(),
##   IsActiveMember = col_integer(),
##   EstimatedSalary = col_double(),
##   Exited = col_integer()
## )
head(churn)
## # A tibble: 6 x 14
##   RowNumber CustomerId  Surname CreditScore Geography Gender   Age Tenure
##       <int>      <int>    <chr>       <int>     <chr>  <chr> <int>  <int>
## 1         1   15634602 Hargrave         619    France Female    42      2
## 2         2   15647311     Hill         608     Spain Female    41      1
## 3         3   15619304     Onio         502    France Female    42      8
## 4         4   15701354     Boni         699    France Female    39      1
## 5         5   15737888 Mitchell         850     Spain Female    43      2
## 6         6   15574012      Chu         645     Spain   Male    44      8
## # ... with 6 more variables: Balance <dbl>, NumOfProducts <int>,
## #   HasCrCard <int>, IsActiveMember <int>, EstimatedSalary <dbl>,
## #   Exited <int>

Use R to answer the following questions:

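  1. What proportion of customers have a credit card (HasCrCard)?
  2. What are the minimum, maximum, and mean estimated salary (EstimatedSalary)?
  3. List the customer IDs (CustomerId) of the three customers with the highest estimated salary (EstimatedSalary).
  4. Among customers whose estimated salary (EstimatedSalary) is 100,000 or more, what proportion churned (Exited)?
  5. Compute the mean estimated salary (EstimatedSalary) grouped by geography (Geography) and gender (Gender).
  6. Remove the RowNumber, CustomerId, and Surname columns.
  7. Continuing from question 6, convert Geography, Gender, HasCrCard, IsActiveMember, and Exited to factor columns.
  8. Continuing from question 7, build a classification model with a decision tree (rpart), using customer churn (Exited) as the target.
  9. Continuing from question 8, compute the model's accuracy and confusion matrix.
  10. Continuing from question 9, plot the model's ROC curve and compute the area under the curve (AUC).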
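
The questions above can be approached along the following lines. This is only a minimal sketch, not a reference solution. It runs on a small made-up data frame called `toy` (a hypothetical stand-in whose values are randomly generated, not taken from the real file), and it uses base R plus `rpart`. The assignment may well expect `dplyr` or a ROC package instead. The `>= 100000` cutoff is one reading of "100,000 or more", and the AUC is computed with the rank (Mann-Whitney) formula rather than a dedicated library.

```r
library(rpart)

# Toy stand-in for `churn`; every value here is made up for illustration.
set.seed(1)
n <- 200
toy <- data.frame(
  RowNumber       = seq_len(n),
  CustomerId      = 15000000 + seq_len(n),
  Surname         = paste0("S", seq_len(n)),
  Geography       = sample(c("France", "Spain", "Germany"), n, replace = TRUE),
  Gender          = sample(c("Female", "Male"), n, replace = TRUE),
  Age             = sample(18:80, n, replace = TRUE),
  HasCrCard       = rbinom(n, 1, 0.7),
  IsActiveMember  = rbinom(n, 1, 0.5),
  EstimatedSalary = runif(n, 0, 200000),
  stringsAsFactors = FALSE
)
toy$Exited <- rbinom(n, 1, ifelse(toy$Age > 50, 0.6, 0.15))

# Q1: proportion of customers with a credit card
p_card <- mean(toy$HasCrCard == 1)

# Q2: minimum, maximum, and mean estimated salary
salary_stats <- c(min  = min(toy$EstimatedSalary),
                  max  = max(toy$EstimatedSalary),
                  mean = mean(toy$EstimatedSalary))

# Q3: customer IDs of the three highest estimated salaries
top3 <- head(toy$CustomerId[order(toy$EstimatedSalary, decreasing = TRUE)], 3)

# Q4: churn proportion among customers with salary >= 100,000
churn_high <- mean(toy$Exited[toy$EstimatedSalary >= 100000] == 1)

# Q5: mean estimated salary by geography and gender
by_group <- aggregate(EstimatedSalary ~ Geography + Gender, data = toy, FUN = mean)

# Q6: drop identifier columns
model_df <- toy[, !(names(toy) %in% c("RowNumber", "CustomerId", "Surname"))]

# Q7: convert categorical columns to factors
factor_cols <- c("Geography", "Gender", "HasCrCard", "IsActiveMember", "Exited")
model_df[factor_cols] <- lapply(model_df[factor_cols], factor)

# Q8: decision tree with Exited as the target
fit <- rpart(Exited ~ ., data = model_df, method = "class")

# Q9: confusion matrix and accuracy (evaluated on the training data)
pred <- predict(fit, model_df, type = "class")
cm <- table(actual = model_df$Exited, predicted = pred)
accuracy <- sum(diag(cm)) / sum(cm)

# Q10: ROC points over all score thresholds, and AUC via the rank formula
prob   <- predict(fit, model_df, type = "prob")[, "1"]
is_pos <- model_df$Exited == "1"
thr <- sort(unique(c(Inf, prob)), decreasing = TRUE)
tpr <- sapply(thr, function(t) mean(prob[is_pos] >= t))
fpr <- sapply(thr, function(t) mean(prob[!is_pos] >= t))
plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # chance line

r <- rank(prob)  # ties receive average ranks
n_pos <- sum(is_pos)
n_neg <- sum(!is_pos)
auc <- (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

To try this on the real data, replace `toy` with `churn`. The accuracy and AUC here are measured on the same rows the tree was fitted on, so they are optimistic; holding out a test set would give a fairer estimate.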