Part 1: Data Exploration

Data Dictionary

Item | Variable Name | Definition | Theoretical Effect
1 | Index | Unique identifier (key) for a distinct service contract. | None
2 | Area | Location where the service was sold. | Unknown effect
3 | Service Approach | S is a particular service model; all other services are labeled NOT S. | The theory is that S will be much more effective than anything else we offer.
4 | Channel Approach | We sell services largely through other companies. This variable reflects two different channel models (FIRST and SECOND). | Unknown effect
5 | Renewal Flag | The key variable: whether or not a contract renewed. A contract that did not renew has now expired and represents lost business. | N/A (variable to predict)
6 | Contract Expiration Date | The date the contract expired or, if it renewed, the date it will expire. | Unknown effect
7 | Contract Length | Annual or multi-year renewal. | Multi-year contracts may be more likely to renew.
8 | Sales Category | Different categories of clients. | Unknown effect
9 | Seller Unique ID | Unique identifier for the actual seller of the service on behalf of the company. | Contracts from certain sellers may be more likely to renew.
10 | Contract Value Category | Buckets for contract value based on net (i.e., discounted) price. | Discounted contracts may be more likely to renew.
11 | Contract Line Category | Buckets based on the number of items on the contract: fewer than 10 is very small, 10-49 is small, 50-99 is medium, 100-999 is large, and more than 999 is very large. | Contracts with more items may be more likely to renew.
12 | Discount Category | Buckets representing the amount of discount applied to the catalogue price of the contract. Discounts are earned based on loyalty, length of contract, system configuration and geography. | Discounted contracts may be more likely to renew.
13 | Multiple Services | Flag indicating whether the contract has more than one service approach. When it does, the Service Approach column shows the highest-value service. | Unknown effect
14 | Item Count | Count of the items under service on the contract. | Contracts with more items may be more likely to renew.
15 | Cost | Catalogue cost for the service requested on the items of the contract. | Unknown effect
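Item 11 defines the contract-line buckets by item count. A minimal R sketch of that rule follows, assuming the buckets are computed from ITEM_COUNT (the six sample contracts listed under Summary Statistics are consistent with this). The helper name `line_category` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: map an item count to the CONTRACT_LINE_CATEGORY
# buckets described in item 11 of the data dictionary.
# Assumes the category is derived from ITEM_COUNT.
line_category <- function(item_count) {
  # With right = TRUE (the default), the intervals are (-Inf, 9], (9, 49],
  # (49, 99], (99, 999] and (999, Inf]; for whole-number counts this is
  # <10, 10-49, 50-99, 100-999 and >999.
  buckets <- cut(item_count,
                 breaks = c(-Inf, 9, 49, 99, 999, Inf),
                 labels = c("VERY SMALL", "SMALL", "MEDIUM", "LARGE", "VERY LARGE"))
  as.character(buckets)
}

# Item counts of the six sample contracts:
# expected "SMALL" five times, then "VERY SMALL"
line_category(c(38, 14, 11, 18, 17, 9))
```

Using `cut()` with explicit breaks keeps the bucket edges in one place, so a recomputed category can be compared directly against the CONTRACT_LINE_CATEGORY column to spot inconsistent rows.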

Summary Statistics

##      INDEX                    AREA       SERVICE_APPROACH CHANNEL_APPROACH
##  Min.   :    1   USA            :67978   NOT S:32817      FIRST :43042    
##  1st Qu.:22932   CANADA         : 6679   S    :58909      SECOND:48684    
##  Median :45864   US_CANADA_OTHER: 5019                                    
##  Mean   :45864   BRASIL         : 3059                                    
##  3rd Qu.:68795   CANSAC         : 2878                                    
##  Max.   :91726   MCO            : 2672                                    
##                  (Other)        : 3441                                    
##   RENEWAL_FLAG   CONTRACT_EXPIRATION   CONTRACT_LENGTH  SALES_CATEGORY
##  Min.   :0.000   7/31/2015 : 4475    ANNUAL    :75975   COM  :60259   
##  1st Qu.:0.000   12/31/2015: 2843    MULTI-YEAR:15751   ENT  : 7025   
##  Median :0.000   12/31/2016: 1938                       OTHER: 6562   
##  Mean   :0.319   3/31/2016 : 1470                       PS   :17571   
##  3rd Qu.:1.000   10/31/2016: 1425                       SMB  :    1   
##  Max.   :1.000   9/30/2015 : 1230                       SP   :  308   
##                  (Other)   :78345                                     
##  SELLER_UNIQUE_ID CONTRACT_VALUE_CATEGORY CONTRACT_LINE_CATEGORY
##  54     : 8773    <10K     :80500         LARGE     : 2601      
##  4139   : 3387    >250K    :  441         MEDIUM    : 2190      
##  #VALUE!: 3026    100K-250K:  826         SMALL     :12182      
##  47796  : 1712    10K-25K  : 6045         VERY LARGE:  262      
##  29543  : 1521    25K-50K  : 2551         VERY SMALL:74491      
##  15617  : 1367    50K-100K : 1363                               
##  (Other):71940                                                  
##    DISCOUNT_CATEGORY MULTIPLE_SERVICES   ITEM_COUNT             COST      
##  #DIV/0!    :  146   NO :91009         Min.   :    1.00   $101.00 :  955  
##  LARGE      :23028   YES:  717         1st Qu.:    1.00   $69.00  :  902  
##  MEDIUM     :42578                     Median :    2.00   $119.00 :  730  
##  NO DISCOUNT:16029                     Mean   :   22.71   $71.00  :  638  
##  VERY LARGE : 9945                     3rd Qu.:    6.00   $203.00 :  502  
##                                        Max.   :84184.00   $100.00 :  457  
##                                                           (Other) :87542  
##  SALES_STRATEGY 
##  GCS    :71126  
##  NOT GCS:20600  
##                 
##                 
##                 
##                 
## 
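The summary reports COST and CONTRACT_EXPIRATION as counts of distinct values rather than as numeric ranges, which suggests both columns were read in as text. A minimal sketch of converting them, assuming every value follows the formats seen in the sample rows ("$63,958.00" and month/day/year dates); `parse_cost` and `parse_expiration` are hypothetical helper names.

```r
# Hypothetical helpers for columns that were read in as text.
# Assumes COST looks like "$63,958.00" and CONTRACT_EXPIRATION like "12/29/2015".

# Strip the dollar sign and thousands separators, then convert to numeric.
# Values that still cannot be parsed become NA (with a coercion warning).
parse_cost <- function(cost) {
  as.numeric(gsub("[$,]", "", as.character(cost)))
}

# Parse month/day/year strings into Date objects.
parse_expiration <- function(expiration) {
  as.Date(as.character(expiration), format = "%m/%d/%Y")
}

parse_cost(c("$63,958.00", "$18,870.56", "$101.00"))
parse_expiration(c("12/29/2015", "7/31/2015"))
```

`as.character()` is applied first so the helpers also work when the columns were read as factors.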
INDEX | AREA | SERVICE_APPROACH | CHANNEL_APPROACH | RENEWAL_FLAG | CONTRACT_EXPIRATION | CONTRACT_LENGTH | SALES_CATEGORY | SELLER_UNIQUE_ID | CONTRACT_VALUE_CATEGORY | CONTRACT_LINE_CATEGORY | DISCOUNT_CATEGORY | MULTIPLE_SERVICES | ITEM_COUNT | COST | SALES_STRATEGY
1 | USA | S | FIRST | 1 | 12/29/2015 | ANNUAL | ENT | 54 | 25K-50K | SMALL | LARGE | NO | 38 | $63,958.00 | NOT GCS
2 | USA | S | FIRST | 1 | 9/30/2016 | ANNUAL | PS | 47796 | <10K | SMALL | VERY LARGE | NO | 14 | $18,966.00 | NOT GCS
3 | USA | S | SECOND | 0 | 7/31/2015 | ANNUAL | COM | 1611 | <10K | SMALL | NO DISCOUNT | NO | 11 | $5,518.00 | GCS
4 | USA | S | SECOND | 0 | 9/30/2015 | ANNUAL | PS | -989 | 25K-50K | SMALL | LARGE | NO | 18 | $62,150.00 | NOT GCS
5 | USA | S | FIRST | 0 | 1/19/2016 | ANNUAL | ENT | 24116 | 25K-50K | SMALL | NO DISCOUNT | NO | 17 | $18,870.56 | NOT GCS
6 | USA | S | SECOND | 1 | 12/22/2016 | ANNUAL | COM | 25276 | <10K | VERY SMALL | MEDIUM | NO | 9 | $10,682.00 | GCS

Our data set contains recent service contracts with Company X, some of which were renewed and some of which were not. Our goal is to predict whether a service contract will be renewed based on the attributes in the data set. A renewal is treated as the successful outcome.

There are 91,726 rows of data, each representing one service contract with Company X.

Besides the response variable, RENEWAL_FLAG, which indicates whether the contract was renewed, the data set has 15 columns. One of them, INDEX, is only a unique key, which leaves 14 potential predictor variables.
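Because RENEWAL_FLAG is coded 0/1, its mean is the share of renewed contracts. Assuming 1 means renewed, the summary's mean of 0.319 says roughly 32% of contracts renewed, so a model that always predicts "not renewed" would already be right about 68% of the time (1 - 0.319 = 0.681); any model we build should beat that baseline. The sketch below computes both figures for a 0/1 vector; `renewal_baseline` is a hypothetical helper and the toy vector is illustrative, not drawn from the data.

```r
# Hypothetical helper: share of renewals and the accuracy of a naive
# classifier that always predicts the majority class.
# Assumes renewal_flag is coded 1 = renewed, 0 = not renewed.
renewal_baseline <- function(renewal_flag) {
  rate <- mean(renewal_flag == 1)
  c(renewal_rate = rate, majority_accuracy = max(rate, 1 - rate))
}

# Toy example: 3 renewals out of 10 contracts
# expected renewal_rate 0.3 and majority_accuracy 0.7
renewal_baseline(c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0))
```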

Analysis of Predictor Variables
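A quick first check on the theoretical effects listed in the data dictionary is to compare renewal rates across the levels of each categorical predictor, for example SERVICE_APPROACH (S vs. NOT S) or CONTRACT_LENGTH (ANNUAL vs. MULTI-YEAR). The sketch below uses base R's `split()`; the helper `renewal_by_level` and the data frame `contracts` are hypothetical, and the values in `contracts` are made up for illustration.

```r
# Toy data frame with illustrative values only (not the real contracts).
contracts <- data.frame(
  SERVICE_APPROACH = c("S", "S", "S", "NOT S", "NOT S", "S"),
  CONTRACT_LENGTH  = c("ANNUAL", "MULTI-YEAR", "ANNUAL", "ANNUAL", "ANNUAL", "MULTI-YEAR"),
  RENEWAL_FLAG     = c(1, 1, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# Renewal rate (mean of the 0/1 flag) within each level of a predictor,
# plus the number of contracts per level so that thin levels stand out.
renewal_by_level <- function(df, predictor, response = "RENEWAL_FLAG") {
  groups <- split(df[[response]], df[[predictor]])
  data.frame(
    level = names(groups),
    rate  = unname(vapply(groups, mean, numeric(1))),
    n     = unname(vapply(groups, length, integer(1))),
    stringsAsFactors = FALSE
  )
}

renewal_by_level(contracts, "SERVICE_APPROACH")
renewal_by_level(contracts, "CONTRACT_LENGTH")
```

Reporting `n` alongside the rate matters here: the summary shows levels such as SALES_CATEGORY = SMB with a single contract, whose rate would be either 0 or 1 and says little on its own.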

Missing Data to be Imputed
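The summary output shows spreadsheet error strings stored as data values: 3,026 SELLER_UNIQUE_ID entries read "#VALUE!" and 146 DISCOUNT_CATEGORY entries read "#DIV/0!". A minimal sketch, assuming these two codes (and only these) should be treated as missing before imputation, recodes them to NA and counts missing values per column. The helper `flag_spreadsheet_errors` is hypothetical and the toy data frame is illustrative.

```r
# Hypothetical helper: replace spreadsheet error codes with NA in every
# character or factor column (factor columns are returned as character).
# Assumes only these codes mark missing data.
flag_spreadsheet_errors <- function(df, codes = c("#VALUE!", "#DIV/0!")) {
  for (col in names(df)) {
    if (is.character(df[[col]]) || is.factor(df[[col]])) {
      values <- as.character(df[[col]])
      values[values %in% codes] <- NA
      df[[col]] <- values
    }
  }
  df
}

# Toy data with illustrative values only
toy <- data.frame(
  SELLER_UNIQUE_ID  = c("54", "#VALUE!", "47796"),
  DISCOUNT_CATEGORY = c("LARGE", "MEDIUM", "#DIV/0!"),
  stringsAsFactors = FALSE
)

cleaned <- flag_spreadsheet_errors(toy)
colSums(is.na(cleaned))  # one NA in each column
```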

The transformed data set is available at:

https://raw.githubusercontent.com/spsstudent15/2016-02-621-W2/master/621-FP-Transformed-Data.csv