Let’s get a quick overview of the dataset you gave me to see if any immediate properties stick out.
## GenderDesc RaceDesc Age
## Female: 30 American Indian or Alaska Native: 5 Min. :22.00
## Male :196 Black or African American : 31 1st Qu.:39.25
## Multi-Racial : 4 Median :53.00
## White :186 Mean :49.96
## 3rd Qu.:59.00
## Max. :86.00
##
## EnrollDate ExitDate DaysEnrolled
## Min. :2014-07-21 Min. :2014-07-22 Min. : 2.00
## 1st Qu.:2014-11-05 1st Qu.:2015-01-07 1st Qu.: 19.50
## Median :2015-01-07 Median :2015-03-31 Median : 61.00
## Mean :2015-01-17 Mean :2015-03-11 Mean : 71.12
## 3rd Qu.:2015-03-29 3rd Qu.:2015-05-31 3rd Qu.:120.25
## Max. :2015-07-17 Max. :2015-07-31 Max. :184.00
## NA's :30
## LenthofStay Enrolled StillEnrolled Exited
## Min. : 1.00 Min. :1 Min. :0.000 Min. :0.000
## 1st Qu.: 19.50 1st Qu.:1 1st Qu.:0.000 1st Qu.:1.000
## Median : 68.50 Median :1 Median :0.000 Median :1.000
## Mean : 74.27 Mean :1 Mean :0.146 Mean :0.854
## 3rd Qu.:133.00 3rd Qu.:1 3rd Qu.:0.000 3rd Qu.:1.000
## Max. :183.00 Max. :1 Max. :1.000 Max. :1.000
## NA's :30
## ClientID
## Min. : 1097
## 1st Qu.:37856
## Median :54920
## Mean :47384
## 3rd Qu.:60429
## Max. :69247
##
## ExitReason
## Completed Program :178
## Criminal activity/destruction of property/violence : 1
## Death : 1
## Left for a housing opportunity before completing program: 4
## Non-Compliance with Program : 4
## Other : 4
## NA's : 34
## ExitDestination
## Rental by client, no ongoing housing subsidy :99
## Rental by client, VASH Subsidy :62
## Emergency Shelter, including hotel or motel paid for with shelter voucher :10
## Place not meant for habitation (e.g., a vehicle, an abandoned building, bus/train/subway station/airport or anywhere outside): 4
## Rental by client, other (non-VASH) ongoing housing subsidy : 4
## (Other) :17
## NA's :30
## monthExited
## Min. : 0.000
## 1st Qu.: 2.000
## Median : 4.000
## Mean : 4.845
## 3rd Qu.: 6.000
## Max. :12.000
##
What’s piqued my interest is the percentage of clients who exited the program in a “bad” way, and whether any particular properties stand out among them.
NOTE: I’m going to consider any exit that doesn’t result in renting a “bad” result (hereafter nonRental). The majority of this analysis is an attempt to understand these outcomes.
A high nonRental percentage is a bad thing, so we will typically be looking for low spots in the graphs, because these mean high rental outcome rates (and, correspondingly, low nonRental outcome rates).
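For reference, here is a minimal sketch of how a nonRental flag like this could be derived from ExitDestination. The toy rows, the "starts with `Rental by client`" rule, and leaving clients with no recorded destination as NA are all my assumptions for illustration, not necessarily how the numbers in this report were produced.

```r
# Toy stand-in for the real client table (made-up rows, real destination labels).
clients <- data.frame(
  ExitDestination = c(
    "Rental by client, no ongoing housing subsidy",
    "Rental by client, VASH Subsidy",
    "Emergency Shelter, including hotel or motel paid for with shelter voucher",
    NA
  ),
  stringsAsFactors = FALSE
)

# Assumption: a destination is a rental when its label starts with "Rental by client".
is_rental <- startsWith(clients$ExitDestination, "Rental by client")

# 1 = nonRental exit, 0 = rental exit, NA = no destination recorded.
clients$nonRentals <- ifelse(is_rental, 0L, 1L)

print(clients$nonRentals)
```

Whether clients without an exit destination should stay NA or be dropped before computing rates is a judgment call that changes the percentages, so it is worth deciding up front.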
It looks like the most common nonRental outcome is Emergency Shelter, followed by places not meant for habitation.
There do appear to be peaks in nonRental outcomes around age 30 and among 50-60 year olds, but as you can see from the regression line, the overall effect of age is small.
This is explained by the fact that you have two main sets of age groups that you work with, namely a ~30 year old group and a 50-65 year old group.
In our early conversations and in my exploratory analysis I noticed a strong correlation between lack of completion and Female clients. Let’s see if this holds true for nonRental outcomes.
## GenderDesc nonRentals
## Female:30 Min. :0.0
## Male : 0 1st Qu.:0.0
## Median :0.0
## Mean :0.3
## 3rd Qu.:1.0
## Max. :1.0
## GenderDesc nonRentals
## Female: 0 Min. :0.0000
## Male :196 1st Qu.:0.0000
## Median :0.0000
## Mean :0.1122
## 3rd Qu.:0.0000
## Max. :1.0000
Females have a 70% chance of renting in the program (1 - 0.3), while males have an 88.8% chance (1 - 0.1122).
Let’s investigate this further. As in the overall results, most nonRental exits are to Emergency Shelters, and this holds true for females as well.
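The per-gender rates above are just group means of the 0/1 nonRental flag. A minimal sketch of that calculation, using made-up rows and assuming the column names GenderDesc and nonRentals from the summaries above:

```r
# Made-up example rows; only the column names come from the real data.
exits <- data.frame(
  GenderDesc = c("Female", "Female", "Female", "Female",
                 "Male", "Male", "Male", "Male"),
  nonRentals = c(1, 0, 0, 1, 0, 0, 0, 1),
  stringsAsFactors = FALSE
)

# The mean of a 0/1 flag within each group is that group's nonRental rate.
rates <- aggregate(nonRentals ~ GenderDesc, data = exits, FUN = mean)

# The rental rate is the complement of the nonRental rate.
rates$rentalRate <- 1 - rates$nonRentals

# Keep the group size next to each rate so small groups are easy to spot.
rates$n <- as.vector(table(exits$GenderDesc)[rates$GenderDesc])

print(rates)
```

Carrying the group size alongside the rate matters here, since the real data has 30 females against 196 males.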
## monthExited nonRentals
## 1 1 0.0
## 2 2 0.0
## 3 3 0.5
## 4 4 0.0
## 5 5 0.0
## 6 7 0.0
## 7 8 1.0
## 8 9 0.0
## 9 11 0.0
## 10 12 1.0
It looks like females have a 50% chance of ending up in a rental in March. The table also shows a nonRental value of 1.0 for August and December, while every other month listed is at 0.
There doesn’t appear to be any correlation between race and rental rate. The exception is that Multi-Racial and American Indian or Alaska Native clients have a 100% rental rate, which is most likely due to how few clients fall under those racial descriptions (4 and 5 respectively).
## RaceDesc nonRentals
## 1 American Indian or Alaska Native 0.0000000
## 2 Black or African American 0.1538462
## 3 Multi-Racial 0.0000000
## 4 White 0.1666667
For my own curiosity, I’d like to take a look at how date affects nonRental outcomes, particularly whether the winter months affect rental rates.
Here we are plotting the sum of all nonRental outcomes for each month of the year. The blue line signifies the average number of nonRental outcomes. As you can see, mid February to mid April are all above the average.
Here we plot the percentage of nonRental outcomes for each month of the year. Once again, blue signifies the average nonRental rate, and red the rate for each month.
It appears as though you have a much higher nonRental rate in August (25% success rate), and a much higher rental rate in May and June (98% and 95% success).
However, if we look at the total number of outcomes, we’ll see that the poor success rate in August comes from a lack of data in that month. People aren’t leaving your program often in August. They are, however, leaving from January through May, which could explain the above-average nonRental counts we saw in the first graph.
If we remove this outlier and check again, we’ll see there is a relatively strong correlation between rental success rate and the total number of outcomes in the data set (notice that the number of outcomes peaks in May, as does the rental success rate). In other words, the more people exiting your program in a month, the higher your percentage of rental outcomes (the higher the success rate).
If we plot success rate against the total number of outcomes, we see this idea much more clearly. As the number of all outcomes increases, your success rate increases.
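One way to keep a thin month like August from being over-read is to compute the monthly rate and the monthly exit count side by side. This is a sketch on made-up rows. The `min_exits` cutoff is a hypothetical threshold I picked for illustration, not a value used elsewhere in this report.

```r
# Made-up example rows; only the column names come from the real data.
exits <- data.frame(
  monthExited = c(1, 1, 1, 1, 5, 5, 5, 5, 5, 8),
  nonRentals  = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 1)
)

# tapply() returns one value per month, ordered by month number.
monthly <- data.frame(
  month         = sort(unique(exits$monthExited)),
  exits         = as.vector(tapply(exits$nonRentals, exits$monthExited, length)),
  nonRentalRate = as.vector(tapply(exits$nonRentals, exits$monthExited, mean))
)

# Hypothetical cutoff: flag months with too few exits to trust the rate.
min_exits <- 3
monthly$lowData <- monthly$exits < min_exits

print(monthly)
```

In this toy data, month 8 has a 100% nonRental rate but only one exit, so it gets flagged rather than treated as a real seasonal effect.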
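The strength of that relationship can be summarized with a single correlation coefficient. Here is a sketch on made-up monthly totals, assuming a Pearson correlation is an acceptable summary. The numbers below are illustrative and are not the values behind the graphs.

```r
# Made-up monthly totals: number of exits and share of them that were rentals.
monthly <- data.frame(
  exits      = c(12, 18, 25, 30, 41),
  rentalRate = c(0.75, 0.80, 0.88, 0.90, 0.98)
)

# Pearson correlation between exits per month and rental success rate.
r <- cor(monthly$exits, monthly$rentalRate)
print(r)

# Scatter plot with a fitted line, the same kind of view as the graph described above.
plot(monthly$exits, monthly$rentalRate,
     xlab = "Total outcomes in month", ylab = "Rental success rate")
abline(lm(rentalRate ~ exits, data = monthly), col = "blue")
```

With only a handful of months, a coefficient like this is sensitive to any single point, which is why the August outlier was removed before checking.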
Let’s visualize some broad views of your program, regardless of any sort of “good” or “bad” results.
Among lengths of stay that have at least one nonRental outcome (i.e., excluding lengths of stay where every exit was a rental), the longer a client stays in your program, the less likely they are to end up in a rental.
Here we can see that the number of clients peaked in Jan 2015 but is back on the decline by July 2015. It could be that you see an increase in clients during the fall and winter months. A longer dataset could verify this hypothesis.
It looks like over the course of the timeframe of this dataset, your clients are getting older.
Finally, let’s check your success rate against time, and see if you’re getting better or worse.
Nice! It does look like you guys are getting better over time at getting people into rental units, as you can see from the linear regression line. NonRental outcomes become less likely as time progresses in this data set.
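For the curious, a trend line like this can be fit by regressing the 0/1 nonRental flag on exit date. The sketch below uses made-up rows and assumes the ExitDate and nonRentals columns. A negative slope means nonRental outcomes become less likely over time.

```r
# Made-up example rows; only the column names come from the real data.
exits <- data.frame(
  ExitDate = as.Date(c("2014-08-01", "2014-09-15", "2014-11-01", "2015-01-10",
                       "2015-03-01", "2015-04-20", "2015-06-05", "2015-07-30")),
  nonRentals = c(1, 1, 0, 1, 0, 0, 0, 0)
)

# Dates are converted to day counts so the slope is "change in nonRental rate per day".
fit <- lm(nonRentals ~ as.numeric(ExitDate), data = exits)
slope_per_day <- unname(coef(fit)[2])
print(slope_per_day)
```

Because the outcome is binary, a logistic regression (`glm(..., family = binomial)`) would be a natural alternative. The straight-line fit is kept here only to mirror the regression line in the graph.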
Congrats!
The next step I’d like to take with this data is to start predicting on it. Essentially, the idea is that you type in the characteristics of your client (“White”, “male”, “22”, etc.) and I return a percentage for each possible outcome (e.g., 20% chance of rental, 10% chance of Emergency Shelter, etc.).
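To make the idea concrete, here is the simplest possible baseline: look up past clients with the same characteristics and report the share of each outcome among them. This is only a sketch on made-up rows. The `predict_outcomes` function and the `Outcome` column are hypothetical names, age is ignored, and a real version would likely use a proper classification model instead of a lookup.

```r
# Made-up history; RaceDesc and GenderDesc mirror the real column names,
# Outcome is a hypothetical simplified outcome label.
history <- data.frame(
  RaceDesc   = c("White", "White", "White", "White", "Black or African American"),
  GenderDesc = c("Male", "Male", "Male", "Male", "Male"),
  Outcome    = c("Rental", "Rental", "Rental", "Emergency Shelter", "Rental"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: percentage of each outcome among matching past clients.
predict_outcomes <- function(history, race, gender) {
  matches <- history[history$RaceDesc == race & history$GenderDesc == gender, ]
  if (nrow(matches) == 0) {
    return(NULL)  # no comparable past clients, so no estimate
  }
  round(100 * prop.table(table(matches$Outcome)), 1)
}

p <- predict_outcomes(history, "White", "Male")
print(p)
```

A lookup like this breaks down quickly for rare combinations of characteristics, which is the main reason to move to a fitted model once there is more data.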