Dallas OpenData distributes animal shelter records as separate datasets for each fiscal year from 2015 through 2017 (the 2017 dataset receives daily updates):
Dallas OpenData is powered by Socrata and its Open Data Network, which supports the Socrata Open Data API (SODA). The RSocrata package supports the SODA API and a variety of data sources, including the public Dallas OpenData portal, which requires neither access credentials nor an API token:
library(RSocrata)

data15.source = read.socrata(url = "https://www.dallasopendata.com/resource/8pn8-24ku.csv")
data16.source = read.socrata(url = "https://www.dallasopendata.com/resource/4qfv-27du.csv")
data17.source = read.socrata(url = "https://www.dallasopendata.com/resource/8849-mzxh.csv")
The file format changed slightly after 2016, so we process 2015 and 2016 first and then 2017, producing two separate data frames:
# 2015 and 2016
data15n16 = bind_rows(data15.source, data16.source) %>%
  rename(months = month) %>%
  # keep records that have an outcome date (or an intake_total other than 1)
  filter(!is.na(outcome_date) | intake_total != 1) %>%
  # drop administrative columns not used in the analysis
  select(-kennel_number, -kennel_status, -tag_type, -activity_number,
         -source_id, -staff_id, -hold_request,
         -receipt_number, -impound_number, -service_request_number,
         -intake_total) %>%
  # the time fields embed a placeholder date; split it off and keep only the clock time
  separate(intake_time, c("Intake.Fake.Date","Intake.Real.Time"), sep=10, remove=FALSE) %>%
  separate(outcome_time, c("Outcome.Fake.Date","Outcome.Real.Time"), sep=10, remove=FALSE) %>%
  # assemble full timestamps from the date and clock-time parts
  mutate(intake_ts = ymd_hms(paste(intake_date, Intake.Real.Time), tz="America/Chicago"),
         outcome_ts = ymd_hms(paste(outcome_date, Outcome.Real.Time), tz="America/Chicago")) %>%
  select(-Intake.Real.Time, -Intake.Fake.Date, -Outcome.Real.Time, -Outcome.Fake.Date,
         -intake_time, -intake_date, -outcome_time, -outcome_date)
## Warning: All formats failed to parse. No formats found.
## Warning: All formats failed to parse. No formats found.
# format in 2017 is slightly different
data17 = filter(data17.source, !is.na(outcome_date)) %>%
select(-kennel_number, -kennel_status, -tag_type, -activity_number,
-source_id, -staff_id, -hold_request,
-receipt_number, -impound_number, -service_request_number) %>%
separate(intake_time, c("Intake.Real.Time", "Intake.Time.Seconds"), sep=8, remove=FALSE) %>%
separate(outcome_time, c("Outcome.Real.Time", "Outcome.Time.Seconds"), sep=8, remove=FALSE) %>%
mutate(intake_ts = ymd_hms(paste(intake_date, Intake.Real.Time), tz="America/Chicago"),
outcome_ts = ymd_hms(paste(outcome_date, Outcome.Real.Time), tz="America/Chicago")) %>%
select(-Intake.Real.Time, -Intake.Time.Seconds, -Outcome.Real.Time, -Outcome.Time.Seconds,
-intake_time, -intake_date, -outcome_time, -outcome_date)
## Warning: All formats failed to parse. No formats found.
## Warning: All formats failed to parse. No formats found.
For each dataset the following actions took place: records without an outcome date were filtered out, administrative columns (kennel, tag, staff, receipt and similar identifiers) were dropped, and intake and outcome timestamps were assembled from the date and time fields in the America/Chicago time zone.
Next we combine the three years of data (two data frames) into a single data frame (the subsequent analysis keeps cats and dogs only):
# combine all years into single dataset
model.data.all = bind_rows(data15n16, data17) %>%
mutate(
animal = factor(animal_type),
outcome = factor(outcome_type),
breed = factor(animal_breed),
tract = factor(census_tract),
district = factor(council_district),
type = factor(intake_type),
subtype = factor(intake_subtype),
condition_in = factor(intake_condition),
condition_out = factor(outcome_condition),
chip = factor(chip_status),
origin = factor(animal_origin))
Which animal types, and how many of each, are treated by Dallas shelters?
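As a quick sketch of the underlying counts, the admissions can be tallied by animal type directly:
# tally admissions by animal type, most frequent first
model.data.all %>%
  count(animal, sort = TRUE)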
Some records for the same animal occur in sequence within short time intervals. We assume such records correspond to a single shelter stay, so they need additional processing. The rule of thumb: if two records for the same animal (animal_id) fall within 30 days of each other, they are assigned phases representing the same shelter stay:
# find and assign phases to consecutive records for the same stay:
model.data.dogscats = group_by(model.data.dogscats, animal_id) %>%
  arrange(intake_ts) %>%
  mutate(n = n(),
         phase = row_number(),
         # gap in days between this intake and the animal's previous outcome
         lag_time = as.numeric(difftime(intake_ts, lag(outcome_ts), units = "days"))) %>%
  ungroup() %>%
  arrange(animal_id, phase) %>%
  # a record within 30 days of the previous outcome inherits the earlier phase;
  # the rule is applied twice to propagate through chains of consecutive records
  mutate(phase_group = ifelse(lag_time <= 30, lag(phase), phase),
         phase_group = ifelse(lag_time <= 30, lag(phase_group), phase),
         phase_group = ifelse(is.na(phase_group), 1, phase_group),
         groupid = paste(animal, animal_id, phase_group, sep="-"))
The code above is a crude attempt at sequential pattern processing, similar to the nPath SQL function in Teradata Aster.
Having identified the records that belong to the same stay, we can compact them into single records:
# combine multiple records for the same animal stay into a single record
model.data.dogscats.compact = model.data.dogscats %>%
group_by(groupid) %>%
summarise(
intake_ts = min(intake_ts),
outcome_ts = max(outcome_ts),
interval = interval(min(intake_ts), max(outcome_ts)),
duration = as.duration(interval),
hours = as.numeric(duration)/(60.*60.),
animal=first(animal),
outcome = last(outcome),
breed = first(breed),
tract = first(tract),
district = first(district),
type = first(type),
subtype = first(subtype),
condition_in = first(condition_in),
condition_out = last(condition_out),
chip = first(chip),
origin = first(origin),
phase = min(phase)) %>%
mutate(monthly_in = floor_date(intake_ts, 'month'),
monthly_out = floor_date(outcome_ts, 'month'))
A Sankey diagram will show how dogs admitted to shelters transition from intake types to outcomes, with arrows of variable width corresponding to the number of records. Essentially, this diagram shows probabilities: the absolute probabilities of a dog being admitted by intake type and discharged by outcome, and the conditional probabilities of a dog being discharged with each outcome given its admission with a given intake type:
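As a minimal sketch, such a diagram could be built with the networkD3 package (one possible choice of tool, not necessarily the one used for the original figure), with transition counts as link widths:
library(networkD3)

# count dog transitions from intake type to outcome
flows = model.data.dogscats.compact %>%
  filter(animal == 'Dogs') %>%
  count(type, outcome) %>%
  transmute(source_name = as.character(type),
            target_name = as.character(outcome),
            value = n)

# nodes plus the zero-based link indices that sankeyNetwork() expects
nodes = data.frame(name = unique(c(flows$source_name, flows$target_name)),
                   stringsAsFactors = FALSE)
links = data.frame(source = match(flows$source_name, nodes$name) - 1,
                   target = match(flows$target_name, nodes$name) - 1,
                   value  = flows$value)

sankeyNetwork(Links = links, Nodes = nodes, Source = "source", Target = "target",
              Value = "value", NodeID = "name", fontSize = 12)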
Next, we would like to correlate intake types with outcomes. For that we derive monthly trends for both and correlate the resulting time series, for a total of 12 combinations of 3 intake types and 4 outcomes:
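As a sketch of the underlying computation (the specific three intake types and four outcomes below are assumed for illustration, not taken from the report):
# assumed intake types and outcomes for the 3 x 4 correlation grid
intake_types  = c('CONFISCATED', 'OWNER SURRENDER', 'STRAY')
outcome_types = c('ADOPTION', 'EUTHANIZED', 'RETURNED TO OWNER', 'TRANSFER')

# monthly counts of dog admissions by intake type
intake_monthly = model.data.dogscats.compact %>%
  filter(animal == 'Dogs', type %in% intake_types) %>%
  count(month = monthly_in, type) %>%
  spread(type, n, fill = 0)

# monthly counts of dog discharges by outcome
outcome_monthly = model.data.dogscats.compact %>%
  filter(animal == 'Dogs', outcome %in% outcome_types) %>%
  count(month = monthly_out, outcome) %>%
  spread(outcome, n, fill = 0)

# 3 x 4 matrix of correlations between intake and outcome monthly trends
monthly = inner_join(intake_monthly, outcome_monthly, by = "month")
round(cor(monthly[intake_types], monthly[outcome_types]), 2)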
Now the correlation coefficients are replaced with the actual time series used to compute them, leaving the matrix structure intact:
A kernel density estimate is a smoothed version of the histogram above.
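Assuming the histogram and density describe the distribution of length of stay in hours, a minimal ggplot2 sketch of such a density would be:
# density of length of stay (hours) by animal, for stays under 2500 hours
model.data.dogscats.compact %>%
  filter(hours < 2500) %>%
  ggplot(aes(hours, fill = animal)) +
  geom_density(alpha = 0.5) +
  labs(x = "Length of Stay (hours)", y = "Density")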
Let’s revisit the Sankey diagrams to show average length of stay depending on admission and outcome. In the Sankeys below, the transition arrow widths correspond to the average length of stay: the wider the arrow, the longer the stay:
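Only the aggregation changes relative to the earlier Sankey sketch: the links carry the mean length of stay instead of record counts, and the nodes and links are then built exactly as before:
# average length of stay (hours) for each intake type / outcome transition
flows_los = model.data.dogscats.compact %>%
  filter(animal == 'Dogs') %>%
  group_by(type, outcome) %>%
  summarise(value = mean(hours)) %>%
  ungroup() %>%
  transmute(source_name = as.character(type),
            target_name = as.character(outcome),
            value = value)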
We enhance the data with new attributes first:
survival.data = model.data.dogscats.compact %>%
  filter(hours < 2500) %>%
  # event = 1 marks a negative outcome (died, euthanized, or missing) for the survival analysis;
  # the intake condition string is also split into separate unhealthy/contagious/treatable attributes
  mutate(event = ifelse(outcome %in% c('DIED','EUTHANIZED','MISSING'), 1, 0),
         unhealthy_in = ifelse(condition_in == 'HEALTHY', 'NO', 'YES'),
         contagious_in = ifelse(is.na(str_match(condition_in, ' CONTAGIOUS$')[,1]), 'NO','YES'),
         treatable_in = ifelse(is.na(str_match(condition_in, '^TREATABLE ')[,1]), 'UNTREATABLE',
                               ifelse(is.na(str_match(condition_in, 'MANAGEABLE')[,1]),
                                      'REHABILITABLE', 'MANAGEABLE')))
survival.data.dogs = survival.data %>%
filter(
animal == 'Dogs',
outcome %in% c('ADOPTION','DIED','EUTHANIZED','FOSTER','FOUND REPORT',
'LOST REPORT','RETURNED TO OWNER','TRANSFER'),
!type %in% c('FOSTER','FOUND REPORT','LOST REPORT','TRANSFER'),
!subtype %in% c('EUTHANASIA REQUESTED','- DEAD ON ARRIVAL',
'QUARANTINE - DEAD ON ARRIVAL','KEEP SAFE - DEAD ON ARRIVAL',
'CRUELT - DEAD ON ARRIVAL','DANGER - DEAD ON ARRIVAL'),
origin %in% c('FIELD','OVER THE COUNTER','SWEEP','NIGHT DROP'))
Chances of dying in shelter by various categories available at intake time:
animal_counts = survival.data %>%
group_by(animal) %>%
summarise(cnt = n(), survived = sum(1-event), dead = sum(event)) %>%
mutate(rank = row_number(desc(cnt)),
odds = dead/survived,
notsurv_chance = dead/cnt) %>%
mutate(name = paste("Animal:", toupper(animal)))
intake_type_counts = survival.data.dogs %>%
group_by(type) %>%
summarise(cnt = n(), survived = sum(1-event), dead = sum(event)) %>%
mutate(rank = row_number(desc(cnt)),
odds = dead/survived,
notsurv_chance = dead/cnt) %>%
mutate(name = paste("Intake Type:", type))
origin_counts = survival.data.dogs %>%
filter(origin %in% c("FIELD","OVER THE COUNTER","SWEEP")) %>%
group_by(origin) %>%
summarise(cnt = n(), survived = sum(1-event), dead = sum(event)) %>%
mutate(rank = row_number(desc(cnt)),
odds = dead/survived,
notsurv_chance = dead/cnt) %>%
mutate(name = paste("Origin:", origin))
condition_counts = survival.data.dogs %>%
group_by(condition_in) %>%
summarise(cnt = n(), survived = sum(1-event), dead = sum(event)) %>%
mutate(rank = row_number(desc(cnt)),
odds = dead/survived,
notsurv_chance = dead/cnt) %>%
mutate(name = paste("Condition:", condition_in))
data = rbind(animal_counts[-1], intake_type_counts[-1], origin_counts[-1],
condition_counts[-1])
N = dim(data)[[1]]
data$name = factor(data$name, levels = data$name[order(data$notsurv_chance)])
data$text_size = (10 + data$notsurv_chance/max(data$notsurv_chance))
data$text_pos = data$notsurv_chance/3
ggplot(data) +
geom_bar(aes(name, notsurv_chance, fill=name), stat = 'identity', alpha=0.7) +
geom_text(aes(name, text_pos, label=name, size=text_size), angle=0, family='Palatino', fontface='bold', hjust=0) +
scale_size(range=c(3,5)) +
scale_fill_manual(values = rev(colorRampPalette(wsj_pal(palette = "colors6")(4))(N))) +
labs(title=paste("Chances of Dying in Shelter by Intake Categories"),
subtitle="Dallas Animal Services FY 2015-2018. Only for dogs unless cats specified",
caption=caption, x=NULL, y="Expected Probability") +
coord_flip() +
theme_minimal(base_size = 14, base_family = 'serif') +
theme(legend.position = "none",
# axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major.y = element_blank(),
strip.text = element_text(size = 14, face = 'bold'),
plot.caption = element_text(size = 10, hjust=1))
How many breeds are registered?
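As a quick sketch, the number of distinct dog breeds in the filtered dog data can be counted directly, while the chart below keeps only breeds with 100+ admissions:
# number of distinct breeds among the admitted dogs
n_distinct(survival.data.dogs$breed)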
breed_counts = survival.data.dogs %>%
group_by(breed) %>%
summarise(cnt = n(), survived = sum(1-event), dead = sum(event)) %>%
mutate(rank = row_number(desc(cnt)),
odds = dead/survived,
notsurv_chance = dead/cnt) %>%
arrange(desc(rank))
data = breed_counts[breed_counts$cnt >= 100,]
N = dim(data)[[1]]
data$breed = factor(data$breed, levels = data$breed[order(-data$rank)])
data$text_size = (10 + data$cnt/max(data$cnt))
data$text_pos = ifelse(data$cnt>5000, data$cnt/2, data$cnt + 100)
ggplot(data) +
geom_bar(aes(breed, cnt, fill=breed), stat = 'identity', alpha=0.7) +
geom_text(aes(breed, text_pos, label=breed, size=text_size), angle=0, family='Palatino', fontface='bold', hjust=0) +
scale_size(range=c(2,4)) +
scale_fill_manual(values = rev(colorRampPalette(wsj_pal(palette = "colors6")(6))(N))) +
labs(title=paste("Dog Breeds Admitted to Shelters"),
subtitle="Dallas Animal Services FY 2015-2017. Only breeds with 100+ admissions",
caption=caption, x=NULL, y="Admissions") +
coord_flip() +
# theme_tufte(base_size = 14, base_family = 'serif', ticks=FALSE) +
theme_minimal(base_size = 14, base_family = 'serif') +
theme(legend.position = "none",
# axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major.y = element_blank(),
strip.text = element_text(size = 14, face = 'bold'),
plot.caption = element_text(size = 10, hjust=1))
Unfortunately for the Pit Bull Terrier group, they have the worst chances of survival, barring the top breed, the Chow Chow. Survival chances also appear to correlate with dog size, but lacking that information we cannot substantiate it with data.
data = breed_counts[breed_counts$cnt >= 100,]
N = dim(data)[[1]]
data$breed = factor(data$breed, levels = data$breed[order(data$notsurv_chance)])
data$text_size = (10 + data$notsurv_chance/max(data$notsurv_chance))
data$text_pos = data$notsurv_chance/2 # ifelse(data$cnt>5000, data$cnt/2, data$cnt + 100)
ggplot(data) +
geom_bar(aes(breed, notsurv_chance, fill=breed), stat = 'identity', alpha=0.7) +
geom_text(aes(breed, text_pos, label=breed, size=text_size), angle=0, family='Palatino', fontface='bold', hjust=0) +
scale_size(range=c(2,4)) +
scale_fill_manual(values = rev(colorRampPalette(wsj_pal(palette = "colors6")(4))(N))) +
labs(title=paste("Chances of Dying in Shelter by Dog Breed"),
subtitle="Dallas Animal Services FY 2015-2018. Only breeds with 100+ admissions",
caption=caption, x=NULL, y="Expected Probability") +
coord_flip() +
# theme_tufte(base_size = 14, base_family = 'serif', ticks=FALSE) +
theme_minimal(base_size = 14, base_family = 'serif') +
theme(legend.position = "none",
# axis.text.x = element_blank(),
axis.text.y = element_blank(),
panel.grid.major.y = element_blank(),
strip.text = element_text(size = 14, face = 'bold'),
plot.caption = element_text(size = 10, hjust=1))
km.animal = survfit(Surv(hours, event) ~ animal, data=survival.data)
p1 = createSurvivalCurve(km.animal, "Animal", nchar("animal")+2,
scale_color = scale_color_solarized(name="Animal"))
p2 = createSurvivalHistogram(survival.data, "animal", caption=caption,
scale_fill = scale_fill_solarized(name="Animal"))
grid.arrange(p1, p2, heights=c(3.5/5, 1.5/5), ncol=1)
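createSurvivalCurve() and createSurvivalHistogram() are plotting helpers presumably defined earlier in the report. As a rough sketch only (not the actual implementation), the curve helper might tidy the survfit object with the broom package and draw Kaplan-Meier steps with ggplot2, converting hours to days and stripping the "variable=" prefix from the strata labels:
# a hypothetical stand-in for createSurvivalCurve(); assumptions: time is in hours
# (plotted as days) and the third argument is the length of the strata-label prefix
createSurvivalCurveSketch = function(km, legend_title, strip_prefix, ...) {
  df = broom::tidy(km)                            # one row per event time per stratum
  df$group = substring(df$strata, strip_prefix)   # e.g. "animal=Dogs" -> "Dogs"
  ggplot(df, aes(time / 24, estimate, color = group)) +
    geom_step() +
    labs(x = "Days in Shelter", y = "Survival Probability", color = legend_title) +
    theme_minimal()
}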
Let’s zoom in on the first 2 weeks to see how critical those first days at the shelter are:
p1 + xlim(0,15) +
labs(title="Cats and Dogs: First 2 Weeks",
caption=caption) +
theme(legend.position = "bottom",
plot.caption = element_text(size = 8, hjust = 0))
## Warning: Removed 9756 rows containing missing values (geom_path).
Again, for consistency and plausibility of the analysis, we will focus on the survival of dogs from here on.
km.type = survfit(Surv(hours, event) ~ type, data=survival.data.dogs)
p1 = createSurvivalCurve(km.type, "Dog Intake Type", nchar("type")+2,
title="Dogs by Shelter Intake Type",
legend.position = "none")
p2 = createSurvivalHistogram(survival.data.dogs, "type",
subtitle="Accepted to Dallas Shelters Totals by Intake Type and Outcome",
legend.position="right", caption=caption)
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
data = survival.data.dogs[survival.data.dogs$origin %in% c("FIELD","OVER THE COUNTER","SWEEP"),]
km.origin = survfit(Surv(hours, event) ~ origin, data=data)
p1 = createSurvivalCurve(km.origin, "Dog Origin in Shelter", nchar("origin")+2,
title="Dogs by Origin In Shelter", legend.position = "none")
p2 = createSurvivalHistogram(data, "origin",
subtitle="Accepted to Dallas Shelters Totals by Origin and Outcome",
legend.position="right", caption = caption, legend.title = "Origin")
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
km.healthy = survfit(Surv(hours, event) ~ unhealthy_in, data=survival.data.dogs)
p1 = createSurvivalCurve(km.healthy, "Dog Health In", nchar("unhealthy_in")+2,
title="Dogs by Unhealthy Condition In", legend.position = "none")
p2 = createSurvivalHistogram(survival.data.dogs, "unhealthy_in",
subtitle="Accepted to Dallas Shelters Totals by Unhealthy Condition In and Outcome",
legend.position="right", caption = caption, legend.title = "Unhealthy")
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
We select only conditions represented by at least 1000 dogs.
data = survival.data.dogs %>%
group_by(condition_in) %>%
filter(n() >= 1000) %>%
droplevels()
km.condition = survfit(Surv(hours, event) ~ condition_in, data=data)
p1 = createSurvivalCurve(km.condition, "Dog Condition In", nchar("condition_in")+2,
title="Dogs by Shelter Condition In", legend.position = "none")
p2 = createSurvivalHistogram(data, "condition_in", legend.position = "right",
subtitle="Accepted to Dallas Shelters Totals by Condition In and Outcome",
caption = caption, legend.title = "Condition")
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
The original attribute condition_in carries too much information for meaningful analysis, so it was broken down earlier into several derived attributes: unhealthy_in, contagious_in, and treatable_in.
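As a small illustration, here are a few hypothetical condition strings (the actual level names in the data may differ) run through the same derivation as above:
# hypothetical condition_in values passed through the same rules as in survival.data
tibble(condition_in = c('HEALTHY',
                        'TREATABLE MANAGEABLE CONTAGIOUS',
                        'TREATABLE REHABILITABLE NON-CONTAGIOUS',
                        'UNHEALTHY UNTREATABLE CONTAGIOUS')) %>%
  mutate(unhealthy_in = ifelse(condition_in == 'HEALTHY', 'NO', 'YES'),
         contagious_in = ifelse(is.na(str_match(condition_in, ' CONTAGIOUS$')[,1]), 'NO', 'YES'),
         treatable_in = ifelse(is.na(str_match(condition_in, '^TREATABLE ')[,1]), 'UNTREATABLE',
                               ifelse(is.na(str_match(condition_in, 'MANAGEABLE')[,1]),
                                      'REHABILITABLE', 'MANAGEABLE')))
With these attributes in hand, we look at unhealthy dogs by their contagious and treatable condition at intake: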
survival.data.dogs.unhealthy = survival.data.dogs %>%
filter(unhealthy_in == 'YES')
km.contagious = survfit(Surv(hours, event) ~ contagious_in, data=survival.data.dogs.unhealthy)
p1 = createSurvivalCurve(km.contagious, "Unhealthy Dogs Contagious Condition In", nchar("contagious_in")+2,
title="Unhealthy Dogs by Contagious Condition In", legend.position = "none")
p2 = createSurvivalHistogram(survival.data.dogs.unhealthy, "contagious_in", legend.position = "right",
subtitle="Accepted to Dallas Shelters Unhealthy Dog Totals by Contagious Condition In and Outcome",
caption = caption, legend.title = "Contagious")
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
km.treatable = survfit(Surv(hours, event) ~ treatable_in, data=survival.data.dogs.unhealthy)
p1 = createSurvivalCurve(km.treatable, "Unhealthy Dogs Treatable Condition In", nchar("treatable_in")+2,
title="Unhealthy Dogs by Treatable Condition In", legend.position = "none")
p2 = createSurvivalHistogram(survival.data.dogs.unhealthy, "treatable_in", legend.position = "right",
subtitle="Accepted to Dallas Shelters Unhealthy Dog Totals by Treatable Condition In and Outcome",
caption = caption, legend.title = "Treatable")
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
km.scan = survfit(Surv(hours, event) ~ chip, data=survival.data.dogs)
p1 = createSurvivalCurve(km.scan, "Chip Scan Result", nchar("chip")+2,
title="Dogs by Chip Scan Result", legend.position = "none")
p2 = createSurvivalHistogram(survival.data.dogs, "chip", legend.position = "right",
subtitle = "Accepted to Dallas Shelters Totals by Chip Scan Result and Outcome",
caption = caption, legend.title = "Chip Scan Result")
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
We include the top 5 breeds, which together represent about 63% of all dogs accepted to shelters.
N = 5
data = survival.data.dogs %>%
filter(breed %in% breed_counts[breed_counts$rank <= N,]$breed)
# data$breed = factor(data$breed, levels = breed_counts$breed[order(-breed_counts$rank)])
data$breed_str = as.character(data$breed)
km.breed = survfit(Surv(hours, event) ~ breed_str, data=data)
p1 = createSurvivalCurve(km.breed, "Breed", nchar("chip")+2,
title=paste("Top",N,"Dog Breeds by Admission"), legend.position = "none",
scale_color = scale_color_ptol(name="Breed"))
p2 = createSurvivalHistogram(data, "breed_str", legend.position = "right",
subtitle = "Accepted to Dallas Shelters Totals by Breed and Outcome",
caption = caption, legend.title = "breed",
scale_fill = scale_fill_ptol(name="Breed", guide = guide_legend(reverse = TRUE)),
caption_hjust = 1)
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
# Survival Curves for Pit Bull Group
The Pit Bull Terrier group includes the Pit Bull, American Staff, Am Pit Bull Ter, and Staffordshire breeds:
data = survival.data.dogs %>%
filter(breed %in% c('PIT BULL','AMERICAN STAFF','AM PIT BULL TER','STAFFORDSHIRE'))
data$breed_str = as.character(data$breed)
km.breed = survfit(Surv(hours, event) ~ breed_str, data=data)
p1 = createSurvivalCurve(km.breed, "Breed", nchar("chip")+2,
title="Pit Bull Terrier Group", legend.position = "none",
scale_color = scale_color_ptol())
p2 = createSurvivalHistogram(data, "breed_str", legend.position = "right",
subtitle = "Accepted to Dallas Shelters Totals by Breed and Outcome",
caption = caption, legend.title = "Pit Bulls",
scale_fill = scale_fill_ptol(name="Breed"))
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
data = survival.data.dogs %>%
filter(breed %in% breed_counts[breed_counts$rank <=8 & breed_counts$rank > 4,]$breed)
km.breed = survfit(Surv(hours, event) ~ breed, data=data)
p1 = createSurvivalCurve(km.breed, "Breed", nchar("chip")+2,
title="Dogs by Breed", legend.position = "none",
scale_color = scale_color_ptol(name="Breed"))
p2 = createSurvivalHistogram(data, "breed", legend.position = "right",
subtitle = "Accepted to Dallas Shelters Totals by Breed and Outcome",
caption = caption, legend.title = "breed",
scale_fill = scale_fill_ptol(name="Breed"))
grid.arrange(p1, p2, heights=c(3.0/5, 2.0/5), ncol=1)
coxph.surv.mod = coxph(Surv(hours, event) ~ type + chip + origin + unhealthy_in,
data = survival.data.dogs,
control = coxph.control(iter.max=100))
summary(coxph.surv.mod)
## Call:
## coxph(formula = Surv(hours, event) ~ type + chip + origin + unhealthy_in,
## data = survival.data.dogs, control = coxph.control(iter.max = 100))
##
## n= 64036, number of events= 21427
##
## coef exp(coef) se(coef) z Pr(>|z|)
## typeFOSTER NA NA 0.00000 NA NA
## typeFOUND REPORT NA NA 0.00000 NA NA
## typeLOST REPORT NA NA 0.00000 NA NA
## typeOWNER SURRENDER 1.12513 3.08061 0.03418 32.915 < 2e-16 ***
## typeSTRAY 0.53881 1.71397 0.03188 16.903 < 2e-16 ***
## typeTRANSFER NA NA 0.00000 NA NA
## typeWILDLIFE NA NA 0.00000 NA NA
## chipSCAN NO CHIP 0.53859 1.71359 0.02288 23.544 < 2e-16 ***
## chipUNABLE TO SCAN 0.99733 2.71104 0.04332 23.023 < 2e-16 ***
## originFIELD 0.54156 1.71869 0.04005 13.524 < 2e-16 ***
## originNIGHT DROP 2.54351 12.72431 0.57879 4.395 1.11e-05 ***
## originOVER THE COUNTER 0.24140 1.27304 0.04090 5.903 3.58e-09 ***
## originSWEEP NA NA 0.00000 NA NA
## unhealthy_inYES 2.38401 10.84830 0.07373 32.333 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## typeFOSTER NA NA NA NA
## typeFOUND REPORT NA NA NA NA
## typeLOST REPORT NA NA NA NA
## typeOWNER SURRENDER 3.081 0.32461 2.881 3.294
## typeSTRAY 1.714 0.58344 1.610 1.824
## typeTRANSFER NA NA NA NA
## typeWILDLIFE NA NA NA NA
## chipSCAN NO CHIP 1.714 0.58357 1.638 1.792
## chipUNABLE TO SCAN 2.711 0.36886 2.490 2.951
## originFIELD 1.719 0.58184 1.589 1.859
## originNIGHT DROP 12.724 0.07859 4.092 39.564
## originOVER THE COUNTER 1.273 0.78552 1.175 1.379
## originSWEEP NA NA NA NA
## unhealthy_inYES 10.848 0.09218 9.389 12.535
##
## Concordance= 0.659 (se = 0.002 )
## Rsquare= 0.075 (max possible= 0.999 )
## Likelihood ratio test= 4997 on 8 df, p=0
## Wald test = 3445 on 8 df, p=0
## Score (logrank) test = 4043 on 8 df, p=0
coxph.surv.unhealthy.mod = coxph(Surv(hours, event) ~ type + chip + origin + contagious_in + treatable_in,
data = survival.data.dogs.unhealthy,
control = coxph.control(iter.max=100))
summary(coxph.surv.unhealthy.mod)
## Call:
## coxph(formula = Surv(hours, event) ~ type + chip + origin + contagious_in +
## treatable_in, data = survival.data.dogs.unhealthy, control = coxph.control(iter.max = 100))
##
## n= 59000, number of events= 21241
##
## coef exp(coef) se(coef) z Pr(>|z|)
## typeFOSTER NA NA 0.00000 NA NA
## typeFOUND REPORT NA NA 0.00000 NA NA
## typeLOST REPORT NA NA 0.00000 NA NA
## typeOWNER SURRENDER 1.72327 5.60283 0.03492 49.350 < 2e-16 ***
## typeSTRAY 1.15527 3.17488 0.03303 34.971 < 2e-16 ***
## typeTRANSFER NA NA 0.00000 NA NA
## typeWILDLIFE NA NA 0.00000 NA NA
## chipSCAN NO CHIP 0.51930 1.68084 0.02307 22.506 < 2e-16 ***
## chipUNABLE TO SCAN 0.52831 1.69606 0.04379 12.065 < 2e-16 ***
## originFIELD 0.42278 1.52620 0.04032 10.484 < 2e-16 ***
## originNIGHT DROP 2.81869 16.75487 0.57877 4.870 1.12e-06 ***
## originOVER THE COUNTER 0.17763 1.19438 0.04094 4.339 1.43e-05 ***
## originSWEEP NA NA 0.00000 NA NA
## contagious_inYES 0.68940 1.99252 0.02757 25.010 < 2e-16 ***
## treatable_inREHABILITABLE -0.48311 0.61686 0.01810 -26.690 < 2e-16 ***
## treatable_inUNTREATABLE 1.00604 2.73474 0.02144 46.918 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## typeFOSTER NA NA NA NA
## typeFOUND REPORT NA NA NA NA
## typeLOST REPORT NA NA NA NA
## typeOWNER SURRENDER 5.6028 0.17848 5.2322 5.9997
## typeSTRAY 3.1749 0.31497 2.9758 3.3872
## typeTRANSFER NA NA NA NA
## typeWILDLIFE NA NA NA NA
## chipSCAN NO CHIP 1.6808 0.59494 1.6065 1.7586
## chipUNABLE TO SCAN 1.6961 0.58960 1.5566 1.8481
## originFIELD 1.5262 0.65522 1.4102 1.6517
## originNIGHT DROP 16.7549 0.05968 5.3888 52.0942
## originOVER THE COUNTER 1.1944 0.83725 1.1023 1.2942
## originSWEEP NA NA NA NA
## contagious_inYES 1.9925 0.50188 1.8877 2.1031
## treatable_inREHABILITABLE 0.6169 1.62110 0.5954 0.6391
## treatable_inUNTREATABLE 2.7347 0.36567 2.6222 2.8521
##
## Concordance= 0.713 (se = 0.002 )
## Rsquare= 0.151 (max possible= 0.999 )
## Likelihood ratio test= 9686 on 10 df, p=0
## Wald test = 11614 on 10 df, p=0
## Score (logrank) test = 12622 on 10 df, p=0