Load

Load R

######Libraries#############################
suppressWarnings({
require(Amelia)      
require(broom)
require(car)         
require(caret)
require(corrplot)
require(dplyr)     
require(e1071)
require(fastDummies)
require(ggplot2)     
require(ggcorrplot)  
require(ggExtra)   
require(glmpath) 
require(grid)        
require(gridExtra)   
require(kableExtra)  
require(leaflet)
require(leaflet.extras)
require(leaps)  
require(maptools)    
require(MASS)    
require(imbalance)
require(mlpack)
require(neuralnet)
require(psych)       
require(raster)      
require(RColorBrewer)
require(ResourceSelection)
require(reticulate)
require(rgdal)       
require(rgeos)       
require(shiny)       
require(sf)    
require(sjPlot)
require(sp)          
require(tidyverse)   
})    
## Loading required package: Amelia
## Loading required package: Rcpp
## ## 
## ## Amelia II: Multiple Imputation
## ## (Version 1.8.1, built: 2022-11-18)
## ## Copyright (C) 2005-2023 James Honaker, Gary King and Matthew Blackwell
## ## Refer to http://gking.harvard.edu/amelia/ for more information
## ##
## Loading required package: broom
## Loading required package: car
## Loading required package: carData
## Loading required package: caret
## Loading required package: ggplot2
## Loading required package: lattice
## Loading required package: corrplot
## corrplot 0.92 loaded
## Loading required package: dplyr
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## Loading required package: e1071
## Loading required package: fastDummies
## Thank you for using fastDummies!
## To acknowledge our work, please cite the package:
## Kaplan, J. & Schlegel, B. (2023). fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables. Version 1.7.1. URL: https://github.com/jacobkap/fastDummies, https://jacobkap.github.io/fastDummies/.
## Loading required package: ggcorrplot
## Loading required package: ggExtra
## Loading required package: glmpath
## Loading required package: survival
## 
## Attaching package: 'survival'
## The following object is masked from 'package:caret':
## 
##     cluster
## Loading required package: grid
## Loading required package: gridExtra
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
## Loading required package: kableExtra
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
## Loading required package: leaflet
## Loading required package: leaflet.extras
## Loading required package: leaps
## Loading required package: maptools
## Loading required package: sp
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, were retired in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## Please note that 'maptools' will be retired during October 2023,
## plan transition at your earliest convenience (see
## https://r-spatial.org/r/2023/05/15/evolution4.html and earlier blogs
## for guidance);some functionality will be moved to 'sp'.
##  Checking rgeos availability: TRUE
## 
## Attaching package: 'maptools'
## The following object is masked from 'package:car':
## 
##     pointLabel
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: imbalance
## Loading required package: mlpack
## 
## Attaching package: 'mlpack'
## The following object is masked from 'package:stats':
## 
##     kmeans
## The following object is masked from 'package:base':
## 
##     det
## Loading required package: neuralnet
## 
## Attaching package: 'neuralnet'
## The following object is masked from 'package:dplyr':
## 
##     compute
## Loading required package: psych
## 
## Attaching package: 'psych'
## The following object is masked from 'package:mlpack':
## 
##     pca
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
## The following object is masked from 'package:car':
## 
##     logit
## Loading required package: raster
## 
## Attaching package: 'raster'
## The following object is masked from 'package:MASS':
## 
##     select
## The following object is masked from 'package:dplyr':
## 
##     select
## Loading required package: RColorBrewer
## Loading required package: ResourceSelection
## ResourceSelection 0.3-6   2023-06-27
## Loading required package: reticulate
## Loading required package: rgdal
## Please note that rgdal will be retired during October 2023,
## plan transition to sf/stars/terra functions using GDAL and PROJ
## at your earliest convenience.
## See https://r-spatial.org/r/2023/05/15/evolution4.html and https://github.com/r-spatial/evolution
## rgdal: version: 1.6-7, (SVN revision 1203)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.6.2, released 2023/01/02
## Path to GDAL shared files: C:/Users/lfult/AppData/Local/R/win-library/4.2/rgdal/gdal
##  GDAL does not use iconv for recoding strings.
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 9.2.0, March 1st, 2023, [PJ_VERSION: 920]
## Path to PROJ shared files: C:/Users/lfult/AppData/Local/R/win-library/4.2/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:2.1-0
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
## Loading required package: rgeos
## rgeos version: 0.6-4, (SVN revision 699)
##  GEOS runtime version: 3.11.2-CAPI-1.17.2 
##  Please note that rgeos will be retired during October 2023,
## plan transition to sf or terra functions using GEOS at your earliest convenience.
## See https://r-spatial.org/r/2023/05/15/evolution4.html for details.
##  GEOS using OverlayNG
##  Linking to sp version: 2.1-0 
##  Polygon checking: TRUE
## 
## Attaching package: 'rgeos'
## The following object is masked from 'package:dplyr':
## 
##     symdiff
## Loading required package: shiny
## 
## Attaching package: 'shiny'
## The following object is masked from 'package:ggExtra':
## 
##     runExample
## Loading required package: sf
## Linking to GEOS 3.11.2, GDAL 3.6.2, PROJ 9.2.0; sf_use_s2() is TRUE
## Loading required package: sjPlot
## #refugeeswelcome
## Loading required package: tidyverse
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ✔ readr     2.1.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ psych::%+%()             masks ggplot2::%+%()
## ✖ psych::alpha()           masks ggplot2::alpha()
## ✖ gridExtra::combine()     masks dplyr::combine()
## ✖ neuralnet::compute()     masks dplyr::compute()
## ✖ tidyr::extract()         masks raster::extract()
## ✖ dplyr::filter()          masks stats::filter()
## ✖ kableExtra::group_rows() masks dplyr::group_rows()
## ✖ dplyr::lag()             masks stats::lag()
## ✖ purrr::lift()            masks caret::lift()
## ✖ dplyr::recode()          masks car::recode()
## ✖ raster::select()         masks MASS::select(), dplyr::select()
## ✖ purrr::some()            masks car::some()
## ✖ rgeos::symdiff()         masks dplyr::symdiff()
## ℹ Use the conflicted package (http://conflicted.r-lib.org/) to force all conflicts to become errors
############################################

Load Python


#Basic Operating System Stuff
import os
import gc #garbage collector
import random #random seed generator

#Basic dataframe, array, and math stuff
import pandas as pd #data frame
import math #math functions
import numpy as np    #numerical package

#Scikit learn
from math import sqrt
import sklearn as sk  #scikit learn
import sklearn.linear_model 
from sklearn.linear_model import LogisticRegression as LR
from sklearn.kernel_ridge import KernelRidge
from sklearn.utils import resample #sampling
from sklearn.model_selection import train_test_split as tts, KFold #train test split
from sklearn.decomposition import PCA #principal components
from sklearn.metrics import classification_report as CR,confusion_matrix, roc_curve
from sklearn.metrics import average_precision_score #for 2-class model
from sklearn.metrics import PrecisionRecallDisplay as PRD
from sklearn.metrics import ConfusionMatrixDisplay as CMD
from sklearn.preprocessing import MinMaxScaler as MMS, StandardScaler as SS, PolynomialFeatures as poly # used for variable scaling data
from sklearn.tree import DecisionTreeClassifier as Tree
from sklearn.ensemble import RandomForestClassifier as RFC, ExtraTreesClassifier as ETC
from sklearn.ensemble import GradientBoostingClassifier as GBC, AdaBoostClassifier as ABC
from sklearn.gaussian_process import GaussianProcessClassifier as GPC  
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier as SGD
from sklearn.naive_bayes import BernoulliNB as NB
from sklearn.neighbors import KNeighborsClassifier as KNN
from sklearn.pipeline import make_pipeline
from sklearn.neural_network import MLPClassifier as MLP
from sklearn.linear_model import Perceptron
from sklearn.tree import plot_tree as treeplot, export_graphviz

from scipy import stats as st #scipy statistics (scipy.misc is deprecated and was unused here)
from scipy.stats import norm
import itertools

from statsmodels.genmod.generalized_linear_model import GLM
from statsmodels.genmod import families
import statsmodels.stats.tests.test_influence
import statsmodels.formula.api as smf
import statsmodels.stats.api as sms
from statsmodels.compat import lzip
import statsmodels.api as sm

#Graphing
import seaborn as sns #statistical graphics
from IPython.display import SVG #display SVG images inline
import matplotlib.pyplot as plt #plotting
import matplotlib #image save
from matplotlib.pyplot import imshow #Show images
from PIL import Image #Another image utility


os.chdir('C:/Users/lfult/Desktop/Breach')
##############################################################################################################################

Load Functions

myprint=function(x){x%>%kbl()%>%kable_classic(html_font = "Cambria")}
mycite=function(x){citation(x)}

Load Geography

setwd("C:/Users/lfult/Desktop/Breach")
myshape=shapefile("cb_2018_us_county_20m.shp") #shape file

Load Flat Files

countydata=read.csv("GIS.csv",fileEncoding="UTF-8-BOM", stringsAsFactors = T)
missmap(countydata, x.cex=.6)

countydata$M=countydata$FIPS
countydata$Pop2020=NULL
countydata$CensusMean=NULL

Merge Files

myshape$M=as.numeric(myshape$GEOID)
counties=sp::merge(myshape, countydata, by="M",all.x=F)
counties=counties[complete.cases(counties@data),]
mydata=counties@data
#write.csv(counties@data,'merged.csv', row.names = FALSE)
missmap(mydata, col=c('red','blue'))

GIS

temp=counties
qpal<-colorBin(c("green", "orange", "red"), 0:10); qpal2<-colorNumeric("Reds", 0:11)

leaf=leaflet(counties) %>%
  addTiles(group = "OSM (default)") %>%
  addMapPane("borders", zIndex = 410) %>%
  
  #Base Diagrams
  addPolylines(data = temp,color = "black",
               opacity = 1, weight = 1, group="Borders", options = pathOptions(pane="borders"))%>%
  fitBounds(-124.8, 24.4, -66.9, 49.4) %>% setView(-98.6, 39.83, zoom = 4)%>%
  
  addPolygons(stroke = FALSE,fillOpacity = 1, smoothFactor = 0.2, 
              color=~qpal(temp@data$y), 
              popup = paste("County: ", temp@data$NAME, "<br>", 
                    "Count of Breaches: ", temp@data$y, "<br>",
                    "Breaches per 100K: ", round(temp@data$BreachesPer100K),3), 
              group='Sum of Breaches')%>%
  
  addPolygons(stroke = FALSE,fillOpacity = 1, smoothFactor = 0.2, 
              color=~qpal2(temp@data$BreachesPer100K), 
              popup = paste("County: ", temp@data$NAME, "<br>", 
                    "Count of Breaches: ", temp@data$y, "<br>",
                    "Breaches per 100K: ",temp@data$BreachesPer100K), 
              group='Breaches Per 100K')%>%
  
  addLegend(data=temp, 
            "bottomleft", opacity=1, pal = qpal, 
            values = ~temp@data$y,
            title = "Sum of Breaches")%>%
  
  addLegend(data=temp, 
            "bottomright", opacity=1, pal = qpal2, 
            values = ~temp@data$BreachesPer100K,
            title = "Breaches Per 100K")%>%

  addLayersControl(
    baseGroups = c("Sum of Breaches", "Breaches Per 100K"),
    overlayGroups = c("Borders"), options = layersControlOptions(collapsed = TRUE))

leaf
rm(temp)
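
The map object lives only in the session; htmlwidgets (installed alongside leaflet) can export a standalone copy. A minimal sketch, with an illustrative file name:

htmlwidgets::saveWidget(leaf, "breach_map.html", selfcontained = TRUE) #file name is illustrative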

Drop Variables

mydata[, c(1:11, 13)]=NULL
colnames(mydata)
##  [1] "y"                "OpProfitMargin"   "CapitalExp"       "OpIncome"        
##  [5] "AR"               "BadDebt"          "BedUtil"          "OutpatientVisits"
##  [9] "ALOS"             "PopDensity"       "Native"           "Hispanic"        
## [13] "Black"            "Asian"            "Prop65"           "UE2019"          
## [17] "Poverty"          "AcuteBeds"        "CMI"              "PedTrauma"       
## [21] "MedCenter"

Dichotomize DV

mydata$y[mydata$y>0]=1
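
A quick tabulation shows the class imbalance (about 23% positive, matching the mean of y in the Describe table below) that motivates the MWMOTE rebalancing later:

table(mydata$y) #counts of non-breach (0) vs. breach (1) counties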

Engineer Variables

mydata$ALOS[mydata$ALOS==70]=7 # correct known outlier
mydata$BedFreqSev=mydata$AcuteBeds*mydata$BedUtil*mydata$CMI #acute beds x utilization x case-mix severity
mydata$BedUtil=mydata$CMI=mydata$AcuteBeds=NULL #drop variables used
mydata$PedTrauma=as.factor(mydata$PedTrauma)
mydata$MedCenter=as.factor(mydata$MedCenter)

Reorder

mydata=mydata[, c('y','Native', 'Hispanic', 'Black', 'Asian', 'Prop65', 
                  'BedFreqSev','OutpatientVisits', 'ALOS',
                  'OpProfitMargin', 'CapitalExp', 'OpIncome', 'AR', 'BadDebt',
                  'PedTrauma', 'MedCenter', 
                  'UE2019', 'Poverty')]

Describe

options(scipen=999)
myprint(describe(mydata))
vars n mean sd median trimmed mad min max range skew kurtosis se
y 1 1032 0.2267442 0.4189288 0.0000000 0.1585956 0.0000000 0.000 1.0000000 1.0000000 1.3032814 -0.3017473 0.0130407
Native 2 1032 0.0090494 0.0295769 0.0030000 0.0033584 0.0029652 0.000 0.3920000 0.3920000 8.0053322 78.8142105 0.0009207
Hispanic 3 1032 0.1091027 0.1354378 0.0590000 0.0795932 0.0533736 0.002 0.9550000 0.9530000 2.8566099 9.9843466 0.0042160
Black 4 1032 0.1030484 0.1332278 0.0480000 0.0740981 0.0578214 0.000 0.7740000 0.7740000 2.0987350 4.5790464 0.0041472
Asian 5 1032 0.0262716 0.0393888 0.0132533 0.0182661 0.0115520 0.000 0.4308105 0.4308105 4.7772245 31.2330418 0.0012261
Prop65 6 1032 0.1789021 0.0422421 0.1750000 0.1757978 0.0355824 0.079 0.4080000 0.3290000 1.0245633 2.4612127 0.0013149
BedFreqSev 7 1032 616.9259881 1299.7348469 192.5671700 334.9489180 238.3344883 0.000 19129.7326145 19129.7326145 6.2089754 59.7129983 40.4589786
OutpatientVisits 8 1032 186413.5445098 233411.4767733 100676.2500000 135443.9112650 86314.7481000 3904.000 1713803.0000000 1709899.0000000 3.0701713 11.6732102 7265.7819066
ALOS 9 1032 4.5385384 0.8949574 4.5666667 4.5462724 0.7907200 0.000 9.2000000 9.2000000 -0.0691299 1.5035699 0.0278588
OpProfitMargin 10 1032 -0.0080760 0.1688887 -0.0095000 -0.0073902 0.1178667 -2.091 1.6543750 3.7453750 -0.9879296 33.2644862 0.0052573
CapitalExp 11 1032 58634325.2387103 560360298.9388866 10315109.0000000 16513388.1652893 12093079.6833000 -213031958.000 15898581598.0000000 16111613556.0000000 23.9936061 638.5991031 17443254.1942870
OpIncome 12 1032 -3257404.0468822 97489174.9088088 -1388767.0000000 -1002978.3313709 24781356.1789500 -1109645967.000 802473902.0000000 1912119869.0000000 -1.6752233 36.3037000 3034705.4606578
AR 13 1032 151104584.8743217 268250394.6065483 61440467.0000000 93170310.5463680 67091045.5987800 -1012900560.000 3306532176.0000000 4319432736.0000000 4.6537756 34.7543780 8350270.0489317
BadDebt 14 1032 23366221.8963275 34484394.5336317 11760138.0000000 15938197.9585593 12077097.9966000 197238.000 356609942.1000000 356412704.1000000 4.0240209 22.7458393 1073452.3140295
PedTrauma* 15 1032 1.0523256 0.2227907 1.0000000 1.0000000 0.0000000 1.000 2.0000000 1.0000000 4.0148943 14.1330739 0.0069352
MedCenter* 16 1032 1.1124031 0.3160149 1.0000000 1.0157385 0.0000000 1.000 2.0000000 1.0000000 2.4506524 4.0095850 0.0098371
UE2019 17 1032 3.8631783 1.2219464 3.7000000 3.7407990 1.0378200 1.800 18.3000000 16.5000000 3.3244991 29.4976023 0.0380375
Poverty 18 1032 13.9828488 5.1828661 13.6000000 13.6725182 4.8925800 2.600 38.2000000 35.6000000 0.7092444 0.9806952 0.1613356

Correlate

mycol=colorRampPalette(c("red","orange","yellow","white","green", "dark green"))(20)

myf=function(x){
  x=x[ , purrr::map_lgl(x, is.numeric)]
  mycor=cor(x)
  corrplot(mycor, method="ellipse", type="upper",
         addCoef.col=TRUE, tl.cex=.6, number.cex=.5, insig="pch",
         order="hclust", hclust.method="centroid", number.digits=2,
         col=mycol, pch=4)}

myf(mydata)

Pair

demographics=c('Native', 'Hispanic', 'Black', 'Asian', 'Prop65')
workload=c('BedFreqSev','OutpatientVisits', 'ALOS')
financial=c('OpProfitMargin', 'CapitalExp', 'OpIncome', 'AR', 'BadDebt') 
type=c('PedTrauma', 'MedCenter')
economics=c('UE2019', 'Poverty')

kdepairs(mydata[,demographics])
## Warning in par(usr): argument 1 does not name a graphical parameter (repeated for each panel)

kdepairs(mydata[,workload])
## Warning in par(usr): argument 1 does not name a graphical parameter (repeated for each panel)

kdepairs(mydata[,financial])
## Warning in par(usr): argument 1 does not name a graphical parameter (repeated for each panel)

kdepairs(mydata[,economics])
## Warning in par(usr): argument 1 does not name a graphical parameter (repeated for each panel)

#kdepairs(mydata[,-c(1,15,16)])

Split

set.seed(1234)
mys=sample(1:nrow(mydata), .8*nrow(mydata), replace=F)
train=mydata[mys,]
test=mydata[-mys,]

Discretize

#discretize Native American
myq=quantile(train$Native, c(.2,.4,.6,.8,1))  #Quantiles were determined based on the training data and applied to the entire dataset
mydata$NatNew=mydata$Native
mydata$NatNew[mydata$Native<=myq[1]]="P20"
mydata$NatNew[mydata$Native>myq[1] &mydata$Native<=myq[2]]="P40"
mydata$NatNew[mydata$Native>myq[2] &mydata$Native<=myq[3]]="P60"
mydata$NatNew[mydata$Native>myq[3] &mydata$Native<=myq[4]]="P80"
mydata$NatNew[mydata$Native>myq[4]]="P100" #top bin open-ended so values above the training maximum are still binned
mydata$Native=as.factor(mydata$NatNew)
mydata$NatNew=NULL

#discretize Hispanic
myq=quantile(train$Hispanic, c(.2,.4,.6,.8,1))
mydata$HispNew=mydata$Hispanic
mydata$HispNew[mydata$Hispanic<=myq[1]]="P20"
mydata$HispNew[mydata$Hispanic>myq[1] &mydata$Hispanic<=myq[2]]="P40"
mydata$HispNew[mydata$Hispanic>myq[2] &mydata$Hispanic<=myq[3]]="P60"
mydata$HispNew[mydata$Hispanic>myq[3] &mydata$Hispanic<=myq[4]]="P80"
mydata$HispNew[mydata$Hispanic>myq[4]]="P100"
mydata$Hispanic=as.factor(mydata$HispNew)
mydata$HispNew=NULL

#discretize Asian
myq=quantile(train$Asian, c(.2,.4,.6,.8,1))
mydata$AsianNew=mydata$Asian
mydata$AsianNew[mydata$Asian<=myq[1]]="P20"
mydata$AsianNew[mydata$Asian>myq[1] &mydata$Asian<=myq[2]]="P40"
mydata$AsianNew[mydata$Asian>myq[2] &mydata$Asian<=myq[3]]="P60"
mydata$AsianNew[mydata$Asian>myq[3] &mydata$Asian<=myq[4]]="P80"
mydata$AsianNew[mydata$Asian>myq[4]]="P100"
mydata$Asian=as.factor(mydata$AsianNew)
mydata$AsianNew=NULL

#discretize Black
myq=quantile(train$Black, c(.2,.4,.6,.8,1))
mydata$BlackNew=mydata$Black
mydata$BlackNew[mydata$Black<=myq[1]]="P20"
mydata$BlackNew[mydata$Black>myq[1] &mydata$Black<=myq[2]]="P40"
mydata$BlackNew[mydata$Black>myq[2] &mydata$Black<=myq[3]]="P60"
mydata$BlackNew[mydata$Black>myq[3] &mydata$Black<=myq[4]]="P80"
mydata$BlackNew[mydata$Black>myq[4]]="P100"
mydata$Black=as.factor(mydata$BlackNew)
mydata$BlackNew=NULL

#discretize Capital Expenditures
myq=quantile(train$CapitalExp, c(.2,.4,.6,.8,1))
mydata$CapitalExpNew=mydata$CapitalExp
mydata$CapitalExpNew[mydata$CapitalExp<=myq[1]]="P20"
mydata$CapitalExpNew[mydata$CapitalExp>myq[1] &mydata$CapitalExp<=myq[2]]="P40"
mydata$CapitalExpNew[mydata$CapitalExp>myq[2] &mydata$CapitalExp<=myq[3]]="P60"
mydata$CapitalExpNew[mydata$CapitalExp>myq[3] &mydata$CapitalExp<=myq[4]]="P80"
mydata$CapitalExpNew[mydata$CapitalExp>myq[4]]="P100"
mydata$CapitalExp=as.factor(mydata$CapitalExpNew)
mydata$CapitalExpNew=NULL
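
The five blocks above repeat one pattern; a helper capturing it is sketched below. The open-ended top break means values above the training maximum still land in P100, and the labels argument is an assumption matching the quintile names used above.

bin_by_train_quantiles=function(x, train_x, probs=c(.2,.4,.6,.8,1),
                                labels=c('P20','P40','P60','P80','P100')){
  myq=quantile(train_x, probs) #thresholds come from the training data only
  cut(x, breaks=c(-Inf, myq[-length(myq)], Inf), labels=labels) #top bin open-ended
}
#e.g., mydata$Black=bin_by_train_quantiles(mydata$Black, train$Black)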

Transform

#log Outpatient Visits
mydata$OutpatientVisits=log(mydata$OutpatientVisits)

#log BedFreqSev
mydata$BedFreqSev=log(mydata$BedFreqSev+.01)

#log Bad Debt
mydata$BadDebt=log(mydata$BadDebt)

#Unemployment ^ -(1/2)
mydata$UE2019=mydata$UE2019^(-1/2)

#Poverty
mydata$Poverty=mydata$Poverty^.5

temp=fastDummies::dummy_cols(mydata, remove_most_frequent_dummy = T, remove_selected_columns = T)
mydata=temp

#Rebuild the Training and Test Sets

train=mydata[mys,]
test=mydata[-mys,]

Reorder

#Reorder
mydata=mydata[, c("y",
                  
                  "Native_P40", "Native_P60", "Native_P80", "Native_P100",
                  "Hispanic_P40","Hispanic_P60", "Hispanic_P80", "Hispanic_P100", 
                  "Black_P40", "Black_P60", "Black_P80" ,"Black_P100",
                  "Asian_P20", "Asian_P40", "Asian_P80","Asian_P100",
                  
                  "PedTrauma_1" , "MedCenter_1",
                
                  "CapitalExp_P20" ,  "CapitalExp_P40", 
                  "CapitalExp_P80","CapitalExp_P100", 
                  
                  "Prop65", 
                  "OpProfitMargin", "OpIncome","AR" ,"BadDebt", 
                  
                  "BedFreqSev" ,"OutpatientVisits","ALOS",
                  "UE2019", "Poverty" 
                  
                 )]

train=mydata[mys,]
test=mydata[-mys,]

Re-draw

financial=c('OpProfitMargin', 'OpIncome', 'AR', 'BadDebt') 
kdepairs(mydata[,workload])
## Warning in par(usr): argument 1 does not name a graphical parameter (repeated for each panel)

kdepairs(mydata[,financial])
## Warning in par(usr): argument 1 does not name a graphical parameter (repeated for each panel)

kdepairs(mydata[,economics])
## Warning in par(usr): argument 1 does not name a graphical parameter (repeated for each panel)

kdepairs(mydata[,-c(1:24)])
## Warning in par(usr): argument 1 does not name a graphical parameter (repeated for each panel)

Scale

mymeans=colMeans(train)  #use the means and sds from the training set to apply to the test set
mysd=apply(train, 2, sd) #doing so avoids leakage

for (i in 24:ncol(mydata)){
  train[,i]=(train[,i]-mymeans[i])/mysd[i]
  test[,i]=(test[,i]-mymeans[i])/mysd[i]
}
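
The loop above can also be written with base scale(), using the same training-set centers and spreads so there is still no leakage; a sketch assuming the continuous block still begins at column 24:

num_cols=24:ncol(mydata)
train[num_cols]=scale(train[num_cols], center=mymeans[num_cols], scale=mysd[num_cols])
test[num_cols]=scale(test[num_cols], center=mymeans[num_cols], scale=mysd[num_cols])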

Rebalance Training Set

tmp=mwmote(train, numInstances = 500, classAttr = "y")
train2=rbind(tmp,train)
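
MWMOTE adds 500 synthetic minority rows; a quick check of the resulting mix (consistent with the 0.52 mean of y in the Describe output below):

prop.table(table(train2$y)) #share of class 0 vs. class 1 after augmentation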

Describe

describe(train2)
##                  vars    n  mean   sd median trimmed  mad    min   max range
## y                   1 1325  0.52 0.50   1.00    0.52 0.00   0.00  1.00  1.00
## Native_P40          2 1325  0.18 0.39   0.00    0.10 0.00   0.00  1.00  1.00
## Native_P60          3 1325  0.11 0.31   0.00    0.01 0.00   0.00  1.00  1.00
## Native_P80          4 1325  0.11 0.31   0.00    0.01 0.00   0.00  1.00  1.00
## Native_P100         5 1325  0.13 0.33   0.00    0.04 0.00   0.00  1.00  1.00
## Hispanic_P40        6 1325  0.13 0.34   0.00    0.04 0.00   0.00  1.00  1.00
## Hispanic_P60        7 1325  0.14 0.35   0.00    0.05 0.00   0.00  1.00  1.00
## Hispanic_P80        8 1325  0.15 0.35   0.00    0.06 0.00   0.00  1.00  1.00
## Hispanic_P100       9 1325  0.14 0.35   0.00    0.06 0.00   0.00  1.00  1.00
## Black_P40          10 1325  0.14 0.34   0.00    0.04 0.00   0.00  1.00  1.00
## Black_P60          11 1325  0.14 0.35   0.00    0.05 0.00   0.00  1.00  1.00
## Black_P80          12 1325  0.16 0.37   0.00    0.07 0.00   0.00  1.00  1.00
## Black_P100         13 1325  0.15 0.36   0.00    0.07 0.00   0.00  1.00  1.00
## Asian_P20          14 1325  0.13 0.33   0.00    0.03 0.00   0.00  1.00  1.00
## Asian_P40          15 1325  0.13 0.34   0.00    0.04 0.00   0.00  1.00  1.00
## Asian_P80          16 1325  0.15 0.36   0.00    0.06 0.00   0.00  1.00  1.00
## Asian_P100         17 1325  0.18 0.39   0.00    0.11 0.00   0.00  1.00  1.00
## PedTrauma_1        18 1325  0.04 0.20   0.00    0.00 0.00   0.00  1.00  1.00
## MedCenter_1        19 1325  0.12 0.33   0.00    0.03 0.00   0.00  1.00  1.00
## CapitalExp_P20     20 1325  0.13 0.33   0.00    0.03 0.00   0.00  1.00  1.00
## CapitalExp_P40     21 1325  0.13 0.33   0.00    0.03 0.00   0.00  1.00  1.00
## CapitalExp_P80     22 1325  0.16 0.37   0.00    0.08 0.00   0.00  1.00  1.00
## CapitalExp_P100    23 1325  0.18 0.39   0.00    0.10 0.00   0.00  1.00  1.00
## Prop65             24 1325 -0.10 0.89  -0.21   -0.17 0.75  -2.36  5.39  7.75
## OpProfitMargin     25 1325 -0.06 0.87  -0.03   -0.04 0.61 -12.24  4.60 16.85
## OpIncome           26 1325 -0.05 0.96   0.01   -0.02 0.33 -11.84  7.32 19.16
## AR                 27 1325  0.19 0.99  -0.15    0.00 0.51  -4.78 10.43 15.22
## BadDebt            28 1325  0.20 0.92   0.27    0.24 0.87  -3.37  2.83  6.20
## BedFreqSev         29 1325  0.31 0.99   0.39    0.35 1.07  -6.07  2.89  8.95
## OutpatientVisits   30 1325  0.26 0.96   0.28    0.27 1.02  -3.40  2.71  6.11
## ALOS               31 1325  0.16 0.92   0.21    0.19 0.80  -5.06  5.22 10.28
## UE2019             32 1325  0.00 0.92  -0.03   -0.01 0.88  -4.16  3.16  7.32
## Poverty            33 1325  0.01 0.91   0.00    0.00 0.89  -2.99  3.65  6.64
##                   skew kurtosis   se
## y                -0.07    -2.00 0.01
## Native_P40        1.63     0.67 0.01
## Native_P60        2.54     4.44 0.01
## Native_P80        2.51     4.31 0.01
## Native_P100       2.22     2.93 0.01
## Hispanic_P40      2.15     2.63 0.01
## Hispanic_P60      2.05     2.21 0.01
## Hispanic_P80      1.99     1.96 0.01
## Hispanic_P100     2.02     2.10 0.01
## Black_P40         2.13     2.55 0.01
## Black_P60         2.09     2.35 0.01
## Black_P80         1.87     1.49 0.01
## Black_P100        1.94     1.76 0.01
## Asian_P20         2.25     3.07 0.01
## Asian_P40         2.19     2.80 0.01
## Asian_P80         1.96     1.86 0.01
## Asian_P100        1.63     0.65 0.01
## PedTrauma_1       4.59    19.10 0.01
## MedCenter_1       2.28     3.21 0.01
## CapitalExp_P20    2.25     3.07 0.01
## CapitalExp_P40    2.25     3.07 0.01
## CapitalExp_P80    1.81     1.27 0.01
## CapitalExp_P100   1.64     0.69 0.01
## Prop65            1.20     3.34 0.02
## OpProfitMargin   -2.25    33.00 0.02
## OpIncome         -2.01    31.88 0.03
## AR                2.82    15.15 0.03
## BadDebt          -0.44     0.47 0.03
## BedFreqSev       -0.46     0.56 0.03
## OutpatientVisits -0.07    -0.32 0.03
## ALOS             -0.31     1.82 0.03
## UE2019            0.01     0.60 0.03
## Poverty           0.17     0.29 0.02

Interim Write

write.csv(train2, 'train2.csv', row.names = FALSE)
write.csv(test,'test.csv', row.names=FALSE)

Interim Read

train2=read.csv('train2.csv')
test=read.csv('test.csv')

M1:LogReg

suppressWarnings({
myglm=glm(y~.,data=train2, family='binomial')
par(ask=FALSE)
par(mfrow=c(2,3))
})
mysum=summary(myglm)
print(noquote(paste("R^2:",1-myglm$deviance/myglm$null.deviance)))
## [1] R^2: 0.576294974175473
pm1=plot_model(myglm, title="Full Model", show.values=TRUE, show.p=TRUE, value.offset=.4)
pm1
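
ResourceSelection is loaded in the preamble but not exercised here; a Hosmer-Lemeshow calibration check is a natural companion to the pseudo-R-squared. A sketch (output not shown):

hoslem.test(myglm$y, fitted(myglm), g = 10) #observed vs. fitted rates across g risk deciles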

VIF

myvif=noquote(c(NA,vif(myglm))) #NA aligns with the intercept row
newcoefs=cbind(round(mysum$coefficients,3), myvif)
newcoefs=round(newcoefs,3)
colnames(newcoefs)=c('LR', 'SE', 'Z', 'P(Z)', 'VIF')
myprint(newcoefs[-1,])
LR SE Z P(Z) VIF
Native_P40 -0.928 0.261 -3.556 0.000 1.391
Native_P60 -0.905 0.306 -2.958 0.003 1.256
Native_P80 -1.371 0.321 -4.277 0.000 1.248
Native_P100 -1.123 0.307 -3.656 0.000 1.417
Hispanic_P40 -0.861 0.309 -2.787 0.005 1.434
Hispanic_P60 -1.296 0.324 -4.000 0.000 1.614
Hispanic_P80 -1.404 0.304 -4.614 0.000 1.772
Hispanic_P100 -2.264 0.344 -6.577 0.000 1.990
Black_P40 -1.581 0.300 -5.262 0.000 1.418
Black_P60 -2.162 0.315 -6.856 0.000 1.626
Black_P80 -1.518 0.316 -4.808 0.000 1.931
Black_P100 -1.920 0.366 -5.248 0.000 2.425
Asian_P20 -2.142 0.387 -5.538 0.000 1.323
Asian_P40 -1.298 0.306 -4.239 0.000 1.303
Asian_P80 -1.116 0.291 -3.832 0.000 1.547
Asian_P100 -0.278 0.306 -0.907 0.364 2.064
PedTrauma_1 0.785 0.471 1.667 0.095 1.219
MedCenter_1 0.954 0.360 2.652 0.008 1.575
CapitalExp_P20 -0.831 0.401 -2.074 0.038 1.432
CapitalExp_P40 -1.262 0.343 -3.676 0.000 1.259
CapitalExp_P80 -0.616 0.261 -2.358 0.018 1.460
CapitalExp_P100 -0.679 0.308 -2.204 0.028 1.988
Prop65 -0.190 0.121 -1.563 0.118 1.514
OpProfitMargin -0.351 0.119 -2.943 0.003 1.629
OpIncome 0.145 0.122 1.193 0.233 1.610
AR 0.296 0.130 2.277 0.023 1.672
BadDebt 0.100 0.167 0.599 0.549 2.592
BedFreqSev 1.371 0.213 6.449 0.000 3.719
OutpatientVisits 0.252 0.170 1.480 0.139 2.470
ALOS -0.416 0.148 -2.807 0.005 2.026
UE2019 -0.175 0.123 -1.421 0.155 1.732
Poverty 0.209 0.129 1.617 0.106 1.800
forexport= newcoefs[2:length(myglm$coefficients),]

Outliers

plot(myglm, which = 4, id.n = 6)

#plot(myglm, which =c(5))
model.data <- augment(myglm) %>%   mutate(index = 1:n()) 

ggplot(model.data, aes(index, .std.resid)) + 
  geom_point(aes(), alpha = .5) +
  theme_bw()

model.data %>% 
  filter(abs(.std.resid) > 3)
## # A tibble: 6 × 40
##       y Native_P40 Native_P60 Native_P80 Native_P100 Hispanic_P40 Hispanic_P60
##   <int>      <int>      <int>      <int>       <int>        <int>        <int>
## 1     1          0          0          1           0            1            0
## 2     0          0          0          1           0            1            0
## 3     1          0          1          0           0            0            0
## 4     1          0          0          1           0            0            0
## 5     1          1          0          0           0            0            0
## 6     1          0          0          0           1            0            0
## # ℹ 33 more variables: Hispanic_P80 <int>, Hispanic_P100 <int>,
## #   Black_P40 <int>, Black_P60 <int>, Black_P80 <int>, Black_P100 <int>,
## #   Asian_P20 <int>, Asian_P40 <int>, Asian_P80 <int>, Asian_P100 <int>,
## #   PedTrauma_1 <int>, MedCenter_1 <int>, CapitalExp_P20 <int>,
## #   CapitalExp_P40 <int>, CapitalExp_P80 <int>, CapitalExp_P100 <int>,
## #   Prop65 <dbl>, OpProfitMargin <dbl>, OpIncome <dbl>, AR <dbl>,
## #   BadDebt <dbl>, BedFreqSev <dbl>, OutpatientVisits <dbl>, ALOS <dbl>, …
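
The six flagged rows are dropped by hard-coded index in the next chunk; the same indices can be pulled programmatically, a sketch:

outlier_idx=model.data %>% filter(abs(.std.resid) > 3) %>% pull(index)
#train3=train2[-outlier_idx,] #equivalent to the hard-coded version below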

Outlier Effect

train3=train2[-c(526,556,793,845,858,1314),]

myglm2=glm(y~.,data=train3, family='binomial')
compare=cbind(myglm2$coefficients, myglm$coefficients)
colnames(compare)=c('Without Outliers', 'With Outliers')
myprint(compare)
Without Outliers With Outliers
(Intercept) 3.2386233 3.1436510
Native_P40 -1.0352935 -0.9282454
Native_P60 -0.7911568 -0.9047416
Native_P80 -1.4869334 -1.3712453
Native_P100 -1.1831740 -1.1234862
Hispanic_P40 -0.8657349 -0.8610440
Hispanic_P60 -1.3384428 -1.2958404
Hispanic_P80 -1.4728688 -1.4039161
Hispanic_P100 -2.4535927 -2.2643513
Black_P40 -1.5778923 -1.5809882
Black_P60 -2.2983882 -2.1620087
Black_P80 -1.6003539 -1.5175757
Black_P100 -1.9423319 -1.9196144
Asian_P20 -2.2634797 -2.1418128
Asian_P40 -1.3351283 -1.2982008
Asian_P80 -1.1545162 -1.1163427
Asian_P100 -0.1502029 -0.2778906
PedTrauma_1 0.9893849 0.7852201
MedCenter_1 1.1084424 0.9535528
CapitalExp_P20 -0.8110963 -0.8312061
CapitalExp_P40 -1.4022893 -1.2615687
CapitalExp_P80 -0.6179319 -0.6158547
CapitalExp_P100 -0.6845975 -0.6790396
Prop65 -0.1488427 -0.1895831
OpProfitMargin -0.4897787 -0.3510113
OpIncome 0.1563753 0.1454388
AR 0.3562718 0.2956772
BadDebt 0.0754719 0.0998081
BedFreqSev 1.3857339 1.3706484
OutpatientVisits 0.2692817 0.2515908
ALOS -0.4616862 -0.4159372
UE2019 -0.1953127 -0.1748229
Poverty 0.2415636 0.2086354

Linearity LogOdds

probabilities <- predict(myglm, type = "response")
predicted.classes <- ifelse(probabilities > 0.5, "pos", "neg")
# Select only numeric predictors
tempdata <- train2[,-c(1:23)] %>%
  dplyr::select_if(is.numeric) 
# Bind the logit and tidying the data for plot
tempdata <- tempdata %>%
  mutate(logit = log(probabilities/(1-probabilities))) %>%
  gather(key = "predictors", value = "predictor.value", -logit)
ggplot(tempdata, aes(logit, predictor.value))+
  geom_point(size = 0.5, alpha = 0.5) +
  geom_smooth(method = "loess") + 
  theme_bw() + 
  facet_wrap(~predictors, scales = "free_y")
## `geom_smooth()` using formula = 'y ~ x'

Submodels

dem=train2[,1:17]
work=train2[,c(1,29:31)]
fin=train2[,c(1,25:28)]
type=train2[,c(1,18:19)]
econ=train2[,c(1,32:33)]
sig=train2[,-c(17,24,26,28,30,32,33)]
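
The positional indices above are fragile if columns move; the same splits can be written by name with dplyr, sketched here for two of the groups:

dem=dplyr::select(train2, y, starts_with('Native'), starts_with('Hispanic'),
                  starts_with('Black'), starts_with('Asian'))
work=dplyr::select(train2, y, BedFreqSev, OutpatientVisits, ALOS)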

m1a=glm(y~.,data=dem, family='binomial')
m1b=glm(y~., data=work, family='binomial')
m1c=glm(y~., data=fin, family='binomial')
m1d=glm(y~., data=type, family='binomial')
m1e=glm(y~., data=econ, family='binomial')
m1f=glm(y~., data=sig, family='binomial')

par(mfrow=c(3,2))


pm1=plot_model(m1a, title="Demographics", show.values=TRUE, show.p=TRUE, value.offset=.4)
pm2=plot_model(m1b, title="Workload", show.values=TRUE, show.p=TRUE, value.offset=.4)
pm3=plot_model(m1c, title="Financial", show.values=TRUE, show.p=TRUE, value.offset=.4)
pm4=plot_model(m1d, title="Type", show.values=TRUE, show.p=TRUE, value.offset=.4)
pm5=plot_model(m1e, title="Economic", show.values=TRUE, show.p=TRUE, value.offset=.4)
pm6=plot_model(m1f, title="Significant Variables", show.values=TRUE, show.p=TRUE, value.offset=.4)
pm7=plot_model(myglm, title="Full Model", show.values=TRUE, show.p=TRUE, value.offset=.4)

t1=pm1$data[,c(1,2,5,6)]
t1$Group=rep("1. Demographics", nrow(t1))
t2=pm2$data[,c(1,2,5,6)]
t2$Group=rep("2. Workload", nrow(t2))
t3=pm3$data[,c(1,2,5,6)]
t3$Group=rep("3. Financial", nrow(t3))
t4=pm4$data[,c(1,2,5,6)]
t4$Group=rep("4. Type", nrow(t4))
t5=pm5$data[,c(1,2,5,6)]
t5$Group=rep("5. Economic", nrow(t5))
t6=pm6$data[,c(1,2,5,6)]
t6$Group=rep("6. Significant Only", nrow(t6))
t7=pm7$data[,c(1,2,5,6)]
t7$Group=rep("7. All Variables", nrow(t7))

ttot=rbind(t1,t2,t3,t4,t5,t6,t7)
ttot$Group=as.factor(ttot$Group)

ggplot(data=ttot,
    aes(x = term,y = estimate, ymin = .5, ymax = 2.0 ))+
    geom_point(aes(col=Group))+
    geom_hline(yintercept=1, linetype=2)+
    xlab('')+ ylab("Odds Ratio (95% Confidence Interval)")+
    geom_errorbar(aes(ymin=conf.low,
                      ymax=conf.high,col=Group),width=0.5,linewidth=1)+ 
  facet_grid(~Group)+
        theme(plot.title=element_text(size=16,face="bold"),
        axis.text.y=element_text(size=8),
        axis.text.x=element_text(size=8,face="bold", angle=90),
        axis.title=element_text(size=8,face="bold"),
        strip.text.y = element_text(hjust=0,vjust = 1,angle=180,face="bold"))+
        guides(colour="none")+
        coord_flip()

Confusion Report

mypred1=as.factor(round(predict(m1a, test, type='response'),0))
mypred2=as.factor(round(predict(m1b, test, type='response'),0))
mypred3=as.factor(round(predict(m1c, test, type='response'),0))
mypred4=as.factor(round(predict(m1d, test, type='response'),0))
mypred5=as.factor(round(predict(m1e, test, type='response'),0))
mypred6=as.factor(round(predict(m1f, test, type='response'),0))
mypred7=as.factor(round(predict(myglm, test, type='response'),0))
mycm1=confusionMatrix(data=mypred1, reference=as.factor(test$y), positive = '1')
mycm2=confusionMatrix(data=mypred2, reference=as.factor(test$y), positive = '1')
mycm3=confusionMatrix(data=mypred3, reference=as.factor(test$y), positive = '1')
mycm4=confusionMatrix(data=mypred4, reference=as.factor(test$y), positive = '1')
mycm5=confusionMatrix(data=mypred5, reference=as.factor(test$y), positive = '1')
mycm6=confusionMatrix(data=mypred6, reference=as.factor(test$y), positive = '1')
mycm7=confusionMatrix(data=mypred7, reference=as.factor(test$y), positive = '1')
met1=rbind(c(mycm1$overall,mycm1$byClass),
           c(mycm2$overall,mycm2$byClass),
           c(mycm3$overall,mycm3$byClass),
           c(mycm4$overall,mycm4$byClass),
           c(mycm5$overall,mycm5$byClass),
           c(mycm6$overall,mycm6$byClass),
           c(mycm7$overall,mycm7$byClass))

mydf=as.data.frame(t(round(met1,3)))
colnames(mydf)=c("Demographics","Workload","Finance","Type","Economics","Significant Only","Full Model")
mydf
##                      Demographics Workload Finance  Type Economics
## Accuracy                    0.696    0.705   0.783 0.831     0.324
## Kappa                       0.164    0.377   0.457 0.430     0.017
## AccuracyLower               0.628    0.638   0.720 0.773     0.260
## AccuracyUpper               0.758    0.766   0.837 0.879     0.392
## AccuracyNull                0.768    0.768   0.768 0.768     0.768
## AccuracyPValue              0.993    0.985   0.345 0.017     1.000
## McnemarPValue               0.801    0.000   0.017 0.000     0.000
## Sensitivity                 0.375    0.833   0.708 0.396     0.875
## Specificity                 0.792    0.667   0.805 0.962     0.157
## Pos Pred Value              0.353    0.430   0.523 0.760     0.239
## Neg Pred Value              0.808    0.930   0.901 0.841     0.806
## Precision                   0.353    0.430   0.523 0.760     0.239
## Recall                      0.375    0.833   0.708 0.396     0.875
## F1                          0.364    0.567   0.602 0.521     0.375
## Prevalence                  0.232    0.232   0.232 0.232     0.232
## Detection Rate              0.087    0.193   0.164 0.092     0.203
## Detection Prevalence        0.246    0.449   0.314 0.121     0.850
## Balanced Accuracy           0.584    0.750   0.757 0.679     0.516
##                      Significant Only Full Model
## Accuracy                        0.778      0.783
## Kappa                           0.376      0.385
## AccuracyLower                   0.715      0.720
## AccuracyUpper                   0.832      0.837
## AccuracyNull                    0.768      0.768
## AccuracyPValue                  0.408      0.345
## McnemarPValue                   1.000      1.000
## Sensitivity                     0.521      0.521
## Specificity                     0.855      0.862
## Pos Pred Value                  0.521      0.532
## Neg Pred Value                  0.855      0.856
## Precision                       0.521      0.532
## Recall                          0.521      0.521
## F1                              0.521      0.526
## Prevalence                      0.232      0.232
## Detection Rate                  0.121      0.121
## Detection Prevalence            0.232      0.227
## Balanced Accuracy               0.688      0.691
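
The round() calls above apply an implicit 0.5 cutoff to the response-scale probabilities; written explicitly, which makes it easy to move the operating point, a sketch:

mypred7=as.factor(ifelse(predict(myglm, test, type='response') > 0.5, 1, 0))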

Other Models

Prediction Functions


test=r.test #pull the R test set via reticulate
y_test=test['y']
X_test=test.drop('y', axis=1)

def myf(mod):
    y_hat=mod.predict(X_test) #encoded or decoded data both work; results are unchanged
    results=pd.DataFrame(CR(y_test, y_hat, output_dict=True))
    try:  
        CMD.from_estimator(mod,X_test,y_test)
        plt.show()
    except:
        print('No confusion plot.')
    return(results)

def prplot(mod):
    average_precision = average_precision_score(y_test, mod.predict(X_test))
    disp = PRD.from_estimator(mod, X_test, y_test)
    disp.ax_.set_title('Precision-Recall curve: '
                   'AP={0:0.2f}'.format(average_precision))
    plt.show()
    
def mytree(mod):
    imp = mod.feature_importances_
    std = np.std([tree.feature_importances_ for tree in mod.estimators_], axis=0) #spread across trees
    importances = pd.Series(imp, index=feature_names).sort_values(ascending=False)
    std = pd.Series(std, index=feature_names).reindex(importances.index) #align error bars with the sorted order
    fig, ax = plt.subplots()
    importances.plot.bar(yerr=std, ax=ax)
    ax.set_title("Feature importances using MDI")
    ax.set_ylabel("Mean decrease in impurity")
    fig.tight_layout() 
    plt.show()
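
roc_curve is imported above but never used; an ROC helper in the same style is sketched here (it assumes the model exposes decision_function or predict_proba):

def rocplot(mod):
    try:
        score = mod.decision_function(X_test) #margin-based models
    except AttributeError:
        score = mod.predict_proba(X_test)[:, 1] #probability-based models
    fpr, tpr, _ = roc_curve(y_test, score)
    plt.plot(fpr, tpr)
    plt.plot([0, 1], [0, 1], '--') #chance line
    plt.xlabel('False positive rate'); plt.ylabel('True positive rate')
    plt.show()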

M2:Perceptron

#Data & Names
train2=r.train2 #rebalanced training set from R, via reticulate
X_train2=train2.drop('y', axis=1)
feature_names=X_train2.columns
y_train2=train2['y']

#Perceptron

#for i in np.arange(.1,.95, .01): #hyperparameter tuning
#   j=i
nn = Perceptron(alpha=1e-5,random_state=1234, max_iter=40000, eta0=3, shuffle=True, class_weight={0:.8,1:.2})
nn.fit(X_train2, y_train2)
## Perceptron(alpha=1e-05, class_weight={0: 0.8, 1: 0.2}, eta0=3, max_iter=40000,
##            random_state=1234)
#hat=nn.predict(X_train2)
#pd.DataFrame(CR(y_train2, hat, output_dict=True))
print(myf(nn))
##                     0          1  accuracy   macro avg  weighted avg
## precision    0.863354   0.565217  0.797101    0.714286      0.794221
## recall       0.874214   0.541667  0.797101    0.707940      0.797101
## f1-score     0.868750   0.553191  0.797101    0.710971      0.795577
## support    159.000000  48.000000  0.797101  207.000000    207.000000

prplot(nn)
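
The commented-out loop above hints at class-weight tuning; one way to expand it is sketched below (the weight grid and training-accuracy criterion are assumptions, not part of the original run):

for i in np.arange(.1, .95, .05): #candidate weight for class 0
    nn_i = Perceptron(alpha=1e-5, random_state=1234, max_iter=40000, eta0=3,
                      shuffle=True, class_weight={0: i, 1: 1 - i})
    nn_i.fit(X_train2, y_train2)
    print(round(i, 2), nn_i.score(X_train2, y_train2)) #training accuracy; a validation split would be better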

M3:SVM

#SVM
svm=LinearSVC(max_iter=10000, random_state=1234, loss='hinge')
svm.fit(X_train2, y_train2)
## LinearSVC(loss='hinge', max_iter=10000, random_state=1234)
#hat=svm.predict(X_train2)
#pd.DataFrame(CR(y_train2, hat, output_dict=True))
print(myf(svm))
##                     0          1  accuracy   macro avg  weighted avg
## precision    0.878049   0.651163  0.830918    0.764606      0.825438
## recall       0.905660   0.583333  0.830918    0.744497      0.830918
## f1-score     0.891641   0.615385  0.830918    0.753513      0.827581
## support    159.000000  48.000000  0.830918  207.000000    207.000000

prplot(svm)

Others

#Regularized
lr=LR(penalty='l1', random_state=1234,solver='liblinear', max_iter=10000)
lr.fit(X_train2, y_train2)
## LogisticRegression(max_iter=10000, penalty='l1', random_state=1234,
##                    solver='liblinear')
print(myf(lr))
##                     0          1  accuracy   macro avg  weighted avg
## precision    0.857143   0.543478   0.78744    0.700311      0.784409
## recall       0.867925   0.520833   0.78744    0.694379      0.787440
## f1-score     0.862500   0.531915   0.78744    0.697207      0.785843
## support    159.000000  48.000000   0.78744  207.000000    207.000000

prplot(lr)

# Decision Tree
dt=Tree()
dt.fit(X_train2,y_train2)
## DecisionTreeClassifier()
print(myf(dt))
##                     0          1  accuracy   macro avg  weighted avg
## precision    0.863946   0.466667  0.748792    0.665306      0.771823
## recall       0.798742   0.583333  0.748792    0.691038      0.748792
## f1-score     0.830065   0.518519  0.748792    0.674292      0.757823
## support    159.000000  48.000000  0.748792  207.000000    207.000000

prplot(dt)

# Random Forest
rf=RFC(random_state=1234)
rf.fit(X_train2,y_train2)
## RandomForestClassifier(random_state=1234)
print(myf(rf))
##                     0          1  accuracy   macro avg  weighted avg
## precision    0.886076   0.612245  0.821256    0.749160      0.822579
## recall       0.880503   0.625000  0.821256    0.752752      0.821256
## f1-score     0.883281   0.618557  0.821256    0.750919      0.821895
## support    159.000000  48.000000  0.821256  207.000000    207.000000

prplot(rf)

# GBC
gb=GBC(random_state=1234)
gb.fit(X_train2,y_train2)
## GradientBoostingClassifier(random_state=1234)
print(myf(gb))
##                     0          1  accuracy   macro avg  weighted avg
## precision    0.870968   0.538462   0.78744    0.704715      0.793865
## recall       0.849057   0.583333   0.78744    0.716195      0.787440
## f1-score     0.859873   0.560000   0.78744    0.709936      0.790337
## support    159.000000  48.000000   0.78744  207.000000    207.000000

prplot(gb)

# ETC
et=ETC(random_state=1234)
et.fit(X_train2,y_train2)
## ExtraTreesClassifier(random_state=1234)
print(myf(et))
##                     0          1  accuracy   macro avg  weighted avg
## precision    0.875776   0.608696  0.816425    0.742236      0.813845
## recall       0.886792   0.583333  0.816425    0.735063      0.816425
## f1-score     0.881250   0.595745  0.816425    0.738497      0.815046
## support    159.000000  48.000000  0.816425  207.000000    207.000000

prplot(et)

# AdaBoost
ad=ABC(random_state=1234)
ad.fit(X_train2,y_train2)
## AdaBoostClassifier(random_state=1234)
print(myf(ad))
##                     0          1  accuracy   macro avg  weighted avg
## precision    0.887324   0.492308  0.763285    0.689816      0.795726
## recall       0.792453   0.666667  0.763285    0.729560      0.763285
## f1-score     0.837209   0.566372  0.763285    0.701790      0.774406
## support    159.000000  48.000000  0.763285  207.000000    207.000000

prplot(ad)

Importances

df=pd.DataFrame(np.squeeze(nn.coef_), columns=['NN_coef'])
df.index=np.squeeze(feature_names)
tmp=np.squeeze(svm.coef_)
tmp3=np.squeeze(dt.feature_importances_)
tmp4=np.squeeze(rf.feature_importances_)
tmp5=np.squeeze(gb.feature_importances_)
tmp6=np.squeeze(et.feature_importances_)
tmp7=np.squeeze(ad.feature_importances_)

df['SVM']=tmp
df['DT']=tmp3
df['RF']=tmp4
df['GB']=tmp5
df['ET']=tmp6
df['AD']=tmp7

df
##                     NN_coef       SVM        DT  ...        GB        ET    AD
## Native_P40        -8.400000 -0.664847  0.007086  ...  0.017680  0.020295  0.02
## Native_P60        -9.600000 -0.575906  0.004946  ...  0.008025  0.021216  0.02
## Native_P80       -13.200000 -0.911476  0.021921  ...  0.018712  0.027835  0.04
## Native_P100       -6.000000 -0.749499  0.009644  ...  0.025142  0.029278  0.04
## Hispanic_P40       0.600000 -0.496737  0.002999  ...  0.003922  0.017003  0.02
## Hispanic_P60      -9.000000 -0.638108  0.018350  ...  0.015103  0.019521  0.02
## Hispanic_P80     -13.800000 -0.715280  0.000000  ...  0.006832  0.018522  0.02
## Hispanic_P100    -14.400000 -1.300741  0.011526  ...  0.027317  0.025750  0.04
## Black_P40        -13.800000 -0.829078  0.003778  ...  0.010796  0.025864  0.02
## Black_P60        -16.200000 -1.152443  0.014649  ...  0.027860  0.035664  0.04
## Black_P80        -12.000000 -0.824494  0.008905  ...  0.007844  0.016818  0.02
## Black_P100       -13.200000 -1.002618  0.000870  ...  0.014049  0.014402  0.02
## Asian_P20        -16.800000 -1.239400  0.029608  ...  0.025054  0.057162  0.02
## Asian_P40         -6.000000 -0.843911  0.021915  ...  0.025324  0.040246  0.02
## Asian_P80        -10.200000 -0.562356  0.021104  ...  0.023522  0.023929  0.02
## Asian_P100        -3.000000 -0.169772  0.016311  ...  0.003807  0.012908  0.02
## PedTrauma_1        7.200000  0.755052  0.000000  ...  0.000506  0.006829  0.00
## MedCenter_1        7.800000  0.921467  0.000000  ...  0.000455  0.024948  0.00
## CapitalExp_P20    -5.400000 -0.447313  0.000000  ...  0.003911  0.060892  0.02
## CapitalExp_P40    -8.400000 -0.680389  0.002062  ...  0.015554  0.064059  0.02
## CapitalExp_P80    -4.200000 -0.254959  0.001953  ...  0.000864  0.016571  0.02
## CapitalExp_P100   -7.200000 -0.307107  0.007216  ...  0.001178  0.012200  0.02
## Prop65            -2.518278 -0.086875  0.033255  ...  0.016854  0.029140  0.08
## OpProfitMargin    -1.434892 -0.135308  0.055462  ...  0.028457  0.029902  0.06
## OpIncome           1.235882  0.008449  0.030035  ...  0.016805  0.027573  0.02
## AR                 3.168171  0.287449  0.111957  ...  0.164898  0.044793  0.04
## BadDebt            0.517059 -0.024293  0.059004  ...  0.030496  0.045447  0.06
## BedFreqSev        15.404488  0.722833  0.336823  ...  0.329138  0.072263  0.14
## OutpatientVisits   0.055865  0.118044  0.078458  ...  0.080970  0.068448  0.04
## ALOS              -1.335678 -0.221242  0.010614  ...  0.012031  0.030706  0.04
## UE2019            -1.535168 -0.039130  0.049371  ...  0.017117  0.028617  0.02
## Poverty            0.194355  0.195630  0.030179  ...  0.019778  0.031199  0.02
## 
## [32 rows x 7 columns]

Coefficients

newdf=py$df
newdf=cbind(newdf, forexport)
myprint(round(newdf,3))
NN_coef SVM DT RF GB ET AD LR SE Z P(Z) VIF
Native_P40 -8.400 -0.665 0.007 0.010 0.018 0.020 0.02 -0.928 0.261 -3.556 0.000 1.391
Native_P60 -9.600 -0.576 0.005 0.009 0.008 0.021 0.02 -0.905 0.306 -2.958 0.003 1.256
Native_P80 -13.200 -0.911 0.022 0.016 0.019 0.028 0.04 -1.371 0.321 -4.277 0.000 1.248
Native_P100 -6.000 -0.749 0.010 0.014 0.025 0.029 0.04 -1.123 0.307 -3.656 0.000 1.417
Hispanic_P40 0.600 -0.497 0.003 0.008 0.004 0.017 0.02 -0.861 0.309 -2.787 0.005 1.434
Hispanic_P60 -9.000 -0.638 0.018 0.011 0.015 0.020 0.02 -1.296 0.324 -4.000 0.000 1.614
Hispanic_P80 -13.800 -0.715 0.000 0.009 0.007 0.019 0.02 -1.404 0.304 -4.614 0.000 1.772
Hispanic_P100 -14.400 -1.301 0.012 0.013 0.027 0.026 0.04 -2.264 0.344 -6.577 0.000 1.990
Black_P40 -13.800 -0.829 0.004 0.011 0.011 0.026 0.02 -1.581 0.300 -5.262 0.000 1.418
Black_P60 -16.200 -1.152 0.015 0.020 0.028 0.036 0.04 -2.162 0.315 -6.856 0.000 1.626
Black_P80 -12.000 -0.824 0.009 0.008 0.008 0.017 0.02 -1.518 0.316 -4.808 0.000 1.931
Black_P100 -13.200 -1.003 0.001 0.009 0.014 0.014 0.02 -1.920 0.366 -5.248 0.000 2.425
Asian_P20 -16.800 -1.239 0.030 0.029 0.025 0.057 0.02 -2.142 0.387 -5.538 0.000 1.323
Asian_P40 -6.000 -0.844 0.022 0.021 0.025 0.040 0.02 -1.298 0.306 -4.239 0.000 1.303
Asian_P80 -10.200 -0.562 0.021 0.014 0.024 0.024 0.02 -1.116 0.291 -3.832 0.000 1.547
Asian_P100 -3.000 -0.170 0.016 0.006 0.004 0.013 0.02 -0.278 0.306 -0.907 0.364 2.064
PedTrauma_1 7.200 0.755 0.000 0.002 0.001 0.007 0.00 0.785 0.471 1.667 0.095 1.219
MedCenter_1 7.800 0.921 0.000 0.008 0.000 0.025 0.00 0.954 0.360 2.652 0.008 1.575
CapitalExp_P20 -5.400 -0.447 0.000 0.018 0.004 0.061 0.02 -0.831 0.401 -2.074 0.038 1.432
CapitalExp_P40 -8.400 -0.680 0.002 0.028 0.016 0.064 0.02 -1.262 0.343 -3.676 0.000 1.259
CapitalExp_P80 -4.200 -0.255 0.002 0.008 0.001 0.017 0.02 -0.616 0.261 -2.358 0.018 1.460
CapitalExp_P100 -7.200 -0.307 0.007 0.005 0.001 0.012 0.02 -0.679 0.308 -2.204 0.028 1.988
Prop65 -2.518 -0.087 0.033 0.045 0.017 0.029 0.08 -0.190 0.121 -1.563 0.118 1.514
OpProfitMargin -1.435 -0.135 0.055 0.051 0.028 0.030 0.06 -0.351 0.119 -2.943 0.003 1.629
OpIncome 1.236 0.008 0.030 0.045 0.017 0.028 0.02 0.145 0.122 1.193 0.233 1.610
AR 3.168 0.287 0.112 0.135 0.165 0.045 0.04 0.296 0.130 2.277 0.023 1.672
BadDebt 0.517 -0.024 0.059 0.075 0.030 0.045 0.06 0.100 0.167 0.599 0.549 2.592
BedFreqSev 15.404 0.723 0.337 0.136 0.329 0.072 0.14 1.371 0.213 6.449 0.000 3.719
OutpatientVisits 0.056 0.118 0.078 0.103 0.081 0.068 0.04 0.252 0.170 1.480 0.139 2.470
ALOS -1.336 -0.221 0.011 0.054 0.012 0.031 0.04 -0.416 0.148 -2.807 0.005 2.026
UE2019 -1.535 -0.039 0.049 0.038 0.017 0.029 0.02 -0.175 0.123 -1.421 0.155 1.732
Poverty 0.194 0.196 0.030 0.042 0.020 0.031 0.02 0.209 0.129 1.617 0.106 1.800

Citations

citation("Amelia")      
## 
## To cite Amelia in publications use:
## 
##   James Honaker, Gary King, Matthew Blackwell (2011). Amelia II: A
##   Program for Missing Data. Journal of Statistical Software, 45(7),
##   1-47. URL https://www.jstatsoft.org/v45/i07/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {{Amelia II}: A Program for Missing Data},
##     author = {James Honaker and Gary King and Matthew Blackwell},
##     journal = {Journal of Statistical Software},
##     year = {2011},
##     volume = {45},
##     number = {7},
##     pages = {1--47},
##     doi = {10.18637/jss.v045.i07},
##   }
citation("broom")
## 
## To cite package 'broom' in publications use:
## 
##   Robinson D, Hayes A, Couch S (2023). _broom: Convert Statistical
##   Objects into Tidy Tibbles_. R package version 1.0.5,
##   <https://CRAN.R-project.org/package=broom>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {broom: Convert Statistical Objects into Tidy Tibbles},
##     author = {David Robinson and Alex Hayes and Simon Couch},
##     year = {2023},
##     note = {R package version 1.0.5},
##     url = {https://CRAN.R-project.org/package=broom},
##   }
citation("car")         
## 
## To cite the car package in publications use:
## 
##   Fox J, Weisberg S (2019). _An R Companion to Applied Regression_,
##   Third edition. Sage, Thousand Oaks CA.
##   <https://socialsciences.mcmaster.ca/jfox/Books/Companion/>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Book{,
##     title = {An {R} Companion to Applied Regression},
##     edition = {Third},
##     author = {John Fox and Sanford Weisberg},
##     year = {2019},
##     publisher = {Sage},
##     address = {Thousand Oaks {CA}},
##     url = {https://socialsciences.mcmaster.ca/jfox/Books/Companion/},
##   }
citation("caret")
## 
## To cite caret in publications use:
## 
##   Kuhn, M. (2008). Building Predictive Models in R Using the caret
##   Package. Journal of Statistical Software, 28(5), 1–26.
##   https://doi.org/10.18637/jss.v028.i05
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Building Predictive Models in R Using the caret Package},
##     volume = {28},
##     url = {https://www.jstatsoft.org/index.php/jss/article/view/v028i05},
##     doi = {10.18637/jss.v028.i05},
##     number = {5},
##     journal = {Journal of Statistical Software},
##     author = {{Kuhn} and {Max}},
##     year = {2008},
##     pages = {1–26},
##   }
citation("corrplot")
## 
## To cite corrplot in publications use:
## 
##   Taiyun Wei and Viliam Simko (2021). R package 'corrplot':
##   Visualization of a Correlation Matrix (Version 0.92). Available from
##   https://github.com/taiyun/corrplot
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{corrplot2021,
##     title = {R package 'corrplot': Visualization of a Correlation Matrix},
##     author = {Taiyun Wei and Viliam Simko},
##     year = {2021},
##     note = {(Version 0.92)},
##     url = {https://github.com/taiyun/corrplot},
##   }
citation("dplyr")     
## 
## To cite package 'dplyr' in publications use:
## 
##   Wickham H, François R, Henry L, Müller K, Vaughan D (2023). _dplyr: A
##   Grammar of Data Manipulation_. R package version 1.1.1,
##   <https://CRAN.R-project.org/package=dplyr>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {dplyr: A Grammar of Data Manipulation},
##     author = {Hadley Wickham and Romain François and Lionel Henry and Kirill Müller and Davis Vaughan},
##     year = {2023},
##     note = {R package version 1.1.1},
##     url = {https://CRAN.R-project.org/package=dplyr},
##   }
citation("e1071")
## 
## To cite package 'e1071' in publications use:
## 
##   Meyer D, Dimitriadou E, Hornik K, Weingessel A, Leisch F (2023).
##   _e1071: Misc Functions of the Department of Statistics, Probability
##   Theory Group (Formerly: E1071), TU Wien_. R package version 1.7-13,
##   <https://CRAN.R-project.org/package=e1071>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {e1071: Misc Functions of the Department of Statistics, Probability Theory Group (Formerly: E1071), TU Wien},
##     author = {David Meyer and Evgenia Dimitriadou and Kurt Hornik and Andreas Weingessel and Friedrich Leisch},
##     year = {2023},
##     note = {R package version 1.7-13},
##     url = {https://CRAN.R-project.org/package=e1071},
##   }
citation("fastDummies")
## 
## To cite package 'fastDummies' in publications use:
## 
##   Kaplan J (2023). _fastDummies: Fast Creation of Dummy (Binary)
##   Columns and Rows from Categorical Variables_. R package version
##   1.7.3, <https://CRAN.R-project.org/package=fastDummies>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {fastDummies: Fast Creation of Dummy (Binary) Columns and Rows from Categorical Variables},
##     author = {Jacob Kaplan},
##     year = {2023},
##     note = {R package version 1.7.3},
##     url = {https://CRAN.R-project.org/package=fastDummies},
##   }
citation("ggplot2")     
## 
## To cite ggplot2 in publications, please use
## 
##   H. Wickham. ggplot2: Elegant Graphics for Data Analysis.
##   Springer-Verlag New York, 2016.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Book{,
##     author = {Hadley Wickham},
##     title = {ggplot2: Elegant Graphics for Data Analysis},
##     publisher = {Springer-Verlag New York},
##     year = {2016},
##     isbn = {978-3-319-24277-4},
##     url = {https://ggplot2.tidyverse.org},
##   }
citation("ggcorrplot")  
## 
## To cite package 'ggcorrplot' in publications use:
## 
##   Kassambara A (2023). _ggcorrplot: Visualization of a Correlation
##   Matrix using 'ggplot2'_. R package version 0.1.4.1,
##   <https://CRAN.R-project.org/package=ggcorrplot>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ggcorrplot: Visualization of a Correlation Matrix using 'ggplot2'},
##     author = {Alboukadel Kassambara},
##     year = {2023},
##     note = {R package version 0.1.4.1},
##     url = {https://CRAN.R-project.org/package=ggcorrplot},
##   }
citation("ggExtra")   
## 
## To cite package 'ggExtra' in publications use:
## 
##   Attali D, Baker C (2023). _ggExtra: Add Marginal Histograms to
##   'ggplot2', and More 'ggplot2' Enhancements_. R package version
##   0.10.1, <https://CRAN.R-project.org/package=ggExtra>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ggExtra: Add Marginal Histograms to 'ggplot2', and More 'ggplot2' Enhancements},
##     author = {Dean Attali and Christopher Baker},
##     year = {2023},
##     note = {R package version 0.10.1},
##     url = {https://CRAN.R-project.org/package=ggExtra},
##   }
citation("glmpath") 
## 
## To cite package 'glmpath' in publications use:
## 
##   Park MY, Hastie T (2018). _glmpath: L1 Regularization Path for
##   Generalized Linear Models and Cox Proportional Hazards Model_. R
##   package version 0.98, <https://CRAN.R-project.org/package=glmpath>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {glmpath: L1 Regularization Path for Generalized Linear Models and Cox Proportional Hazards Model},
##     author = {Mee Young Park and Trevor Hastie},
##     year = {2018},
##     note = {R package version 0.98},
##     url = {https://CRAN.R-project.org/package=glmpath},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
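Entries flagged this way are scraped from the package's DESCRIPTION file, so they often need touch-ups (protected capitals, a missing DOI, and so on). A minimal sketch of one workflow, using base R's toBibtex() to dump the entry above to a file for hand editing; the file name refs.bib is just an illustrative choice:

# Export the auto-generated glmpath entry as BibTeX text and append it
# to a .bib file for manual editing ("refs.bib" is an illustrative name)
bib <- toBibtex(citation("glmpath"))
writeLines(bib, con = "refs.bib")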
citation("grid")        
## 
## The 'grid' package is part of R.  To cite R in publications use:
## 
##   R Core Team (2023). R: A language and environment for statistical
##   computing. R Foundation for Statistical Computing, Vienna, Austria.
##   URL https://www.R-project.org/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {R: A Language and Environment for Statistical Computing},
##     author = {{R Core Team}},
##     organization = {R Foundation for Statistical Computing},
##     address = {Vienna, Austria},
##     year = {2023},
##     url = {https://www.R-project.org/},
##   }
## 
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.
citation("gridExtra")   
## 
## To cite package 'gridExtra' in publications use:
## 
##   Auguie B (2017). _gridExtra: Miscellaneous Functions for "Grid"
##   Graphics_. R package version 2.3,
##   <https://CRAN.R-project.org/package=gridExtra>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {gridExtra: Miscellaneous Functions for "Grid" Graphics},
##     author = {Baptiste Auguie},
##     year = {2017},
##     note = {R package version 2.3},
##     url = {https://CRAN.R-project.org/package=gridExtra},
##   }
citation("kableExtra")  
## 
## To cite package 'kableExtra' in publications use:
## 
##   Zhu H (2021). _kableExtra: Construct Complex Table with 'kable' and
##   Pipe Syntax_. R package version 1.3.4,
##   <https://CRAN.R-project.org/package=kableExtra>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {kableExtra: Construct Complex Table with 'kable' and Pipe Syntax},
##     author = {Hao Zhu},
##     year = {2021},
##     note = {R package version 1.3.4},
##     url = {https://CRAN.R-project.org/package=kableExtra},
##   }
citation("leaflet")
## 
## To cite package 'leaflet' in publications use:
## 
##   Cheng J, Schloerke B, Karambelkar B, Xie Y (2023). _leaflet: Create
##   Interactive Web Maps with the JavaScript 'Leaflet' Library_. R
##   package version 2.2.0, <https://CRAN.R-project.org/package=leaflet>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {leaflet: Create Interactive Web Maps with the JavaScript 'Leaflet' Library},
##     author = {Joe Cheng and Barret Schloerke and Bhaskar Karambelkar and Yihui Xie},
##     year = {2023},
##     note = {R package version 2.2.0},
##     url = {https://CRAN.R-project.org/package=leaflet},
##   }
citation("leaflet.extras")
## 
## To cite package 'leaflet.extras' in publications use:
## 
##   Karambelkar B, Schloerke B (2018). _leaflet.extras: Extra
##   Functionality for 'leaflet' Package_. R package version 1.0.0,
##   <https://CRAN.R-project.org/package=leaflet.extras>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {leaflet.extras: Extra Functionality for 'leaflet' Package},
##     author = {Bhaskar Karambelkar and Barret Schloerke},
##     year = {2018},
##     note = {R package version 1.0.0},
##     url = {https://CRAN.R-project.org/package=leaflet.extras},
##   }
citation("leaps")  
## 
## To cite package 'leaps' in publications use:
## 
##   Lumley T (based on Fortran code by Alan Miller) (2020). _leaps:
##   Regression Subset Selection_. R package version 3.1,
##   <https://CRAN.R-project.org/package=leaps>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {leaps: Regression Subset Selection},
##     author = {Thomas Lumley based on Fortran code by Alan Miller},
##     year = {2020},
##     note = {R package version 3.1},
##     url = {https://CRAN.R-project.org/package=leaps},
##   }
## 
## ATTENTION: This citation information has been auto-generated from the
## package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
citation("maptools")    
## 
## To cite package 'maptools' in publications use:
## 
##   Bivand R, Lewin-Koh N (2023). _maptools: Tools for Handling Spatial
##   Objects_. R package version 1.1-8,
##   <https://CRAN.R-project.org/package=maptools>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {maptools: Tools for Handling Spatial Objects},
##     author = {Roger Bivand and Nicholas Lewin-Koh},
##     year = {2023},
##     note = {R package version 1.1-8},
##     url = {https://CRAN.R-project.org/package=maptools},
##   }
citation("MASS")    
## 
## To cite the MASS package in publications use:
## 
##   Venables, W. N. & Ripley, B. D. (2002) Modern Applied Statistics with
##   S. Fourth Edition. Springer, New York. ISBN 0-387-95457-0
## 
## A BibTeX entry for LaTeX users is
## 
##   @Book{,
##     title = {Modern Applied Statistics with S},
##     author = {W. N. Venables and B. D. Ripley},
##     publisher = {Springer},
##     edition = {Fourth},
##     address = {New York},
##     year = {2002},
##     note = {ISBN 0-387-95457-0},
##     url = {https://www.stats.ox.ac.uk/pub/MASS4/},
##   }
citation("imbalance")
## 
## To cite package imbalance in publications use:
## 
##   Cordón I, García S, Fernández A, Herrera F (2018). "Imbalance:
##   Oversampling algorithms for imbalanced classification in R",
##   Knowledge-Based Systems, volume 161, pages 329-341
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Imbalance: Oversampling algorithms for imbalanced classification in R},
##     author = {Ignacio Cordón and Salvador García and Alberto Fernández and Francisco Herrera},
##     month = {12},
##     year = {2018},
##     pages = {329--341},
##     journal = {Knowledge-Based Systems},
##     volume = {161},
##     url = {https://doi.org/10.1016/j.knosys.2018.07.035},
##   }
citation("mlpack")
## 
## To cite package 'mlpack' in publications use:
## 
##   Singh Parihar Y, Curtin R, Eddelbuettel D, Balamuta J (2023).
##   _mlpack: 'Rcpp' Integration for the 'mlpack' Library_. R package
##   version 4.2.1, <https://CRAN.R-project.org/package=mlpack>.
## 
##   Curtin R, Edel M, Shrit O, Agrawal S, Basak S, Balamuta J, Birmingham
##   R, Dutt K, Eddelbuettel D, Garg R, Jaiswal S, Kaushik A, Kim S,
##   Mukherjee A, Sai N, Sharma N, Parihar Y, Swain R, Sanderson C (2023).
##   "mlpack 4: a fast, header-only C++ machine learning library."
##   _Journal of Open Source Software_, *8*(82). doi:10.21105/joss.05026
##   <https://doi.org/10.21105/joss.05026>.
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
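As the message notes, citations with multiple entries suppress their BibTeX bodies by default. A quick sketch of the two options it suggests:

# Print both mlpack entries together with their BibTeX bodies
print(citation("mlpack"), bibtex = TRUE)
# Or raise the threshold so BibTeX is always shown, regardless of count
options(citation.bibtex.max = 999)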
citation("neuralnet")
## 
## To cite package 'neuralnet' in publications use:
## 
##   Fritsch S, Guenther F, Wright M (2019). _neuralnet: Training of
##   Neural Networks_. R package version 1.44.2,
##   <https://CRAN.R-project.org/package=neuralnet>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {neuralnet: Training of Neural Networks},
##     author = {Stefan Fritsch and Frauke Guenther and Marvin N. Wright},
##     year = {2019},
##     note = {R package version 1.44.2},
##     url = {https://CRAN.R-project.org/package=neuralnet},
##   }
citation("psych")       
## 
## To cite package 'psych' in publications use:
## 
##   William Revelle (2023). _psych: Procedures for Psychological,
##   Psychometric, and Personality Research_. Northwestern University,
##   Evanston, Illinois. R package version 2.3.9,
##   <https://CRAN.R-project.org/package=psych>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {psych: Procedures for Psychological, Psychometric, and Personality Research},
##     author = {{William Revelle}},
##     organization = {Northwestern University},
##     address = {Evanston, Illinois},
##     year = {2023},
##     note = {R package version 2.3.9},
##     url = {https://CRAN.R-project.org/package=psych},
##   }
citation("raster")      
## 
## To cite package 'raster' in publications use:
## 
##   Hijmans R (2023). _raster: Geographic Data Analysis and Modeling_. R
##   package version 3.6-26, <https://CRAN.R-project.org/package=raster>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {raster: Geographic Data Analysis and Modeling},
##     author = {Robert J. Hijmans},
##     year = {2023},
##     note = {R package version 3.6-26},
##     url = {https://CRAN.R-project.org/package=raster},
##   }
citation("RColorBrewer")
## 
## To cite package 'RColorBrewer' in publications use:
## 
##   Neuwirth E (2022). _RColorBrewer: ColorBrewer Palettes_. R package
##   version 1.1-3, <https://CRAN.R-project.org/package=RColorBrewer>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {RColorBrewer: ColorBrewer Palettes},
##     author = {Erich Neuwirth},
##     year = {2022},
##     note = {R package version 1.1-3},
##     url = {https://CRAN.R-project.org/package=RColorBrewer},
##   }
citation("ResourceSelection")
## 
## To cite package 'ResourceSelection' in publications use:
## 
##   Lele SR, Keim JL, Solymos P (2023). _ResourceSelection: Resource
##   Selection (Probability) Functions for Use-Availability Data_. R
##   package version 0.3-6,
##   <https://CRAN.R-project.org/package=ResourceSelection>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {ResourceSelection: Resource Selection (Probability) Functions for Use-Availability Data},
##     author = {Subhash R. Lele and Jonah L. Keim and Peter Solymos},
##     year = {2023},
##     note = {R package version 0.3-6},
##     url = {https://CRAN.R-project.org/package=ResourceSelection},
##   }
citation()
## 
## To cite R in publications use:
## 
##   R Core Team (2023). R: A language and environment for statistical
##   computing. R Foundation for Statistical Computing, Vienna, Austria.
##   URL https://www.R-project.org/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {R: A Language and Environment for Statistical Computing},
##     author = {{R Core Team}},
##     organization = {R Foundation for Statistical Computing},
##     address = {Vienna, Austria},
##     year = {2023},
##     url = {https://www.R-project.org/},
##   }
## 
## We have invested a lot of time and effort in creating R, please cite it
## when using it for data analysis. See also 'citation("pkgname")' for
## citing R packages.
citation("reticulate")
## 
## To cite package 'reticulate' in publications use:
## 
##   Ushey K, Allaire J, Tang Y (2023). _reticulate: Interface to
##   'Python'_. R package version 1.34.0,
##   <https://CRAN.R-project.org/package=reticulate>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {reticulate: Interface to 'Python'},
##     author = {Kevin Ushey and JJ Allaire and Yuan Tang},
##     year = {2023},
##     note = {R package version 1.34.0},
##     url = {https://CRAN.R-project.org/package=reticulate},
##   }
citation("rgdal")       
## 
## To cite package 'rgdal' in publications use:
## 
##   Bivand R, Keitt T, Rowlingson B (2023). _rgdal: Bindings for the
##   'Geospatial' Data Abstraction Library_. R package version 1.6-7,
##   <https://CRAN.R-project.org/package=rgdal>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {rgdal: Bindings for the 'Geospatial' Data Abstraction Library},
##     author = {Roger Bivand and Tim Keitt and Barry Rowlingson},
##     year = {2023},
##     note = {R package version 1.6-7},
##     url = {https://CRAN.R-project.org/package=rgdal},
##   }
citation("rgeos")       
## 
## To cite package 'rgeos' in publications use:
## 
##   Bivand R, Rundel C (2023). _rgeos: Interface to Geometry Engine -
##   Open Source ('GEOS')_. R package version 0.6-4,
##   <https://CRAN.R-project.org/package=rgeos>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {rgeos: Interface to Geometry Engine - Open Source ('GEOS')},
##     author = {Roger Bivand and Colin Rundel},
##     year = {2023},
##     note = {R package version 0.6-4},
##     url = {https://CRAN.R-project.org/package=rgeos},
##   }
citation("shiny")       
## 
## To cite package 'shiny' in publications use:
## 
##   Chang W, Cheng J, Allaire J, Sievert C, Schloerke B, Xie Y, Allen J,
##   McPherson J, Dipert A, Borges B (2023). _shiny: Web Application
##   Framework for R_. R package version 1.7.5.1,
##   <https://CRAN.R-project.org/package=shiny>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {shiny: Web Application Framework for R},
##     author = {Winston Chang and Joe Cheng and JJ Allaire and Carson Sievert and Barret Schloerke and Yihui Xie and Jeff Allen and Jonathan McPherson and Alan Dipert and Barbara Borges},
##     year = {2023},
##     note = {R package version 1.7.5.1},
##     url = {https://CRAN.R-project.org/package=shiny},
##   }
citation("sf")    
## 
## To cite package sf in publications, please use:
## 
##   Pebesma, E., & Bivand, R. (2023). Spatial Data Science: With
##   Applications in R. Chapman and Hall/CRC.
##   https://doi.org/10.1201/9780429459016
## 
##   Pebesma, E., 2018. Simple Features for R: Standardized Support for
##   Spatial Vector Data. The R Journal 10 (1), 439-446,
##   https://doi.org/10.32614/RJ-2018-009
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
citation("sjPlot")
## 
## To cite package 'sjPlot' in publications use:
## 
##   Lüdecke D (2023). _sjPlot: Data Visualization for Statistics in
##   Social Science_. R package version 2.8.15,
##   <https://CRAN.R-project.org/package=sjPlot>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {sjPlot: Data Visualization for Statistics in Social Science},
##     author = {Daniel Lüdecke},
##     year = {2023},
##     note = {R package version 2.8.15},
##     url = {https://CRAN.R-project.org/package=sjPlot},
##   }
citation("sp")          
## 
## To cite package sp in publications use:
## 
##   Pebesma E, Bivand R (2005). "Classes and methods for spatial data in
##   R." _R News_, *5*(2), 9-13. <https://CRAN.R-project.org/doc/Rnews/>.
## 
##   Bivand R, Pebesma E, Gomez-Rubio V (2013). _Applied spatial data
##   analysis with R, Second edition_. Springer, NY.
##   <https://asdar-book.org/>.
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.
citation("tidyverse") 
## 
## To cite package 'tidyverse' in publications use:
## 
##   Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R,
##   Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller
##   E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V,
##   Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). "Welcome to
##   the tidyverse." _Journal of Open Source Software_, *4*(43), 1686.
##   doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Article{,
##     title = {Welcome to the {tidyverse}},
##     author = {Hadley Wickham and Mara Averick and Jennifer Bryan and Winston Chang and Lucy D'Agostino McGowan and Romain François and Garrett Grolemund and Alex Hayes and Lionel Henry and Jim Hester and Max Kuhn and Thomas Lin Pedersen and Evan Miller and Stephan Milton Bache and Kirill Müller and Jeroen Ooms and David Robinson and Dana Paige Seidel and Vitalie Spinu and Kohske Takahashi and Davis Vaughan and Claus Wilke and Kara Woo and Hiroaki Yutani},
##     year = {2019},
##     journal = {Journal of Open Source Software},
##     volume = {4},
##     number = {43},
##     pages = {1686},
##     doi = {10.21105/joss.01686},
##   }
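Rather than calling citation() once per package, the whole set of entries above can be written to a single .bib file in one step. A minimal sketch, assuming the knitr package is installed; the vector of package names and the file name packages.bib are illustrative choices:

# Collect BibTeX entries for several attached packages into one file
# (the package list and "packages.bib" are illustrative)
pkgs <- c("Amelia", "broom", "car", "caret", "dplyr", "ggplot2",
          "MASS", "sf", "sp", "tidyverse")
knitr::write_bib(pkgs, file = "packages.bib")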