https://rpubs.com/fumi/581597

参考資料 https://to-kei.net/r-beginner/r-5-logistic/

ロジスティック回帰と変数選択 https://oku.edu.mie-u.ac.jp/~okumura/stat/140921.html

# ロジスティック回帰による病気分析 

logi_fun <- function(data,file,disease){
ans <- glm(data$Y~.,data=data,family=binomial)
s.ans <- summary(ans)
coe <- s.ans$coefficient
RR <- exp(coe[,1])
RRlow <- exp(coe[,1]-1.96*coe[,2])
RRup <- exp(coe[,1]+1.96*coe[,2])
N <- nrow(data)
aic <- AIC(ans)
result <- cbind(coe,RR,RRlow,RRup,aic,N)
colnames(result)[6:7] <- c("RR95%CI.low","RR95%CI.up")
if(nrow(result)>=2){
result[2:nrow(result),8:9] <- ""
}
write.table(disease,file,append=T,quote=F,sep=",",row.names=F,col.names=F)
write.table(matrix(c("",colnames(result)),nrow=1),file,append=T,quote=F,sep=",",row.names=F,col.names=F)
write.table(result,file,append=T,quote=F,sep=",",row.names=T,col.names=F)
write.table("",file,append=T,quote=F,sep=",",row.names=F,col.names=F)
}
df <- read.csv("sample-data.csv",header=T,row.names=1)
dat <- df[,c(5,1,2,6)]#5列が目的変数。1,2,6列目が説明変数。
colnames(dat)[1] <- "Y"

結果は、C:\721540~-data.csvにある。

#その2 http://www.ner.takushoku-u.ac.jp/masano/class_material/waseda/keiryo/15_logit.html

library("dplyr")
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library("ggplot2")
library("reshape")
## Warning: package 'reshape' was built under R version 3.6.2
## 
## Attaching package: 'reshape'
## The following object is masked from 'package:dplyr':
## 
##     rename
library("reshape2")
## 
## Attaching package: 'reshape2'
## The following objects are masked from 'package:reshape':
## 
##     colsplit, melt, recast
library("stargazer")
## 
## Please cite as:
##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
## ロジスティック関数を定義する
## 引数:x = 数値ベクトル
## 返り値:y

invlogit <- function(x){ 
    return( y <- 1 / (1 + exp(-x)) )
}

- ロジスティック曲線を書く http://www.ner.takushoku-u.ac.jp/masano/class_material/waseda/keiryo/15_logit.html

x <- seq(-5, 5, length = 100)
plot(x, invlogit(x), type = "l", lwd = 2, col = "darkblue",
     ylab = "invlogit(x)", main = "Logistic Function")
abline(h = c(0, 1), col = "gray")

- ロジスティック関数による当落判定の実行 - data:logit.csv→C:\721540~

# データフレーム logit に含まれている変数を表示する。
logit <- read.csv("logit.csv", na = ".") 

names(logit)
## [1] "wlsmd"    "previous" "expm"
# データフレーム logit を表示する。
logit
##    wlsmd previous expm
## 1      1        0 10.0
## 2      1        1 10.0
## 3      1        3  8.9
## 4      1        5  7.7
## 5      1        7  5.4
## 6      1        4  3.0
## 7      1        5  2.0
## 8      1        4  7.0
## 9      0        0  2.2
## 10     0        0  4.3
## 11     0        5  4.0
## 12     0        2  3.0
## 13     0        4  4.0
## 14     0        1  2.2
## 15     0        3  4.0

-2.2 帰無仮説と対立仮説の設定 -ここでの帰無仮説は次のとおり。

・H01:当選回数は小選挙区での当選確率に影響を与えない ・H02:選挙費用は小選挙区での当選確率に影響を与えない

-対立仮説は次のとおり。

・Ha1:当選回数が多いほど、小選挙区での当選確率が高くなる ・Ha2:選挙費用が多いほど、小選挙区での当選確率が高くなる

-2.3 説明変数と応答変数の散布図 ・ダウンロードしたデータ logit を使い、候補者の小選挙区における当落 (wlsmd) を縦軸に、それまでの当選回数 (previous) と選挙費用 (expm) を横軸にした散布図を表示する。

-以下の二つのグラフを見ると、当選回数と当落は関係が低く、選挙費用と当落は大いに関係がある。

# 当選回数 (previous) と当落 (wlsmd)の散布図
ggplot(logit, aes(previous, wlsmd)) + geom_point() +
  stat_smooth(method = lm, se = FALSE) +
  theme_bw() +
  geom_jitter(width = 0.05, height = 0.05) #jitterで重複したデータを散らす

# 選挙費用 (expm) と当落 (wlsmd)の散布図
ggplot(logit, aes(expm, wlsmd)) + geom_point() +
  stat_smooth(method = lm, se = FALSE)+
    theme_bw() +
  geom_jitter(width = 0.05, height = 0.05)#jitterで重複したデータを散らす

-ロジスティック回帰式の推定値を求める

・ロジスティック回帰分析を行うには glm() 関数を利用する。 ・ glm() 関数はロジスティク回帰以外にも様々な一般化線形モデルで利用される。 ・ロジスティック回帰に使うには、family を以下のように指定する。

model_1 <- glm(wlsmd ~ previous + expm, data = logit,family = binomial(link = "logit"))
summary(model_1)
## 
## Call:
## glm(formula = wlsmd ~ previous + expm, family = binomial(link = "logit"), 
##     data = logit)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5741  -0.3781   0.2013   0.3943   1.4948  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -6.3811     3.5147  -1.816   0.0694 .
## previous      0.8085     0.5851   1.382   0.1670  
## expm          0.8088     0.4000   2.022   0.0431 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 20.728  on 14  degrees of freedom
## Residual deviance: 10.384  on 12  degrees of freedom
## AIC: 16.384
## 
## Number of Fisher Scoring iterations: 6

- 当選確率の予測値 pを求める。 -・Rでは predict() 関数を使うと、予測値を計算してくれる。

predict(model_1, type = "response")
##           1           2           3           4           5           6 
## 0.846455457 0.925228189 0.962419494 0.979953464 0.974574115 0.327276800 
##           7           8           9          10          11          12 
## 0.327213998 0.925168964 0.009934946 0.051995861 0.710296138 0.088057029 
##          13          14          15 
## 0.522058216 0.022027721 0.327339608

- 1から8までが当選した実績があり、9から15までが落選した実績がある。 - これからすると11を除き、0.3以上と以下に分かれ、概ね当落実績どおりの結果となっている。

stargazer(model_1, style = "ajps", # AJPS仕様
          dep.var.labels = "Win or Lose", # 応答変数の名前
          title = "Logistic Regression Analysis on Winning a Seat", # タイトル
          type = "html")
Logistic Regression Analysis on Winning a Seat
Win or Lose
previous 0.809
(0.585)
expm 0.809**
(0.400)
Constant -6.381*
(3.515)
N 15
Log Likelihood -5.192
AIC 16.384
p < .01; p < .05; p < .1

-・previousは有意ではないが、expm は 5%で統計的に有意であることがわかる。

-・expmの係数 (0.809)の意味: 候補者の選挙費用が 100万円増える → 当選する log odds が 0.809 増える ・これだとわかりにくい

-ひとつの解決策: ・log odds → predict()関数を使って当選確率 に変換 ・二つの説明変数(previous と expm) の値を変えて当選確率を予測す

厚木市下水道管路で試行する

library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
## 
##     recode
library(caret)
## Warning: package 'caret' was built under R version 3.6.2
## Loading required package: lattice
library(cluster)
library(dummies)
## dummies-1.5.6 provided by Decision Patterns
library(data.table)
## Warning: package 'data.table' was built under R version 3.6.2
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:reshape2':
## 
##     dcast, melt
## The following object is masked from 'package:reshape':
## 
##     melt
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
library(dplyr)
library(e1071)
## Warning: package 'e1071' was built under R version 3.6.2
library(epitools)
library(effects)
## Warning: package 'effects' was built under R version 3.6.3
## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car 
##   influence.merMod                car 
##   dfbeta.influence.merMod         car 
##   dfbetas.influence.merMod        car
## Use the command
##     lattice::trellis.par.set(effectsTheme())
##   to customize lattice options for effects plots.
## See ?effectsTheme for details.
library(ggplot2)
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.6.2
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:dplyr':
## 
##     combine
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(ranger)
## Warning: package 'ranger' was built under R version 3.6.2
## 
## Attaching package: 'ranger'
## The following object is masked from 'package:randomForest':
## 
##     importance
library(rgl)
library(rattle)
## Warning: package 'rattle' was built under R version 3.6.2
## Rattle: A free graphical interface for data science with R.
## バージョン 5.3.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
## 'rattle()' と入力して、データを多角的に分析します。
## 
## Attaching package: 'rattle'
## The following object is masked from 'package:ranger':
## 
##     importance
## The following object is masked from 'package:randomForest':
## 
##     importance
library(readr)
## Warning: package 'readr' was built under R version 3.6.2
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 3.6.2
## Loading required package: rpart
library(rpart)
library(readr)
library(reshape)
library(rsconnect)
## Warning: package 'rsconnect' was built under R version 3.6.2
library(reshape2)
library(tidyr)
## 
## Attaching package: 'tidyr'
## The following object is masked from 'package:reshape2':
## 
##     smiths
## The following objects are masked from 'package:reshape':
## 
##     expand, smiths
library(xtable)
library(nnet)
## Warning: package 'nnet' was built under R version 3.6.2
library(stargazer)

Rによる条件付きロジスティック回帰分析

https://to-kei.net/r-beginner/r-5-logistic/

ロジスティック関数を R で定義してみる。

## ロジスティック関数を定義する
## 引数:x = 数値ベクトル
## 返り値:y

invlogit <- function(x){ 
    return( y <- 1 / (1 + exp(-x)) )
}

確認のため、(−5<x<5) の範囲で logistic(x)=logit(x−1)) の値を図示する。

x <- seq(-5, 5, length = 100)
plot(x, invlogit(x), type = "l", lwd = 2, col = "darkblue",
     ylab = "invlogit(x)", main = "Logistic Function")
abline(h = c(0, 1), col = "gray")

-Windows では  を指定しては駄目.例えば,setwd(“c:”) とするとエラーが出る.setwd(“c:/usr”) とするか setwd(“c:\usr”) こと.

#setwd("C:/Users/721540/Documents/practice/ロジスティック回帰/~/practice ") 
#gesui = read_csv("gesuidou.csv")

#getwd()

下水道データ読み込み# 基本統計量表示 gesui # 教科書ではlogit

setwd("C:/Users/721540/Documents/practice/ロジスティック回帰/~/practice ") 
gesui = read_csv("gesuidou.csv")
## Parsed with column specification:
## cols(
##   OBJECTID = col_double(),
##   slope = col_double(),
##   long = col_double(),
##   uedokaburi = col_double(),
##   sitadokaburi = col_double(),
##   masuhonsuu = col_double(),
##   nendo = col_double(),
##   kei = col_double(),
##   kubun = col_double(),
##   did = col_double(),
##   kouhou = col_double(),
##   ekijyouka = col_double(),
##   kansyu = col_double(),
##   kinkyuudo = col_double(),
##   taisyo = col_double()
## )
gesui<- data.frame(gesui) # 教科書ではlogit
gesui <- gesui[-1] #OBJECTID列をデータから削除
stargazer(as.data.frame(gesui),type = "html")
Statistic N Mean St. Dev. Min Pctl(25) Pctl(75) Max
slope 478 3.509 2.128 0 2 4.7 10
long 478 3.646 1.844 1.004 2.152 4.717 9.940
uedokaburi 478 4.138 2.096 1.020 2.675 4.900 13.863
sitadokaburi 478 2.983 1.686 0.001 1.845 3.821 11.196
masuhonsuu 478 1.554 2.090 0 0 2 14
nendo 478 1,985.094 6.015 1,976 1,978 1,991 1,992
kei 478 0.578 0.353 0.200 0.250 0.800 1.650
kubun 478 0.828 0.377 0 1 1 1
did 478 0.826 0.379 0 1 1 1
kouhou 478 0.502 0.501 0 0 1 1
ekijyouka 478 0.303 0.460 0 0 1 1
kansyu 478 0.705 0.457 0 0 1 1
kinkyuudo 478 1.510 1.462 0 0 3 3
taisyo 478 0.059 0.235 0 0 0 1
ggplot(gesui,aes(slope))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(long))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(uedokaburi))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(sitadokaburi))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(masuhonsuu ))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(nendo))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(kei))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(ekijyouka))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(kinkyuudo))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

head(gesui)
##   slope  long uedokaburi sitadokaburi masuhonsuu nendo  kei kubun did kouhou
## 1  0.00 9.940   3.758000     3.258000          0  1983 1.10     1   1      0
## 2  5.00 1.232   2.331392     2.255000          0  1982 1.10     1   1      1
## 3  5.00 6.500   4.654844     4.513411          0  1981 0.60     1   1      0
## 4  2.70 5.606   4.584913     3.037999          0  1976 0.60     1   1      0
## 5  4.71 5.020   1.414000     1.403298          0  1992 0.25     0   1      1
## 6  1.10 1.317   1.544714     1.153690          3  1992 0.25     0   1      1
##   ekijyouka kansyu kinkyuudo taisyo
## 1         1      1         0      0
## 2         1      1         3      0
## 3         1      1         3      0
## 4         1      1         3      0
## 5         1      0         0      0
## 6         1      0         0      0
plot(gesui[,2],gesui[,3])

#plot(gesui[,2],gesui[,3],type="l",xlab="Length of slope", ylab="Length of long",cex=2,col="red")
#参照頁はcol=”red”でエラー
plot(gesui[,2],gesui[,3],type="p",xlab="Length of slope", ylab="Length of long", main="延長と傾斜の散布図",cex=0.8,col="red")

pairs(gesui[2:7], pch = 21, bg = c("red", "green3", "blue", "orenge","black", "yellow", "skyblue")[unclass(gesui$kinkyuudo)])

pairs(gesui[2:14], pch = 21)

upper <- function(x, y, ...){ # 右上の散布図を設定
  oldpar <- par(usr = c(0, 1, 0, 1)) # 3行下でテキストの位置を(0.5, 0.5)にするための調整
  v <- abs(cor(x, y))
  sz <- 1 + v * 2 # テキストのサイズを直線性の強さで調整
  text(0.5, 0.5, sprintf("%.3f", v), cex = sz)
  par(oldpar)
}

lower <- function(x, y, ...){ # 左下の散布図を設定
  points(x, y, pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
  if( abs(cor(x, y)) > 0.8 ){ # 直線性の強いデータの組だけ回帰式を作成
    lmobj <- lm(y~x)
    abline(lmobj)
  }
}

pairs(gesui[,2:14], upper.panel=upper, lower.panel=lower)

各変数の正規分布性の確認(qqnorm()関数)s悦明変数は正規分布しているか?

https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q1283089632 - https://sites.google.com/site/webtextofr/graph - 正規分布ならばQQプロットは直線になるはずです.

まずSLOPE

qqnorm(gesui$slope)

対数正規性を調べる http://cse.naro.affrc.go.jp/takezawa/r-tips/r/63.html https://tutorialmore.com/questions-1478863.htm

 x <- gesui$slope
 xy <- qqplot(qt(ppoints(50), df=5), x)
 abline(lsfit(xy$x, xy$y), lty=2)

 plot(qlnorm(ppoints(x)), sort(x))

次に桝本数:カテゴリー変数であり正規分布にはならない

qqnorm(gesui$masu)

qqnorm(gesui$uedokaburi)

qqnorm(gesui$sitadokaburi)

qqnorm(gesui$masu)

qqnorm(gesui$long)

シャピロ・ウィルク検定

-敢えて検定するまでもなく今までの分布形状を見ると、左側に偏った形状をしている。 https://tkstock.site/2017/04/12/2017-04-12-000000_1/ で傾斜に対してシャピロ・ウィルク検定を行うと

 shapiro.test(gesui$slope)
## 
##  Shapiro-Wilk normality test
## 
## data:  gesui$slope
## W = 0.89138, p-value < 2.2e-16

同様にして管路延長も正規分布でない

 shapiro.test(gesui$long)
## 
##  Shapiro-Wilk normality test
## 
## data:  gesui$long
## W = 0.92612, p-value = 1.365e-14

-data: gesui$long - W = 0.92612, p-value = 1.365e-14

同様に桝も正規分でない

 shapiro.test(gesui$masu)
## 
##  Shapiro-Wilk normality test
## 
## data:  gesui$masu
## W = 0.73461, p-value < 2.2e-16

Shapiro-Wilk normality test

data: gesui$masu W = 0.73461, p-value < 2.2e-16

# 説明変数slope等は正規分布する必要はない。必要なのは誤差分散が正規分布しているかいなかである

-https://toukeier.hatenablog.com/entry/2019/09/08/224346

slopeの確立密度関数

https://oku.edu.mie-u.ac.jp/~okumura/stat/first.html x=slope

 x <- gesui$slope
curve(dnorm(x), 6, -6)

slope等の非正規なデータの最尤法フィッティング

下準備:http://tips-r.blogspot.com/2014/05/repseq.html

rep(1, 10) #「○を○個だけrepeat」って感じ
##  [1] 1 1 1 1 1 1 1 1 1 1

slope等の非正規なデータの最尤法フィッティング

-https://oku.edu.mie-u.ac.jp/~okumura/stat/141115.html

slopeの平均 -平均3.509,標準偏差2.128の正規分布の密度関数 と指数関数 とを50:10で混ぜ合わせた簡単な関数を考えます:

何をやりたいのか途中でわからなくなったので中止 x=478 y=gesui$slope x= 1:20 y = c(11,4,13,10,4,8,6,16,7,12,10,13,6,5,1,4,2,0,0,1)

z = rep(x, y) r = glm(y ~ dnorm(x,10,3) + exp(-x/10) - 1, family=poisson(link=“identity”)) summary(r) curve(dnorm(x,10,3))

目的変数・説明変数共に正規性は関係ないこととしてロジスティック回帰式の推定値を求める

-http://www.ner.takushoku-u.ac.jp/masano/class_material/waseda/keiryo/15_logit.html

#0305より以下のファクター化について検証する。
gesui$kansyu <- as.factor(gesui$kansyu)
gesui$ktaisyo <- as.factor(gesui$taisyo)
gesui$kubun <- as.factor(gesui$kubun)
gesui$did <- as.factor(gesui$did)
gesui$ekijyouka <- as.factor(gesui$ekijyouka)
gesui$kinkyuudo <- as.factor(gesui$kinkyuudo)
sapply(gesui, class)
##        slope         long   uedokaburi sitadokaburi   masuhonsuu        nendo 
##    "numeric"    "numeric"    "numeric"    "numeric"    "numeric"    "numeric" 
##          kei        kubun          did       kouhou    ekijyouka       kansyu 
##    "numeric"     "factor"     "factor"    "numeric"     "factor"     "factor" 
##    kinkyuudo       taisyo      ktaisyo 
##     "factor"    "numeric"     "factor"
dim(gesui)
## [1] 478  15
summary(gesui)
##      slope            long         uedokaburi      sitadokaburi      
##  Min.   :0.000   Min.   :1.004   Min.   : 1.020   Min.   : 0.000685  
##  1st Qu.:2.000   1st Qu.:2.152   1st Qu.: 2.675   1st Qu.: 1.844750  
##  Median :2.800   Median :3.183   Median : 3.722   Median : 2.693033  
##  Mean   :3.509   Mean   :3.646   Mean   : 4.138   Mean   : 2.983303  
##  3rd Qu.:4.700   3rd Qu.:4.717   3rd Qu.: 4.900   3rd Qu.: 3.820872  
##  Max.   :9.710   Max.   :9.940   Max.   :13.863   Max.   :11.196442  
##    masuhonsuu         nendo           kei         kubun   did    
##  Min.   : 0.000   Min.   :1976   Min.   :0.2000   0: 82   0: 83  
##  1st Qu.: 0.000   1st Qu.:1978   1st Qu.:0.2500   1:396   1:395  
##  Median : 1.000   Median :1986   Median :0.6000                  
##  Mean   : 1.554   Mean   :1985   Mean   :0.5779                  
##  3rd Qu.: 2.000   3rd Qu.:1991   3rd Qu.:0.8000                  
##  Max.   :14.000   Max.   :1992   Max.   :1.6500                  
##      kouhou       ekijyouka kansyu  kinkyuudo     taisyo        ktaisyo
##  Min.   :0.0000   0:333     0:141   0:228     Min.   :0.00000   0:450  
##  1st Qu.:0.0000   1:145     1:337   2: 28     1st Qu.:0.00000   1: 28  
##  Median :1.0000                     3:222     Median :0.00000          
##  Mean   :0.5021                               Mean   :0.05858          
##  3rd Qu.:1.0000                               3rd Qu.:0.00000          
##  Max.   :1.0000                               Max.   :1.00000
names(gesui)
##  [1] "slope"        "long"         "uedokaburi"   "sitadokaburi" "masuhonsuu"  
##  [6] "nendo"        "kei"          "kubun"        "did"          "kouhou"      
## [11] "ekijyouka"    "kansyu"       "kinkyuudo"    "taisyo"       "ktaisyo"
model_1 <- glm(taisyo ~ slope + long + uedokaburi + sitadokaburi + masuhonsuu + nendo + kei + kubun + did + kouhou + ekijyouka + kansyu, data = gesui,family = binomial(link = "logit"))
summary(model_1)
## 
## Call:
## glm(formula = taisyo ~ slope + long + uedokaburi + sitadokaburi + 
##     masuhonsuu + nendo + kei + kubun + did + kouhou + ekijyouka + 
##     kansyu, family = binomial(link = "logit"), data = gesui)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0921  -0.3082  -0.1684  -0.1077   3.9648  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -487.54742  149.09610  -3.270  0.00108 **
## slope          -0.21202    0.14667  -1.446  0.14830   
## long           -0.05377    0.15201  -0.354  0.72357   
## uedokaburi      0.28273    0.22129   1.278  0.20137   
## sitadokaburi   -0.47435    0.32603  -1.455  0.14569   
## masuhonsuu      0.12920    0.09966   1.296  0.19484   
## nendo           0.24178    0.07482   3.231  0.00123 **
## kei             2.37393    1.18788   1.998  0.04567 * 
## kubun1         -1.65915    1.02478  -1.619  0.10544   
## did1            2.79188    1.37822   2.026  0.04279 * 
## kouhou          0.25207    0.83722   0.301  0.76335   
## ekijyouka1     -0.51234    0.70686  -0.725  0.46857   
## kansyu1         3.32820    1.12433   2.960  0.00307 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 213.22  on 477  degrees of freedom
## Residual deviance: 162.56  on 465  degrees of freedom
## AIC: 188.56
## 
## Number of Fisher Scoring iterations: 7

-上記表の見方は以下のサイトに掲載されている。 https://oku.edu.mie-u.ac.jp/~okumura/stat/logistic.html https://oku.edu.mie-u.ac.jp/~okumura/stat/140921.html 係数は非常に有意なもの(***)からまったく有意でないもの(無印)まで多様である。 -5%で統計的に優位なパラメータは、nendo,kansyu,5% kei とdidは1%で有意となった。即ち影響があった。

あまり関係なさそうな変数を手作業で除いていってもよいが,自動化するには,step() という関数を使う: これは,デフォルトではフルモデルから1個ずつ変数を外すことを試み,外すと一番AICが良く(小さく)なるものを実際に外す。外すことによってAICが改善しなくなるまで続ける。

r2 = step(model_1)
## Start:  AIC=188.56
## taisyo ~ slope + long + uedokaburi + sitadokaburi + masuhonsuu + 
##     nendo + kei + kubun + did + kouhou + ekijyouka + kansyu
## 
##                Df Deviance    AIC
## - kouhou        1   162.65 186.65
## - long          1   162.69 186.69
## - ekijyouka     1   163.11 187.11
## - uedokaburi    1   164.13 188.13
## - masuhonsuu    1   164.26 188.26
## <none>              162.56 188.56
## - sitadokaburi  1   164.75 188.75
## - slope         1   164.96 188.96
## - kubun         1   165.19 189.19
## - kei           1   167.01 191.01
## - did           1   168.48 192.48
## - nendo         1   175.64 199.64
## - kansyu        1   180.12 204.12
## 
## Step:  AIC=186.65
## taisyo ~ slope + long + uedokaburi + sitadokaburi + masuhonsuu + 
##     nendo + kei + kubun + did + ekijyouka + kansyu
## 
##                Df Deviance    AIC
## - long          1   162.82 184.82
## - ekijyouka     1   163.19 185.19
## - uedokaburi    1   164.13 186.13
## - masuhonsuu    1   164.60 186.60
## <none>              162.65 186.65
## - slope         1   165.12 187.12
## - kubun         1   165.24 187.24
## - sitadokaburi  1   165.66 187.66
## - kei           1   167.24 189.24
## - did           1   168.50 190.50
## - nendo         1   176.08 198.08
## - kansyu        1   180.30 202.30
## 
## Step:  AIC=184.82
## taisyo ~ slope + uedokaburi + sitadokaburi + masuhonsuu + nendo + 
##     kei + kubun + did + ekijyouka + kansyu
## 
##                Df Deviance    AIC
## - ekijyouka     1   163.42 183.42
## - uedokaburi    1   164.23 184.23
## - masuhonsuu    1   164.60 184.60
## <none>              162.82 184.82
## - slope         1   165.36 185.36
## - kubun         1   165.46 185.46
## - sitadokaburi  1   165.91 185.91
## - kei           1   167.24 187.24
## - did           1   169.06 189.06
## - nendo         1   177.35 197.35
## - kansyu        1   181.81 201.81
## 
## Step:  AIC=183.42
## taisyo ~ slope + uedokaburi + sitadokaburi + masuhonsuu + nendo + 
##     kei + kubun + did + kansyu
## 
##                Df Deviance    AIC
## - uedokaburi    1   164.93 182.93
## - masuhonsuu    1   165.24 183.24
## <none>              163.42 183.42
## - kubun         1   165.69 183.69
## - slope         1   166.43 184.43
## - sitadokaburi  1   166.55 184.55
## - kei           1   167.35 185.35
## - did           1   169.20 187.20
## - nendo         1   177.85 195.85
## - kansyu        1   183.12 201.12
## 
## Step:  AIC=182.93
## taisyo ~ slope + sitadokaburi + masuhonsuu + nendo + kei + kubun + 
##     did + kansyu
## 
##                Df Deviance    AIC
## - masuhonsuu    1   166.26 182.26
## - sitadokaburi  1   166.57 182.57
## <none>              164.93 182.93
## - kubun         1   166.94 182.94
## - slope         1   167.83 183.83
## - kei           1   168.87 184.87
## - did           1   169.62 185.62
## - nendo         1   180.33 196.33
## - kansyu        1   184.81 200.81
## 
## Step:  AIC=182.26
## taisyo ~ slope + sitadokaburi + nendo + kei + kubun + did + kansyu
## 
##                Df Deviance    AIC
## <none>              166.26 182.26
## - sitadokaburi  1   168.27 182.27
## - kubun         1   168.50 182.50
## - slope         1   170.24 184.24
## - did           1   172.85 186.85
## - kei           1   173.89 187.89
## - nendo         1   184.04 198.04
## - kansyu        1   187.33 201.33
r3 <- glm(taisyo ~ slope + sitadokaburi + nendo + kei + kubun + did + kansyu, data = gesui,family = binomial(link = "logit"))
summary(r3)
## 
## Call:
## glm(formula = taisyo ~ slope + sitadokaburi + nendo + kei + kubun + 
##     did + kansyu, family = binomial(link = "logit"), data = gesui)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9522  -0.3094  -0.1728  -0.1035   4.0352  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -488.36244  129.87178  -3.760 0.000170 ***
## slope          -0.25414    0.14044  -1.810 0.070349 .  
## sitadokaburi   -0.33073    0.24916  -1.327 0.184398    
## nendo           0.24266    0.06517   3.724 0.000196 ***
## kei             2.06469    0.76643   2.694 0.007062 ** 
## kubun1         -1.38829    0.90452  -1.535 0.124825    
## did1            2.47929    1.18220   2.097 0.035978 *  
## kansyu1         3.57821    1.14699   3.120 0.001811 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 213.22  on 477  degrees of freedom
## Residual deviance: 166.26  on 470  degrees of freedom
## AIC: 182.26
## 
## Number of Fisher Scoring iterations: 8
r4 = step(r3)
## Start:  AIC=182.26
## taisyo ~ slope + sitadokaburi + nendo + kei + kubun + did + kansyu
## 
##                Df Deviance    AIC
## <none>              166.26 182.26
## - sitadokaburi  1   168.27 182.27
## - kubun         1   168.50 182.50
## - slope         1   170.24 184.24
## - did           1   172.85 186.85
## - kei           1   173.89 187.89
## - nendo         1   184.04 198.04
## - kansyu        1   187.33 201.33

上記の結果、slope + sitadokaburi + nendo + kei + kubun + did+kansyu 以上に変数を減らしても一番AICが良く(小さく)なるものなく予測精度は向上しないことが判ったためこのモデルで予測行う

このモデルで予測を行うと

plot(fitted(r3), gesui$taisyo, ylim=c(-0.2,1.2), yaxt="n")
axis(2, c(0,1))

predict(r3, type = "response")
##            1            2            3            4            5            6 
## 0.2135601317 0.0768991781 0.0109093912 0.0094899540 0.0253781086 0.0661022586 
##            7            8            9           10           11           12 
## 0.0406679132 0.0284045807 0.0084288000 0.0102082823 0.0130696261 0.0267057447 
##           13           14           15           16           17           18 
## 0.0271286158 0.0438520479 0.1621464059 0.0156139597 0.0144815476 0.0521926782 
##           19           20           21           22           23           24 
## 0.0073356539 0.0137476719 0.0001398624 0.0002848636 0.0070851207 0.0063870725 
##           25           26           27           28           29           30 
## 0.0067153207 0.0083000062 0.0100323021 0.0182066483 0.0144483951 0.0758285044 
##           31           32           33           34           35           36 
## 0.0209988394 0.0601658809 0.0057180394 0.0340177689 0.0570973499 0.0589086894 
##           37           38           39           40           41           42 
## 0.0610189616 0.0314904131 0.0917820932 0.0587999971 0.0083843067 0.0693714489 
##           43           44           45           46           47           48 
## 0.0087760370 0.0072928159 0.0062727843 0.0048545571 0.0008092696 0.0003584811 
##           49           50           51           52           53           54 
## 0.0007176583 0.0023522503 0.0021220677 0.0021625704 0.0065923474 0.0129690479 
##           55           56           57           58           59           60 
## 0.0274562370 0.0089498104 0.0105839454 0.2495923119 0.2389666523 0.2367276370 
##           61           62           63           64           65           66 
## 0.2411897377 0.2509096390 0.2233843162 0.2542983720 0.2419867278 0.2279093689 
##           67           68           69           70           71           72 
## 0.2207753309 0.2184370464 0.2183767813 0.2182475368 0.2251353646 0.0025744396 
##           73           74           75           76           77           78 
## 0.1370360720 0.0110855446 0.0367219655 0.1948391308 0.2417631510 0.0058932279 
##           79           80           81           82           83           84 
## 0.0078584650 0.0052637390 0.0089730991 0.0071872015 0.0076126739 0.0119456845 
##           85           86           87           88           89           90 
## 0.2261093327 0.1928793859 0.1549465131 0.2396999842 0.2597490234 0.0584936415 
##           91           92           93           94           95           96 
## 0.1623878999 0.0780328340 0.1807684824 0.1709913576 0.2280661862 0.1615428692 
##           97           98           99          100          101          102 
## 0.2257567833 0.2247755293 0.1910239571 0.2466190883 0.0685025584 0.1312633852 
##          103          104          105          106          107          108 
## 0.2706879272 0.1705641968 0.2737352210 0.0964350503 0.1622606232 0.0888744740 
##          109          110          111          112          113          114 
## 0.0128922389 0.0044190828 0.0304136690 0.0130563568 0.0555896685 0.0372812979 
##          115          116          117          118          119          120 
## 0.0538612536 0.0039907148 0.0071667145 0.1266748398 0.1139850172 0.1418072368 
##          121          122          123          124          125          126 
## 0.1092979991 0.0138037715 0.0124350024 0.0118177136 0.0315786748 0.1188930548 
##          127          128          129          130          131          132 
## 0.1665382595 0.0983548523 0.0870002089 0.1490831562 0.1356671914 0.1177800236 
##          133          134          135          136          137          138 
## 0.0873362999 0.0845173224 0.1476509502 0.1605509437 0.1261544839 0.1574673642 
##          139          140          141          142          143          144 
## 0.0388048799 0.0367239567 0.1015637047 0.1671249446 0.0698463841 0.0634608089 
##          145          146          147          148          149          150 
## 0.1658964320 0.1670664395 0.1184682191 0.1819988025 0.0297998617 0.0090928176 
##          151          152          153          154          155          156 
## 0.0125537734 0.0113672176 0.0218916441 0.0044145930 0.0527562058 0.2288646250 
##          157          158          159          160          161          162 
## 0.1424733834 0.1482259882 0.2631162246 0.0134996272 0.0021699938 0.0151752111 
##          163          164          165          166          167          168 
## 0.0659867009 0.0869200667 0.0665645718 0.0702742089 0.0074301537 0.0158064217 
##          169          170          171          172          173          174 
## 0.0200413495 0.2611817456 0.0090437318 0.0058761959 0.0148817593 0.0173317078 
##          175          176          177          178          179          180 
## 0.0218535266 0.0158485241 0.0170485076 0.0284311765 0.0395142053 0.0222524596 
##          181          182          183          184          185          186 
## 0.0191032201 0.0226687955 0.0081523637 0.0169416916 0.0242141747 0.0249068794 
##          187          188          189          190          191          192 
## 0.0338151108 0.0306700792 0.0444256383 0.1668598805 0.2660606162 0.0208690773 
##          193          194          195          196          197          198 
## 0.0141900831 0.0076680432 0.0151379132 0.0041774349 0.0100085312 0.0194844659 
##          199          200          201          202          203          204 
## 0.0093184014 0.0088725506 0.0311180754 0.0767179306 0.0218772971 0.0247103368 
##          205          206          207          208          209          210 
## 0.0515302590 0.0469861387 0.0221469034 0.0457008530 0.0219287825 0.0211546445 
##          211          212          213          214          215          216 
## 0.0347548811 0.0052206675 0.0246835389 0.0429131536 0.1863276425 0.0017826610 
##          217          218          219          220          221          222 
## 0.0055420808 0.0118469766 0.0280098528 0.0153604620 0.0271897837 0.0850648371 
##          223          224          225          226          227          228 
## 0.0139343625 0.0147668188 0.0131693491 0.0149312331 0.0417239334 0.0709732320 
##          229          230          231          232          233          234 
## 0.0261322928 0.0221054164 0.0304104913 0.0442650489 0.0518026536 0.0513430932 
##          235          236          237          238          239          240 
## 0.0517144771 0.0071834707 0.0059300538 0.0063273936 0.0107216302 0.0096773706 
##          241          242          243          244          245          246 
## 0.0077424350 0.0083412035 0.0093546927 0.0119548246 0.0084470014 0.0301761227 
##          247          248          249          250          251          252 
## 0.0268997777 0.0253491062 0.0148836306 0.0113477567 0.0127716515 0.0175057391 
##          253          254          255          256          257          258 
## 0.0163785333 0.0192354585 0.0657067615 0.0116928298 0.0086391472 0.0091972164 
##          259          260          261          262          263          264 
## 0.0066367116 0.0094392034 0.0105310046 0.0249320004 0.0214701295 0.0239025816 
##          265          266          267          268          269          270 
## 0.0280656267 0.0220920256 0.0232565727 0.0267054297 0.0290880025 0.0330448218 
##          271          272          273          274          275          276 
## 0.0274371183 0.0347231897 0.0258680762 0.0179360845 0.0107849313 0.0193786607 
##          277          278          279          280          281          282 
## 0.0972429322 0.0613168067 0.0623163699 0.0184559187 0.0119432292 0.0705969499 
##          283          284          285          286          287          288 
## 0.0097492535 0.0095609684 0.0421656861 0.0496418979 0.0459412667 0.0245686172 
##          289          290          291          292          293          294 
## 0.0009088146 0.0008834079 0.0117356162 0.0075824964 0.0305838296 0.0065736861 
##          295          296          297          298          299          300 
## 0.0045320528 0.0030873023 0.0068203709 0.0085574126 0.0033909411 0.0006344732 
##          301          302          303          304          305          306 
## 0.0121180626 0.0076249502 0.0057713804 0.0002912282 0.0003814451 0.0006040089 
##          307          308          309          310          311          312 
## 0.0002960656 0.0002534497 0.0142167327 0.0187934513 0.0004045288 0.0003866215 
##          313          314          315          316          317          318 
## 0.0002806823 0.0061857257 0.0134765481 0.0106264268 0.0113251556 0.0116740953 
##          319          320          321          322          323          324 
## 0.0125247630 0.0138299317 0.0154152622 0.0164528718 0.0167025860 0.0057958440 
##          325          326          327          328          329          330 
## 0.0045692486 0.0013376677 0.0030450104 0.0015880168 0.0008838401 0.0009043697 
##          331          332          333          334          335          336 
## 0.0011143889 0.0009956370 0.0013444178 0.0026278949 0.0028923509 0.0042428902 
##          337          338          339          340          341          342 
## 0.0052318206 0.0073738895 0.0058670266 0.0014933204 0.0132570866 0.0052751860 
##          343          344          345          346          347          348 
## 0.0105531741 0.0097117191 0.0187789651 0.0225924485 0.0153238415 0.0170529548 
##          349          350          351          352          353          354 
## 0.0187749363 0.0198525970 0.0209559658 0.0176936106 0.0178462078 0.0007985918 
##          355          356          357          358          359          360 
## 0.0007337668 0.0007739984 0.0007533362 0.0006065061 0.0008684522 0.0005551224 
##          361          362          363          364          365          366 
## 0.0005747017 0.0006147297 0.0007264931 0.0004557065 0.0006106372 0.0007062232 
##          367          368          369          370          371          372 
## 0.0010003772 0.0006966461 0.0008590111 0.0335614314 0.0375255454 0.0850899885 
##          373          374          375          376          377          378 
## 0.0043626010 0.0028493451 0.0001291378 0.0018847789 0.0044401182 0.0006612728 
##          379          380          381          382          383          384 
## 0.0009477544 0.0014454243 0.0010045696 0.0004373985 0.0006524030 0.0007164283 
##          385          386          387          388          389          390 
## 0.0184000081 0.0275493692 0.0244452822 0.0107659408 0.0419985564 0.0543408875 
##          391          392          393          394          395          396 
## 0.0439101907 0.0083661057 0.0259016421 0.0270580411 0.0189880653 0.0296389452 
##          397          398          399          400          401          402 
## 0.0281879788 0.0289737291 0.0237464905 0.0102012255 0.0027985505 0.0022808377 
##          403          404          405          406          407          408 
## 0.0079108619 0.0346428543 0.0318788428 0.0358526677 0.0263144119 0.0217030645 
##          409          410          411          412          413          414 
## 0.2257168865 0.2424655645 0.2464889826 0.2455751555 0.2144570747 0.2437333261 
##          415          416          417          418          419          420 
## 0.1968818778 0.2733129402 0.2334103710 0.2495909901 0.2124113419 0.2202684381 
##          421          422          423          424          425          426 
## 0.2263333579 0.2315698309 0.2620667426 0.2413749842 0.2491528364 0.2207772987 
##          427          428          429          430          431          432 
## 0.2326109112 0.2306743939 0.1149586715 0.1550945748 0.2087461993 0.2281683498 
##          433          434          435          436          437          438 
## 0.2498022835 0.1906003869 0.2564652432 0.2597105914 0.2658442529 0.3268816736 
##          439          440          441          442          443          444 
## 0.3644901974 0.0119085763 0.0153717553 0.0100552641 0.1754079310 0.0088958899 
##          445          446          447          448          449          450 
## 0.0092896744 0.0006292188 0.0007164239 0.0004287314 0.0009183739 0.0378388826 
##          451          452          453          454          455          456 
## 0.2069871140 0.1773452546 0.1161040838 0.0052243920 0.0066122036 0.0089536724 
##          457          458          459          460          461          462 
## 0.0086755358 0.0025384541 0.0026407118 0.0081524597 0.0076338712 0.0074746600 
##          463          464          465          466          467          468 
## 0.0073688760 0.0022689476 0.0032176605 0.0006743661 0.0084324129 0.0085736344 
##          469          470          471          472          473          474 
## 0.0029248931 0.0064785595 0.0071039324 0.0051158835 0.0045618622 0.0048751756 
##          475          476          477          478 
## 0.0139618365 0.0078446828 0.0035332043 0.0028445210

-結果は最大で0.47であり,全て0.5以下になっている。

回帰係数を個別に検定する。

stargazer(r3, 
          style = "ajps", # AJPS仕様
          dep.var.labels = "影響あり or 影響なし", # 応答変数の名前
          title = "厚木のテストデータを使ったパラメータの影響度判定", # タイトル
          type ="html")
厚木のテストデータを使ったパラメータの影響度判定
影響あり or 影響なし
slope -0.254*
(0.140)
sitadokaburi -0.331
(0.249)
nendo 0.243***
(0.065)
kei 2.065***
(0.766)
kubun1 -1.388
(0.905)
did1 2.479**
(1.182)
kansyu1 3.578***
(1.147)
Constant -488.362***
(129.872)
N 478
Log Likelihood -83.132
AIC 182.264
p < .01; p < .05; p < .1

-5%で統計的に優位なパラメータは、nendo,kansyu, kei とdidは1%で有意となった。即ち影響があった

http://www.ner.takushoku-u.ac.jp/masano/class_material/waseda/keiryo/15_logit.html