参考資料 https://to-kei.net/r-beginner/r-5-logistic/

ロジスティック回帰と変数選択 https://oku.edu.mie-u.ac.jp/~okumura/stat/140921.html

#　ロジスティック回帰による病気分析　

logi_fun <- function(data,file,disease){
ans <- glm(data$Y~.,data=data,family=binomial)
s.ans <- summary(ans)
coe <- s.ans$coefficient
RR <- exp(coe[,1])
RRlow <- exp(coe[,1]-1.96*coe[,2])
RRup <- exp(coe[,1]+1.96*coe[,2])
N <- nrow(data)
aic <- AIC(ans)
result <- cbind(coe,RR,RRlow,RRup,aic,N)
colnames(result)[6:7] <- c("RR95%CI.low","RR95%CI.up")
if(nrow(result)>=2){
result[2:nrow(result),8:9] <- ""
}
write.table(disease,file,append=T,quote=F,sep=",",row.names=F,col.names=F)
write.table(matrix(c("",colnames(result)),nrow=1),file,append=T,quote=F,sep=",",row.names=F,col.names=F)
write.table(result,file,append=T,quote=F,sep=",",row.names=T,col.names=F)
write.table("",file,append=T,quote=F,sep=",",row.names=F,col.names=F)
}

df <- read.csv("sample-data.csv",header=T,row.names=1)

dat <- df[,c(5,1,2,6)]#５列が目的変数。1,2,6列目が説明変数。

colnames(dat)[1] <- "Y"

結果は、C:\721540~-data.csvにある。

#その2 http://www.ner.takushoku-u.ac.jp/masano/class_material/waseda/keiryo/15_logit.html

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library("ggplot2")
library("reshape")

## Warning: package 'reshape' was built under R version 3.6.2

## 
## Attaching package: 'reshape'

## The following object is masked from 'package:dplyr':
## 
##     rename

library("reshape2")

## 
## Attaching package: 'reshape2'

## The following objects are masked from 'package:reshape':
## 
##     colsplit, melt, recast

library("stargazer")

## 
## Please cite as:

##  Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.2. https://CRAN.R-project.org/package=stargazer

## ロジスティック関数を定義する
## 引数：x = 数値ベクトル
## 返り値：y

invlogit <- function(x){ 
    return( y <- 1 / (1 + exp(-x)) )
}

-　ロジスティック曲線を書く http://www.ner.takushoku-u.ac.jp/masano/class_material/waseda/keiryo/15_logit.html

x <- seq(-5, 5, length = 100)
plot(x, invlogit(x), type = "l", lwd = 2, col = "darkblue",
     ylab = "invlogit(x)", main = "Logistic Function")
abline(h = c(0, 1), col = "gray")

-　ロジスティック関数による当落判定の実行 -　data：logit.csv→C:\721540~

# データフレーム logit に含まれている変数を表示する。
logit <- read.csv("logit.csv", na = ".") 

names(logit)

## [1] "wlsmd"    "previous" "expm"

# データフレーム logit を表示する。
logit

##    wlsmd previous expm
## 1      1        0 10.0
## 2      1        1 10.0
## 3      1        3  8.9
## 4      1        5  7.7
## 5      1        7  5.4
## 6      1        4  3.0
## 7      1        5  2.0
## 8      1        4  7.0
## 9      0        0  2.2
## 10     0        0  4.3
## 11     0        5  4.0
## 12     0        2  3.0
## 13     0        4  4.0
## 14     0        1  2.2
## 15     0        3  4.0

-2.2 帰無仮説と対立仮説の設定 -ここでの帰無仮説は次のとおり。

・H01：当選回数は小選挙区での当選確率に影響を与えない・H02：選挙費用は小選挙区での当選確率に影響を与えない

-対立仮説は次のとおり。

・Ha1：当選回数が多いほど、小選挙区での当選確率が高くなる・Ha2：選挙費用が多いほど、小選挙区での当選確率が高くなる

-2.3 説明変数と応答変数の散布図・ダウンロードしたデータ logit を使い、候補者の小選挙区における当落 (wlsmd) を縦軸に、それまでの当選回数 (previous) と選挙費用 (expm) を横軸にした散布図を表示する。

-以下の二つのグラフを見ると、当選回数と当落は関係が低く、選挙費用と当落は大いに関係がある。

# 当選回数 (previous) と当落 (wlsmd)の散布図
ggplot(logit, aes(previous, wlsmd)) + geom_point() +
  stat_smooth(method = lm, se = FALSE) +
  theme_bw() +
  geom_jitter(width = 0.05, height = 0.05) #jitterで重複したデータを散らす

# 選挙費用 (expm) と当落 (wlsmd)の散布図
ggplot(logit, aes(expm, wlsmd)) + geom_point() +
  stat_smooth(method = lm, se = FALSE)+
    theme_bw() +
  geom_jitter(width = 0.05, height = 0.05)#jitterで重複したデータを散らす

-ロジスティック回帰式の推定値を求める

・ロジスティック回帰分析を行うには glm() 関数を利用する。・ glm() 関数はロジスティク回帰以外にも様々な一般化線形モデルで利用される。・ロジスティック回帰に使うには、family を以下のように指定する。

model_1 <- glm(wlsmd ~ previous + expm, data = logit,family = binomial(link = "logit"))
summary(model_1)

## 
## Call:
## glm(formula = wlsmd ~ previous + expm, family = binomial(link = "logit"), 
##     data = logit)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.5741  -0.3781   0.2013   0.3943   1.4948  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  -6.3811     3.5147  -1.816   0.0694 .
## previous      0.8085     0.5851   1.382   0.1670  
## expm          0.8088     0.4000   2.022   0.0431 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 20.728  on 14  degrees of freedom
## Residual deviance: 10.384  on 12  degrees of freedom
## AIC: 16.384
## 
## Number of Fisher Scoring iterations: 6

-　当選確率の予測値 pを求める。 -・Rでは predict() 関数を使うと、予測値を計算してくれる。

predict(model_1, type = "response")

##           1           2           3           4           5           6 
## 0.846455457 0.925228189 0.962419494 0.979953464 0.974574115 0.327276800 
##           7           8           9          10          11          12 
## 0.327213998 0.925168964 0.009934946 0.051995861 0.710296138 0.088057029 
##          13          14          15 
## 0.522058216 0.022027721 0.327339608

-　1から8までが当選した実績があり、9から15までが落選した実績がある。 -　これからすると11を除き、0.3以上と以下に分かれ、概ね当落実績どおりの結果となっている。

stargazer(model_1, style = "ajps", # AJPS仕様
          dep.var.labels = "Win or Lose", # 応答変数の名前
          title = "Logistic Regression Analysis on Winning a Seat", # タイトル
          type = "html")

**Logistic Regression Analysis on Winning a Seat**

	Win or Lose

previous	0.809
	(0.585)
expm	0.809^**
	(0.400)
Constant	-6.381^*
	(3.515)
N	15
Log Likelihood	-5.192
AIC	16.384

p < .01; p < .05; p < .1

-・previousは有意ではないが、expm は 5%で統計的に有意であることがわかる。

-・expmの係数 (0.809)の意味: 候補者の選挙費用が 100万円増える　→　当選する log odds が 0.809 増える・これだとわかりにくい

-ひとつの解決策: ・log odds → predict()関数を使って当選確率に変換・二つの説明変数（previous と expm) の値を変えて当選確率を予測す

厚木市下水道管路で試行する

library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

library(caret)

## Warning: package 'caret' was built under R version 3.6.2

## Loading required package: lattice

library(cluster)
library(dummies)

## dummies-1.5.6 provided by Decision Patterns

library(data.table)

## Warning: package 'data.table' was built under R version 3.6.2

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:reshape2':
## 
##     dcast, melt

## The following object is masked from 'package:reshape':
## 
##     melt

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

library(dplyr)
library(e1071)

## Warning: package 'e1071' was built under R version 3.6.2

library(epitools)
library(effects)

## Warning: package 'effects' was built under R version 3.6.3

## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car 
##   influence.merMod                car 
##   dfbeta.influence.merMod         car 
##   dfbetas.influence.merMod        car

## Use the command
##     lattice::trellis.par.set(effectsTheme())
##   to customize lattice options for effects plots.
## See ?effectsTheme for details.

library(ggplot2)
library(ggthemes)

## Warning: package 'ggthemes' was built under R version 3.6.2

library(randomForest)

## randomForest 4.6-14

## Type rfNews() to see new features/changes/bug fixes.

## 
## Attaching package: 'randomForest'

## The following object is masked from 'package:dplyr':
## 
##     combine

## The following object is masked from 'package:ggplot2':
## 
##     margin

library(ranger)

## Warning: package 'ranger' was built under R version 3.6.2

## 
## Attaching package: 'ranger'

## The following object is masked from 'package:randomForest':
## 
##     importance

library(rgl)
library(rattle)

## Warning: package 'rattle' was built under R version 3.6.2

## Rattle: A free graphical interface for data science with R.
## バージョン 5.3.0 Copyright (c) 2006-2018 Togaware Pty Ltd.
## 'rattle()' と入力して、データを多角的に分析します。

## 
## Attaching package: 'rattle'

## The following object is masked from 'package:ranger':
## 
##     importance

## The following object is masked from 'package:randomForest':
## 
##     importance

library(readr)

## Warning: package 'readr' was built under R version 3.6.2

library(rpart.plot)

## Warning: package 'rpart.plot' was built under R version 3.6.2

## Loading required package: rpart

library(rpart)
library(readr)
library(reshape)
library(rsconnect)

## Warning: package 'rsconnect' was built under R version 3.6.2

library(reshape2)
library(tidyr)

## 
## Attaching package: 'tidyr'

## The following object is masked from 'package:reshape2':
## 
##     smiths

## The following objects are masked from 'package:reshape':
## 
##     expand, smiths

library(xtable)
library(nnet)

## Warning: package 'nnet' was built under R version 3.6.2

library(stargazer)

Rによる条件付きロジスティック回帰分析

https://to-kei.net/r-beginner/r-5-logistic/

ロジスティック関数を R で定義してみる。

## ロジスティック関数を定義する
## 引数：x = 数値ベクトル
## 返り値：y

invlogit <- function(x){ 
    return( y <- 1 / (1 + exp(-x)) )
}

確認のため、(−5<x<5) の範囲で logistic(x)=logit(x−1)) の値を図示する。

x <- seq(-5, 5, length = 100)
plot(x, invlogit(x), type = "l", lwd = 2, col = "darkblue",
     ylab = "invlogit(x)", main = "Logistic Function")
abline(h = c(0, 1), col = "gray")

-Windows ではを指定しては駄目．例えば，setwd(“c:”) とするとエラーが出る．setwd(“c:/usr”) とするか setwd(“c:\usr”) こと．

#setwd("C:/Users/721540/Documents/practice/ロジスティック回帰/~/practice ") 
#gesui = read_csv("gesuidou.csv")

#getwd()

下水道データ読み込み# 基本統計量表示 gesui　# 教科書ではlogit

setwd("C:/Users/721540/Documents/practice/ロジスティック回帰/~/practice ") 
gesui = read_csv("gesuidou.csv")

## Parsed with column specification:
## cols(
##   OBJECTID = col_double(),
##   slope = col_double(),
##   long = col_double(),
##   uedokaburi = col_double(),
##   sitadokaburi = col_double(),
##   masuhonsuu = col_double(),
##   nendo = col_double(),
##   kei = col_double(),
##   kubun = col_double(),
##   did = col_double(),
##   kouhou = col_double(),
##   ekijyouka = col_double(),
##   kansyu = col_double(),
##   kinkyuudo = col_double(),
##   taisyo = col_double()
## )

gesui<- data.frame(gesui) # 教科書ではlogit
gesui <- gesui[-1] #OBJECTID列をデータから削除

stargazer(as.data.frame(gesui),type = "html")


Statistic	N	Mean	St. Dev.	Min	Pctl(25)	Pctl(75)	Max

slope	478	3.509	2.128	0	2	4.7	10
long	478	3.646	1.844	1.004	2.152	4.717	9.940
uedokaburi	478	4.138	2.096	1.020	2.675	4.900	13.863
sitadokaburi	478	2.983	1.686	0.001	1.845	3.821	11.196
masuhonsuu	478	1.554	2.090	0	0	2	14
nendo	478	1,985.094	6.015	1,976	1,978	1,991	1,992
kei	478	0.578	0.353	0.200	0.250	0.800	1.650
kubun	478	0.828	0.377	0	1	1	1
did	478	0.826	0.379	0	1	1	1
kouhou	478	0.502	0.501	0	0	1	1
ekijyouka	478	0.303	0.460	0	0	1	1
kansyu	478	0.705	0.457	0	0	1	1
kinkyuudo	478	1.510	1.462	0	0	3	3
taisyo	478	0.059	0.235	0	0	0	1

ggplot(gesui,aes(slope))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(long))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(uedokaburi))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

ggplot(gesui,aes(sitadokaburi))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

桝本数

ggplot(gesui,aes(masuhonsuu ))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

施工年度

ggplot(gesui,aes(nendo))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

管路径

ggplot(gesui,aes(kei))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

液状化

ggplot(gesui,aes(ekijyouka))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

緊急度

ggplot(gesui,aes(kinkyuudo))+
geom_histogram(aes(y = ..density..),
bins = 10,
colour = "gray",
fill = "blue")+
  theme_classic()

散布図-plot

head(gesui)

##   slope  long uedokaburi sitadokaburi masuhonsuu nendo  kei kubun did kouhou
## 1  0.00 9.940   3.758000     3.258000          0  1983 1.10     1   1      0
## 2  5.00 1.232   2.331392     2.255000          0  1982 1.10     1   1      1
## 3  5.00 6.500   4.654844     4.513411          0  1981 0.60     1   1      0
## 4  2.70 5.606   4.584913     3.037999          0  1976 0.60     1   1      0
## 5  4.71 5.020   1.414000     1.403298          0  1992 0.25     0   1      1
## 6  1.10 1.317   1.544714     1.153690          3  1992 0.25     0   1      1
##   ekijyouka kansyu kinkyuudo taisyo
## 1         1      1         0      0
## 2         1      1         3      0
## 3         1      1         3      0
## 4         1      1         3      0
## 5         1      0         0      0
## 6         1      0         0      0

plot(gesui[,2],gesui[,3])

散布図-plot 2 https://www1.doshisha.ac.jp/~mjin/R/Chap_07/07.html参照

#plot(gesui[,2],gesui[,3],type="l",xlab="Length of slope", ylab="Length of long",cex=2,col="red")
#参照頁はcol=”red”でエラー
plot(gesui[,2],gesui[,3],type="p",xlab="Length of slope", ylab="Length of long", main="延長と傾斜の散布図",cex=0.8,col="red")

関数　 plot(x, y, xlim=range(x), ylim=range(y), type=“p”, main, xlab, ylab……)　 x 　横軸のデータ y 　縦軸のデータ xlim 　横軸の範囲 ylim 　縦軸の範囲 type 　“p” は、点を描く　“l” は、点を通過する線のみを描く　“b” は、点を描き、線で点を連結　“c” は、点を飛ばして線を描く　“h” は、各点から横軸までの垂直線を描く　“n” は、枠(軸)のみを描く main 　散布図の上部にタイトルを付ける xlab 　x軸(横軸)の名前を付ける ylab 　y軸(縦軸)の名前を付ける cex 　点のサイズ、指定しない場合は１
対散布図（散布図行列）用いる変数が2以上で、かつ変数がそれほど多くない量的データ場合は、本格的な解析を行う前に、すべての変数を組み合わせた散布図について考察を行うことで、データ間の関連性を視覚的に把握することができる。1つの画面に複数の変数を組み合わせた散布図を対散布図、あるいは散布図行列という。対散布図の作成は、関数 pairs を用いる。gesui のデータを用いて対散布図の作成について説明する。iris データの四つの変数を用いた最もシンプルな対散布図は次のコマンドで作成できる。

pairs(gesui[2:7], pch = 21, bg = c("red", "green3", "blue", "orenge","black", "yellow", "skyblue")[unclass(gesui$kinkyuudo)])

pairs(gesui[2:14], pch = 21)

Rのpairs関数の出力をカスタマイズする
https://entertainment-lab.blogspot.com/2013/01/r_27.html

upper <- function(x, y, ...){ # 右上の散布図を設定
  oldpar <- par(usr = c(0, 1, 0, 1)) # 3行下でテキストの位置を(0.5, 0.5)にするための調整
  v <- abs(cor(x, y))
  sz <- 1 + v * 2 # テキストのサイズを直線性の強さで調整
  text(0.5, 0.5, sprintf("%.3f", v), cex = sz)
  par(oldpar)
}

lower <- function(x, y, ...){ # 左下の散布図を設定
  points(x, y, pch = 21, bg = c("red", "green3", "blue")[unclass(iris$Species)])
  if( abs(cor(x, y)) > 0.8 ){ # 直線性の強いデータの組だけ回帰式を作成
    lmobj <- lm(y~x)
    abline(lmobj)
  }
}

pairs(gesui[,2:14], upper.panel=upper, lower.panel=lower)

各変数の正規分布性の確認(qqnorm()関数）s悦明変数は正規分布しているか？

https://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q1283089632 - https://sites.google.com/site/webtextofr/graph - 正規分布ならばQQプロットは直線になるはずです．

まずSLOPE

qqnorm(gesui$slope)

対数正規性を調べる http://cse.naro.affrc.go.jp/takezawa/r-tips/r/63.html https://tutorialmore.com/questions-1478863.htm

 x <- gesui$slope
 xy <- qqplot(qt(ppoints(50), df=5), x)
 abline(lsfit(xy$x, xy$y), lty=2)

 plot(qlnorm(ppoints(x)), sort(x))

次に桝本数：カテゴリー変数であり正規分布にはならない

qqnorm(gesui$masu)

上土被り

qqnorm(gesui$uedokaburi)

下土被り

qqnorm(gesui$sitadokaburi)

桝本数

qqnorm(gesui$masu)

延長

qqnorm(gesui$long)

シャピロ・ウィルク検定

-敢えて検定するまでもなく今までの分布形状を見ると、左側に偏った形状をしている。 https://tkstock.site/2017/04/12/2017-04-12-000000_1/ で傾斜に対してシャピロ・ウィルク検定を行うと

 shapiro.test(gesui$slope)

## 
##  Shapiro-Wilk normality test
## 
## data:  gesui$slope
## W = 0.89138, p-value < 2.2e-16

Shapiro-Wilk normality test
data: gesui$slope
W = 0.89138, p-value < 2.2e-16
結果の W は本検定の検定統計量を示す．この検定では，p値<2.2e-16i以下であるため，有意水準が5%で帰無仮説が棄却され (p ≤ α)，データXの分布は正規分布に従うとはいえない (実質的に従わない) と判断できる．本検定にて計算されるp値はデータ数が4個以上のときは近似的に求められているものである．

同様にして管路延長も正規分布でない

 shapiro.test(gesui$long)

## 
##  Shapiro-Wilk normality test
## 
## data:  gesui$long
## W = 0.92612, p-value = 1.365e-14

Shapiro-Wilk normality test

-data: gesui$long - W = 0.92612, p-value = 1.365e-14

同様に桝も正規分でない

 shapiro.test(gesui$masu)

## 
##  Shapiro-Wilk normality test
## 
## data:  gesui$masu
## W = 0.73461, p-value < 2.2e-16

Shapiro-Wilk normality test

data: gesui$masu W = 0.73461, p-value < 2.2e-16

#　説明変数slope等は正規分布する必要はない。必要なのは誤差分散が正規分布しているかいなかである

-https://toukeier.hatenablog.com/entry/2019/09/08/224346

slopeの確立密度関数

https://oku.edu.mie-u.ac.jp/~okumura/stat/first.html x=slope

 x <- gesui$slope
curve(dnorm(x), 6, -6)

slope等の非正規なデータの最尤法フィッティング

下準備：http://tips-r.blogspot.com/2014/05/repseq.html

rep(1, 10) #「○を○個だけrepeat」って感じ

##  [1] 1 1 1 1 1 1 1 1 1 1

slope等の非正規なデータの最尤法フィッティング

-https://oku.edu.mie-u.ac.jp/~okumura/stat/141115.html

slopeの平均 -平均3.509，標準偏差2.128の正規分布の密度関数と指数関数とを50：10で混ぜ合わせた簡単な関数を考えます：

何をやりたいのか途中でわからなくなったので中止 x=478 y=gesui$slope x= 1:20 y = c(11,4,13,10,4,8,6,16,7,12,10,13,6,5,1,4,2,0,0,1)

z = rep(x, y) r = glm(y ~ dnorm(x,10,3) + exp(-x/10) - 1, family=poisson(link=“identity”)) summary(r) curve(dnorm(x,10,3))

目的変数・説明変数共に正規性は関係ないこととしてロジスティック回帰式の推定値を求める

-http://www.ner.takushoku-u.ac.jp/masano/class_material/waseda/keiryo/15_logit.html

#0305より以下のファクター化について検証する。
gesui$kansyu <- as.factor(gesui$kansyu)
gesui$ktaisyo <- as.factor(gesui$taisyo)
gesui$kubun <- as.factor(gesui$kubun)
gesui$did <- as.factor(gesui$did)
gesui$ekijyouka <- as.factor(gesui$ekijyouka)
gesui$kinkyuudo <- as.factor(gesui$kinkyuudo)
sapply(gesui, class)

##        slope         long   uedokaburi sitadokaburi   masuhonsuu        nendo 
##    "numeric"    "numeric"    "numeric"    "numeric"    "numeric"    "numeric" 
##          kei        kubun          did       kouhou    ekijyouka       kansyu 
##    "numeric"     "factor"     "factor"    "numeric"     "factor"     "factor" 
##    kinkyuudo       taisyo      ktaisyo 
##     "factor"    "numeric"     "factor"

dim(gesui)

## [1] 478  15

summary(gesui)

##      slope            long         uedokaburi      sitadokaburi      
##  Min.   :0.000   Min.   :1.004   Min.   : 1.020   Min.   : 0.000685  
##  1st Qu.:2.000   1st Qu.:2.152   1st Qu.: 2.675   1st Qu.: 1.844750  
##  Median :2.800   Median :3.183   Median : 3.722   Median : 2.693033  
##  Mean   :3.509   Mean   :3.646   Mean   : 4.138   Mean   : 2.983303  
##  3rd Qu.:4.700   3rd Qu.:4.717   3rd Qu.: 4.900   3rd Qu.: 3.820872  
##  Max.   :9.710   Max.   :9.940   Max.   :13.863   Max.   :11.196442  
##    masuhonsuu         nendo           kei         kubun   did    
##  Min.   : 0.000   Min.   :1976   Min.   :0.2000   0: 82   0: 83  
##  1st Qu.: 0.000   1st Qu.:1978   1st Qu.:0.2500   1:396   1:395  
##  Median : 1.000   Median :1986   Median :0.6000                  
##  Mean   : 1.554   Mean   :1985   Mean   :0.5779                  
##  3rd Qu.: 2.000   3rd Qu.:1991   3rd Qu.:0.8000                  
##  Max.   :14.000   Max.   :1992   Max.   :1.6500                  
##      kouhou       ekijyouka kansyu  kinkyuudo     taisyo        ktaisyo
##  Min.   :0.0000   0:333     0:141   0:228     Min.   :0.00000   0:450  
##  1st Qu.:0.0000   1:145     1:337   2: 28     1st Qu.:0.00000   1: 28  
##  Median :1.0000                     3:222     Median :0.00000          
##  Mean   :0.5021                               Mean   :0.05858          
##  3rd Qu.:1.0000                               3rd Qu.:0.00000          
##  Max.   :1.0000                               Max.   :1.00000

names(gesui)

##  [1] "slope"        "long"         "uedokaburi"   "sitadokaburi" "masuhonsuu"  
##  [6] "nendo"        "kei"          "kubun"        "did"          "kouhou"      
## [11] "ekijyouka"    "kansyu"       "kinkyuudo"    "taisyo"       "ktaisyo"

model_1 <- glm(taisyo ~ slope + long + uedokaburi + sitadokaburi + masuhonsuu + nendo + kei + kubun + did + kouhou + ekijyouka + kansyu, data = gesui,family = binomial(link = "logit"))
summary(model_1)

## 
## Call:
## glm(formula = taisyo ~ slope + long + uedokaburi + sitadokaburi + 
##     masuhonsuu + nendo + kei + kubun + did + kouhou + ekijyouka + 
##     kansyu, family = binomial(link = "logit"), data = gesui)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0921  -0.3082  -0.1684  -0.1077   3.9648  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  -487.54742  149.09610  -3.270  0.00108 **
## slope          -0.21202    0.14667  -1.446  0.14830   
## long           -0.05377    0.15201  -0.354  0.72357   
## uedokaburi      0.28273    0.22129   1.278  0.20137   
## sitadokaburi   -0.47435    0.32603  -1.455  0.14569   
## masuhonsuu      0.12920    0.09966   1.296  0.19484   
## nendo           0.24178    0.07482   3.231  0.00123 **
## kei             2.37393    1.18788   1.998  0.04567 * 
## kubun1         -1.65915    1.02478  -1.619  0.10544   
## did1            2.79188    1.37822   2.026  0.04279 * 
## kouhou          0.25207    0.83722   0.301  0.76335   
## ekijyouka1     -0.51234    0.70686  -0.725  0.46857   
## kansyu1         3.32820    1.12433   2.960  0.00307 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 213.22  on 477  degrees of freedom
## Residual deviance: 162.56  on 465  degrees of freedom
## AIC: 188.56
## 
## Number of Fisher Scoring iterations: 7

-上記表の見方は以下のサイトに掲載されている。 https://oku.edu.mie-u.ac.jp/~okumura/stat/logistic.html https://oku.edu.mie-u.ac.jp/~okumura/stat/140921.html 係数は非常に有意なもの（***）からまったく有意でないもの（無印）まで多様である。 -5％で統計的に優位なパラメータは、nendo,kansyu,5％　kei とdidは1％で有意となった。即ち影響があった。

あまり関係なさそうな変数を手作業で除いていってもよいが，自動化するには，step() という関数を使う：これは，デフォルトではフルモデルから1個ずつ変数を外すことを試み，外すと一番AICが良く（小さく）なるものを実際に外す。外すことによってAICが改善しなくなるまで続ける。

r2 = step(model_1)

## Start:  AIC=188.56
## taisyo ~ slope + long + uedokaburi + sitadokaburi + masuhonsuu + 
##     nendo + kei + kubun + did + kouhou + ekijyouka + kansyu
## 
##                Df Deviance    AIC
## - kouhou        1   162.65 186.65
## - long          1   162.69 186.69
## - ekijyouka     1   163.11 187.11
## - uedokaburi    1   164.13 188.13
## - masuhonsuu    1   164.26 188.26
## <none>              162.56 188.56
## - sitadokaburi  1   164.75 188.75
## - slope         1   164.96 188.96
## - kubun         1   165.19 189.19
## - kei           1   167.01 191.01
## - did           1   168.48 192.48
## - nendo         1   175.64 199.64
## - kansyu        1   180.12 204.12
## 
## Step:  AIC=186.65
## taisyo ~ slope + long + uedokaburi + sitadokaburi + masuhonsuu + 
##     nendo + kei + kubun + did + ekijyouka + kansyu
## 
##                Df Deviance    AIC
## - long          1   162.82 184.82
## - ekijyouka     1   163.19 185.19
## - uedokaburi    1   164.13 186.13
## - masuhonsuu    1   164.60 186.60
## <none>              162.65 186.65
## - slope         1   165.12 187.12
## - kubun         1   165.24 187.24
## - sitadokaburi  1   165.66 187.66
## - kei           1   167.24 189.24
## - did           1   168.50 190.50
## - nendo         1   176.08 198.08
## - kansyu        1   180.30 202.30
## 
## Step:  AIC=184.82
## taisyo ~ slope + uedokaburi + sitadokaburi + masuhonsuu + nendo + 
##     kei + kubun + did + ekijyouka + kansyu
## 
##                Df Deviance    AIC
## - ekijyouka     1   163.42 183.42
## - uedokaburi    1   164.23 184.23
## - masuhonsuu    1   164.60 184.60
## <none>              162.82 184.82
## - slope         1   165.36 185.36
## - kubun         1   165.46 185.46
## - sitadokaburi  1   165.91 185.91
## - kei           1   167.24 187.24
## - did           1   169.06 189.06
## - nendo         1   177.35 197.35
## - kansyu        1   181.81 201.81
## 
## Step:  AIC=183.42
## taisyo ~ slope + uedokaburi + sitadokaburi + masuhonsuu + nendo + 
##     kei + kubun + did + kansyu
## 
##                Df Deviance    AIC
## - uedokaburi    1   164.93 182.93
## - masuhonsuu    1   165.24 183.24
## <none>              163.42 183.42
## - kubun         1   165.69 183.69
## - slope         1   166.43 184.43
## - sitadokaburi  1   166.55 184.55
## - kei           1   167.35 185.35
## - did           1   169.20 187.20
## - nendo         1   177.85 195.85
## - kansyu        1   183.12 201.12
## 
## Step:  AIC=182.93
## taisyo ~ slope + sitadokaburi + masuhonsuu + nendo + kei + kubun + 
##     did + kansyu
## 
##                Df Deviance    AIC
## - masuhonsuu    1   166.26 182.26
## - sitadokaburi  1   166.57 182.57
## <none>              164.93 182.93
## - kubun         1   166.94 182.94
## - slope         1   167.83 183.83
## - kei           1   168.87 184.87
## - did           1   169.62 185.62
## - nendo         1   180.33 196.33
## - kansyu        1   184.81 200.81
## 
## Step:  AIC=182.26
## taisyo ~ slope + sitadokaburi + nendo + kei + kubun + did + kansyu
## 
##                Df Deviance    AIC
## <none>              166.26 182.26
## - sitadokaburi  1   168.27 182.27
## - kubun         1   168.50 182.50
## - slope         1   170.24 184.24
## - did           1   172.85 186.85
## - kei           1   173.89 187.89
## - nendo         1   184.04 198.04
## - kansyu        1   187.33 201.33

r3 <- glm(taisyo ~ slope + sitadokaburi + nendo + kei + kubun + did + kansyu, data = gesui,family = binomial(link = "logit"))
summary(r3)

## 
## Call:
## glm(formula = taisyo ~ slope + sitadokaburi + nendo + kei + kubun + 
##     did + kansyu, family = binomial(link = "logit"), data = gesui)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9522  -0.3094  -0.1728  -0.1035   4.0352  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -488.36244  129.87178  -3.760 0.000170 ***
## slope          -0.25414    0.14044  -1.810 0.070349 .  
## sitadokaburi   -0.33073    0.24916  -1.327 0.184398    
## nendo           0.24266    0.06517   3.724 0.000196 ***
## kei             2.06469    0.76643   2.694 0.007062 ** 
## kubun1         -1.38829    0.90452  -1.535 0.124825    
## did1            2.47929    1.18220   2.097 0.035978 *  
## kansyu1         3.57821    1.14699   3.120 0.001811 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 213.22  on 477  degrees of freedom
## Residual deviance: 166.26  on 470  degrees of freedom
## AIC: 182.26
## 
## Number of Fisher Scoring iterations: 8

r4 = step(r3)

## Start:  AIC=182.26
## taisyo ~ slope + sitadokaburi + nendo + kei + kubun + did + kansyu
## 
##                Df Deviance    AIC
## <none>              166.26 182.26
## - sitadokaburi  1   168.27 182.27
## - kubun         1   168.50 182.50
## - slope         1   170.24 184.24
## - did           1   172.85 186.85
## - kei           1   173.89 187.89
## - nendo         1   184.04 198.04
## - kansyu        1   187.33 201.33

上記の結果、slope + sitadokaburi + nendo + kei + kubun + did+kansyu 以上に変数を減らしても一番AICが良く（小さく）なるものなく予測精度は向上しないことが判ったためこのモデルで予測行う

このモデルで予測を行うと

plot(fitted(r3), gesui$taisyo, ylim=c(-0.2,1.2), yaxt="n")
axis(2, c(0,1))

predict(r3, type = "response")

##            1            2            3            4            5            6 
## 0.2135601317 0.0768991781 0.0109093912 0.0094899540 0.0253781086 0.0661022586 
##            7            8            9           10           11           12 
## 0.0406679132 0.0284045807 0.0084288000 0.0102082823 0.0130696261 0.0267057447 
##           13           14           15           16           17           18 
## 0.0271286158 0.0438520479 0.1621464059 0.0156139597 0.0144815476 0.0521926782 
##           19           20           21           22           23           24 
## 0.0073356539 0.0137476719 0.0001398624 0.0002848636 0.0070851207 0.0063870725 
##           25           26           27           28           29           30 
## 0.0067153207 0.0083000062 0.0100323021 0.0182066483 0.0144483951 0.0758285044 
##           31           32           33           34           35           36 
## 0.0209988394 0.0601658809 0.0057180394 0.0340177689 0.0570973499 0.0589086894 
##           37           38           39           40           41           42 
## 0.0610189616 0.0314904131 0.0917820932 0.0587999971 0.0083843067 0.0693714489 
##           43           44           45           46           47           48 
## 0.0087760370 0.0072928159 0.0062727843 0.0048545571 0.0008092696 0.0003584811 
##           49           50           51           52           53           54 
## 0.0007176583 0.0023522503 0.0021220677 0.0021625704 0.0065923474 0.0129690479 
##           55           56           57           58           59           60 
## 0.0274562370 0.0089498104 0.0105839454 0.2495923119 0.2389666523 0.2367276370 
##           61           62           63           64           65           66 
## 0.2411897377 0.2509096390 0.2233843162 0.2542983720 0.2419867278 0.2279093689 
##           67           68           69           70           71           72 
## 0.2207753309 0.2184370464 0.2183767813 0.2182475368 0.2251353646 0.0025744396 
##           73           74           75           76           77           78 
## 0.1370360720 0.0110855446 0.0367219655 0.1948391308 0.2417631510 0.0058932279 
##           79           80           81           82           83           84 
## 0.0078584650 0.0052637390 0.0089730991 0.0071872015 0.0076126739 0.0119456845 
##           85           86           87           88           89           90 
## 0.2261093327 0.1928793859 0.1549465131 0.2396999842 0.2597490234 0.0584936415 
##           91           92           93           94           95           96 
## 0.1623878999 0.0780328340 0.1807684824 0.1709913576 0.2280661862 0.1615428692 
##           97           98           99          100          101          102 
## 0.2257567833 0.2247755293 0.1910239571 0.2466190883 0.0685025584 0.1312633852 
##          103          104          105          106          107          108 
## 0.2706879272 0.1705641968 0.2737352210 0.0964350503 0.1622606232 0.0888744740 
##          109          110          111          112          113          114 
## 0.0128922389 0.0044190828 0.0304136690 0.0130563568 0.0555896685 0.0372812979 
##          115          116          117          118          119          120 
## 0.0538612536 0.0039907148 0.0071667145 0.1266748398 0.1139850172 0.1418072368 
##          121          122          123          124          125          126 
## 0.1092979991 0.0138037715 0.0124350024 0.0118177136 0.0315786748 0.1188930548 
##          127          128          129          130          131          132 
## 0.1665382595 0.0983548523 0.0870002089 0.1490831562 0.1356671914 0.1177800236 
##          133          134          135          136          137          138 
## 0.0873362999 0.0845173224 0.1476509502 0.1605509437 0.1261544839 0.1574673642 
##          139          140          141          142          143          144 
## 0.0388048799 0.0367239567 0.1015637047 0.1671249446 0.0698463841 0.0634608089 
##          145          146          147          148          149          150 
## 0.1658964320 0.1670664395 0.1184682191 0.1819988025 0.0297998617 0.0090928176 
##          151          152          153          154          155          156 
## 0.0125537734 0.0113672176 0.0218916441 0.0044145930 0.0527562058 0.2288646250 
##          157          158          159          160          161          162 
## 0.1424733834 0.1482259882 0.2631162246 0.0134996272 0.0021699938 0.0151752111 
##          163          164          165          166          167          168 
## 0.0659867009 0.0869200667 0.0665645718 0.0702742089 0.0074301537 0.0158064217 
##          169          170          171          172          173          174 
## 0.0200413495 0.2611817456 0.0090437318 0.0058761959 0.0148817593 0.0173317078 
##          175          176          177          178          179          180 
## 0.0218535266 0.0158485241 0.0170485076 0.0284311765 0.0395142053 0.0222524596 
##          181          182          183          184          185          186 
## 0.0191032201 0.0226687955 0.0081523637 0.0169416916 0.0242141747 0.0249068794 
##          187          188          189          190          191          192 
## 0.0338151108 0.0306700792 0.0444256383 0.1668598805 0.2660606162 0.0208690773 
##          193          194          195          196          197          198 
## 0.0141900831 0.0076680432 0.0151379132 0.0041774349 0.0100085312 0.0194844659 
##          199          200          201          202          203          204 
## 0.0093184014 0.0088725506 0.0311180754 0.0767179306 0.0218772971 0.0247103368 
##          205          206          207          208          209          210 
## 0.0515302590 0.0469861387 0.0221469034 0.0457008530 0.0219287825 0.0211546445 
##          211          212          213          214          215          216 
## 0.0347548811 0.0052206675 0.0246835389 0.0429131536 0.1863276425 0.0017826610 
##          217          218          219          220          221          222 
## 0.0055420808 0.0118469766 0.0280098528 0.0153604620 0.0271897837 0.0850648371 
##          223          224          225          226          227          228 
## 0.0139343625 0.0147668188 0.0131693491 0.0149312331 0.0417239334 0.0709732320 
##          229          230          231          232          233          234 
## 0.0261322928 0.0221054164 0.0304104913 0.0442650489 0.0518026536 0.0513430932 
##          235          236          237          238          239          240 
## 0.0517144771 0.0071834707 0.0059300538 0.0063273936 0.0107216302 0.0096773706 
##          241          242          243          244          245          246 
## 0.0077424350 0.0083412035 0.0093546927 0.0119548246 0.0084470014 0.0301761227 
##          247          248          249          250          251          252 
## 0.0268997777 0.0253491062 0.0148836306 0.0113477567 0.0127716515 0.0175057391 
##          253          254          255          256          257          258 
## 0.0163785333 0.0192354585 0.0657067615 0.0116928298 0.0086391472 0.0091972164 
##          259          260          261          262          263          264 
## 0.0066367116 0.0094392034 0.0105310046 0.0249320004 0.0214701295 0.0239025816 
##          265          266          267          268          269          270 
## 0.0280656267 0.0220920256 0.0232565727 0.0267054297 0.0290880025 0.0330448218 
##          271          272          273          274          275          276 
## 0.0274371183 0.0347231897 0.0258680762 0.0179360845 0.0107849313 0.0193786607 
##          277          278          279          280          281          282 
## 0.0972429322 0.0613168067 0.0623163699 0.0184559187 0.0119432292 0.0705969499 
##          283          284          285          286          287          288 
## 0.0097492535 0.0095609684 0.0421656861 0.0496418979 0.0459412667 0.0245686172 
##          289          290          291          292          293          294 
## 0.0009088146 0.0008834079 0.0117356162 0.0075824964 0.0305838296 0.0065736861 
##          295          296          297          298          299          300 
## 0.0045320528 0.0030873023 0.0068203709 0.0085574126 0.0033909411 0.0006344732 
##          301          302          303          304          305          306 
## 0.0121180626 0.0076249502 0.0057713804 0.0002912282 0.0003814451 0.0006040089 
##          307          308          309          310          311          312 
## 0.0002960656 0.0002534497 0.0142167327 0.0187934513 0.0004045288 0.0003866215 
##          313          314          315          316          317          318 
## 0.0002806823 0.0061857257 0.0134765481 0.0106264268 0.0113251556 0.0116740953 
##          319          320          321          322          323          324 
## 0.0125247630 0.0138299317 0.0154152622 0.0164528718 0.0167025860 0.0057958440 
##          325          326          327          328          329          330 
## 0.0045692486 0.0013376677 0.0030450104 0.0015880168 0.0008838401 0.0009043697 
##          331          332          333          334          335          336 
## 0.0011143889 0.0009956370 0.0013444178 0.0026278949 0.0028923509 0.0042428902 
##          337          338          339          340          341          342 
## 0.0052318206 0.0073738895 0.0058670266 0.0014933204 0.0132570866 0.0052751860 
##          343          344          345          346          347          348 
## 0.0105531741 0.0097117191 0.0187789651 0.0225924485 0.0153238415 0.0170529548 
##          349          350          351          352          353          354 
## 0.0187749363 0.0198525970 0.0209559658 0.0176936106 0.0178462078 0.0007985918 
##          355          356          357          358          359          360 
## 0.0007337668 0.0007739984 0.0007533362 0.0006065061 0.0008684522 0.0005551224 
##          361          362          363          364          365          366 
## 0.0005747017 0.0006147297 0.0007264931 0.0004557065 0.0006106372 0.0007062232 
##          367          368          369          370          371          372 
## 0.0010003772 0.0006966461 0.0008590111 0.0335614314 0.0375255454 0.0850899885 
##          373          374          375          376          377          378 
## 0.0043626010 0.0028493451 0.0001291378 0.0018847789 0.0044401182 0.0006612728 
##          379          380          381          382          383          384 
## 0.0009477544 0.0014454243 0.0010045696 0.0004373985 0.0006524030 0.0007164283 
##          385          386          387          388          389          390 
## 0.0184000081 0.0275493692 0.0244452822 0.0107659408 0.0419985564 0.0543408875 
##          391          392          393          394          395          396 
## 0.0439101907 0.0083661057 0.0259016421 0.0270580411 0.0189880653 0.0296389452 
##          397          398          399          400          401          402 
## 0.0281879788 0.0289737291 0.0237464905 0.0102012255 0.0027985505 0.0022808377 
##          403          404          405          406          407          408 
## 0.0079108619 0.0346428543 0.0318788428 0.0358526677 0.0263144119 0.0217030645 
##          409          410          411          412          413          414 
## 0.2257168865 0.2424655645 0.2464889826 0.2455751555 0.2144570747 0.2437333261 
##          415          416          417          418          419          420 
## 0.1968818778 0.2733129402 0.2334103710 0.2495909901 0.2124113419 0.2202684381 
##          421          422          423          424          425          426 
## 0.2263333579 0.2315698309 0.2620667426 0.2413749842 0.2491528364 0.2207772987 
##          427          428          429          430          431          432 
## 0.2326109112 0.2306743939 0.1149586715 0.1550945748 0.2087461993 0.2281683498 
##          433          434          435          436          437          438 
## 0.2498022835 0.1906003869 0.2564652432 0.2597105914 0.2658442529 0.3268816736 
##          439          440          441          442          443          444 
## 0.3644901974 0.0119085763 0.0153717553 0.0100552641 0.1754079310 0.0088958899 
##          445          446          447          448          449          450 
## 0.0092896744 0.0006292188 0.0007164239 0.0004287314 0.0009183739 0.0378388826 
##          451          452          453          454          455          456 
## 0.2069871140 0.1773452546 0.1161040838 0.0052243920 0.0066122036 0.0089536724 
##          457          458          459          460          461          462 
## 0.0086755358 0.0025384541 0.0026407118 0.0081524597 0.0076338712 0.0074746600 
##          463          464          465          466          467          468 
## 0.0073688760 0.0022689476 0.0032176605 0.0006743661 0.0084324129 0.0085736344 
##          469          470          471          472          473          474 
## 0.0029248931 0.0064785595 0.0071039324 0.0051158835 0.0045618622 0.0048751756 
##          475          476          477          478 
## 0.0139618365 0.0078446828 0.0035332043 0.0028445210

-結果は最大で0.47であり,全て0.5以下になっている。

回帰係数を個別に検定する。

stargazer(r3, 
          style = "ajps", # AJPS仕様
          dep.var.labels = "影響あり or 影響なし", # 応答変数の名前
          title = "厚木のテストデータを使ったパラメータの影響度判定", # タイトル
          type ="html")

**厚木のテストデータを使ったパラメータの影響度判定**

	影響あり or 影響なし

slope	-0.254^*
	(0.140)
sitadokaburi	-0.331
	(0.249)
nendo	0.243^***
	(0.065)
kei	2.065^***
	(0.766)
kubun1	-1.388
	(0.905)
did1	2.479^**
	(1.182)
kansyu1	3.578^***
	(1.147)
Constant	-488.362^***
	(129.872)
N	478
Log Likelihood	-83.132
AIC	182.264

p < .01; p < .05; p < .1

-5％で統計的に優位なパラメータは、nendo,kansyu,　kei とdidは1％で有意となった。即ち影響があった

http://www.ner.takushoku-u.ac.jp/masano/class_material/waseda/keiryo/15_logit.html

ロジスティック曲線

kagaya　fumitaka

2020/2/28

厚木市下水道管路で試行する

Rによる条件付きロジスティック回帰分析

下水道データ読み込み# 基本統計量表示 gesui　# 教科書ではlogit

各変数の正規分布性の確認(qqnorm()関数）s悦明変数は正規分布しているか？

シャピロ・ウィルク検定

slopeの確立密度関数

slope等の非正規なデータの最尤法フィッティング

slope等の非正規なデータの最尤法フィッティング

目的変数・説明変数共に正規性は関係ないこととしてロジスティック回帰式の推定値を求める

ロジスティック曲線

kagaya fumitaka

2020/2/28

厚木市下水道管路で試行する

Rによる条件付きロジスティック回帰分析

下水道データ読み込み# 基本統計量表示 gesui # 教科書ではlogit

各変数の正規分布性の確認(qqnorm()関数）s悦明変数は正規分布しているか？

シャピロ・ウィルク検定

slopeの確立密度関数

slope等の非正規なデータの最尤法フィッティング

slope等の非正規なデータの最尤法フィッティング

目的変数・説明変数共に正規性は関係ないこととしてロジスティック回帰式の推定値を求める

kagaya　fumitaka

下水道データ読み込み# 基本統計量表示 gesui　# 教科書ではlogit