[Note]
RStudio's Viewer does not fully support math that is written in LaTeX and rendered in HTML by MathJax. If MathJax reports "Math Processing Error", open the HTML file produced by knit in a regular web browser such as Chrome.

1 Data

x <- 1:30
n <- length(x)

b0 <- 20
b1 <- 1.2

set.seed(2)
e <- rnorm(n, mean = 0, sd = 5)

y <- b0 + b1 * x + e

ybar <- mean(y)

# Test data
#x <- c(1, 2, 3, 4, 5)
#y <- c(2, 2, 4, 3, 5)

d <- data.frame(x, y)
d

# Color palette
# [RGB_Color] https://www.rapidtables.com/web/color/RGB_Color.html
COL <- c(rgb(255,   0,   0,  255, max = 255), # red
         rgb(  0,   0, 255,  255, max = 255), # blue
         rgb(  0, 155,   0,  255, max = 255)) # green

1.1 Scatter plot

matplot(x, y, pch = 1, col = COL[1])
grid()

2 Simple linear regression model

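\[ \begin{align} Y_i ={}& \beta_0 + \beta_1 x_i + \epsilon_i, \quad \epsilon_i \sim \mathbf{N}(0, \sigma^2)\\ &\text{where}\\ &Y_i\text{: response variable (random variable)}\\ &x_i\text{: explanatory variable (constant)}\\ &\beta_0\text{: population regression coefficient for the intercept (constant)}\\ &\beta_1\text{: population regression coefficient for the slope (constant)}\\ &\epsilon_i\text{: error term (random variable)}\\ &\sigma^2\text{: error variance (constant)} \end{align} \]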

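The data in Section 1 were generated with the following values of the regression coefficients \(\beta_0, \beta_1\) (b0 and b1 in the R code); their estimates \(b_0, b_1\) are computed from the data in Section 2.1: \[ \begin{align} \beta_0 &= 20\\ \beta_1 &= 1.2 \end{align} \]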
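
For reference, a sketch of how the estimates \(b_0, b_1\) are usually defined, assuming ordinary least squares (the criterion that lm() uses). The symbols \(Q\), \(\bar{x}\), \(\bar{y}\), \(S_{xx}\) and \(S_{xy}\) are notation introduced for this note only.

\[ \begin{align} Q(b_0, b_1) &= \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \quad \text{(minimized over } b_0, b_1\text{)}\\ b_1 &= \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\\ b_0 &= \bar{y} - b_1 \bar{x} \end{align} \]

The fitted line is then \(\hat{y}_i = b_0 + b_1 x_i\).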

2.1 Estimating the regression coefficients

fit <- lm(y ~ x, data = d)
summary(fit)

## 
## Call:
## lm(formula = y ~ x, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.5989  -2.8452   0.0335   3.6787   9.2076 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  20.8526     2.2356   9.327 4.38e-10 ***
## x             1.2188     0.1259   9.678 1.97e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.97 on 28 degrees of freedom
## Multiple R-squared:  0.7699, Adjusted R-squared:  0.7616 
## F-statistic: 93.66 on 1 and 28 DF,  p-value: 1.972e-10
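
To make the rows of the summary output easier to read, the following is a minimal sketch that recomputes the same kinds of quantities by hand with NumPy. It assumes ordinary least squares and uses the small test data from the commented-out lines in Section 1 (x = 1, ..., 5; y = 2, 2, 4, 3, 5), not the simulated data, because R's random numbers cannot be reproduced in Python. All variable names are chosen for this illustration.

```python
import numpy as np

# Small test data taken from the commented-out lines in Section 1
x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 2, 4, 3, 5], dtype=float)
n = len(x)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)           # sum of squared deviations of x
Sxy = np.sum((x - xbar) * (y - ybar))   # sum of cross products

# Least-squares estimates ("Estimate" column)
b1 = Sxy / Sxx
b0 = ybar - b1 * xbar

# Residuals and residual standard error on n - 2 degrees of freedom
resid = y - (b0 + b1 * x)
sse = np.sum(resid ** 2)
sigma_hat = np.sqrt(sse / (n - 2))

# Standard errors ("Std. Error") and t values ("t value" = Estimate / Std. Error)
se_b1 = sigma_hat / np.sqrt(Sxx)
se_b0 = sigma_hat * np.sqrt(1 / n + xbar ** 2 / Sxx)
t_b0, t_b1 = b0 / se_b0, b1 / se_b1

# Multiple R-squared and F-statistic (1 and n - 2 degrees of freedom)
sst = np.sum((y - ybar) ** 2)
r2 = 1 - sse / sst
f_stat = (sst - sse) / (sse / (n - 2))

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"se(b0) = {se_b0:.4f}, se(b1) = {se_b1:.4f}")
print(f"t(b0) = {t_b0:.3f}, t(b1) = {t_b1:.3f}")
print(f"sigma_hat = {sigma_hat:.4f}, R^2 = {r2:.4f}, F = {f_stat:.3f}")
```

The same relations can be checked against the R output above: 20.8526 / 2.2356 ≈ 9.327 for the intercept, and in simple regression the F-statistic equals the squared t value of the slope (9.678² ≈ 93.66).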

2.2 Plot

matplot(x, y, pch = 1, col = COL[1], main = 'Main title')
grid()
matlines(x, fitted(fit), col = COL[2])

library(latex2exp)
legend('topleft', lty = c(NA, 1), pch = c(1, NA), col = COL[1:2], 
       legend = c('Data', TeX('$\\hat{y}_i = b_0 + b_1 x_i $')))

2.3 Interactive plot

library(plotly)

# type = 'scatter' is set explicitly; otherwise plotly guesses the trace type and prints a message
plot_ly() |>
  add_trace(x = x, y = y,           type = 'scatter', mode = 'markers', name = 'Data') |>
  add_trace(x = x, y = fitted(fit), type = 'scatter', mode = 'lines',   name = '$\\hat{y}_i = b_0 + b_1 x_i $') |>
  layout(font  = list(size = 11, color = 'blue', family = 'UD Digi Kyokasho NK-R'),
         title = 'Main title',
         xaxis = list(title = 'x-axis label [unit]'),
         yaxis = list(title = 'y-axis label [unit]')) |>
  config(mathjax = 'cdn')

3 Python

3.1 Data

import numpy as np
import pandas as pd

b0 = 20
b1 = 1.2 

np.random.seed(3)
e = np.random.normal(loc = 0, scale = 5, size = 30).reshape(-1,1)

x = np.arange(1, 31, 1).reshape(-1,1)
y = b0 + b1*x + e

3.2 Regression analysis

from sklearn.linear_model import LinearRegression

model_lr = LinearRegression()
model_lr.fit(x, y)

LinearRegression()
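
The fit above only echoes the model object and does not show the estimated coefficients. A minimal sketch of how they could be read from the fitted model follows, repeating the data-generating steps of Section 3.1 so that it runs on its own. The names b0_hat, b1_hat and r2 are chosen for this illustration. Because the seed and random number generator differ from the R section, the values are not expected to match the R output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Same data-generating steps as in Section 3.1
b0, b1 = 20, 1.2
np.random.seed(3)
e = np.random.normal(loc=0, scale=5, size=30).reshape(-1, 1)
x = np.arange(1, 31, 1).reshape(-1, 1)
y = b0 + b1 * x + e

model_lr = LinearRegression()
model_lr.fit(x, y)

# y is 2-D (30 x 1), so intercept_ has shape (1,) and coef_ has shape (1, 1)
b0_hat = model_lr.intercept_[0]   # estimated intercept
b1_hat = model_lr.coef_[0, 0]     # estimated slope
r2 = model_lr.score(x, y)         # coefficient of determination R^2

print(f"intercept = {b0_hat:.4f}, slope = {b1_hat:.4f}, R^2 = {r2:.4f}")
```

Passing y as a 1-D array (for example y.ravel()) would instead give a scalar intercept_ and a coef_ of shape (1,).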


3.3 Plot

import matplotlib.pyplot as plt

# Labels
plt.title('Main title')
plt.xlabel('x-axis label [unit]')
plt.ylabel('y-axis label [unit]')

# Grid lines
plt.grid(linestyle = '--', color = (0.9, 0.9, 0.9, 0.25))
plt.plot(x, y, 'o', label = 'Data points')
# Raw string so that \hat reaches mathtext unchanged (avoids an invalid escape sequence)
plt.plot(x, model_lr.predict(x), linestyle = 'solid', 
         label = r'$\hat{y}_i = b_0 + b_1 x_i$')

# Legend
plt.legend(loc = 'lower right')

plt.show()

Regression Analysis (Simple Linear Regression)

東京国際大学 データサイエンス教育研究所 竹田 恒

2024-06-06
