considering the part to be measured

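In this post we use the dplyr package, so we load tidyverse, the collection of packages that includes it (together with ggplot2, which we use for plotting).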

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.0     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

First, we create some dummy data.

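Using the rnorm function, we generate:

- 25 random numbers with mean 5 and standard deviation 1 (x1)
- 25 random numbers with mean 15 and standard deviation 1 (x2)
- 25 random numbers with mean 25 and standard deviation 1 (x3)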

We then combine the three into a single vector x with c() and use the matrix function to reshape its 75 values into a 15 × 5 matrix.
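`rnorm()` draws new values on every run, so re-running the code will not reproduce the exact matrix printed in this post. A minimal sketch, assuming reproducible output is wanted: set a seed before the same generation steps. The seed value `1` is an arbitrary choice and is not part of the original code.

```r
set.seed(1)  # arbitrary seed (assumption): rnorm() now returns the same draws on every run

x1 <- rnorm(25, mean = 5,  sd = 1)
x2 <- rnorm(25, mean = 15, sd = 1)
x3 <- rnorm(25, mean = 25, sd = 1)
x  <- c(x1, x2, x3)        # a single numeric vector of length 75

y <- matrix(x, nrow = 15)  # 75 values / 15 rows = 5 columns
dim(y)                     # 15 rows, 5 columns
```

Any fixed integer works; the same seed followed by the same sequence of calls gives the same numbers on the same R setup.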

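x1 <- rnorm(25, 5, 1) # generate random numbers (n = 25, mean = 5, sd = 1)
x2 <- rnorm(25, 15, 1)
x3 <- rnorm(25, 25, 1)
x <- c(x1, x2, x3) # combine into one vector of length 75

y <- matrix(x, nrow = 15) # reshape into a 15 x 5 matrix (filled column by column)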

print(y)

##           [,1]      [,2]     [,3]     [,4]     [,5]
##  [1,] 5.368455  6.416307 14.33045 15.38254 24.85243
##  [2,] 4.484531  4.004149 14.17378 14.29903 24.92441
##  [3,] 5.239062  6.611719 14.29502 15.71494 25.75965
##  [4,] 5.750844  5.774671 17.33479 15.04075 26.76501
##  [5,] 5.120660  5.624268 15.98253 14.53951 24.59794
##  [6,] 5.161773  4.419253 13.92941 25.73259 24.17113
##  [7,] 5.028879  5.736402 15.19318 25.65678 24.26277
##  [8,] 5.216355  4.823768 14.60389 26.00079 25.13239
##  [9,] 6.222579  5.236826 15.57491 25.28386 25.41083
## [10,] 5.984525  4.928497 15.02514 25.74755 22.96484
## [11,] 5.016479 12.992640 16.06305 24.67796 24.83647
## [12,] 3.826577 15.293570 14.01583 25.95008 25.31340
## [13,] 5.149746 16.673695 16.28771 23.67503 24.53799
## [14,] 5.560246 15.377483 14.83786 24.15200 26.34496
## [15,] 6.577010 15.928976 13.75147 25.56136 26.28460
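By default `matrix()` fills column by column (`byrow = FALSE`), so each column takes 15 consecutive values of `x`. Labelling every cell with its source vector shows why columns 2 and 4 above mix values near 5 with values near 15, and values near 15 with values near 25: column 2 holds x1 in rows 1 to 10 and x2 in rows 11 to 15, while column 4 holds x2 in rows 1 to 5 and x3 in rows 6 to 15.

```r
# Label each of the 75 positions with the vector it came from,
# then reshape the same way (column-major, byrow = FALSE by default)
src <- matrix(rep(c("x1", "x2", "x3"), each = 25), nrow = 15)
print(src)
```

Passing `byrow = TRUE` would fill the matrix row by row instead.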

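In this matrix, we treat each row as an individual red swamp crayfish and each column as a measurer. To divide the crayfish into three groups that we can tell apart, we create a new column called group and bind it to the matrix.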

group <- rep(1:3, each = 5) # build the elements of the group column

y <- cbind(y, group) # append group to y as a new column with cbind

print(y)

##                                                     group
##  [1,] 5.368455  6.416307 14.33045 15.38254 24.85243     1
##  [2,] 4.484531  4.004149 14.17378 14.29903 24.92441     1
##  [3,] 5.239062  6.611719 14.29502 15.71494 25.75965     1
##  [4,] 5.750844  5.774671 17.33479 15.04075 26.76501     1
##  [5,] 5.120660  5.624268 15.98253 14.53951 24.59794     1
##  [6,] 5.161773  4.419253 13.92941 25.73259 24.17113     2
##  [7,] 5.028879  5.736402 15.19318 25.65678 24.26277     2
##  [8,] 5.216355  4.823768 14.60389 26.00079 25.13239     2
##  [9,] 6.222579  5.236826 15.57491 25.28386 25.41083     2
## [10,] 5.984525  4.928497 15.02514 25.74755 22.96484     2
## [11,] 5.016479 12.992640 16.06305 24.67796 24.83647     3
## [12,] 3.826577 15.293570 14.01583 25.95008 25.31340     3
## [13,] 5.149746 16.673695 16.28771 23.67503 24.53799     3
## [14,] 5.560246 15.377483 14.83786 24.15200 26.34496     3
## [15,] 6.577010 15.928976 13.75147 25.56136 26.28460     3
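`rep()` with `each = 5` repeats every label five times in a row, so rows 1 to 5 become group 1, rows 6 to 10 group 2, and rows 11 to 15 group 3. The small comparison below contrasts it with `times = 5`, which cycles the labels and would scatter the groups across the rows.

```r
g_each  <- rep(1:3, each = 5)   # 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
g_times <- rep(1:3, times = 5)  # 1 2 3 1 2 3 ... (labels cycle)

table(g_each)                   # five crayfish in each group
```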

We then convert the matrix y into a data frame with the data.frame function.

df <- data.frame(y) # convert to a data frame

print(df)

##          V1        V2       V3       V4       V5 group
## 1  5.368455  6.416307 14.33045 15.38254 24.85243     1
## 2  4.484531  4.004149 14.17378 14.29903 24.92441     1
## 3  5.239062  6.611719 14.29502 15.71494 25.75965     1
## 4  5.750844  5.774671 17.33479 15.04075 26.76501     1
## 5  5.120660  5.624268 15.98253 14.53951 24.59794     1
## 6  5.161773  4.419253 13.92941 25.73259 24.17113     2
## 7  5.028879  5.736402 15.19318 25.65678 24.26277     2
## 8  5.216355  4.823768 14.60389 26.00079 25.13239     2
## 9  6.222579  5.236826 15.57491 25.28386 25.41083     2
## 10 5.984525  4.928497 15.02514 25.74755 22.96484     2
## 11 5.016479 12.992640 16.06305 24.67796 24.83647     3
## 12 3.826577 15.293570 14.01583 25.95008 25.31340     3
## 13 5.149746 16.673695 16.28771 23.67503 24.53799     3
## 14 5.560246 15.377483 14.83786 24.15200 26.34496     3
## 15 6.577010 15.928976 13.75147 25.56136 26.28460     3
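The matrix columns have no names, so `data.frame()` falls back to `V1` to `V5`, and `cbind()` stores `group` as a double inside the numeric matrix. A minimal sketch, assuming descriptive names and a categorical group are preferred: name the columns before converting and add `group` as a factor. The labels `measurer1` to `measurer5` are hypothetical, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed (assumption), so the sketch is reproducible
# Same layout as the post: 25 draws each around 5, 15 and 25, filled column-wise
y <- matrix(rnorm(75, mean = rep(c(5, 15, 25), each = 25), sd = 1), nrow = 15)

colnames(y) <- paste0("measurer", 1:5)  # hypothetical measurer labels
df <- data.frame(y, group = factor(rep(1:3, each = 5)))
str(df)
```

Column selections written as `V1:V5` would then become `measurer1:measurer5`.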

To show how much the measurements vary between measurers, we compute, for each crayfish, the standard deviation across the five measurers.
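For crayfish $i$ measured by $n = 5$ measurers, R's `sd()` returns the sample standard deviation, which uses $n - 1$ in the denominator. Writing $y_{ij}$ for the value recorded for crayfish $i$ by measurer $j$ (notation introduced here for illustration):

$$
s_i = \sqrt{\frac{1}{n-1}\sum_{j=1}^{n}\left(y_{ij} - \bar{y}_i\right)^2}, \qquad \bar{y}_i = \frac{1}{n}\sum_{j=1}^{n} y_{ij}
$$

A larger $s_i$ means the measurers disagree more about that individual.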

df <- df %>% mutate(sd = apply(select(df, V1:V5), 1, sd)) # row-wise SD over the five measurer columns only (group excluded)
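A minimal, self-contained check of the row-wise SD, using its own simulated data and an arbitrary seed: `apply(..., 1, sd)` over the five measurer columns matches the $n - 1$ formula written out by hand, while letting the `group` label into the calculation, as `apply(df, 1, sd)` on the full data frame would, gives different numbers.

```r
set.seed(1)  # arbitrary seed (assumption)
m <- matrix(rnorm(75, mean = rep(c(5, 15, 25), each = 25)), nrow = 15)
group <- rep(1:3, each = 5)

sd_apply <- apply(m, 1, sd)  # SD across the 5 measurer columns only

# Same quantity written out: sample SD with denominator n - 1.
# m - rowMeans(m) subtracts each row's mean from every value in that row.
sd_manual <- sqrt(rowSums((m - rowMeans(m))^2) / (ncol(m) - 1))
all.equal(sd_apply, sd_manual)

# Treating the group label as a sixth "measurement" changes the result
sd_with_group <- apply(cbind(m, group), 1, sd)
```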

df %>%
  ggplot(aes(x = factor(group), y = sd)) +
  geom_point()
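To put numbers on the plot, a minimal sketch that summarises the per-crayfish SD within each group using dplyr. It simulates its own data with an arbitrary seed, so the values will not match the post's; the summary columns `mean_sd` and `max_sd` are names chosen here.

```r
library(dplyr)

set.seed(1)  # arbitrary seed (assumption), for reproducibility
m <- matrix(rnorm(75, mean = rep(c(5, 15, 25), each = 25), sd = 1), nrow = 15)
colnames(m) <- paste0("V", 1:5)  # same column names as df in the post

df <- data.frame(m, group = rep(1:3, each = 5))
df$sd <- apply(df[, paste0("V", 1:5)], 1, sd)  # SD across the five measurers

# Average and largest between-measurer SD within each group
sd_by_group <- df %>%
  group_by(group) %>%
  summarise(mean_sd = mean(sd), max_sd = max(sd))

print(sd_by_group)
```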

Kazuma-Nakano

2024-04-29