If you have a 2D scatterplot, how do you add a line of best fit to the scatterplot?
We’ll use the “palmerpenguins” package (https://allisonhorst.github.io/palmerpenguins/) to address this question. If you have not done so before, install the package with install.packages("palmerpenguins"). Then call library(palmerpenguins) and load the data with data(penguins).
#install.packages("palmerpenguins")
library(palmerpenguins)
data(penguins)
Let’s put two columns from this dataframe into vectors called X and Y. (Don’t worry if you don’t know how to do this; just run the code.)
X <- penguins$body_mass_g
Y <- penguins$bill_length_mm
Next, let’s create a dataframe and put both vectors in it.
df <- data.frame(body.mass = X,
                 bill.length = Y)
df
## body.mass bill.length
## 1 3750 39.1
## 2 3800 39.5
## 3 3250 40.3
## 4 NA NA
## 5 3450 36.7
## 6 3650 39.3
## 7 3625 38.9
## 8 4675 39.2
## 9 3475 34.1
## 10 4250 42.0
## 11 3300 37.8
## 12 3700 37.8
## 13 3200 41.1
## 14 3800 38.6
## 15 4400 34.6
## 16 3700 36.6
## 17 3450 38.7
## 18 4500 42.5
## 19 3325 34.4
## 20 4200 46.0
## 21 3400 37.8
## 22 3600 37.7
## 23 3800 35.9
## 24 3950 38.2
## 25 3800 38.8
## 26 3800 35.3
## 27 3550 40.6
## 28 3200 40.5
## 29 3150 37.9
## 30 3950 40.5
## 31 3250 39.5
## 32 3900 37.2
## 33 3300 39.5
## 34 3900 40.9
## 35 3325 36.4
## 36 4150 39.2
## 37 3950 38.8
## 38 3550 42.2
## 39 3300 37.6
## 40 4650 39.8
## 41 3150 36.5
## 42 3900 40.8
## 43 3100 36.0
## 44 4400 44.1
## 45 3000 37.0
## 46 4600 39.6
## 47 3425 41.1
## 48 2975 37.5
## 49 3450 36.0
## 50 4150 42.3
## 51 3500 39.6
## 52 4300 40.1
## 53 3450 35.0
## 54 4050 42.0
## 55 2900 34.5
## 56 3700 41.4
## 57 3550 39.0
## 58 3800 40.6
## 59 2850 36.5
## 60 3750 37.6
## 61 3150 35.7
## 62 4400 41.3
## 63 3600 37.6
## 64 4050 41.1
## 65 2850 36.4
## 66 3950 41.6
## 67 3350 35.5
## 68 4100 41.1
## 69 3050 35.9
## 70 4450 41.8
## 71 3600 33.5
## 72 3900 39.7
## 73 3550 39.6
## 74 4150 45.8
## 75 3700 35.5
## 76 4250 42.8
## 77 3700 40.9
## 78 3900 37.2
## 79 3550 36.2
## 80 4000 42.1
## 81 3200 34.6
## 82 4700 42.9
## 83 3800 36.7
## 84 4200 35.1
## 85 3350 37.3
## 86 3550 41.3
## 87 3800 36.3
## 88 3500 36.9
## 89 3950 38.3
## 90 3600 38.9
## 91 3550 35.7
## 92 4300 41.1
## 93 3400 34.0
## 94 4450 39.6
## 95 3300 36.2
## 96 4300 40.8
## 97 3700 38.1
## 98 4350 40.3
## 99 2900 33.1
## 100 4100 43.2
## 101 3725 35.0
## 102 4725 41.0
## 103 3075 37.7
## 104 4250 37.8
## 105 2925 37.9
## 106 3550 39.7
## 107 3750 38.6
## 108 3900 38.2
## 109 3175 38.1
## 110 4775 43.2
## 111 3825 38.1
## 112 4600 45.6
## 113 3200 39.7
## 114 4275 42.2
## 115 3900 39.6
## 116 4075 42.7
## 117 2900 38.6
## 118 3775 37.3
## 119 3350 35.7
## 120 3325 41.1
## 121 3150 36.2
## 122 3500 37.7
## 123 3450 40.2
## 124 3875 41.4
## 125 3050 35.2
## 126 4000 40.6
## 127 3275 38.8
## 128 4300 41.5
## 129 3050 39.0
## 130 4000 44.1
## 131 3325 38.5
## 132 3500 43.1
## 133 3500 36.8
## 134 4475 37.5
## 135 3425 38.1
## 136 3900 41.1
## 137 3175 35.6
## 138 3975 40.2
## 139 3400 37.0
## 140 4250 39.7
## 141 3400 40.2
## 142 3475 40.6
## 143 3050 32.1
## 144 3725 40.7
## 145 3000 37.3
## 146 3650 39.0
## 147 4250 39.2
## 148 3475 36.6
## 149 3450 36.0
## 150 3750 37.8
## 151 3700 36.0
## 152 4000 41.5
## 153 4500 46.1
## 154 5700 50.0
## 155 4450 48.7
## 156 5700 50.0
## 157 5400 47.6
## 158 4550 46.5
## 159 4800 45.4
## 160 5200 46.7
## 161 4400 43.3
## 162 5150 46.8
## 163 4650 40.9
## 164 5550 49.0
## 165 4650 45.5
## 166 5850 48.4
## 167 4200 45.8
## 168 5850 49.3
## 169 4150 42.0
## 170 6300 49.2
## 171 4800 46.2
## 172 5350 48.7
## 173 5700 50.2
## 174 5000 45.1
## 175 4400 46.5
## 176 5050 46.3
## 177 5000 42.9
## 178 5100 46.1
## 179 4100 44.5
## 180 5650 47.8
## 181 4600 48.2
## 182 5550 50.0
## 183 5250 47.3
## 184 4700 42.8
## 185 5050 45.1
## 186 6050 59.6
## 187 5150 49.1
## 188 5400 48.4
## 189 4950 42.6
## 190 5250 44.4
## 191 4350 44.0
## 192 5350 48.7
## 193 3950 42.7
## 194 5700 49.6
## 195 4300 45.3
## 196 4750 49.6
## 197 5550 50.5
## 198 4900 43.6
## 199 4200 45.5
## 200 5400 50.5
## 201 5100 44.9
## 202 5300 45.2
## 203 4850 46.6
## 204 5300 48.5
## 205 4400 45.1
## 206 5000 50.1
## 207 4900 46.5
## 208 5050 45.0
## 209 4300 43.8
## 210 5000 45.5
## 211 4450 43.2
## 212 5550 50.4
## 213 4200 45.3
## 214 5300 46.2
## 215 4400 45.7
## 216 5650 54.3
## 217 4700 45.8
## 218 5700 49.8
## 219 4650 46.2
## 220 5800 49.5
## 221 4700 43.5
## 222 5550 50.7
## 223 4750 47.7
## 224 5000 46.4
## 225 5100 48.2
## 226 5200 46.5
## 227 4700 46.4
## 228 5800 48.6
## 229 4600 47.5
## 230 6000 51.1
## 231 4750 45.2
## 232 5950 45.2
## 233 4625 49.1
## 234 5450 52.5
## 235 4725 47.4
## 236 5350 50.0
## 237 4750 44.9
## 238 5600 50.8
## 239 4600 43.4
## 240 5300 51.3
## 241 4875 47.5
## 242 5550 52.1
## 243 4950 47.5
## 244 5400 52.2
## 245 4750 45.5
## 246 5650 49.5
## 247 4850 44.5
## 248 5200 50.8
## 249 4925 49.4
## 250 4875 46.9
## 251 4625 48.4
## 252 5250 51.1
## 253 4850 48.5
## 254 5600 55.9
## 255 4975 47.2
## 256 5500 49.1
## 257 4725 47.3
## 258 5500 46.8
## 259 4700 41.7
## 260 5500 53.4
## 261 4575 43.3
## 262 5500 48.1
## 263 5000 50.5
## 264 5950 49.8
## 265 4650 43.5
## 266 5500 51.5
## 267 4375 46.2
## 268 5850 55.1
## 269 4875 44.5
## 270 6000 48.8
## 271 4925 47.2
## 272 NA NA
## 273 4850 46.8
## 274 5750 50.4
## 275 5200 45.2
## 276 5400 49.9
## 277 3500 46.5
## 278 3900 50.0
## 279 3650 51.3
## 280 3525 45.4
## 281 3725 52.7
## 282 3950 45.2
## 283 3250 46.1
## 284 3750 51.3
## 285 4150 46.0
## 286 3700 51.3
## 287 3800 46.6
## 288 3775 51.7
## 289 3700 47.0
## 290 4050 52.0
## 291 3575 45.9
## 292 4050 50.5
## 293 3300 50.3
## 294 3700 58.0
## 295 3450 46.4
## 296 4400 49.2
## 297 3600 42.4
## 298 3400 48.5
## 299 2900 43.2
## 300 3800 50.6
## 301 3300 46.7
## 302 4150 52.0
## 303 3400 50.5
## 304 3800 49.5
## 305 3700 46.4
## 306 4550 52.8
## 307 3200 40.9
## 308 4300 54.2
## 309 3350 42.5
## 310 4100 51.0
## 311 3600 49.7
## 312 3900 47.5
## 313 3850 47.6
## 314 4800 52.0
## 315 2700 46.9
## 316 4500 53.5
## 317 3950 49.0
## 318 3650 46.2
## 319 3550 50.9
## 320 3500 45.5
## 321 3675 50.9
## 322 4450 50.8
## 323 3400 50.1
## 324 4300 49.0
## 325 3250 51.5
## 326 3675 49.8
## 327 3325 48.1
## 328 3950 51.4
## 329 3600 45.7
## 330 4050 50.7
## 331 3350 42.5
## 332 3450 52.2
## 333 3250 45.2
## 334 4050 49.3
## 335 3800 50.2
## 336 3525 45.6
## 337 3950 51.9
## 338 3650 46.8
## 339 3650 45.7
## 340 4000 55.8
## 341 3400 43.5
## 342 3775 49.6
## 343 4100 50.8
## 344 3775 50.2
Before we make our scatterplot, let’s fit the line of best fit using the lm() function, which stands for ‘linear model’. The fitted model stores the intercept and slope that define the line’s equation. Inside the parentheses of the function, make sure you use the names of the columns you made in the dataframe (i.e. body.mass and bill.length, not X and Y), with the y-axis column to the left of the ~ and the x-axis column to the right.
# lm(y-axis data ~ x-axis data, data = df)
line.xy <- lm(bill.length ~ body.mass, data = df)
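If you want to see the numbers behind the line, one option is coef(), which returns the intercept and slope of a fitted lm() model. The sketch below is only an illustration: it uses a small stand-in dataframe called toy (a hypothetical name), built from the first six rows of df as printed above, so that it runs on its own. Row 4 is NA, which also shows that lm() leaves out incomplete rows under R's default settings.

```r
# Toy dataframe: the first six rows of df as printed above (row 4 is NA)
toy <- data.frame(body.mass   = c(3750, 3800, 3250, NA, 3450, 3650),
                  bill.length = c(39.1, 39.5, 40.3, NA, 36.7, 39.3))

# Fit the same kind of model as line.xy, but on the toy data
toy.line <- lm(bill.length ~ body.mass, data = toy)

# coef() returns a named vector: "(Intercept)" and "body.mass" (the slope)
toy.coefs <- coef(toy.line)
toy.coefs

# Number of rows actually used in the fit; the NA row is not counted
nobs(toy.line)
```

Running coef(line.xy) on the full model works the same way. The line of best fit is then bill.length = intercept + slope * body.mass.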
Next, draw the scatterplot with plot(), then call abline() right after it, using line.xy as its argument. abline() adds the fitted line on top of the existing plot.
plot(bill.length ~ body.mass, data = df)
abline(line.xy)
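abline() also accepts styling arguments, and it can draw a line from an intercept and slope given directly. The sketch below shows both, again on a small stand-in dataframe (toy, a hypothetical name) made from five complete rows of the output printed earlier. The colour, width and line type are arbitrary choices for illustration.

```r
# Toy dataframe: five complete rows taken from the printed df output
toy <- data.frame(body.mass   = c(3750, 3800, 3250, 3450, 3650),
                  bill.length = c(39.1, 39.5, 40.3, 36.7, 39.3))

toy.line <- lm(bill.length ~ body.mass, data = toy)

# Scatterplot first, then the fitted line drawn on top in red
plot(bill.length ~ body.mass, data = toy)
abline(toy.line, col = "red", lwd = 2)

# Same line drawn from the intercept (a) and slope (b) directly,
# dashed so it can be told apart from the red line
toy.coefs <- coef(toy.line)
abline(a = toy.coefs[["(Intercept)"]], b = toy.coefs[["body.mass"]], lty = 2)
```

The dashed line should sit exactly on the red one, since both are drawn from the same fitted intercept and slope.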