pm2

Write an R script that uses ggplot2 to create a scatter plot, incorporating categorical analysis through color-coded data points that represent different groups.

Step 1: Load libraries

We load two libraries:

  • ‘ggplot2’ is used to build plots layer by layer (we will use it to create the scatter plot).

  • ‘dplyr’ provides functions for exploring and summarizing data (we will use it to understand the categorical variable in the data set).

library(ggplot2)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Step 2: Load the dataset (iris)

We use the built-in data set iris.

What this data set contains:

  • Each row is one flower sample (an observation).
  • There are 150 observations in total.
  • The column Species is a categorical variable with 3 groups:
    • setosa
    • versicolor
    • virginica
  • The columns Sepal.Length and Sepal.Width are numeric measurements that we will plot.
data <- iris

Step 3: Inspect the data

head(data,10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
tail(data)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
names(data)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
summary(data)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                
str(data)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
dim(data)
[1] 150   5
data$Sepal.Length
  [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
 [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
 [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
 [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
 [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
 [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
[109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
[127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
[145] 6.7 6.7 6.3 6.5 6.2 5.9
typeof(data$Sepal.Length)
[1] "double"
typeof(data[1])
[1] "list"
data[][1]
    Sepal.Length
1            5.1
2            4.9
3            4.7
4            4.6
5            5.0
6            5.4
7            4.6
8            5.0
9            4.4
10           4.9
11           5.4
12           4.8
13           4.8
14           4.3
15           5.8
16           5.7
17           5.4
18           5.1
19           5.7
20           5.1
21           5.4
22           5.1
23           4.6
24           5.1
25           4.8
26           5.0
27           5.0
28           5.2
29           5.2
30           4.7
31           4.8
32           5.4
33           5.2
34           5.5
35           4.9
36           5.0
37           5.5
38           4.9
39           4.4
40           5.1
41           5.0
42           4.5
43           4.4
44           5.0
45           5.1
46           4.8
47           5.1
48           4.6
49           5.3
50           5.0
51           7.0
52           6.4
53           6.9
54           5.5
55           6.5
56           5.7
57           6.3
58           4.9
59           6.6
60           5.2
61           5.0
62           5.9
63           6.0
64           6.1
65           5.6
66           6.7
67           5.6
68           5.8
69           6.2
70           5.6
71           5.9
72           6.1
73           6.3
74           6.1
75           6.4
76           6.6
77           6.8
78           6.7
79           6.0
80           5.7
81           5.5
82           5.5
83           5.8
84           6.0
85           5.4
86           6.0
87           6.7
88           6.3
89           5.6
90           5.5
91           5.5
92           6.1
93           5.8
94           5.0
95           5.6
96           5.7
97           5.7
98           6.2
99           5.1
100          5.7
101          6.3
102          5.8
103          7.1
104          6.3
105          6.5
106          7.6
107          4.9
108          7.3
109          6.7
110          7.2
111          6.5
112          6.4
113          6.8
114          5.7
115          5.8
116          6.4
117          6.5
118          7.7
119          7.7
120          6.0
121          6.9
122          5.6
123          7.7
124          6.3
125          6.7
126          7.2
127          6.2
128          6.1
129          6.4
130          7.2
131          7.4
132          7.9
133          6.4
134          6.3
135          6.1
136          7.7
137          6.3
138          6.4
139          6.0
140          6.9
141          6.7
142          6.9
143          5.8
144          6.8
145          6.7
146          6.7
147          6.3
148          6.5
149          6.2
150          5.9
data[2][]
    Sepal.Width
1           3.5
2           3.0
3           3.2
4           3.1
5           3.6
6           3.9
7           3.4
8           3.4
9           2.9
10          3.1
11          3.7
12          3.4
13          3.0
14          3.0
15          4.0
16          4.4
17          3.9
18          3.5
19          3.8
20          3.8
21          3.4
22          3.7
23          3.6
24          3.3
25          3.4
26          3.0
27          3.4
28          3.5
29          3.4
30          3.2
31          3.1
32          3.4
33          4.1
34          4.2
35          3.1
36          3.2
37          3.5
38          3.6
39          3.0
40          3.4
41          3.5
42          2.3
43          3.2
44          3.5
45          3.8
46          3.0
47          3.8
48          3.2
49          3.7
50          3.3
51          3.2
52          3.2
53          3.1
54          2.3
55          2.8
56          2.8
57          3.3
58          2.4
59          2.9
60          2.7
61          2.0
62          3.0
63          2.2
64          2.9
65          2.9
66          3.1
67          3.0
68          2.7
69          2.2
70          2.5
71          3.2
72          2.8
73          2.5
74          2.8
75          2.9
76          3.0
77          2.8
78          3.0
79          2.9
80          2.6
81          2.4
82          2.4
83          2.7
84          2.7
85          3.0
86          3.4
87          3.1
88          2.3
89          3.0
90          2.5
91          2.6
92          3.0
93          2.6
94          2.3
95          2.7
96          3.0
97          2.9
98          2.9
99          2.5
100         2.8
101         3.3
102         2.7
103         3.0
104         2.9
105         3.0
106         3.0
107         2.5
108         2.9
109         2.5
110         3.6
111         3.2
112         2.7
113         3.0
114         2.5
115         2.8
116         3.2
117         3.0
118         3.8
119         2.6
120         2.2
121         3.2
122         2.8
123         2.8
124         2.7
125         3.3
126         3.2
127         2.8
128         3.0
129         2.8
130         3.0
131         2.8
132         3.8
133         2.8
134         2.8
135         2.6
136         3.0
137         3.4
138         3.1
139         3.0
140         3.1
141         3.1
142         3.1
143         2.7
144         3.2
145         3.3
146         3.0
147         2.5
148         3.0
149         3.4
150         3.0
data[][5]
       Species
1       setosa
2       setosa
3       setosa
4       setosa
5       setosa
6       setosa
7       setosa
8       setosa
9       setosa
10      setosa
11      setosa
12      setosa
13      setosa
14      setosa
15      setosa
16      setosa
17      setosa
18      setosa
19      setosa
20      setosa
21      setosa
22      setosa
23      setosa
24      setosa
25      setosa
26      setosa
27      setosa
28      setosa
29      setosa
30      setosa
31      setosa
32      setosa
33      setosa
34      setosa
35      setosa
36      setosa
37      setosa
38      setosa
39      setosa
40      setosa
41      setosa
42      setosa
43      setosa
44      setosa
45      setosa
46      setosa
47      setosa
48      setosa
49      setosa
50      setosa
51  versicolor
52  versicolor
53  versicolor
54  versicolor
55  versicolor
56  versicolor
57  versicolor
58  versicolor
59  versicolor
60  versicolor
61  versicolor
62  versicolor
63  versicolor
64  versicolor
65  versicolor
66  versicolor
67  versicolor
68  versicolor
69  versicolor
70  versicolor
71  versicolor
72  versicolor
73  versicolor
74  versicolor
75  versicolor
76  versicolor
77  versicolor
78  versicolor
79  versicolor
80  versicolor
81  versicolor
82  versicolor
83  versicolor
84  versicolor
85  versicolor
86  versicolor
87  versicolor
88  versicolor
89  versicolor
90  versicolor
91  versicolor
92  versicolor
93  versicolor
94  versicolor
95  versicolor
96  versicolor
97  versicolor
98  versicolor
99  versicolor
100 versicolor
101  virginica
102  virginica
103  virginica
104  virginica
105  virginica
106  virginica
107  virginica
108  virginica
109  virginica
110  virginica
111  virginica
112  virginica
113  virginica
114  virginica
115  virginica
116  virginica
117  virginica
118  virginica
119  virginica
120  virginica
121  virginica
122  virginica
123  virginica
124  virginica
125  virginica
126  virginica
127  virginica
128  virginica
129  virginica
130  virginica
131  virginica
132  virginica
133  virginica
134  virginica
135  virginica
136  virginica
137  virginica
138  virginica
139  virginica
140  virginica
141  virginica
142  virginica
143  virginica
144  virginica
145  virginica
146  virginica
147  virginica
148  virginica
149  virginica
150  virginica
data[1:5,1:3]
  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5          1.4
2          4.9         3.0          1.4
3          4.7         3.2          1.3
4          4.6         3.1          1.5
5          5.0         3.6          1.4
data[1,]
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa

Step 4: Count the observations in each species

table(data$Species)

    setosa versicolor  virginica 
        50         50         50 
data$Species
  [1] setosa     setosa     setosa     setosa     setosa     setosa    
  [7] setosa     setosa     setosa     setosa     setosa     setosa    
 [13] setosa     setosa     setosa     setosa     setosa     setosa    
 [19] setosa     setosa     setosa     setosa     setosa     setosa    
 [25] setosa     setosa     setosa     setosa     setosa     setosa    
 [31] setosa     setosa     setosa     setosa     setosa     setosa    
 [37] setosa     setosa     setosa     setosa     setosa     setosa    
 [43] setosa     setosa     setosa     setosa     setosa     setosa    
 [49] setosa     setosa     versicolor versicolor versicolor versicolor
 [55] versicolor versicolor versicolor versicolor versicolor versicolor
 [61] versicolor versicolor versicolor versicolor versicolor versicolor
 [67] versicolor versicolor versicolor versicolor versicolor versicolor
 [73] versicolor versicolor versicolor versicolor versicolor versicolor
 [79] versicolor versicolor versicolor versicolor versicolor versicolor
 [85] versicolor versicolor versicolor versicolor versicolor versicolor
 [91] versicolor versicolor versicolor versicolor versicolor versicolor
 [97] versicolor versicolor versicolor versicolor virginica  virginica 
[103] virginica  virginica  virginica  virginica  virginica  virginica 
[109] virginica  virginica  virginica  virginica  virginica  virginica 
[115] virginica  virginica  virginica  virginica  virginica  virginica 
[121] virginica  virginica  virginica  virginica  virginica  virginica 
[127] virginica  virginica  virginica  virginica  virginica  virginica 
[133] virginica  virginica  virginica  virginica  virginica  virginica 
[139] virginica  virginica  virginica  virginica  virginica  virginica 
[145] virginica  virginica  virginica  virginica  virginica  virginica 
Levels: setosa versicolor virginica
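
The same group counts can also be obtained with dplyr, which was loaded at the start. The following is a minimal sketch rather than part of the original steps; the object name species_summary and the summary column names are illustrative choices.

```r
library(dplyr)

data <- iris

# Group the rows by the categorical variable, then compute one summary row per group.
species_summary <- data %>%
  group_by(Species) %>%
  summarise(
    n = n(),                                # number of flowers in the group
    mean_sepal_length = mean(Sepal.Length), # average sepal length in the group
    mean_sepal_width = mean(Sepal.Width)    # average sepal width in the group
  )

print(species_summary)
```

Compared with table(), this approach returns a data frame, so group counts and group averages sit side by side and can be reused in later steps.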

Step 5: Create a basic scatter plot (no categories yet)

A scatter plot shows the relationship between two numeric variables.

Here we plot:

  • X-axis: Sepal.Length
  • Y-axis: Sepal.Width

Important point:

  • Each dot represents one flower (one row in the dataset).
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point()
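
To make the layer-by-layer idea concrete, the plot can be stored in a variable and built up in two stages. This is a small sketch; the names base_plot and scatter_plot are illustrative and do not appear in the original steps.

```r
library(ggplot2)

data <- iris

# Stage 1: data and aesthetic mappings only. Printing this shows empty axes.
base_plot <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width))

# Stage 2: adding a geom layer is what actually draws the points.
scatter_plot <- base_plot + geom_point()

print(scatter_plot)
```

Keeping the base plot in a variable makes it easy to try different layers without retyping the data and axis mappings.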

Step 6: Add categorical grouping using color = Species

  • color = Species tells ggplot2 to assign a different color to each species.

What changes?

  • The plot now visually separates the three species based on color.
  • This is the main “categorical analysis” idea: we can see if different groups cluster differently.
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()
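
It may help to contrast mapping color to a column inside aes() with setting one fixed color outside aes(). The sketch below illustrates the difference; the object names p_mapped and p_fixed and the color "steelblue" are arbitrary choices for illustration.

```r
library(ggplot2)

data <- iris

# Mapped: color is inside aes(), so it varies with Species and a legend is drawn.
p_mapped <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# Set: color is outside aes(), so every point gets the same color and no legend is drawn.
p_fixed <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue")

# layer_data() returns the values ggplot2 computed for each point.
length(unique(layer_data(p_mapped)$colour))  # one color per species
length(unique(layer_data(p_fixed)$colour))   # a single color for all points
```

Only the mapped version carries categorical information, which is why the tutorial places color = Species inside aes().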

Step 7: Improve point visibility (size and transparency)

We adjust how points look:

  • size = 3 makes each dot bigger, so it is easier to see.
  • alpha = 0.7 makes dots slightly transparent, which helps when points overlap.

Why transparency helps:

  • If many points overlap in the same region, transparency makes dense areas more visible.
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7)
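
One way to check whether overlap is a real concern in this data set is to count the flowers that share exactly the same pair of sepal measurements with at least one other flower. This is a rough sketch using base R only; the names coords and n_overlapping are illustrative.

```r
data <- iris

# Keep only the two columns that are plotted.
coords <- data[, c("Sepal.Length", "Sepal.Width")]

# A row overlaps if the same (length, width) pair appears earlier or later in the data.
is_overlapping <- duplicated(coords) | duplicated(coords, fromLast = TRUE)
n_overlapping <- sum(is_overlapping)

print(n_overlapping)
```

Any value above zero means some dots are drawn exactly on top of each other, which is the situation where alpha makes a visible difference.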

Step 8: Add informative labels (title, axes, legend)

Good plots should clearly communicate what the viewer is seeing.

labs() adds:

  • title for the plot heading
  • x and y axis labels
  • color legend title (so the legend has a meaningful name)
ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Scatter plot of Sepal Dimensions",
    x = "Sepal Length",
    y = "Sepal Width",
    color = "Species"
  )

Step 9: Apply a clean theme and move the legend

Themes control the background, grid lines, and text styling.

  • theme_minimal() removes heavy backgrounds and gives a clean look.
  • theme(legend.position = "top") moves the legend above the plot.

Why move the legend?

  • When the legend is at the top, it is often easier to notice and read, especially in presentations.

ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Scatter plot of Sepal Dimensions",
    x = "Sepal Length",
    y = "Sepal Width",
    color = "Species"
  ) +
  theme_minimal() +
  theme(legend.position = "top")

Discussion Questions

1. Do you see clusters of points by species?
2. Which species appears most separated from the others?
3. What happens if you plot ‘petal length’ vs ‘petal width’ instead?
4. What changes if you remove ‘alpha’ or increase ‘size’ further?
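
As a possible starting point for question 3, the final sepal plot can be adapted by swapping in the petal columns. This is only a sketch; the object name petal_plot and the title text are illustrative, and interpreting the result is left to the reader.

```r
library(ggplot2)

data <- iris

# Same layers as the final sepal plot, with the petal columns on the axes.
petal_plot <- ggplot(data, aes(x = Petal.Length, y = Petal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  labs(
    title = "Scatter plot of Petal Dimensions",
    x = "Petal Length",
    y = "Petal Width",
    color = "Species"
  ) +
  theme_minimal() +
  theme(legend.position = "top")

print(petal_plot)
```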