Program 2

### Step 1: Load libraries

We load two libraries:

ggplot2 builds plots layer by layer (we will use it to create the scatter plot).

dplyr provides functions for exploring and summarizing data (we will use it to understand the categories in the dataset).

library("ggplot2")
library("dplyr")

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
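The masking message above is informational: once dplyr is attached, a bare filter() or lag() call resolves to the dplyr version. As a small sketch (assuming both packages are installed), the startup messages can be silenced, and the masked functions can still be reached with an explicit stats:: prefix. The names filter_owner and moving_avg are illustrative only.

```r
# Attach the packages quietly; this hides the "objects are masked" notice.
suppressPackageStartupMessages({
  library(ggplot2)
  library(dplyr)
})

# A bare filter() now refers to dplyr's version ...
filter_owner <- environmentName(environment(filter))
filter_owner

# ... while the stats version stays available through its namespace prefix.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg
```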

### Step 2: Load the dataset (iris)

We use the built-in dataset ‘iris’. What this dataset contains:

- Each row is one flower sample (an observation).
- There are 150 observations in total.
- The column ‘Species’ is a categorical variable with 3 groups:
  - setosa
  - versicolor
  - virginica
- The columns Sepal.Length and Sepal.Width are numeric measurements that we will plot.

data<-iris
head(data,n=10)
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
tail(data,n=10)
    Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
141          6.7         3.1          5.6         2.4 virginica
142          6.9         3.1          5.1         2.3 virginica
143          5.8         2.7          5.1         1.9 virginica
144          6.8         3.2          5.9         2.3 virginica
145          6.7         3.3          5.7         2.5 virginica
146          6.7         3.0          5.2         2.3 virginica
147          6.3         2.5          5.0         1.9 virginica
148          6.5         3.0          5.2         2.0 virginica
149          6.2         3.4          5.4         2.3 virginica
150          5.9         3.0          5.1         1.8 virginica
names(data)
[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     
summary(data)
  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                
str(data)
'data.frame':   150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
data[1]
    Sepal.Length
1            5.1
2            4.9
3            4.7
4            4.6
5            5.0
6            5.4
7            4.6
8            5.0
9            4.4
10           4.9
11           5.4
12           4.8
13           4.8
14           4.3
15           5.8
16           5.7
17           5.4
18           5.1
19           5.7
20           5.1
21           5.4
22           5.1
23           4.6
24           5.1
25           4.8
26           5.0
27           5.0
28           5.2
29           5.2
30           4.7
31           4.8
32           5.4
33           5.2
34           5.5
35           4.9
36           5.0
37           5.5
38           4.9
39           4.4
40           5.1
41           5.0
42           4.5
43           4.4
44           5.0
45           5.1
46           4.8
47           5.1
48           4.6
49           5.3
50           5.0
51           7.0
52           6.4
53           6.9
54           5.5
55           6.5
56           5.7
57           6.3
58           4.9
59           6.6
60           5.2
61           5.0
62           5.9
63           6.0
64           6.1
65           5.6
66           6.7
67           5.6
68           5.8
69           6.2
70           5.6
71           5.9
72           6.1
73           6.3
74           6.1
75           6.4
76           6.6
77           6.8
78           6.7
79           6.0
80           5.7
81           5.5
82           5.5
83           5.8
84           6.0
85           5.4
86           6.0
87           6.7
88           6.3
89           5.6
90           5.5
91           5.5
92           6.1
93           5.8
94           5.0
95           5.6
96           5.7
97           5.7
98           6.2
99           5.1
100          5.7
101          6.3
102          5.8
103          7.1
104          6.3
105          6.5
106          7.6
107          4.9
108          7.3
109          6.7
110          7.2
111          6.5
112          6.4
113          6.8
114          5.7
115          5.8
116          6.4
117          6.5
118          7.7
119          7.7
120          6.0
121          6.9
122          5.6
123          7.7
124          6.3
125          6.7
126          7.2
127          6.2
128          6.1
129          6.4
130          7.2
131          7.4
132          7.9
133          6.4
134          6.3
135          6.1
136          7.7
137          6.3
138          6.4
139          6.0
140          6.9
141          6.7
142          6.9
143          5.8
144          6.8
145          6.7
146          6.7
147          6.3
148          6.5
149          6.2
150          5.9
data$Sepal.Length
  [1] 5.1 4.9 4.7 4.6 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8 4.8 4.3 5.8 5.7 5.4 5.1
 [19] 5.7 5.1 5.4 5.1 4.6 5.1 4.8 5.0 5.0 5.2 5.2 4.7 4.8 5.4 5.2 5.5 4.9 5.0
 [37] 5.5 4.9 4.4 5.1 5.0 4.5 4.4 5.0 5.1 4.8 5.1 4.6 5.3 5.0 7.0 6.4 6.9 5.5
 [55] 6.5 5.7 6.3 4.9 6.6 5.2 5.0 5.9 6.0 6.1 5.6 6.7 5.6 5.8 6.2 5.6 5.9 6.1
 [73] 6.3 6.1 6.4 6.6 6.8 6.7 6.0 5.7 5.5 5.5 5.8 6.0 5.4 6.0 6.7 6.3 5.6 5.5
 [91] 5.5 6.1 5.8 5.0 5.6 5.7 5.7 6.2 5.1 5.7 6.3 5.8 7.1 6.3 6.5 7.6 4.9 7.3
[109] 6.7 7.2 6.5 6.4 6.8 5.7 5.8 6.4 6.5 7.7 7.7 6.0 6.9 5.6 7.7 6.3 6.7 7.2
[127] 6.2 6.1 6.4 7.2 7.4 7.9 6.4 6.3 6.1 7.7 6.3 6.4 6.0 6.9 6.7 6.9 5.8 6.8
[145] 6.7 6.7 6.3 6.5 6.2 5.9
typeof(data$Sepal.Length)
[1] "double"
typeof(data[1])
[1] "list"
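The two typeof() results differ because data[1] keeps the data frame wrapper (a data frame is stored as a list of columns), while data$Sepal.Length extracts the column itself. A short sketch of the three common ways to reach a column, using only base R (the variable names are illustrative):

```r
data <- iris

col_df  <- data[1]             # single brackets: a one-column data frame
col_vec <- data[[1]]           # double brackets: the column as a plain vector
col_dlr <- data$Sepal.Length   # $ with the column name: also a plain vector

class(col_df)    # "data.frame"
class(col_vec)   # "numeric"

# The two vector forms hold exactly the same values.
identical(col_vec, col_dlr)
```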
data[2]
    Sepal.Width
1           3.5
2           3.0
3           3.2
4           3.1
5           3.6
6           3.9
7           3.4
8           3.4
9           2.9
10          3.1
11          3.7
12          3.4
13          3.0
14          3.0
15          4.0
16          4.4
17          3.9
18          3.5
19          3.8
20          3.8
21          3.4
22          3.7
23          3.6
24          3.3
25          3.4
26          3.0
27          3.4
28          3.5
29          3.4
30          3.2
31          3.1
32          3.4
33          4.1
34          4.2
35          3.1
36          3.2
37          3.5
38          3.6
39          3.0
40          3.4
41          3.5
42          2.3
43          3.2
44          3.5
45          3.8
46          3.0
47          3.8
48          3.2
49          3.7
50          3.3
51          3.2
52          3.2
53          3.1
54          2.3
55          2.8
56          2.8
57          3.3
58          2.4
59          2.9
60          2.7
61          2.0
62          3.0
63          2.2
64          2.9
65          2.9
66          3.1
67          3.0
68          2.7
69          2.2
70          2.5
71          3.2
72          2.8
73          2.5
74          2.8
75          2.9
76          3.0
77          2.8
78          3.0
79          2.9
80          2.6
81          2.4
82          2.4
83          2.7
84          2.7
85          3.0
86          3.4
87          3.1
88          2.3
89          3.0
90          2.5
91          2.6
92          3.0
93          2.6
94          2.3
95          2.7
96          3.0
97          2.9
98          2.9
99          2.5
100         2.8
101         3.3
102         2.7
103         3.0
104         2.9
105         3.0
106         3.0
107         2.5
108         2.9
109         2.5
110         3.6
111         3.2
112         2.7
113         3.0
114         2.5
115         2.8
116         3.2
117         3.0
118         3.8
119         2.6
120         2.2
121         3.2
122         2.8
123         2.8
124         2.7
125         3.3
126         3.2
127         2.8
128         3.0
129         2.8
130         3.0
131         2.8
132         3.8
133         2.8
134         2.8
135         2.6
136         3.0
137         3.4
138         3.1
139         3.0
140         3.1
141         3.1
142         3.1
143         2.7
144         3.2
145         3.3
146         3.0
147         2.5
148         3.0
149         3.4
150         3.0
data[][1]
    Sepal.Length
1            5.1
2            4.9
3            4.7
4            4.6
5            5.0
6            5.4
7            4.6
8            5.0
9            4.4
10           4.9
11           5.4
12           4.8
13           4.8
14           4.3
15           5.8
16           5.7
17           5.4
18           5.1
19           5.7
20           5.1
21           5.4
22           5.1
23           4.6
24           5.1
25           4.8
26           5.0
27           5.0
28           5.2
29           5.2
30           4.7
31           4.8
32           5.4
33           5.2
34           5.5
35           4.9
36           5.0
37           5.5
38           4.9
39           4.4
40           5.1
41           5.0
42           4.5
43           4.4
44           5.0
45           5.1
46           4.8
47           5.1
48           4.6
49           5.3
50           5.0
51           7.0
52           6.4
53           6.9
54           5.5
55           6.5
56           5.7
57           6.3
58           4.9
59           6.6
60           5.2
61           5.0
62           5.9
63           6.0
64           6.1
65           5.6
66           6.7
67           5.6
68           5.8
69           6.2
70           5.6
71           5.9
72           6.1
73           6.3
74           6.1
75           6.4
76           6.6
77           6.8
78           6.7
79           6.0
80           5.7
81           5.5
82           5.5
83           5.8
84           6.0
85           5.4
86           6.0
87           6.7
88           6.3
89           5.6
90           5.5
91           5.5
92           6.1
93           5.8
94           5.0
95           5.6
96           5.7
97           5.7
98           6.2
99           5.1
100          5.7
101          6.3
102          5.8
103          7.1
104          6.3
105          6.5
106          7.6
107          4.9
108          7.3
109          6.7
110          7.2
111          6.5
112          6.4
113          6.8
114          5.7
115          5.8
116          6.4
117          6.5
118          7.7
119          7.7
120          6.0
121          6.9
122          5.6
123          7.7
124          6.3
125          6.7
126          7.2
127          6.2
128          6.1
129          6.4
130          7.2
131          7.4
132          7.9
133          6.4
134          6.3
135          6.1
136          7.7
137          6.3
138          6.4
139          6.0
140          6.9
141          6.7
142          6.9
143          5.8
144          6.8
145          6.7
146          6.7
147          6.3
148          6.5
149          6.2
150          5.9
data[150, 5]
[1] virginica
Levels: setosa versicolor virginica
data[5]
       Species
1       setosa
2       setosa
3       setosa
4       setosa
5       setosa
6       setosa
7       setosa
8       setosa
9       setosa
10      setosa
11      setosa
12      setosa
13      setosa
14      setosa
15      setosa
16      setosa
17      setosa
18      setosa
19      setosa
20      setosa
21      setosa
22      setosa
23      setosa
24      setosa
25      setosa
26      setosa
27      setosa
28      setosa
29      setosa
30      setosa
31      setosa
32      setosa
33      setosa
34      setosa
35      setosa
36      setosa
37      setosa
38      setosa
39      setosa
40      setosa
41      setosa
42      setosa
43      setosa
44      setosa
45      setosa
46      setosa
47      setosa
48      setosa
49      setosa
50      setosa
51  versicolor
52  versicolor
53  versicolor
54  versicolor
55  versicolor
56  versicolor
57  versicolor
58  versicolor
59  versicolor
60  versicolor
61  versicolor
62  versicolor
63  versicolor
64  versicolor
65  versicolor
66  versicolor
67  versicolor
68  versicolor
69  versicolor
70  versicolor
71  versicolor
72  versicolor
73  versicolor
74  versicolor
75  versicolor
76  versicolor
77  versicolor
78  versicolor
79  versicolor
80  versicolor
81  versicolor
82  versicolor
83  versicolor
84  versicolor
85  versicolor
86  versicolor
87  versicolor
88  versicolor
89  versicolor
90  versicolor
91  versicolor
92  versicolor
93  versicolor
94  versicolor
95  versicolor
96  versicolor
97  versicolor
98  versicolor
99  versicolor
100 versicolor
101  virginica
102  virginica
103  virginica
104  virginica
105  virginica
106  virginica
107  virginica
108  virginica
109  virginica
110  virginica
111  virginica
112  virginica
113  virginica
114  virginica
115  virginica
116  virginica
117  virginica
118  virginica
119  virginica
120  virginica
121  virginica
122  virginica
123  virginica
124  virginica
125  virginica
126  virginica
127  virginica
128  virginica
129  virginica
130  virginica
131  virginica
132  virginica
133  virginica
134  virginica
135  virginica
136  virginica
137  virginica
138  virginica
139  virginica
140  virginica
141  virginica
142  virginica
143  virginica
144  virginica
145  virginica
146  virginica
147  virginica
148  virginica
149  virginica
150  virginica
data[5, 3]
[1] 1.4
data[1:5,1:3]
  Sepal.Length Sepal.Width Petal.Length
1          5.1         3.5          1.4
2          4.9         3.0          1.4
3          4.7         3.2          1.3
4          4.6         3.1          1.5
5          5.0         3.6          1.4
table(data$Species)

    setosa versicolor  virginica 
        50         50         50 
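The same counts can be produced with dplyr, which also makes it easy to add per-group statistics. The following is a minimal sketch; the result columns n, mean_length and mean_width are names chosen here for illustration.

```r
library(dplyr)

data <- iris

# One row per species: number of observations and mean sepal measurements.
species_summary <- data %>%
  group_by(Species) %>%
  summarise(
    n           = n(),
    mean_length = mean(Sepal.Length),
    mean_width  = mean(Sepal.Width)
  )

species_summary
```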

### Step 3: Create a basic scatter plot (no categories yet)

A scatter plot shows the relationship between two numeric variables.

Here we plot:

- X-axis: Sepal.Length
- Y-axis: Sepal.Width

Important point:

ggplot(data, aes(x = Sepal.Length, y = Sepal.Width))+
  geom_point()
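Because ggplot2 builds plots layer by layer, the call above can be split in two: ggplot() sets up the data and the axis mapping, and geom_point() adds the points. A small sketch of that split (the names base_plot, scatter and point_data are illustrative):

```r
library(ggplot2)

data <- iris

# Step 1: data and aesthetic mapping only; printing this shows empty axes.
base_plot <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width))

# Step 2: add a layer of points, one per row of the data.
scatter <- base_plot + geom_point()

# layer_data() returns the values ggplot2 computed for the point layer.
point_data <- layer_data(scatter)
nrow(point_data)   # 150, one row per observation

scatter
```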

At this point, all points are the same color, so we can’t see any species-based grouping yet.

### Step 4: Add categorical grouping using color = Species

Now we include the categorical variable:

- color = Species tells ggplot2 to assign a different color to each species.

What changes?

- The plot now visually separates the three species by color.
- This is the main “categorical analysis” idea: we can see whether different groups cluster differently.

ggplot(data,aes(x= Sepal.Length,y = Sepal.Width, color = Species))+
  geom_point()
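Note that color = Species creates groups only because it sits inside aes(), where it maps a data column to color. Putting a fixed color in geom_point() instead paints every point the same. A sketch contrasting the two (the color "steelblue" and the variable names are arbitrary choices):

```r
library(ggplot2)

data <- iris

# Mapping: color depends on the Species column, so a legend is drawn.
mapped <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()

# Setting: one constant color for all points, no legend.
fixed <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width)) +
  geom_point(color = "steelblue")

# Count the distinct colors actually used by each plot.
n_mapped <- length(unique(layer_data(mapped)$colour))   # 3
n_fixed  <- length(unique(layer_data(fixed)$colour))    # 1
```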

### Step 5: Improve point visibility (size and transparency)

We adjust how the points look:

- size = 3 makes each dot bigger, so it is easier to see.
- alpha = 0.7 makes dots slightly transparent, which helps when points overlap.

Why transparency helps:

- If many points overlap in the same region, semi-transparent dots stack into darker patches, so dense areas stand out.

ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7)
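To get a feel for how much overlap this dataset has, one can count the observations that share their exact Sepal.Length and Sepal.Width with an earlier row. This is only a rough check, and the names is_repeat and n_overlapping are illustrative:

```r
data <- iris

# TRUE for every row whose (Sepal.Length, Sepal.Width) pair already appeared.
is_repeat <- duplicated(data[, c("Sepal.Length", "Sepal.Width")])

# Number of points that are drawn exactly on top of an earlier point.
n_overlapping <- sum(is_repeat)
n_overlapping
```

With fully opaque dots, these repeated points would be hidden behind the first one; with alpha = 0.7 the stacked dots show up as darker spots.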

### Step 6: Add informative labels (title, axes, legend)

Good plots should clearly communicate what the viewer is seeing.

labs() adds:

- title: the plot heading
- x and y: the axis labels
- color: the legend title (so the legend has a meaningful name)

ggplot(data, aes(x= Sepal.Length,y= Sepal.Width, color = Species))+
  geom_point(size= 3,alpha=0.7)+
  labs(
    title = "Scatter plot of sepal dimensions",
    x = "Sepal length",
    y = "Sepal width",
    color = "Species"
    )
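Since the result of labs() is added to the plot with +, like any other component, the labels can also be stored once and reused for several plots. A minimal sketch of that idea (the names sepal_labels and labelled are illustrative):

```r
library(ggplot2)

data <- iris

# Define the labels once ...
sepal_labels <- labs(
  title = "Scatter plot of sepal dimensions",
  x     = "Sepal length",
  y     = "Sepal width",
  color = "Species"
)

# ... and add them to any plot that uses the same variables.
labelled <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7) +
  sepal_labels

labelled
```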

### Step 7: Apply a clean theme and move the legend

Themes control the background, grid lines, and text styling.

- theme_minimal() removes the heavy background and gives a clean look.
- theme(legend.position="top") moves the legend above the plot.

Why move the legend?

- When the legend is at the top, it is often easier to notice and read, especially in presentations.

ggplot(data, aes(x= Sepal.Length,y= Sepal.Width, color = Species))+
  geom_point(size= 3,alpha=0.7)+
  labs(
    title = "Scatter plot of sepal dimensions",
    x = "Sepal length",
    y = "Sepal width",
    color = "Species"
    )+
  theme_minimal()+
  theme(legend.position="top")
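The order of the two theme calls matters. theme_minimal() is a complete theme that replaces all earlier theme settings, so a legend position set before it would be overwritten. A sketch of both orders (the variable names are illustrative):

```r
library(ggplot2)

data <- iris

base_plot <- ggplot(data, aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point(size = 3, alpha = 0.7)

# Complete theme first, then the tweak: the legend ends up on top.
legend_top <- base_plot +
  theme_minimal() +
  theme(legend.position = "top")

# Tweak first, then the complete theme: theme_minimal() replaces the earlier
# setting, so the legend is no longer placed on top.
legend_reset <- base_plot +
  theme(legend.position = "top") +
  theme_minimal()
```

This is why the code above calls theme_minimal() before theme(legend.position="top").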