Write a simple R Markdown to explain some of the R codes you have learned regarding data frame. You are free to write your own program and add new codes such as how would you show all the rows except the last one or how would you get the last 6 rows of the data frame. Eg. you can explain what a data frame is, and then write the code to show how R handles data frame, and so on. But one mandatory topic is you must explain the different ways R returns back a vector or a data frame when you access values from a data frame. Your program doesn’t have to be long. Play around with the different font sizes in markdown to make your markdown readable. Explore. Publish your markdown on RPubs and submit the link only. Adding new codes that have not been discussed will earn you high marks.
What is an R data frame?
A data frame is a two-dimensional, table-like structure used for storing data tables. Each column holds the values of one variable, and each row holds one observation, that is, one value from each column. Internally it is a list of vectors of equal length.
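The description of a data frame as a list of equal-length vectors can be checked directly in R. The following is a minimal sketch; the small data frame `elements` is a hypothetical example made up for illustration and is not part of the data used later.

```r
# Hypothetical three-row data frame used only for illustration
elements <- data.frame(
  atomic_number = 1:3,
  atomic_symbol = c("H", "He", "Li"),
  stringsAsFactors = FALSE
)

is.list(elements)          # a data frame is stored as a list of columns
sapply(elements, length)   # every column has the same length
sapply(elements, class)    # but each column can hold a different type
```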
Creating a data frame from scratch.
atomic_number <- 1:40
element_name <- c("Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon", "Nitrogen", "Oxygen", "Fluorine", "Neon", "Sodium", "Magnesium", "Aluminium", "Silicon", "Phosphorus", "Sulfur", "Chlorine", "Argon", "Potassium", "Calcium", "Scandium", "Titanium", "Vanadium", "Chromium", "Manganese", "Iron", "Cobalt", "Nickel", "Copper", "Zinc", "Gallium", "Germanium", "Arsenic", "Selenium", "Bromine", "Krypton", "Rubidium", "Strontium", "Yttrium", "Zirconium")
atomic_symbol <- c("H", "He", "Li", "Be", "B", "C", "N", "O", "F", "Ne", "Na", "Mg", "Al", "Si", "P", "S", "Cl", "Ar", "K", "Ca", "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn", "Ga", "Ge", "As", "Se", "Br", "Kr", "Rb", "Sr", "Y", "Zr")
atomic_weight <-c(1.008, 4.0026, 6.94, 9.0122, 10.81, 12.011, 14.007, 15.999, 18.998, 20.180, 22.990, 24.304, 26.982, 28.085, 30.974, 32.06, 35.45, 39.948, 39.098, 40.078, 44.956, 47.867, 50.942, 51.996, 54.938, 55.845, 58.933, 58.693, 63.546, 65.38, 69.723, 72.630, 74.922, 78.971, 79.904, 83.798, 85.468, 87.62, 88.906, 91.224)
df <- data.frame(atomic_number, element_name, atomic_symbol, atomic_weight)
print(df)
## atomic_number element_name atomic_symbol atomic_weight
## 1 1 Hydrogen H 1.0080
## 2 2 Helium He 4.0026
## 3 3 Lithium Li 6.9400
## 4 4 Beryllium Be 9.0122
## 5 5 Boron B 10.8100
## 6 6 Carbon C 12.0110
## 7 7 Nitrogen N 14.0070
## 8 8 Oxygen O 15.9990
## 9 9 Fluorine F 18.9980
## 10 10 Neon Ne 20.1800
## 11 11 Sodium Na 22.9900
## 12 12 Magnesium Mg 24.3040
## 13 13 Aluminium Al 26.9820
## 14 14 Silicon Si 28.0850
## 15 15 Phosphorus P 30.9740
## 16 16 Sulfur S 32.0600
## 17 17 Chlorine Cl 35.4500
## 18 18 Argon Ar 39.9480
## 19 19 Potassium K 39.0980
## 20 20 Calcium Ca 40.0780
## 21 21 Scandium Sc 44.9560
## 22 22 Titanium Ti 47.8670
## 23 23 Vanadium V 50.9420
## 24 24 Chromium Cr 51.9960
## 25 25 Manganese Mn 54.9380
## 26 26 Iron Fe 55.8450
## 27 27 Cobalt Co 58.9330
## 28 28 Nickel Ni 58.6930
## 29 29 Copper Cu 63.5460
## 30 30 Zinc Zn 65.3800
## 31 31 Gallium Ga 69.7230
## 32 32 Germanium Ge 72.6300
## 33 33 Arsenic As 74.9220
## 34 34 Selenium Se 78.9710
## 35 35 Bromine Br 79.9040
## 36 36 Krypton Kr 83.7980
## 37 37 Rubidium Rb 85.4680
## 38 38 Strontium Sr 87.6200
## 39 39 Yttrium Y 88.9060
## 40 40 Zirconium Zr 91.2240
nrow() returns the number of rows in the data frame.
nrow(df)
## [1] 40
ncol() returns the number of columns in the data frame.
ncol(df)
## [1] 4
dim() returns both the number of rows and the number of columns of the data frame.
dim(df)
## [1] 40 4
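The relationship between these three functions can be sketched as follows. The data frame `elements` is a hypothetical five-row example, used only so that the snippet runs on its own.

```r
# Hypothetical data frame used only for illustration
elements <- data.frame(
  atomic_number = 1:5,
  atomic_symbol = c("H", "He", "Li", "Be", "B"),
  stringsAsFactors = FALSE
)

size <- dim(elements)        # rows first, then columns
size[1] == nrow(elements)    # the first entry is the row count
size[2] == ncol(elements)    # the second entry is the column count
names(elements)              # the column names
```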
Second data frame
atomic_number <- 1:40
element_discovered<-c(1766, 1895,1817,1797,1808,3750, 1772,1774,1886,1898,1807,1755,1825,1824,1669,500,1774,1894,1807,1808,1879,1791,1801,1797,1774,2000,1735,1751,8000,1500,1875,1886,1250,1817,1826,1898,1861,1790,1794,1789)
element_radius<-c(53,31,167,112,87,67,56,48,42,38,190,145,118,111,98,88,79,71,243,194,184,176,167,154,76,199,188,173,154,113,85,89,103,106,53,39,210,192,172,30)
df1<-data.frame(atomic_number,element_discovered,element_radius)
print(df1)
## atomic_number element_discovered element_radius
## 1 1 1766 53
## 2 2 1895 31
## 3 3 1817 167
## 4 4 1797 112
## 5 5 1808 87
## 6 6 3750 67
## 7 7 1772 56
## 8 8 1774 48
## 9 9 1886 42
## 10 10 1898 38
## 11 11 1807 190
## 12 12 1755 145
## 13 13 1825 118
## 14 14 1824 111
## 15 15 1669 98
## 16 16 500 88
## 17 17 1774 79
## 18 18 1894 71
## 19 19 1807 243
## 20 20 1808 194
## 21 21 1879 184
## 22 22 1791 176
## 23 23 1801 167
## 24 24 1797 154
## 25 25 1774 76
## 26 26 2000 199
## 27 27 1735 188
## 28 28 1751 173
## 29 29 8000 154
## 30 30 1500 113
## 31 31 1875 85
## 32 32 1886 89
## 33 33 1250 103
## 34 34 1817 106
## 35 35 1826 53
## 36 36 1898 39
## 37 37 1861 210
## 38 38 1790 192
## 39 39 1794 172
## 40 40 1789 30
merge() combines two data frames. By default it matches rows on the columns the two data frames have in common, which here is atomic_number.
table<-merge(df,df1)
print(table)
## atomic_number element_name atomic_symbol atomic_weight element_discovered
## 1 1 Hydrogen H 1.0080 1766
## 2 2 Helium He 4.0026 1895
## 3 3 Lithium Li 6.9400 1817
## 4 4 Beryllium Be 9.0122 1797
## 5 5 Boron B 10.8100 1808
## 6 6 Carbon C 12.0110 3750
## 7 7 Nitrogen N 14.0070 1772
## 8 8 Oxygen O 15.9990 1774
## 9 9 Fluorine F 18.9980 1886
## 10 10 Neon Ne 20.1800 1898
## 11 11 Sodium Na 22.9900 1807
## 12 12 Magnesium Mg 24.3040 1755
## 13 13 Aluminium Al 26.9820 1825
## 14 14 Silicon Si 28.0850 1824
## 15 15 Phosphorus P 30.9740 1669
## 16 16 Sulfur S 32.0600 500
## 17 17 Chlorine Cl 35.4500 1774
## 18 18 Argon Ar 39.9480 1894
## 19 19 Potassium K 39.0980 1807
## 20 20 Calcium Ca 40.0780 1808
## 21 21 Scandium Sc 44.9560 1879
## 22 22 Titanium Ti 47.8670 1791
## 23 23 Vanadium V 50.9420 1801
## 24 24 Chromium Cr 51.9960 1797
## 25 25 Manganese Mn 54.9380 1774
## 26 26 Iron Fe 55.8450 2000
## 27 27 Cobalt Co 58.9330 1735
## 28 28 Nickel Ni 58.6930 1751
## 29 29 Copper Cu 63.5460 8000
## 30 30 Zinc Zn 65.3800 1500
## 31 31 Gallium Ga 69.7230 1875
## 32 32 Germanium Ge 72.6300 1886
## 33 33 Arsenic As 74.9220 1250
## 34 34 Selenium Se 78.9710 1817
## 35 35 Bromine Br 79.9040 1826
## 36 36 Krypton Kr 83.7980 1898
## 37 37 Rubidium Rb 85.4680 1861
## 38 38 Strontium Sr 87.6200 1790
## 39 39 Yttrium Y 88.9060 1794
## 40 40 Zirconium Zr 91.2240 1789
## element_radius
## 1 53
## 2 31
## 3 167
## 4 112
## 5 87
## 6 67
## 7 56
## 8 48
## 9 42
## 10 38
## 11 190
## 12 145
## 13 118
## 14 111
## 15 98
## 16 88
## 17 79
## 18 71
## 19 243
## 20 194
## 21 184
## 22 176
## 23 167
## 24 154
## 25 76
## 26 199
## 27 188
## 28 173
## 29 154
## 30 113
## 31 85
## 32 89
## 33 103
## 34 106
## 35 53
## 36 39
## 37 210
## 38 192
## 39 172
## 40 30
merge() has optional arguments that change how it operates. For example, to retain every row of the first data frame even when it has no match in the second, specify all.x=TRUE. This is a “left join”. Here every atomic_number appears in both data frames, so the result is the same as before.
table<-merge(df,df1, all.x=TRUE)
table
## atomic_number element_name atomic_symbol atomic_weight element_discovered
## 1 1 Hydrogen H 1.0080 1766
## 2 2 Helium He 4.0026 1895
## 3 3 Lithium Li 6.9400 1817
## 4 4 Beryllium Be 9.0122 1797
## 5 5 Boron B 10.8100 1808
## 6 6 Carbon C 12.0110 3750
## 7 7 Nitrogen N 14.0070 1772
## 8 8 Oxygen O 15.9990 1774
## 9 9 Fluorine F 18.9980 1886
## 10 10 Neon Ne 20.1800 1898
## 11 11 Sodium Na 22.9900 1807
## 12 12 Magnesium Mg 24.3040 1755
## 13 13 Aluminium Al 26.9820 1825
## 14 14 Silicon Si 28.0850 1824
## 15 15 Phosphorus P 30.9740 1669
## 16 16 Sulfur S 32.0600 500
## 17 17 Chlorine Cl 35.4500 1774
## 18 18 Argon Ar 39.9480 1894
## 19 19 Potassium K 39.0980 1807
## 20 20 Calcium Ca 40.0780 1808
## 21 21 Scandium Sc 44.9560 1879
## 22 22 Titanium Ti 47.8670 1791
## 23 23 Vanadium V 50.9420 1801
## 24 24 Chromium Cr 51.9960 1797
## 25 25 Manganese Mn 54.9380 1774
## 26 26 Iron Fe 55.8450 2000
## 27 27 Cobalt Co 58.9330 1735
## 28 28 Nickel Ni 58.6930 1751
## 29 29 Copper Cu 63.5460 8000
## 30 30 Zinc Zn 65.3800 1500
## 31 31 Gallium Ga 69.7230 1875
## 32 32 Germanium Ge 72.6300 1886
## 33 33 Arsenic As 74.9220 1250
## 34 34 Selenium Se 78.9710 1817
## 35 35 Bromine Br 79.9040 1826
## 36 36 Krypton Kr 83.7980 1898
## 37 37 Rubidium Rb 85.4680 1861
## 38 38 Strontium Sr 87.6200 1790
## 39 39 Yttrium Y 88.9060 1794
## 40 40 Zirconium Zr 91.2240 1789
## element_radius
## 1 53
## 2 31
## 3 167
## 4 112
## 5 87
## 6 67
## 7 56
## 8 48
## 9 42
## 10 38
## 11 190
## 12 145
## 13 118
## 14 111
## 15 98
## 16 88
## 17 79
## 18 71
## 19 243
## 20 194
## 21 184
## 22 176
## 23 167
## 24 154
## 25 76
## 26 199
## 27 188
## 28 173
## 29 154
## 30 113
## 31 85
## 32 89
## 33 103
## 34 106
## 35 53
## 36 39
## 37 210
## 38 192
## 39 172
## 40 30
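The effect of a left join only becomes visible when some keys are missing from one side. The sketch below assumes two small hypothetical data frames, `names_df` and `radius_df`, in which atomic number 3 has no radius and atomic number 4 has no name. It uses the `by`, `all.x` and `all` arguments of merge().

```r
# Hypothetical data: radius_df has no entry for atomic_number 3
names_df <- data.frame(
  atomic_number = 1:3,
  element_name = c("Hydrogen", "Helium", "Lithium"),
  stringsAsFactors = FALSE
)
radius_df <- data.frame(
  atomic_number = c(1L, 2L, 4L),
  element_radius = c(53, 31, 112)
)

# Inner join (the default): only keys present in both data frames
inner <- merge(names_df, radius_df, by = "atomic_number")

# Left join: keep every row of names_df, fill missing radius with NA
left <- merge(names_df, radius_df, by = "atomic_number", all.x = TRUE)

# Full join: keep every row of both data frames
full <- merge(names_df, radius_df, by = "atomic_number", all = TRUE)

nrow(inner)   # 2
nrow(left)    # 3
nrow(full)    # 4
```

In the left join, Lithium is kept and its element_radius is NA.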
head() returns the first 6 rows by default.
head(table)
## atomic_number element_name atomic_symbol atomic_weight element_discovered
## 1 1 Hydrogen H 1.0080 1766
## 2 2 Helium He 4.0026 1895
## 3 3 Lithium Li 6.9400 1817
## 4 4 Beryllium Be 9.0122 1797
## 5 5 Boron B 10.8100 1808
## 6 6 Carbon C 12.0110 3750
## element_radius
## 1 53
## 2 31
## 3 167
## 4 112
## 5 87
## 6 67
tail() returns the last 6 rows by default.
tail(df)
## atomic_number element_name atomic_symbol atomic_weight
## 35 35 Bromine Br 79.904
## 36 36 Krypton Kr 83.798
## 37 37 Rubidium Rb 85.468
## 38 38 Strontium Sr 87.620
## 39 39 Yttrium Y 88.906
## 40 40 Zirconium Zr 91.224
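Both functions accept a second argument, n, to change how many rows are returned. A minimal sketch, assuming a hypothetical ten-row data frame called `numbers`:

```r
# Hypothetical ten-row data frame used only for illustration
numbers <- data.frame(atomic_number = 1:10)

first_three <- head(numbers, 3)   # first 3 rows instead of the default 6
last_two    <- tail(numbers, 2)   # last 2 rows instead of the default 6
last_six    <- tail(numbers)      # default: last 6 rows

first_three
last_two
```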
Select all the rows and columns except the last row.
df[1:39, ]
## atomic_number element_name atomic_symbol atomic_weight
## 1 1 Hydrogen H 1.0080
## 2 2 Helium He 4.0026
## 3 3 Lithium Li 6.9400
## 4 4 Beryllium Be 9.0122
## 5 5 Boron B 10.8100
## 6 6 Carbon C 12.0110
## 7 7 Nitrogen N 14.0070
## 8 8 Oxygen O 15.9990
## 9 9 Fluorine F 18.9980
## 10 10 Neon Ne 20.1800
## 11 11 Sodium Na 22.9900
## 12 12 Magnesium Mg 24.3040
## 13 13 Aluminium Al 26.9820
## 14 14 Silicon Si 28.0850
## 15 15 Phosphorus P 30.9740
## 16 16 Sulfur S 32.0600
## 17 17 Chlorine Cl 35.4500
## 18 18 Argon Ar 39.9480
## 19 19 Potassium K 39.0980
## 20 20 Calcium Ca 40.0780
## 21 21 Scandium Sc 44.9560
## 22 22 Titanium Ti 47.8670
## 23 23 Vanadium V 50.9420
## 24 24 Chromium Cr 51.9960
## 25 25 Manganese Mn 54.9380
## 26 26 Iron Fe 55.8450
## 27 27 Cobalt Co 58.9330
## 28 28 Nickel Ni 58.6930
## 29 29 Copper Cu 63.5460
## 30 30 Zinc Zn 65.3800
## 31 31 Gallium Ga 69.7230
## 32 32 Germanium Ge 72.6300
## 33 33 Arsenic As 74.9220
## 34 34 Selenium Se 78.9710
## 35 35 Bromine Br 79.9040
## 36 36 Krypton Kr 83.7980
## 37 37 Rubidium Rb 85.4680
## 38 38 Strontium Sr 87.6200
## 39 39 Yttrium Y 88.9060
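Writing the row range by hand only works while the number of rows is known. A sketch of two ways to drop the last row without hardcoding it, assuming a hypothetical five-row data frame `elements`:

```r
# Hypothetical data frame used only for illustration
elements <- data.frame(
  atomic_number = 1:5,
  atomic_symbol = c("H", "He", "Li", "Be", "B"),
  stringsAsFactors = FALSE
)

# A negative index removes that row; nrow() gives the index of the last row
without_last_a <- elements[-nrow(elements), ]

# head() with a negative n returns everything except the last |n| rows
without_last_b <- head(elements, -1)

without_last_a
```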
Select the 1st row with all the columns.
df[1, ]
## atomic_number element_name atomic_symbol atomic_weight
## 1 1 Hydrogen H 1.008
Select the 1st and 3rd rows with all the columns.
table[c(1,3),]
## atomic_number element_name atomic_symbol atomic_weight element_discovered
## 1 1 Hydrogen H 1.008 1766
## 3 3 Lithium Li 6.940 1817
## element_radius
## 1 53
## 3 167
Select all the rows with the 2nd and 3rd columns.
df[,2:3]
## element_name atomic_symbol
## 1 Hydrogen H
## 2 Helium He
## 3 Lithium Li
## 4 Beryllium Be
## 5 Boron B
## 6 Carbon C
## 7 Nitrogen N
## 8 Oxygen O
## 9 Fluorine F
## 10 Neon Ne
## 11 Sodium Na
## 12 Magnesium Mg
## 13 Aluminium Al
## 14 Silicon Si
## 15 Phosphorus P
## 16 Sulfur S
## 17 Chlorine Cl
## 18 Argon Ar
## 19 Potassium K
## 20 Calcium Ca
## 21 Scandium Sc
## 22 Titanium Ti
## 23 Vanadium V
## 24 Chromium Cr
## 25 Manganese Mn
## 26 Iron Fe
## 27 Cobalt Co
## 28 Nickel Ni
## 29 Copper Cu
## 30 Zinc Zn
## 31 Gallium Ga
## 32 Germanium Ge
## 33 Arsenic As
## 34 Selenium Se
## 35 Bromine Br
## 36 Krypton Kr
## 37 Rubidium Rb
## 38 Strontium Sr
## 39 Yttrium Y
## 40 Zirconium Zr
Select the 1st and 2nd rows with the 2nd and 3rd columns.
df[1:2,2:3]
## element_name atomic_symbol
## 1 Hydrogen H
## 2 Helium He
Select the last 4 rows with the 2nd and 3rd columns.
df[37:40, 2:3]
## element_name atomic_symbol
## 37 Rubidium Rb
## 38 Strontium Sr
## 39 Yttrium Y
## 40 Zirconium Zr
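The same selection can be written so that it still works if the data frame grows. This is a sketch only; `elements` is a hypothetical six-row data frame, and the row positions are computed from nrow() rather than typed in.

```r
# Hypothetical data frame used only for illustration
elements <- data.frame(
  atomic_number = 1:6,
  element_name = c("Hydrogen", "Helium", "Lithium", "Beryllium", "Boron", "Carbon"),
  atomic_symbol = c("H", "He", "Li", "Be", "B", "C"),
  stringsAsFactors = FALSE
)

n <- nrow(elements)

# Last 4 rows, 2nd and 3rd columns, with the row range computed from n
last_four <- elements[(n - 3):n, 2:3]

# The same rows using tail() on the selected columns
last_four_tail <- tail(elements[, 2:3], 4)

last_four
```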
Check whether a variable is a data frame.
class(df)
## [1] "data.frame"
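Besides class(), base R offers is.data.frame() and inherits() for a TRUE/FALSE answer. A short sketch, assuming a hypothetical data frame `elements`:

```r
# Hypothetical data frame used only for illustration
elements <- data.frame(atomic_number = 1:3)

is.data.frame(elements)                 # TRUE
is.data.frame(elements$atomic_number)   # FALSE: a column taken with $ is a vector
inherits(elements, "data.frame")        # TRUE
```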
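Rows can be added to a data frame using the rbind() function. rbind() returns a new data frame and leaves the original unchanged, so the result must be assigned to a variable to keep it.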
rbind(df,list(41,"Niobium","Nb", 92.906))
## atomic_number element_name atomic_symbol atomic_weight
## 1 1 Hydrogen H 1.0080
## 2 2 Helium He 4.0026
## 3 3 Lithium Li 6.9400
## 4 4 Beryllium Be 9.0122
## 5 5 Boron B 10.8100
## 6 6 Carbon C 12.0110
## 7 7 Nitrogen N 14.0070
## 8 8 Oxygen O 15.9990
## 9 9 Fluorine F 18.9980
## 10 10 Neon Ne 20.1800
## 11 11 Sodium Na 22.9900
## 12 12 Magnesium Mg 24.3040
## 13 13 Aluminium Al 26.9820
## 14 14 Silicon Si 28.0850
## 15 15 Phosphorus P 30.9740
## 16 16 Sulfur S 32.0600
## 17 17 Chlorine Cl 35.4500
## 18 18 Argon Ar 39.9480
## 19 19 Potassium K 39.0980
## 20 20 Calcium Ca 40.0780
## 21 21 Scandium Sc 44.9560
## 22 22 Titanium Ti 47.8670
## 23 23 Vanadium V 50.9420
## 24 24 Chromium Cr 51.9960
## 25 25 Manganese Mn 54.9380
## 26 26 Iron Fe 55.8450
## 27 27 Cobalt Co 58.9330
## 28 28 Nickel Ni 58.6930
## 29 29 Copper Cu 63.5460
## 30 30 Zinc Zn 65.3800
## 31 31 Gallium Ga 69.7230
## 32 32 Germanium Ge 72.6300
## 33 33 Arsenic As 74.9220
## 34 34 Selenium Se 78.9710
## 35 35 Bromine Br 79.9040
## 36 36 Krypton Kr 83.7980
## 37 37 Rubidium Rb 85.4680
## 38 38 Strontium Sr 87.6200
## 39 39 Yttrium Y 88.9060
## 40 40 Zirconium Zr 91.2240
## 41 41 Niobium Nb 92.9060
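Because rbind() does not modify its input, the new row is lost unless the result is stored. A minimal sketch of the difference, assuming hypothetical objects `elements` and `new_row`:

```r
# Hypothetical data used only for illustration
elements <- data.frame(
  atomic_number = 1:2,
  atomic_symbol = c("H", "He"),
  stringsAsFactors = FALSE
)
new_row <- data.frame(
  atomic_number = 3L,
  atomic_symbol = "Li",
  stringsAsFactors = FALSE
)

rbind(elements, new_row)   # prints three rows, but 'elements' is unchanged
nrow(elements)             # still 2

elements <- rbind(elements, new_row)   # assign the result to keep the new row
nrow(elements)             # now 3
```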
We can use the [, [[ or $ operator to access the columns of a data frame. With [ the result is a data frame; [[ and $ return the column itself as a vector.
table["element_name"]
## element_name
## 1 Hydrogen
## 2 Helium
## 3 Lithium
## 4 Beryllium
## 5 Boron
## 6 Carbon
## 7 Nitrogen
## 8 Oxygen
## 9 Fluorine
## 10 Neon
## 11 Sodium
## 12 Magnesium
## 13 Aluminium
## 14 Silicon
## 15 Phosphorus
## 16 Sulfur
## 17 Chlorine
## 18 Argon
## 19 Potassium
## 20 Calcium
## 21 Scandium
## 22 Titanium
## 23 Vanadium
## 24 Chromium
## 25 Manganese
## 26 Iron
## 27 Cobalt
## 28 Nickel
## 29 Copper
## 30 Zinc
## 31 Gallium
## 32 Germanium
## 33 Arsenic
## 34 Selenium
## 35 Bromine
## 36 Krypton
## 37 Rubidium
## 38 Strontium
## 39 Yttrium
## 40 Zirconium
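The three operators can be compared side by side by checking the class of what each returns. A sketch, assuming a hypothetical data frame `elements`:

```r
# Hypothetical data frame used only for illustration
elements <- data.frame(
  atomic_number = 1:3,
  element_name = c("Hydrogen", "Helium", "Lithium"),
  stringsAsFactors = FALSE
)

by_single <- elements["element_name"]     # one-column data frame
by_double <- elements[["element_name"]]   # character vector
by_dollar <- elements$element_name        # character vector

class(by_single)   # "data.frame"
class(by_double)   # "character"
class(by_dollar)   # "character"
```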
str() is used to examine the structure of the data frame.
str(table)
## 'data.frame': 40 obs. of 6 variables:
## $ atomic_number : int 1 2 3 4 5 6 7 8 9 10 ...
## $ element_name : chr "Hydrogen" "Helium" "Lithium" "Beryllium" ...
## $ atomic_symbol : chr "H" "He" "Li" "Be" ...
## $ atomic_weight : num 1.01 4 6.94 9.01 10.81 ...
## $ element_discovered: num 1766 1895 1817 1797 1808 ...
## $ element_radius : num 53 31 167 112 87 67 56 48 42 38 ...
A statistical summary of the data can be obtained by applying the summary() function.
print(summary(table))
## atomic_number element_name atomic_symbol atomic_weight
## Min. : 1.00 Length:40 Length:40 Min. : 1.008
## 1st Qu.:10.75 Class :character Class :character 1st Qu.:22.288
## Median :20.50 Mode :character Mode :character Median :42.517
## Mean :20.50 Mean :44.980
## 3rd Qu.:30.25 3rd Qu.:66.466
## Max. :40.00 Max. :91.224
## element_discovered element_radius
## Min. : 500 Min. : 30.0
## 1st Qu.:1774 1st Qu.: 70.0
## Median :1807 Median :108.5
## Mean :1966 Mean :117.5
## 3rd Qu.:1876 3rd Qu.:172.2
## Max. :8000 Max. :243.0
A data frame column can be deleted by assigning NULL to it. Similarly, rows can be deleted by reassigning the data frame without them.
df$atomic_weight<-NULL
print(df)
## atomic_number element_name atomic_symbol
## 1 1 Hydrogen H
## 2 2 Helium He
## 3 3 Lithium Li
## 4 4 Beryllium Be
## 5 5 Boron B
## 6 6 Carbon C
## 7 7 Nitrogen N
## 8 8 Oxygen O
## 9 9 Fluorine F
## 10 10 Neon Ne
## 11 11 Sodium Na
## 12 12 Magnesium Mg
## 13 13 Aluminium Al
## 14 14 Silicon Si
## 15 15 Phosphorus P
## 16 16 Sulfur S
## 17 17 Chlorine Cl
## 18 18 Argon Ar
## 19 19 Potassium K
## 20 20 Calcium Ca
## 21 21 Scandium Sc
## 22 22 Titanium Ti
## 23 23 Vanadium V
## 24 24 Chromium Cr
## 25 25 Manganese Mn
## 26 26 Iron Fe
## 27 27 Cobalt Co
## 28 28 Nickel Ni
## 29 29 Copper Cu
## 30 30 Zinc Zn
## 31 31 Gallium Ga
## 32 32 Germanium Ge
## 33 33 Arsenic As
## 34 34 Selenium Se
## 35 35 Bromine Br
## 36 36 Krypton Kr
## 37 37 Rubidium Rb
## 38 38 Strontium Sr
## 39 39 Yttrium Y
## 40 40 Zirconium Zr
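Deleting rows through reassignment can be sketched as below, next to the column deletion shown above. The data frame `elements` is a hypothetical four-row example; rows are removed once by position and once by a condition.

```r
# Hypothetical data frame used only for illustration
elements <- data.frame(
  atomic_number = 1:4,
  atomic_symbol = c("H", "He", "Li", "Be"),
  atomic_weight = c(1.008, 4.0026, 6.94, 9.0122),
  stringsAsFactors = FALSE
)

elements$atomic_weight <- NULL    # delete a column

elements <- elements[-2, ]        # delete the 2nd row by position

# Delete rows by condition: keep only rows whose symbol is not "Be"
elements <- elements[elements$atomic_symbol != "Be", ]

elements
```

After these steps only the Hydrogen and Lithium rows remain, with two columns.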
The different ways R returns a vector or a data frame:
dataframe[1] is different from dataframe[[1]]. The one with [[ returns the column itself as a vector (an integer vector for atomic_number), not a new list/data frame. With a single [ and no comma, the result is returned as a list/data frame. With a comma, as in dataframe[,1], a single column is returned as a vector by default; to keep it as a data frame, add the drop=FALSE argument.
df[,3,drop=FALSE]
## atomic_symbol
## 1 H
## 2 He
## 3 Li
## 4 Be
## 5 B
## 6 C
## 7 N
## 8 O
## 9 F
## 10 Ne
## 11 Na
## 12 Mg
## 13 Al
## 14 Si
## 15 P
## 16 S
## 17 Cl
## 18 Ar
## 19 K
## 20 Ca
## 21 Sc
## 22 Ti
## 23 V
## 24 Cr
## 25 Mn
## 26 Fe
## 27 Co
## 28 Ni
## 29 Cu
## 30 Zn
## 31 Ga
## 32 Ge
## 33 As
## 34 Se
## 35 Br
## 36 Kr
## 37 Rb
## 38 Sr
## 39 Y
## 40 Zr
dataframe[,3,drop=FALSE] produces a different type of output from dataframe[,3]: the first is a one-column data frame, the second is a vector.
df[,3]
## [1] "H" "He" "Li" "Be" "B" "C" "N" "O" "F" "Ne" "Na" "Mg" "Al" "Si" "P"
## [16] "S" "Cl" "Ar" "K" "Ca" "Sc" "Ti" "V" "Cr" "Mn" "Fe" "Co" "Ni" "Cu" "Zn"
## [31] "Ga" "Ge" "As" "Se" "Br" "Kr" "Rb" "Sr" "Y" "Zr"
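The rules above can be summarised by asking R for the class of each kind of access. This is a sketch on a hypothetical data frame `elements`, not on the data used above; the comments give the class each expression returns.

```r
# Hypothetical data frame used only for illustration
elements <- data.frame(
  atomic_number = 1:3,
  atomic_symbol = c("H", "He", "Li"),
  stringsAsFactors = FALSE
)

class(elements[2])                   # "data.frame": single [ without a comma
class(elements[[2]])                 # "character":  [[ extracts the column
class(elements$atomic_symbol)        # "character":  $ extracts the column
class(elements[, 2])                 # "character":  one column, drop = TRUE by default
class(elements[, 2, drop = FALSE])   # "data.frame": drop = FALSE keeps the structure
class(elements[, 1:2])               # "data.frame": several columns
class(elements[1, ])                 # "data.frame": a row can mix types
class(elements[1, 2])                # "character":  a single cell
```

A single row stays a data frame because its columns can have different types, which a plain vector cannot hold.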