EDA - Programming Club 2023 General Members Dataset
Introduction
This project explores various attributes of the general members registered with the "Programming Club, Dept. of Statistics, CU" in 2023. I have used ggplot2 for all visualizations.
Data
First, the data is imported from Google Sheets.
library(dplyr)   # %>%, mutate(), case_when(), filter(), count()
library(ggplot2) # plotting
df <- read.csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRjuuwOeCjtHHWHmCSmoSs4PFSNQNM7I-NQuNbtaHaVtzpcRj_vLszef5P1gnG-Mi6wedgJrAybFX8i/pub?gid=0&single=true&output=csv")

Column names are converted to lowercase, with "." replaced by "_" for consistency. The blood group, contest participation status, SIM operator, and DataCamp grant date columns also contained inconsistent entries (differently written blood groups, blank cells, and "#N/A" placeholders); these are resolved in the following step.
df2 <- df %>%
  rename_with(~ tolower(gsub(".", "_", .x, fixed = TRUE))) %>%
  mutate(
    # Blank contest entries are treated as "No"
    x2023_contest_perticipated = case_when(
      x2023_contest_perticipated == "" ~ "No",
      .default = x2023_contest_perticipated
    ),
    # Harmonize blood group spellings; blanks become NA
    blood_group = case_when(
      blood_group == "B (+ve)" ~ "B+",
      blood_group == "O(+)" ~ "O+",
      blood_group == "A +(ve)" ~ "A+",
      blood_group == "" ~ NA,
      .default = blood_group
    ),
    # Blank grant dates become NA
    datacamp_granted_date = case_when(
      datacamp_granted_date == "" ~ NA,
      .default = datacamp_granted_date
    ),
    # "#N/A" placeholders exported from Google Sheets become NA
    sim = case_when(sim == "#N/A" ~ NA, .default = sim),
    # Convert all character columns to factors
    across(where(is.character), as.factor)
  )

Summary of dataset:
summarytools::dfSummary(df2, style = "grid")
Data Frame Summary
df2
Dimensions: 155 x 8
Duplicates: 10
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
| No | Variable | Stats / Values | Freqs (% of Valid) | Graph | Valid | Missing |
+====+============================+=============================+====================+======================+==========+=========+
| 1 | session | 1. 2017-2018 | 9 ( 5.8%) | I | 155 | 0 |
| | [factor] | 2. 2018-2019 | 18 (11.6%) | II | (100.0%) | (0.0%) |
| | | 3. 2019-2020 | 13 ( 8.4%) | I | | |
| | | 4. 2020-2021 | 64 (41.3%) | IIIIIIII | | |
| | | 5. 2021-2022 | 51 (32.9%) | IIIIII | | |
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
| 2 | year | 1. 1st year | 49 (31.6%) | IIIIII | 155 | 0 |
| | [factor] | 2. 2nd year | 66 (42.6%) | IIIIIIII | (100.0%) | (0.0%) |
| | | 3. 3rd year | 14 ( 9.0%) | I | | |
| | | 4. 4th year | 19 (12.3%) | II | | |
| | | 5. Master's | 7 ( 4.5%) | | | |
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
| 3 | programming_exp | 1. No | 101 (65.2%) | IIIIIIIIIIIII | 155 | 0 |
| | [factor] | 2. Yes | 54 (34.8%) | IIIIII | (100.0%) | (0.0%) |
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
| 4 | laptop | 1. No | 10 ( 6.5%) | I | 155 | 0 |
| | [factor] | 2. No - But I can manage | 6 ( 3.9%) | | (100.0%) | (0.0%) |
| | | 3. No - But planning to buy | 35 (22.6%) | IIII | | |
| | | 4. Yes - Desktop | 13 ( 8.4%) | I | | |
| | | 5. Yes - Laptop | 91 (58.7%) | IIIIIIIIIII | | |
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
| 5 | blood_group | 1. A- | 1 ( 0.8%) | | 125 | 30 |
| | [factor] | 2. A+ | 40 (32.0%) | IIIIII | (80.6%) | (19.4%) |
| | | 3. AB+ | 10 ( 8.0%) | I | | |
| | | 4. B+ | 33 (26.4%) | IIIII | | |
| | | 5. O- | 1 ( 0.8%) | | | |
| | | 6. O+ | 40 (32.0%) | IIIIII | | |
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
| 6 | datacamp_granted_date | 1. 03/02/2023 | 15 (11.5%) | II | 131 | 24 |
| | [factor] | 2. 03/04/2023 | 3 ( 2.3%) | | (84.5%) | (15.5%) |
| | | 3. 08/03/2023 | 27 (20.6%) | IIII | | |
| | | 4. 12/04/2023 | 8 ( 6.1%) | I | | |
| | | 5. 13/02/2023 | 1 ( 0.8%) | | | |
| | | 6. 18/02/2023 | 12 ( 9.2%) | I | | |
| | | 7. 22/05/2023 | 1 ( 0.8%) | | | |
| | | 8. No | 64 (48.9%) | IIIIIIIII | | |
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
| 7 | x2023_contest_perticipated | 1. No | 144 (92.9%) | IIIIIIIIIIIIIIIIII | 155 | 0 |
| | [factor] | 2. Yes | 11 ( 7.1%) | I | (100.0%) | (0.0%) |
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
| 8 | sim | 1. Airtel | 26 (16.9%) | III | 154 | 1 |
| | [factor] | 2. Blink | 18 (11.7%) | II | (99.4%) | (0.6%) |
| | | 3. GP | 46 (29.9%) | IIIII | | |
| | | 4. Robi | 47 (30.5%) | IIIIII | | |
| | | 5. Teletalk | 17 (11.0%) | II | | |
+----+----------------------------+-----------------------------+--------------------+----------------------+----------+---------+
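The cleaning step above recodes each spelling variant by hand. As a more general base-R sketch (not the author's method), the helpers below show three related details: why the contest column ends up named "x2023_contest_perticipated", how blood-group variants could be collapsed with a regular expression, and how the DataCamp grant dates, which the summary shows in day/month/year form next to a "No" entry, could be parsed into Date values. The helper names normalize_blood_group() and parse_grant_date(), the original sheet header, and the day-first date format are assumptions.

```r
# Base-R sketch of three cleaning details; helper names are hypothetical.

# 1. Column names: read.csv() runs headers through make.names(), which turns
#    spaces into "." and prefixes a leading digit with "X". The header text
#    "2023 Contest Perticipated" is an assumption about the original sheet.
clean_name <- tolower(gsub(".", "_", make.names("2023 Contest Perticipated"),
                           fixed = TRUE))
clean_name
# "x2023_contest_perticipated"

# 2. Blood groups: keep only A, B, O, "+" and "-", so "B (+ve)", "O(+)" and
#    "A +(ve)" collapse to "B+", "O+" and "A+". Blank entries become NA.
#    Caution: unrelated free text could yield a spurious group, so the
#    result should be checked against the original values.
normalize_blood_group <- function(x) {
  cleaned <- gsub("[^ABO+-]", "", toupper(as.character(x)))
  cleaned[cleaned %in% ""] <- NA_character_
  cleaned
}
normalize_blood_group(c("B (+ve)", "O(+)", "A +(ve)", "AB+", "A-", ""))
# "B+" "O+" "A+" "AB+" "A-" NA

# 3. DataCamp grant dates: assume day/month/year (e.g. "13/02/2023").
#    Values that do not match the format, such as "No", become NA.
parse_grant_date <- function(x) {
  as.Date(as.character(x), format = "%d/%m/%Y")
}
parse_grant_date(c("13/02/2023", "22/05/2023", "No"))
# "2023-02-13" "2023-05-22" NA
```

The regular-expression version trades the explicit, auditable list of recodings for less typing; with only three variants in this dataset, the explicit case_when() is arguably the safer choice.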
Univariate Visualization
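In the plots of this section, the bar order follows the factor levels of the x variable. The operator and blood-group plots sort bars with reorder(x, -n), while the session plot flips the existing level order with rev(levels(...)). A small base-R check of both ideas, using SIM counts typed in by hand from the summary table above and a few session labels for illustration:

```r
# Bar order in ggplot2 follows the factor levels of the x variable.

# Sorting by count: counts for three operators, copied from the summary table.
operators <- factor(c("Airtel", "GP", "Robi"))
n <- c(26, 46, 47)
# reorder() sorts levels by increasing value of its second argument,
# so negating the counts puts the most common operator first.
levels(reorder(operators, -n))
# "Robi" "GP" "Airtel"

# Flipping an existing order, as done for session: rev() is applied to the
# levels, not to the data vector itself.
sessions <- factor(c("2020-2021", "2017-2018", "2021-2022"))
levels(factor(sessions, levels = rev(levels(sessions))))
# "2021-2022" "2020-2021" "2017-2018"
```

Reversing the data vector itself (for example rev(year)) leaves both the counts and the level order unchanged, so it has no visible effect on a bar chart.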
ggplot(df2) +
aes(x = factor(session, levels = rev(levels(session)))) +
geom_bar(fill = "#5872A1") +
labs(x = NULL, y = "Number of Members",
title = "General Members Statistics by Session") +
# coord_flip() +
theme_linedraw()

df2 %>%
ggplot() +
# Reverse the level order of year, as done for session above
aes(x = factor(year, levels = rev(levels(year)))) +
geom_bar(fill = "#5872A1") +
labs(x = " ", y = "Number of Members",
title = "General Members Statistics by Year") +
theme_linedraw()

df2 %>%
filter(!is.na(sim)) %>% # "#N/A" entries were recoded to NA during cleaning
count(sim) %>%
ggplot() +
aes(x = reorder(sim,-n), y = n) +
geom_bar(stat = "identity", fill = "#5872A1") +
labs(x = " ", y = "Number of Members",
title = "General Members Statistics by Mobile Phone Operators") +
theme_linedraw()

df2 %>%
ggplot() +
aes(x = programming_exp) +
geom_bar(fill = "#5872A1") +
labs(x = "Programming Experience",
y = "Number of Members", title = "Number of General Members by Programming Experience") +
theme_linedraw()

df2 %>%
filter(!is.na(blood_group)) %>% # blank entries were recoded to NA during cleaning
count(blood_group) %>%
ggplot() +
aes(x = reorder(blood_group, -n), y = n) +
geom_bar(stat = "identity", fill = "#5872A1") +
labs(x = " ", y = "Number of Members",
title = "Bar Plot of Blood Group of General Members") +
theme_linedraw()

Bivariate Visualization
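Two of the plots in this section use geom_bar(position = "fill"), which stacks the groups and rescales every bar to a height of 1, so each segment shows a within-bar share rather than a count. As a minimal sketch on a small made-up data frame (hypothetical values, not the club data), the same shares can be tabulated with base R's prop.table() using margin = 1, so that each row sums to 1:

```r
# Hypothetical toy data: year of study and prior programming experience.
members <- data.frame(
  year = c("1st year", "1st year", "1st year", "2nd year", "2nd year"),
  programming_exp = c("No", "No", "Yes", "No", "Yes")
)

# Count each year/experience combination, then divide by the row totals.
# margin = 1 makes each year's shares sum to 1, matching the bar heights
# that geom_bar(position = "fill") draws for each year.
counts <- table(members$year, members$programming_exp)
shares <- prop.table(counts, margin = 1)
round(shares, 2)
# 1st year: No 0.67, Yes 0.33; 2nd year: No 0.50, Yes 0.50
```

Because every bar is normalized, a "fill" plot hides group sizes; here, for example, the Master's bar rests on only 7 members, which is worth keeping in mind when comparing shares.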
ggplot(df2) +
aes(x = year, fill = programming_exp) +
geom_bar(position = "fill") +
scale_fill_brewer(palette = "Paired",
direction = 1) +
labs(x = " ",
y = "Percentage",
title = "Percentage of General Members With Programming Experience by Year",
fill = "Programming\nExperience") +
theme_linedraw() +
scale_y_continuous(labels = scales::percent)

ggplot(df2) +
aes(x = x2023_contest_perticipated, fill = laptop) +
geom_bar(position = "fill") +
scale_fill_brewer(palette = "Paired", direction = -1) +
labs(x = "Participated in Programming Contest 2023",
y = "Percentage",
title = "Stacked Bar Plot of Members With Computer Access and Contest Participation Status",
fill = "Access to Computer") +
theme_linedraw() +
scale_y_continuous(labels = scales::percent)

df2 %>%
filter(!is.na(sim)) %>% # "#N/A" entries were recoded to NA during cleaning
ggplot() +
aes(x = sim, fill = x2023_contest_perticipated) +
geom_bar(position = "dodge") +
scale_fill_brewer(palette = "Paired",
direction = -1) +
labs(
x = " ",
y = "Number of Members",
title = "Grouped Bar Plot of Members by SIM Operator and Contest Participation Status",
caption = "Data: Programming Contest 2023 Participants Statistics",
fill = "Participated in Contest"
) +
theme_linedraw() +
theme(legend.position = "top")