The code below generates five sub-samples, each of 100 rows drawn at random with replacement from the original dataset, so the same row can appear more than once within a sub-sample.
For each sub-sample, it also randomly picks one column set from a predefined list; each set mixes categorical and continuous columns. The sets are drawn independently, so the same set can be picked for more than one sub-sample.
These sub-samples are stored in separate data frames (df_1, df_2, …, df_5) and are also appended to subsample_list for further analysis.
Because both the rows and the column set are chosen at random, the observations and columns vary from one sub-sample to the next, although two sub-samples may share rows or, as with df_4 and df_5 in the run shown here, the same column set.
The significance lies in understanding how data variability affects subsequent analyses. It helps assess the robustness of conclusions drawn from the data and whether they are consistent across different sub-samples.
This approach is useful for exploring the sensitivity of analytical results to variations in data, which is essential for making informed decisions based on data analysis.
data <- read.csv('./Downloads/students_dropout_and_academic_success.csv')
# Set the number of sub-samples and the number of rows in each
num_subsamples <- 5
sample_size <- 100
# Create a list to store the sub-samples
subsample_list <- list()
# Create a list of column sets (mix of categorical and continuous)
column_sets <- list(
  c(1, 2, 3, 7, 13, 14),
  c(2, 9, 10, 14, 22, 23),
  c(1, 9, 15, 24, 23, 26),
  c(15, 10, 18, 27, 26, 29),
  c(19, 21, 1, 30, 31, 32)
)
# Generate and store sub-samples
# (no seed is set, so each run draws different rows and column sets)
for (i in 1:num_subsamples) {
  # Randomly select rows with replacement
  subsample_indices <- sample(nrow(data), sample_size, replace = TRUE)
  # Randomly select one column set; the same set can be drawn in more than one iteration
  selected_columns <- unlist(sample(column_sets, 1))
  # Create the sub-sample by indexing the original data with the selected rows and columns
  subsample <- data[subsample_indices, selected_columns]
  # Store the sub-sample in a data frame named df_1, df_2, ...
  df_name <- paste("df_", i, sep = "")
  assign(df_name, subsample)
  # Append the sub-sample to the list for further analysis if needed
  subsample_list[[i]] <- subsample
}
print(df_1)
## Marital_status Application_mode Application_order
## 2504 2 39 1
## 3807 1 1 3
## 3846 1 39 1
## 2186 1 17 1
## 4285 1 39 2
## 3860 1 43 1
## 2630 1 39 9
## 3942 1 39 1
## 3921 1 1 5
## 3686 1 43 1
## 1606 1 53 1
## 2766 1 16 4
## 4377 1 17 6
## 2335 1 1 1
## 443 1 17 6
## 84 1 17 1
## 1924 1 17 3
## 4282 1 1 1
## 1209 1 7 1
## 625 1 17 4
## 2378 1 39 1
## 2326 2 39 1
## 4394 1 39 1
## 3726 1 44 1
## 177 1 39 1
## 3698 1 17 1
## 3353 1 39 1
## 3515 1 39 1
## 2017 1 39 1
## 1906 1 44 1
## 3487 1 1 2
## 4363 1 1 3
## 1313 1 1 1
## 1439 1 18 1
## 385 1 53 1
## 1848 2 39 1
## 3391 1 17 1
## 699 1 44 1
## 3005 1 17 1
## 585 1 17 1
## 2432 1 1 1
## 2747 1 1 1
## 4173 1 43 2
## 1702 1 1 1
## 4392 2 7 1
## 3490 1 1 1
## 2706 1 39 2
## 1083 1 51 1
## 1 1 17 5
## 54 1 1 3
## 4296 1 39 1
## 2941 4 39 1
## 3569 1 17 5
## 38 1 43 1
## 2973 1 5 1
## 2460 1 1 1
## 1769 1 7 1
## 2593 1 7 1
## 918 2 39 1
## 3372 1 1 6
## 3803 4 43 1
## 64 1 17 1
## 1088 1 1 1
## 2379 1 43 1
## 1311 1 53 1
## 2888 1 1 1
## 4058 1 42 1
## 1516 1 1 1
## 120 1 1 1
## 3363 1 1 4
## 3941 2 39 1
## 2035 2 39 1
## 809 1 1 4
## 392 1 39 1
## 2221 1 39 1
## 2748 1 42 1
## 743 1 17 3
## 4185 1 1 1
## 983 1 1 1
## 3163 1 1 4
## 625.1 1 17 4
## 474 1 1 2
## 3327 2 43 1
## 663 1 17 4
## 3031 1 7 1
## 3666 1 39 1
## 3118 1 1 4
## 2354 1 17 1
## 2734 1 17 2
## 625.2 1 17 4
## 1100 1 1 2
## 3216 1 1 6
## 951 1 39 1
## 1791 2 39 1
## 3256 1 17 6
## 2212 1 1 1
## 2404 1 1 1
## 1933 1 17 2
## 3089 5 39 1
## 2782 1 1 1
## Previous_qualification_grade Admission_grade Displaced
## 2504 133.1 148.8 0
## 3807 147.0 135.8 1
## 3846 133.1 128.2 0
## 2186 125.0 118.0 1
## 4285 130.0 110.0 0
## 3860 133.1 136.1 0
## 2630 120.0 144.8 0
## 3942 133.1 116.0 1
## 3921 126.0 125.7 1
## 3686 120.0 134.4 0
## 1606 130.0 130.0 1
## 2766 145.0 127.5 1
## 4377 143.0 127.3 1
## 2335 137.0 122.3 0
## 443 136.0 132.5 1
## 84 120.0 125.5 0
## 1924 142.0 129.4 0
## 4282 133.1 105.0 1
## 1209 130.0 130.0 0
## 625 116.0 109.0 0
## 2378 120.0 110.0 0
## 2326 110.0 106.8 0
## 4394 120.0 146.2 0
## 3726 130.0 130.0 0
## 177 130.0 102.5 0
## 3698 119.0 125.0 0
## 3353 138.7 111.3 0
## 3515 120.0 114.0 1
## 2017 160.0 116.3 1
## 1906 150.0 145.6 1
## 3487 149.0 137.8 1
## 4363 133.0 128.2 0
## 1313 151.0 138.4 1
## 1439 133.0 121.5 1
## 385 140.0 140.2 0
## 1848 140.0 130.0 1
## 3391 133.0 127.4 0
## 699 140.0 140.0 1
## 3005 135.0 128.0 1
## 585 102.0 102.0 0
## 2432 172.0 149.3 0
## 2747 106.0 102.2 1
## 4173 131.0 123.7 1
## 1702 137.0 129.3 1
## 4392 130.0 130.0 0
## 3490 106.0 105.0 0
## 2706 143.0 100.0 1
## 1083 117.0 120.0 1
## 1 122.0 127.3 1
## 54 167.0 159.3 0
## 4296 130.0 120.0 1
## 2941 100.0 130.0 0
## 3569 118.0 120.5 1
## 38 140.0 122.9 0
## 2973 150.0 139.4 0
## 2460 133.0 129.9 0
## 1769 140.0 140.0 0
## 2593 130.0 130.0 0
## 918 133.1 100.0 1
## 3372 136.0 120.8 1
## 3803 140.0 126.0 0
## 64 127.0 130.2 1
## 1088 133.1 146.0 0
## 2379 136.0 133.9 0
## 1311 130.0 133.8 1
## 2888 153.0 153.7 0
## 4058 140.0 140.0 0
## 1516 146.0 153.0 1
## 120 127.0 121.8 0
## 3363 123.0 117.4 1
## 3941 120.0 124.6 0
## 2035 133.1 162.0 0
## 809 143.0 129.7 1
## 392 150.0 140.0 0
## 2221 130.0 150.0 0
## 2748 130.0 105.3 0
## 743 142.0 132.9 1
## 4185 136.0 133.2 1
## 983 134.0 122.8 0
## 3163 136.0 123.0 1
## 625.1 116.0 109.0 0
## 474 136.0 140.0 1
## 3327 140.0 130.0 1
## 663 127.0 115.8 1
## 3031 130.0 130.0 0
## 3666 133.1 140.0 0
## 3118 122.0 121.0 1
## 2354 125.0 121.5 1
## 2734 118.0 110.7 1
## 625.2 116.0 109.0 0
## 1100 133.0 128.5 1
## 3216 102.0 101.7 1
## 951 130.0 120.2 1
## 1791 140.0 127.3 0
## 3256 120.0 113.7 1
## 2212 124.0 125.4 1
## 2404 130.0 117.8 1
## 1933 125.0 118.0 1
## 3089 140.0 160.0 0
## 2782 125.0 129.2 1
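Row names such as 625.1 and 625.2 in the output above are a side effect of sampling with replacement: original row 625 was drawn three times, and R appends a suffix to keep row names unique. The sketch below reproduces this on a small made-up data frame. The names toy, idx, and dup and all values are hypothetical, and the indices are fixed by hand rather than drawn with sample() so the result is predictable.

```r
# Toy stand-in for the student data (hypothetical values)
toy <- data.frame(id = 1:4, grade = c(12.5, 10.0, 14.2, 11.8))

# Pick row 2 twice and row 3 once, mimicking a with-replacement draw
idx <- c(2, 2, 3)
dup <- toy[idx, ]

# R keeps row names unique, so the repeated row gets a ".1" suffix
print(rownames(dup))

# Apart from the row name, the duplicated observations are identical
print(dup$id)
```

A suffixed row name therefore marks a repeated observation, not a new one, which matters when counting distinct students in a sub-sample.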
print(df_2)
## Scholarship_holder International Marital_status
## 3424 0 0 2
## 2088 0 0 1
## 1894 0 0 1
## 2513 0 0 1
## 1710 0 0 1
## 2479 0 0 1
## 1613 0 0 1
## 2053 0 0 1
## 1373 0 0 1
## 4225 1 0 1
## 3493 0 0 2
## 1517 0 0 1
## 783 1 0 1
## 2710 0 0 4
## 4222 0 0 1
## 3685 0 0 2
## 1242 0 0 2
## 3074 0 0 1
## 2317 0 0 1
## 911 0 0 1
## 4290 0 0 1
## 692 0 0 2
## 241 0 0 1
## 1088 0 0 1
## 573 0 0 1
## 2692 0 0 1
## 3924 1 0 4
## 3977 0 0 1
## 3207 0 0 1
## 3598 0 0 1
## 2199 0 0 1
## 4332 0 0 1
## 108 0 0 1
## 2019 0 0 1
## 1694 0 0 1
## 3509 1 0 1
## 3183 0 1 1
## 2196 1 0 1
## 843 0 0 1
## 3146 1 0 2
## 2488 0 0 1
## 2344 0 0 1
## 994 0 0 1
## 16 0 0 1
## 565 0 0 1
## 73 0 0 1
## 1916 0 0 1
## 3617 1 0 1
## 348 0 0 4
## 2137 0 0 1
## 4215 1 0 1
## 1723 0 0 1
## 3805 0 0 1
## 3987 0 0 1
## 196 0 0 1
## 1430 1 0 1
## 2869 0 0 2
## 1185 0 0 1
## 1788 0 0 1
## 658 0 0 1
## 798 0 0 1
## 1443 0 0 1
## 1730 0 0 2
## 1490 0 0 1
## 621 0 0 1
## 651 0 0 1
## 3604 0 0 1
## 1003 0 0 1
## 3213 1 0 1
## 1223 0 0 2
## 1249 0 0 1
## 2581 1 0 1
## 89 0 0 1
## 51 1 0 1
## 833 0 1 1
## 3767 1 0 1
## 1077 0 0 1
## 4340 0 0 1
## 3777 0 0 1
## 2108 0 0 1
## 1235 0 0 1
## 1180 1 0 1
## 2403 0 0 1
## 1075 0 0 1
## 3757 0 0 1
## 3836 0 0 2
## 2548 0 0 1
## 1964 1 0 1
## 610 1 0 1
## 206 0 0 1
## 810 0 0 1
## 173 0 0 1
## 3260 0 0 1
## 83 0 0 1
## 2240 0 0 1
## 3274 0 0 1
## 4388 0 0 2
## 3915 0 0 2
## 598 0 0 4
## 2101 0 0 1
## Curricular_units_2nd_sem_evaluations Curricular_units_2nd_sem_approved
## 3424 5 5
## 2088 8 5
## 1894 0 0
## 2513 8 8
## 1710 7 6
## 2479 11 7
## 1613 12 8
## 2053 1 0
## 1373 7 6
## 4225 11 8
## 3493 5 1
## 1517 0 0
## 783 8 8
## 2710 15 7
## 4222 11 6
## 3685 6 4
## 1242 9 3
## 3074 8 3
## 2317 9 2
## 911 6 6
## 4290 6 6
## 692 13 13
## 241 8 7
## 1088 6 6
## 573 11 3
## 2692 10 4
## 3924 9 7
## 3977 8 8
## 3207 6 5
## 3598 6 6
## 2199 7 2
## 4332 0 0
## 108 17 4
## 2019 6 5
## 1694 8 4
## 3509 6 4
## 3183 20 8
## 2196 8 6
## 843 6 0
## 3146 10 8
## 2488 0 0
## 2344 5 5
## 994 6 6
## 16 7 0
## 565 8 6
## 73 0 0
## 1916 7 6
## 3617 6 5
## 348 9 5
## 2137 6 6
## 4215 8 8
## 1723 20 12
## 3805 13 2
## 3987 9 5
## 196 0 0
## 1430 7 6
## 2869 12 5
## 1185 11 6
## 1788 6 4
## 658 13 11
## 798 8 8
## 1443 6 6
## 1730 13 13
## 1490 9 5
## 621 6 6
## 651 7 6
## 3604 9 3
## 1003 0 0
## 3213 6 6
## 1223 7 0
## 1249 11 1
## 2581 6 6
## 89 5 5
## 51 6 6
## 833 7 2
## 3767 6 6
## 1077 8 8
## 4340 14 12
## 3777 10 6
## 2108 9 5
## 1235 8 7
## 1180 11 8
## 2403 7 3
## 1075 7 6
## 3757 10 5
## 3836 16 5
## 2548 9 8
## 1964 8 8
## 610 8 5
## 206 9 4
## 810 9 6
## 173 7 6
## 3260 7 6
## 83 14 10
## 2240 6 0
## 3274 8 6
## 4388 6 5
## 3915 18 8
## 598 9 5
## 2101 8 5
## Curricular_units_2nd_sem_grade
## 3424 15.40000
## 2088 15.16667
## 1894 0.00000
## 2513 13.06250
## 1710 12.45000
## 2479 11.34714
## 1613 12.28125
## 2053 0.00000
## 1373 11.71429
## 4225 13.92727
## 3493 10.00000
## 1517 0.00000
## 783 14.63375
## 2710 12.04286
## 4222 16.00000
## 3685 11.50000
## 1242 11.66667
## 3074 13.66667
## 2317 16.00000
## 911 13.33333
## 4290 11.16667
## 692 12.38462
## 241 11.46250
## 1088 13.16667
## 573 10.80000
## 2692 13.50000
## 3924 13.00000
## 3977 12.61250
## 3207 13.33333
## 3598 12.16667
## 2199 10.00000
## 4332 0.00000
## 108 13.25000
## 2019 13.00000
## 1694 12.40000
## 3509 12.25000
## 3183 11.14286
## 2196 12.28571
## 843 0.00000
## 3146 12.00000
## 2488 0.00000
## 2344 13.00000
## 994 13.50000
## 16 0.00000
## 565 13.62500
## 73 0.00000
## 1916 13.45000
## 3617 14.20000
## 348 11.66667
## 2137 12.16667
## 4215 15.63750
## 1723 15.42857
## 3805 10.00000
## 3987 13.00000
## 196 0.00000
## 1430 12.83333
## 2869 13.00000
## 1185 11.93333
## 1788 13.75000
## 658 11.45455
## 798 12.95000
## 1443 14.66667
## 1730 12.84615
## 1490 14.83333
## 621 12.50000
## 651 16.42857
## 3604 12.00000
## 1003 0.00000
## 3213 12.16667
## 1223 0.00000
## 1249 11.00000
## 2581 13.00000
## 89 13.80000
## 51 14.16667
## 833 12.50000
## 3767 13.00000
## 1077 12.00000
## 4340 13.29167
## 3777 13.00000
## 2108 13.20000
## 1235 13.92571
## 1180 12.51250
## 2403 11.00000
## 1075 13.00000
## 3757 11.60000
## 3836 11.80000
## 2548 13.23250
## 1964 13.71250
## 610 11.98000
## 206 10.75000
## 810 11.50000
## 173 11.68333
## 3260 14.31667
## 83 13.30000
## 2240 0.00000
## 3274 12.00000
## 4388 10.16667
## 3915 12.22222
## 598 12.60000
## 2101 11.83333
print(df_3)
## Application_mode Mothers_qualification Fathers_qualification Displaced
## 3981 17 1 38 0
## 2353 1 3 3 0
## 3133 1 3 1 1
## 3567 43 37 37 1
## 2822 18 5 3 1
## 2217 18 3 3 0
## 2274 1 34 34 1
## 420 1 1 19 0
## 689 39 37 1 0
## 1523 17 19 1 1
## 2451 1 1 38 0
## 2347 43 3 19 0
## 1090 43 19 19 0
## 2947 44 3 37 0
## 2453 1 1 1 1
## 1271 17 37 19 0
## 4410 43 37 37 0
## 3826 1 1 38 1
## 2167 17 38 37 1
## 4193 17 37 37 1
## 2332 39 19 37 1
## 1324 17 38 38 1
## 2853 53 1 1 0
## 3013 16 37 37 1
## 3570 39 3 1 0
## 1467 17 10 12 1
## 2219 7 19 1 0
## 2755 7 38 38 0
## 2355 1 37 19 0
## 3925 1 19 19 1
## 2047 1 37 1 1
## 1557 1 37 19 1
## 4390 1 19 19 0
## 1649 1 19 37 1
## 2795 39 37 37 0
## 3813 43 37 37 1
## 4324 39 37 37 0
## 1681 1 38 38 1
## 1105 1 19 19 1
## 471 17 1 37 1
## 2565 1 37 37 1
## 2820 43 19 1 1
## 3790 17 1 44 1
## 2237 1 38 19 0
## 2136 39 19 1 0
## 2091 1 38 37 1
## 3490 1 1 1 0
## 1814 1 37 38 1
## 3899 39 37 37 1
## 343 39 37 37 1
## 2746 1 37 19 1
## 3060 1 19 37 1
## 3916 1 37 37 1
## 4240 39 3 2 1
## 4226 44 38 19 0
## 166 17 19 37 0
## 1051 1 1 38 1
## 3770 1 1 1 1
## 2427 39 19 38 0
## 1641 1 1 37 1
## 871 1 1 1 0
## 2033 39 19 37 1
## 438 1 19 37 0
## 3798 43 3 19 1
## 1590 17 1 19 1
## 1033 1 37 37 0
## 465 17 1 1 0
## 2146 17 3 3 1
## 4362 17 3 3 0
## 329 1 2 19 1
## 141 1 37 37 1
## 1413 43 37 37 1
## 3218 1 37 37 1
## 1752 1 38 19 1
## 1840 39 1 19 1
## 1032 1 19 37 0
## 509 1 5 5 0
## 3433 1 38 38 1
## 1253 17 38 38 1
## 410 39 34 34 0
## 4296 39 38 19 1
## 3752 1 3 1 1
## 623 17 38 38 1
## 467 1 1 38 1
## 535 44 4 4 0
## 3805 1 1 19 1
## 2307 7 2 38 1
## 3201 39 19 38 0
## 3601 18 1 4 1
## 3635 1 3 38 0
## 11 1 38 19 1
## 1475 17 1 3 1
## 3887 1 19 19 1
## 3390 43 19 37 0
## 1226 17 38 38 1
## 3083 18 38 38 0
## 708 39 37 37 0
## 1498 17 1 1 1
## 860 1 34 38 0
## 1110 17 19 36 1
## Curricular_units_1st_sem_credited Curricular_units_1st_sem_enrolled
## 3981 0 6
## 2353 0 5
## 3133 0 6
## 3567 14 18
## 2822 0 5
## 2217 0 7
## 2274 0 6
## 420 0 7
## 689 0 5
## 1523 0 6
## 2451 0 0
## 2347 0 6
## 1090 0 0
## 2947 0 6
## 2453 0 5
## 1271 0 6
## 4410 0 5
## 3826 0 0
## 2167 0 8
## 4193 0 6
## 2332 0 6
## 1324 0 7
## 2853 7 13
## 3013 0 7
## 3570 0 6
## 1467 0 5
## 2219 0 6
## 2755 10 10
## 2355 0 6
## 3925 6 12
## 2047 0 7
## 1557 0 5
## 4390 0 6
## 1649 0 7
## 2795 0 5
## 3813 8 11
## 4324 0 5
## 1681 0 5
## 1105 0 6
## 471 0 8
## 2565 0 6
## 2820 0 6
## 3790 3 9
## 2237 0 6
## 2136 0 6
## 2091 0 7
## 3490 0 6
## 1814 0 5
## 3899 0 6
## 343 0 6
## 2746 0 6
## 3060 0 8
## 3916 0 5
## 4240 0 6
## 4226 3 6
## 166 0 6
## 1051 0 0
## 3770 0 6
## 2427 0 5
## 1641 0 7
## 871 0 6
## 2033 0 6
## 438 0 6
## 3798 0 6
## 1590 0 6
## 1033 0 6
## 465 0 6
## 2146 0 6
## 4362 0 8
## 329 0 6
## 141 0 7
## 1413 3 9
## 3218 0 5
## 1752 0 0
## 1840 0 6
## 1032 0 6
## 509 0 5
## 3433 0 6
## 1253 0 6
## 410 0 5
## 4296 7 13
## 3752 7 12
## 623 0 6
## 467 0 6
## 535 0 0
## 3805 0 5
## 2307 2 5
## 3201 0 6
## 3601 0 0
## 3635 0 5
## 11 0 6
## 1475 0 5
## 3887 0 6
## 3390 0 6
## 1226 0 6
## 3083 6 12
## 708 17 18
## 1498 0 6
## 860 0 6
## 1110 0 5
print(df_4)
## Marital_status Mothers_qualification Educational_special_needs
## 60 1 38 0
## 784 1 3 0
## 3279 1 3 0
## 4113 4 38 0
## 869 1 19 0
## 3856 1 3 0
## 2139 1 19 0
## 967 2 19 0
## 4397 1 19 0
## 1319 1 1 0
## 2105 1 19 1
## 216 1 1 0
## 2665 2 37 0
## 2230 1 1 0
## 2465 2 37 0
## 4105 1 37 0
## 310 1 37 0
## 128 1 19 0
## 2625 1 1 0
## 2562 1 19 0
## 2827 1 1 0
## 1345 1 1 0
## 740 1 1 0
## 527 1 19 0
## 1954 1 1 0
## 2474 1 1 0
## 2811 1 38 0
## 3566 2 1 0
## 3 1 37 0
## 2445 1 34 0
## 4357 1 37 0
## 1967 1 1 0
## 3411 1 1 0
## 338 1 38 0
## 3469 1 37 0
## 1432 1 1 0
## 1406 1 1 0
## 4005 1 37 0
## 2415 1 34 0
## 2620 1 1 0
## 244 1 37 0
## 1004 1 19 0
## 1019 1 19 1
## 3886 1 38 0
## 1888 2 37 0
## 550 1 19 0
## 404 1 19 0
## 4219 1 1 0
## 879 1 19 0
## 3541 1 1 0
## 2820 1 19 0
## 3782 1 1 0
## 2066 1 1 0
## 450 1 19 0
## 2943 1 40 0
## 3136 1 1 0
## 909 1 19 0
## 2383 1 1 0
## 799 1 1 0
## 3999 4 19 0
## 4169 1 34 0
## 329 1 2 0
## 487 1 38 0
## 322 1 37 0
## 2343 1 2 0
## 1596 1 1 0
## 2830 1 3 0
## 1479 2 1 0
## 4405 2 2 0
## 3386 1 19 0
## 1689 1 19 0
## 1340 1 37 0
## 3581 1 19 0
## 169 1 2 0
## 2755 1 38 0
## 628 1 19 0
## 3831 1 1 0
## 522 1 3 0
## 4407 1 1 0
## 1341 1 37 0
## 2034 2 37 0
## 3947 1 3 0
## 1040 1 3 0
## 200 1 1 0
## 827 1 1 0
## 1472 1 3 0
## 3989 1 19 0
## 3997 1 1 0
## 3388 1 38 0
## 3535 1 34 0
## 2847 2 34 0
## 1159 1 1 0
## 1273 2 38 0
## 3128 1 19 0
## 1160 1 19 0
## 2325 1 1 0
## 989 1 2 0
## 222 2 37 1
## 2214 1 19 0
## 3075 2 37 0
## Curricular_units_1st_sem_evaluations Curricular_units_1st_sem_enrolled
## 60 0 0
## 784 9 8
## 3279 8 5
## 4113 11 8
## 869 7 6
## 3856 6 6
## 2139 7 7
## 967 5 5
## 4397 7 6
## 1319 11 6
## 2105 9 6
## 216 10 7
## 2665 6 5
## 2230 6 6
## 2465 7 5
## 4105 9 6
## 310 7 6
## 128 8 6
## 2625 6 6
## 2562 8 8
## 2827 6 6
## 1345 6 6
## 740 7 7
## 527 9 5
## 1954 7 6
## 2474 0 6
## 2811 6 6
## 3566 12 5
## 3 0 6
## 2445 12 6
## 4357 11 6
## 1967 12 5
## 3411 7 6
## 338 8 8
## 3469 11 5
## 1432 12 6
## 1406 5 5
## 4005 6 6
## 2415 19 10
## 2620 6 6
## 244 5 5
## 1004 8 5
## 1019 6 6
## 3886 9 6
## 1888 4 4
## 550 8 5
## 404 6 6
## 4219 6 6
## 879 7 5
## 3541 10 5
## 2820 6 6
## 3782 6 6
## 2066 7 6
## 450 8 6
## 2943 12 12
## 3136 0 0
## 909 8 6
## 2383 6 6
## 799 8 6
## 3999 10 7
## 4169 10 6
## 329 9 6
## 487 7 6
## 322 8 8
## 2343 22 18
## 1596 8 5
## 2830 0 0
## 1479 17 6
## 4405 12 6
## 3386 9 6
## 1689 0 0
## 1340 8 8
## 3581 12 6
## 169 8 8
## 2755 10 10
## 628 8 7
## 3831 8 8
## 522 14 5
## 4407 6 6
## 1341 6 6
## 2034 4 4
## 3947 0 0
## 1040 9 6
## 200 4 3
## 827 8 7
## 1472 12 6
## 3989 13 5
## 3997 11 6
## 3388 8 6
## 3535 5 5
## 2847 7 3
## 1159 9 6
## 1273 7 5
## 3128 8 5
## 1160 7 5
## 2325 9 6
## 989 11 6
## 222 6 6
## 2214 10 6
## 3075 17 15
## Curricular_units1st_sem_grade
## 60 0.00000
## 784 13.34500
## 3279 10.00000
## 4113 12.16000
## 869 13.85714
## 3856 16.83333
## 2139 11.97143
## 967 0.00000
## 4397 12.66667
## 1319 12.00000
## 2105 11.87500
## 216 12.12500
## 2665 14.00000
## 2230 11.83333
## 2465 11.25000
## 4105 13.71429
## 310 11.60000
## 128 12.80000
## 2625 12.33333
## 2562 13.07375
## 2827 13.33333
## 1345 11.83333
## 740 13.83333
## 527 12.00000
## 1954 12.28571
## 2474 0.00000
## 2811 12.50000
## 3566 10.66667
## 3 0.00000
## 2445 0.00000
## 4357 12.66667
## 1967 12.33333
## 3411 13.42857
## 338 13.39125
## 3469 11.40000
## 1432 12.00000
## 1406 0.00000
## 4005 13.16667
## 2415 11.62500
## 2620 14.83333
## 244 0.00000
## 1004 11.20000
## 1019 11.00000
## 3886 10.83333
## 1888 0.00000
## 550 13.25000
## 404 12.00000
## 4219 13.50000
## 879 12.33333
## 3541 10.00000
## 2820 0.00000
## 3782 12.83333
## 2066 14.00000
## 450 12.80000
## 2943 14.50000
## 3136 0.00000
## 909 11.25000
## 2383 13.16667
## 799 13.33333
## 3999 12.62000
## 4169 10.50000
## 329 12.60000
## 487 11.50000
## 322 13.05857
## 2343 11.66667
## 1596 12.50000
## 2830 0.00000
## 1479 14.50000
## 4405 11.00000
## 3386 10.80000
## 1689 0.00000
## 1340 11.40000
## 3581 13.62500
## 169 14.25714
## 2755 13.60000
## 628 13.02500
## 3831 12.36571
## 522 12.00000
## 4407 12.00000
## 1341 12.33333
## 2034 0.00000
## 3947 0.00000
## 1040 12.33333
## 200 11.81667
## 827 11.57143
## 1472 11.60000
## 3989 14.00000
## 3997 12.00000
## 3388 13.28571
## 3535 11.00000
## 2847 14.00000
## 1159 11.80000
## 1273 10.00000
## 3128 11.00000
## 1160 14.20000
## 2325 10.16667
## 989 12.00000
## 222 13.33333
## 2214 11.57143
## 3075 11.75000
print(df_5)
## Marital_status Mothers_qualification Educational_special_needs
## 2603 1 38 0
## 1010 1 1 0
## 3121 2 37 0
## 2156 1 19 0
## 2611 1 1 0
## 2615 1 3 0
## 4394 1 38 0
## 1887 1 37 0
## 538 1 3 0
## 3436 1 1 0
## 2526 1 3 0
## 2208 1 37 0
## 825 1 1 0
## 3029 1 37 0
## 4280 1 19 0
## 1233 1 19 0
## 3102 1 1 0
## 3156 1 37 0
## 3578 1 19 0
## 3630 1 19 0
## 2118 1 38 0
## 448 1 1 0
## 2608 1 37 0
## 2767 1 37 0
## 3611 1 38 0
## 621 1 3 0
## 1327 1 1 0
## 488 1 1 0
## 167 3 1 0
## 3821 1 4 0
## 801 1 40 0
## 304 1 19 0
## 3566 2 1 0
## 342 1 3 0
## 3659 1 19 0
## 1897 1 19 1
## 1300 1 38 0
## 409 1 1 0
## 1274 2 37 0
## 2264 1 38 0
## 711 2 38 0
## 3296 1 19 0
## 1057 1 1 0
## 1554 1 1 0
## 5 2 37 0
## 4152 1 37 0
## 672 1 37 0
## 2152 1 3 0
## 4306 1 19 0
## 2292 1 37 0
## 26 1 19 0
## 246 1 19 0
## 3588 1 1 0
## 1821 1 19 0
## 3348 1 37 0
## 2474 1 1 0
## 3444 1 1 0
## 3383 1 37 0
## 2180 1 38 0
## 3419 1 38 0
## 1623 2 37 0
## 4220 1 37 0
## 3282 2 35 0
## 2628 1 1 0
## 2010 1 1 0
## 2944 1 19 0
## 494 1 37 0
## 3081 1 19 0
## 1475 1 1 0
## 2926 1 37 0
## 3303 1 37 0
## 1948 1 1 0
## 4137 5 1 0
## 1429 3 37 0
## 658 1 1 0
## 4229 2 37 0
## 808 1 3 0
## 2389 1 38 0
## 415 1 1 0
## 2793 4 37 0
## 546 1 19 0
## 258 1 37 0
## 4057 1 1 0
## 697 2 37 0
## 3320 1 1 0
## 3640 1 1 0
## 3187 1 12 0
## 914 4 37 0
## 1867 1 19 0
## 932 1 19 0
## 2664 1 1 0
## 569 1 38 0
## 2904 1 1 0
## 1034 1 1 0
## 679 2 37 0
## 2687 1 19 0
## 2485 2 37 0
## 2346 1 1 0
## 2651 1 3 0
## 2518 1 19 0
## Curricular_units_1st_sem_evaluations Curricular_units_1st_sem_enrolled
## 2603 7 6
## 1010 8 6
## 3121 16 10
## 2156 10 7
## 2611 9 7
## 2615 6 5
## 4394 10 5
## 1887 18 9
## 538 7 6
## 3436 6 6
## 2526 13 6
## 2208 5 5
## 825 6 6
## 3029 8 8
## 4280 6 6
## 1233 8 7
## 3102 6 5
## 3156 8 5
## 3578 16 8
## 3630 6 6
## 2118 8 8
## 448 8 7
## 2608 8 6
## 2767 8 8
## 3611 9 5
## 621 6 6
## 1327 12 6
## 488 13 6
## 167 1 1
## 3821 13 6
## 801 8 6
## 304 6 5
## 3566 12 5
## 342 0 0
## 3659 5 5
## 1897 13 6
## 1300 10 6
## 409 6 6
## 1274 7 7
## 2264 14 11
## 711 5 5
## 3296 8 6
## 1057 7 7
## 1554 10 5
## 5 9 6
## 4152 10 6
## 672 19 11
## 2152 13 6
## 4306 14 5
## 2292 9 6
## 26 8 6
## 246 8 8
## 3588 12 12
## 1821 12 6
## 3348 0 5
## 2474 0 6
## 3444 6 6
## 3383 6 6
## 2180 8 6
## 3419 7 7
## 1623 18 7
## 4220 8 6
## 3282 22 15
## 2628 7 7
## 2010 0 6
## 2944 13 12
## 494 9 6
## 3081 5 5
## 1475 7 5
## 2926 6 6
## 3303 8 7
## 1948 8 5
## 4137 8 8
## 1429 8 6
## 658 16 16
## 4229 8 6
## 808 10 6
## 2389 8 5
## 415 9 6
## 2793 13 9
## 546 9 5
## 258 6 6
## 4057 8 6
## 697 7 6
## 3320 11 6
## 3640 8 6
## 3187 13 5
## 914 8 6
## 1867 5 5
## 932 10 7
## 2664 5 5
## 569 7 5
## 2904 6 6
## 1034 7 5
## 679 7 6
## 2687 7 5
## 2485 6 5
## 2346 15 11
## 2651 10 7
## 2518 8 8
## Curricular_units1st_sem_grade
## 2603 12.28571
## 1010 13.87500
## 3121 11.40000
## 2156 15.11500
## 2611 12.38571
## 2615 12.40000
## 4394 12.33333
## 1887 11.83333
## 538 11.00000
## 3436 11.83333
## 2526 13.15385
## 2208 0.00000
## 825 12.50000
## 3029 13.62500
## 4280 0.00000
## 1233 14.35714
## 3102 12.40000
## 3156 10.00000
## 3578 10.25000
## 3630 13.83333
## 2118 12.82875
## 448 12.92875
## 2608 13.00000
## 2767 14.41429
## 3611 12.50000
## 621 13.83333
## 1327 12.50000
## 488 11.40000
## 167 0.00000
## 3821 13.00000
## 801 11.00000
## 304 12.50000
## 3566 10.66667
## 342 0.00000
## 3659 12.40000
## 1897 12.00000
## 1300 10.00000
## 409 14.66667
## 1274 0.00000
## 2264 15.09091
## 711 0.00000
## 3296 10.50000
## 1057 12.49571
## 1554 14.33333
## 5 12.33333
## 4152 13.83333
## 672 13.44444
## 2152 11.75000
## 4306 11.00000
## 2292 10.80000
## 26 11.60000
## 246 13.22000
## 3588 13.16667
## 1821 13.00000
## 3348 0.00000
## 2474 0.00000
## 3444 13.83333
## 3383 15.16667
## 2180 12.66667
## 3419 13.28571
## 1623 10.25000
## 4220 13.00000
## 3282 13.33333
## 2628 12.20000
## 2010 0.00000
## 2944 13.00000
## 494 13.33333
## 3081 15.20000
## 1475 12.00000
## 2926 14.50000
## 3303 13.46875
## 1948 11.00000
## 4137 11.85714
## 1429 10.20000
## 658 10.92308
## 4229 13.60000
## 808 14.00000
## 2389 10.80000
## 415 12.33333
## 2793 12.00000
## 546 10.20000
## 258 11.83333
## 4057 14.00000
## 697 13.66667
## 3320 11.20000
## 3640 14.00000
## 3187 0.00000
## 914 11.20000
## 1867 10.50000
## 932 13.00000
## 2664 0.00000
## 569 14.60000
## 2904 14.00000
## 1034 13.25000
## 679 15.00000
## 2687 11.00000
## 2485 14.83333
## 2346 14.22222
## 2651 13.97667
## 2518 12.21429
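The sub-samples printed above change on every run because no random seed is set. One possible variation, sketched here under stated assumptions, fixes a seed and keeps the sub-samples in a named list instead of creating df_1 … df_5 with assign(). The data frame toy, the helper draw_subsample, the seed value, and the smaller sizes are all illustrative and are not part of the original analysis.

```r
set.seed(42)  # arbitrary seed, chosen only for illustration

# Hypothetical stand-in for the student data: 50 rows, 6 columns
toy <- data.frame(matrix(seq_len(300), nrow = 50))

num_subsamples <- 5
sample_size <- 20
column_sets <- list(c(1, 2, 3), c(2, 4, 6), c(1, 5, 6))

# Draw one sub-sample: rows with replacement, one column set at random
draw_subsample <- function(df, n, col_sets) {
  rows <- sample(nrow(df), n, replace = TRUE)
  cols <- col_sets[[sample(length(col_sets), 1)]]
  df[rows, cols]
}

# Build all sub-samples and name them df_1, df_2, ...
subsamples <- lapply(seq_len(num_subsamples), function(i) {
  draw_subsample(toy, sample_size, column_sets)
})
names(subsamples) <- paste0("df_", seq_len(num_subsamples))

# Each element is an ordinary data frame, accessed by name
print(dim(subsamples[["df_1"]]))
```

With this layout, later steps can loop over subsamples directly instead of rebuilding names with paste() and get().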
This code analyzes the sub-samples, each stored as a separate data frame (df_1, df_2, …, df_5), and generates frequency tables for the first three columns of each sub-sample, which are categorical by construction of the column sets.
By examining the frequency tables across sub-samples, you can observe variations in the distribution of categories. Some sub-samples may have a higher prevalence of certain categories than others. Note that a column can only be compared across the sub-samples whose column set includes it.
Understanding the variations in categorical data among sub samples is crucial for drawing meaningful conclusions. It allows you to identify patterns, anomalies, or trends that may be unique to specific sub samples.
for (i in 1:num_subsamples) {
  df <- get(paste("df_", i, sep = ""))
  # The first three columns of every column set are the categorical ones
  categorical_columns <- names(df)[1:3]
  # Filter the data frame to include only categorical columns
  df_categorical <- df[, categorical_columns]
  # Print a header for this sub-sample; names(df) lists its column names, not "df_i"
  cat("DataFrame:", names(df), "\n")
  # Create and print frequency tables for each categorical column
  for (col in names(df_categorical)) {
    freq_table <- table(df_categorical[[col]])
    cat("Column:", col, "\n")
    print(freq_table)
  }
}
## DataFrame: Marital_status Application_mode Application_order Previous_qualification_grade Admission_grade Displaced
## Column: Marital_status
##
## 1 2 4 5
## 88 9 2 1
## Column: Application_mode
##
## 1 5 7 16 17 18 39 42 43 44 51 53
## 30 1 5 1 21 1 25 2 7 3 1 3
## Column: Application_order
##
## 1 2 3 4 5 6 9
## 69 8 5 9 3 5 1
## DataFrame: Scholarship_holder International Marital_status Curricular_units_2nd_sem_evaluations Curricular_units_2nd_sem_approved Curricular_units_2nd_sem_grade
## Column: Scholarship_holder
##
## 0 1
## 84 16
## Column: International
##
## 0 1
## 98 2
## Column: Marital_status
##
## 1 2 4
## 84 12 4
## DataFrame: Application_mode Mothers_qualification Fathers_qualification Displaced Curricular_units_1st_sem_credited Curricular_units_1st_sem_enrolled
## Column: Application_mode
##
## 1 7 16 17 18 39 43 44 53
## 43 3 1 20 4 16 9 3 1
## Column: Mothers_qualification
##
## 1 2 3 4 5 10 19 34 37 38
## 21 2 12 1 2 1 20 3 23 15
## Column: Fathers_qualification
##
## 1 2 3 4 5 12 19 34 36 37 38 44
## 16 1 6 2 1 1 21 2 1 29 19 1
## DataFrame: Marital_status Mothers_qualification Educational_special_needs Curricular_units_1st_sem_evaluations Curricular_units_1st_sem_enrolled Curricular_units1st_sem_grade
## Column: Marital_status
##
## 1 2 4
## 86 12 2
## Column: Mothers_qualification
##
## 1 2 3 19 34 37 38 40
## 31 5 8 25 5 16 9 1
## Column: Educational_special_needs
##
## 0 1
## 97 3
## DataFrame: Marital_status Mothers_qualification Educational_special_needs Curricular_units_1st_sem_evaluations Curricular_units_1st_sem_enrolled Curricular_units1st_sem_grade
## Column: Marital_status
##
## 1 2 3 4 5
## 84 11 2 2 1
## Column: Mothers_qualification
##
## 1 3 4 12 19 35 37 38 40
## 30 8 1 1 20 1 27 11 1
## Column: Educational_special_needs
##
## 0 1
## 99 1
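The tables above are easier to compare side by side as proportions, restricted to the sub-samples that actually contain a given column. The following is a minimal sketch of one way to do that. The list subs, its values, and the helper compare_column are hypothetical stand-ins, not the sub-samples from the run above.

```r
# Hypothetical stand-ins for sub-samples; only some contain "status"
subs <- list(
  df_1 = data.frame(status = c(1, 1, 2, 1), mode = c(17, 39, 1, 1)),
  df_2 = data.frame(mode = c(1, 1, 43, 17), order = c(1, 2, 1, 1)),
  df_3 = data.frame(status = c(1, 4, 1, 1), order = c(1, 1, 3, 1))
)

compare_column <- function(sub_list, col) {
  # Keep only the sub-samples whose column set includes `col`
  has_col <- vapply(sub_list, function(d) col %in% names(d), logical(1))
  kept <- sub_list[has_col]
  # Shared factor levels so every sub-sample reports the same categories
  lvls <- sort(unique(unlist(lapply(kept, function(d) d[[col]]))))
  # One column of proportions per sub-sample, one row per category
  sapply(kept, function(d) {
    prop.table(table(factor(d[[col]], levels = lvls)))
  })
}

status_props <- compare_column(subs, "status")
print(status_props)
```

Using shared factor levels means a category that is missing from one sub-sample shows up as an explicit 0 instead of silently dropping out of its table.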
Frequency Tables Analysis:
Marital_status (sub-samples 1, 2, 4, and 5): Category 1 dominates every sub-sample that includes this column, with 88, 84, 86, and 84 of 100 rows respectively. The differences lie in the rare categories: sub-samples 2 and 4 contain only categories 1, 2, and 4, sub-sample 1 also contains category 5, and sub-sample 5 is the only one in which categories 1 through 5 all appear.
Mothers_qualification (sub-samples 3, 4, and 5): Categories 1, 19, 37, and 38 are the four most frequent in all three sub-samples, but their ranking shifts. Category 37 leads in sub-sample 3 (23 rows against 21 for category 1), whereas category 1 leads in sub-samples 4 and 5 (31 and 30 rows).
Educational_special_needs (sub-samples 4 and 5): Category 0 is overwhelmingly dominant in both, with 97 and 99 of 100 rows; category 1 accounts for only 3 rows and 1 row. The two sub-samples agree closely on this variable.
Fathers_qualification (sub-sample 3 only): Because this column was selected for a single sub-sample, no comparison across sub-samples is possible. Within sub-sample 3, category 37 is the most frequent (29 rows), followed by categories 19 (21), 38 (19), and 1 (16).
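Marital_status: With category 1 covering 84 to 88 of 100 rows in every sub-sample that includes the column, a large share of category 1 is expected and is not an anomaly. The rare categories are the ones to watch: category 3 appears only in sub-sample 5 (2 rows) and category 5 only in sub-samples 1 and 5 (1 row each), so their absence elsewhere more plausibly reflects chance at this sample size than a real difference.
Educational_special_needs: Category 1 is rare in both sub-samples that include this column (3 rows in sub-sample 4, 1 row in sub-sample 5). With counts this small, the gap between the two is plausibly sampling noise and should not be read as an anomaly without further evidence.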
Scholarship_holder: This variable appears only in sub-sample 2, where 84 rows fall in category 0 and 16 in category 1. Its consistency across sub-samples cannot be assessed from this run.
International: Similarly, the “International” variable appears only in sub-sample 2, with 98 rows in category 0 and 2 in category 1.
Marital_status: Categories 1, 2, and 4 appear in all four sub-samples that include this column, with category 1 always first (84 to 88 rows) and category 2 always second (9 to 12 rows). Categories 3 and 5 appear only in some of them.
Mothers_qualification, Educational_special_needs, Fathers_qualification: Mothers_qualification and Educational_special_needs keep the same dominant categories across the sub-samples that include them, while the exact counts vary. Fathers_qualification appears in only one sub-sample, so no trend can be stated for it.
These observations provide valuable insights into the diversity and characteristics of the dataset:
Differences among sub-samples are concentrated in the rare categories and in the ranking of the most frequent ones. With only 100 rows per sub-sample, much of this is likely sampling variation, and future analysis should account for it before drawing conclusions or making decisions.
Anomalies should be defined within the context of each sub-sample and its size. A category that is absent from one sub-sample may simply be too rare to show up in 100 draws.
While variations exist, the dominant categories remain consistent: category 1 of Marital_status and category 0 of Educational_special_needs lead in every sub-sample that includes them. Columns that appear in only one sub-sample in this run, such as Scholarship_holder, International, and Fathers_qualification, cannot support claims about general trends.
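One way to check whether a difference between two sub-samples exceeds what random sampling alone would produce is an exact test on the counts. The sketch below applies fisher.test() to the Educational_special_needs counts printed for df_4 and df_5 in the frequency tables (97 and 3 rows, 99 and 1 row). The matrix name special_needs is an illustrative choice, and the test treats the two sub-samples as independent, which is only approximately true here because both were drawn from the same dataset and may share rows.

```r
# Counts copied from the frequency tables (category 0, category 1)
special_needs <- matrix(
  c(97, 3,   # sub-sample 4
    99, 1),  # sub-sample 5
  nrow = 2, byrow = TRUE,
  dimnames = list(sub_sample = c("df_4", "df_5"), category = c("0", "1"))
)

# Fisher's exact test suits 2 x 2 tables with very small cell counts
result <- fisher.test(special_needs)
print(result$p.value)
```

A large p-value would be consistent with the two sub-samples differing only by chance on this variable; it does not by itself prove that the distributions are the same.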