library(readr)
library(dplyr) # provides %>%, mutate(), filter(), select(), across() and if_all(), all used below
project <- read_csv("D:/project r/CA3/project.csv")
## Rows: 231 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): state, data
## dbl (8): 1951, 1961, 1971, 1981, 1991, 2001, 2011, 2021
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(project)
. View first few rows
head(project, n=5)
## # A tibble: 5 × 10
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Andaman & Nicob… 31 64 115 189 281 356 381 399 Tota…
## 2 Andhra Pradesh 31115 35983 43503 53551 66508 76210 84581 91702 Tota…
## 3 Arunachal Prade… NA 337 468 632 865 1098 1384 1712 Tota…
## 4 Assam 8029 10837 14625 18041 22414 26656 31206 35999 Tota…
## 5 Bihar 29085 34841 42126 52303 64531 82999 104099 128500 Tota…
Interpretation: head(project, n = 5) displays the first five rows of project, a quick first check of the dataset's structure and contents. The result is a tibble, the tidyverse variant of a data frame. Each row is an Indian state or union territory, and the columns 1951 to 2021 hold one value per census year. The last column, data, is truncated in the printed output ("Tota…") and appears to label the kind of measurement in the row, such as total population. For example, the Andaman & Nicobar row rises steadily from 31 in 1951 to 399 in 2021, while Arunachal Pradesh has no value for 1951 (NA). A preview like this helps spot trends, confirm that the import worked, and catch missing or unusual values before deeper analysis or visualization.
. How to show the last 20 rows of the data set
tail(project, n=20)
## # A tibble: 20 × 10
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Jharkhand 122 146 179 221 274 338 414 414 densi…
## 2 Karnataka 101 123 153 194 235 276 319 319 densi…
## 3 Kerala 349 435 549 655 749 820 860 860 densi…
## 4 Lakshadweep 657 753 994 1258 1616 2022 2149 2149 densi…
## 5 Madhya Pradesh 61 75 98 124 158 196 236 236 densi…
## 6 Maharashtra 104 128 164 204 257 315 365 365 densi…
## 7 Manipur 26 35 48 64 82 97 115 115 densi…
## 8 Meghalaya 27 34 45 60 79 103 132 132 densi…
## 9 Mizoram 9 13 16 23 33 42 52 52 densi…
## 10 Nagaland 13 22 31 47 73 120 119 119 densi…
## 11 Odisha 94 113 141 169 203 236 270 270 densi…
## 12 Puducherry 645 750 959 1229 1683 1989 2547 2547 densi…
## 13 Punjab 182 221 269 333 403 484 551 551 densi…
## 14 Rajasthan 47 59 75 100 129 165 200 200 densi…
## 15 Sikkim 19 23 30 45 57 76 86 86 densi…
## 16 Tamil Nadu 232 259 317 372 429 480 555 555 densi…
## 17 Tripura 61 109 148 196 263 305 350 350 densi…
## 18 Uttar Pradesh 250 291 348 436 548 690 829 829 densi…
## 19 Uttarakhand 55 67 84 107 133 159 189 189 densi…
## 20 West Bengal 296 394 499 615 767 903 1028 1028 densi…
Interpretation: tail(project, n = 20) displays the last 20 rows of project, a useful check that the end of a long dataset was read correctly. The rows run from Jharkhand to West Bengal, and the truncated data column ("densi…") suggests that these values are population densities. Density rises over time in most rows: Kerala goes from 349 in 1951 to 860, and West Bengal from 296 to 1028. Puducherry (2547) and Lakshadweep (2149) are the densest regions shown, while Mizoram (52) and Sikkim (86) remain low. Nagaland is the one row shown where density dips, from 120 in 2001 to 119 in 2011. Note that in every displayed row the 2021 value equals the 2011 value, so the 2021 density figures look like a carry-forward of 2011 rather than a new measurement; comparisons are safer when stated as 1951 to 2011.
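In the tail() output above, the 2011 and 2021 values coincide in every displayed row. A check like the following could confirm whether that holds across a whole set of rows. It is a minimal sketch on a hypothetical toy data frame (three rows copied from the output); the object names density and same_as_2011 are illustrative only.

```r
# Hypothetical toy rows copied from the density output above.
density <- data.frame(
  state = c("Kerala", "Punjab", "West Bengal"),
  `2011` = c(860, 551, 1028),
  `2021` = c(860, 551, 1028),
  check.names = FALSE  # keep "2011"/"2021" as column names instead of "X2011"/"X2021"
)

# TRUE when every non-missing 2021 value equals its 2011 counterpart.
same_as_2011 <- all(density$`2021` == density$`2011`, na.rm = TRUE)
print(same_as_2011)
```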
. How to view all column names of the data set
names(project)
## [1] "state" "1951" "1961" "1971" "1981" "1991" "2001" "2011" "2021"
## [10] "data"
Interpretation: names(project) returns a character vector of the column names, which shows at a glance what the dataset contains. There are 10 columns: "state" holds the name of the state or union territory; "1951" through "2021" hold the values for each census year; and "data" labels the kind of measurement in the row (for example population or density). Because the year columns have names that start with a digit, they must be wrapped in backticks when referenced in code, as in `1951`.
. How to view row names in the data set
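Column names such as "1951" are not syntactic R names, which affects how they are created and accessed. The sketch below uses a hypothetical two-row data frame (toy) to show two equivalent ways of reading such a column, and what base R does to the name when check.names is left at its default.

```r
# Hypothetical toy data with a year-named column.
toy <- data.frame(
  state = c("Assam", "Bihar"),
  `1951` = c(8029, 29085),
  check.names = FALSE  # keep the name "1951" unchanged
)

by_backtick <- toy$`1951`     # backticks allow a non-syntactic name after $
by_string   <- toy[["1951"]]  # a string index needs no backticks

# With the default check.names = TRUE, data.frame() rewrites the name.
default_name <- names(data.frame(`1951` = 1))

print(identical(by_backtick, by_string))
print(default_name)
```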
rownames(project)
## [1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12"
## [13] "13" "14" "15" "16" "17" "18" "19" "20" "21" "22" "23" "24"
## [25] "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36"
## [37] "37" "38" "39" "40" "41" "42" "43" "44" "45" "46" "47" "48"
## [49] "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60"
## [61] "61" "62" "63" "64" "65" "66" "67" "68" "69" "70" "71" "72"
## [73] "73" "74" "75" "76" "77" "78" "79" "80" "81" "82" "83" "84"
## [85] "85" "86" "87" "88" "89" "90" "91" "92" "93" "94" "95" "96"
## [97] "97" "98" "99" "100" "101" "102" "103" "104" "105" "106" "107" "108"
## [109] "109" "110" "111" "112" "113" "114" "115" "116" "117" "118" "119" "120"
## [121] "121" "122" "123" "124" "125" "126" "127" "128" "129" "130" "131" "132"
## [133] "133" "134" "135" "136" "137" "138" "139" "140" "141" "142" "143" "144"
## [145] "145" "146" "147" "148" "149" "150" "151" "152" "153" "154" "155" "156"
## [157] "157" "158" "159" "160" "161" "162" "163" "164" "165" "166" "167" "168"
## [169] "169" "170" "171" "172" "173" "174" "175" "176" "177" "178" "179" "180"
## [181] "181" "182" "183" "184" "185" "186" "187" "188" "189" "190" "191" "192"
## [193] "193" "194" "195" "196" "197" "198" "199" "200" "201" "202" "203" "204"
## [205] "205" "206" "207" "208" "209" "210" "211" "212" "213" "214" "215" "216"
## [217] "217" "218" "219" "220" "221" "222" "223" "224" "225" "226" "227" "228"
## [229] "229" "230" "231"
Interpretation: rownames(project) returns the row names of project. The dataset was imported without custom row names, so the result is the default sequence of character strings from "1" to "231", which confirms that the dataset has 231 rows. Tibbles do not carry meaningful row names, so this output adds nothing beyond the row count; rownames() is more useful on data frames whose rows are labeled with IDs or names.
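If the row count is all that is needed, nrow() and dim() report it directly without printing every row name. A minimal sketch on a hypothetical three-row data frame:

```r
# Hypothetical toy data frame with default row names.
toy <- data.frame(
  state = c("Assam", "Bihar", "Kerala"),
  value = c(1, 2, 3)
)

n_rows <- nrow(toy)  # number of rows
shape  <- dim(toy)   # c(rows, columns)

print(n_rows)
print(shape)
```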
. How to Check missing values per column
colSums(is.na(project))
## state 1951 1961 1971 1981 1991 2001 2011 2021 data
## 0 54 8 2 2 1 0 0 164 0
Interpretation: colSums(is.na(project)) counts the missing values (NA) in each column. is.na(project) produces a logical matrix that is TRUE wherever a value is missing, and colSums() adds up the TRUE entries column by column. The output shows 54 missing values in 1951, 8 in 1961, 2 each in 1971 and 1981, 1 in 1991, and 164 in 2021, the most of any column. The columns state, 2001, 2011, and data are complete. With 164 of 231 rows missing, the 2021 column needs particular care: any analysis that uses it should either restrict itself to the rows that have a value or handle the gaps explicitly.
. Are there any NA values in the data set? Locate them
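Counts are easier to judge next to the share of rows they represent. The sketch below repeats the colSums(is.na()) pattern on a hypothetical four-row data frame and adds colMeans() to express the same information as a percentage; the names toy, na_count and na_share are illustrative.

```r
# Hypothetical toy data with a few missing values.
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  `1951` = c(NA, 10, NA, 30),
  `2021` = c(5, NA, 7, 8),
  check.names = FALSE
)

na_count <- colSums(is.na(toy))                   # missing values per column
na_share <- round(colMeans(is.na(toy)) * 100, 1)  # same, as a percentage of rows

print(data.frame(na_count, na_share))
```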
is.na(project)
## state 1951 1961 1971 1981 1991 2001 2011 2021 data
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [14,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [15,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [16,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [17,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [18,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [19,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [20,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [21,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [22,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [24,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [26,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [27,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [28,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [29,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [30,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [31,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [32,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [33,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [34,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [35,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [36,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [37,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [38,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [39,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [40,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [41,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [42,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [43,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [44,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [45,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [46,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [47,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [48,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [49,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [50,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [51,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [52,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [53,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [54,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [55,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [56,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [57,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [58,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [59,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [60,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [61,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [62,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [63,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [64,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [65,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [66,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [67,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [68,] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [69,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [70,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [71,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [72,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [73,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [74,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [75,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [76,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [77,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [78,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [79,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [80,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [81,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [82,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [83,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [84,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [85,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [86,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [87,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [88,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [89,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [90,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [91,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [92,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [93,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [94,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [95,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [96,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [97,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [98,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [99,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [100,] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [101,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [102,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [103,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [104,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [105,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [106,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [107,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [108,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [109,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [110,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [111,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [112,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [113,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [114,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [115,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [116,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [117,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [118,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [119,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [120,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [121,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [122,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [123,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [124,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [125,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [126,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [127,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [128,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [129,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [130,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [131,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [132,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [133,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [134,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [135,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [136,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [137,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [138,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [139,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [140,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [141,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [142,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [143,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [144,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [145,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [146,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [147,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [148,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [149,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [150,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [151,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [152,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [153,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [154,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [155,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [156,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [157,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [158,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [159,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [160,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [161,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [162,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [163,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [164,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [165,] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE
## [166,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [167,] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [168,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [169,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [170,] FALSE TRUE TRUE TRUE FALSE FALSE FALSE FALSE TRUE FALSE
## [171,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [172,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [173,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [174,] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [175,] FALSE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE FALSE
## [176,] FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
## [177,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [178,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [179,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [180,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [181,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [182,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [183,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [184,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [185,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [186,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [187,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [188,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [189,] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [190,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [191,] FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [192,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [193,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [194,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [195,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [196,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [197,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [198,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [199,] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [200,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [201,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [202,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [203,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [204,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [205,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [206,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [207,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [208,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [209,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [210,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [211,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [212,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [213,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [214,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [215,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [216,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [217,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [218,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [219,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [220,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [221,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [222,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [223,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [224,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [225,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [226,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [227,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [228,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [229,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [230,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [231,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Interpretation: is.na(project) checks every cell and returns a logical matrix with the same dimensions as the dataset: TRUE where the value is missing (NA) and FALSE otherwise. Most entries are FALSE. The first TRUE is in row 3 under the column 1951, which matches the NA for Arunachal Pradesh in the head() output. From row 34 to row 196 the 2021 column is TRUE in every row except 145, and rows 98 to 128 are all missing 1951 as well. This cell-by-cell view pinpoints exactly where the gaps are, but at 231 rows it is hard to read; the per-column counts from colSums(is.na(project)) summarize the same information.
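Rather than scanning a long TRUE/FALSE matrix by eye, the positions of the missing cells can be extracted directly. This is a minimal base R sketch on hypothetical toy data; which() with arr.ind = TRUE returns one row per missing cell, and the column numbers are mapped back to names for readability.

```r
# Hypothetical toy data with three missing cells.
toy <- data.frame(
  state = c("A", "B", "C", "D"),
  `1951` = c(NA, 10, NA, 30),
  `2021` = c(5, NA, 7, 8),
  check.names = FALSE
)

has_na <- anyNA(toy)  # single TRUE/FALSE answer: is anything missing?

# Row and column index of every missing cell.
na_pos <- which(is.na(toy), arr.ind = TRUE)

# Replace the column index with the column name.
na_cells <- data.frame(
  row    = unname(na_pos[, "row"]),
  column = names(toy)[na_pos[, "col"]]
)

print(has_na)
print(na_cells)
```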
. To change the Data type from Double to Integer
project <- project %>%
  mutate(across(where(is.double), as.integer))
Interpretation: This code converts every column of type double to integer. It uses dplyr's mutate() together with across() and where(): where(is.double) selects the double columns, and as.integer is applied to each of them. No output is printed at this step; a call such as str(project) would confirm the result, and the later outputs show the year columns with type int. The character columns state and data are unchanged. Integers take less memory than doubles, but as.integer() drops any fractional part, so the conversion is only safe when the values are whole numbers.
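The effect of the conversion can be checked on a small example. The sketch below assumes dplyr is installed and uses a hypothetical tibble (toy) with one deliberately fractional value to show that as.integer() truncates rather than rounds, and that NA and character columns pass through untouched.

```r
library(dplyr)

# Hypothetical toy tibble; 31.9 is fractional on purpose.
toy <- tibble(
  state = c("A", "B"),
  `1951` = c(31.9, NA),
  `1961` = c(64, 115)
)

# Same pattern as above: convert every double column to integer.
converted <- toy %>%
  mutate(across(where(is.double), as.integer))

# Column classes after the conversion.
print(sapply(converted, class))
print(converted)
```

Here 31.9 becomes 31, which is why the conversion should only be applied to columns known to hold whole numbers.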
. Which state had the highest population in 1951?
project %>%
  filter(`1951` == max(`1951`, na.rm = TRUE)) %>%
  select(state, `1951`)
## # A tibble: 1 × 2
## state `1951`
## <chr> <int>
## 1 Uttar Pradesh 60274
Interpretation: The code keeps the row(s) whose 1951 value equals the column maximum, with na.rm = TRUE so that missing values are ignored, and then selects the state and 1951 columns. Uttar Pradesh has the highest value, 60,274, which in thousands corresponds to about 60.27 million people. Note that the filter runs over all rows of project, including density and other non-population rows, so the result can be read as a population figure only if the maximum comes from a total-population row, which its magnitude suggests.
. Which state had the highest population in 2021?
project %>%
  filter(`2021` == max(`2021`, na.rm = TRUE)) %>%
  select(state, `2021`)
## # A tibble: 1 × 2
## state `2021`
## <chr> <int>
## 1 Uttar Pradesh 231503
Interpretation: The code keeps the row(s) whose 2021 value equals the column maximum, ignoring missing values through na.rm = TRUE, and displays only the state and 2021 columns. Uttar Pradesh again has the highest value, 231,503, most likely in thousands, or about 231.5 million people. On this data, Uttar Pradesh is the most populous state in both 1951 and 2021. As with the 1951 query, the maximum is taken over every row of project, not only the total-population rows.
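Because project mixes several kinds of measurement in one table, a maximum taken over all rows can come from the wrong category. One way to guard against that is to filter on the data column first. The sketch below assumes dplyr 1.0.0 or later and uses a hypothetical tibble; the labels "Total Population" and "Density" are assumptions about how the categories are spelled, since the printed output truncates them.

```r
library(dplyr)

# Hypothetical toy data: the largest 2021 value sits in a non-population row.
toy <- tibble(
  state = c("A", "B", "A", "B"),
  `2021` = c(500L, 900L, 2000L, 150L),
  data = c("Total Population", "Total Population", "Density", "Density")
)

# Restrict to one category, then take the row with the largest 2021 value.
top_total <- toy %>%
  filter(data == "Total Population") %>%
  slice_max(`2021`, n = 1) %>%
  select(state, `2021`)

print(top_total)
```

Without the first filter(), the maximum in this toy data would be the density row for state A rather than the population row for state B.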
. How to Filter union territories (assumption: UTs are known)
ut_list <- c("Delhi", "Chandigarh", "Puducherry", "Andaman & Nicobar Island")
project_ut <- project %>% filter(state %in% ut_list)
project_ut
## # A tibble: 27 × 10
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <int> <int> <int> <int> <int> <int> <int> <int> <chr>
## 1 Andaman & Nico… 31 64 115 189 281 356 381 399 Tota…
## 2 Chandigarh 24 120 257 452 642 901 1055 1158 Tota…
## 3 Delhi 1744 2659 4066 6220 9421 13851 16788 19301 Tota…
## 4 Puducherry 317 369 472 604 808 974 1248 1646 Tota…
## 5 Andaman & Nico… 23 49 89 139 206 240 244 NA Popu…
## 6 Chandigarh 24 21 24 29 66 92 29 NA Popu…
## 7 Delhi 307 299 419 452 949 945 419 NA Popu…
## 8 Puducherry 317 280 273 288 291 326 394 NA Popu…
## 9 Andaman & Nico… 8 14 26 50 75 116 136 NA Popu…
## 10 Chandigarh NA 99 233 423 576 809 1026 NA Popu…
## # ℹ 17 more rows
Interpretation: The output contains 27 rows for the four union territories in ut_list: Delhi, Chandigarh, Puducherry, and Andaman & Nicobar Island. Each territory appears several times, once per category in the data column: "Total Population" gives the overall population; "Population in Rural Area" / "Urban Area" gives the rural and urban split; "Decadal Growth" / "Sex Ratio" gives growth rates and demographic ratios. In the rows shown, 2021 has a value only for the total-population rows and is NA elsewhere. ut_list is an assumed, partial list, so other union territories in the dataset, such as Lakshadweep, are not included.
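Filtering with %in% silently ignores names that do not match exactly, so it is worth checking the lookup list against the names actually present. A minimal base R sketch, with hypothetical vectors and one deliberately misspelled entry:

```r
# Hypothetical state names as they might appear in the dataset.
states_in_data <- c("Delhi", "Chandigarh", "Puducherry",
                    "Andaman & Nicobar Island", "Kerala")

# Lookup list with one deliberate mismatch ("Islands" instead of "Island").
ut_list <- c("Delhi", "Chandigarh", "Puducherry", "Andaman & Nicobar Islands")

# Entries of ut_list that would match nothing in a filter.
unmatched <- setdiff(ut_list, states_in_data)
print(unmatched)
```

An empty result means every name in the list matched at least one row.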
. Which states had a population of more than 10,000 thousand (10 million) in 1971
great <- project %>% filter(`1971` > 10000)
great
## # A tibble: 37 × 10
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <int> <int> <int> <int> <int> <int> <int> <int> <chr>
## 1 Andhra Pradesh 31115 35983 43503 53551 66508 76210 84581 91702 Total…
## 2 Assam 8029 10837 14625 18041 22414 26656 31206 35999 Total…
## 3 Bihar 29085 34841 42126 52303 64531 82999 104099 128500 Total…
## 4 Chhattisgarh 7457 9154 11637 14010 17615 20834 25545 32200 Total…
## 5 Gujarat 16263 20633 26697 34086 41310 50671 60440 70400 Total…
## 6 Haryana 5674 7591 10036 12922 16464 21145 25351 28901 Total…
## 7 Jharkhand 9697 11606 14227 17612 21844 26946 32988 40100 Total…
## 8 Karnataka 19402 23587 29299 37136 44977 52851 61095 69600 Total…
## 9 Kerala 13549 16904 21347 25454 29099 31841 33406 34699 Total…
## 10 Madhya Pradesh 18615 23218 30017 38169 48566 60348 72627 85002 Total…
## # ℹ 27 more rows
Interpretation: The code project %>% filter(`1971` > 10000) keeps the rows whose 1971 value exceeds 10,000. For total-population rows, which are in thousands, that means more than 10 million people. The backticks are required: without them, 1971 is read as a number rather than a column name. The result has 37 rows, which is not the same as 37 states, because the filter is applied to every category in the data column and a state can appear more than once. The first ten rows shown are total-population rows and include Andhra Pradesh, Bihar, Gujarat, Karnataka, and Madhya Pradesh; all of them grow steadily through 2021. To list only states by total population, the data column would need to be filtered as well.
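The difference between `1971` and a bare 1971 is easy to miss, because the unquoted version runs without an error. The sketch below assumes dplyr is installed and uses a hypothetical three-row tibble to show both forms side by side.

```r
library(dplyr)

# Hypothetical toy data.
toy <- tibble(
  state = c("A", "B", "C"),
  `1971` = c(9000L, 12000L, NA)
)

# Bare 1971 is the number 1971, so the condition is the constant FALSE.
bare <- toy %>% filter(1971 > 10000)

# Backticks refer to the column; rows where the test is NA are dropped.
quoted <- toy %>% filter(`1971` > 10000)

print(nrow(bare))
print(quoted)
```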
. States with missing 1951 but present later
presentcheck=project %>% filter(is.na(`1951`) & !is.na(`1961`))
presentcheck
## # A tibble: 46 × 10
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <int> <int> <int> <int> <int> <int> <int> <int> <chr>
## 1 Arunachal Prad… NA 337 468 632 865 1098 1384 1712 Tota…
## 2 Arunachal Prad… NA 337 450 590 754 870 1066 NA Popu…
## 3 Chandigarh NA 99 233 423 576 809 1026 NA Popu…
## 4 Puducherry NA 89 198 316 517 649 850 NA Popu…
## 5 Andaman & Nico… NA 105 81 63 48 26 6 NA Deca…
## 6 Andhra Pradesh NA 15 20 23 24 14 11 NA Deca…
## 7 Assam NA 34 34 23 24 18 16 NA Deca…
## 8 Bihar NA 19 20 24 23 28 25 NA Deca…
## 9 Chandigarh NA 394 114 75 42 40 17 NA Deca…
## 10 Chhattisgarh NA 22 27 20 25 18 22 NA Deca…
## # ℹ 36 more rows
Interpretation:- The output lists the 46 rows whose 1951 value is missing but whose 1961 value is present. Only some of them are population rows: Arunachal Pradesh, Chandigarh, and Puducherry have no 1951 figure in certain population rows, which suggests that these figures were not recorded, or that the region was not yet covered, in 1951. Most of the printed rows are decadal-growth rows (shown as Deca… in the data column). A growth figure for 1951 would need an earlier census value that the dataset does not contain, so the NA there is presumably by construction rather than a gap in census coverage. The result is therefore better read as a list of rows with a missing 1951 value than as a list of states added after 1951.
. Which states never crossed 1,000 thousand (1 million)?
project %>% filter(if_all(`1951`:`2021`, ~ . < 1000 | is.na(.)))
## # A tibble: 138 × 10
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <int> <int> <int> <int> <int> <int> <int> <int> <chr>
## 1 Andaman & Nico… 31 64 115 189 281 356 381 399 Tota…
## 2 Sikkim 138 162 210 316 406 541 611 658 Tota…
## 3 Andaman & Nico… 23 49 89 139 206 240 244 NA Popu…
## 4 Chandigarh 24 21 24 29 66 92 29 NA Popu…
## 5 Delhi 307 299 419 452 949 945 419 NA Popu…
## 6 Goa 477 503 592 685 690 677 552 NA Popu…
## 7 Mizoram 189 252 295 372 372 448 525 NA Popu…
## 8 Puducherry 317 280 273 288 291 326 394 NA Popu…
## 9 Sikkim 135 155 190 265 369 481 457 NA Popu…
## 10 Andaman & Nico… 8 14 26 50 75 116 136 NA Popu…
## # ℹ 128 more rows
Interpretation:- The output lists the 138 rows in which every value from 1951 to 2021 is below 1,000 or missing. For population rows, 1,000 thousand corresponds to 1 million (10 lakh). Among the printed rows, only Andaman & Nicobar Islands and Sikkim appear with their total population, so these are the regions whose total never reached 1 million. The other printed rows (Chandigarh, Delhi, Goa, Mizoram, Puducherry) belong to other categories of the data column, shown as Popu…, and describe only part of the population; the total population of Goa, for example, is 1,522 thousand in 2021 and that of Chandigarh is 1,158 thousand. Rows that hold rates or ratios rather than head counts can also satisfy the condition. NA values are treated as passing the test, which is why rows with a missing 2021 value are included.
. What is the year-wise total from 1951 to 2021?
colSums(project[ , 2:9], na.rm = TRUE)
## 1951 1961 1971 1981 1991 2001 2011 2021
## 758157 918954 1140112 1415738 1748149 2120491 2490131 1464334
Interpretation:- colSums() adds up each year column over all 231 rows. The totals rise steadily from 758,157 in 1951 to 2,490,131 in 2011 and then fall to 1,464,334 in 2021. The fall is not a real decline: in the outputs above, the 2021 value is NA for many of the rows other than total population, and na.rm = TRUE simply leaves those rows out of the 2021 sum. The totals should also not be read as the population of India, because the sum runs over every category in the data column (total, rural, and urban population, as well as growth rates and ratios), so people are counted more than once and non-population figures are mixed in. The column sums are useful mainly as a check on data coverage across years.
. What is the average population in 1981 and 2021?
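A possible way of keeping the categories apart is to sum within each value of the data column. The sketch below uses four rows copied from the printed outputs; the tibble name `demo` and the spelled-out category labels are assumptions, since the output truncates them.

```r
library(dplyr)

# Values taken from the printed outputs; the `data` labels are assumed spellings.
demo <- tibble(
  state  = c("Bihar", "Goa", "Bihar", "Goa"),
  `2011` = c(104099, 1459, 92341, 552),
  `2021` = c(128500, 1522, NA, NA),
  data   = c("Total Population", "Total Population",
             "Population in Rural Area", "Population in Rural Area")
)

# One sum per category and year instead of one sum over all rows.
by_category <- demo %>%
  group_by(data) %>%
  summarise(across(c(`2011`, `2021`), ~ sum(.x, na.rm = TRUE)))

by_category
```

In this sketch the rural category sums to 0 for 2021 because both of its values are missing, which mirrors the drop seen in the overall 2021 total.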
mean(project$`1981`, na.rm = TRUE)
## [1] 6182.262
mean(project$`2021`, na.rm = TRUE)
## [1] 21855.73
Interpretation:- The mean of the 1981 column is approximately 6,182 and the mean of the 2021 column is approximately 21,856 (in the units of the dataset, thousands for population rows). The two figures are not directly comparable. Each mean is taken over all rows with a non-missing value, and the 2021 column is missing for many of the non-total rows, so the 2021 mean is dominated by total-population rows while the 1981 mean also averages in the smaller rural, urban, and rate values. The jump from 6,182 to 21,856 therefore reflects the change in which rows are included as well as real population growth, and it does not show that each state's population more than tripled.
. How to replace NA values with the column mean
project <- project %>%
  mutate(across(where(is.numeric), ~ ifelse(is.na(.), mean(., na.rm = TRUE), .)))
is.na(project)
## state 1951 1961 1971 1981 1991 2001 2011 2021 data
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## ... (rows [5,] to [231,] omitted; every entry is FALSE)
Interpretation:- The operation in Q18 replaces every missing (NA) value in a numeric column with the mean of that column. The is.na(project) output confirms the replacement: every entry is FALSE, so no missing values remain. One side effect matters for the later steps. The column mean is taken over all rows, whatever their category, so a decadal-growth or rural-population row can receive a value of about 4,283 for 1951 or about 21,856 for 2021 that has nothing to do with its own scale. Missing values do interfere with statistical calculations and plots, but results that involve these filled-in cells describe the imputation rather than the census and should be read with that in mind.
. What is the average growth per decade?
project <- project %>% mutate(
  Growth_1951_1981 = (`1981` - `1951`) / 3,
  Growth_1991_2021 = (`2021` - `1991`) / 3
)
View(project)
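Printing the full is.na() matrix is not the only way to confirm that no NA values remain. A minimal sketch with base R, using a made-up three-row data frame `demo` (its column name `y1951` is hypothetical) in place of project:

```r
# Made-up data frame standing in for `project`.
demo <- data.frame(
  state = c("Arunachal Pradesh", "Assam", "Bihar"),
  y1951 = c(NA, 8029, 29085)
)

# Mean imputation of the single numeric column.
demo$y1951[is.na(demo$y1951)] <- mean(demo$y1951, na.rm = TRUE)

# One TRUE/FALSE for the whole table.
any_missing <- anyNA(demo)

# Number of NA values per column.
missing_per_column <- colSums(is.na(demo))
```

`any_missing` is a single logical value and `missing_per_column` has one count per column, so both stay readable however many rows the dataset has.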
Interpretation:- Q19 adds the average growth per decade for each row over two periods, 1951 to 1981 and 1991 to 2021. Each value is the difference between the end and start years of the period divided by 3, the number of decades, and is stored in the new columns Growth_1951_1981 and Growth_1991_2021. For example, Andhra Pradesh grew by about 7,478.67 thousand per decade from 1951 to 1981, that is (53,551 − 31,115) / 3, and by 8,398.00 thousand per decade from 1991 to 2021, that is (91,702 − 66,508) / 3, indicating strong and sustained growth. Arunachal Pradesh shows negative growth in the earlier period, but this is an artifact of Q18: its missing 1951 value was replaced by the column mean of about 4,283, which is larger than its 1981 population of 632 thousand. The comparison of the two columns shows how growth has shifted over time, provided rows with imputed values are set aside.
. Which states rank highest by 2001 population (with 2021 shown alongside)?
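The per-decade figures for Andhra Pradesh can be checked by hand from the total-population values in the head() output. The vector name `ap` below is only illustrative.

```r
# Andhra Pradesh, total population in thousands (from the head() output).
ap <- c(`1951` = 31115, `1981` = 53551, `1991` = 66508, `2021` = 91702)

# Difference between end and start year, divided by the 3 decades in between.
growth_1951_1981 <- unname(ap["1981"] - ap["1951"]) / 3
growth_1991_2021 <- unname(ap["2021"] - ap["1991"]) / 3

round(growth_1951_1981, 2)  # 22436 / 3
round(growth_1991_2021, 2)  # 25194 / 3
```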
project %>% arrange(desc(`2001`)) %>% select(state, `2001`,`2021`)
## # A tibble: 231 × 3
## state `2001` `2021`
## <chr> <int> <dbl>
## 1 Uttar Pradesh 166198 231503
## 2 Uttar Pradesh 131658 21856.
## 3 Maharashtra 96879 124904
## 4 Bihar 82999 128500
## 5 West Bengal 80176 100897
## 6 Andhra Pradesh 76210 91702
## 7 Bihar 74317 21856.
## 8 Tamil Nadu 62406 83698
## 9 Madhya Pradesh 60348 85002
## 10 West Bengal 57749 21856.
## # ℹ 221 more rows
Interpretation:- The output in Q20 sorts the rows by their 2001 value in descending order and shows the 2021 value alongside. Uttar Pradesh is first with a total population of 166,198 thousand. The second row is also labelled Uttar Pradesh, with 131,658 thousand. This is not a duplicate: it is a row from another category of the data column (a Popu… row), and its 2021 value of about 21,856 is the column mean filled in during Q18 rather than a census figure. The same applies to the second Bihar and West Bengal rows. Counting each state once, the ranking is Uttar Pradesh, Maharashtra (96,879), Bihar (82,999), West Bengal (80,176), Andhra Pradesh (76,210), Tamil Nadu (62,406), and Madhya Pradesh (60,348). This ranking of the most populous states in 2001 is useful for demographic analysis, policy planning, and resource allocation.
. What is the population difference between 2011 and 2021?
project %>% mutate(Growth = `2021` - `2011`) %>% arrange(desc(Growth))
## # A tibble: 231 × 13
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 Telangana 0 0 0 0 0 0 0 3.82e4 Tota…
## 2 Uttar Pradesh 60274 70144 83849 105137 132062 166198 199812 2.32e5 Tota…
## 3 Bihar 29085 34841 42126 52303 64531 82999 104099 1.28e5 Tota…
## 4 Nagaland 4283. 14 39 50 56 64 0 2.19e4 Deca…
## 5 Kerala 4283. 24 26 19 14 9 4 2.19e4 Deca…
## 6 Andaman & Nico… 4283. 105 81 63 48 26 6 2.19e4 Deca…
## 7 Lakshadweep 4283. 14 31 26 28 17 6 2.19e4 Deca…
## 8 Goa 4283. 7 34 26 16 15 8 2.19e4 Deca…
## 9 Andhra Pradesh 4283. 15 20 23 24 14 11 2.19e4 Deca…
## 10 Himachal Prade… 4283. 17 23 23 20 17 12 2.19e4 Deca…
## # ℹ 221 more rows
## # ℹ 3 more variables: Growth_1951_1981 <dbl>, Growth_1991_2021 <dbl>,
## # Growth <dbl>
Interpretation:- The output in Q21 computes Growth as the 2021 value minus the 2011 value and sorts the rows in descending order. Telangana is first with an increase of about 38,200 thousand, but only because its 2011 value is recorded as 0 (the state was formed in 2014), so the figure is its whole 2021 population rather than ten years of growth. Uttar Pradesh (231,503 − 199,812 = 31,691 thousand) and Bihar (128,500 − 104,099 = 24,401 thousand) show the largest genuine increases. Rows 4 to 10 (Nagaland, Kerala, Andaman & Nicobar Islands, Lakshadweep, Goa, and others) are decadal-growth rows whose missing 2021 value was replaced by the column mean of about 21,856 in Q18; their high rank is an artifact of that imputation and says nothing about population growth in those regions. A meaningful ranking would use only total-population rows with observed values in both years. With that caveat, the analysis points to Uttar Pradesh and Bihar as the fastest-growing states in absolute terms, which is relevant for infrastructure planning, urban development, and the expansion of public services.
. What are the top and bottom 5 rows by growth from 1951 to 2021?
project %>% mutate(Total_Growth = `2021` - `1951`) %>%
  arrange(desc(Total_Growth)) %>%
  slice(c(1:5, (n()-4):n()))
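One way to keep a placeholder zero from dominating the ranking is to turn it into NA before taking the difference. This is only a sketch under two assumptions: that a 2011 value of 0 means "not available", and that Telangana's 2021 value is roughly 38,200 (the output prints it rounded as 3.82e4). The tibble name `demo` is illustrative.

```r
library(dplyr)

demo <- tibble(
  state  = c("Telangana", "Uttar Pradesh", "Bihar"),
  `2011` = c(0, 199812, 104099),
  `2021` = c(38200, 231503, 128500)   # Telangana value is approximate
)

growth <- demo %>%
  mutate(
    `2011` = na_if(`2011`, 0),        # treat a recorded 0 as missing (assumption)
    Growth = `2021` - `2011`
  ) %>%
  arrange(desc(Growth))               # rows with NA growth are placed last

growth
```

With the zero treated as missing, Uttar Pradesh and Bihar head the list and Telangana is reported as NA instead of as the largest increase.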
## # A tibble: 10 × 13
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 Uttar Pradesh 60274 70144 83849 105137 132062 166198 199812 2.32e5 Tota…
## 2 Bihar 29085 34841 42126 52303 64531 82999 104099 1.28e5 Tota…
## 3 Maharashtra 32003 39554 50412 62783 78937 96879 112374 1.25e5 Tota…
## 4 West Bengal 26300 34926 44312 54581 68078 80176 91276 1.01e5 Tota…
## 5 Madhya Pradesh 18615 23218 30017 38169 48566 60348 72627 8.50e4 Tota…
## 6 Arunachal Prad… 4283. 337 468 632 865 1098 1384 1.71e3 Tota…
## 7 Andhra Pradesh 25695 29709 35100 41063 48621 55401 56362 2.19e4 Popu…
## 8 Arunachal Prad… 4283. 4 6 8 10 13 17 1.7 e1 dens…
## 9 Bihar 27219 32261 38770 47158 57819 74317 92341 2.19e4 Popu…
## 10 Uttar Pradesh 52049 61160 72195 86387 106090 131658 155317 2.19e4 Popu…
## # ℹ 3 more variables: Growth_1951_1981 <dbl>, Growth_1991_2021 <dbl>,
## # Total_Growth <dbl>
. By what percentage did the population grow between 1951 and 2021?
popgth <- project %>% mutate(Percent_Growth = (`2021` - `1951`) / `1951` * 100)
popgth
## # A tibble: 231 × 13
## state `1951` `1961` `1971` `1981` `1991` `2001` `2011` `2021` data
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <dbl> <chr>
## 1 Andaman & Nico… 31 64 115 189 281 356 381 399 Tota…
## 2 Andhra Pradesh 31115 35983 43503 53551 66508 76210 84581 91702 Tota…
## 3 Arunachal Prad… 4283. 337 468 632 865 1098 1384 1712 Tota…
## 4 Assam 8029 10837 14625 18041 22414 26656 31206 35999 Tota…
## 5 Bihar 29085 34841 42126 52303 64531 82999 104099 128500 Tota…
## 6 Chandigarh 24 120 257 452 642 901 1055 1158 Tota…
## 7 Chhattisgarh 7457 9154 11637 14010 17615 20834 25545 32200 Tota…
## 8 Delhi 1744 2659 4066 6220 9421 13851 16788 19301 Tota…
## 9 Goa 547 590 795 1008 1170 1348 1459 1522 Tota…
## 10 Gujarat 16263 20633 26697 34086 41310 50671 60440 70400 Tota…
## # ℹ 221 more rows
## # ℹ 3 more variables: Growth_1951_1981 <dbl>, Growth_1991_2021 <dbl>,
## # Percent_Growth <dbl>
Interpretation:- Q23 calculates the percentage growth for each row from 1951 to 2021 using the formula (Population in 2021 − Population in 1951) / Population in 1951 × 100. The new Percent_Growth column is among the three variables that the printed tibble does not display, but it can be worked out from the values shown. Chandigarh has an exceptionally high growth of 4,725%, that is (1,158 − 24) / 24 × 100, because it started from a very small base in 1951 and developed rapidly. Other regions such as Delhi (about 1,007%) and Jammu & Kashmir also show large increases, reflecting urbanization and migration. Arunachal Pradesh shows a negative value, which is an artifact: its missing 1951 figure was replaced in Q18 by the column mean of about 4,283, which exceeds its 2021 population of 1,712 thousand. Apart from such imputed rows, percentage growth gives a clearer relative view than absolute differences of which regions changed most over the 70 years.
. What is the average annual growth (in absolute numbers)?
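The percentage formula can be tried on individual values from the printed rows. The helper name `pct_growth` is hypothetical and not part of the original analysis.

```r
# Percentage growth from a start value to an end value.
pct_growth <- function(start, end) (end - start) / start * 100

chandigarh <- pct_growth(24, 1158)     # total population, 1951 and 2021
delhi      <- pct_growth(1744, 19301)  # total population, 1951 and 2021

# If the start value is left as NA instead of being imputed, the result is NA too.
arunachal  <- pct_growth(NA, 1712)

round(c(chandigarh, delhi, arunachal), 1)
```

Leaving the missing 1951 value as NA makes the result NA, which is easier to spot than a negative percentage produced by an imputed start value.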
Avg_Annual_Growth <- project %>% mutate(Avg_Annual_Growth = (`2021` - `1951`) / 70)
Interpretation:- Q24 computes the average annual growth from 1951 to 2021 for each row as the 70-year difference divided by 70 and stores the result in a new data frame, Avg_Annual_Growth; nothing is printed. Uttar Pradesh has the highest value, about 2,446 thousand per year, that is (231,503 − 60,274) / 70. Arunachal Pradesh shows a negative value because its 1951 figure is the column mean imputed in Q18 rather than a census count. The metric summarizes long-term population trends for planning and development.
. For each state, which decade saw the largest population increase between 1951 and 2021?
project <- project %>% mutate(
  Diff_51_61 = `1961` - `1951`,
  Diff_61_71 = `1971` - `1961`,
  Diff_71_81 = `1981` - `1971`,
  Diff_81_91 = `1991` - `1981`,
  Diff_91_01 = `2001` - `1991`,
  Diff_01_11 = `2011` - `2001`,
  Diff_11_21 = `2021` - `2011`
)
project$Max_Decade <- apply(project %>% select(starts_with("Diff_")), 1, function(x) {
  names(x)[which.max(x)]
})
Interpretation:- Q25 identifies, for each row, the decade with the largest increase between 1951 and 2021. It computes the difference for each of the seven decades and stores the name of the largest one in Max_Decade. Most states show 2011–2021 (Diff_11_21) as the decade of highest growth; among the total-population rows printed earlier, Bihar and Gujarat follow this pattern. Others peaked earlier: the total population of Andaman & Nicobar Islands grew most in 1981–1991 (Diff_81_91, +92 thousand), and that of Delhi in 1991–2001 (Diff_91_01, +4,430 thousand). For rows whose 2021 value was filled with the column mean in Q18, Diff_11_21 reflects the imputation rather than real growth, so their Max_Decade should be disregarded.
. Bar graph of the top 11 states by population in 2011
top11 <- project %>% arrange(desc(`2011`)) %>% head(20)
# bar graph
ggplot(top11, aes(x = reorder(state, `2011`), y = `2011`)) +
  geom_col(fill = "steelblue") +
  ggtitle("Top 11 States by Population (2011)") +
  xlab("State") +
  ylab("Population in 2011") +
  theme_minimal()
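The decade of largest increase can be verified for single rows with base R. The sketch below uses the total-population values of Andaman & Nicobar Islands and Delhi from the printed outputs; the object names `pop`, `increases`, and `max_decade` are illustrative.

```r
years <- c(1951, 1961, 1971, 1981, 1991, 2001, 2011, 2021)

# Total population in thousands, one row per region.
pop <- rbind(
  "Andaman & Nicobar Islands" = c(31, 64, 115, 189, 281, 356, 381, 399),
  "Delhi" = c(1744, 2659, 4066, 6220, 9421, 13851, 16788, 19301)
)
colnames(pop) <- years

# Decade-on-decade increases: one row per region, one column per decade.
increases <- t(apply(pop, 1, diff))
colnames(increases) <- paste(head(years, -1), tail(years, -1), sep = "-")

# Name of the decade with the largest increase in each row.
max_decade <- colnames(increases)[apply(increases, 1, which.max)]
names(max_decade) <- rownames(pop)

max_decade
```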
Interpretation:- The code draws a bar graph of the most populous states in 2011. The project dataset is sorted in descending order of the 2011 column with arrange(desc(`2011`)), and the first 20 rows are kept with head(20). The dataset holds several rows per state, one for each category of the data column, so the 20 rows cover fewer than 20 distinct states; presumably these are the 11 states referred to in the title. Rows that share a state name fall on the same x-axis position, where geom_col() stacks them, so the bar of a state that appears more than once overstates its population. In the plot, the x-axis shows the states (reordered by their 2011 value for readability) and the y-axis shows the 2011 values. The bars use a “steelblue” fill, the axes are labelled “State” and “Population in 2011”, the title is “Top 11 States by Population (2011)”, and theme_minimal() gives the plot a simple, uncluttered design.
. Create a grouped bar chart comparing state populations in 1951 and 2011
# despite its name, top5 holds the bottom 6 rows by 2011 value
top5 <- project %>% arrange(desc(`2011`)) %>% tail(6)
top5_long <- top5 %>%
  pivot_longer(cols = c(`1951`, `2011`), names_to = "Year", values_to = "Population")
ggplot(top5_long, aes(x = state, y = Population, fill = Year)) +
  geom_col(position = "dodge") +
  ggtitle("Grouped Bar Chart: Population of bottom 6 States (1951 vs 2011)") +
  xlab("State") +
  ylab("Population") +
  labs(fill = "Year") +
  theme_minimal()
Interpretation:-The provided R code creates a grouped bar chart comparing the populations of the top 5 states between 1951 and 2011. First, the project dataset is sorted by 2011 population, and the last 6 entries are selected using tail(6), though usually, the top 5 are intended. The selected data is reshaped into long format using pivot_longer(), combining the 1951 and 2011 population columns into “Year” and “Population.” Using ggplot(), a grouped bar chart is plotted with states on the x-axis, population on the y-axis, and bars colored by year. geom_col(position = “dodge”) places bars side-by-side for comparison, with labels and a clean theme_minimal() style.
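The reshaping step can be checked on its own before plotting. The sketch below is only an illustration: it assumes the tidyr package is installed, and sample_rows is a hypothetical two-row data frame whose values are copied from the head(project) output for Assam and Bihar.

```r
library(tidyr)

# Hypothetical sample: two rows copied from the head(project) output
sample_rows <- data.frame(
  state = c("Assam", "Bihar"),
  `1951` = c(8029, 29085),
  `2011` = c(31206, 104099),
  check.names = FALSE  # keep the year column names as "1951" and "2011"
)

# Wide to long: one row per state and year
long_rows <- pivot_longer(
  sample_rows,
  cols = c(`1951`, `2011`),
  names_to = "Year",
  values_to = "Population"
)

print(long_rows)
```

Each state now occupies two rows, one per year, which is the layout that fill = Year and position = "dodge" rely on. Year is stored as a character column, so ggplot treats it as a discrete variable.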
. create a horizontal bar chart for the top 5 states by population in 2001?
top5_states <- project %>% arrange(desc(`2001`)) %>% head(5)
ggplot(top5_states, aes(x = reorder(state, `2001`), y = `2001`, fill = state)) +
geom_col() + # Using actual population values for the bars
coord_flip() + # Flip bars to horizontal
ggtitle("Top 5 States by Population in 2001") +
xlab("State") +
ylab("Population (2001)") +
theme_light()
Interpretation:-This R code selects the 5 states with the highest
population in 2001, then creates a horizontal bar chart using ggplot2.
The column name is wrapped in backticks inside desc() so that the rows
are sorted by the 2001 column rather than by the number 2001. The states
are reordered by population, the bars represent actual population
values, and coord_flip() flips the chart for better readability. The
plot includes appropriate labels, a clear title, and a clean
theme_light() style.
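Because the year columns have non-syntactic names, they need backticks inside dplyr verbs; a bare 2001 is read as the number 2001, not as the column. The sketch below illustrates the difference. It assumes dplyr is installed, and sample_rows is a hypothetical data frame holding the 2001 values of the five rows shown in the head(project) output, with the shortened state names written out in full.

```r
library(dplyr)

# Hypothetical sample: 2001 values copied from the head(project) output
sample_rows <- data.frame(
  state = c("Andaman & Nicobar Islands", "Andhra Pradesh",
            "Arunachal Pradesh", "Assam", "Bihar"),
  `2001` = c(356, 76210, 1098, 26656, 82999),
  check.names = FALSE
)

# Bare 2001 is a constant, so every row gets the same sort key
unsorted <- sample_rows %>% arrange(desc(2001))

# Backticks refer to the column, so rows are sorted by population
sorted <- sample_rows %>% arrange(desc(`2001`))

print(unsorted$state)  # same order as sample_rows
print(sorted$state)    # Bihar first, Andaman & Nicobar Islands last
```

No error or warning is raised in the first case, so a missing pair of backticks is easy to overlook: head() or tail() then simply returns rows in their original file order.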
. create a horizontal bar chart for the 1991 population of selected southern Indian states?
southern_states <- c("Tamil Nadu", "Kerala", "Karnataka", "Andhra Pradesh")
southern_df <- project %>% filter(state %in% southern_states)
ggplot(southern_df, aes(x = reorder(state, `1991`), y = `1991`, fill = state)) +
geom_col() + # Create bar chart
coord_flip() + # Flip to horizontal bars
ggtitle("Population of Southern Indian States (1991)") +
xlab("State") +
ylab("Population in 1991") +
theme_light()
Interpretation:-The R code creates a horizontal bar chart showing the
1991 population of four southern Indian states by filtering the data,
plotting with ggplot() and geom_col(), reordering states by population,
and styling the plot with theme_light().
. create a boxplot to compare population distributions across four different years?
boxplot(project[, c("1951", "1981", "2001", "2021")],
main = "Population Across Different Years",
col = rainbow(4))
Interpretation:-This R code creates a boxplot to compare the
distribution of population values across the years 1951, 1981, 2001, and
2021 from the project dataset, using different rainbow colors for each
year and giving the plot a clear title.
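The numbers behind each box can also be printed directly. The sketch below uses base R only; sample_rows is a hypothetical data frame with the 1951 and 2021 values of the five rows from the head(project) output, so the summaries describe that small sample and not the full dataset.

```r
# Hypothetical sample: 1951 and 2021 values from the head(project) output
sample_rows <- data.frame(
  `1951` = c(31, 31115, NA, 8029, 29085),
  `2021` = c(399, 91702, 1712, 35999, 128500),
  check.names = FALSE
)

# fivenum() drops NA by default and returns Tukey's five-number summary
five_number <- sapply(sample_rows, fivenum)
rownames(five_number) <- c("min", "lower hinge", "median",
                           "upper hinge", "max")

print(five_number)
```

The hinges and median are the values boxplot() uses to draw each box. The whiskers can stop short of the minimum and maximum when a column contains outliers.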
. Basic histogram of population in 2011
ggplot(project, aes(x = `2011`)) +
geom_histogram(binwidth = 15000, fill = "skyblue", color = "black") +
labs(title = "Histogram of Population in 2011", x = "Population", y = "Frequency") +
theme_minimal()
Interpretation:-The histogram shows that in 2011, most states and union
territories had comparatively small populations, primarily below 20,000
in the dataset's units. The distribution is strongly right-skewed, with
only a few entries above 50,000. This indicates an uneven population
spread, in which a small number of very populous states far exceed the
majority of regions.
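To see how a bin width of 15000 groups the values, the counts can be tabulated without drawing anything. This is a sketch in base R: pop_2011 holds only the five 2011 values from the head(project) output, and the bins here start at 0, which is an assumption. ggplot2 aligns its bins differently by default, so its counts may not match exactly.

```r
# Hypothetical sample: 2011 values from the head(project) output
pop_2011 <- c(381, 84581, 1384, 31206, 104099)

bin_width <- 15000

# Bin edges from 0 up to the first multiple of bin_width above the maximum
breaks <- seq(0, ceiling(max(pop_2011) / bin_width) * bin_width,
              by = bin_width)

# plot = FALSE returns the bin counts instead of drawing the histogram
bins <- hist(pop_2011, breaks = breaks, plot = FALSE)

bin_table <- data.frame(
  lower = head(breaks, -1),
  upper = breaks[-1],
  count = bins$counts
)
print(bin_table)
```

Even in this small sample, two of the five values fall in the lowest bin while the rest are spread thinly across the higher bins, which is the pattern a right-skewed histogram shows.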
. create a scatter plot to compare the 2011 and 2021 population data?
ggplot(project, aes(x = `2011`, y = `2021`)) +
geom_point(color = "purple") +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Scatter Plot: Population 2011 vs 2021",
x = "2011 Population",
y = "2021 Population") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Interpretation:-This R code creates a scatter plot comparing 2011 and
2021 population values using ggplot2, with purple data points, a red
linear regression line, and a minimal theme.
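The red line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit, so its slope and R-squared can be read from lm() directly. The sketch below uses base R; sample_rows is a hypothetical data frame with the five rows from the head(project) output, so the estimates describe that sample only.

```r
# Hypothetical sample: 2011 and 2021 values from the head(project) output
sample_rows <- data.frame(
  `2011` = c(381, 84581, 1384, 31206, 104099),
  `2021` = c(399, 91702, 1712, 35999, 128500),
  check.names = FALSE
)

# Same model that geom_smooth(method = "lm") fits: y ~ x
fit <- lm(`2021` ~ `2011`, data = sample_rows)

slope <- unname(coef(fit)[2])          # change in 2021 per unit of 2011
r_squared <- summary(fit)$r.squared    # share of variance explained

print(slope)
print(r_squared)
```

A slope above 1 means that, on average, larger 2011 populations are associated with more than proportionally larger 2021 values in the sample.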
. Create a scatter plot comparing each state’s absolute population growth from 2001–2011 against its growth from 2011–2021 to see if fast-growing states stayed fast-growing.
project$Growth_2001_2011 <- project$`2011` - project$`2001`
project$Growth_2011_2021 <- project$`2021` - project$`2011`
ggplot(project, aes(x = Growth_2001_2011, y = Growth_2011_2021)) +
geom_point(color = "darkblue") +
geom_smooth(method = "lm", se = FALSE, color = "orange") +
labs(
title = "Population Growth: 2001–2011 vs 2011–2021",
x = "Growth 2001–2011",
y = "Growth 2011–2021"
) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
Interpretation:-The scatter plot shows a general positive trend,
indicating that states with higher population growth in 2001–2011 tended
to continue growing in 2011–2021, but with considerable variation,
suggesting other factors influenced growth patterns.
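The strength of that trend can be summarised with a correlation coefficient. The sketch below uses base R; sample_rows is a hypothetical data frame with the five rows from the head(project) output, so the result is illustrative and says nothing definite about the full dataset.

```r
# Hypothetical sample: values copied from the head(project) output
sample_rows <- data.frame(
  `2001` = c(356, 76210, 1098, 26656, 82999),
  `2011` = c(381, 84581, 1384, 31206, 104099),
  `2021` = c(399, 91702, 1712, 35999, 128500),
  check.names = FALSE
)

# Absolute growth in each decade, as in the code above
growth_01_11 <- sample_rows$`2011` - sample_rows$`2001`
growth_11_21 <- sample_rows$`2021` - sample_rows$`2011`

# Pearson correlation between the two decades of growth
growth_cor <- cor(growth_01_11, growth_11_21)

print(growth_01_11)
print(growth_11_21)
print(growth_cor)
```

Absolute growth is dominated by the largest states, so a high correlation here partly reflects population size. Dividing each growth figure by the starting population would compare growth rates instead.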
. Create a pairplot for the population data of 1951, 1981, 2001, and 2021 to analyze the relationships between these years’ populations.
population_subset <- project[, c("1951", "1981", "2001", "2021")]
ggpairs(population_subset,
title = "Pairplot of Population (1951, 1981, 2001, 2021)")
Interpretation:-This R code selects the 1951, 1981, 2001, and 2021 population columns from the project dataset and creates a pairplot using ggpairs(), which shows the pairwise relationships between the populations in these census years.
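The pairwise relationships can also be summarised numerically as a correlation matrix. The sketch below uses base R; sample_rows is a hypothetical data frame with the five rows from the head(project) output, and use = "pairwise.complete.obs" is chosen here so that the missing 1951 value only affects the pairs that involve 1951.

```r
# Hypothetical sample: values copied from the head(project) output
sample_rows <- data.frame(
  `1951` = c(31, 31115, NA, 8029, 29085),
  `1981` = c(189, 53551, 632, 18041, 52303),
  `2001` = c(356, 76210, 1098, 26656, 82999),
  `2021` = c(399, 91702, 1712, 35999, 128500),
  check.names = FALSE
)

# Pearson correlations, using all complete pairs for each pair of years
cor_matrix <- cor(sample_rows, use = "pairwise.complete.obs")

print(round(cor_matrix, 3))
```

Correlations near 1 between distant census years would indicate that the ranking of states by population has stayed fairly stable over time.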
. Fit a second-degree polynomial regression of the 2021 population on the 1951 population?
ggplot(project, aes(x = `1951`, y = `2021`)) +
geom_point() +
geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE) +
labs(
title = "Polynomial Regression: Pop2021 ~ poly(Pop1951, 2)",
x = "1951 Population",
y = "2021 Population"
) +
theme_minimal()
Interpretation:-The scatter shows a strong positive relationship—states
that were more populous in 1951 tend to be more populous in 2021—but the
curved fit line tells us it isn’t just a straight‐line increase. The
upward curvature of the polynomial regression indicates that, up to a
point, states with larger 1951 populations experienced
disproportionately larger absolute growth by 2021, but at the very
highest initial populations the growth rate begins to taper off. In
other words, mid‐sized states “pulled ahead” fastest, while the very
smallest and very largest states saw somewhat slower change than a
strictly linear model would predict.