## Introduction to R
R is a powerful, open-source programming language and environment designed for statistical computing and data analysis. Widely used by data scientists and statisticians, R offers extensive libraries, data visualization tools, and statistical techniques. Its flexibility and versatility make it a preferred choice for exploring, modeling, and visualizing data.
## Basic components
- Variables: Store and manipulate data.
- Functions: Perform operations or computations.
- Data Structures: Organize and store data.
- Control Structures: Control the flow of program execution (e.g., loops and conditionals).
- Libraries (Packages): Extend R's capabilities with specialized functions.
- Data Import/Export: Read and write data from/to various formats.
- Documentation: Add comments and documentation for code clarity.
- Graphics: Create visualizations and plots.
## Data types in R
- Numeric: Represents numbers (e.g., 3.14, -42).
- Character (String): Stores text (e.g., "Hello, World!").
- Logical: Represents TRUE or FALSE values.
- Date and Time: Handles date and time values.
# Define variables with different data types
numeric_var <- 18
character_var <- "hello how are you"
logical_var <- FALSE
date_time_var <- as.POSIXct("2023-01-15 14:30:00")
# Print variables
cat("Numeric Variable:", numeric_var, "\n")
## Numeric Variable: 18
cat("Character Variable:", character_var, "\n")
## Character Variable: hello how are you
cat("Logical Variable:", logical_var, "\n")
## Logical Variable: FALSE
# cat() prints a POSIXct value as a raw number of seconds, so format() it first
cat("Date and Time Variable:", format(date_time_var), "\n")
## Date and Time Variable: 2023-01-15 14:30:00
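To confirm which type R has assigned to each value, `class()` can be called on the variables. This is a minimal sketch; the variables are redefined here only so that the snippet runs on its own.

```r
# Values mirroring the variables defined above
numeric_var   <- 18
character_var <- "hello how are you"
logical_var   <- FALSE
date_time_var <- as.POSIXct("2023-01-15 14:30:00")

# class() reports the type of each object
print(class(numeric_var))    # "numeric"
print(class(character_var))  # "character"
print(class(logical_var))    # "logical"
print(class(date_time_var))  # "POSIXct" "POSIXt"
```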
## Data structures in R
- Vector: A one-dimensional array of elements of the same data type (e.g., c(1, 2, 3)).
- Matrix: A two-dimensional array of elements of the same data type.
- List: A versatile data structure that can hold elements of different data types (e.g., list(1, "apple", TRUE)).
- Data Frame: A tabular structure where columns can have different data types (e.g., data imported from a CSV file).
# Create data structures
vector_example <- c(1, 2, 3, 4, 5)
matrix_example <- matrix(1:6, nrow = 2, ncol = 3)
list_example <- list(1, "apple", TRUE)
data_frame_example <- data.frame(
  Name = c("vaishnavi", "lasya", "lisa"),
  Age = c(25, 18, 26),
  Score = c(99, 95, 90)
)
# Print data structures
cat("Vector Example:", vector_example, "\n")
## Vector Example: 1 2 3 4 5
cat("Matrix Example:\n")
## Matrix Example:
print(matrix_example)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
cat("List Example:\n")
## List Example:
print(list_example)
## [[1]]
## [1] 1
##
## [[2]]
## [1] "apple"
##
## [[3]]
## [1] TRUE
cat("Data Frame Example:\n")
## Data Frame Example:
print(data_frame_example)
##        Name Age Score
## 1 vaishnavi  25    99
## 2     lasya  18    95
## 3      lisa  26    90
Above, we created a data frame from vectors of different data types; it is printed in a well-organised tabular form.
# a. Create two numeric vectors
vector1 <- c(10, 23, 34, 55)
vector2 <- c(43, 51, 62, 77)
# Add the two vectors element-wise
result_vector <- vector1 + vector2
# Print the result
print(result_vector)
## [1]  53  74  96 132
Here, vector1 and vector2 are added element-wise: 10 + 43 = 53 becomes the first element of the result, 23 + 51 = 74 the second, and so on.
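To make the element-wise behaviour explicit, the same result can be built one position at a time with a loop. This is only an illustrative sketch (the name `manual_sum` is not from the original); in practice the vectorised `+` is the idiomatic choice.

```r
vector1 <- c(10, 23, 34, 55)
vector2 <- c(43, 51, 62, 77)

# Build the sum one position at a time
manual_sum <- numeric(length(vector1))
for (i in seq_along(vector1)) {
  manual_sum[i] <- vector1[i] + vector2[i]
}

# The loop and the vectorised "+" give the same values
print(all(manual_sum == vector1 + vector2))  # TRUE
```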
# b. Create a vector
my_vector <- c(22, 98, 63, 85, 100)
# Calculate the sum, mean, and product
sum_result <- sum(my_vector)
mean_result <- mean(my_vector)
product_result <- prod(my_vector)
# Print the results
print(paste("Sum:", sum_result))
## [1] "Sum: 368"
print(paste("Mean:", mean_result))
## [1] "Mean: 73.6"
print(paste("Product:", product_result))
## [1] "Product: 1154538000"
Above, the built-in functions sum(), mean(), and prod() compute the sum, mean, and product of the vector.
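As a quick check of what `mean()` computes, the mean can also be derived by hand as the sum divided by the number of elements. A minimal sketch, with `manual_mean` as an illustrative name:

```r
my_vector <- c(22, 98, 63, 85, 100)

# Mean = sum of the elements / number of elements
manual_mean <- sum(my_vector) / length(my_vector)
print(manual_mean)

# all.equal() compares numbers with a small tolerance for floating-point error
print(isTRUE(all.equal(manual_mean, mean(my_vector))))  # TRUE
```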
# c. Create a vector
my_vector <- c(32, 15, 72)
# Find the minimum and maximum
min_value <- min(my_vector)
max_value <- max(my_vector)
# Print the results
print(paste("Minimum:", min_value))
## [1] "Minimum: 15"
print(paste("Maximum:", max_value))
## [1] "Maximum: 72"
We found the minimum and maximum of the vector using the built-in functions min() and max().
# d. Create a list
my_list <- list(
  string_element = "Hello heyy",
  numeric_element = 427,
  vector_element = c(1, 3, 3, 4),
  logical_element = TRUE
)
# Print the list
print(my_list)
## $string_element
## [1] "Hello heyy"
##
## $numeric_element
## [1] 427
##
## $vector_element
## [1] 1 3 3 4
##
## $logical_element
## [1] TRUE
The list above is heterogeneous: it holds a string, a number, a numeric vector, and a logical value, and printing it shows each element under its own name so the differences between them are visible.
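One way to see the mix of types in such a list is to apply `class()` to every element with `sapply()`. A small sketch; `element_types` is an illustrative name.

```r
my_list <- list(
  string_element  = "Hello heyy",
  numeric_element = 427,
  vector_element  = c(1, 3, 3, 4),
  logical_element = TRUE
)

# sapply() calls class() on each element and keeps the element names
element_types <- sapply(my_list, class)
print(element_types)
```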
# e. Create a list with named elements
my_list <- list(
  vector_element = c(1, 2, 3),
  matrix_element = matrix(1:6, nrow = 2),
  nested_list = list(a = "apple", b = "banana")
)
# Access the first and second elements of the list
first_element <- my_list$vector_element
second_element <- my_list$matrix_element
# Print the accessed elements
print(first_element)
## [1] 1 2 3
print(second_element)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
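Besides `$`, list elements can be reached by position or by name with double brackets, and nested lists can be accessed by chaining. The sketch below reuses the list from this example; the names `by_position`, `by_name`, and `banana` are illustrative.

```r
my_list <- list(
  vector_element = c(1, 2, 3),
  matrix_element = matrix(1:6, nrow = 2),
  nested_list    = list(a = "apple", b = "banana")
)

# Double brackets extract the element itself, by position or by name
by_position <- my_list[[1]]
by_name     <- my_list[["vector_element"]]
print(identical(by_position, by_name))  # TRUE

# Chain $ to reach inside the nested list
banana <- my_list$nested_list$b
print(banana)  # "banana"

# Single brackets return a sub-list rather than the element
print(class(my_list[1]))  # "list"
```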
Next, we create and print a matrix.
# f. Create a 4x6 matrix filled with 9s
my_matrix <- matrix(9, nrow = 4, ncol = 6)
# Print the matrix
print(my_matrix)
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    9    9    9    9    9    9
## [2,]    9    9    9    9    9    9
## [3,]    9    9    9    9    9    9
## [4,]    9    9    9    9    9    9
#g. Create a sample matrix
my_matrix <- matrix(1:16, nrow = 4)
# Access specific elements
element_1 <- my_matrix[2, 3] # row 2, column 3
element_2 <- my_matrix[3, ] # 3rd row
element_3 <- my_matrix[, 4] # 4th column
# Print the accessed elements
print(element_1)
## [1] 10
print(element_2)
## [1]  3  7 11 15
print(element_3)
## [1] 13 14 15 16
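The value at `[2, 3]` is 10 because `matrix()` fills column by column by default. The sketch below contrasts this with `byrow = TRUE`; the names `by_column` and `by_row` are illustrative.

```r
by_column <- matrix(1:16, nrow = 4)                # default: fill down the columns
by_row    <- matrix(1:16, nrow = 4, byrow = TRUE)  # fill across the rows

print(by_column[2, 3])  # 10
print(by_row[2, 3])     # 7

# A range of rows and columns returns a sub-matrix
print(by_column[1:2, 3:4])
```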
#h. Create vectors
name <- c("Alice", "cinderella", "beauty")
age <- c(20, 20, 25)
# Create a DataFrame
df <- data.frame(Name = name, Age = age)
# Display the DataFrame
print(df)
##         Name Age
## 1      Alice  20
## 2 cinderella  20
## 3     beauty  25
#i. Create a DataFrame
df <- data.frame(Name = c("Alice", "cinderella"), Age = c(20, 20))
# New data to insert
new_data <- data.frame(Name = c("Charlie", "David"), Age = c(35, 40))
# Insert new rows
df <- rbind(df, new_data)
# Display the updated DataFrame
print(df)
##         Name Age
## 1      Alice  20
## 2 cinderella  20
## 3    Charlie  35
## 4      David  40
#j. Create a DataFrame
df <- data.frame(Name = c("Alice", "Bob", "charle2"), Age = c(25, 30, 40))
# Add a new column
df$Salary <- c(50000, 60000, 80000)
# Display the updated DataFrame
print(df)
##      Name Age Salary
## 1   Alice  25  50000
## 2     Bob  30  60000
## 3 charle2  40  80000
#k. Create a DataFrame
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "David"), Age = c(25, 30, 35, 40))
# Extract the first 2 rows
first_two_rows <- df[1:2, ]
# Display the extracted rows
print(first_two_rows)
##    Name Age
## 1 Alice  25
## 2   Bob  30
#l. Create a DataFrame
df <- data.frame(Name = c("Cinderela", "lisa", "Boba"), Age = c(35, 25, 30))
# Sort the DataFrame by the "Age" column
sorted_df <- df[order(df$Age), ]
# Display the sorted DataFrame
print(sorted_df)
##        Name Age
## 2      lisa  25
## 3      Boba  30
## 1 Cinderela  35
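Note that the sorted data frame keeps its original row names (2, 3, 1). As a sketch of two common variations, the snippet below sorts in descending order and then resets the row names; `sorted_desc` is an illustrative name.

```r
df <- data.frame(Name = c("Cinderela", "lisa", "Boba"), Age = c(35, 25, 30))

# order(..., decreasing = TRUE) gives the row positions from oldest to youngest
sorted_desc <- df[order(df$Age, decreasing = TRUE), ]

# Reset the row names so they run 1, 2, 3 again
rownames(sorted_desc) <- NULL
print(sorted_desc)
```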
#m. Create two DataFrames
df1 <- data.frame(ID = 1:3, Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = 2:4, Salary = c(50000, 60000, 70000))
# Merge the DataFrames based on the "ID" column
merged_df <- merge(df1, df2, by = "ID", all = TRUE)
# Display the merged DataFrame
print(merged_df)
##   ID    Name Salary
## 1  1   Alice     NA
## 2  2     Bob  50000
## 3  3 Charlie  60000
## 4  4    <NA>  70000
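With `all = TRUE`, `merge()` keeps every ID from both data frames and fills the gaps with NA. For comparison, the sketch below shows the default (matching IDs only) and `all.x = TRUE` (all rows of the first data frame); `inner_df` and `left_df` are illustrative names.

```r
df1 <- data.frame(ID = 1:3, Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = 2:4, Salary = c(50000, 60000, 70000))

# Default: keep only IDs present in both data frames
inner_df <- merge(df1, df2, by = "ID")
print(inner_df)

# all.x = TRUE: keep every row of df1, with NA where df2 has no match
left_df <- merge(df1, df2, by = "ID", all.x = TRUE)
print(left_df)
```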
#n. Create two DataFrames
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
df2 <- data.frame(Name = c("Charlie", "David"), Age = c(35, 40))
# Append df2 to the end of df1
appended_df <- rbind(df1, df2)
# Display the appended DataFrame
print(appended_df)
##      Name Age
## 1   Alice  25
## 2     Bob  30
## 3 Charlie  35
## 4   David  40
# o. Load the dplyr package
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.3.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Create a sample DataFrame
df <- data.frame(Group = c("A", "A", "B", "B", "C"),
                 Value = c(10, 15, 25, 20, 30))
# Select rows with maximum value in each group
result <- df %>%
  group_by(Group) %>%
  filter(Value == max(Value))
print(result)
## # A tibble: 3 × 2
## # Groups:   Group [3]
##   Group Value
##   <chr> <dbl>
## 1 A        15
## 2 B        25
## 3 C        30
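The same selection can be sketched in base R without dplyr. This is an alternative approach, not the one used above: `ave()` computes the group maximum for every row, and the rows whose value equals that maximum are kept. The names `group_max` and `result_base` are illustrative.

```r
df <- data.frame(Group = c("A", "A", "B", "B", "C"),
                 Value = c(10, 15, 25, 20, 30))

# ave() returns, for each row, the maximum Value within that row's Group
group_max <- ave(df$Value, df$Group, FUN = max)

# Keep the rows whose Value equals their group's maximum
result_base <- df[df$Value == group_max, ]
print(result_base)
```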
#p. Create two dataframes
df1 <- data.frame(ID = 1:4, Name = c("Alice", "Bob", "Charlie", "David"))
df2 <- data.frame(ID = 2:5, Salary = c(50000, 60000, 70000, 55000))
# Merge the dataframes based on the "ID" column
merged_df <- merge(df1, df2, by = "ID", all = TRUE)
# Display the merged dataframe
print(merged_df)
##   ID    Name Salary
## 1  1   Alice     NA
## 2  2     Bob  50000
## 3  3 Charlie  60000
## 4  4   David  70000
## 5  5    <NA>  55000
# q.a. Read data from the console
data <- as.numeric(readline("Enter a number: "))
## Enter a number:
print(data)
## [1] NA
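The NA printed above is what one would expect when the code runs non-interactively (for example, while the document is being rendered): `readline()` then returns an empty string, which `as.numeric()` converts to NA. The sketch below wraps the call in a hypothetical helper, `read_number()`, that falls back to a default value in that case; the helper and its `default` argument are assumptions for illustration, not part of the original code.

```r
# Hypothetical helper: prompt for a number and fall back to `default`
# when no valid number is entered (e.g. in a non-interactive session)
read_number <- function(prompt = "Enter a number: ", default = 0) {
  input <- if (interactive()) readline(prompt) else ""
  value <- suppressWarnings(as.numeric(input))
  if (is.na(value)) default else value
}

data <- read_number()
print(data)
```

In an interactive session the function returns whatever number is typed; text that cannot be parsed as a number also yields the default.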
# q.b. Read data from a CSV file
data <- read.csv("C:/Users/chvss/Downloads/student-mat.csv")
# The data is read successfully; the data set is large, so we print only the first few rows
print(head(data))
##   school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob     reason
## 1     GP   F  18       U     GT3       A    4    4  at_home  teacher     course
## 2     GP   F  17       U     GT3       T    1    1  at_home    other     course
## 3     GP   F  15       U     LE3       T    1    1  at_home    other      other
## 4     GP   F  15       U     GT3       T    4    2   health services       home
## 5     GP   F  16       U     GT3       T    3    3    other    other       home
## 6     GP   M  16       U     LE3       T    4    3 services    other reputation
##   guardian traveltime studytime failures schoolsup famsup paid activities
## 1   mother          2         2        0       yes     no   no         no
## 2   father          1         2        0        no    yes   no         no
## 3   mother          1         2        3       yes     no  yes         no
## 4   mother          1         3        0        no    yes  yes        yes
## 5   father          1         2        0        no    yes  yes         no
## 6   mother          1         2        0        no    yes  yes        yes
##   nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1     yes    yes       no       no      4        3     4    1    1      3
## 2      no    yes      yes       no      5        3     3    1    1      3
## 3     yes    yes      yes       no      4        3     2    2    3      3
## 4     yes    yes      yes      yes      3        2     2    1    1      5
## 5     yes    yes       no       no      4        3     2    1    2      5
## 6     yes    yes      yes       no      5        4     2    1    2      5
##   absences G1 G2 G3
## 1        6  5  6  6
## 2        4  5  5  6
## 3       10  7  8 10
## 4        2 15 14 15
## 5        4  6 10 10
## 6       10 15 15 15
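The file path above is specific to the author's machine. As a self-contained sketch of the same read step, the snippet below first writes a small illustrative data frame to a temporary CSV file and then reads it back; `sample_df`, `csv_path`, and `loaded_df` are illustrative names, not part of the original data.

```r
# Small illustrative data set (not the student-mat.csv data)
sample_df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))

# Write it to a temporary file, without the row-name column
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_df, csv_path, row.names = FALSE)

# Read it back and inspect the first rows and the dimensions
loaded_df <- read.csv(csv_path)
print(head(loaded_df))
print(dim(loaded_df))
```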