Week 1: Data Visualization

## Introduction to R

R is a powerful, open-source programming language and environment designed for statistical computing and data analysis. It is widely used by data scientists and statisticians, and it offers extensive libraries, data visualization tools, and statistical techniques. Its flexibility and versatility make it a preferred choice for exploring, modeling, and visualizing data.

## Basic components

- Variables: store and manipulate data.
- Functions: perform operations or computations.
- Data structures: organize and store data.
- Control structures: control the flow of program execution (e.g., loops and conditionals).
- Libraries (packages): extend R's capabilities with specialized functions.
- Data import/export: read and write data from/to various formats.
- Documentation: comments and documentation that keep code clear.
- Graphics: create visualizations and plots.

## Data types in R

- Numeric: represents numbers (e.g., 3.14, -42).
- Character (string): stores text (e.g., "Hello, World!").
- Logical: represents TRUE or FALSE values.
- Date and time: handles date and time values.

# Define variables with different data types
numeric_var <- 18
character_var <- "hello how are you"
logical_var <- FALSE
date_time_var <- as.POSIXct("2023-01-15 14:30:00")

# Print variables
cat("Numeric Variable:", numeric_var, "\n")

## Numeric Variable: 18

cat("Character Variable:", character_var, "\n")

## Character Variable: hello how are you

cat("Logical Variable:", logical_var, "\n")

## Logical Variable: FALSE

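# cat() drops the POSIXct class and would print the raw number of seconds
# since 1970-01-01, so convert the date-time to text with format() first
cat("Date and Time Variable:", format(date_time_var), "\n")

## Date and Time Variable: 2023-01-15 14:30:00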
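You can ask R which type it assigned to each variable with class() and typeof(). The minimal sketch below recreates the variables above. A plain number such as 18 is stored as a double. To get a true integer you need the L suffix (18L).

```r
# Recreate the variables from the example above
numeric_var <- 18
character_var <- "hello how are you"
logical_var <- FALSE
date_time_var <- as.POSIXct("2023-01-15 14:30:00")

# class() reports the high-level type of each variable
print(class(numeric_var))     # "numeric"
print(class(character_var))   # "character"
print(class(logical_var))     # "logical"
print(class(date_time_var))   # "POSIXct" "POSIXt"

# typeof() shows the storage mode: 18 is a double, 18L is an integer
print(typeof(numeric_var))    # "double"
print(typeof(18L))            # "integer"
```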

## Data structures in R

- Vector: a one-dimensional array of elements of the same data type (e.g., c(1, 2, 3)).
- Matrix: a two-dimensional array of elements of the same data type.
- List: a versatile data structure that can hold elements of different data types (e.g., list(1, "apple", TRUE)).
- Data frame: a tabular structure whose columns can have different data types (e.g., data imported from a CSV file).

# Create data structures
vector_example <- c(1, 2, 3, 4, 5)
matrix_example <- matrix(1:6, nrow = 2, ncol = 3)
list_example <- list(1, "apple", TRUE)
data_frame_example <- data.frame(
  Name = c("vaishnavi", "lasya", "lisa"),
  Age = c(25, 18, 26),
  Score = c(99, 95, 90)
)
# Print data structures
cat("Vector Example:", vector_example, "\n")

## Vector Example: 1 2 3 4 5

cat("Matrix Example:\n")

## Matrix Example:

print(matrix_example)

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6

cat("List Example:\n")

## List Example:

print(list_example)

## [[1]]
## [1] 1
## 
## [[2]]
## [1] "apple"
## 
## [[3]]
## [1] TRUE

cat("Data Frame Example:\n")

## Data Frame Example:

print(data_frame_example)

##        Name Age Score
## 1 vaishnavi  25    99
## 2     lasya  18    95
## 3      lisa  26    90

Above, we created one example of each data structure. The data frame is built from three vectors of equal length, one per column. print() displays it as a neatly aligned table with numbered rows.
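The difference between a vector ("same data type") and a list ("different data types") shows up when you mix values. This sketch reuses the list(1, "apple", TRUE) example. Because a vector can hold only one type, c() coerces every element to character. A list keeps each element's original type.

```r
# A vector must hold one type, so c() coerces everything to character
mixed_vector <- c(1, "apple", TRUE)
print(class(mixed_vector))         # "character"
print(mixed_vector)                # "1" "apple" "TRUE"

# A list keeps each element's own type
mixed_list <- list(1, "apple", TRUE)
print(sapply(mixed_list, class))   # "numeric" "character" "logical"
```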

# a. Create two vectors of whole numbers (stored as doubles unless written as 10L)
vector1 <- c(10, 23, 34, 55)
vector2 <- c(43, 51, 62, 77)

# Add the two vectors element-wise
result_vector <- vector1 + vector2

# Print the result
print(result_vector)

## [1]  53  74  96 132

Here vector1 and vector2 are added element-wise (index by index). 10 + 43 = 53 becomes the first element of the result, 23 + 51 = 74 the second, and so on.
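To make "element-wise" concrete, here is a sketch of the same addition written as an explicit loop. R's vectorized `+` does the equivalent work internally, so the loop's result is identical to `vector1 + vector2`.

```r
vector1 <- c(10, 23, 34, 55)
vector2 <- c(43, 51, 62, 77)

# Explicit loop doing what `+` does on whole vectors:
# element i of the result is vector1[i] + vector2[i]
manual_sum <- numeric(length(vector1))
for (i in seq_along(vector1)) {
  manual_sum[i] <- vector1[i] + vector2[i]
}

print(manual_sum)                                # 53 74 96 132
print(identical(manual_sum, vector1 + vector2))  # TRUE
```

In practice the vectorized form is preferred because it is shorter and faster.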

# b. Create a vector
my_vector <- c(22, 98, 63, 85, 100)

# Calculate the sum, mean, and product
sum_result <- sum(my_vector)
mean_result <- mean(my_vector)
product_result <- prod(my_vector)

# Print the results
print(paste("Sum:", sum_result))

## [1] "Sum: 368"

print(paste("Mean:", mean_result))

## [1] "Mean: 73.6"

print(paste("Product:", product_result))

## [1] "Product: 1154538000"

Above, the built-in functions sum(), mean() and prod() return the sum, the arithmetic mean and the product of all the elements in the vector.
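As a quick check of what these functions compute, this sketch rebuilds the mean from sum() and length() and confirms that prod() is the plain product of the five elements.

```r
my_vector <- c(22, 98, 63, 85, 100)

# The arithmetic mean is the sum divided by the number of elements
manual_mean <- sum(my_vector) / length(my_vector)
print(manual_mean)                                       # 73.6
print(isTRUE(all.equal(manual_mean, mean(my_vector))))   # TRUE

# prod() multiplies every element together
print(prod(my_vector) == 22 * 98 * 63 * 85 * 100)        # TRUE
```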

# c. Create a vector
my_vector <- c(32, 15, 72)

# Find the minimum and maximum
min_value <- min(my_vector)
max_value <- max(my_vector)

# Print the results
print(paste("Minimum:", min_value))

## [1] "Minimum: 15"

print(paste("Maximum:", max_value))

## [1] "Maximum: 72"

We found the minimum and maximum of the vector with the built-in min() and max() functions.
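Two related base R functions are often useful alongside min() and max(). range() returns both extremes in one call, and which.min()/which.max() return the position of each extreme rather than its value. A short sketch:

```r
my_vector <- c(32, 15, 72)

# range() returns c(min, max) in a single call
print(range(my_vector))       # 15 72

# which.min() / which.max() return the index of the extreme value
print(which.min(my_vector))   # 2 (the 15)
print(which.max(my_vector))   # 3 (the 72)
```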

# d. Create a list
my_list <- list(
  string_element = "Hello heyy",
  numeric_element = 427,
  vector_element = c(1, 3, 3, 4),
  logical_element = TRUE
)

# Print the list
print(my_list)

## $string_element
## [1] "Hello heyy"
## 
## $numeric_element
## [1] 427
## 
## $vector_element
## [1] 1 3 3 4
## 
## $logical_element
## [1] TRUE

This is a heterogeneous list: it stores a character string, a single number, a numeric vector and a logical value side by side. print() shows each element under its $name, so the different types are easy to compare.
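Because each element keeps its own type, you can apply type-specific functions to it after extracting it with $. The sketch below reuses the list from exercise d. It uses nchar() on the string, sum() on the vector and logical negation on the flag.

```r
my_list <- list(
  string_element = "Hello heyy",
  numeric_element = 427,
  vector_element = c(1, 3, 3, 4),
  logical_element = TRUE
)

# Each element keeps its own type, so type-specific functions apply to it
print(nchar(my_list$string_element))  # 10 characters in "Hello heyy"
print(sum(my_list$vector_element))    # 1 + 3 + 3 + 4 = 11
print(!my_list$logical_element)       # FALSE
print(length(my_list))                # 4 elements in the list
```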

# e. Create a list with named elements
my_list <- list(
  vector_element = c(1, 2, 3),
  matrix_element = matrix(1:6, nrow = 2),
  nested_list = list(a = "apple", b = "banana")
)

# Access the first and second elements of the list
first_element <- my_list$vector_element
second_element <- my_list$matrix_element
# Print the accessed elements
print(first_element)

## [1] 1 2 3

print(second_element)

##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
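Besides $ by name, list elements can be accessed by position. `[[ ]]` returns the element itself, while `[ ]` returns a smaller list that still wraps it. Chained $ reaches into the nested list. A minimal sketch using the list from exercise e:

```r
my_list <- list(
  vector_element = c(1, 2, 3),
  matrix_element = matrix(1:6, nrow = 2),
  nested_list = list(a = "apple", b = "banana")
)

# [[ ]] by position returns the element itself (same as $ by name)
first_by_position <- my_list[[1]]
# [ ] returns a sub-list that still contains the element
first_as_list <- my_list[1]
# $ can be chained to reach inside the nested list
nested_value <- my_list$nested_list$b

print(class(first_by_position))  # "numeric"
print(class(first_as_list))      # "list"
print(nested_value)              # "banana"
```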

Next, we create and print a matrix in which every cell holds the same value.

# f. Create a 4x6 matrix in which every cell is 9
my_matrix <- matrix(9, nrow = 4, ncol = 6)

# Print the matrix
print(my_matrix)

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    9    9    9    9    9    9
## [2,]    9    9    9    9    9    9
## [3,]    9    9    9    9    9    9
## [4,]    9    9    9    9    9    9
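matrix() recycles a single fill value into every cell, so other constant matrices need only a different value and shape. The sketch below builds a 3x5 matrix of zeros and uses dim() to confirm its shape.

```r
# A single value is recycled to fill all 3 * 5 = 15 cells
zero_matrix <- matrix(0, nrow = 3, ncol = 5)
print(zero_matrix)

# dim() returns c(number of rows, number of columns)
print(dim(zero_matrix))        # 3 5
print(all(zero_matrix == 0))   # TRUE
```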

# g. Create a sample 4x4 matrix (filled column by column)
my_matrix <- matrix(1:16, nrow = 4)

# Access specific elements
element_1 <- my_matrix[2, 3]  # 2nd row, 3rd column
element_2 <- my_matrix[3, ]   # 3rd row
element_3 <- my_matrix[, 4]   # 4th column

# Print the accessed elements
print(element_1)

## [1] 10

print(element_2)

## [1]  3  7 11 15

print(element_3)

## [1] 13 14 15 16
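my_matrix[2, 3] is 10 because matrix() fills values column by column by default (column-major order). This sketch contrasts that with byrow = TRUE. It also shows the equivalent single index into the underlying vector, which is (column - 1) * nrow + row.

```r
# Default: fill column by column (column 3 holds 9, 10, 11, 12)
by_column <- matrix(1:16, nrow = 4)
# byrow = TRUE: fill row by row (row 2 holds 5, 6, 7, 8)
by_row <- matrix(1:16, nrow = 4, byrow = TRUE)

print(by_column[2, 3])   # 10
print(by_row[2, 3])      # 7

# [row, col] in a 4-row column-major matrix maps to index (col - 1) * 4 + row
print(by_column[(3 - 1) * 4 + 2])   # 10
```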

# h. Create vectors
name <- c("Alice", "cinderella", "beauty")
age <- c(20, 20, 25)

# Create a DataFrame
df <- data.frame(Name = name, Age = age)

# Display the DataFrame
print(df)

##         Name Age
## 1      Alice  20
## 2 cinderella  20
## 3     beauty  25

# i. Create a DataFrame
df <- data.frame(Name = c("Alice", "cinderella"), Age = c(20, 20))

# New data to insert
new_data <- data.frame(Name = c("Charlie", "David"), Age = c(35, 40))

# Insert new rows
df <- rbind(df, new_data)

# Display the updated DataFrame
print(df)

##         Name Age
## 1      Alice  20
## 2 cinderella  20
## 3    Charlie  35
## 4      David  40
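rbind() on data frames matches columns by name, not by position. New rows can therefore list their columns in a different order, but the column names must agree. In this sketch the row "Eve" and the column "Years" are hypothetical, used only to trigger the mismatch. tryCatch() turns the error into a message so the script keeps running.

```r
df <- data.frame(Name = c("Alice", "cinderella"), Age = c(20, 20))
new_data <- data.frame(Name = c("Charlie", "David"), Age = c(35, 40))

# Columns in a different order are still matched by name
reordered <- new_data[, c("Age", "Name")]
combined <- rbind(df, reordered)
print(combined)

# A data frame with a different column name is rejected
result <- tryCatch(
  rbind(df, data.frame(Name = "Eve", Years = 30)),
  error = function(e) "error: names do not match"
)
print(result)
```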

# j. Create a DataFrame
df <- data.frame(Name = c("Alice", "Bob", "charle2"), Age = c(25, 30, 40))

# Add a new column
df$Salary <- c(50000, 60000, 80000)

# Display the updated DataFrame
print(df)

##      Name Age Salary
## 1   Alice  25  50000
## 2     Bob  30  60000
## 3 charle2  40  80000
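A new column does not have to be typed in by hand. It can be computed from existing columns row by row, and a single value is recycled down every row. In this sketch, Senior and Team are hypothetical column names chosen for illustration.

```r
df <- data.frame(Name = c("Alice", "Bob", "charle2"), Age = c(25, 30, 40))
df$Salary <- c(50000, 60000, 80000)

# Computed column: a logical flag derived from Age, one value per row
df$Senior <- df$Age >= 30
# Constant column: the single value is recycled to all three rows
df$Team <- "Analytics"

print(df)
```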

# k. Create a DataFrame
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "David"), Age = c(25, 30, 35, 40))

# Extract the first 2 rows
first_two_rows <- df[1:2, ]

# Display the extracted rows
print(first_two_rows)

##    Name Age
## 1 Alice  25
## 2   Bob  30
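Two related ways to slice rows are head(df, n), which returns the first n rows just like df[1:n, ], and negative indices, which drop rows instead of keeping them. A short sketch:

```r
df <- data.frame(Name = c("Alice", "Bob", "Charlie", "David"),
                 Age = c(25, 30, 35, 40))

# head(df, 2) gives the same rows as df[1:2, ]
print(isTRUE(all.equal(head(df, 2), df[1:2, ])))  # TRUE

# Negative indices drop rows: everything except the first two
remaining <- df[-(1:2), ]
print(remaining)
```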

# l. Create a DataFrame
df <- data.frame(Name = c("Cinderela", "lisa", "Boba"), Age = c(35, 25, 30))

# Sort the DataFrame by the "Age" column
sorted_df <- df[order(df$Age), ]

# Display the sorted DataFrame
print(sorted_df)

##        Name Age
## 2      lisa  25
## 3      Boba  30
## 1 Cinderela  35
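The sort works because order() returns the row positions that would put Age in ascending order, and those positions are then used to index the rows. Passing decreasing = TRUE reverses the direction, as this sketch shows.

```r
df <- data.frame(Name = c("Cinderela", "lisa", "Boba"), Age = c(35, 25, 30))

# order() returns row positions, not sorted values
print(order(df$Age))   # 2 3 1

# decreasing = TRUE sorts from oldest to youngest
oldest_first <- df[order(df$Age, decreasing = TRUE), ]
print(oldest_first)
```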

# m. Create two DataFrames
df1 <- data.frame(ID = 1:3, Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = 2:4, Salary = c(50000, 60000, 70000))

# Merge on the "ID" column; all = TRUE keeps IDs found in either data frame
# (a full outer join) and fills the missing values with NA
merged_df <- merge(df1, df2, by = "ID", all = TRUE)

# Display the merged DataFrame
print(merged_df)

##   ID    Name Salary
## 1  1   Alice     NA
## 2  2     Bob  50000
## 3  3 Charlie  60000
## 4  4    <NA>  70000
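The all argument of merge() controls which IDs survive. The default (all = FALSE) is an inner join that keeps only IDs present in both data frames. all.x = TRUE is a left join that keeps every row of the first data frame. A sketch with the same two data frames:

```r
df1 <- data.frame(ID = 1:3, Name = c("Alice", "Bob", "Charlie"))
df2 <- data.frame(ID = 2:4, Salary = c(50000, 60000, 70000))

# Inner join: only IDs 2 and 3 appear in both data frames
inner <- merge(df1, df2, by = "ID")
print(inner)

# Left join: every row of df1 is kept; ID 1 gets Salary = NA
left <- merge(df1, df2, by = "ID", all.x = TRUE)
print(left)
```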

# n. Create two DataFrames
df1 <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
df2 <- data.frame(Name = c("Charlie", "David"), Age = c(35, 40))

# Append df2 to the end of df1
appended_df <- rbind(df1, df2)

# Display the appended DataFrame
print(appended_df)

##      Name Age
## 1   Alice  25
## 2     Bob  30
## 3 Charlie  35
## 4   David  40

# o. Load the dplyr package
library(dplyr)

## Warning: package 'dplyr' was built under R version 4.3.2

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# Create a sample DataFrame
df <- data.frame(Group = c("A", "A", "B", "B", "C"),
                 Value = c(10, 15, 25, 20, 30))

# Select rows with maximum value in each group
result <- df %>%
  group_by(Group) %>%
  filter(Value == max(Value))

print(result)

## # A tibble: 3 × 2
## # Groups:   Group [3]
##   Group Value
##   <chr> <dbl>
## 1 A        15
## 2 B        25
## 3 C        30
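The same "maximum value per group" selection can be done in base R without dplyr. This sketch swaps in ave(), which computes each group's maximum and repeats it on every row of that group. Rows whose Value equals that maximum are kept, so ties within a group would keep all tied rows, as the dplyr filter does.

```r
df <- data.frame(Group = c("A", "A", "B", "B", "C"),
                 Value = c(10, 15, 25, 20, 30))

# ave() returns the group maximum aligned with every row: 15 15 25 25 30
group_max <- ave(df$Value, df$Group, FUN = max)

# Keep the rows that match their group's maximum
result_base <- df[df$Value == group_max, ]
print(result_base)
```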

# p. Create two dataframes
df1 <- data.frame(ID = 1:4, Name = c("Alice", "Bob", "Charlie", "David"))
df2 <- data.frame(ID = 2:5, Salary = c(50000, 60000, 70000, 55000))

# Merge the dataframes based on the "ID" column
merged_df <- merge(df1, df2, by = "ID", all = TRUE)

# Display the merged dataframe
print(merged_df)

##   ID    Name Salary
## 1  1   Alice     NA
## 2  2     Bob  50000
## 3  3 Charlie  60000
## 4  4   David  70000
## 5  5    <NA>  55000

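# q.a. Read a number from the console
# readline() waits for input only in an interactive session; when the document
# is knitted it returns "" at once, so as.numeric() gives NA (as shown below)
data <- as.numeric(readline("Enter a number: "))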

## Enter a number:

print(data)

## [1] NA
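To avoid the NA when the document is knitted, one option is to check interactive() and fall back to a default value. The sketch below uses read_number(), a hypothetical helper. The default argument is also an assumption: in an interactive session the helper prompts as before, and it returns the default if the input is not a number.

```r
# Hypothetical helper: prompt only when a user can actually answer
read_number <- function(prompt = "Enter a number: ", default = 0) {
  if (!interactive()) {
    return(default)   # knitting / Rscript: no console input available
  }
  value <- suppressWarnings(as.numeric(readline(prompt)))
  if (is.na(value)) default else value   # non-numeric input falls back too
}

data <- read_number(default = 5)
print(data)
```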

# q.b. Read data from a CSV file
data <- read.csv("C:/Users/chvss/Downloads/student-mat.csv")
# The data set is large, so print only the first six rows with head()
print(head(data))

##   school sex age address famsize Pstatus Medu Fedu     Mjob     Fjob     reason
## 1     GP   F  18       U     GT3       A    4    4  at_home  teacher     course
## 2     GP   F  17       U     GT3       T    1    1  at_home    other     course
## 3     GP   F  15       U     LE3       T    1    1  at_home    other      other
## 4     GP   F  15       U     GT3       T    4    2   health services       home
## 5     GP   F  16       U     GT3       T    3    3    other    other       home
## 6     GP   M  16       U     LE3       T    4    3 services    other reputation
##   guardian traveltime studytime failures schoolsup famsup paid activities
## 1   mother          2         2        0       yes     no   no         no
## 2   father          1         2        0        no    yes   no         no
## 3   mother          1         2        3       yes     no  yes         no
## 4   mother          1         3        0        no    yes  yes        yes
## 5   father          1         2        0        no    yes  yes         no
## 6   mother          1         2        0        no    yes  yes        yes
##   nursery higher internet romantic famrel freetime goout Dalc Walc health
## 1     yes    yes       no       no      4        3     4    1    1      3
## 2      no    yes      yes       no      5        3     3    1    1      3
## 3     yes    yes      yes       no      4        3     2    2    3      3
## 4     yes    yes      yes      yes      3        2     2    1    1      5
## 5     yes    yes       no       no      4        3     2    1    2      5
## 6     yes    yes      yes       no      5        4     2    1    2      5
##   absences G1 G2 G3
## 1        6  5  6  6
## 2        4  5  5  6
## 3       10  7  8 10
## 4        2 15 14 15
## 5        4  6 10 10
## 6       10 15 15 15
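read.csv() can also parse CSV content given as a string through its text argument. That makes it easy to try out without the file on disk. The rows below are copied from columns school, sex, age and G3 of the head() output above. dim() and str() are quick checks after loading any CSV.

```r
# A few rows copied from the head() output above (subset of columns)
csv_text <- "school,sex,age,G3
GP,F,18,6
GP,F,17,6
GP,M,16,15"

students <- read.csv(text = csv_text)

print(dim(students))        # 3 4 (rows, columns)
str(students)               # column names and their detected types
print(mean(students$G3))    # 9
```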


vaishnavi chillara

2023-11-14