Week 5

Plots in R

Plots in R are visual representations of data that help analysts and data scientists explore, analyze, and communicate insights. R offers a rich ecosystem of plotting functions and libraries, each designed for specific data visualization needs.

#One categorical and one continuous variable (Bar Chart)
# Sample data
data <- data.frame(Category = c("A", "B", "A", "C", "B"), Value = c(10, 15, 8, 20, 12))

# Create a bar chart
barplot(data$Value, names.arg = data$Category, col = "skyblue", main = "Categorical vs. Continuous", xlab = "Category", ylab = "Value")

#The bar chart draws one bar per row of the data frame, so it shows five bars and the labels A and B each appear twice. Category is on the x-axis and Value is on the y-axis.
#The two highest values are 20 (C) and 15 (B).
#The two lowest values are 8 and 10, both in category A.
#With only five hand-picked observations, the chart illustrates the plotting call rather than supporting any conclusion about how Category affects Value.
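
#barplot() draws one bar per element of the vector it is given, so repeated categories produce repeated bars. A minimal sketch of one alternative, assuming a single bar per category showing the mean Value is what is wanted; the names sample_df and category_means are illustrative and not part of the original script.

```r
# Illustrative copy of the sample data (the name sample_df is an assumption)
sample_df <- data.frame(Category = c("A", "B", "A", "C", "B"),
                        Value = c(10, 15, 8, 20, 12))

# Mean Value within each Category
category_means <- tapply(sample_df$Value, sample_df$Category, mean)

# One bar per category instead of one bar per row
barplot(category_means, col = "skyblue", main = "Mean Value by Category",
        xlab = "Category", ylab = "Mean Value")
```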

#One continuous variable (Histogram)
# Sample data
data <- rnorm(100)

# Create a histogram
hist(data, col = "green", main = "Histogram", xlab = "Value", ylab = "Frequency")

#The histogram should look roughly bell-shaped and centred near 0, because rnorm(100) draws 100 values from a standard normal distribution (mean 0, standard deviation 1). The sample is random, so the exact shape, the sample mean, and any values in the tails change on every run unless a seed is set with set.seed().
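
#rnorm() draws new values on every run, so the histogram and its summary statistics change each time. A minimal sketch, assuming a fixed seed is acceptable for reproducibility; the seed value 42 and the name normal_sample are arbitrary choices for illustration.

```r
# Fix the random number generator so the same sample is drawn every run
set.seed(42)
normal_sample <- rnorm(100)

# Sample mean and standard deviation; these should be near 0 and 1
sample_mean <- mean(normal_sample)
sample_sd <- sd(normal_sample)

hist(normal_sample, col = "green", main = "Histogram",
     xlab = "Value", ylab = "Frequency")

# Mark the sample mean with a dashed vertical line
abline(v = sample_mean, lwd = 2, lty = 2)
```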

#Two continuous variables (Scatter Plot)
# Sample data
x <- rnorm(100)
y <- rnorm(100)

# Create a scatter plot
plot(x, y, col = "blue", main = "Scatter Plot", xlab = "X", ylab = "Y")

#The scatter plot places x on the horizontal axis and y on the vertical axis, with one point per pair of values.
#Because x and y are drawn independently with rnorm(100), the points form a roughly circular cloud centred near (0, 0) with no visible trend.
#Conclusion: the plot shows no relationship between x and y, which is what independent samples should produce.
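
#For two samples drawn independently with rnorm(), the sample correlation should be close to zero. A minimal sketch, assuming a fixed seed (123, chosen arbitrarily); x_sample, y_sample, and xy_cor are illustrative names.

```r
set.seed(123)
x_sample <- rnorm(100)
y_sample <- rnorm(100)

# Pearson correlation between the two independent samples
xy_cor <- cor(x_sample, y_sample)
xy_cor

plot(x_sample, y_sample, col = "blue", main = "Scatter Plot",
     xlab = "X", ylab = "Y")
```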

# Load the airquality dataset
data(airquality)

# Create a bar chart of the "Ozone" attribute, one bar per daily reading, labelled by month
barplot(airquality$Ozone, names.arg = airquality$Month, col = "skyblue",
        main = "Ozone Distribution by Month", xlab = "Month", ylab = "Ozone")

#The bar chart draws one vertical bar for each daily ozone reading in the airquality dataset (153 days, months 5 to 9), so it shows day-to-day variation rather than a single value per month. Missing readings (NA) leave gaps, and month 6 looks sparse largely because many of its readings are missing. Ozone is generally highest in months 7 and 8. A horizontal layout would need horiz = TRUE in the barplot() call.
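
#To compare months directly, one option is a single bar per month. A minimal sketch, assuming the monthly mean of the non-missing readings is a suitable summary; monthly_ozone is an illustrative name, and horiz = TRUE is what makes the bars horizontal.

```r
data(airquality)

# Mean ozone per month, ignoring missing readings
monthly_ozone <- tapply(airquality$Ozone, airquality$Month, mean, na.rm = TRUE)
monthly_ozone

# With horiz = TRUE the value axis is the x-axis and the months are on the y-axis
barplot(monthly_ozone, horiz = TRUE, col = "skyblue",
        main = "Mean Ozone by Month", xlab = "Mean Ozone (ppb)", ylab = "Month")
```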

# Create a histogram for the "Wind" attribute
hist(airquality$Wind, col = "green", main = "Wind Speed Distribution", xlab = "Wind Speed", ylab = "Frequency")

#The histogram shows the distribution of wind speed in the airquality dataset. Most values cluster around 10 mph, with fewer days at very low or very high wind speeds.
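
#A numeric summary can back up what the histogram shows. A minimal sketch, assuming the mean is a reasonable marker for the centre of the distribution; wind_mean is an illustrative name and breaks = 10 is only a suggestion that hist() may adjust.

```r
data(airquality)

# Five-number summary plus the mean of wind speed
summary(airquality$Wind)
wind_mean <- mean(airquality$Wind)

hist(airquality$Wind, breaks = 10, col = "green",
     main = "Wind Speed Distribution", xlab = "Wind Speed (mph)",
     ylab = "Frequency")

# Dashed vertical line at the mean wind speed
abline(v = wind_mean, lwd = 2, lty = 2)
```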

# Create a scatter plot for "Wind" vs. "Ozone"
plot(airquality$Wind, airquality$Ozone, col = "blue", main = "Wind Speed vs. Ozone", xlab = "Wind Speed", ylab = "Ozone")

#The scatter plot shows a negative relationship between wind speed and ozone concentration: ozone tends to be high on calm days and low on windy days. The correlation over the days with an ozone reading is roughly -0.6, so wind speed is clearly associated with ozone levels, although the plot alone does not establish causation.
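
#A visual impression from a scatter plot can be checked by computing the correlation. A minimal sketch, assuming Pearson correlation over the complete cases is an adequate summary; Ozone contains missing values, so use = "complete.obs" is needed. wind_ozone_cor and wind_ozone_fit are illustrative names.

```r
data(airquality)

# Pearson correlation, skipping days with a missing ozone reading
wind_ozone_cor <- cor(airquality$Wind, airquality$Ozone, use = "complete.obs")
wind_ozone_cor

# Straight-line fit of Ozone on Wind; lm() drops rows with NA by default
wind_ozone_fit <- lm(Ozone ~ Wind, data = airquality)

plot(airquality$Wind, airquality$Ozone, col = "blue",
     main = "Wind Speed vs. Ozone", xlab = "Wind Speed", ylab = "Ozone")
abline(wind_ozone_fit, col = "red")
```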

#Basic graphical data analysis of the mtcars dataset using base R plots:

#Bar plots represent data using rectangular bars of varying lengths. They are ideal for visualizing categorical data and comparing values between different categories.

# Bar chart showing the number of cars by the number of cylinders
barplot(table(mtcars$cyl), main = "Number of Cars by Cylinders", xlab = "Cylinders", ylab = "Count")

#8-cylinder cars form the largest group (14 of 32), followed by 4-cylinder cars (11) and 6-cylinder cars (7).

#Line plots are used to visualize trends over time. They connect data points with lines, making them suitable for time series data or data with a natural ordering.

# Line plot of horsepower by row index (mtcars has no time variable, so the x-axis is only the row position in the data frame)
plot(mtcars$hp, type = "l", main = "Horsepower by Car Index", xlab = "Car Index", ylab = "Horsepower")
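
#Line plots are most informative when the x-axis has a real ordering such as time. A minimal sketch using the daily temperatures in airquality, assuming the year 1973 given in the dataset documentation; obs_date is an illustrative name.

```r
data(airquality)

# Build a Date for each row from the Month and Day columns
obs_date <- as.Date(paste(1973, airquality$Month, airquality$Day, sep = "-"))

# Temperature over time, with observations connected in date order
plot(obs_date, airquality$Temp, type = "l",
     main = "Daily Temperature, May to September 1973",
     xlab = "Date", ylab = "Temperature (degrees F)")
```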

#Scatter plots display individual data points as dots on a two-dimensional graph. They are used to explore relationships between two continuous variables, making it easy to identify patterns and trends.

# Scatter plot showing the relationship between car weight and miles per gallon
plot(mtcars$wt, mtcars$mpg, main = "Car Weight vs. MPG", xlab = "Weight", ylab = "Miles per Gallon")
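
#The pattern in a scatter plot can be summarised with a correlation and a fitted line. A minimal sketch, assuming a straight-line fit is a reasonable first look at weight against miles per gallon; wt_mpg_cor and wt_mpg_fit are illustrative names.

```r
# Pearson correlation between weight and miles per gallon
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)
wt_mpg_cor

# Simple linear regression of mpg on weight
wt_mpg_fit <- lm(mpg ~ wt, data = mtcars)

plot(mtcars$wt, mtcars$mpg, main = "Car Weight vs. MPG",
     xlab = "Weight", ylab = "Miles per Gallon")

# Overlay the fitted regression line
abline(wt_mpg_fit, col = "red")
```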

# Pie charts display parts of a whole by dividing a circle into segments. While less commonly used for data visualization in R, they can represent categorical data as percentages of the whole.

# Count the number of cars by the number of cylinders
cylinder_counts <- table(mtcars$cyl)

# Create a pie chart
pie(cylinder_counts, main = "Car Counts by Cylinders")
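
#Pie slices are easier to read when each one is labelled with its share of the whole. A minimal sketch, assuming percentages rounded to one decimal place are wanted; cyl_counts, cyl_pct, and cyl_labels are illustrative names.

```r
# Number of cars for each cylinder count
cyl_counts <- table(mtcars$cyl)

# Share of each group as a percentage of all cars
cyl_pct <- round(100 * prop.table(cyl_counts), 1)

# Labels such as "4 cyl (..%)", one per slice
cyl_labels <- paste0(names(cyl_counts), " cyl (", cyl_pct, "%)")

pie(cyl_counts, labels = cyl_labels, main = "Car Counts by Cylinders")
```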

#Box plots display the distribution of data and its central tendency. They show the median, quartiles, and potential outliers, making them useful for comparing multiple groups or distributions.

# Box plot showing the distribution of car miles per gallon
boxplot(mtcars$mpg, main = "Miles per Gallon Distribution", ylab = "Miles per Gallon")
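
#Box plots are particularly useful for comparing groups side by side. A minimal sketch, assuming the number of cylinders is the grouping of interest; the formula mpg ~ cyl draws one box per cylinder count, and mpg_medians is an illustrative name.

```r
# One box per cylinder group
boxplot(mpg ~ cyl, data = mtcars, main = "Miles per Gallon by Cylinders",
        xlab = "Cylinders", ylab = "Miles per Gallon")

# Median mpg in each group, matching the thick line inside each box
mpg_medians <- tapply(mtcars$mpg, mtcars$cyl, median)
mpg_medians
```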

#Histograms display the distribution of a single continuous variable by grouping its values into bins and drawing a bar for the count in each bin.

# Histogram showing the distribution of car horsepower
hist(mtcars$hp, main = "Horsepower Distribution", xlab = "Horsepower", ylab = "Frequency")
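
#The bins behind a histogram can be inspected without drawing anything. A minimal sketch, assuming the default binning is of interest; hp_hist is an illustrative name, and breaks = 5 in the second call is only a suggestion that hist() may adjust.

```r
# Compute the histogram without plotting it
hp_hist <- hist(mtcars$hp, plot = FALSE)

# Bin edges and the number of cars falling in each bin
hp_hist$breaks
hp_hist$counts

# Ask for roughly 5 bins for a coarser view
hist(mtcars$hp, breaks = 5, main = "Horsepower Distribution",
     xlab = "Horsepower", ylab = "Frequency")
```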

Hemanth T
