1 What is Data Analysis?

  • The process of examining datasets (Telling stories).
  • Supports decision-making and problem-solving.
  • Common tools: Excel (for quick analysis) & R (for advanced statistics and automation).

2 Importance of Data Analytics:

  • Improves Decision-Making: Enables data-driven decisions, reducing guesswork.
  • Identifies Trends and Patterns: Helps businesses, researchers, and policymakers detect insights that may not be immediately obvious. Think of yourself as detectives.
  • Enhances Efficiency: Automates processes, reduces errors, and saves time in analysis.
  • Supports Predictive Analysis: Forecasts future trends and behaviors based on past data.
  • Improves Public Health and Research: Helps analyze disease trends, evaluate treatments, and optimize healthcare resource allocation.

3 Overview of Excel for Data Analytics

  • Importing data from various sources.
  • Data cleaning and transformation to prepare datasets.
  • Data visualization using charts and pivot tables.
  • Basic statistical analysis using built-in functions.

3.1 Importing Data in Excel

  • From CSV/Excel Files: Data > Get Data > From File.
  • From Databases: Data > Get Data > From Database.
  • Power Query: Efficient data transformation.
  • Text to Columns: Convert structured text into separate columns.

3.2 Data Cleaning in Excel

  • Remove Duplicates: Data > Remove Duplicates.
  • Find & Replace: Ctrl + H to replace incorrect values.
  • Handling Missing Data:
    • =IF(ISBLANK(A2), "Missing", A2) to flag missing values.
    • Replace missing values: =IF(ISBLANK(A2), AVERAGE(A:A), A2).
    • Go To Special (F5 > Special > Blanks) to highlight missing cells.

3.3 Data Transformation in Excel

  • Sorting & Filtering: Data > Sort, Data > Filter.
  • Flash Fill: Ctrl + E to auto-fill patterns.
  • Pivot Tables:
    • Insert > PivotTable, drag fields to Rows, Columns, and Values.
    • Use Value Field Settings for custom summaries.

3.4 Data Visualization in Excel

  • Insert Charts: Bar, Line, Pie, Histogram.
  • Customize Charts: Design > Chart Styles.
  • Conditional Formatting:
    • Highlight trends (Color Scales, Data Bars, Icons).
    • Dynamic rules (e.g., Cell Value > 100).

3.5 Basic Statistical Analysis in Excel

  • Mean: =AVERAGE(range)
  • Median: =MEDIAN(range)
  • Mode: =MODE.SNGL(range)
  • Standard Deviation: =STDEV.P(range)
  • Correlation: =CORREL(range1, range2)
  • Regression Analysis:
    • Data > Data Analysis > Regression.
    • Set Y Range (Dependent) and X Range (Independent).

4 Overview of R for Data Analytics

  • Importing and cleaning data efficiently.
  • Data visualization using ggplot2.
  • Statistical analysis and automation.

4.1 Importing Data in R

# Load dataset from CSV
my_data <- read.csv("health_study_data.csv")
head(my_data)

4.2 Data Cleaning in R

# Check for missing values
sum(is.na(my_data))

# Remove rows with missing values
my_data <- na.omit(my_data)

# Replace missing values with column mean
my_data$column_name[is.na(my_data$column_name)] <- mean(my_data$column_name, na.rm = TRUE)

4.3 Data Visualization in R

# Load ggplot2 package
library(ggplot2)

ggplot(my_data, aes(x=Age, y=BMI, color=Gender)) +
  geom_point() +
  theme_minimal()

4.4 Basic Statistical Analysis in R

# Summary statistics
summary(my_data)

# Correlation between Age and BMI
cor(my_data$Age, my_data$BMI, use = "complete.obs")