Final Project Proposal

1. Introduction

1.1 Dataset Selection

Objective: Introduce the “iris” dataset and explain the reason for its selection.

I selected the “iris” dataset because it ships with base R (in the datasets package), so it needs no download or import step, and because it is well suited to demonstrating data analysis techniques.

# Load the iris dataset
data(iris)

2. Description of the Dataset

2.1 Variables

Objective: Describe the variables present in the “iris” dataset.

# Display the column (variable) names of the dataset
names(iris)

## [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

Sepal.Length: Represents the length of the sepal (in centimeters). This is a continuous variable.

Sepal.Width: Represents the width of the sepal (in centimeters). This is a continuous variable.

Petal.Length: Represents the length of the petal (in centimeters). This is a continuous variable.

Petal.Width: Represents the width of the petal (in centimeters). This is a continuous variable.

Species: Represents the species of iris flower. This is a categorical variable with three levels: setosa, versicolor, and virginica.
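
The variable types described above can be checked directly in R. The following is a minimal sketch using base R functions only; the object names `var_classes` and `species_levels` are illustrative and not part of the original analysis.

```r
# Load the built-in dataset
data(iris)

# Class of each column: the four measurements are expected to be numeric
# and Species is expected to be a factor
var_classes <- sapply(iris, class)
print(var_classes)

# Levels of the categorical variable
species_levels <- levels(iris$Species)
print(species_levels)
```

`sapply(iris, class)` returns one class name per column, which makes it easy to confirm that the four measurements are stored as numeric values and that Species is stored as a factor.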

2.2 Observations

Objective: Discuss the number of observations in the “iris” dataset.

# Check the number of rows/observations in the dataset
nrow(iris)

## [1] 150

The “iris” dataset contains 150 rows (observations), each representing an individual iris flower.
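
Beyond the total row count, it can be useful to see how the observations are spread across the three species. A minimal sketch, assuming only base R; the names `dataset_dims` and `species_counts` are illustrative.

```r
# Load the built-in dataset
data(iris)

# Number of rows and columns in one call
dataset_dims <- dim(iris)
print(dataset_dims)

# Number of observations per species
species_counts <- table(iris$Species)
print(species_counts)
```

The per-species counts show whether the classes are balanced, which matters for any later comparison between species.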

2.3 Missing Values

Objective: Check whether the dataset contains any missing values.

# Check for missing values
any(is.na(iris))

## [1] FALSE

There are no missing values in the “iris” dataset.
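
The single TRUE/FALSE check above can be broken down per column, which would show where missing values sit if there were any. A minimal sketch using base R; the names `na_per_column` and `n_complete` are illustrative.

```r
# Load the built-in dataset
data(iris)

# Count missing values in each column
na_per_column <- colSums(is.na(iris))
print(na_per_column)

# Count rows that have no missing value in any column
n_complete <- sum(complete.cases(iris))
print(n_complete)
```

If every per-column count is zero and the number of complete rows equals `nrow(iris)`, no rows need to be dropped or imputed before analysis.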

3. Conclusion

Objective: Summarize the key points of the report.

The “iris” dataset is small and well structured: 150 observations of four continuous measurements (sepal and petal length and width) plus one categorical variable, Species. It contains no missing values, making it ideal for demonstration purposes.