Binomial Standard Deviation: 1.581139
Probability of getting exactly 6 heads in 10 flips: 0.07160367
Bernoulli Standard Deviation: 0.4582576
Hypergeometric Standard Deviation: 0.9787004
Negative Binomial Standard Deviation: 4.330127
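A sketch of how such quantities are computed in R. The parameter values below are illustrative assumptions, not necessarily the ones used to produce the numbers above:

```r
# Illustrative parameter values (assumptions for this sketch,
# not necessarily those behind the output above)
n <- 10; p <- 0.5

sqrt(n * p * (1 - p))          # binomial SD: sqrt(n*p*(1-p))
dbinom(6, size = n, prob = p)  # P(exactly 6 heads in 10 fair-coin flips)
sqrt(p * (1 - p))              # Bernoulli SD: sqrt(p*(1-p))

# Hypergeometric: N items, K successes, `draws` draws without replacement
N <- 20; K <- 10; draws <- 5
sqrt(draws * (K / N) * (1 - K / N) * (N - draws) / (N - 1))

# Negative binomial: r required successes, success probability p
r <- 5
sqrt(r * (1 - p) / p^2)        # SD of the number of failures before the r-th success
```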
For a situation to be modeled by a Poisson distribution, the following conditions should be satisfied:
Independence: The occurrences of events are independent of each other. The occurrence of one event does not affect the probability of another event occurring.
Constant Mean Rate: The average rate (λ) of occurrence of events must be constant over the observed interval. This means that if you know the average number of events that occur in a day, this average should remain stable over time.
Discrete Events: The number of events counted is a discrete variable. You can count occurrences like the number of emails received in an hour, the number of phone calls at a call center, etc.
Rare Events: The Poisson distribution is particularly useful for modeling rare events. If the number of trials (or the time frame) is large, but the actual event count is small relative to that (like the number of accidents at a specific intersection over a year), it fits well.
Modeling Count Data: The Poisson distribution is ideal for modeling the number of times an event occurs in a fixed period. Examples include:
Number of cars passing through a toll booth in an hour.
Number of emails received per hour.
Number of decay events per unit time from a radioactive source.
Applications in Various Fields:
Healthcare: Modeling the number of patients arriving at an emergency department.
Telecommunications: Analyzing the number of phone calls received at a call center.
Traffic Engineering: Predicting the number of accidents at intersections.
Relation to Other Distributions:
If the number of trials is large and the probability of success is small, the Binomial distribution can be approximated by the Poisson distribution with λ = n⋅p.
This makes it easier to compute probabilities in situations with many trials and low probabilities.
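A quick numerical check of this approximation in R (the values of n and p are arbitrary choices for the sketch):

```r
# Many trials, small success probability
n <- 1000
p <- 0.005
lambda <- n * p                  # Poisson rate: lambda = n * p = 5

dbinom(3, size = n, prob = p)    # exact binomial probability of 3 successes
dpois(3, lambda = lambda)        # Poisson approximation, very close to the exact value
```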
Scenario: A call center receives an average of 5 calls per hour (λ = 5). We want to analyze the probability of receiving a certain number of calls in an hour.
Probability of receiving exactly 3 calls: 0.1403739
Rarity: In this context, if we look at the probabilities for receiving 0, 1, or 2 calls, we might find that these probabilities are relatively high compared to receiving 10 or more calls, which would be considered rare events given the average rate.
A higher k (like 10 or more calls) represents an outcome that is less likely (rare) compared to lower values of k (0-5 calls), where most outcomes are clustered around the mean.
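In R these probabilities come from dpois(). With λ = 5 as in the scenario, the probability of exactly 3 calls matches the value reported above:

```r
lambda <- 5                      # average of 5 calls per hour

dpois(3, lambda = lambda)        # P(exactly 3 calls) = 0.1403739
dpois(0:2, lambda = lambda)      # probabilities of 0, 1 and 2 calls
1 - ppois(9, lambda = lambda)    # P(10 or more calls): a rare event at this rate
```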
Normal Distribution:
Shape: Bell-shaped curve.
Parameters: Mean (μ) and Standard Deviation (σ).
Use Cases: Heights, test scores, measurement errors—many natural phenomena are approximately normally distributed due to the Central Limit Theorem.
Exponential Distribution:
Shape: Right-skewed.
Parameter: Rate (λ).
Use Cases: Time until an event occurs (e.g., waiting time in queues, lifetime of devices)
Uniform Distribution:
Shape: Flat, rectangular.
Parameters: Minimum (a) and Maximum (b).
Use Cases: Modeling situations where all outcomes are equally likely (e.g., rolling a fair die)
Gamma Distribution:
Shape: Can be right-skewed or resemble a normal distribution depending on parameters.
Parameters: Shape (k) and Scale (θ).
Use Cases: Modeling waiting times, reliability data.
Chi-Square Distribution:
Shape: Right-skewed, with the skewness decreasing as the degrees of freedom increase.
Parameter: Degrees of freedom (df).
Use Cases: Commonly used in hypothesis testing, particularly in tests of independence and goodness-of-fit.
Student's t-Distribution:
Shape: Bell-shaped, similar to the normal distribution but with heavier tails. The shape depends on the degrees of freedom.
Parameter: Degrees of freedom (df).
Use Cases: Used in hypothesis testing, especially when the sample size is small and the population standard deviation is unknown.
Weibull Distribution:
Shape: Can take on different shapes depending on its parameters, typically right-skewed.
Parameters: Shape parameter (k) and scale parameter (λ).
Use Cases: Used in reliability analysis and survival studies.
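As a quick visual comparison of these shapes, the following R sketch overlays a few of the densities (all parameter values are arbitrary choices for the illustration):

```r
# Overlay several of the densities described above (arbitrary parameters)
x <- seq(0.01, 10, length.out = 400)

plot(x, dnorm(x, mean = 5, sd = 1.5), type = "l", col = "blue", ylim = c(0, 0.5),
     ylab = "Density", main = "Shapes of some common distributions")
lines(x, dexp(x, rate = 0.5),               col = "red")        # exponential
lines(x, dgamma(x, shape = 2, scale = 2),   col = "darkgreen")  # gamma
lines(x, dchisq(x, df = 4),                 col = "purple")     # chi-square
lines(x, dweibull(x, shape = 2, scale = 4), col = "orange")     # Weibull
legend("topright",
       legend = c("Normal", "Exponential", "Gamma", "Chi-square", "Weibull"),
       col = c("blue", "red", "darkgreen", "purple", "orange"), lty = 1)
```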
Assumptions of the t-test:
Normality: The data (or, for a paired test, the differences) should be approximately normally distributed.
Independence: The observations should be independent of one another.
Equal Variances (for Independent t-test): The two groups should have approximately equal variances; when this cannot be assumed, Welch's correction is used.
Independent t-test:
Compares means from two different groups.
Example: Comparing test scores of students from two different classes.
Paired t-test:
Compares means from the same group at two different times.
Example: Measuring blood pressure before and after treatment in the same group of patients.
One-sample t-test:
Compares the mean of a single group to a known or hypothesized value.
Example: Testing whether the average height of a sample of students differs from the national average.
The plot shows the probability density function (PDF) of the Student’s t-distribution for different degrees of freedom.
As the degrees of freedom increase, the t-distribution approaches the standard normal distribution. This is because, with larger samples, the sample standard deviation becomes a more reliable estimate of the population standard deviation, so the extra uncertainty that produces the heavy tails shrinks.
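A sketch of the R code behind such a plot, drawing the t density for several degrees of freedom alongside the standard normal:

```r
# Student's t density for several degrees of freedom, with the standard normal for reference
x    <- seq(-4, 4, length.out = 400)
dfs  <- c(1, 3, 10, 30)
cols <- c("red", "blue", "darkgreen", "purple")

plot(x, dnorm(x), type = "l", lty = 2, ylab = "Density",
     main = "Student's t-distribution vs. standard normal")
for (i in seq_along(dfs)) lines(x, dt(x, df = dfs[i]), col = cols[i])

legend("topright",
       legend = c("Standard normal", paste("t, df =", dfs)),
       lty = c(2, rep(1, length(dfs))), col = c("black", cols))
```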
Welch Two Sample t-test
data: group1 and group2
t = -1.0742, df = 37.082, p-value = 0.2897
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-8.863821 2.721440
sample estimates:
mean of x mean of y
51.41624 54.48743
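Output like this comes from t.test(), which performs Welch's test by default (it does not assume equal variances). A minimal sketch, using simulated data since the original group1 and group2 values are not shown:

```r
# Hypothetical data standing in for the original samples
set.seed(1)
group1 <- rnorm(20, mean = 50, sd = 10)   # e.g., scores from class 1
group2 <- rnorm(20, mean = 55, sd = 10)   # e.g., scores from class 2

# Welch two-sample t-test (R's default; does not assume equal variances)
t.test(group1, group2)
```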
Independent t-test:
Purpose: Compares the means of two independent groups to determine if they are significantly different from each other.
Example: Comparing the test scores of students from two different classes.
Assumptions: The samples must be independent, and the data should be normally distributed with equal variances.
Paired t-test:
Purpose: Compares the means of two related groups (the same subjects measured at two different times) to see if there is a significant difference.
Example: Measuring the blood pressure of patients before and after treatment.
Assumptions: The differences between pairs should be normally distributed.
One-sample t-test:
Purpose: Compares the mean of a single group to a known value (such as a population mean).
Example: Testing whether the average height of a sample of students is significantly different from the national average.
Assumptions: The sample should be normally distributed.
Statistical Tests and Degrees of Freedom
When you conduct a statistical test (like a t-test), degrees of freedom help define how many independent values are used to estimate the variability in the data.
For example, in an independent t-test comparing two groups, the degrees of freedom are the total number of scores minus the number of groups (n1 + n2 − 2), because one mean is estimated for each group.
Influences Critical Values: Degrees of freedom affect the shape of the distribution used in statistical tests. This, in turn, affects the critical values that determine whether your test results are significant.
Informs Sample Size: More degrees of freedom typically mean you have a larger sample size, which can lead to more reliable estimates.
Power of the Test: Higher degrees of freedom can increase the power of the test, meaning a better chance of detecting a true effect if one exists.
Paired t-test
data: first_instrument and second_instrument
t = -1.7928, df = 9, p-value = 0.1066
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-2.261771 0.261771
sample estimates:
mean difference
-1
Mean of differences: 1
Standard deviation of differences: 1.763834
Standard Error of the Mean (SEM): 0.5577734
Degrees of Freedom: 9
t-Statistic: -1.792843
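A sketch of the commands behind this output, assuming first_instrument and second_instrument are paired readings on the same 10 samples (the actual values are not shown in the notes, so hypothetical ones are used here):

```r
# Hypothetical paired measurements on the same 10 samples
first_instrument  <- c(10, 12, 11, 13,  9, 14, 10, 12, 11, 13)
second_instrument <- c(11, 13, 12, 13, 10, 16, 11, 13, 12, 14)

# Paired t-test
t.test(first_instrument, second_instrument, paired = TRUE)

# The summary quantities reported above (sign depends on the direction of subtraction)
d <- first_instrument - second_instrument
mean(d)                              # mean of differences
sd(d)                                # standard deviation of differences
sd(d) / sqrt(length(d))              # standard error of the mean (SEM)
length(d) - 1                        # degrees of freedom
mean(d) / (sd(d) / sqrt(length(d)))  # t-statistic
```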
A random sample of 6 patients with ischemic heart disease was treated with clofibrate, and the concentration of their plasma fibrinogen was determined as follows:
| Patient no. | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Pre-treatment value | 379 | 351 | 420 | 303 | 346 | 370 |
| Post-treatment value | 325 | 333 | 391 | 275 | 311 | 323 |
Does the treatment have a statistically significant effect?
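In R this is a paired t-test on the pre- and post-treatment values; the sketch below uses the figures from the table and corresponds to the output that follows:

```r
# Plasma fibrinogen before and after clofibrate treatment (values from the table)
pre_values  <- c(379, 351, 420, 303, 346, 370)
post_values <- c(325, 333, 391, 275, 311, 323)

# Paired t-test: each patient serves as their own control
t.test(pre_values, post_values, paired = TRUE)
```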
Paired t-test
data: pre_values and post_values
t = 6.4974, df = 5, p-value = 0.001289
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
21.25356 49.07977
sample estimates:
mean difference
35.16667
Mean of differences: -35.16667
Standard deviation of differences: 13.2577
Standard Error of the Mean (SEM): 5.412434
Degrees of Freedom: 5
t-Statistic: 6.497385
t-Statistic: 6.4974
Degrees of Freedom (df): 5
p-value: 0.001289
Mean Difference: -35.16667
95% Confidence Interval: (21.25356, 49.07977)
Statistical Significance: The p-value (0.001289) is well below 0.05, so we reject the null hypothesis; the treatment has a statistically significant effect on plasma fibrinogen.
Mean Difference: On average, plasma fibrinogen fell by about 35.17 units after treatment (the post-minus-pre difference is −35.17).
Confidence Interval: The 95% confidence interval for the mean difference, (21.25, 49.08), does not include 0, which supports the conclusion that the effect is real.
Let’s say we have two groups of patients with different treatments for ischemic heart disease, and we want to compare their plasma fibrinogen levels.
Group A (Treatment 1): 379, 351, 420, 303, 346, 370
Group B (Treatment 2): 325, 333, 391, 275, 311, 323
Formulate Hypotheses:
Null Hypothesis (H0): The means of the two groups are equal (no difference).
Alternative Hypothesis (H1): The means of the two groups are not equal (there is a difference).
Perform the t-Test.
Visualize the Data.
Interpret the Results.
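A sketch of the corresponding R commands, using the values listed above; this is what produces the Welch output below:

```r
# Plasma fibrinogen levels under the two treatments
group_A <- c(379, 351, 420, 303, 346, 370)   # Treatment 1
group_B <- c(325, 333, 391, 275, 311, 323)   # Treatment 2

# Independent (Welch) two-sample t-test
t.test(group_A, group_B)

# Simple visualization of the two groups
boxplot(group_A, group_B, names = c("Group A", "Group B"),
        ylab = "Plasma fibrinogen")
```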
Welch Two Sample t-test
data: group_A and group_B
t = 1.5896, df = 9.99, p-value = 0.143
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-14.13316 84.46649
sample estimates:
mean of x mean of y
361.5000 326.3333
Hypotheses:
Null Hypothesis (H0): There is no association between the variables (they are independent).
Alternative Hypothesis (H1): There is an association between the variables (they are dependent).
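The test itself is run with chisq.test(); a minimal sketch, using a hypothetical 2×2 contingency table since the actual counts behind the output below are not shown:

```r
# Hypothetical 2x2 contingency table (the original counts are not shown in the notes)
data <- matrix(c(30, 10, 15, 45), nrow = 2,
               dimnames = list(Exposure = c("Exposed", "Not exposed"),
                               Outcome  = c("Disease", "No disease")))

# Chi-squared test of independence; Yates' continuity correction
# is applied automatically for 2x2 tables
chisq.test(data)
```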
Pearson's Chi-squared test with Yates' continuity correction
data: data
X-squared = 15.042, df = 1, p-value = 0.0001052
Correlation is a statistical measure that describes the strength and direction of a relationship between two variables. It helps determine how closely related the variables are and whether an increase in one variable corresponds to an increase or decrease in another.
Correlation Coefficient:
The most common measure of correlation is the Pearson correlation coefficient (r), which ranges from -1 to 1.
r = 1: Perfect positive correlation (as one variable increases, the other also increases).
r = -1: Perfect negative correlation (as one variable increases, the other decreases).
r = 0: No correlation (no linear relationship between the variables).
There are also other types of correlation coefficients, such as Spearman’s rank correlation, which is used for non-parametric data.
Interpreting r (the same magnitudes apply to negative values, taken in absolute value):
0.1 to 0.3: Weak correlation
0.3 to 0.5: Moderate correlation
0.5 to 0.7: Strong correlation
0.7 to 0.9: Very strong correlation
0.9 to 1.0: Extremely strong correlation
Scatter Plot: A scatter plot of the two variables is the usual first check; it shows whether the relationship looks roughly linear before r is computed.
Correlation coefficient (r): 0.9965217
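A sketch of how the scatter plot and correlation coefficient are produced in R; the x and y values here are hypothetical, since the original data are not shown:

```r
# Hypothetical paired data (the original values are not shown in the notes)
x <- 1:10
y <- c(10, 19, 28, 36, 45, 53, 62, 71, 79, 88)

plot(x, y, main = "Scatter plot", xlab = "X", ylab = "Y")  # visual check for linearity
cor(x, y)                        # Pearson correlation coefficient
cor(x, y, method = "spearman")   # Spearman rank correlation (non-parametric)
```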
Regression analysis is a statistical technique used to model and analyze the relationships between a dependent variable (outcome) and one or more independent variables (predictors). It helps in predicting the value of the dependent variable based on the values of the independent variables.
Types of Regression:
Linear Regression: Models the relationship between two variables by fitting a linear equation (line) to the observed data.
Multiple Linear Regression: Extends linear regression to include multiple independent variables.
Logistic Regression: Used when the dependent variable is categorical (e.g., binary outcomes like yes/no).
Polynomial Regression: Models the relationship as a polynomial equation, allowing for curves in the data.
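Simple linear regression is fitted in R with lm(). A sketch of the call that produces the summary below, using a hypothetical data frame named data with numeric columns X and Y (the raw values are not shown in the notes):

```r
# Hypothetical data frame; the original X and Y values are not shown in the notes
data <- data.frame(X = 1:10,
                   Y = c(10, 19, 28, 36, 45, 53, 62, 71, 79, 88))

# Fit a simple linear regression of Y on X
model <- lm(Y ~ X, data = data)
summary(model)                 # coefficients, R-squared, F-statistic

plot(data$X, data$Y)           # scatter plot of the raw data
abline(model, col = "red")     # add the fitted regression line
```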
Call:
lm(formula = Y ~ X, data = data)
Residuals:
Min 1Q Median 3Q Max
-3.3333 -1.6667 0.1667 1.1667 4.0000
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.3333 1.5899 0.839 0.426
X 8.6667 0.2562 33.823 6.38e-10 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.327 on 8 degrees of freedom
Multiple R-squared: 0.9931, Adjusted R-squared: 0.9922
F-statistic: 1144 on 1 and 8 DF, p-value: 6.377e-10
Blood Pressure Data for Patients: Systolic (SBP) and Diastolic (DBP) Blood Pressure

| Patient | Systolic BP (mmHg) | Diastolic BP (mmHg) |
|---|---|---|
| 1 | 110 | 65 |
| 2 | 124 | 70 |
| 3 | 116 | 75 |
| 4 | 120 | 80 |
| 5 | 135 | 85 |
| 6 | 148 | 90 |
| 7 | 136 | 95 |
| 8 | 165 | 100 |
| 9 | 152 | 105 |
| 10 | 172 | 110 |
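The regression of diastolic on systolic pressure is then fitted as follows; the data come directly from the table above and correspond to the summary output below:

```r
# Blood pressure data from the table above
data <- data.frame(
  SBP = c(110, 124, 116, 120, 135, 148, 136, 165, 152, 172),
  DBP = c( 65,  70,  75,  80,  85,  90,  95, 100, 105, 110)
)

# Regress diastolic BP on systolic BP
model <- lm(DBP ~ SBP, data = data)
summary(model)

# Scatter plot with the fitted regression line
plot(data$SBP, data$DBP,
     xlab = "Systolic BP (mmHg)", ylab = "Diastolic BP (mmHg)")
abline(model, col = "red")
```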
Call:
lm(formula = DBP ~ SBP, data = data)
Residuals:
Min 1Q Median 3Q Max
-8.3153 -4.2159 -0.4493 3.7626 8.6980
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.21439 13.48479 -0.313 0.762630
SBP 0.66556 0.09685 6.872 0.000128 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.111 on 8 degrees of freedom
Multiple R-squared: 0.8551, Adjusted R-squared: 0.837
F-statistic: 47.23 on 1 and 8 DF, p-value: 0.0001281