Machine Learning (ML) is a field of artificial intelligence that involves teaching computers to learn patterns from data without being explicitly programmed. This document provides a step-by-step guide to various machine learning techniques in R, from fundamental concepts to more advanced algorithms.
We will cover: the relationship between AI, ML, and Deep Learning; linear and logistic regression; decision trees; random forests; gradient boosting with XGBoost; and neural networks.
Throughout this document, we will use real datasets from R libraries to illustrate practical examples.
The terms Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) are often used interchangeably, but they have distinct meanings and a hierarchical relationship.
Artificial Intelligence is defined as computer algorithms that perform tasks believed to require human intelligence. AI includes both rule-based systems and data-driven (machine learning) approaches.
Examples of AI applications include:
- Advanced home automation
- Speech recognition (Natural Language Processing, NLP)
- Optical Character Recognition (OCR)
- Self-driving cars
- Art generation (e.g., DALL-E)
Machine Learning is a subset of AI that is data-driven rather than rule-based.
| Rule-Based Systems | Machine Learning Systems |
|---|---|
| Require experts to define rules. | Derive patterns from data. |
| Explicit “if-then” logic. | Adjust rules based on training. |
| Example: Handwritten regex for email spam detection. | Example: Spam classifier trained on past emails. |
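To make the contrast concrete, here is a minimal R sketch (the keywords and example emails are hypothetical): the rule-based filter hard-codes the pattern, whereas an ML classifier would learn which words matter from a labelled corpus.
# Rule-based: an expert writes the pattern by hand (hypothetical keywords)
emails <- c("WIN a FREE prize now!!!", "Meeting moved to 3 pm", "Claim your free gift card")
grepl("free|prize|gift", tolower(emails))   # TRUE FALSE TRUE
# ML-based: a classifier (e.g., logistic regression on word counts) would instead
# be trained on labelled past emails and derive its own decision rule from the data.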
Deep Learning is a subset of ML that relies on Neural Networks with many layers.
If a neural network contains many neurons (often millions) and multiple layers, it is referred to as a Deep Learning model.
Deep Learning powers applications such as:
✅ Natural Language Processing (NLP)
✅ Advanced Image Recognition
✅ Programming Code Completion
Machine Learning algorithms can be categorized based on the tasks they perform.
Classification tasks can be:
- Binary Classification: Yes/No, True/False, 0/1.
- Multiclass Classification: Red/Blue/Green, Disease A/B/C.
| Task | Goal | Example | Algorithms |
|---|---|---|---|
| Regression | Predict a continuous variable. | House price prediction. | OLS, Neural Networks. |
| Classification | Predict a category. | Spam filtering, Cancer detection. | Logistic Regression, k-NN. |
| Clustering | Group similar observations. | Customer segmentation. | k-Means, Hierarchical Clustering. |
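As a quick orientation, the base-R entry points below map onto these three tasks (a minimal sketch; the variable choices are arbitrary and only illustrate the function calls):
# Regression: predict a continuous outcome
fit_reg <- lm(mpg ~ wt, data = mtcars)
# Binary classification: predict a 0/1 outcome
fit_cls <- glm(am ~ wt, data = mtcars, family = binomial)
# Clustering: group observations without using labels
fit_clu <- kmeans(iris[, 1:4], centers = 3)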
The sections above outline the hierarchical relationship between AI, ML, and DL; the rest of this document turns these concepts into practical modeling in R.
We will use several R packages to facilitate data manipulation, visualization, and modeling:
- dplyr and tidyr for data wrangling
- ggplot2 for data visualization
- caret for streamlined machine learning modeling (optional, but highly recommended)
- rpart for decision trees
- randomForest for random forest models
- xgboost for gradient boosting
- nnet or keras (optional) for neural networks

Install any packages you do not have by using install.packages("package_name").
# Data Wrangling
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
# Visualization
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.1
# Machine Learning & Modeling
library(caret) # For cross-validation, model training pipeline
## Warning: package 'caret' was built under R version 4.4.1
## Loading required package: lattice
library(rpart) # For decision trees
library(randomForest) # For random forest
## Warning: package 'randomForest' was built under R version 4.4.1
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
library(xgboost) # For gradient boosting
## Warning: package 'xgboost' was built under R version 4.4.1
##
## Attaching package: 'xgboost'
## The following object is masked from 'package:dplyr':
##
## slice
# Neural Network
library(nnet) # Basic feed-forward neural network
library(keras) # Deep learning in R (requires TensorFlow backend)
## Warning: package 'keras' was built under R version 4.4.1
set.seed(123) # For reproducibility
For illustration, let's start with two well-known datasets:
- mtcars (built-in dataset in R) for a regression example.
- iris (built-in dataset in R) for a classification example.

mtcars Dataset (Regression)

The mtcars dataset contains information about miles per gallon (mpg) and various characteristics of different car models.
data("mtcars")
# Basic structure
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
# First few rows
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
# Convert some variables to factor (e.g., am, cyl, gear)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
mtcars$cyl <- factor(mtcars$cyl)
mtcars$gear <- factor(mtcars$gear)
ggplot(mtcars, aes(x = mpg)) +
geom_histogram(fill = "blue", bins = 10, alpha = 0.7) +
theme_minimal() +
labs(title = "Distribution of MPG", x = "MPG", y = "Count")
ggplot(mtcars, aes(x = wt, y = mpg)) +
geom_point(color = "red") +
geom_smooth(method = "lm", se = FALSE, color = "blue") +
theme_minimal() +
labs(title = "MPG vs. Weight", x = "Weight (1000 lbs)", y = "MPG")
## `geom_smooth()` using formula = 'y ~ x'
We observe a negative relationship between mpg and wt: as weight
increases, mpg tends to decrease.
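We can quantify this with the Pearson correlation (a quick check, not part of the original output):
cor(mtcars$wt, mtcars$mpg)   # approximately -0.87, confirming a strong negative association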
iris Dataset (Classification)

The iris dataset has 150 observations of iris flowers with four features (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) and the species label: setosa, versicolor, or virginica.
data("iris")
# Basic structure
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
# First few rows
head(iris)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
table(iris$Species)
##
## setosa versicolor virginica
## 50 50 50
pairs(iris, col = iris$Species,
main = "Iris Feature Scatterplot Matrix")
Linear Regression predicts a continuous outcome variable (Y) from one or more predictor variables (X).
Model Form:
\[ \text{MPG} = \beta_0 + \beta_1 \times \text{Weight} + \beta_2 \times \text{Horsepower} + \dots + \varepsilon \]
Let's model mpg using wt only:
model_lm_simple <- lm(mpg ~ wt, data = mtcars)
summary(model_lm_simple)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## wt -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
We can include more predictors, such as wt, hp (horsepower), and am (transmission type).
model_lm_multi <- lm(mpg ~ wt + hp + am, data = mtcars)
summary(model_lm_multi)
##
## Call:
## lm(formula = mpg ~ wt + hp + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4221 -1.7924 -0.3788 1.2249 5.5317
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.002875 2.642659 12.867 2.82e-13 ***
## wt -2.878575 0.904971 -3.181 0.003574 **
## hp -0.037479 0.009605 -3.902 0.000546 ***
## amManual 2.083710 1.376420 1.514 0.141268
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.538 on 28 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8227
## F-statistic: 48.96 on 3 and 28 DF, p-value: 2.908e-11
With wt, hp, and am included, roughly 84% of the variance in mpg is explained (Multiple R-squared = 0.84).
par(mfrow = c(2, 2))
plot(model_lm_multi)
The four diagnostic plots (residuals vs. fitted, normal Q-Q, scale-location, and residuals vs. leverage) help check the regression assumptions.
Logistic Regression predicts a binary (or multi-class) outcome.
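For reference, the standard binary logistic model (stated here for completeness, mirroring the linear model form above) is:
\[ P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)}} \]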
Although iris$Species
has three classes, we can simplify to
a binary problem by filtering for two species for demonstration.
Let’s consider only Setosa vs. Versicolor for a binary classification:
iris_binary <- iris %>%
filter(Species != "virginica") %>%
mutate(Species = factor(Species))
table(iris_binary$Species)
##
## setosa versicolor
## 50 50
We'll predict Species using Petal.Length and Petal.Width.
model_logistic <- glm(Species ~ Petal.Length + Petal.Width,
data = iris_binary,
family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model_logistic)
##
## Call:
## glm(formula = Species ~ Petal.Length + Petal.Width, family = binomial,
## data = iris_binary)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -72.73 70289.28 -0.001 0.999
## Petal.Length 18.37 74002.45 0.000 1.000
## Petal.Width 35.76 199094.68 0.000 1.000
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1.3863e+02 on 99 degrees of freedom
## Residual deviance: 1.8210e-09 on 97 degrees of freedom
## AIC: 6
##
## Number of Fisher Scoring iterations: 25
Petal.Length and Petal.Width are expected to help distinguish setosa from versicolor. In fact, these two species are perfectly separable on the petal measurements, which is why glm() warns about non-convergence and fitted probabilities of 0 or 1 (complete separation): the coefficient estimates are unstable, but the predicted classes are still useful for illustration.
# Predict probabilities
iris_binary$prob <- predict(model_logistic, type = "response")
# Convert probabilities to classes using 0.5 threshold
iris_binary$pred <- ifelse(iris_binary$prob > 0.5, "versicolor", "setosa")
iris_binary$pred <- factor(iris_binary$pred)
# Confusion Matrix
confusionMatrix(data = iris_binary$pred, reference = iris_binary$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor
## setosa 50 0
## versicolor 0 50
##
## Accuracy : 1
## 95% CI : (0.9638, 1)
## No Information Rate : 0.5
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0
## Specificity : 1.0
## Pos Pred Value : 1.0
## Neg Pred Value : 1.0
## Prevalence : 0.5
## Detection Rate : 0.5
## Detection Prevalence : 0.5
## Balanced Accuracy : 1.0
##
## 'Positive' Class : setosa
##
We can evaluate the model using metrics such as accuracy, sensitivity, specificity, and Cohen's kappa, all reported in the confusion matrix output above.
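As a minimal sketch of how these metrics come out of the confusion table (treating setosa as the positive class, as caret does above):
# Build the confusion table by hand and derive the metrics
tab <- table(Predicted = iris_binary$pred, Actual = iris_binary$Species)
accuracy    <- sum(diag(tab)) / sum(tab)
sensitivity <- tab["setosa", "setosa"] / sum(tab[, "setosa"])             # true positive rate
specificity <- tab["versicolor", "versicolor"] / sum(tab[, "versicolor"]) # true negative rate
c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity)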
Decision trees split the data based on feature thresholds to predict an outcome (either regression or classification).
We can use all three classes of the iris dataset.
# rpart for classification
model_tree <- rpart(Species ~ ., data = iris, method = "class")
model_tree
## n= 150
##
## node), split, n, loss, yval, (yprob)
## * denotes terminal node
##
## 1) root 150 100 setosa (0.33333333 0.33333333 0.33333333)
## 2) Petal.Length< 2.45 50 0 setosa (1.00000000 0.00000000 0.00000000) *
## 3) Petal.Length>=2.45 100 50 versicolor (0.00000000 0.50000000 0.50000000)
## 6) Petal.Width< 1.75 54 5 versicolor (0.00000000 0.90740741 0.09259259) *
## 7) Petal.Width>=1.75 46 1 virginica (0.00000000 0.02173913 0.97826087) *
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 4.4.2
rpart.plot(model_tree, main = "Decision Tree for Iris Dataset")
* Each node in the tree represents a split based on a feature and threshold.
* Eventually, the tree leads to leaf nodes, which represent the predicted classes or values.
pred_tree <- predict(model_tree, iris, type = "class")
confusionMatrix(pred_tree, iris$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 49 5
## virginica 0 1 45
##
## Overall Statistics
##
## Accuracy : 0.96
## 95% CI : (0.915, 0.9852)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.94
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.0000 0.9800 0.9000
## Specificity 1.0000 0.9500 0.9900
## Pos Pred Value 1.0000 0.9074 0.9783
## Neg Pred Value 1.0000 0.9896 0.9519
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3333 0.3267 0.3000
## Detection Prevalence 0.3333 0.3600 0.3067
## Balanced Accuracy 1.0000 0.9650 0.9450
A binary decision tree is a flowchart-like structure in which each internal node represents a decision rule, each branch represents an outcome of the rule, and each leaf node represents a final prediction.
A decision tree recursively splits data based on feature values. It follows a hierarchical structure: the root node holds all observations, internal nodes apply decision rules, and leaf nodes give the final predictions.
A decision tree splits data to minimize impurity, using criteria such as:

Gini impurity (classification):
\[ G(X) = 1 - \sum_{i=1}^{C} p_i^2 \]
where \(p_i\) is the proportion of observations belonging to class \(i\).

Entropy (classification):
\[ H(X) = -\sum_{i=1}^{C} p_i \log_2 p_i \]

Mean Squared Error (regression):
\[ \text{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 \]
where \(N\) is the number of observations, \(y_i\) is the actual value, and \(\hat{y}_i\) is the predicted value.

At each node, the algorithm selects the split that most reduces the chosen criterion (e.g., Gini, Entropy, MSE) in the resulting child nodes.
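A small hand-rolled helper (illustrative only; this is not how rpart computes its splits internally) makes the criteria concrete:
impurity <- function(y) {
  p <- prop.table(table(y))
  p <- p[p > 0]                    # drop empty classes so log2() stays defined
  c(gini = 1 - sum(p^2), entropy = -sum(p * log2(p)))
}
impurity(iris$Species)                              # all three classes present: high impurity
impurity(iris$Species[iris$Petal.Length < 2.45])    # the tree's first split isolates setosa: impurity 0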
\[ \begin{array}{c} \textbf{Start (Root Node)} \\ \downarrow \\ \text{Feature 1 < Threshold?} \\ \begin{array}{cc} \text{Yes} & \text{No} \\ \downarrow & \downarrow \\ \text{Feature 2 < Threshold?} & \text{Class B} \\ \begin{array}{cc} \text{Yes} & \text{No} \\ \downarrow & \downarrow \\ \text{Class A} & \text{Class C} \end{array} \end{array} \end{array} \]
A Random Forest is an ensemble learning method that combines multiple decision trees to improve generalization and reduce overfitting.
A Random Forest follows these steps: (1) draw many bootstrap samples from the training data; (2) grow a decision tree on each sample, considering only a random subset of features at each split; (3) aggregate the trees' predictions by majority vote (classification) or averaging (regression). A hand-rolled sketch of this idea appears after the formulas below.
Suppose we have \(B\) trees, each trained on different bootstrap samples. The final prediction for an input \(x\) depends on the type of problem:
For classification, the trees vote and the majority class wins:
\[ \hat{y} = \text{mode} \{ T_1(x), T_2(x), ..., T_B(x) \} \]
For regression, the tree predictions are averaged:
\[ \hat{y} = \frac{1}{B} \sum_{b=1}^{B} T_b(x) \]
where \(\hat{y}\) is the final prediction and \(T_b(x)\) is the prediction from the \(b\)-th tree.
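The following minimal sketch mimics steps (1)-(3) by hand, using rpart trees and a majority vote. It only illustrates the aggregation formula; randomForest() additionally randomizes the features tried at each split.
# Train B trees on bootstrap samples and combine them by majority vote
set.seed(42)
B <- 25
votes <- sapply(1:B, function(b) {
  idx    <- sample(nrow(iris), replace = TRUE)          # bootstrap sample
  tree_b <- rpart(Species ~ ., data = iris[idx, ], method = "class")
  as.character(predict(tree_b, iris, type = "class"))   # T_b(x) for every observation
})
ensemble_pred <- apply(votes, 1, function(p) names(which.max(table(p))))
mean(ensemble_pred == iris$Species)                      # training accuracy of the ensemble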
✅ Handles Non-linearity: Works well with complex patterns.
✅ Reduces Overfitting: Combines multiple trees to improve generalization.
✅ Feature Importance: Identifies the most significant variables.
✅ Works with Large Datasets: Efficient for large-scale problems.
\[ \begin{array}{c} \textbf{Dataset} \\ \downarrow \\ \text{Bootstrap Samples} \\ \begin{array}{ccc} \text{Tree 1} & \text{Tree 2} & \text{Tree B} \\ \downarrow & \downarrow & \downarrow \\ \text{Predictions} \\ \downarrow \\ \text{Majority Vote / Averaging} \end{array} \end{array} \]
Random Forest is an ensemble of decision trees, each trained on a bootstrap sample of the data and a random subset of features. It typically improves generalization performance compared to a single decision tree.
We now fit a random forest to the iris dataset:
model_rf <- randomForest(Species ~ ., data = iris, ntree = 100, importance = TRUE)
model_rf
##
## Call:
## randomForest(formula = Species ~ ., data = iris, ntree = 100, importance = TRUE)
## Type of random forest: classification
## Number of trees: 100
## No. of variables tried at each split: 2
##
## OOB estimate of error rate: 4.67%
## Confusion matrix:
## setosa versicolor virginica class.error
## setosa 50 0 0 0.00
## versicolor 0 47 3 0.06
## virginica 0 4 46 0.08
- ntree = 100: Number of trees in the forest.
- importance = TRUE: Compute feature importance.

importance(model_rf)
## setosa versicolor virginica MeanDecreaseAccuracy
## Sepal.Length 3.596153 3.6267879 4.758109 5.558137
## Sepal.Width 2.882770 0.6114827 2.330913 2.792841
## Petal.Length 8.877766 14.0149004 13.072048 14.387083
## Petal.Width 10.693787 15.1471250 13.399962 16.699318
## MeanDecreaseGini
## Sepal.Length 11.546338
## Sepal.Width 3.270738
## Petal.Length 40.353781
## Petal.Width 44.103010
varImpPlot(model_rf, main = "Feature Importance in Random Forest")
Feature Importance

Feature importance refers to calculating a score for each input feature of a machine learning model. This score reflects the feature's contribution to the model's predictive performance. A higher score indicates a greater influence on the model's predictions.
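For example, the importance scores computed above can be ranked directly (a quick convenience, equivalent to reading off varImpPlot()):
imp <- importance(model_rf)
sort(imp[, "MeanDecreaseGini"], decreasing = TRUE)   # the petal measurements dominate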
pred_rf <- predict(model_rf, iris)
confusionMatrix(pred_rf, iris$Species)
## Confusion Matrix and Statistics
##
## Reference
## Prediction setosa versicolor virginica
## setosa 50 0 0
## versicolor 0 50 0
## virginica 0 0 50
##
## Overall Statistics
##
## Accuracy : 1
## 95% CI : (0.9757, 1)
## No Information Rate : 0.3333
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: setosa Class: versicolor Class: virginica
## Sensitivity 1.0000 1.0000 1.0000
## Specificity 1.0000 1.0000 1.0000
## Pos Pred Value 1.0000 1.0000 1.0000
## Neg Pred Value 1.0000 1.0000 1.0000
## Prevalence 0.3333 0.3333 0.3333
## Detection Rate 0.3333 0.3333 0.3333
## Detection Prevalence 0.3333 0.3333 0.3333
## Balanced Accuracy 1.0000 1.0000 1.0000
XGBoost is an optimized gradient boosting algorithm that builds trees sequentially, where each new tree corrects the errors of the previous trees. It is widely used in machine learning competitions and real-world applications due to its efficiency and accuracy.
XGBoost follows an iterative approach to improving model performance: it starts from an initial prediction, computes the gradient of the loss for every observation, fits a new tree to those gradients, adds the tree to the ensemble scaled by a learning rate, and repeats for a fixed number of boosting rounds (see the hand-rolled sketch after the update formula below).
Given a dataset \((X, y)\) with \(N\) observations, XGBoost minimizes the loss function:
\[ L(\theta) = \sum_{i=1}^{N} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \]
where:
\(l(y_i, \hat{y}_i)\) is the loss function, such as squared error for regression or the logistic/softmax loss for classification.
\(\Omega(f_k)\) is the regularization term to prevent overfitting, defined as:
\[ \Omega(f_k) = \gamma T + \frac{1}{2} \lambda \sum_j w_j^2 \]
where \(T\) is the number of leaves in the tree, \(w_j\) are the leaf weights, and \(\gamma\), \(\lambda\) are regularization parameters.
Each new tree \(f_t(x)\) is fit to the negative gradient \(-g_i\) of the loss function, where
\[ g_i = \frac{\partial l(y_i, \hat{y}_i)}{\partial \hat{y}_i}. \]
The updated prediction at iteration \(t+1\) is:
\[ \hat{y}_i^{(t+1)} = \hat{y}_i^{(t)} + \eta f_t(x_i), \]
where \(\eta\) is the learning rate.
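To make the update rule concrete, here is a hand-rolled gradient-boosting loop for squared-error loss with small rpart trees as base learners (an illustration of the idea only; xgboost additionally uses second-order gradients and the regularization term \(\Omega\)):
eta  <- 0.3                                        # learning rate
yhat <- rep(mean(mtcars$mpg), nrow(mtcars))        # initial prediction y_hat^(0)
for (t in 1:20) {
  resid_t <- mtcars$mpg - yhat                     # negative gradient (residual) for squared error
  f_t <- rpart(resid_t ~ wt + hp, data = mtcars,
               control = rpart.control(maxdepth = 2, cp = 0))
  yhat <- yhat + eta * predict(f_t, mtcars)        # y_hat^(t+1) = y_hat^(t) + eta * f_t(x)
}
sqrt(mean((mtcars$mpg - yhat)^2))                  # training RMSE shrinks as rounds increase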
✅ Handles Missing Values: Automatically deals with missing data.
✅ Highly Efficient: Uses parallel computing and optimized algorithms.
✅ Feature Importance: Identifies the most significant predictors.
✅ Regularization: Reduces overfitting using L1 (LASSO) and L2 (Ridge) penalties.
XGBoost employs a sparsity-aware algorithm that handles missing values during training: when evaluating candidate splits, it learns a default direction (left or right branch) for observations whose value is missing, based on which choice improves the objective.
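As a quick sanity check (a minimal sketch using the same iris features as the example below), xgboost will train even when some entries of the feature matrix are NA, since NA is the default missing-value marker in xgb.DMatrix():
m_na <- as.matrix(iris[, 1:4])
set.seed(1)
m_na[sample(length(m_na), 20)] <- NA                 # knock out 20 random entries
d_na <- xgb.DMatrix(data = m_na, label = as.numeric(iris$Species) - 1)
fit_na <- xgb.train(params = list(objective = "multi:softprob",
                                  num_class = 3, eval_metric = "mlogloss"),
                    data = d_na, nrounds = 10, verbose = 0)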
Gradient boosting builds trees sequentially, with each new tree correcting the errors of the previous ensemble. XGBoost is a highly optimized library for gradient boosting.
For demonstration, we will use the iris dataset (all three classes), but note that xgboost requires numeric matrices (or xgb.DMatrix objects) rather than data frames.
# Encode Species as numeric (0,1,2)
iris_xgb <- iris
iris_xgb$Species <- as.numeric(iris_xgb$Species) - 1 # setosa=0, versicolor=1, virginica=2
# Prepare matrix for xgboost
train_matrix <- as.matrix(iris_xgb[, 1:4])
train_label <- iris_xgb$Species
For multi-class classification, we use objective = "multi:softprob" and specify num_class = 3.
xgb_data <- xgb.DMatrix(data = train_matrix, label = train_label)
params <- list(
booster = "gbtree",
objective = "multi:softprob",
eval_metric = "mlogloss",
num_class = 3
)
model_xgb <- xgb.train(
params = params,
data = xgb_data,
nrounds = 50, # number of boosting rounds
verbose = 0
)
# Check model
model_xgb
## ##### xgb.Booster
## raw: 126.6 Kb
## call:
## xgb.train(params = params, data = xgb_data, nrounds = 50, verbose = 0)
## params (as set within xgb.train):
## booster = "gbtree", objective = "multi:softprob", eval_metric = "mlogloss", num_class = "3", validate_parameters = "TRUE"
## xgb.attributes:
## niter
## # of features: 4
## niter: 50
## nfeatures : 4
pred_xgb <- predict(model_xgb, xgb_data)
# pred_xgb is a probability matrix with 3 columns
# Convert to class predictions
pred_xgb_matrix <- matrix(pred_xgb, ncol = 3, byrow = TRUE)
pred_class <- max.col(pred_xgb_matrix) - 1 # convert to 0,1,2
accuracy_xgb <- sum(pred_class == train_label) / nrow(iris_xgb)
accuracy_xgb
## [1] 1
If you want to explore Neural Networks in R, there are two common approaches:
- The nnet package for a basic feed-forward, single-hidden-layer neural network.
- The keras package for deep learning (requires Python and TensorFlow).

Here is a basic example with nnet:
library(nnet)
# For iris classification
iris_nn <- iris
iris_nn$Species <- class.ind(iris_nn$Species) # one-hot encoding
nn_model <- nnet(Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
data = iris_nn,
size = 2, # number of hidden units
rang = 0.1,
decay = 5e-4,
maxit = 200)
## # weights: 19
## initial value 112.679059
## iter 10 value 50.295223
## iter 20 value 50.136134
## iter 30 value 45.494950
## iter 40 value 4.756196
## iter 50 value 3.679492
## iter 60 value 3.411043
## iter 70 value 3.371026
## iter 80 value 3.359917
## iter 90 value 3.356568
## iter 100 value 3.352885
## iter 110 value 3.351186
## final value 3.350599
## converged
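As a follow-up sketch (not part of the original output), we can check the fitted network's training accuracy; class.ind() orders its columns by factor level, so max.col() maps back to species names:
nn_prob <- predict(nn_model, iris[, 1:4])            # one column of fitted probabilities per species
nn_pred <- levels(iris$Species)[max.col(nn_prob)]    # pick the most probable class per flower
mean(nn_pred == iris$Species)                        # training accuracy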