Computational Modeling with Real-World Data for Prospective Mathematics Teachers

08/06/2021

Computational Modeling

Complex Modeling

Our technological society has become depended on complex systems like the stock market, and education and health care systems (Bar-Yam, 2004; Jacobson & Wilensky, 2006; Lesh, 2006).
Complex systems cannot be understood by studying their parts, but only studying the systems in their entirety and connections among parts and with other systems (Cilliers, 1998).
Educators have emphasized the importance of students developing new competencies in understanding and dealing with complex systems constructing, describing, explaining, manipulating, and predicting complex systems; working on multi-component modeling projects in which planning, monitoring, and communicating are critical for success; and adapting rapidly to ever-evolving conceptual tools and new information (Gainsburg, 2006; Lesh & Doerr, 2003; Lesh & Zawojewski, 2007)

Interdisciplinary Modeling

Modelling is not just confined to mathematics and science, however. Other disciplines including engineering, economics, social and environmental science, are also interconnected with powerful mathematical models we use to study a range of complex problems (Beckmann, Michelsen, & Sriraman, 2005).
In addition, recent local and global challenges, have drawn renewed attention to the imperatives of understanding modeling, being able to critically engage with results and their interpretation. As a highly complex issue, climate change is an example of an urgent issue, which requires the participation of citizens much more than traditional science. This implies a role for mathematics education in educating current and future citizens to contribute to the kind of dialogue that is needed (Bartwell, 2013).

Modeling with Technology

In our highly technological society, it has become imperative to integrate new technologies for communication, collaboration, conceptualization, and modeling (Jacobson & Wilensky, 2006).
We have come to recognize techno-mathematical literacy (“where the mathematics is expressed through technological artefacts” (Hoyles, Wolf, Molyneux-Hodgson, & Kent, 2010).
Skovsmose (2021) refers to it as “technological knowing”, that is, the application of mathematics in context, and the interpretation of its outcomes.

Zooming on the Task Dimension

Prospective teachers need to engage with:

Complex and interdisciplinary modeling tasks
Modeling related to urgent social issues of the day
Conceptualizations and assumptions
Modeling with technology
Critically engaging and interpreting results
Communicating conclusions

Deepwater Horizon Oil Spill

Introduction

This picture shows the flow of oil leaking from the Deepwater Horizon platform in the Gulf of Mexico in 2010.

A very important question is the following:

How much oil in total was leaked during the 87 days of the BP oil spill?

Experts estimated the flow rate of the oil spill from pictures like the one shown here.

Photo Source: Review of flow rate estimates of the Deepwater Horizon oil spill

Flow rate estimates

On April 20, 2010, the oil drilling rig Deepwater Horizon in the Gulf of Mexico, exploded and sank resulting in the death of 11 workers on the Deepwater Horizon and the largest spill of oil in the history of marine oil drilling operations.

It was estimated that around 4.9 million barrels of oil leaked from the damaged well over an 87-day period, before it was finally capped on July 15, 2010.

The objective of this project is to use the estimated flow rates over the 87-day oil spill to estimate the total volume of the leaked oil.

The government report cited below provides data for the flow rate of the oil spill at 8 time points, given in the table on the next slide.

Report from the US Department of the Interior: Assessment of Flow Rate Estimates for the Deepwater Horizon/Macondo Well Oil Spill

day	flow rate
1	0
2	0
3	62000
45	57500
46	59800
84	55200
86	53000
87	0

The idea behind computing the volume

The idea is to use these data and create a continuous mathematical function of the flow rate \(\text{flow}(t)\) as a function of time \(t\), which interpolates the data linearly, meaning all data points are connected by line segments.

Then, to find the total volume of the oil leaked during this 87-day period, we just need to integrate this function over this time period.

\[\text{Volume} = \int_1^{87} \text{flow}(t) dt\] We can create a continuous, piece-wise linear function \(\text{flow}(t)\) for the flow rate by linearly interpolating the 8 data points, using the connector() function from the mosaic R package, authored by Daniel Kaplan, Randall Pruim and Nicholas Horton. We also use the developmental version of the mosaicCalc R package.

Piece-wise linear model of flow rate

First, we need to convert the given data points into a data frame.

# given data points
key_days <- c(1,2,3,45,46,84,86,87)
flow_rates <- c(0,0,62000,57500,59800,55200,53000,0)
flow_data <- data_frame(key_days,flow_rates)

We can use connector() to create a piece-wise linear function from the data:

flow<-connector(flow_rates ~ key_days, data=flow_data)

The connector() returns a function of time in days, named flow. This is the R function representing the piece-wise linear, continuous mathematical function \(\text{flow}(t)\) that connects linearly the data points. We can visualize the given data points along with the interpolating function \(\text{flow}(t)\).

Piece-wise linear model of flow rate

gf_point(flow_rates ~ key_days, data=flow_data, size=2, col="blue", 
         xlab="days", ylab="flow rate") %>% 
slice_plot(flow(day) ~ day, domain(day=c(1,87)), size=1, col="blue", alpha=0.6, n=1000)

Volume as the integral of the flow rate

To integrate the flow rate, we compute numerically the anti-derivative function volume of the function \(\text{flow}(t)\), using antiD() from the mosaic R package.

volume = antiD(flow(t) ~ t)

Mathematically, this is equivalent to:

\[\text{volume}(t) = \int \text{flow}(t) dt\]

The antiD() has a default value for the constant of integration \(C=0\). By the fundamental theorem of calculus, the total volume of the oil spill is:

\[\text{Vol} = \int_1^{87} \text{flow}(t) dt = \text{volume}(87) - \text{volume}(1) = 4.91885\times 10^{6}\]

The volume of the oil spill over time

We can visualize the volume of the oil spill as a function of time:

slice_plot(volume(t) ~ t, domain(t=c(1,84)), col="blue", size=1, n=500) %>% 
  gf_labs(x= "time", y = "volume of oil spill")

Conclusions

The main idea of this project was to interpolate linearly the flow rate data and create a continuous mathematical function that can be integrated numerically to compute the total volume of the oil spill over the 87-day period.

The students can also compute the volume of the oil spill by hand, given the piece-wise linear nature of the flow rate function, which would be a good exercise to supplement the computational side of the project.

The total volume of the oil spill that we obtained is 4,918,850 barrels of oil.
The report from the US Department of the Interior, cited above, gives an estimate of 4.9 million barrels of oil, with estimated uncertainty of \(\pm 10\%\).

The Deepwater Horizon oil spill is the largest oil spill in the history of marine oil drilling operations.

Estimating the Area of a Tumor

What is a lung nodule?

A lung nodule is a round area that is more dense than normal lung tissue. It shows up as a white spot on an X-ray or a CT scan. Lung nodules are usually caused by scar tissue, a healed infection that may never have made you sick. Sometimes, a nodule can be an early lung cancer.

The lung nodule usually represents a benign tumor such as a granuloma or hamartoma, but in around 20% of cases it represents a malignant cancer, especially in older adults and smokers. About 10 to 20% of patients with lung cancer are diagnosed in this way.

Causes for having a lung nodule

The causes for a lung nodule can vary from an infection to a lung cancer, including rare forms of cancer such as pulmonary lymphoma, carcinoid tumor and a solitary metastasis to the lung.

The most common benign nodule is a granuloma (inflammatory nodule), which can be due to tuberculosis or a fungal infection. Other infectious causes include a lung abscess, pneumonia or rarely a worm infection. Lung nodules can also occur in immune disorders, such as rheumatoid arthritis.

If the patient has a history of smoking or the nodule is growing, the possibility of cancer may need to be excluded through further studies and interventions, including surgical removal.

Source: Wikipedia

Estimating the area of a circular lung nodule

A circular lung nodule needs to be removed and the surgeon needs to know how much tissue to remove. The goal of this project is to estimate the area of the tumor from the X-ray image of the tumor shown earlier.

We have the X-ray image as an png image file, named nodule.png. Thus, we can use the png R library to read the png image with readPNG() and plot it on a coordinate system.

However, before we can do that we need to convert the image dimensions from pixels into centimeters [cm]. Usually, on a regular X-ray image, one pixel corresponds to 175 \(\mu m\) (microns) and since 1 \(\mu m = 10^{-4} cm\), we can approximate the image dimensions in cm using the scaling:

\[\text{#pixels} \times 175 \times 10^{-4}[cm]\]

Image dimensions

We can find the dimensions of the X-ray image, first in pixels and then in cm, using the code below, where nrow(image) gives the vertical dimension in pixels and the ncol(image) gives the horizontal dimension in pixels:

# load the png library, which must be installed first
library(png)
# load the png image from the working directory
image <- readPNG("nodule.png")
# image dimensions converted to cm, assuming 1 pixel measures 175 microns
ydim <- nrow(image)*175*1e-4 # in cm, assuming 1 pixel measures 175 microns
xdim <- ncol(image)*175*1e-4 # in cm, assuming 1 pixel measures 175 microns
c(xdim, ydim) # image dimensions in cm

## [1] 36.5225 40.1275

The ydim and xdim are the vertical and horizontal image dimensions in [cm].

X-ray image in a coordinate system

We can plot the X-ray image on a coordinate system where the dimensions are measured in centimeters. First, we create an empty plot with the same dimensions measured in centimeters as the X-ray image, using the plotting command:

# create an empty plot with the dimensions of the image
plot(NULL, type="n", ylim=c(0,ydim), xlim=c(0,xdim), xlab="x", ylab="y")

We can plot the X-ray image inside the empty plot using the rasterImage() function, where the coordinates (xleft, ybottom) of the left-most bottom corner of the image, and the coordinates (xright, ytop) of the right-most top corner are extracted from the empty plot, using par("usr").

X-ray image in a coordinate system

The call par("usr") returns a vector of the form c(x1, x2, y1, y2) with the coordinates of the plot corners.

xleft=par("usr")[1], ybottom=par("usr")[3], xright=par("usr")[2], ytop=par("usr")[4]

Below is the R code that plots the X-ray image inside the empty plot with the image dimensions specified in centimeters.

plot(NULL, type="n", ylim=c(0,ydim), xlim=c(0,xdim), xlab="x", ylab="y")
rasterImage(image, xleft=par("usr")[1], ybottom=par("usr")[3], 
            xright=par("usr")[2], ytop=par("usr")[4])

Sampling the tumor boundary

We manually sample 18 points from the tumor boundary using the mouse inside the RStudio Viewer. The manual sampling can be implemented using the locator() function, which is supported only on screen devices such as X11, windows and quartz.

Once we sample the 18 points from the boundary of the tumor, we can plot them on the top of the X-ray image and write the \(x\)-and-\(y\) coordinates of the sample points inside a csv file.

Note: All R code given in the slides up to this point would have to be run from a plain R script and the working directory must be the folder that contains the R scrip and the png image.

R code for sampling the tumor boundary

# sample points inside the RStudio Viewer
coords <- locator(n=18, type="o") 

# plot the sampled points from the list coords
plot(coords,type="o",col="purple",lwd=3)

# write the x and y coordinates of the sampled points in a csv file
write.csv(coords,file="xy.csv", row.names=FALSE)

The locator() function returns a list with the \(x\) and \(y\) coordinates of the sampled points, which are plotted on the top of the X-ray image, specified by the argument type="o". We can also plot the sampled points only with plot(coords), and we then write the coordinates into the xy.csv file.

xycoords <- read.csv("xy.csv")
x<-xycoords[,1]
y<-xycoords[,2]
plot(c(x,head(x,1)),c(y,head(y,1)),type="o",pch=20,cex=1.4, col="purple", lwd=2, 
     xlab="", ylab="", cex.axis=0.6, asp=1)

The surveyor’s method for estimating area

The surveyor estimates first the location of the center of the area. From this point we measure the distances to each polygon vertex. Then the distances are averaged to obtain \(\bar{R}\). One can think of this as an average radius of a circle approximating the area. The area \(A\) is calculated as:

\[A=\pi \bar{R}^2\]

We can find the center of the polygon by computing its center of mass \(C=(\bar{x},\bar{y})\), where \(\bar{x}\) is the average of the \(x\)-coordinates of the polygon vertices, and \(\bar{y}\) is the average of the \(y\)-coordinates of the polygon vertices:

\[\bar{x} = \frac{1}{n}\sum_{k=1}^n x_k, \quad \bar{y} = \frac{1}{n}\sum_{k=1}^n y_k\] where \(n=18\) is the number of data points sampled from the tumor boundary.

The surveyor’s method for estimating area

The area of the approximating circle using the average radius is:

\[A=\pi \bar{R}^2 = 8.93\]

The area of the approximating circle using the maximum radius is:

\[A=\pi R_{\text{max}}^2 = 11.68\]

We can plot the circle centered at \((\bar{x},\bar{y})\) with the average radius \(\bar{R}\) in polar coordinates:

\[x = \bar{x} + \bar{R}\cos(\alpha), \, y = \bar{y} + \bar{R}\sin(\alpha), \, \alpha \in (0,2\pi)\]

We can similarly plot the approximating circle using the maximum radius.

The surveyor’s method for estimating area

A circle that contains all sampled points

Other approaches to estimating the area

To minimize the area to be surgically removed, we can use other approaches to estimating the area spanned by the points sampled from the tumor boundary.

Implementing the shoelace formula: We can compute exactly the area of the polygon formed by the vertices of the points sampled from the tumor boundary as the minimum area to me removed. The area of the polygon in complex form is given by the shoelace formula: \(A=\frac{1}{2} \text{Im}\left (\sum_{k=0}^{n-1} z_k \bar{z}_{k+1} \right )\)
Implementing a circle fitting using the sampled points.
Implementing a Monte Carlo simulation for estimating the area.

This is a very rich area that gives students the opportunity to explore different approaches and combine mathematical and computational modeling.