2023-03-14

Interpolation

What is Interpolation?


Interpolation is a numerical method that estimates new data points within the range of a discrete set of known values.


So why not Linear Regression?


Well, it's not exactly the same. Linear least squares regression assumes that Y and X have some (linear) relationship and fits the single line that best describes it, so the line usually misses the individual data points. Interpolation instead fits a curve that passes through every data point, and then uses that curve to approximate values inside the known domain. Many fields of engineering use this method because their data is gathered through sampling and experimentation.

All data used here comes from the trees dataset.
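To make the contrast with regression concrete, here is a minimal sketch in base R. It assumes `trees` is the built-in R dataset with `Girth` and `Height` columns, and it drops rows that repeat a `Girth` value (if any) so that each x has a single y; the variable names are illustrative only.

```r
# Keep one row per Girth value so every x maps to exactly one y
d <- trees[!duplicated(trees$Girth), ]

# Least squares regression: one straight line for the whole dataset
fit <- lm(Height ~ Girth, data = d)
max_reg_miss <- max(abs(residuals(fit)))

# Linear interpolation: a piecewise line through every known point
interp <- approxfun(d$Girth, d$Height)
max_interp_miss <- max(abs(interp(d$Girth) - d$Height))

# The regression line misses the points; the interpolant reproduces them
print(c(regression = max_reg_miss, interpolation = max_interp_miss))
```

The regression residuals are non-zero, while the interpolant returns the known heights at the known girths.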

Linear Interpolation

\(y=y_a+(y_b-y_a)\frac{x-x_a}{x_b-x_a}\) at the point \((x,y)\)

Linear interpolation requires two known data points in order to approximate another point that lies between them. Given two discrete points, you can estimate any point on the straight line joining them, which fills in the data between the measurements.
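The formula above can be sketched directly in R. The helper name `lerp` is hypothetical, and the two known points are assumed to be the first two rows of the built-in `trees` dataset; base R's `approx()` is used only as a cross-check.

```r
# Linear interpolation between (xa, ya) and (xb, yb), evaluated at x
lerp <- function(x, xa, ya, xb, yb) {
  ya + (yb - ya) * (x - xa) / (xb - xa)
}

# Two known points: the first two rows of trees (Girth as x, Height as y)
a <- trees[1, ]
b <- trees[2, ]

# Query point halfway between the two known girths
x_new <- (a$Girth + b$Girth) / 2

y_manual <- lerp(x_new, a$Girth, a$Height, b$Girth, b$Height)

# approx() performs the same calculation for the same two points
y_approx <- approx(x = c(a$Girth, b$Girth),
                   y = c(a$Height, b$Height),
                   xout = x_new)$y

print(c(manual = y_manual, approx = y_approx))
```

Both values should agree, since `approx()` uses linear interpolation by default.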


An example of this is shown in the next slide over.

Linear Interpolation using ggplot

Polynomial Interpolation

\(f\left(x\right)=\begin{cases} a_1x^3+b_1x^2+c_1x+d_1&\text{if }x\in\left[x_1,x_2\right]\\ a_2x^3+b_2x^2+c_2x+d_2&\text{if }x\in\left(x_2,x_3\right]\\ a_3x^3+b_3x^2+c_3x+d_3&\text{if }x\in\left(x_3,x_4\right]\\ \dots\\ a_nx^3+b_nx^2+c_nx+d_n&\text{if }x\in\left(x_{n},x_{n+1}\right]\,. \end{cases}\)

The higher the degree of the polynomials, the more flexibility each individual piece has to follow the data, although very high degrees can oscillate between the points. The formula above represents cubic spline interpolation, where each piece is a polynomial of degree 3.


An example of this is shown in the next slide over.

Poly Interpolation using ggplot

The higher the degree of the polynomial, the more closely the curve can follow the data points. In this example the degrees are 1 (purple), 3 (red), 6 (blue), and 9 (green).
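The code behind that plot is not shown in these slides, so the following is only a sketch of the same idea. It assumes the curves are least squares polynomial fits of `Height` on `Girth` from the built-in `trees` dataset, built with `lm()` and `poly()`, and it measures how closely each degree matches the points through the residual sum of squares.

```r
# Degrees used in the plot above
degrees <- c(1, 3, 6, 9)

# Fit one polynomial per degree and record its residual sum of squares
rss <- sapply(degrees, function(d) {
  fit <- lm(Height ~ poly(Girth, d), data = trees)
  sum(residuals(fit)^2)
})
names(rss) <- paste0("degree_", degrees)

# A smaller value means the curve sits closer to the observed points
print(rss)
```

Because each lower-degree fit is a special case of the higher-degree one, the residual sum of squares should not increase as the degree grows.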

Spline Interpolation

Spline interpolation takes the idea of polynomial interpolation but splits the domain into specific ranges and fits a separate low-degree polynomial on each one, which lets the curve fit the dataset more closely.
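As a rough illustration, base R's `splinefun()` builds a piecewise cubic interpolant of this kind. The sketch below assumes the built-in `trees` dataset, drops rows that repeat a `Girth` value (if any) so the knots are unique, and picks the `"natural"` method as one possible choice rather than the one used for these slides.

```r
# Keep one row per Girth value so the spline has unique knots
d <- trees[!duplicated(trees$Girth), ]

# Piecewise cubic interpolant through every (Girth, Height) pair
spline_fit <- splinefun(d$Girth, d$Height, method = "natural")

# At the knots the spline reproduces the known heights
at_knots <- spline_fit(d$Girth)
print(all.equal(at_knots, d$Height))

# Between the knots it gives a smooth estimate of Height
grid <- seq(min(d$Girth), max(d$Girth), length.out = 200)
curve_vals <- spline_fit(grid)
```

The values in `curve_vals` could then be drawn as a line over the original points with any plotting package.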

Difference using plotly

You can see how much smoother the cubic spline is. It does a better job of smoothing between the data points within the range.

R CODE

Here is some example code to produce the Difference using plotly plot.

library(plotly)  # provides plot_ly(), add_trace(), layout() and %>%

# Girth vs. Height from the trees dataset, joined by straight segments
plot_ly(data = trees, x = ~Girth, y = ~Height, name = "Linear",
        type = "scatter", mode = "lines+markers",
        line = list(color = "orange", width = 3, shape = "linear")) %>%
  # Same points, joined by plotly's smoothed "spline" line shape
  add_trace(y = ~Height, name = "Spline",
            mode = "lines+markers",
            line = list(color = "purple", width = 3, shape = "spline")) %>%
  layout(title = "Girth x Height Interpolated",
         xaxis = list(title = "Girth"),
         yaxis = list(title = "Height"))