AI, fall 2025

Kirby Arinder
2025/08/29

It's a funny time for AI.

The discourse frequently has little to do with the thing.

So insofar as it is possible, I want to talk about the thing itself. This is hard in only one hour.

(But of course, just like intelligence isn't one thing, AI isn't one thing. We're talking about the current incarnation.)

Outline

I. The foundational math of prediction
II. Machine learning
III. Transformers and scaling
IV. The present moment
V. Extrinsics and implications

I: The underlying math

Let's start by cutting through the mystery:

At its core, contemporary AI runs on really complicated mathematical models that are supposed to predict some stuff.

(What stuff? Words, mostly. It's really good autocomplete. We'll get to that.)

If you understand that, you understand AI!

So how does THAT work?

Let's start with the regression model.

It's a method of assessing the central tendency of multidimensional data and making certain inferences (or predictions) based on that assessment.

Here's that kind of set!

It's central tendency is in blue.

Conditional expectations

Now, a prediction that's one- (or more!) dimensional may not be very useful.

By holding a number of predictor dimensions constant, we can reduce the dimensionality of our expected value!

This creates a conditional expectation.

This practice is very useful:

  • We can make tractable, low-dimensional predictions
  • We can isolate the variables we don't care about from the variables we do!

So how does this work?

Well, let's just hold the x axis constant!

What does "holding constant" mean in the real world?

Think of predictive text on your phone.

With nothing to go on, the possibilities for the next word are essentially limitless.

With knowledge of the previous word, they begin to be bounded.

With knowledge of two previous words, they begin to be more bounded still…

II. Machine learning

So that's the math. Now let's bring in the machines!

Ordinarily, much of the art in modeling is in specifying the mathematical form of the model itself.

Nominally, the form of the model ought to reflect the causal structure of reality.

The model models what's going on in the world, and that's how it produces successful predictions!

But that is extremely hard! What if we had alternatives?

Well, as it happens...

Enter machine learning.

This is the first thing today we might recognize as AI.

Instead of taking a set of data and laboriously formulating an equation to relate a set of independent variables to a dependent variable…

We take the same set of data and let the machine decide how the IVs relate to the DV!

First we need a dataset (this is 10 of 150!)

   SepalL SepalW PetalL PetalW Species
1     5.1    3.5    1.4    0.2  setosa
2     4.9    3.0    1.4    0.2  setosa
3     4.7    3.2    1.3    0.2  setosa
4     4.6    3.1    1.5    0.2  setosa
5     5.0    3.6    1.4    0.2  setosa
6     5.4    3.9    1.7    0.4  setosa
7     4.6    3.4    1.4    0.3  setosa
8     5.0    3.4    1.5    0.2  setosa
9     4.4    2.9    1.4    0.2  setosa
10    4.9    3.1    1.5    0.1  setosa

Then we train the algorithm...

This is it, by the way. You see 100% of the code; I wanted there to be no mystery here.

library(caret)
library(randomForest)
part <- createDataPartition(y=iris$Species, p=0.7, list=FALSE)
trainingdata <- iris[part,]
testingdata <- iris[-part,]
modFit <- train(Species ~ .,data=trainingdata, method="rf")

These are its performance specs...

Random Forest 

105 samples
  4 predictor
  3 classes: 'setosa', 'versicolor', 'virginica' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 105, 105, 105, 105, 105, 105, ... 
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa    
  2     0.9509804  0.9250961
  3     0.9530017  0.9281329
  4     0.9541128  0.9298062

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 4.

And here's how it does!

(And this is live, so I don't know before you do!)


mypred       setosa versicolor virginica
  setosa         15          0         0
  versicolor      0         14         2
  virginica       0          1        13

III. Transformers and scaling

All very well and good, you might say.

But there's a world of difference between that and modern LLMs.

Well, yes and no.

Multiple machine learning models

There are many methods of machine learning, and they're good for different purposes.

The one I used above is called a random forest. It's good for classification. It's a supervised method.

But there also unsupervised methods. They are good for other purposes… like predicting text.

For the modern world, there is one method that dominates all others:

Cue "Also sprach Zarathustra"

In 2017, eight scientists at Google published a paper.

Superficially, it's just a new unguided machine learning method.

But it's faster to train and allows for larger models than previous methods.

These advantages mattered... a LOT

The transformer – the name of this new method – seemingly overnight became almost synonymous with AI, rightly or wrongly, in the public imagination, and transformers got the lion's share of investment money.

All the major products you've heard about are transformers!

The “T” in “GPT” stands for “transformer.” You get the idea.

But let's back up a second.

What do we mean, larger models?

Size is important in large language models in two respects:

One, in terms of the number of parameters in the model itself;

And two, in terms of the amount of data upon which the model is trained.

No diminishing returns?

In an old-school predictive modeling, if you make it too fine-grained, you run the risk of overfitting.

In old-school machine-learning, if you train on too much data, you run the risk of just memorizing inputs verbatim.

But then, in 2020, researchers at OpenAI publish another paper.

A very quick footnote, not ominous at all

“At present we do not have a solid theoretical understanding for any of our proposed scaling laws. The scaling relations with model size and compute are especially mysterious… Without a theory or a systematic understanding of the corrections to our scaling laws, it’s difficult to determine in what circumstances they can be trusted.” -p. 22

Never mind that, cue the Wagner again!

Now it looks like the sky is the limit!

With the transformer architecture, the notion that arbitrary scaling helps rather than hurting, and a ton of VC money, we entered what looked like a golden age of AI (at least, from some perspectives).

Ever-larger transformers with access to ever-larger percentages of the internet began to achieve ever-more-impressive capabilities, to the point that they seemed not just to be able to mimic linguistic capabilities, but to be able to solve essentially arbitrary problems.

And that brings us to…

IV. The status quo

LLMs have ingested the entire internet.

The current generation of models has about 1% of the number of parameters that a brain has neurons (though, asterisk, that's a super imprecise analogy) and does fancy agentic stuff and chain-of-thought reasoning to correct for its shortcomings.

So what do we get for all that?

Well, let's circle back to the beginning.

Outside the models themselves, things look funny.

Capital expenditures on AI contributed more to GDP growth this summer than consumer spending – which is itself a majority of that growth!

But no AI company is profitable, and 95% of actual business projects using AI fail. (There are theories about the purely economic aspects of all this, but I'm not competent to evaluate them, so I'll just let this rest here. Not a money guy.)

AI is impressive in the moment but under extended use its cracks begin to show.

Why? Back to the models.

Well, here are some other important things we should know.

1. Scaling isn't a panacea after all.

2. AIs don't do formal reasoning.

3. AIs break rapidly on complex problems.

4. Techniques designed to compensate for these problems don't.

If we go back to our basics, these facts shouldn't be too surprising!

V. Extrinsics and implications

So let's talk practicality.

For our work, I think there are five major concerns we face:

  • Accuracy
  • Reproducibility
  • Privacy
  • Security
  • Expense

Accuracy

AI hallucinates. We've all seen the stories.

It makes up legal citations; it screws up accounting ledgers. If you give it powers to actually do stuff, it can inflict some real damage.

Why?

Think back to what we know

Remember that AIs don't have internal world models; they don't know about law, accounting, physics, math, or logic. They are contextual text predictors.

Complex problems, novel (out-of-training-distribution) problems, and even problems phrased in a novel way can foil their powers of prediction, and thus their powers of “reasoning.”

Additionally, because their internal workings are opaque (to us, and to some degree to everyone) what constitutes novelty is an open question.

In some sense...

Let's think more broadly. What is a hallucination?

In humans, it's (often considered to be) a belief-state produced by a causally inappropriate process.

But every assertion of an AI is like this!

Reproducibility

Accuracy is one thing.

But for our work, we need to not only be able to produce facts; we need to show how we came to know them!

Not only does AI's work have to be checked, but its citations and even its reasoning and code have to be checked.

Remember the chain of thought paper: CoT is not the actual under-the-hood process, and may or may not be true or even coherent!

(Incidentally, that means that if you can't check the code, you shouldn't ask it to code. This is not that kind of workplace.)

Privacy

Remember, access to data made contemporary AI. And faith in scaling still drives the companies. So you should assume that any input to an AI is used in training that AI, and that training data can emerge at any point as output given the right prompt.

(OpenAI is explicit about this. I'm sure others are as well if you look.)

This is why confidential data can NEVER be used as input to AI you don't control 100%. Note that the linked article about OpenAI points out that many major banks and similar security-minded organizations forbid business use of outside AI for just this reason.

Security

This isn't as big an issue right now, but things are moving fast, and so it seems worth mentioning.

A huge problem with AIs is that data = input = command.

That is, any input to a LLM, in any format, can also be used to issue commands to that LLM.

This is called prompt injection. With agents… this is quite an issue.

Expense

Remember that no AI company makes a profit; OpenAI, the most profitable, loses money on every query.

If it is to survive, this is going to have to change.

Which, in turn, means that AI services, if they survive in something like their current form, will have to become more expensive.

An interaction like this:

“Generate the longest grammatically correct English sentence you can using the smallest number of distinct words.”

[1] "buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo buffalo"

Takes a lot.

It could go on forever, but if it went to 1000 iterations of 'buffalo', it would be like running your computer at top capacity for an hour!

Even what you can see on the previous screen is six minutes of your computer, pedal to the metal.

Somebody is paying for it….

If you want to find this presentation again

Questions?