I’m teaching a new course at NYU Stern on AI in Finance. This is a change for me, since I typically do real estate and urban economics. But as I use more AI tools in my own research and work (here to explore zoning for instance), I’ve found there is a gap in the market for people to bring an economics orientation to this subject that I’m trying to fill.
So I’m going to use this newsletter to share some of what I’m developing for this course. I’ve created an open course page here, where you can find a syllabus and lecture notes. On this Substack, I’ll be sharing weekly course summaries. My motivation here is Josh Rauh, who has also opened up his Energy Finance course on Substack, which I’ve found very helpful. I’ll be sharing weekly updates on my main newsletter here over the semester.
This has been a fun experience for me to figure out what is worth teaching about AI and how to do it. The class itself features a lot of case discussions and applications of AI (in assignments and a final project). The content here is going to be more about the conceptual framework and economics.
I start with three principles which I think help capture where AI adds value in finance.
First let’s start out by summarizing what AI does: it builds compressed maps, or lower dimensional representations of reality so we can make decisions. This is really valuable because the world is complicated and we need simplifications in order to take actions.
Borges has a famous short story about an empire which creates a map so detailed that it is the same size as the empire itself; the map is eventually abandoned as useless. The lesson here is that maps are valuable precisely because they are simpler than reality (there is some dimensionality reduction).
A couple of examples of this idea: the famous set of Picasso lithographs showing progressively more abstracted depictions of a bull, and the 1972 Vignelli Subway Map. The subway map in particular tells us that the point of a simpler or more abstracted representation is to help us make a decision: how to get from point A to point B.
LLMs are, in that sense, very complicated map-making machines. They compress their training data and inputs into something which captures basic patterns and relationships. That can be valuable insofar as it helps us to make a decision in the real world.
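To make the “compressed map” idea concrete in a finance setting, here is a minimal sketch using principal components on simulated asset returns. The factor structure and all of the numbers are made up purely for illustration, not drawn from real data.

```python
# A toy "compressed map": 100 simulated asset return series collapsed
# into a single principal component. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_days, n_assets = 500, 100

market = rng.normal(0, 0.01, size=n_days)            # one common driver
betas = rng.uniform(0.5, 1.5, size=n_assets)         # exposures to it
noise = rng.normal(0, 0.005, size=(n_days, n_assets))
returns = market[:, None] * betas[None, :] + noise   # 100-dimensional "reality"

# Compress: singular value decomposition of the demeaned panel.
demeaned = returns - returns.mean(axis=0)
_, s, _ = np.linalg.svd(demeaned, full_matrices=False)
explained = s**2 / (s**2).sum()
print(f"Variance captured by the single top factor: {explained[0]:.1%}")
# One factor throws away most of the detail but keeps what matters for a
# decision like hedging overall market exposure -- the Vignelli map, not Borges'.
```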
In finance specifically, the gap between understanding and action is pretty crucial. The usual benchmarks of AI performance on narrow, specific tasks are less helpful than evaluating these systems along other dimensions:
What are the cost-latency-quality tradeoffs?
What are the costs of inaction vs. the costs of flawed action?
In other words, we need to be more Bayesian in our decision analysis and think about when a simplified representation facilitates a better decision.
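As a stylized example of that tradeoff, here is a minimal sketch; the reliability figure and dollar costs are hypothetical and chosen only to make the point.

```python
# A minimal sketch of the decision framing: when does an imperfect AI signal
# justify action? The probabilities and dollar costs below are hypothetical.

def expected_costs(p_signal_correct, cost_flawed_action, cost_inaction):
    """Expected cost of acting on the signal vs. ignoring it."""
    act = (1 - p_signal_correct) * cost_flawed_action   # pay only if the signal is wrong
    wait = p_signal_correct * cost_inaction             # pay only if the signal was right
    return act, wait

# Example: a model flags a counterparty as risky with 70% reliability.
# Acting unnecessarily costs $10k; failing to act on a real problem costs $200k.
act, wait = expected_costs(0.70, 10_000, 200_000)
print(f"Expected cost of acting:   ${act:,.0f}")
print(f"Expected cost of inaction: ${wait:,.0f}")
# Even a noisy signal can justify action when the cost of inaction dominates --
# "how accurate is the model?" is only half the question.
```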
One of the biggest challenges in deploying AI tools, especially for finance applications, is that AI advancement is capped by the slowest link of the system.
This is just Amdahl’s Law, borrowed from computer science: overall speedup is fundamentally limited by the share of time taken up by the part you actually improve. If you have a process where 90% of the time is spent waiting for compliance review, and you build an AI that makes document drafting 10x faster, that’s going to speed up your overall process by only about 10%.
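The arithmetic is worth writing out, since it comes up constantly when evaluating AI pilots. A minimal sketch of Amdahl’s Law using the numbers from the example above:

```python
# Amdahl's Law: overall speedup is limited by the share of the workflow
# you actually accelerate.

def overall_speedup(share_improved, task_speedup):
    """Speedup of the whole process when `share_improved` of the time
    gets `task_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1 - share_improved) + share_improved / task_speedup)

# Drafting is 10% of the process and gets 10x faster,
# while the other 90% (compliance review) stays the same.
print(f"{overall_speedup(0.10, 10):.2f}x")    # ~1.10x, i.e. roughly a 10% gain

# Even an infinitely fast drafting tool caps out at 1 / 0.9, about 1.11x.
print(f"{overall_speedup(0.10, 1e9):.2f}x")
```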
This is a pretty straightforward idea, but I think it helps to reconcile some of the conflicting stats on productivity (discussed there by Alex Imas). AI can result in really impressive improvements in discrete and specific tasks in the finance workflow, but are those really the main bottlenecks to getting stuff done? It all depends.
This is another drawback of standard benchmarks. We should instead think about whether and how AI speeds up the really slow and constraining parts of a particular workflow, and what the frictions and barriers to that kind of speedup are. Often, the binding constraints aren’t technical, but regulatory, organizational, or incentive-based.
Automation drives price adjustment. Things with near-zero marginal cost are going to get very cheap. What are the downstream effects? The answer depends on substitution and elasticities.
When something gets cheap, the value of its substitutes falls and the value of its complements rises. So if AI commodifies basic financial analysis, the value of producing that analysis declines, but the value of things which complement it (private information, relationships, judgment) might increase.
What happens to the overall size of the pie? If automation improves production, does demand increase in tandem or not? This is the Jevons paradox: improving efficiency can actually lead to more overall consumption, if demand is elastic enough.
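A quick constant-elasticity sketch makes the point. The elasticities and the 50% cost decline below are purely illustrative assumptions, not estimates of anything.

```python
# A toy constant-elasticity sketch of the Jevons point: whether cheaper
# analysis shrinks or grows total spending depends on demand elasticity.
# The elasticities and the 50% price drop are purely illustrative.

def quantity_and_spend(price_drop, elasticity):
    """Constant-elasticity demand: Q = P^elasticity (normalized so Q = 1 at P = 1)."""
    new_price = 1 - price_drop
    new_quantity = new_price ** elasticity
    return new_quantity, new_price * new_quantity

for label, eps in [("inelastic demand", -0.5), ("elastic demand", -2.0)]:
    q, spend = quantity_and_spend(0.5, eps)
    print(f"{label:>17}: quantity x{q:.2f}, total spend x{spend:.2f}")
# Inelastic: quantity rises a bit but total spending on the task falls.
# Elastic: quantity quadruples and total spending rises -- Jevons-style expansion.
```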
One example of this dynamic is in the graph above: tradable goods have been getting cheaper relative to services, as productivity advances in more capital-intensive sectors drive growth and output, which then concentrates more rents in sectors with lower productivity growth. The big thing that’s changing now is that even services are getting disrupted by advanced automation.
My guess is that a lot of finance work has relatively elastic demand and is currently rate-capped by the human effort and complexity required to produce it. Firms might do a lot more due diligence, scenario analysis, and valuation if the analysis were cheaper to produce. Doing so will wipe out certain tasks which get completely automated, but it will be an advantage for higher-value-add analysts.
Of course, the distributional effects could still be brutal here, depending on whether or not your job is made up of easy-to-replace tasks. But the takeaway is to follow the price action to determine what gets commodified and what stays scarce. The challenge is to use the low-cost products in your production function while generating outputs that remain scarce complements to automation.
To understand how these principles apply, it helps to understand how we got here, and why finance was in some ways slow to adapt.
In 2001, the statistician Leo Breiman wrote a famous paper called “Statistical Modeling: The Two Cultures” arguing that the field had broken into two camps. One side assumed the data came from an interpretable model: linear regression, logistic regression, or something you can write down and explain. The other culture treated the data-generation process as fundamentally unknown and used tools like decision trees and neural nets to just find algorithms which worked well at prediction.
At the time, finance was in the first camp, along with 98% of statisticians (as Breiman estimates). There were good reasons for this. We tend to favor simple, interpretable rules. We think they are going to be robust out of sample (less overfitting), and they are easier to explain to regulators and clients. Finance cares about causality, not just prediction (though of course these two things are linked). Our “gold standard” is the RCT or A/B test. We write down a model: the CAPM, Fama-French model, whatever it is, and test it against data while being careful about the biases which show up in estimation.
Then came what Rich Sutton calls the “Bitter Lesson.” The lesson from the last 70 years of AI research, Sutton argues, is that general approaches which leverage improvements in computational ability and declining costs scale up very well and defeat human-designed approaches, over and over again across domains.
You can see this playing out in the history of text analysis in finance. The first generation used simple dictionaries to classify words as positive or negative, count frequencies, and assess sentiment. One of the challenges, as Loughran and McDonald identified, is that finance words are different from everyday words (“liability” is typically negative in ordinary usage, but is often neutral boilerplate in a 10-K). They showed that three-fourths of the words tagged as negative in the commonly used Harvard dictionary are not actually negative when used in a finance context.
To address that, we first used human intuition to develop a finance-specific dictionary. Then came n-grams (pairs or longer sequences of words, like “strong demand” vs. “weak demand”, which help disambiguate language). Next came word embeddings, which turn words into vectors so that similar meanings sit closer together in geometric space, and then contextual embeddings, whereby “bank” gets a different representation depending on whether we are talking about a river or a financial institution. From there we get the landmark attention paper, transformers, and LLMs. Tim Lee and Sean Trott have a fantastic overview walking through the basics of how LLMs work, which I highly recommend you check out.
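To see the difference between the first two generations, here is a toy sketch. The word lists are tiny stand-ins for illustration, not the actual Loughran-McDonald dictionary.

```python
# A toy contrast between the first two generations of text analysis:
# (1) context-free dictionary counting, (2) n-grams that disambiguate.
# The word lists are tiny illustrative stand-ins, not a real finance dictionary.
POSITIVE = {"strong", "growth", "record"}
NEGATIVE = {"weak", "decline", "impairment"}

def dictionary_sentiment(text):
    """Generation 1: positive minus negative word counts, no context."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def bigram_sentiment(text):
    """Generation 2: score word pairs, so 'strong demand' != 'weak demand'."""
    words = text.lower().split()
    score = 0
    for first, second in zip(words, words[1:]):
        if second == "demand":
            score += 1 if first in POSITIVE else -1 if first in NEGATIVE else 0
    return score

filing = "management reported strong demand despite a decline in margins"
print(dictionary_sentiment(filing))  # 0: 'strong' and 'decline' cancel out
print(bigram_sentiment(filing))      # 1: the bigram picks up 'strong demand'
```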
The key point though is that each step involves less human curation and more letting the algorithm just learn from data. The models get bigger, more opaque, and more capable.
The weird thing here was that classical intuitions about model fit turned out to be wrong in important ways, and underestimated the emergent capabilities that have shown up. Traditional statistics says that if your model has more parameters than data points, you’re going to overfit horribly and perform poorly out of sample. But ML models exhibit properties like “double descent,” whereby test error rises as the number of parameters approaches the number of data points, then falls again as you keep scaling past that point. These massively overparameterized models generalize and perform better than we thought possible.
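If you want to see a stylized version of this, here is a toy sketch using minimum-norm least squares on random ReLU features, one standard setting where double descent shows up. The data are simulated, the exact numbers will vary with the random seed, and this is an illustration rather than a benchmark.

```python
# Toy double-descent sketch: minimum-norm least squares on random ReLU
# features of simulated data. Test error typically peaks when the number
# of features is close to the number of training points, then falls again.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(3 * x).ravel() + 0.1 * rng.normal(size=n)
    return x, y

def relu_features(x, w, b):
    return np.maximum(x @ w + b, 0.0)          # shape (n_points, n_features)

x_train, y_train = make_data(30)
x_test, y_test = make_data(500)

for n_features in [5, 15, 30, 60, 300, 1000]:
    w = rng.normal(size=(1, n_features))
    b = rng.uniform(-1, 1, size=n_features)
    phi_train = relu_features(x_train, w, b)
    phi_test = relu_features(x_test, w, b)
    # lstsq returns the minimum-norm solution once features exceed data points
    coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)
    test_mse = np.mean((phi_test @ coef - y_test) ** 2)
    print(f"features={n_features:5d}  test MSE={test_mse:.3f}")
```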
This doesn’t mean interpretability and causality don’t matter at all. I just had a discussion about this with Alex Imas in this newsletter, all about the ways in which purely data-driven AI approaches can and cannot learn from data. We have to be careful because generative models carry implicit world models that may be wrong in ways that inhibit performance. At the same time, sometimes prediction is all you need. As one illustration, the FT recently published a geopolitical tracker that estimates sentiment across articles; it correlates with the traditional dictionary-based geopolitical risk measure but captures broader context. You could imagine this being useful to track sentiment, manage risk, and potentially even make trades.
The tension here is that the Bitter Lesson says to bet on scale, but finance is full of constraints that don’t easily get computed away. So the basic strategy for this course is to figure out 1) how to identify and address the critical bottlenecks; 2) how to use improved predictions to make better decisions; and 3) how to keep an eye on price theory, because the resulting surplus is going to accrue to the scarce factors.
That’s it for this week: subscribe if you want to follow along (or unsubscribe if you’re tired of hearing about AI; it’s okay I get fatigued too).
Ethan Mollick has a nice guide on getting started with AI right now.
Tim Lee and Sean Trott have a great explainer on the basics of LLMs.
Dario Amodei, the CEO of Anthropic, has a podcast episode with Dwarkesh Patel discussing scaling laws
The Bitter Lesson, by Rich Sutton
Case: BloombergGPT. An initial paper; evaluation of different models; and an open source version