Author: Denis Reis
How can you trust a model to support financial decisions if you can’t understand how it makes them? For us, this wasn’t just an academic question. It was a core requirement for security, debugging, and responsible deployment. Here at Nubank, we are leveraging transformer-based architectures, which have enabled the development of foundation models that automatically discover general features and learn representations directly from raw transaction data. By processing sequences of transactions (often by converting them into tokens, much like a natural language model would), these systems can efficiently summarize complex financial behavior.
In this post, we explore the *explainability* of such models, that is, the ability to examine how a single transaction, and its individual properties, influences the final prediction produced by a model. In this new scenario, where we have a transformer that processes a sequence of transactions, standard tools come with limitations and requirements that make them difficult to maintain in a rapidly evolving environment like ours. Instead, we adopted a simpler approach, Leave One Transaction Out (LOTO), which provides a good assessment of the impact of transactions (and their properties) on the final model prediction while being easily compatible with virtually any model architecture.
Why Explainability is Crucial for Transaction Foundation Models
Understanding how the input influences the output of these models is paramount for responsible deployment, continuous improvement, and, critically, for monitoring and preventing exploitation.
- Monitoring Model Behavior and Drift: Explainability allows us to track changes in the relative importance of different transaction types over time. Such changes can signal a shift in how customers interact with their finances, which may require updating our models and decision rules.
- Debugging and Gaining Insights: It’s essential to understand how the different parts of the input contribute to a model’s output, both for individual predictions (local explainability) and across the entire dataset (global explainability). This insight is invaluable for debugging model anomalies, deepening our understanding of financial behaviors, and guiding “feature selection” by highlighting which transaction types are most relevant.
- Exploitability Monitoring and Prevention: This is a core concern. We define exploitability as a model vulnerability that malicious users could leverage to manipulate behavioral assessments, potentially leading to unwarranted benefits or other undesirable outcomes. In the case of these models, a vulnerability could be a transaction that is easily produced by a malicious actor and, when present, makes the model produce an output that leads to an unduly favorable outcome for that actor. By monitoring changes in customer behavior, we can detect active exploits, and by examining how relevant transactions are to the model’s prediction before deployment, we can detect potential vulnerabilities.
How to Explain?
When addressing model explainability, a standard approach in both literature and industry is SHAP [1], a powerful framework for local explainability that has several beneficial properties for our use case.
For each data point, SHAP assigns a value to each component of the input, which we’ll henceforth refer to as its importance, representing how much that component pushes the model’s prediction away from a baseline prediction (usually, the average prediction over a dataset provided as “background data”). For example, if the SHAP value of a feature is large and positive for a particular data point, that feature is pushing the model’s prediction for that data point upwards. We can move from these local explainability assessments to a global one by aggregating the SHAP values across several data points, for example by taking the average absolute value per component.
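As a rough illustration of this workflow, here is a minimal, hedged sketch using the open-source shap package with a toy scikit-learn model and synthetic data. Nothing here comes from our actual models; it only mirrors the local-to-global aggregation described above.

```python
import numpy as np
import shap
from sklearn.linear_model import LinearRegression

# Entirely synthetic stand-ins for a model, its background data, and the
# points we want to explain (our real models are not shown in this post).
rng = np.random.default_rng(0)
X_background = rng.normal(size=(100, 4))
y = X_background @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)
model = LinearRegression().fit(X_background, y)

explainer = shap.KernelExplainer(model.predict, X_background)

# Local explainability: one importance per feature and per explained row
# (positive values push the prediction up, negative values push it down).
X_explain = rng.normal(size=(10, 4))
shap_values = explainer.shap_values(X_explain)

# Global explainability: aggregate the local importances across data points,
# e.g. the mean absolute SHAP value of each feature.
global_importance = np.abs(shap_values).mean(axis=0)
print(global_importance)
```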
In our context, the input is a sequence of transactions, which transformers represent as a sequence of large embeddings. In this scenario, when assessing the relevance of a transaction as a whole on the model’s prediction, the SHAP values of individual embedding dimensions aren’t useful on their own. Fortunately, in such cases, SHAP can group these smaller parts into transaction-level inputs and assign a single SHAP value to each transaction as a whole.
Another very positive aspect of SHAP is that it is model architecture-agnostic: we can compute SHAP values regardless of how the model operates internally. This property is especially relevant for us, since we are exploring several different architectures, including hybrid architectures that combine both neural networks and tree-based models.
Although SHAP satisfies our explainability requirements, it is unfortunately computationally prohibitive for deep neural network use cases like ours. This has led to the exploration of more efficient, gradient-based approximations, such as Integrated Gradients [2] or Layer-Wise Relevance Propagation (LRP) [3]. However, while these methods are more efficient, they lose some of the properties of SHAP that are valuable to us, which makes adopting them challenging.
The first challenge is that, unlike SHAP, these methods cannot group components together to obtain a single importance for the entire transaction; we would need to aggregate these individual values in some way. We investigated different aggregation schemes for gradient-based attributions, and the ones we considered to be the best tended to use absolute values. The problem, in this case, is that they end up disregarding directionality (i.e., whether a transaction pushed the prediction up or down).
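To make the directionality issue concrete, here is a small, purely illustrative numpy sketch of this kind of per-transaction aggregation; the array shapes and values are invented.

```python
import numpy as np

# Invented example: per-dimension attributions (e.g. Integrated Gradients
# output) for a sequence of 5 transactions with 8 embedding dimensions each.
rng = np.random.default_rng(0)
attributions = rng.normal(size=(5, 8))

# Signed sum per transaction keeps directionality, but positive and negative
# contributions within the same transaction can cancel out.
signed_scores = attributions.sum(axis=1)

# Absolute-value aggregation (the kind that ranked transactions best in our
# tests) avoids that cancellation, but discards whether the transaction
# pushed the prediction up or down.
abs_scores = np.abs(attributions).sum(axis=1)
```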
Moreover, in our particular case, we couldn’t easily separate the importance a transaction has due to its attributes (value, weekday, etc.) from the importance it has due to its position in the sequence: the model seemed to assign high importance to the first and last transactions in a sequence, regardless of their other characteristics. We hypothesize this is due to how the model “understands” the sequence, that is, the structure of the input data. This bias is visualized in the figure below, where importance scores are heavily concentrated at both ends of the transaction sequence. Disclaimer: this figure, as well as the other figures in this post, is based on synthesized data that illustrates the behavior observed on our real data.

Another issue is that gradient-based methods are not architecture-agnostic: they require differentiable models and access to internal gradients, which is incompatible with some of the architectures we are exploring. In addition, certain methods impose layer-type-specific rules (as LRP does) or require defining complex “neutral” baseline inputs (as IG does). This made maintaining these methods too costly given our pace of development.
Finally, we also evaluated LIME (Local Interpretable Model-Agnostic Explanations) [4], another architecture-agnostic alternative. However, LIME requires generating new perturbed datasets and training surrogate models for each explanation, which increases the computational cost and introduces new layers of complexity.
These challenges led us to a simpler, more direct solution, LOTO (Leave One Transaction Out), which still meets our needs: it remains compatible with any architecture, produces directional values, and offers both local and global explainability.
Leave One Transaction Out (LOTO)
A characteristic shared by all the architectures we use is the ability to handle variable-length sequences of transactions. This makes it simple to measure the true marginal impact of a transaction: just remove it and observe what changes.
The method is straightforward: remove each transaction from a customer’s sequence one at a time and measure the difference between the new prediction and the original one. This directly reveals the impact of that specific transaction in that specific fixed context. It asks the simple question, “What was the direct contribution of this particular transaction’s presence to the prediction, considering the presence of all other transactions (the context)?” We can refer to this contribution as a LOTO value.
It’s important to clarify what “removing” means in this context. This is not like traditional feature ablation, which would be analogous to altering the feature space (e.g., modeling P(y | a=i, b=j) instead of the original P(y | a=i, b=j, c=k)).
Since our models are designed to process variable-length sets or sequences of transactions, “removing” one is simply a change in the input data for a single prediction, while the model’s architecture remains fixed. Conceptually, it’s like the difference between assessing P(y | … transaction_c=PRESENT) and P(y | … transaction_c=ABSENT). We are just feeding the model a sequence with one less element to see how its output changes.
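In code, the core of the method could look like the following hedged sketch; `predict_fn` and the list-of-transactions input are assumptions for illustration, not our actual interface.

```python
import numpy as np

def loto_values(predict_fn, transactions):
    """Leave One Transaction Out values for a single customer.

    `predict_fn` is assumed to accept a variable-length list of transactions
    and return a single score; both names are illustrative.
    """
    original = predict_fn(transactions)
    values = []
    for i in range(len(transactions)):
        # New variant of the input: the same sequence with one element absent.
        variant = transactions[:i] + transactions[i + 1:]
        # Difference between the new prediction and the original one:
        # a negative value means this transaction was pushing the score up.
        values.append(predict_fn(variant) - original)
    return np.array(values)
```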
The main drawback is that LOTO tests transactions one at a time, so it misses interaction effects. For example, a payment to a utility company might be unimportant on its own, but combined with a large, unexpected cash withdrawal, the two together might have a big impact; LOTO can’t capture this combined effect. We accepted this trade-off for the vast gains in simplicity and computational efficiency, reserving more complex interaction analyses for future work.
We apply LOTO in two distinct modes: a global analysis to understand general trends, where we randomly choose and remove one transaction from each customer in a large set, and a local analysis for deep-dive debugging, where we remove every transaction, one at a time, from a single customer’s sequence. In either case, each removed transaction creates a new variant of the original input that the model must process. In the global case, since we remove only one transaction from each data point, the total number of variants equals the number of data points in the dataset. With hundreds of thousands to millions of these data points, we can build robust aggregations or even train simple meta-models (like a decision tree) on the LOTO results to automatically discover rules like, “Transactions of type X over amount Y consistently have a negative impact”.
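A hedged sketch of that meta-model idea, on entirely synthetic LOTO results; the column names and the planted pattern are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in for the global analysis: one row per customer, describing
# the single randomly removed transaction and the resulting LOTO value.
rng = np.random.default_rng(0)
n = 100_000
loto_df = pd.DataFrame({
    "is_type_x": rng.integers(0, 2, size=n),
    "amount": rng.exponential(200.0, size=n),
    "age_days": rng.integers(0, 365, size=n),
})
# Planted pattern, purely for illustration: type X transactions over a
# certain amount tend to push the prediction down.
loto_df["loto_value"] = (
    -0.3 * loto_df["is_type_x"] * (loto_df["amount"] > 500)
    + rng.normal(scale=0.05, size=n)
)

# A shallow decision tree over the LOTO results surfaces human-readable
# rules about which kinds of transactions matter, and in which direction.
features = ["is_type_x", "amount", "age_days"]
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(loto_df[features], loto_df["loto_value"])
print(export_text(tree, feature_names=features))
```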
For example, as illustrated in the figure below, we could identify that the transaction type clearly biases its impact on the prediction of a certain binary target. Once again, the figure is based on synthesized data for illustrative purposes.

For a deep, local analysis of a single customer, we remove each of their transactions one by one, generating hundreds of variants per customer. This is substantially more computationally expensive than gradient-based techniques, which generally require running the model only once, so we reserve local analyses for specific deep dives. In exchange, LOTO provides a clearer picture of a transaction’s importance, free from the positional bias that can mislead gradient-based methods. By directly measuring the impact of removing a transaction, LOTO helps distinguish whether a transaction is important because of its content or simply because of its location in the sequence. As the figure below illustrates, LOTO reveals a much more even distribution of high-impact transactions, helping us accurately diagnose the model’s behavior.

Note that, in the figure above, it is still apparent that more recent transactions are frequently more impactful. Another way to visualize this in the global analysis is simply to check the correlation between the age of a transaction and its impact on the prediction, as shown below.
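This check is cheap to compute on the global results; here is a minimal sketch on synthetic stand-in data (the arrays and the planted pattern are invented).

```python
import numpy as np

# Synthetic stand-in for the global analysis: one removed transaction per
# customer, with its age in days and its LOTO value.
rng = np.random.default_rng(0)
age_days = rng.integers(0, 365, size=10_000)
# Planted pattern: more recent transactions (smaller age) tend to have
# larger-magnitude impacts.
loto_value = rng.normal(scale=0.05, size=age_days.size) * (400 - age_days) / 400

# Correlation between a transaction's age and the magnitude of its impact.
print(np.corrcoef(age_days, np.abs(loto_value))[0, 1])
```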

Conclusion: Simplicity as a Feature
In our journey to build a reliable and interpretable financial AI system, we found that the simplest explainability method was also the most effective. By directly measuring the impact of removing each transaction, LOTO provides clear, actionable, and unbiased insights that are robust to our evolving model architecture.
References
[1] Lundberg, S. M., & Lee, S. I. (2017). A Unified Approach to Interpreting Model Predictions. Advances in Neural Information Processing Systems, 30. arXiv. https://arxiv.org/abs/1705.07874
[2] Sundararajan, M., Taly, A., & Yan, Q. (2017). Axiomatic Attribution for Deep Networks. Proceedings of the 34th International Conference on Machine Learning, 70, 3319-3328. arXiv. https://arxiv.org/abs/1703.01365
[3] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K. R., & Samek, W. (2015). On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation. PLoS ONE, 10(7), e0130140. https://doi.org/10.1371/journal.pone.0130140
[4] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1135–1144. https://arxiv.org/abs/1602.04938