29 September 2025
Let’s solve the problem but let’s not make it any worse by guessing.
Apollo 13 Flight Journal - Gene Kranz, Flight Director, Apollo 13

One of the key things aspiring librarians are taught is “the reference interview”. The basic premise is that often when people ask for help or information, what they ask for isn’t what they actually want. They ask for books on Spain, when they actually want to understand the origins of the global influenza pandemic of 1918-20. They ask if you have anything on men’s fashion, when they want to know how to tie a cravat. They ask if you have anything by Charles Dickens, when they are looking for a primer on Charles Darwin’s theory of evolution. The “reference interview” is a technique to gently ask clarifying questions so as to ensure that you help the library user connect to what they are really looking for, rather than what they originally asked.
Sometimes this vagueness is deliberate – perhaps they don’t want the librarian to know which specific medical malady they are suffering from, or they’re embarrassed about their hobby. Sometimes it’s because people “don’t want to bother” librarians, who they perceive have more important things to do than our literal job of connecting people to the information and cultural works they are interested in – so they’ll ask a vague question hoping for a quick answer. Often it’s simply that the less we know about something, the harder it is to formulate a clear question about it (like getting our Charleses mixed up). Some of us are merely over-confident that we will figure it out if someone points us in vaguely the right direction. But for many people, figuring out what it is we actually want to know, and whether it was even a good question, is the point.
I was thinking about this after reading Ernie Smith’s recent post about Google AI summaries, which are at the centre of a legal case brought against Alphabet Inc by Penske Media. Ernie asks, rhetorically:
Does Google understand why people look up information?
I thought this was an interesting question, because – in the context of the rest of the post – the implication is that Google does not understand why people look up information, despite their gigantic hoard of data about what people search for and how they behave after Google loads a page in response to that query. How could this be? Isn’t “behavioural data” supposed to tell us about people’s “revealed preferences”? Can analytics from billions of searches really be wrong? Maybe if we compare Google’s approach to centuries of library science we might find out.
Two kinds of power
In Libraries and Large Language Models as Cultural Technologies and Two Kinds of Power Mita Williams introduces Patrick Wilson’s Two kinds of power, a philosophical essay about bibliographic work:
In this work, Wilson described bibliographic work as two powers: Descriptive and Exploitative. The first is the evaluatively neutral description of books called Bibliographic control. The second is the appraisal of texts, which facilitates the exploitation of the texts by the reader.
Libraries and Large Language Models as Cultural Technologies and Two Kinds of Power - Mita Williams

Professor Hope Olson memorably called this “descriptive power” the power to name. The words we use to describe our reality have a material impact on shaping that reality, or at least our relationship with it. In August this year the Australian Stock Exchange accidentally wiped $400 million off the market value of a listed company because they mixed up the names of TPG Telecom Limited and TPG Capital Asia. Defamation law continues to exist because of the power of being described as a criminal or a liar. Description matters.
Web search too originally sought to describe, and on that basis to return likely results for your search.
Different search engines have approached the creation of indexes using their own strategies, but the purpose was always to map individual web pages both to concepts or keywords and – since Google – to each other. A key difference between web search engines and the systems created and used by libraries is that the latter make use of controlled metadata, whereas the former cannot and do not. Google in particular has made various attempts at outsourcing the work of creating more structured metadata and even controlled vocabularies. All have struggled to varying degrees on the basis that ordinary people creating websites aren’t much interested in and don’t know how to do this work (for free), and businesses can’t see much, if any, profit in it. At least, their own profit.
Whilst there is a widespread feeling that Google search used to be much better, the fact that “Search Engine Optimisation” became a concept soon after the creation of web search engines points to the fundamental limitation of uncontrolled indexes. Librarians describe what creative works are about, thus connecting items to each other through their descriptions. Search engines approach the problem from the other direction: describing the connections between works, thus inferring what concepts they are most strongly associated with. Are either of these approaches really “evaluatively neutral”?
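The two directions can be sketched in a few lines of Python. Everything below is invented for illustration – toy titles, toy subject headings, and a three-page link graph – but it shows how librarians connect works through shared descriptions, while search engines infer aboutness from the connections themselves.

```python
# Librarian's direction: describe each work, and connections fall out
# of shared subject headings. (Titles and headings are illustrative.)
catalogue = {
    "On the Origin of Species": {"evolution", "natural history"},
    "The Voyage of the Beagle": {"natural history", "travel"},
    "Hard Times": {"fiction", "industrial england"},
}

def related_by_subject(title):
    """Works sharing at least one subject heading with `title`."""
    subjects = catalogue[title]
    return {t for t, s in catalogue.items() if t != title and s & subjects}

# Search engine's direction: start from the links between pages and
# infer relatedness from the neighbourhood, with no descriptions at all.
links = {
    "page_a": ["page_b", "page_c"],
    "page_b": ["page_a"],
    "page_c": [],
}

def related_by_link(page):
    """Pages connected to `page` by an inbound or outbound link."""
    outbound = set(links.get(page, []))
    inbound = {p for p, targets in links.items() if page in targets}
    return outbound | inbound

related_by_subject("On the Origin of Species")  # -> {"The Voyage of the Beagle"}
related_by_link("page_a")                       # -> {"page_b", "page_c"}
```

Neither function is “evaluatively neutral”: one bakes in a human’s choice of headings, the other bakes in whatever incentives shaped the link graph.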
Long held up as a core tenet of librarianship, neutrality or objectivity has been fiercely debated in recent years. Mirroring similar criticisms of journalistic standards, many have pointed out that “objectivity” in social matters very often simply means upholding the status quo. We are social animals. Nothing we can say about ourselves is ever “neutral” or objective, because everything is contested and relational. Yet humans on the whole are anthropocentric in how we see the world. We are a species that frequently thinks it can see the literal face of god in a piece of toast. Everything we say about anything can be seen as being about us, whether it’s the mating habits of penguins or the movement of celestial bodies.
So as soon as we recognise Descriptive Power, arguing over semantics is inevitable. In democratic systems an enormous amount of effort is expended attempting to move the Overton Window. Was Australia “settled” or “invaded” in 1788? Are certain countries “poor”, “developing”, “in the third world” or “from the global south”? Is a political movement “conservative”, “alt-right” or “fascist”? Is it still “folk music” if it’s played with an electric guitar? We argue about these descriptions because they also say something about how we see ourselves and want to be seen by others. When description has power, you can’t really be “neutral” when wielding it. Much like judges or umpires, when using the power to name, the librarian’s task is to be fair and factual. We are members of the Reality Based Community.
Memory holes
Mita goes on to explore the idea that the library profession has generally seen our job as beginning and ending with descriptive power:
Library catalogues don’t tell you what is true or not. While libraries facilitate claims of authorship, we do not claim ownership of the works we hold. We don’t tell you if the work is good or not. It is up to authors to choose to cite in their bibliographies to connect their work with others and it is up to readers to follow the citation trails that best suit their aims.
Here is where “neutrality” comes in. We might describe a book as being about something, written by a particular person, or connected to other works and people. But we make no claims as to the veracity of the assertions you might read within it. Assessing the truth or artistry of a certain work is, as they say, “left as an exercise for the reader”. This is where the (not always exercised) professional commitment to reader privacy comes in. If it’s up to the reader to glean their own meaning from the works in our collections, then we can’t know and should not assume their purpose or their conclusions. And if people are to be given the freedom to explore these meanings, they can’t have the fear of persecution for reading “the wrong things” hanging over their heads.
We can see an almost perfect case study of this clash of approaches in the debacle of Clarivate’s “Alma Research Assistant”. Aaron Tay lays this out clearly in The AI powered Library Search That Refused to Search:
Imagine a first‑year student typing “Tulsa race riot” into the library search box and being greeted with zero results—or worse, an error suggesting the topic itself is off‑limits. That is exactly what Jay Singley reported in ACRLog when testing Summon Research Assistant, and it matches what I’ve found in my own tests with the functionally similar Primo Research Assistant (both are by Clarivate)...According to Ex Libris, the culprit is a content‑filtering layer imposed by Azure OpenAI, the service that underpins Summon Research Assistant and its Primo cousin.
Want to research something that might be “controversial”? I can’t let you do that, Dave. Computer says no. What’s worse in the case of Primo Research Assistant is that rather than declaring to the user that it won’t search, the system is designed to simply memory hole any relevant results and claim that it cannot find anything.
A library is a provocation
Whilst they are often associated with a certain stodginess, every library is a provocation. With all these books, how could you restrict yourself to reading only one or two? Look how many different ideas there are. See how many ways there are to describe our worlds. The thing that differentiates a library from a random pile of books is that all these ideas, concepts and stories are organised. They have indexes, catalogues, and classification systems. They are arranged into subjects, genres, and sub-collections. The connections between them and the patterns they map are made legible.
The pattern of activity that digital networks, ranging from the internet to the web, encourage is building connections, the creation of more complex networks. The work of making connections both among websites and in a person’s own thinking is what AI chatbots are designed to replace.
A linkless internet - Collin Jennings

The most exciting developments in library science right now are exploring not how to provide “better answers” but rather how to provide richer opportunities to understand and map the connections between things. With linked data and modern catalogue interfaces we can overlay multiple ontologies onto the same collection, making different kinds of connections based on different ways of understanding the world.
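To make “overlaying multiple ontologies” concrete, here is a minimal sketch. The identifiers like `work:1` are invented placeholders (not real linked-data URIs), and the classification labels are made up – the point is only that the same collection can be carved up in several independent ways at once.

```python
# One collection, addressed by stable identifiers.
# "work:N" identifiers are invented placeholders, not real URIs.
collection = {
    "work:1": "On the Origin of Species",
    "work:2": "Hard Times",
    "work:3": "The Voyage of the Beagle",
}

# Two independent ontologies describing the very same works:
# one by genre...
genre = {"work:1": "science", "work:2": "fiction", "work:3": "travel"}
# ...and one by era.
era = {"work:1": "victorian", "work:2": "victorian", "work:3": "victorian"}

def browse(ontology, label):
    """Titles that the chosen ontology files under `label`."""
    return sorted(title for work_id, title in collection.items()
                  if ontology.get(work_id) == label)

browse(genre, "fiction")    # -> ["Hard Times"]
browse(era, "victorian")    # -> all three titles
```

Swapping which ontology you browse by changes the connections you see, without touching the collection itself – that is the overlay.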
LLMs are mid
Clarifying the connections between people and works. Disambiguating names. Mapping concepts to works, and re-mapping and organising them in line with different ontological understandings. All of this requires precision. Because they under-estimate the skill of manual classification and indexing, and over-estimate their own technologies, AI Bros thought they could combine Wilson’s two powers. But description for the purpose of identifying information sources is an exact science. This is why we have invested so much energy and time in things like controlled vocabularies, authority files, and the ever-growing list of Permanent Identifiers (PIDs) like DOI, ISNI, and ORCID. It’s why we’ve been slowly and carefully talking about Linked Open Data.
And then these arrogant little napoleons think they can just YOLO it.
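A toy example of the exactness that identifiers buy us, reusing the TPG mix-up from earlier. The ISNI-style identifiers below are invented for illustration, not real ISNIs.

```python
# An authority file keyed by persistent identifier.
# The "isni:" values here are made-up placeholders, not real ISNIs.
authority_file = {
    "isni:0000-0001": "TPG Telecom Limited",
    "isni:0000-0002": "TPG Capital Asia",
}

def find_by_name(fragment):
    """Name search: returns every candidate, ambiguity included."""
    return sorted(pid for pid, name in authority_file.items()
                  if fragment in name)

def find_by_pid(pid):
    """Identifier lookup: exact, or an explicit KeyError."""
    return authority_file[pid]

# The string "TPG" alone cannot pick one entity -- the ASX mix-up
# in miniature. The identifier can.
find_by_name("TPG")             # -> both identifiers
find_by_pid("isni:0000-0001")   # -> "TPG Telecom Limited"
```

The whole apparatus of authority files, DOIs, ISNIs and ORCIDs exists so that lookups behave like the second function, never the first.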
The problem is not the core computing concepts behind machine learning but rather the implementation details and the claims made about the resulting systems. I agree with Aaron Tay’s take on embedding vector search – it is an incredibly compelling idea that has been stretched far beyond what it is most useful for. Transformer models of any kind are ultimately intended to find the most probable match to an input. Not the best match. Not a perfect match. Not a complete list of matches. An arbitrary number of “results most likely”. In contrast to the exactness of library science, these new approaches are merely averaging devices.
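A minimal illustration of “result most likely”: nearest-neighbour search over embeddings always returns its single best guess, even for a query that nothing in the index answers well. The three-dimensional vectors are hand-made stand-ins for real embeddings, and the document labels are illustrative.

```python
import math

# A tiny "vector index": two documents, each with a toy embedding.
index = {
    "tulsa race massacre": [0.9, 0.1, 0.0],
    "dutch tulip mania":   [0.1, 0.9, 0.0],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def nearest(query_vec):
    """The single most similar document -- there is no 'no match'."""
    return max(index, key=lambda doc: cosine(query_vec, index[doc]))

# A query vector resembling neither document still gets *an* answer,
# just not a good one.
nearest([0.0, 0.2, 0.95])  # -> "dutch tulip mania"
```

An exact-match catalogue lookup would fail loudly here; the averaging device shrugs and hands back whatever is least dissimilar.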
Answering machines
Let us now return to Ernie Smith’s question: Does Google understand why people look up information?
Google thinks people look up information in order to find “the most likely answer”. The company has to think like this, because the key purpose of Google’s search tool is to sell audiences to advertisers via automated, hundred-millisecond-long auctions for the purpose of one-shot advertising. Everything else we observe about how it works stems from this key fact about their business. Endless listicles and slop generated by both humans and machines are the inevitable result. And since the purpose of the site is simply to display ads, why not make it “more efficient” by providing an “answer” immediately?
What I think Ernie is gesturing at is that when we search for information what we often want is to know what other people think. We want to explore ideas and expand our horizons. We want to know how things are connected. We want to understand our world. When you immediately answer the question you think someone asked instead of engaging with them to work out what they are actually looking for, it’s likely to be unhelpful. Asking good questions is harder than giving great answers.
Chat bots and LLMs can’t solve the problem. They’re just guessing.