Oleg Bartunov: Unpublished interview

I found a copy of an interview that apparently has not been published anywhere.

Interview with Oleg Bartunov

“Making Postgres available in multiple languages was not my goal—I was just working on my actual task.”

Oleg Bartunov has been involved in PostgreSQL development for over 20 years. He was the first person to introduce locale support, worked on non-atomic data types in this popular DBMS, and improved a number of index access methods. He is also known as a Himalayan traveler, astronomer, and photographer.

— Many of your colleagues know you as a member of the PostgreSQL Global Development Group. How did you get started with Postgres? When and why did you join the PostgreSQL community?

— My scientific interests led me to the PostgreSQL community in the early 1990s. I’m a professional astronomer, and astronomy is the science of data. At first, I wrote databases myself for my specific tasks. Then I went to the United States, and in California I learned that ready-made databases existed, including open-source ones you could just take and use for free! At the University of California, Berkeley, I encountered a system suited to my needs. At that time, it was still the Berkeley POSTGRES project, the successor to Ingres, not even Postgres95 yet. Back then, the future PostgreSQL DBMS didn’t have many users: the entire mailing list consisted of 400 people, including me. Those were happy times.

— Does your experience as a PostgreSQL developer help you contribute to astronomy?

— We’ve done a lot for astronomy. Astronomical objects are objects on a sphere, so they have spherical coordinates. We introduced several specialized data types for astronomers and implemented full support for them—on par with numbers and strings.

— How did scientific tasks transform into business ones?

— I started using PostgreSQL to solve my astronomical problems. Gradually, people from outside my field began reaching out to me. In 1994, it became necessary to create a digital archive for The Teacher’s Newspaper, and I was surprised to discover that PostgreSQL didn’t recognize the Cyrillic alphabet. I looked into the source code and realized the issue: PostgreSQL used 7-bit ASCII encoding—it simply ignored the 8th bit. I’d previously worked on internationalization in Perl, so I had some idea of what to do.
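To make the 8th-bit problem concrete, the Python sketch below shows what happens to Cyrillic text when software keeps only the low seven bits of each byte. It assumes the text was encoded in KOI8-R, a common Russian encoding of that era; the interview does not say which encoding the archive actually used, and `strip_high_bit` is a hypothetical helper, not PostgreSQL code.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as code that assumes 7-bit ASCII effectively does."""
    return bytes(b & 0x7F for b in data)


# Assumption: the Cyrillic text is stored in KOI8-R.
raw = "Привет".encode("koi8_r")
masked = strip_high_bit(raw)

print(raw.hex(" "))            # → f0 d2 c9 d7 c5 d4
print(masked.decode("ascii"))  # → pRIWET
```

The masked bytes are pure ASCII, so every Cyrillic letter is lost. KOI8-R was deliberately laid out so that the result still reads as a rough Latin transliteration (here with the letter case inverted), but that is no substitute for actually handling the 8th bit.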

— Did you realize at the time that your work on locale support would help a huge number of people start using PostgreSQL?

— No, I didn’t realize it right away. But the community quickly appreciated the opportunity and jumped on it. Once I completed locale support, it became technically possible to use PostgreSQL with many European languages. Soon after, Japanese community members added support for Japanese. UTF-8 encoding also appeared later—though I didn’t work on that myself.

Before locale support was merged into PostgreSQL’s core, I’d been using it personally for a full year. That’s the essence of open source: if you need functionality that’s missing, build it yourself. Once your patch enters the core, the community takes it further. Of course, on April 2, 1997, when I committed locale support, I had no idea it would significantly affect PostgreSQL’s popularity. I was just solving my own problem.

— After the locale commit, you became involved with full-text search. Is there a connection between these two efforts?

— Yes. I designed the architecture for locale support and have worked extensively on full-text search in PostgreSQL. While they’re different tasks, there’s a clear link: once multiple locales became available, demand grew for full-text search in those languages. I’ve been working on search for over 20 years—it’s my longest-running project.

Search involves “stemming”—reducing words to their root forms—and requires dictionaries. In PostgreSQL, this process is standardized: new dictionaries are first added to the Snowball project for testing, then integrated into PostgreSQL. But nothing works without native speakers. Their initiative is essential—they must create the dictionaries.
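For readers new to stemming, the toy Python function below shows the basic idea: different inflected forms are reduced to one shared stem by stripping suffixes. The suffix list and `toy_stem` are invented for illustration; Snowball stemmers use ordered, language-specific rule sets and are far more careful than this.

```python
# Hypothetical suffix list, ordered longest first so the longest match wins.
# Real Snowball stemmers apply ordered, language-specific rule sets instead.
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "es", "s")


def toy_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first (longest) matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


words = ["Searching", "searches", "searchers", "search"]
print([toy_stem(w) for w in words])  # → ['search', 'search', 'search', 'search']
```

In PostgreSQL itself, this normalization is performed by text search dictionaries, including the Snowball stemmers, when functions such as to_tsvector process a document.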

— How many languages does PostgreSQL support now?

— PostgreSQL currently supports about 30 languages—I think 28. Hindi and Nepali were added recently, and Turkish and Tamil are also supported. The key requirement is having native speakers willing to build dictionaries. Integration via Snowball is straightforward.

— How many popular languages are still not supported?

— India alone has around 50 languages, and technically, they could all be added one by one. As I mentioned, Hindi and Tamil are already supported. PostgreSQL even allows custom parsers—you can write your own lexeme parser and add absolutely any language.

— Any language? Is it true you worked on Nepali support?

— At some point, I wanted to fulfill my dream of seeing big mountains. It turned out to be simple: just buy a ticket and fly. In 2009, I traveled to Nepal and wanted to do something meaningful for the country. While preparing a talk on full-text search, I discovered that PostgreSQL didn’t support Devanagari, the national script of Nepal. Teodor Sigaev and I implemented support, I gave the presentation… and then, for ten years, nothing happened.

— But not forever? Did you return to this later?

— Ten years later, we organized another event—this time with around 200 PostgreSQL users. We found someone interested in the project, and he created a Nepali dictionary. Now, a search engine in Nepali is being built. We’re doing something similar for my native Kalmyk language—one enthusiast is currently working on a dictionary. It’s vital for small nations to be able to build search engines in their own languages. In general, wherever PostgreSQL gains traction, the need for national-language search quickly follows.

— Which languages posed the biggest challenges? Asian ones, perhaps?

— Don’t assume complex parsers are only needed for “obviously complex” languages. German and Norwegian, for example, require special handling. There’s a phenomenon called compounding (sometimes loosely described as agglutination), where a single word is built from several roots. PostgreSQL supports these multi-stem words, which are common in German.
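To illustrate the compound-word problem, here is a minimal Python decompounder: it tries the longest known prefix first and backtracks if the remainder cannot be split. The tiny lexicon and `split_compound` are hypothetical; PostgreSQL’s actual compound handling lives in its Ispell-format dictionaries and works differently from this sketch.

```python
from typing import List, Optional

# Hypothetical mini-lexicon; a real system would use a full dictionary.
LEXICON = {"fuß", "ball", "spiel", "fußball", "haus", "tür"}


def split_compound(word: str, lexicon=LEXICON) -> Optional[List[str]]:
    """Split `word` into lexicon entries, trying the longest known prefix first.

    Returns None when no complete segmentation exists.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    for cut in range(len(word) - 1, 0, -1):
        head, tail = word[:cut], word[cut:]
        if head in lexicon:
            rest = split_compound(tail, lexicon)
            if rest is not None:  # only accept a split if the tail splits too
                return [head] + rest
    return None


print(split_compound("Fußballspiel"))  # → ['fußball', 'spiel']
print(split_compound("Haustür"))       # → ['haus', 'tür']
```

Once a compound is split, each part can be stemmed and indexed on its own, so a query for “Spiel” has a chance to match “Fußballspiel”.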

— Is full-text search a mathematical problem, or is there some “magic” to it?

— The multi-stem example shows that full-text search isn’t a mathematical problem with a single correct answer. Take the notation “1/2”—is it “building 1, unit 2” or the fraction “one half”? Without context, it’s ambiguous.

— A bit of trolling: why have built-in search in a DBMS at all, when there are external systems designed specifically for search?

— External search often fails for large enterprises. If you use an external search engine alongside your database, you’re storing a copy of your data outside the database, and that copy may be outdated. Say you update a record, but the external index hasn’t caught up yet. The user searches, clicks a result, and the page is gone. That’s a poor experience.

But that’s not even the worst issue. Consider access control: suppose you have multiple roles, each allowed to search only specific data. Inside the database, row-level security handles this; with an external search engine, you have to reproduce those rules yourself, which is hard. And then there’s confidentiality: not everyone wants to feed sensitive data into an external search engine. I’ve even given talks specifically on why full-text search belongs inside the database.

— Locale support, full-text search, indexing methods for semi-structured data… Is there a common thread in your PostgreSQL work?

— If you look at all my PostgreSQL projects, they’re tied together by one theme: handling unstructured data in a relational DBMS. Most of our commits revolve around this. The Russian PostgreSQL team has always been deeply interested in this area. Today, we work closely with companies that know our expertise, and our recent work reflects real enterprise requirements.

— JSON support is a good example of a community-driven feature. When it launched, nearly every DBMS blog covered it…

— Actually, it’s mostly about JSONB—that’s what made PostgreSQL super popular. Credit also goes to the community, which laid the groundwork before JSONB arrived. Unstructured data has always mattered to me. I helped create HSTORE, PostgreSQL’s first key-value store for documents—long before JSON became a standard.

In 2012, PostgreSQL introduced “text JSON”—just raw JSON stored as text, with no real functionality. So we built JSONB, a binary format optimized for indexing and querying. Many users chose PostgreSQL specifically for these NoSQL-like capabilities.

— Is JSON support now a standard for relational databases?

— PostgreSQL had JSON and JSONB before JSON became an SQL standard. In fact, PostgreSQL’s adoption of JSONB sparked such interest that other databases—MySQL, SQL Server, Oracle—followed suit. That’s why the SQL Standard Committee eventually included JSON in the specification. I even found references to my old LiveJournal posts in their documentation!

— What’s new in the upcoming SQL standard?

— The new standard will formally define a JSON data type. You’ll be able to write: CREATE TABLE ... (data JSON);. This matters when a project uses multiple DBMSes and needs seamless data exchange—interoperability is increasingly crucial.

— PostgreSQL historically has two JSON types. Does that comply with the standard?

— That’s exactly the issue. We have “text JSON” and JSONB, so standardization is needed. At Postgres Professional, one of our major projects is aligning PostgreSQL’s JSON implementation with the upcoming SQL standard. We expect unified JSON support by PostgreSQL 14. We’ve been working on this for four years, and parts of it already shipped in versions 12 and 13.

— JSON seems clear now. What about full-text search? Are you planning improvements?

— We want PostgreSQL to support even more languages. I’d like to build better infrastructure for search management—improve the configuration language, which is currently quite primitive. We’ve already started this work and need a more flexible, user-friendly syntax.

Performance remains critical. Our GiST and GIN indexes were originally designed to accelerate search. Now we’re actively developing the RUM index, which offers functionality similar to external search engines—but fully integrated into PostgreSQL.

Another “eternal” challenge is relevance: you always want better ranking. Sometimes you trade accuracy for speed, when you don’t need all the results, just the first few, and you need them fast.
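The inverted-index idea behind GIN, combined with the “first few results fast” trade-off, can be sketched in Python. This is a conceptual toy under stated assumptions: terms are already normalized and separated by whitespace, and the score is a crude term count. It is not how GIN, RUM, or PostgreSQL ranking functions such as ts_rank actually work.

```python
import heapq
from collections import defaultdict


def build_index(docs):
    """Map each (already normalized) term to {doc_id: term count}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, query, limit):
    """AND query: docs containing every term, top `limit` by total term count."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(t, {}) for t in terms]
    matches = set(postings[0]).intersection(*postings[1:])
    scores = ((sum(p[d] for p in postings), d) for d in matches)
    # nlargest keeps only the best `limit` candidates instead of sorting all matches.
    return [d for _, d in heapq.nlargest(limit, scores)]


docs = {1: "postgres full text search", 2: "search search engine", 3: "postgres search"}
index = build_index(docs)
print(search(index, "search", 2))           # → [2, 3]
print(search(index, "postgres search", 5))  # → [3, 1]
```

Ties in this toy are broken by the higher document id, which is arbitrary; a real ranking function would weigh term positions, document length, and other signals.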

— Are you satisfied with what’s been implemented? Are you open to criticism?

— Nothing new is ever perfect—there’s always room for improvement. The deeper you dive into your own work, the more flaws you see. But that just means there’s still exciting work ahead. I receive many messages from PostgreSQL and Postgres Pro users—feature requests, bug reports, ideas. We exchange dozens of messages, and sometimes the outcome is very positive.

— What’s the most memorable feedback you’ve received from the community?

— The most memorable is when someone recognizes me on the street and simply says, “Thank you.” It can happen anywhere in the world. And it’s especially meaningful when it happens in front of my children.
