Announcing BigQuery-managed AI functions for better SQL

For decades, SQL has been the universal language for data analysis, offering access to analytics on structured data. Large Language Models (LLMs) like Gemini now provide a path to get nuanced insights from unstructured data such as text, image and video. However, integrating LLMs into standard SQL flow requires data movement, at least some prompt and parameter tuning to optimize result quality. This is expensive to perform at scale, which keeps these capabilities out of reach for many data practitioners. Today, we are excited to announce the public preview of BigQuery-managed AI functions, a new set of capabilities that reimagine SQL for the AI era. These functions — AI.IF, AI.CLASSIFY, and AI.SCORE — allow you to use generative AI for common analytical tasks directly within your SQL queries, no complex prompt tuning or new tools required. These functions have been optimized for their target use cases, and do not need you to choose models or tune their parameters. Further through intelligent optimizations on your provided prompt and query plans we keep the costs minimal.With these new functions, you can perform sophisticated AI-driven analysis using familiar SQL operators:

Filter and join data based on semantic meaning using AI.IF in a WHERE or ON clause.

Categorize unstructured text or images using AI.CLASSIFY in a GROUP BY clause.

Rank rows based on natural language criteria using AI.SCORE in an ORDER BY clause.

Together, these functions allow answering new kinds of questions previously out of reach for SQL analytics, for example, companies to news articles which mention them even when an old or unofficial name is used.

Let’s dive deeper into how each of these functions works.

Function deep dive

AI.IF: Semantic filtering and joining

With AI.IF, you can filter or join data using conditions written in natural language. This is useful for tasks like identifying negative customer reviews, filtering images that have specific attributes, or finding relevant information in documents. BigQuery optimizes the query plan to reduce the number of calls to LLM by evaluating non-AI filters first. For example, the following query finds tech news articles from BBC that are related to Google.

You can also use AI.IF() for powerful semantic joins, such as performing entity resolution** **between two different product catalogs. The following query finds products that are semantically identical, even if their names are not an exact match.

AI.CLASSIFY: Data classification

The AI.CLASSIFY function lets you categorize text or image based on labels you provide. You can use it to route support tickets by topic or classify images based on their style. For instance, you can classify news articles by topic and then count the number of articles in each category with a single query.

AI.SCORE: Semantic ranking

You can use AI.SCORE to rank rows based on natural language criteria. This is powerful for ranking items based on a rubric. To give you consistent and high-quality results, BigQuery automatically refines your prompt into a structured scoring rubric. This example finds the top 10 most positive reviews for a movie of your choosing.

Built-in optimizations

These functions allow you to easily mix AI processing with common SQL operators like WHERE, JOIN, ORDER BY, and GROUP BY. BigQuery handles prompt optimization, model selection, and model parameter tuning for you.

Prompt optimization: LLMs are sensitive to the wording of a prompt, the same question can be expressed in different ways which affect quality and consistency. BigQuery optimizes your prompts into a structured format specifically for Gemini, helping to ensure higher-quality results and an improved cache hit rate.

Query plan optimization: Running generative AI models over millions of rows can be slow and expensive. BigQuery query planner reorders AI functions in your filters and pulls AI functions out from join to reduce the number of calls to the model, which saves costs and improves performance.

**Model endpoint and parameter tuning: **BigQuery tunes model endpoint and model parameters to improve both result quality and results consistency across query runs.

Get started

The new managed AI functions — AI.IF() , AI.SCORE() and AI.CLASSIFY() — complement the existing general-purpose Gemini inference functions such as AI.GENERATE from BigQuery. In addition to optimizations discussed above, you can expect future optimization and mixed query processing between BigQuery and Gemini for even better price-performance. You can indicate your interest for early access here.

**What to use and when: **When your use case fits them, start with the managed AI functions as they are optimized for cost and quality. Use the AI.GENERATE family of functions when you need control on your prompt and input parameters, and want to choose from a wide range of supported models for LLM inference.

To learn more refer to our documentation. The new managed AI functions are also available in BigQuery DataFrames. See this notebook and documentation for Python examples.

For questions or feedback, reach out to us at bqml-feedback@google.com.

Posted in