by Ian Eyre Publication date Dec 15, 2025 Reading time estimate 24m advanced data-science python
Narwhals is intended for Python library developers who need to analyze DataFrames in a range of standard formats, including Polars, pandas, DuckDB, and others. It does this by providing a compatibility layer of code that handles any differences between the various formats.
In this tutorial, you’ll learn how to use the same Narwhals code to analyze data produced by the latest versions of two very common data libraries. You’ll also discover how Narwhals utilizes the efficiencies of your source data’s underlying library when analyzing your data. Furthermore, because Narwhals uses syntax that is a subset of Polars, you can reuse your existing Polars knowledge to quickly gain proficiency with Narwhals.
The table below will allow you to quickly decide whether or not Narwhals is for you:
| Use Case | Use Narwhals | Use Another Tool |
|---|---|---|
| You need to produce DataFrame-agnostic code. | ✅ | ❌ |
| You want to learn a new DataFrame library. | ❌ | ✅ |
Whether you’re wondering how to develop a Python library to cope with DataFrames from a range of common formats, or just curious to find out if this is even possible, this tutorial is for you. The Narwhals library could provide exactly what you’re looking for.
Get Ready to Explore Narwhals
Before you start, you’ll need to install Narwhals and have some data to play around with. You should also be familiar with the idea of a DataFrame. Although having an understanding of several DataFrame libraries isn’t mandatory, you’ll find a familiarity with Polars’ expressions and contexts syntax extremely useful. This is because Narwhals’ syntax is based on a subset of Polars’ syntax. However, Narwhals doesn’t replace Polars.
In this example, you’ll use data stored in the presidents Parquet file included in your downloadable materials.
This file contains the following six fields to describe United States presidents:
| Heading | Meaning |
|---|---|
| last_name | The president’s last name |
| first_name | The president’s first name |
| term_start | Start of the presidential term |
| term_end | End of the presidential term |
| party_name | The president’s political party |
| century | Century the president’s term started |
To work through this tutorial, you’ll need to install the pandas, Polars, PyArrow, and Narwhals libraries:
A key feature of Narwhals is that it’s DataFrame-agnostic, meaning your code can work with several formats. But you still need both Polars and pandas because Narwhals will use them to process the data you pass to it. You’ll also need them to create your DataFrames to pass to Narwhals to begin with.
You installed the PyArrow library to correctly read the Parquet files. Finally, you installed Narwhals itself.
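With pip, a typical install command is `python -m pip install pandas polars pyarrow narwhals`. As a quick sanity check afterwards, a small sketch like the one below reports which of the four packages your interpreter can see. The helper name installed_versions() is made up for this illustration and isn’t part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as published on PyPI.
REQUIRED = ("pandas", "polars", "pyarrow", "narwhals")


def installed_versions(packages=REQUIRED):
    """Map each package name to its installed version, or None if missing.

    Hypothetical helper written for this sketch only.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```

If any line reports "not installed", rerun the install command in the same environment that you use to start Python.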
With everything installed, make sure you create the project’s folder and place your downloaded presidents.parquet file inside it. You might also like to add both the books.parquet and authors.parquet files as well. You’ll need them later.
With that lot done, you’re good to go!
Understand How Narwhals Works
The documentation describes Narwhals as follows:
Extremely lightweight and extensible compatibility layer between dataframe libraries! (Source)
Narwhals is lightweight because it wraps the original DataFrame in its own object ecosystem while still using the source DataFrame’s library to process it. Any data passed into it for processing doesn’t need to be duplicated, removing an otherwise resource-intensive and time-consuming operation.
Narwhals is also extensible. For example, you can write Narwhals code to work with the full API of the following libraries:
It also supports the lazy API of the following:
The Narwhals developers are always looking to add more libraries and even encourage you to contribute your own. In addition, its developers are convinced that extending Narwhals to support new libraries is relatively straightforward.
To understand the high-level architecture of Narwhals, take a look at the diagram below:
In this diagram, you’re passing a pandas DataFrame into Narwhals code, although DataFrames or LazyFrames from any of the supported formats will work just as well. Once you pass the pandas DataFrame in, Narwhals encapsulates it for analysis.
To analyze your Narwhals DataFrame, regardless of its original format, you write syntax similar to Polars. The Narwhals wrapper then transforms your analysis code into a library-specific format. In this example, it’ll create the pandas equivalents of the Narwhals expressions that you’re using to analyze the data.
Narwhals then passes these pandas expressions to the original pandas library for processing. The pandas library then uses these expressions to analyze the original data in the usual way. As the diagram shows, the pandas library operates on the original data, meaning that no separate copy is ever made.
Once the data has been analyzed, Narwhals again takes over and passes its results back to you. This will usually be in the same format as the original DataFrame, but it doesn’t have to be. For example, passing a pandas DataFrame in could result in a Polars DataFrame being returned to you. Just make sure the returned format’s library is installed on your system so you can work with it.
One important point to note is that Narwhals supports a subset of the analysis capabilities of standard data analysis libraries such as pandas and Polars. By itself, Narwhals complements their use rather than providing an alternative.
Now that you’ve got the heads-up on what Narwhals offers, it’s time to see it in action.
Write DataFrame-Agnostic Python Code With Narwhals
To use Narwhals, you pass it data in a format it supports. Narwhals then wraps the DataFrame or LazyFrame in its own object, analyzes it, and finally returns the result to you in a format of your choice. You’ll now see this workflow in action.
Perform the Same Processing on Multiple Sources
Suppose you want a function that’s equally happy grouping data from either a pandas or Polars DataFrame, or from any other Narwhals-supported format. Create a file named universal_processing.py in your project’s folder, and add in a universal_groupby_v1() function as shown:
To begin with, you import the Narwhals library into your code using its standard nw alias. You’ve also decided to use the IntoFrameT type hint, so you’ve imported that from narwhals.typing. The IntoFrameT type hint indicates an object that’s convertible to either a Narwhals DataFrame or LazyFrame. Your function will accept an instance of this type as its input and return it back to the caller as output.
Your universal_groupby_v1() function takes a single parameter, df, of type IntoFrameT. The function then uses the nw.from_native() function to add a wrapper around the original input. This wrapper allows the Narwhals API to access your DataFrame. The specific object that Narwhals creates depends on the original input format.
Narwhals then performs a grouping operation. In this example, it groups the data by party_name, counts the number of entries in the last_name column for each group, and then sorts the results by party_name.
If you’re familiar with Polars expressions, then this syntax will be immediately familiar to you. Remember, Narwhals uses a subset of Polars. The API Completeness section of the documentation will tell you precisely what is or isn’t supported.
Finally, Narwhals takes the result of its analysis and uses .to_native() to return it in its original form. So if you pass in a Polars DataFrame, that’s what you’ll get back, and similarly for any other supported format.
One final point to note is that the universal_groupby_v1() function requires only the Narwhals library and the optional narwhals.typing.IntoFrameT. Nowhere do you import the source data’s library.
Now it’s time to test your function. Open a Python REPL within your project’s folder and run the following code:
After importing both the pandas and Polars libraries, you import your universal_groupby_v1() function. You also create both the presidents_pd pandas DataFrame and the presidents_pl Polars DataFrame.
To test your universal_groupby_v1() function, you first pass in a pandas DataFrame:
As you can see from the results, Narwhals can cope with your pandas DataFrame. The result is a pandas DataFrame because the function uses .to_native() to return the analyzed data in its original format.
Next, you’ll see if your function can also cope with a Polars DataFrame:
Yes, it can. This time, your function has returned a Polars DataFrame containing the same result.
Use Some Syntactic Sugar
Although the previous example works perfectly well, the Narwhals library provides some useful decorator syntactic sugar to make your code simpler and more readable.
Add a new version of your earlier function named universal_groupby_v2() into your existing universal_processing.py file:
This time, you’ve made use of the @nw.narwhalify function decorator to simplify your code. Usually, code needs to explicitly convert the DataFrame to a Narwhals format, analyze it, and then convert it back again to its original format before returning it. Because @nw.narwhalify does both conversions for you implicitly, you can concentrate on the data analysis steps.
Take a look back at the example above, and you’ll see neither from_native() nor .to_native() is explicitly used.
This time, you use the FrameT type hint instead of IntoFrameT. The Narwhals documentation defines FrameT as a type that represents a nw.DataFrame or nw.LazyFrame. The documentation favors FrameT when @nw.narwhalify is used. Because Python doesn’t enforce type hints at runtime, the IntoFrameT hint would work equally well.
But does version two of your function still produce the same result as version one did? Before you can find out, you’ll first need to exit your current Python REPL session by typing exit or pressing Ctrl+D, and then start a fresh session. Now, for the moment of truth:
It certainly gives you what you’d expect when a pandas DataFrame has been passed to it. Feel free to try it using your earlier presidents_pl DataFrame. You shouldn’t be disappointed.
Return a Different Output Format
So far, the returned DataFrame has been of the same type as the original DataFrame that you passed into your function. The DataFrame’s .to_native() method ensured this. You can specify an alternative output format using one of a range of available methods, such as .to_pandas() and .to_polars(), but you can’t use the @nw.narwhalify decorator.
You’ll next create a third version of your function that always returns a Polars DataFrame. Add this new version to your existing file:
This version looks very similar to version one. However, you specify that you always want a Polars DataFrame returned by using .to_polars() instead of .to_native().
As usual, restart your Python REPL and test your new function:
First of all, you pass a pandas DataFrame into your function, and then you pass a Polars DataFrame. In both cases, a Polars DataFrame is produced. Feel free to print both results to your console, and you’ll see that they’re indeed identical.
Work With Both Eager and Lazy Modes
If you’ve worked with Polars, you might be aware that it provides both eager and lazy APIs. Narwhals also supports both through its narwhals.DataFrame and narwhals.LazyFrame classes.
In eager mode, a query is executed immediately, whereas in lazy mode, a query plan is created by analyzing the query and determining its most efficient execution plan. Execution of the query to obtain the result is deferred until the code requests it.
Despite its somewhat derogatory name, lazy evaluation can deliver significant performance gains, particularly when handling large volumes of data, because the optimized plan can skip work that the final result doesn’t need.
In general, if you pass a DataFrame or LazyFrame into Narwhals, that’s what .to_native() will return to you. You can test this using either the first or second version of your function because both versions return an object in the same format as they receive it:
When you pass a Polars DataFrame to your universal_groupby_v1() function, you get a DataFrame back, and passing a LazyFrame returns a LazyFrame. However, as with Polars, you need to take care when using Narwhals’ functions with LazyFrames, because not all analysis can be performed in lazy mode. More specifically, any analysis that needs to read data can only work in eager mode.
To investigate a situation where a LazyFrame fails, you decide to create a new universal_pivot_v1() function that produces a pivot table of your data:
This function performs a pivot table operation on the presidential data, allowing you to see a count of the number of presidents from each party in each century. To do this, you use Narwhals’ version of .pivot().
To display a separate column for each political party, you pass on="party_name", and to return a separate row for each of the different centuries, you pass index="century". With the rows and columns defined, you then tell .pivot() to produce a count of each century-and-party combination by counting each president’s last name. You use aggregate_function="count" and values="last_name" to do this.
You decide to try out your function with a Polars DataFrame, but you could also pass it a pandas DataFrame:
It worked! Next, you decide to try passing in a LazyFrame:
This time, your analysis fails. Instead of a result, you get an AttributeError exception. The problem, as the exception’s message points out, is that .pivot() doesn’t apply to LazyFrames. You can only use .pivot() with DataFrames because they contain the data that .pivot() needs to perform its calculations.
If you need to write Narwhals code that uses lazy mode, then make sure to check that the functionality you need is supported. The documentation’s API Completeness section provides both supported DataFrame methods and supported LazyFrame methods, giving you a heads-up on what’s available in each mode. Also, the API Reference section provides details on how each method works.
If you want your pivot function to work with both DataFrames and LazyFrames, you could call .collect() on any LazyFrame passed to it before calling .pivot(). To see this working, you decide to create a universal_pivot_v2() function as shown:
This time, version two of your function tests for the presence of a LazyFrame using Python’s built-in isinstance() function. If a LazyFrame is present, after from_native() adds a Narwhals wrapper, you call .collect() to populate it with data and create a DataFrame. You then pass the DataFrame through the same pivoting operation as before.
If either a DataFrame or LazyFrame is passed to universal_pivot_v2(), it’s analyzed in the same way as it was earlier with universal_pivot_v1(). Feel free to test it for yourself, and you’ll see it produces the same results in both cases.
The final aspect of Narwhals you should be aware of is that its syntax is similar to that of Polars. You’ll investigate this next.
Apply Your Existing Polars Skills to Narwhals
Now that you’ve gained some idea of what Narwhals is and what it offers you, in this final section, you’ll see how you can use your existing Polars knowledge and apply it to a Narwhals problem. Remember, the Narwhals documentation explains the subset of the Polars-like syntax that Narwhals supports.
To complete this section, you’ll use both the books.parquet and authors.parquet files included in your downloadables. You’ll see they contain details of several books and their authors.
The books.parquet file contains the following four fields:
| Heading | Meaning |
|---|---|
| book_title | Title of the book |
| language | Language of first publication |
| year_published | Year of book’s publication |
| author_id | Author’s identification number |
Meanwhile, the authors.parquet file contains the following three fields:
| Heading | Meaning |
|---|---|
| author_id | Author’s identification number |
| first_name | Author’s first name |
| last_name | Author’s last name |
Both files share a common author_id column. This allows their rows to be linked together.
Now see if you can complete the following challenge exercise:
1. Read the content of the books.parquet file into a Polars DataFrame.
2. Read the content of the authors.parquet file into a Polars LazyFrame.
3. Use your wizardry to write a function named rowling_books() that uses each of your Polars objects and produces a list of the books written by British author J. K. Rowling. Your function should return each book’s title, the year in which it was published, as well as the author’s full name. The books should be output as a pandas DataFrame in the order of their publication.
4. Test your function.
Hint: There should be seven book titles in the final result.
One possible solution could be something like this:
To join the data from both files, Narwhals uses .join() in a similar way to Polars. You must convert both the DataFrame parameter df and the LazyFrame parameter lf into the same type of Narwhals object before joining them. It doesn’t make sense to join a Narwhals DataFrame and a LazyFrame together.
In the case of df, you pass it to nw.from_native() to produce a PolarsDataFrame Narwhals object. This uses the Polars library for processing.
Your LazyFrame needs a bit more preprocessing before you can join it to your DataFrame.
You again use from_native() on lf to convert it to a Narwhals-equivalent LazyFrame object. Once it’s converted, you use both .filter(), as well as .str.contains("Rowling"), to select only rows containing authors with a last_name of Rowling. However, this will still be in a LazyFrame, so to convert it to a PolarsDataFrame Narwhals object, you need to use .collect() in a way similar to how you would in Polars.
Once you have your two PolarsDataFrame objects, you can join them into a larger PolarsDataFrame using .join() and specifying "author_id" as the column to join them on.
You next use .select() to specify the columns you want to see, and .sort() to sort the output in the desired order of the publication year. These serve the same purpose as their equivalents in Polars.
At this point, you now have your final Narwhals DataFrame containing what you want. To get your desired output format, you use .to_pandas() to convert it into a pandas DataFrame.
Of course, you decide to test your function:
When you call your function, you see a list of the Harry Potter books stored in the DataFrame.
Conclusion
Narwhals enables Python library developers to create DataFrame-agnostic code that processes both DataFrames and LazyFrames, regardless of their original source.
It’s a lightweight library because it wraps the original source data in its own object ecosystem but delegates data processing back to the original source library to take advantage of its efficiencies.
Narwhals is also great for developers who have experience with the very popular Polars library, since its syntax is based on Polars.
In this tutorial, you’ve learned how to:
- Install Narwhals
- Write analysis code using data from different data analysis libraries
- Work in both lazy and eager modes
- Control the output format of what Narwhals returns to you
- Use your existing Polars skills to write Narwhals code
Narwhals is still a young library, and its developers are actively improving it. This tutorial has aimed to make you aware of the library and its key features. While it’s unlikely these will change, it’s still wise to keep an eye on the Narwhals documentation from time to time to see how its key features evolve and expand, as well as which new data analysis libraries Narwhals will support in the future.
Frequently Asked Questions
Now that you have some experience with Narwhals in Python, you can use the questions and answers below to check your understanding and recap what you’ve learned.
These FAQs are related to the most important concepts you’ve covered in this tutorial.
Narwhals lets library authors write DataFrame-agnostic code that runs against pandas, Polars, DuckDB, and more with the same API. It’s aimed at Python library developers who need one code path to analyze common DataFrame formats.
Narwhals doesn’t replace pandas or Polars. It wraps the input and translates your expressions, then hands execution to the original library, so it complements those libraries rather than replacing them.
Narwhals avoids copying by operating on the original DataFrame and delegating work to the source backend. This keeps memory use low and preserves the performance characteristics of the underlying library.
Wrap inputs with nw.from_native() and return the original type with .to_native(), or use the @nw.narwhalify decorator to handle both conversions around your function body. If you always want a specific output format, call .to_pandas() or .to_polars() instead of .to_native(), without the decorator.
Narwhals mirrors Polars with nw.DataFrame for eager and nw.LazyFrame for lazy execution. Some operations like .pivot() require materialized data, so convert a lazy input with .collect() before those steps.