Photo by Helena Lopes on Unsplash
Throughout your open source journey, you have no doubt interacted with the core development team of the projects you contributed to. Have you ever wondered how people become core developers of a project?
To be a core developer, you don’t necessarily have to know the most about the project or be the most technical. Just like there are a variety of ways to contribute to a project, there are a variety of ways to add value as a part of the core development team: CI/CD, documentation, feature development, release management, testing, triage, etc. Each member of the team will have different strengths and weaknesses, but they will all be passionate about the project, which makes it possible for them to work together on moving the project forward.
Below I share my journey to becoming a core developer of numpydoc, from first learning about the project and identifying pain points, to adding value in a way that made sense for me and, eventually, joining the team. You will also learn a little bit about some useful modules in the Python standard library for analyzing code.
First exposure to numpydoc
In July 2022, I participated in my first open source development sprint with the Scikit-Learn team at EuroPython. We focused on fixing docstrings for existing code to align with the numpydoc standard, which is widely used in the scientific Python community. I was new to numpydoc, but I immediately understood the benefit of having consistent docstrings like the example below – something that is very hard to do when relying strictly on human validation:
Example of a docstring following the numpydoc style, ignoring rules ES01, EX01, SA01, and SS06. Excerpt taken from stefmolin/data-morph on November 1, 2025.
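Since the original excerpt isn't reproduced here, the following is a minimal illustrative docstring in the numpydoc style (a hypothetical function, not the Data Morph code):

```python
def downscale(values, factor=2):
    """
    Keep every ``factor``-th entry of a list of numbers.

    Parameters
    ----------
    values : list of float
        The values to downscale.
    factor : int, default 2
        The sampling interval.

    Returns
    -------
    list of float
        Every ``factor``-th entry of ``values``.
    """
    return values[::factor]
```

Each section has a required layout (underlined headers, parameter names with types), which is exactly what makes automated validation possible.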
As we made our changes, we would run a script to check whether we had addressed all the docstring issues that numpydoc flagged. The Scikit-Learn team wrote this script to analyze the entire package at once because, back then, numpydoc could only validate one docstring per invocation. Towards the end of the sprint, one of the maintainers told me about the pre-commit tool, which, once installed in the Scikit-Learn repository on my machine (see my How to Set Up Pre-Commit Hooks article), would run a series of checks selected by the Scikit-Learn team on the code. I began to wonder why the numpydoc validation wasn’t being performed as a pre-commit check.
Building a pre-commit hook for numpydoc
When I got back home, I researched pre-commit, and, at work, I built a numpydoc validation pre-commit check. It wasn’t until I began working on one of my personal projects, Data Morph, that I needed the check outside of work, so I got permission from my employer to open source it.
I approached the numpydoc team to see if they were interested, and indeed they were. However, when I tried to port the code, I realized that what I had built was actually unusable for most use cases because of how numpydoc worked: in order to inspect the code, numpydoc needed to import it, but since pre-commit creates a separate environment for running its checks, the code being checked would also need to be installed in that environment (see my A Behind-the-Scenes Look at How Pre-Commit Works article for more information). Pre-commit checks need to be fast, and this would make the hook way too slow. At this point, I realized this wouldn’t be so easy, and that, perhaps, this was the reason the hook didn’t exist already.
Creating numpydoc’s static code analysis functionality
At the time, numpydoc used the Validator class, which imports the Python code it inspects. The Validator class provides properties to access information about the object and its docstring (e.g., signature parameters and file name), which are used during the docstring validation checks. While this worked fine for existing applications, I needed to analyze the code without running it in order to build a pre-commit hook for numpydoc. This process is called static code analysis.
I had no clue how to perform static code analysis at the time, let alone in Python. Thankfully, Python has a great open source community, so I knew someone had definitely solved this problem before. That’s when I realized I had been using some static code analysis tools already – black for one. How did black and other Python tools with pre-commit hook functionality do it? After poking around a few such codebases on GitHub, I had my answer: the ast module in the Python standard library.
The ast module provides tools to work with Abstract Syntax Trees (ASTs) in Python. An AST represents a program (Python in this case) using a tree structure in which the nodes are components of the language’s grammar, for example, if statements and class/function definitions. By simply reading the contents of a file, the ast module can parse syntactically-valid source code into an AST, which can then be traversed to perform static code analysis.
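As a quick sketch of what that looks like, ast.parse() turns a string of source code into a tree of grammar nodes that can be walked without ever executing the code:

```python
import ast

source = """
if True:
    greeting = "hello"

def shout(message):
    return message.upper()
"""

# Parse the source into an AST -- the code is never executed.
tree = ast.parse(source)

# Walk the tree and collect the grammar constructs it contains.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)  # includes "Module", "If", and "FunctionDef"
```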
Without going into the gory details here, I spent the weekend creating an alternative to the existing Validator class: the AstValidator class. When running numpydoc as a static code analyzer, the AstValidator class replaces the Validator class, overriding its properties with AST-compatible logic, which eliminates the need to import the Python code being analyzed. For example, instead of inspecting the Python object, we check what kind of AST node we have:
The logic marked as removed is how the Validator implements this check; the logic marked as added is how the AstValidator implements the check. Excerpt taken from the initial PR to add this functionality to numpy/numpydoc.
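The actual diff isn't shown here, but the contrast can be sketched as follows (hypothetical code, not the numpydoc implementation): the dynamic approach has to import or execute the code and inspect a live object, while the static approach only looks at the node's type.

```python
import ast
import inspect

source = "class Shape:\n    pass\n"

# Validator-style check: the code must be executed/imported first.
namespace = {}
exec(source, namespace)
print(inspect.isclass(namespace["Shape"]))  # True

# AstValidator-style check: only the parsed source is needed.
node = ast.parse(source).body[0]
print(isinstance(node, ast.ClassDef))  # True
```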
For the AstValidator class to do its job, it must be initialized with an individual node in the AST to check. The Python ast module includes the ast.parse() function, which parses source code (as a string) into an AST rooted at an ast.Module node. In order to inspect everything in that module, we must traverse the tree starting at the root and validate each node that should have a docstring as we encounter it. The ast module provides the ast.NodeVisitor class to perform this traversal, so I created the DocstringVisitor class as a subclass of ast.NodeVisitor and overrode the visit() method, which is called on each node, to perform the check only if the node represents a module, class, or function:
Example of an ast.NodeVisitor.visit() method as implemented in the DocstringVisitor class. Excerpt taken from the initial PR to add this functionality to numpy/numpydoc.
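The original excerpt isn't reproduced here, but the pattern can be sketched like this (an illustration of the approach, not numpydoc's actual class):

```python
import ast

class DocstringVisitor(ast.NodeVisitor):
    """Collect docstring-bearing nodes while traversing the AST."""

    def __init__(self):
        self.docstrings = []

    def visit(self, node):
        # Only modules, classes, and functions can have docstrings.
        if isinstance(
            node,
            (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef),
        ):
            self.docstrings.append((type(node).__name__, ast.get_docstring(node)))
        self.generic_visit(node)  # keep traversing the child nodes

visitor = DocstringVisitor()
visitor.visit(ast.parse('def greet():\n    """Say hello."""\n'))
print(visitor.docstrings)  # [('Module', None), ('FunctionDef', 'Say hello.')]
```

In the real implementation, the check at each matching node is a full docstring validation rather than simple collection, but the traversal mechanics are the same.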
If you are interested in learning more about ASTs, be sure to check out my keynote at PyCon Lithuania 2025.
The pre-commit hook
Using the AST solved the feasibility issue, and turning this into a pre-commit hook didn’t require much effort afterward (see my Pre-Commit Hook Creation Guide, if you are curious how to do that). The numpydoc-validation pre-commit hook is available in numpydoc versions 1.6.0 and higher. Configure it on your repository by adding the following to your .pre-commit-config.yaml file:
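A minimal configuration looks something like the following (pin rev to whichever numpydoc release you are using, as long as it is 1.6.0 or higher):

```yaml
repos:
  - repo: https://github.com/numpy/numpydoc
    rev: v1.6.0  # use the latest release (1.6.0 or higher)
    hooks:
      - id: numpydoc-validation
```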
Please consult the documentation for implementation specifics.
Bells and whistles
While the hook was definitely usable at this point, it was missing some functionality that users of these tools have come to expect, like configuration options in pyproject.toml and inline comments to ignore checks on specific lines. Working on these types of usability improvements may not seem as glamorous as adding new features to a project; however, they strongly communicate your passion for the project and your interest in improving it, which are things core developers look for.
Configuration options
Using a dedicated section in pyproject.toml, I added support for a few different configuration options, which you can see below. The supported options have since grown, so be sure to check out the documentation for the latest options:
Example numpydoc-validation hook configuration in pyproject.toml. Excerpt taken from stefmolin/data-morph on November 1, 2025.
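The original excerpt isn't reproduced here, but a configuration along these lines shows the idea (consult the numpydoc documentation for the exact set of supported options):

```toml
[tool.numpydoc_validation]
checks = [
    "all",   # run all checks...
    "ES01",  # ...except the ones listed after "all"
    "EX01",
    "SA01",
    "SS06",
]
exclude = [  # regex patterns for names to skip entirely
    '\.__repr__$',
]
```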
To implement this, I used the tomllib module in the standard library (introduced in Python 3.11), which made quick work of finding the dedicated numpydoc section. However, I also wanted to automatically detect the pyproject.toml file for the user instead of requiring them to pass the path to it because it is a standard name and typically in a standard location (the root of the repository). This wasn’t as straightforward, so I once again researched how other tools do this and adapted the logic that black had.
I’ve had several PRs merged in different projects that boiled down to knowledge-sharing just like this change. It’s a great way to make a difference, and in some cases, there’s an opportunity to be proactive and suggest such changes in the first place, which will definitely make an impression on the core developers.
Ignoring checks with inline comments
The checks defined in the pyproject.toml configuration apply to the project globally. For flexibility and to bring the user experience closer to that of other popular tools like black and flake8, I wanted to make it possible to turn off checks on a per-docstring basis using inline comments. For example, the following bypasses the checks for parameter (PR01) and return value (RT01) documentation in a single function docstring:
Example of using inline comments to have numpydoc ignore validation rules PR01 and RT01 for the _easing() function’s docstring. Excerpt taken from stefmolin/data-morph on November 1, 2025.
This can’t be done with the AST alone because, when the ast module parses source code to generate an AST, it discards some information that is irrelevant for that representation, such as formatting and comments. This means that inline comments like the example above are not present at all in the AST.
To extract inline comments, I needed to use a parse tree instead. A parse tree retains the full structure of the code, but it isn’t as easy to work with or as efficient as the AST. The leaves of the parse tree are called tokens and can be extracted with the tokenize module from the standard library. Tokens have a type, one of which is token.COMMENT, so it’s easy to isolate only those that are comments, and, from there, match comments prefixed with numpydoc ignore to their corresponding AST nodes using the line number, saving the validation checks that should be ignored.
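A simplified sketch of that extraction (the actual implementation handles more edge cases):

```python
import io
import tokenize

source = '''\
def shift(x):  # numpydoc ignore=PR01,RT01
    """Shift a value."""
    return x + 1
'''

# Map line numbers to the validation checks to skip on that line.
ignores = {}
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    if tok.type == tokenize.COMMENT and "numpydoc ignore=" in tok.string:
        checks = tok.string.split("numpydoc ignore=")[1].split(",")
        ignores[tok.start[0]] = [check.strip() for check in checks]

print(ignores)  # {1: ['PR01', 'RT01']}
```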
The review
With all of this in place, the PR to add the numpydoc-validation pre-commit hook was ready for review, which, given the novelty of the change, took a few months. During the review, I agreed to help the core development team out in the future if something came up related to my feature, as they were a little concerned about merging such a massive change without my support, since it was new to all of them. However, this didn’t make me part of the team yet – in fact, it hadn’t even been discussed.
CLI improvements
After the numpydoc-validation pre-commit hook was merged, I ended up sticking around to tackle some other improvements like streamlining the numpydoc CLI and bringing some of the functionality that only existed on the pre-commit hook side to other parts of the project. With these changes, we can now run numpydoc lint to run numpydoc validation on entire Python files using the AST logic:
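For example (with a hypothetical file path):

```shell
numpydoc lint src/my_package/my_module.py
```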
This solved one of the pain points I had when first trying to use numpydoc on a personal project: I wanted to be able to validate the docstrings of an entire module without needing to write a script or test suite to do so. Since I’m a user of numpydoc, I’m passionate about making that experience better.
Welcome to the team
In April 2024, several months after the numpydoc-validation pre-commit hook went live and just a couple of days after I opened the PR for the numpydoc lint CLI changes, I was invited to become a core developer. One of my favorite things since joining has been seeing issues come in that show people have switched their projects over to the tools I built.
It’s important to note that, while I wrote and am very familiar with the AstValidator and pre-commit hook implementation and am well-versed in the CLI, I am not very familiar with other areas, like the Sphinx logic or the docscraping logic that predates my involvement. However, others on the team have a deep understanding of these areas, and they lean on me for my strengths. This is what makes a development team successful – we complement each other.
Are you hoping to become a core developer of a project? The exact process will depend on the project and, in some cases, may be written down, like the “How to join the core team” page in the Python Developer’s Guide. My advice is to continue making quality contributions to the project in a way that makes sense for you and is sustainable. Maintainers will notice contributors who are having a positive impact on the project and are passionate about it – these are the people they want to work with, after all.