The Best Kind of Feedback
A week ago, I published "Go’s Regexp is Slow. So I Built My Own". The response was incredible - but the most valuable feedback came from Ben Hoyt, creator of GoAWK.
He didn’t just read the article. He tried to actually use coregex.
"I’ve started integrating coregex into GoAWK... I’m finding a few issues."
That message led to one of the most productive weeks of debugging I’ve ever had.
11 Bugs in 7 Days
Ben’s GoAWK test suite is ruthless - 1000+ regex patterns covering edge cases I never imagined. Here’s what he found:
| Day | Pattern | Bug | Symptom |
|---|---|---|---|
| 1 | `[^,]*` | Negated char class | Crash |
| 1 | `[oO]+d` | Case-insensitive | Wrong match |
| 2 | `^foo` | Start anchor | Matched everywhere |
| 2 | `\bword\b` | Word boundary | Find returned empty |
| 3 | `^` in FindAll | Anchor in loop | Matched at every position |
| 3 | - | Error format | Different from stdlib |
| 4 | `\w+@...` | Capture groups | DFA returned false |
| 4 | `(?s:.)` | Inline flags | Ignored |
| 5 | `a$` | End anchor | First call wrong |
| 6 | `(#\|#!)` | Longest() | No-op stub |
Each bug taught me something. Some were embarrassing oversights. Others revealed fundamental gaps in my understanding.
The Worst Bug: ^ Anchor
The start anchor (^) was my nemesis. It seemed simple - match only at position 0. But in a multi-engine architecture, "simple" gets complicated fast.
Version 1: Naively checked pos == 0. Worked for IsMatch, broke for FindAllIndex.
Version 2: Added FindAt(haystack, offset) methods. Now FindAllIndex could tell the engine "this is position 5 in the original string."
Version 3: Discovered DFA’s epsilonClosure didn’t respect anchors. Implemented proper LookSet following Rust’s regex-automata.
Three attempts over two days. Ben kept testing. I kept fixing.
The Sneakiest Bug: Longest()
This one was humbling. The Longest() method had existed since v0.8.2. Documentation claimed it worked. Tests passed.
It was a no-op stub.
```go
// What I wrote (v0.8.2)
func (r *Regex) Longest() {
	// TODO: implement leftmost-longest semantics
}

// What Ben expected
re := coregex.MustCompile(`(a|ab)`)
re.Longest()
// On input "ab", the match should be "ab" (longest), not "a" (first)
```
AWK uses POSIX semantics (leftmost-longest). Go’s stdlib uses Perl semantics (leftmost-first) by default, but Longest() switches modes. My engine only supported Perl semantics.
The fix required understanding a fundamental distinction:
Leftmost-First (Perl): (a|ab) on "ab" → "a" (first alternative wins)
Leftmost-Longest (POSIX): (a|ab) on "ab" → "ab" (longer match wins)
Implementing this in PikeVM took 100 lines. No performance regression in default mode.
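The two semantics are easy to observe with Go's standard library, which is the behavior a drop-in replacement has to reproduce. This is a minimal sketch against stdlib `regexp`, not coregex; `firstAndLongest` is a hypothetical helper name:

```go
package main

import (
	"fmt"
	"regexp"
)

// firstAndLongest (hypothetical helper) returns the match for pattern on
// input under the default leftmost-first semantics, then again after
// switching the same Regexp to leftmost-longest.
func firstAndLongest(pattern, input string) (first, longest string) {
	re := regexp.MustCompile(pattern)
	first = re.FindString(input)

	re.Longest() // future searches on re prefer the longest leftmost match
	longest = re.FindString(input)
	return first, longest
}

func main() {
	first, longest := firstAndLongest(`(a|ab)`, "ab")
	fmt.Printf("leftmost-first: %q, leftmost-longest: %q\n", first, longest)
	// leftmost-first: "a", leftmost-longest: "ab"
}
```

A stubbed-out Longest() would print "a" for both values, which is exactly the kind of difference an AWK test suite notices.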
The Fix Velocity
| Version | Date | Fixes |
|---|---|---|
| v0.8.3 | Dec 4 | Negated classes, case-insensitive |
| v0.8.4 | Dec 4 | ^ anchor (professional fix) |
| v0.8.5 | Dec 5 | Word boundaries \b \B |
| v0.8.6 | Dec 7 | ^ in FindAll/ReplaceAll |
| v0.8.7 | Dec 7 | Error message format |
| v0.8.8 | Dec 7 | DFA + capture groups |
| v0.8.9 | Dec 7 | Linter compatibility |
| v0.8.10 | Dec 7 | Inline flags (?s:...) |
| v0.8.11 | Dec 8 | End anchor first-call bug |
| v0.8.12 | Dec 8 | Longest() implementation |
10 releases in 5 days. Each one making coregex more stdlib-compatible.
Performance: Still Fast
The real question: did all these fixes kill performance?
```text
Pattern: .*connection.*
Input:   250KB log file
stdlib:  12.6 ms
coregex: 4 µs
Speedup: 3,154x (unchanged from v0.8.0)
```
The architectural decisions paid off. SIMD prefiltering. Lazy DFA. Strategy selection. They handle the fast path. The bug fixes lived in edge-case handling, code that rarely runs.
Full Stdlib Compatibility
After v0.8.12, GoAWK’s test suite passes completely:
```shell
$ cd goawk
$ go test ./...
ok github.com/benhoyt/goawk 4.832s
```
Drop-in replacement confirmed.
```go
// Before
import "regexp"

// After (aliased so existing regexp.* call sites compile unchanged)
import regexp "github.com/coregx/coregex"

// That's it. Same API. 5-3000x faster.
```
What I Learned
1. Real-world testing > Unit tests
My test coverage was 88%. GoAWK found 11 bugs. Unit tests catch what you imagine. Users catch what you don’t.
2. Multi-engine architecture = Multi-engine bugs
Each strategy (DFA, NFA, ReverseAnchored, OnePass) had its own edge cases. A fix in one could break another. Integration tests between engines became critical.
3. "Works on my machine" is worthless
Ben tested on different inputs, different patterns, different use cases. His AWK interpreter exercises regex in ways my benchmarks never did.
4. Fast feedback loops matter
GitHub Issues → Fix → Release → Test. Sometimes twice a day. Ben’s patience and detailed bug reports made this possible.
The Collaboration
I want to publicly thank Ben Hoyt. He could have said "this library has bugs, I’ll use stdlib." Instead, he filed detailed issues, provided test cases, and kept testing each release.
This is open source at its best.
Try It Yourself
```shell
go get github.com/coregx/coregex@v0.8.12
```

```go
package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\w+@[\w.]+`)
	fmt.Println(re.FindString("email: test@example.com"))
	// Output: test@example.com
}
```
Found a bug? Open an issue. I’ll fix it.
What’s Next
- v0.9.0: ARM NEON SIMD (waiting for Go 1.26)
- v1.0.0: API stability guarantee, security audit
- Your feedback: The fastest path to production-ready
Links:
- GitHub: coregx/coregex
- GoAWK PR #264 - The integration that found everything
- Original article
From 0 to 11 bugs fixed. From "interesting project" to "production-ready." Thanks to one developer who actually tried to use it.