Fast Sequence Iteration in Common Lisp

Published on 2025-12-13

If you don’t know what sequences are in CL, here’s the gist of it: either a linked list or a vector. The main idea being that some operations like element search, comparison or deletion should have the same interface for both those types.

Basically, sequences are a band-aid over the lack of real iterator protocol (yes, even considering the mildy supported extensible sequences thing). But well, that’s what we have to work with and it’s not so bad really; we could be in barren C-ount…

Published on 2025-12-13

One of the problems of sequences is that if you want to iterate over these while supporting the conventional keywords (:start, :end, :key) without writing entirely separate (and gnarly) versions, there are only two unified interfaces provided by ANSI:

elt+length+loop (or do): naïve and simple, what’s actually used under the hood by iterate’s in-sequence driver. Also quadratic (thus extremely slow) on lists.
reduce: even simpler as it handles all the keywords for you (even :from-end) and much faster since any non-toy implementation internally dispatches over the actual sequence type. Still not ideal as it has the overhead of repeat funcall of your closure and you don’t have access to the often needed internal raw element (without :key applied) or iteration index.

So when I made a naïve max-elt (for this years’ Advent of Code) then was forced to rewrite it to use reduce, I said to myself I must build that Lovecraftian macro to handle the required nested monomorphization… so here’s the beast:

(defmacro do-sequence ((var seq &key (start 0) end key with-index with-raw-elt) &body body)
"Iterate on a sequence with the usual keywords available. :WITH-INDEX (resp. :WITH-RAW-ELT)
takes an unevaluated symbol to use as index (resp. element without :KEY applied).

NB: since iteration is done via `(LOOP ... :DO (LOCALLY ,@BODY)), said body can use RETURN and
contain declarations."
(once-only (seq start end key)
(macrolet ((list-impl (has-key-p has-end-p)
`(let ((ivar (or with-index   (gensym "IDX")))
(rvar (or with-raw-elt (gensym "RAW"))))
(declare (ignorable ivar rvar))
`(loop
,@(when (not ,has-key-p) `(:with ,rvar))                        ,@(when (or ,has-end-p with-index)
`(:for ,ivar :of-type (integer 0 #.array-dimension-limit)
:from ,start ,@(when ,has-end-p `(:below ,end))))
,@(if ,has-key-p
`(:for ,rvar :in ,seq
:for ,var := (funcall ,key ,rvar))
`(:for ,var :in ,seq))
:do (locally ,@body))))
(vec-impl (has-key-p)
`(let ((ivar (or with-index   (gensym "IDX")))
(rvar (or with-raw-elt (gensym "RAW"))))
(declare (ignorable ivar rvar))
`(loop
,@(when (not ,has-key-p) `(:with ,rvar))                        :for ,ivar :of-type (integer 0 #.array-dimension-limit)
:from ,start :below (or ,end (length ,seq))
,@(if ,has-key-p
`(:for ,rvar := (aref ,seq ,ivar)
:for ,var := (funcall ,key ,rvar))
`(:for ,var := (aref ,seq ,ivar)))
:do (locally ,@body)))))
`(etypecase ,seq
(list
(cond ((and ,key ,end) ,(list-impl t   t))
(,key            ,(list-impl t   nil))
(,end            ,(list-impl nil t))
(t               ,(list-impl nil nil))))
((simple-array * (*))           (if ,key
,(vec-impl t)
,(vec-impl nil)))
(vector
(if ,key
,(vec-impl t)
,(vec-impl nil)))))))

Except for the de facto standard once-only macro, it is ANSI CL compliant. You can copy it under the Zlib license, if you wish so.

Now, for some benchmarks! Here I compare the aforementioned max-elt core loop written with reduce then with do-sequence against large sequences of different types; needing both the index and raw element somewhat complicates the result and illustrate why I included :with-index and :with-raw-elt.

(defun max-elt-reduce (seq pred &key (start 0) end key)
(let* ((i    start)
(maxi start)
(max  (elt seq 0)))
(declare (type (integer 0 #.array-dimension-limit) i maxi))
(reduce (if key                 (lambda (kmax el)
(incf i)
(let ((kel (funcall key el)))
(if (funcall pred kel kmax)
(progn (setf maxi i max el) kel)
kmax)))
(lambda (kmax el)
(incf i)
(if (funcall pred el kmax)
(progn (setf maxi i max el) el)
kmax)))
seq
:start (+ start 1)
:end end
:initial-value (if key (funcall key max) max))
(values max maxi)))

(defun max-elt-do-seq (seq pred &key (start 0) end key)
(let* ((maxi start)
(rmax (elt seq 0))
(max  (if key (funcall key rmax) rmax)))
(do-sequence (el seq :start start :end end :key key :with-index i :with-raw-elt r)
(when (funcall pred el max)
(setf maxi i rmax r max el)))
(values rmax maxi)))


(defconstant +len+ 5000000)

(let ((l   (make-sequence 'list   +len+ :initial-element 0))
(sv  (make-sequence 'vector +len+ :initial-element 0))
(fsv (make-array +len+ :element-type 'fixnum :initial-element 0))
(fv  (make-array +len+ :element-type 'fixnum :initial-element 0 :adjustable t)))
(format t "Test,~A~%" (lisp-implementation-type))
(loop :for (args test) :in
`(((,l   ,#'>) "LIST")
((,l   ,#'> :start 100 :end ,(- +len+ 100) :key ,#'1+) "LIST (fiddly)")
((,sv  ,#'>) "SIMPLE-VECTOR")
((,fsv ,#'>) "(SIMPLE-ARRAY FIXNUM)")
((,fv  ,#'>) "(VECTOR FIXNUM)"))
:do (let ((tref (nth-value 1 (measure (apply #'max-elt-reduce args))))
(tnew (nth-value 1 (measure (apply #'max-elt-do-seq args)))))
(format t "~A,~,0F → ~,0F (~,0@F%)~%"
test (* tref 1000) (* tnew 1000) (* 100 (- (/ tref tnew) 1))))))

And here are the results on some popular implementations (durations in ms):

Test	SBCL	CCL	ECL
35 → 25 (+40%)	119 → 63 (+89%)	886 → 382 (+132%)	314 → 172 (+83%)
49 → 35 (+40%)	148 → 74 (+101%)	1113 → 588 (+89%)	383 → 325 (+18%)
42 → 33 (+27%)	143 → 84 (+69%)	858 → 510 (+68%)	338 → 238 (+42%)
41 → 34 (+21%)	144 → 86 (+67%)	897 → 484 (+85%)	337 → 237 (+42%)
52 → 35 (+49%)	180 → 123 (+47%)	866 → 519 (+67%)	339 → 249 (+36%)

Benchmarking details

Versions used: SBCL 2.5.11, CCL 1.13, ECL 24.5.10, CLISP 2.49.92 (bytecode compiler used)
Hardware: AMD 5900X, 64 GB DDR4
Software: Gentoo Linux, linux 6.12, gcc 15.2, glibc 2.41
optimize: (speed 3) (safety 0) (debug 0), speedups are comparable without, but used for the sake of benchmarking.

Well, isn’t that welcome? Both the performance and readability gains were worth the effort, in my opinion; especially outside of SBCL which seems to have a particularly well optimized reduce.

Beware code bloat and compilation slowdown though, such aggressive monomorphization of loop macros isn’t free; so don’t inline unless you intend on exploiting your compiler’s DCE pass.

Similar Posts