Abstract
Artificial intelligence (AI) has made significant strides towards efficient online processing of sensory signals at the edge through the use of deep neural networks with ever-expanding size. However, this trend has brought with it escalating computational costs and energy consumption, which have become major obstacles to the deployment and further upscaling of these models. In this Perspective, we present a neuro-inspired vision to boost the energy efficiency of AI for perception by leveraging brain-like dynamic sparsity. We categorize various forms of dynamic sparsity rooted in data redundancy and discuss potential strategies to enhance and exploit it through algorithm-hardware co-design. Additionally, we explore the technological, architectural, and algorithmic challenges that need to be addressed to fully unlock the potential of dynamic-sparsity-aware neuro-inspired AI for energy-efficient perception.
Introduction
In response to ever more complex and diverse perception tasks, AI models have grown substantially in both size and computational requirements. This trend follows empirical scaling laws1, increasing the energy demands for training and inference. It poses a critical challenge to the deployment of AI models, particularly on edge platforms targeting applications such as mobile computing, smart wearables, and autonomous robots, where dynamic real-time interaction with the environment is necessary2,3.
This Perspective focuses on AI perception systems that process input from sensors of various modalities used for extracting information from natural scenes. These systems typically exploit neural networks consisting of convolutional and recurrent layers and, more recently, more complex architectures such as transformers. To deploy perception systems on energy-constrained hardware platforms, substantial efforts have been made to reduce unnecessary computations within the networks, that is, to increase the compute sparsity, thereby improving the energy efficiency of the corresponding hardware platforms.
Traditional approaches focus on what we term static sparsity—sparsifying network connections by applying pruning techniques4. To further minimize the size and complexity of the model, pruning is often combined with other optimization techniques such as parameter quantization5 and neural architecture search6. Although these static sparsity methods have yielded substantial model-size reduction and inference acceleration (e.g., 2× smaller and 1.8× faster convolutional models for image recognition7), they are inherently static and do not account for the characteristics of the actual input data at runtime. Recently, several data-driven dynamic sparsity approaches have emerged to further decrease the number of computations at runtime. Yet, this emerging field remains highly scattered, and opportunities for perception systems remain largely underexplored.
This Perspective therefore explores the various forms of dynamic sparsity, with a focus on context-aware sparsity, which seeks to reduce computation based on the dynamic structure of the incoming data and the evolving context of a task, particularly for systems that operate in natural environments. This data-driven approach is inspired by the redundancy in the sensor and network output that arises from intrinsic spatiotemporal correlations of natural stimuli, as we discuss further in the next section. Rather than processing every component of the model for every input sample, a system employing dynamic context-aware sparsity would be selectively activated by the input and would execute the network computations and memory accesses only when needed. This concept is inspired by biological brains, which operate under strict energy budgets with tight latency constraints and have evolved to process information in an adaptive, context-dependent manner.
While spiking neural networks (SNNs) operating on data from event-based sensors serve as the prototypical example, we will demonstrate that the concept of dynamic sparsity is much more general and broadly applicable across neural network architectures beyond SNNs. For example, transformers8,9, the current workhorses of large-scale foundation models, exploit a form of data-driven dynamic attention: the self-attention mechanism takes contextual information from the token sequence into account. Typical transformers, however, execute this attention mechanism in a dense fashion, aimed primarily at increased accuracy rather than reduced computation counts. They can also benefit from reduced computation by using the sparse dynamic outputs of event-based vision sensors10. However, they leave considerable room for further exploiting dynamic sparsity in a data-driven and context-aware fashion, as nature does.
This Perspective outlines the broad potential of dynamic sparsity as a key enabler of the next wave of energy-efficient intelligent perception. We draw on biological insights to demonstrate how the brain exploits dynamic sparsity in various ways, present a taxonomy of the sparsity types, then explore how dynamic sparsity can be introduced at multiple algorithmic and hardware levels through both sparsity-enhancing and sparsity-exploiting techniques. Additionally, we examine open challenges in architectures, algorithms, and technologies, as well as potential applications for future exploration and innovation in dynamic-sparsity-aware, neuro-inspired AI systems. In particular, we focus on the opportunities arising from dynamic sparsity for artificial neural networks (ANNs), where we identify greater potential benefits than for SNNs, which already have closer connections to biology.
Neural inspiration
Animals can only sustain themselves with the amount of energy they can forage11, making energy efficiency crucial for survival. Consequently, the brain’s computation must be highly energy-efficient. This demand for efficiency suggests that neurons in the brain must fire sparsely, since spike generation accounts for more than 50% of brain energy consumption12. Various estimates indicate that the average firing rate of cortical neurons is approximately 1 Hz (Box 1). The sparse firing of neurons can be directly observed in an example calcium imaging recording of brain slices, as shown in Fig. 1A.
Fig. 1: Examples of dynamic sparsity.
A Sparse spiking activity (arrows) observed through calcium imaging of a brain slice from mouse frontal cortex (courtesy R. Loidl). B Muybridge’s 1878 Horse in Motion sequence repeats nearly the same information across frames, albeit with severe aliasing. C A driving sequence is dense and highly repetitive; the critical pixels containing the child (circled) are a tiny fraction of the total. D An example used by J. Hopfield in his Caltech teaching: an attentional expectation bias formed by language sparsifies subsequent inference.
The sparsity of neural activity suggests that the brain uses sparse firing patterns to encode information, a concept known as sparse coding13. Theoretical and experimental evidence supports this principle across various sensory modalities, including vision14, audition15, and olfaction16. Sparse coding is consistent with the redundancy-reduction hypothesis17, which postulates that sensory systems aim to preserve essential information while discarding redundant input. Natural scenes, such as a horse in motion illustrated in Fig. 1B, exhibit high spatiotemporal redundancy: most pixels change little over time, and nearby pixel values are highly correlated. Therefore, encoding only the spatiotemporal changes drastically reduces the number of spikes required to represent the stimuli17.
Another important property of nervous systems is their statefulness. Neurons maintain localized states through a variety of mechanisms such as synaptic connections, neuron membrane potentials, calcium concentrations, and many other localized, time-varying state variables18,19. These states—distributed at different synapses, neurons, and brain areas—allow biological neural networks to integrate sensory information across a range of temporal and spatial scales, forming context-aware models of the environment. This stateful computation approach enables efficient processing: rather than recalculating everything from scratch, the brain updates only what is necessary based on its current state using sparse local communication.
While modern AI models do employ states—such as hidden states in recurrent neural networks (RNNs)20, KV cache in Transformers21, and long-term memory banks in memory-augmented models22—they typically process all inputs and all model components densely at each inference step. This dense processing undercuts the potential gains from statefulness by incurring high energy and latency costs. In contrast, the brain performs selective and sparse updates, often triggered by surprise or salient stimuli.
Two key mechanisms have been proposed to explain how the brain maintains sparse activity and energy-efficient inference: predictive coding and attention-based gating. Predictive coding23 posits that the brain actively generates top-down predictions of the incoming stimuli and compares them with the actual inputs. The predictions are then updated by the bottom-up error signals. This feedback process allows the brain to focus its processing resources on unexpected inputs (surprise). For example, in a driving scene (Fig. 1C), the background motion is highly predictable, whereas a child suddenly running across the street generates a significant prediction error, rapidly engaging sensory processing and motor response. Fig. 1D illustrates the consequence of such a predictive model, where the prior established by the first sentence biases the interpretation of “flies” in the second sentence, necessitating a reset. Nevertheless, this bias dynamically lowers the inference cost and latency by constraining the search space.
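To make the mechanism concrete, the following minimal sketch caricatures predictive coding in a single layer: a latent state produces a top-down prediction of the input, and the state is updated by the bottom-up error only when that error (the surprise) exceeds a threshold. The weights, learning rate, and threshold here are illustrative assumptions, not a model from the predictive coding literature23.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single-layer predictive coder (all constants illustrative):
# the latent state r generates a top-down prediction W @ r of the input;
# the bottom-up error updates r, and the update is skipped entirely when
# the input is already well predicted.
W = rng.standard_normal((64, 16)) * 0.1   # generative weights (input_dim x latent_dim)

def predictive_step(x, r, lr=0.5, surprise_thresh=0.5):
    """One inference step; returns the new latent state and whether it computed."""
    error = x - W @ r                     # compare top-down prediction with input
    if np.linalg.norm(error) < surprise_thresh:
        return r, False                   # no surprise: skip the costly update
    return r + lr * (W.T @ error), True   # surprise: propagate error, refine state

r = np.zeros(16)
x = W @ rng.standard_normal(16)           # a stimulus the latent state can explain
skipped = 0
for t in range(100):
    r, updated = predictive_step(x, r)
    skipped += not updated                # update steps become rare as error shrinks
```

On a static or slowly changing stimulus, nearly all steps are eventually skipped; a sudden change, like the child in Fig. 1C, re-engages computation through a large error.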
In parallel, attention mechanisms24 serve as top-down processes that prioritize relevant inputs and modulate the activation of various computational pathways. This form of selective processing constitutes a coarse but powerful implementation of dynamic sparsity. By focusing only on salient information, attention enables the brain to allocate resources more effectively and reduce overall processing cost.
Figure 2 shows an example of embedding neuro-inspired dynamic sparsity in a vision sensor. Retinal circuits respond primarily to changes in the visual field25, and event camera pixels26 mimic this behavior by producing output events only when brightness changes above a certain threshold occur. These neuromorphic sensors generate sparse, low-latency event streams that better capture dynamic visual information without the redundancy of frame-based input, offering substantial advantage in terms of latency, temporal resolution, energy efficiency, and dynamic range27.
Fig. 2: Dynamic sparsity in neuromorphic vision sensors.
A The three layers of the biological retina25. Left to right: photoreceptors, bipolar cells, and ganglion cells. B Silicon implementation of the retina cells in a neuromorphic event camera pixel26. C Comparison of dense frames (top) and sparse brightness-change events (bottom) from a spinning dot stimulus, recorded by a hybrid vision sensor152.
The neural foundation of dynamic sparsity, as well as its demonstrated effectiveness in neuromorphic vision sensors, motivates the exploration of its broader applications in energy-efficient AI. To connect insights from neuroscience with recent progress in various fields—such as neuromorphic engineering, deep learning, and domain-specific accelerators—and to systematically frame the key design considerations for implementing this principle, the next section develops a taxonomy of dynamic sparsity.
Types of dynamic sparsity
Sparsity plays a crucial role in both biological and artificial perception systems. By eliminating non-informative redundancy, sparsity reduces unnecessary computation and communication, thereby shortening processing latency and lowering energy consumption. Depending on whether the eliminated redundancy is data-dependent, sparsity in perception systems can be broadly classified into two categories: static sparsity (Fig. 3A) and dynamic sparsity (Fig. 3B).
Fig. 3: Taxonomy of sparsity. Sparsity is classified based on whether it is data-dependent.
A Static sparsity is fixed and leads to a static processing flow. Weight sparsity, commonly employed in neural network compression, falls into this category. B Dynamic sparsity is data-dependent and leads to a dynamic processing flow. Rooted in data redundancy, it can be further categorized based on its dimension, structuredness, and statefulness. C Dynamic sparsity can be spatial, temporal, or spatiotemporal, depending on the dimension along which the information redundancy is exploited. D Dynamic sparsity can be either structured or unstructured, depending on whether such sparsity should satisfy any spatial or temporal structural constraints. E Dynamic sparsity can be either stateless or stateful, depending on whether extra memory or states are employed to induce sparse representations from dense representations.
Static sparsity exploits predetermined and fixed redundancy, resulting in a fixed processing flow during perception. Methods for obtaining static sparsity include fixed duty cycling of sensors28, using a preset camera region of interest, and pruning the weights of a neural network29. Although static sparsity effectively reduces computation and data movement demand for a given task, it enforces an identical processing flow regardless of input data variations. This fixed connectivity map can potentially miss out on further data-dependent optimization, as discussed next.
Dynamic sparsity, in contrast, leverages data-dependent redundancy. Box 2 provides a formal definition of dynamic sparsity. Our definition of dynamic sparsity is distinct from a class of network pruning methods known as dynamic pruning30 or dynamic sparse training31,32,33,34. Although these methods dynamically adjust the sparse neuron connectivity during training, the sparsity is fixed once the training is completed (i.e., during inference). In contrast, we focus on algorithms and hardware designs targeting sparse computational flow that can dynamically change in a data-driven fashion during inference.
Prior works have discussed and incorporated various forms of dynamic sparsity, but they are often applied to solve specific, isolated problems, resulting in a fragmented landscape. For example, some works focus exclusively on activation sparsity in convolutional neural networks (CNNs) (e.g., skipping zero-valued ReLU outputs35,36, or dynamic channel and activation pruning during inference37) or on subnetwork gating for large language models (LLMs) (e.g., Mixture of Experts (MoE)38,39,40 and speculative decoding41,42,43), while others explore stateful temporal sparsity in RNNs (e.g., delta networks44,45). These various forms of dynamic sparsity have rarely been analyzed within a unified framework. While existing surveys on dynamic neural networks46 or ephemeral sparsification29 have summarized the algorithmic aspects of dynamic sparsity within neural networks, a systematic treatment of dynamic sparsity for intelligent perception systems, encompassing both algorithm design and hardware optimization throughout the entire processing chain, is still missing.
As a first step towards a more unified view and to encourage a more holistic approach to system design, we categorize dynamic sparsity along three independent yet interrelated aspects: sparsity dimension, structuredness, and statefulness. This taxonomy of dynamic sparsity is applicable throughout the perception pipeline, from the sensory periphery and early feature extraction to multi-modal integration towards higher-level decision-making. The taxonomy presented here is based on pixel- or neuron-level dynamic sparsity, and we extend it later to a coarser granularity when discussing system-level dynamic sparsity.
Dimension of dynamic sparsity
Spatial sparsity (Fig. 3C, left) refers to the sparse activity of a collection of neurons or pixels within a time window. It originates from information redundancy along the spatial dimensions. Examples of spatial redundancy include zero values in CNN feature maps35,36, the sparsely firing channels/pixels/taxels of event-driven neuromorphic sensors26,47,48, and the similarity between spatially neighboring pixels49 or neurons50 at a given time point.
Temporal sparsity (Fig. 3C, right) refers to the sparse activity of a single neuron or pixel over time. It takes advantage of information redundancy in the temporal dimension. Examples of temporal redundancy are the predominance of environmental noise in speech processing tasks51, the spectral similarity between neighboring audio frames52, the slow variation of neuron activations over time44,45, and the dynamically gated neuron updates in RNNs53,54.
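As a concrete illustration of temporal sparsity, the sketch below implements the core idea of a delta network44,45 in plain numpy: each input neuron transmits a delta only when its value has changed by more than a threshold since its last transmission, so weight reads and multiply-accumulates scale with the number of changed inputs. Dimensions and thresholds are arbitrary choices, and this simplifies the cited designs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Delta-layer sketch: inputs that changed by less than theta since their last
# transmission are skipped; the layer keeps two states, the last transmitted
# input vector and the running pre-activation.
class DeltaLayer:
    def __init__(self, W, theta=0.1):
        self.W = W                              # (out_dim, in_dim) weight matrix
        self.theta = theta                      # delta threshold
        self.x_last = np.zeros(W.shape[1])      # state: last transmitted inputs
        self.y = np.zeros(W.shape[0])           # state: running pre-activation

    def forward(self, x):
        delta = x - self.x_last
        active = np.abs(delta) > self.theta     # temporally sparse "events"
        # Only weight columns of changed inputs are fetched and multiplied.
        self.y += self.W[:, active] @ delta[active]
        self.x_last[active] = x[active]
        return self.y, active.mean()            # output and fraction computed

W = rng.standard_normal((32, 128)) * 0.1
layer = DeltaLayer(W)
x = rng.standard_normal(128)
for t in range(50):
    x += 0.02 * rng.standard_normal(128)        # slowly varying input stream
    y, frac = layer.forward(x)                  # frac stays far below 1 here
```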
Spatial and temporal dynamic sparsity are not mutually exclusive. In fact, many stimuli exhibit redundancy in both space and time, leading to spatiotemporal sparsity. For example, in the driving scene shown in Fig. 1C, the relevant objects, such as vehicles and pedestrians, are normally located in the bottom half of the camera view, while the top half can mostly be regarded as background and ignored, exhibiting spatial sparsity. Meanwhile, the movements of the vehicles and the traffic lanes are highly predictable, exhibiting temporal sparsity. Spatiotemporal sparsity can be directly visualized in Fig. 2C, where the sparse brightness-change events trace a helix in spacetime.
Structuredness of dynamic sparsity
Unstructured sparsity (Fig. 3D, left) allows for arbitrary patterns of inactive neurons: there is no restriction as to which neurons can be active or inactive at any moment. Many neuromorphic sensors26,47,48, as well as spiking55,56 and non-spiking35,57 neural network accelerators, utilize unstructured sparsity. Without structural constraints, it provides the finest sparsity granularity and maximum flexibility in skipping useless computations.
Structured sparsity (Fig. 3D, right), on the other hand, requires the sparse elements to have regular patterns. In general, this entails grouping the neurons so that those within the same group are all active or inactive simultaneously. The neuron grouping defines the granularity of structured sparsity. Example groupings are locally neighboring elements58, entire rows or columns59, CNN feature maps60, and all neurons in the same layer61. Such regularity allows for more efficient hardware implementations compared to unstructured sparsity.
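The sketch below shows structured dynamic sparsity at feature-map granularity60: a cheap gate, computed from a pooled summary of the input, decides per output channel of a pointwise (1×1) convolution whether the entire map is evaluated. The gating rule and all shapes are illustrative assumptions rather than a published design.

```python
import numpy as np

rng = np.random.default_rng(2)

# Channel-level gating: whole output feature maps are either fully computed
# or fully skipped, which keeps the sparsity pattern hardware-friendly.
def gated_pointwise_conv(x, W, gate_w, tau=0.0):
    """x: (C_in, H, W); W: (C_out, C_in) 1x1 conv weights; gate_w: (C_out, C_in)."""
    summary = x.mean(axis=(1, 2))                # global-average-pool descriptor
    scores = gate_w @ summary                    # one gate score per output channel
    keep = scores > tau                          # structured mask over whole channels
    y = np.zeros((W.shape[0],) + x.shape[1:])
    y[keep] = np.tensordot(W[keep], x, axes=1)   # compute surviving channels only
    return y, keep.mean()                        # output and density of active maps

x = rng.standard_normal((16, 8, 8))
W = rng.standard_normal((32, 16)) * 0.1
gate_w = rng.standard_normal((32, 16)) * 0.1
y, density = gated_pointwise_conv(x, W, gate_w)  # roughly half the maps computed
```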
Statefulness of dynamic sparsity
Stateless sparsity (Fig. 3E, left) does not require any internal states to induce sparse representations from dense representations. It relies solely on the instantaneous input to identify redundant operations and determine the sparse computational pattern. Skipping zero activation values in a neural network35,36 provides a canonical example of stateless sparsity.
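The sketch below mirrors in software what zero-skipping accelerators35,36 do in hardware: after a ReLU, only nonzero activations trigger weight fetches and multiply-accumulates in the next layer. No state is carried between inputs; the shapes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stateless zero-skipping: the sparsity pattern is derived from the current
# activations alone, with no memory of past inputs.
def relu_then_sparse_matvec(W, x):
    a = np.maximum(x, 0.0)                   # ReLU induces zeros in the activations
    nz = np.flatnonzero(a)                   # indices of active neurons only
    y = W[:, nz] @ a[nz]                     # all zero-valued columns are skipped
    return y, len(nz) / len(a)               # result and activation density

W = rng.standard_normal((64, 256)) * 0.1
x = rng.standard_normal(256)                 # about half the entries are negative
y, density = relu_then_sparse_matvec(W, x)   # density near 0.5; often lower in CNNs
```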
Stateful sparsity (Fig. 3E, right) derives the sparse representation by taking into account not only the current input but also an internal state variable that encodes the past inputs. Examples that incorporate stateful sparsity are spiking neuron models implemented in neuromorphic spiking sensors26,47,48 and SNN processors55,56. Notably, the highly sparse computation in the brain is inherently stateful due to its complex dynamics, suggesting the potential advantage of stateful sparsity over stateless sparsity.
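A leaky integrate-and-fire (LIF) neuron makes the stateful case concrete: the membrane potential is an internal state that integrates past inputs, and a spike, which is what triggers downstream computation, is emitted only on a threshold crossing. The constants below are illustrative, and refractory and synaptic dynamics are omitted.

```python
import numpy as np

# Stateful sparsity via a LIF neuron population: output events depend on the
# accumulated membrane state, not just on the instantaneous input.
def lif_step(v, x_in, leak=0.9, v_th=1.0):
    v = leak * v + x_in                 # leaky integration of the input (state)
    spikes = v >= v_th                  # sparse binary output events
    v = np.where(spikes, 0.0, v)        # reset membrane potential after a spike
    return v, spikes

rng = np.random.default_rng(4)
v = np.zeros(100)
rate = 0.0
for t in range(1000):
    v, s = lif_step(v, 0.15 * rng.random(100))   # weak drive yields sparse firing
    rate += s.mean() / 1000                       # mean firing probability per step
```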
Dynamic sparsity enhancement and exploitation techniques
The brain’s ability to induce and exploit dynamic sparsity has long inspired designers of intelligent perception systems, whether for robots, wearables, or smart spaces. In this section, we review these state-of-the-art techniques in light of the proposed taxonomy and identify the key design considerations for leveraging dynamic sparsity. As shown in Fig. 4, dynamic sparsity can be incorporated within the three major components of an intelligent perception system, namely, the sensor, memory, and neural-compute subsystems. In addition, it can also be applied at the system level, which involves dynamically activating entire modules or subsystems.
Fig. 4: Enhancing and exploiting dynamic sparsity in perception systems.
A The sensor subsystem uses methods such as sparse coding and event-based representation to suppress data redundancy at the very first stage of perception. B The memory subsystem provides storage for both activations and weights. It exploits dynamic sparsity by reducing the data traffic to and from the memory, using methods such as activation compression, sparse in-memory computing (IMC), and weight access skipping. C The neural-compute subsystem enhances dynamic sparsity through both stateless (e.g., ReLU) and stateful approaches (e.g., spiking network and delta network). Techniques such as zero-gating and zero-skipping exploit the induced sparsity. D At the system level, dynamic sparsity brings further energy savings by de-activating and gating entire system modules.
Sensor subsystem
Exploiting dynamic sparsity at the sensor subsystem—the very first stage of the processing pipeline—offers a significant advantage in terms of system-level energy and latency, as much less sensory data needs to be transmitted or processed by the subsequent stages62. Both stateful and stateless techniques can be applied to substantially improve energy efficiency and reduce the burden on later processing stages.
The most widely used stateless methods for initial sparsification include sparse coding and vector symbolic architecture (VSA) (also known as hyperdimensional computing). Sparse coding aims to represent input signals using an overcomplete set of basis vectors, ensuring that only a few coefficients are nonzero, so the data representation is highly sparse13. Similarly, VSA employs high-dimensional sparse vectors to encode information, naturally promoting sparsity in the activation space by emphasizing zero-valued components in representations63. While these stateless techniques effectively exploit the instantaneous sparsity of the original signal, they are inherently limited in exploiting temporal correlation as they lack internal states to memorize previously seen input patterns.
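To ground the idea, the sketch below computes a sparse code with ISTA (iterative shrinkage-thresholding), a standard algorithm for the sparse coding objective13: reconstruct the input from an overcomplete dictionary under an L1 penalty so that only a few coefficients survive. The random dictionary and penalty weight are illustrative; practical systems learn the dictionary from data.

```python
import numpy as np

rng = np.random.default_rng(5)

# ISTA: minimize ||x - D a||^2 / 2 + lam * ||a||_1 over the code a.
def ista(x, D, lam=0.2, n_iter=100):
    L = np.linalg.norm(D, 2) ** 2             # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        a = a - (D.T @ (D @ a - x)) / L        # gradient step on reconstruction loss
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

D = rng.standard_normal((64, 256))             # overcomplete: 256 atoms, 64-dim input
D /= np.linalg.norm(D, axis=0)                 # unit-norm dictionary atoms
x = D[:, rng.choice(256, 5)] @ rng.standard_normal(5)  # a genuinely 5-sparse signal
a = ista(x, D)
density = np.mean(np.abs(a) > 1e-6)            # only a small fraction is nonzero
```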
Stateful methods can be employed to achieve higher sparsity levels. These methods leverage past information or spatial correlations to encode the data more efficiently. Compared to stateless methods, stateful methods are particularly effective in natural environments where input signals have strong spatiotemporal correlations, because they dynamically adapt to the characteristics of the signals. Using these correlations, algorithms can drastically reduce power consumption and bandwidth requirements, making them ideal for resource-constrained perception tasks.
A prominent example, shown in Fig. 2, is the neuromorphic dynamic vision sensor (DVS)26, also known as the event camera27. In a DVS, pixels use delta modulation64 to remove temporal redundancy by asynchronously quantizing temporal changes in scene brightness (the logarithm of intensity) as ternary ON/OFF events that encode the location, time, and polarity of the brightness change. After each event, the current brightness is stored on a capacitor in the pixel to detect the next change. The sparse event-based camera output enables the subsequent neural network to selectively process only reflectance changes on an event-by-event65 or patch-by-patch basis10. A simple scheme, such as processing accumulated event frames only when event counts reach a few thousand, can effectively save idle computation without compromising latency66. To further increase the output sparsity, spatial filtering before the temporal delta modulation removes spatial redundancy49. Although maintaining the states requires extra circuit area and energy, the resulting enhancement in output sparsity can reduce the response latency to sub-millisecond levels under most illumination conditions26, the sensor output bandwidth by more than 100×67, and the computational burden on subsequent stages by 20×65 compared to frame-based cameras. Advances in image sensor wafer stacking have reduced the complex pixel size to only a few times that of standard frame-based imagers68.
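The simplified sketch below emulates DVS-style delta modulation26 on log-intensity frames: each pixel stores the brightness at its last event and emits an ON or OFF event only when the change since then exceeds a contrast threshold. Real pixels operate asynchronously in continuous time and can emit several events per large change; the frame-based loop and constants here are simplifications.

```python
import numpy as np

# Per-pixel delta modulation: the memorized brightness level is the pixel's
# internal state, so the generated event stream is stateful and temporally sparse.
def dvs_events(log_frames, theta=0.15):
    """log_frames: (T, H, W) log-intensity frames; returns (t, y, x, polarity) events."""
    memory = log_frames[0].copy()                 # stored brightness per pixel
    events = []
    for t in range(1, len(log_frames)):
        delta = log_frames[t] - memory
        fired = np.abs(delta) > theta             # pixels crossing the threshold
        for y, x in zip(*np.nonzero(fired)):
            events.append((t, y, x, 1 if delta[y, x] > 0 else -1))
        memory[fired] = log_frames[t][fired]      # update state only where fired
    return events

rng = np.random.default_rng(6)
base = np.log1p(rng.random((4, 4)))               # toy static scene
frames = base[None] + 0.01 * rng.standard_normal((30, 4, 4))
evts = dvs_events(frames)                         # near-empty list: nothing moved
```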
Another example of a stateful spiking sensor is the neuromorphic silicon cochlea47.