Introduction
Biological brains have evolved to be highly energy efficient1. Mimicking this energy efficiency in machine intelligence systems remains a major challenge. One of the brain’s strategies for saving energy is to represent and communicate information with discrete, binary voltage pulses called “spikes”2,3. Neuromorphic engineering approaches based on spiking neural networks have successfully harnessed this idea to achieve large gains in energy efficiency4,5. A second strategy used by biology is to represent sensory information, for example, using particularly efficient codes that avoid encoding redundant information and require the transmission of only a few spikes per input6,7. This saves precious energy, since neuronal spiking is a major source of energy consumption8,9. Predictive coding approaches take this idea to the extreme. In classic predictive coding models, all sensory information that can already be predicted by higher stages in a processing hierarchy is discarded and only prediction errors are transmitted to the higher stages10,11. While this is an elegant theoretical idea that has triggered much neurobiological and theoretical research12,13,14,15,16,17,18,19, it remains unclear whether it is actually realized in biological brains and how it might be implemented at the level of spiking neural networks20,21,22,23.
Here we propose Predictive Coding Light (PCL), an alternative to conventional predictive coding, and offer a concrete implementation in an artificial spiking neural network. Instead of comparing bottom-up inputs to top-down predictions and transmitting only prediction errors to higher levels, PCL learns to suppress the most predictable spikes via recurrent and top-down inhibitory connections and passes a compressed representation of the sensory input to higher processing stages. In this sense it is a “light” version of predictive coding that may be easier to reconcile with neurophysiological evidence and offer advantages for the engineering of neuromorphic information processing systems.
To test these ideas, we propose a concrete instantiation of PCL in the form of a hierarchical spiking neural network. We train this PCL network on input from an event-based vision sensor that represents visual information as an asynchronous stream of binary events. When trained on natural images, units in the PCL network develop tuning properties similar to those of simple and complex cells in the primary visual cortex of mammals. Furthermore, the PCL network exhibits surround suppression, orientation-tuned suppression, and cross-orientation suppression, three widely studied non-classical receptive field effects that are considered signatures of predictive coding in the brain12,24. PCL explains these effects at conceptual and mechanistic levels in terms of saving energy by inhibiting the most predictable spikes in a recurrent hierarchical network architecture. Finally, we demonstrate that the PCL network permits reduced spiking activity with only a small loss in performance on challenging downstream classification tasks. Overall, our work proposes an alternative approach to predictive coding, suggests how it might be implemented in the brain, and shows how it can be harnessed in neuromorphic spiking neural networks solving challenging machine vision problems.
Results
The architecture of the PCL network is shown in Fig. 1. The network takes input from an event camera that generates two types of events: ‘ON’ events signal brightness increases and ‘OFF’ events signal brightness decreases. These events feed into a “simple cell layer,” which in turn projects to a “complex cell layer”. These layers are inspired by the presence of simple and complex cells in the primary visual cortex of the mammalian visual system. All feedforward connections in the PCL network are excitatory. This feedforward connectivity resembles that of a standard convolutional neural network. In contrast, all recurrent or feedback connections are inhibitory. These inhibitory connections come in three types: short-ranging lateral, long-ranging lateral, and top-down connections. The short-ranging lateral inhibitory connections act similarly to a Winner-Take-All mechanism25. Specifically, units in a layer can inhibit other units that receive inputs from the same retinal location, in order to promote the learning of diverse features. In contrast, the long-ranging lateral inhibitory connections affect distant units that receive inputs from different retinal locations. Finally, the top-down inhibitory connections consist of feedback projections from units in a higher layer to the units of their input zone in the lower layer. The long-ranging lateral and top-down inhibitory connections promote the learning of a spike removal scheme that tends to suppress the most predictable spikes. All connections are trained with a generic unsupervised spike timing-dependent plasticity rule (see Methods).
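In compact form, this connectivity can be summarized as in the schematic below (sign and reach of each connection type only; exact layer sizes, receptive fields, and feature counts are given in Fig. 1a and the Methods):

```python
# Schematic of the PCL network's connection types (sign and reach only).
# This is an illustrative summary, not the implementation itself.
PCL_CONNECTIONS = {
    "events  -> simple":           dict(sign="excitatory", kind="feedforward"),
    "simple  -> complex":          dict(sign="excitatory", kind="feedforward"),
    "simple  -> simple (local)":   dict(sign="inhibitory", kind="short-ranging lateral"),
    "simple  -> simple (distant)": dict(sign="inhibitory", kind="long-ranging lateral"),
    "complex -> simple":           dict(sign="inhibitory", kind="top-down"),
    "complex -> complex (local)":  dict(sign="inhibitory", kind="short-ranging lateral"),
}
```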
Fig. 1: Predictive Coding Light (PCL) network. a Top: Illustration of a two-layer PCL network comprising a “simple cell” (black) and a “complex cell” (blue) layer driven by visual input from an event-based vision sensor. Feedforward connections (green) are excitatory while feedback and recurrent connections (orange) are inhibitory. All connections are learned via unsupervised spike timing-based learning rules. Bottom: Detailed network architecture illustrating sizes of receptive fields and numbers of feature channels. b Suppression of predictable spikes through learnt inhibitory connections. The top panel illustrates a hypothetical visual input comprising a vertical bar moving from left to right. The leading and trailing edges of the bar create ‘ON’ (green) and ‘OFF’ (red) events on the sensor array, respectively. At different time points, these events drive activity in three simple cells with different receptive field locations (colored boxes). The bottom panel illustrates the expected behavior of the neurons’ membrane potentials for this input. The appearance of the bar is assumed unexpected and reliably drives firing in the green neuron. In contrast, strong activation of the middle neuron (purple) is highly predictable and has caused the growth of a strong inhibitory connection onto this neuron. This inhibition manages to cancel the predictable spike (thereby rendering it less predictable). Finally, the black neuron, not being under the influence of strong inhibition, generates a spike.
Inhibitory spike timing-dependent plasticity removes the most predictable spikes
The central idea behind Predictive Coding Light is that inhibitory STDP (iSTDP) permits the network to learn to suppress the most predictable spikes. Although the iSTDP rule (see Methods) does not put an explicit constraint on the predictability of any spikes, it naturally gives rise to the suppression of the most predictable spikes via inhibitory synapses.
Specifically, if a presynaptic neuron tends to spike shortly before a postsynaptic one, the causal iSTDP rule will grow the inhibitory synapse between the two neurons. Once this synapse becomes strong enough it will become effective at suppressing the predictable spikes of the postsynaptic neuron. At the network level, iSTDP will come to capture the statistics of neural firing and allow the removal of the most predictable spikes.
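One way such a causal, trace-based update can be written is sketched below; the exact rule and constants used in our simulations are given in Methods, and the parameters here are illustrative only:

```python
import numpy as np

def istdp_step(w, pre_spike, post_spike, trace, lr=0.01, tau=20.0,
               dt=1.0, w_max=20.0):
    """One time step of a causal, trace-based inhibitory STDP update.

    The presynaptic trace low-pass filters presynaptic spikes. If the
    postsynaptic neuron fires while the trace is still high (i.e.,
    shortly after a presynaptic spike), the inhibitory weight grows,
    so reliably predicted spikes recruit increasingly strong inhibition.
    """
    trace = trace * np.exp(-dt / tau) + float(pre_spike)
    if post_spike:
        w = min(w + lr * trace, w_max)  # potentiate on causal pairings
    return w, trace
```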
To illustrate this idea, we perform a simple experiment with just four neurons (see Fig. 2a). In this experiment, the presynaptic neuron (black) is driven by excitatory inputs from a Poisson spike train and has inhibitory synapses onto three other neurons (red, green, blue). These neurons are partly driven by the same Poisson input, albeit with a small delay of 1 ms, such that 90%, 50%, and 10% of their spikes, respectively, are predictable from the black neuron’s spike train (see Methods). The inhibitory synapses from the black neuron to the other three neurons are initialized to the same value and evolve according to the causal iSTDP rule.
Fig. 2: An inhibitory STDP rule leads to the removal of the most predictable spikes in a spiking neural network. a Experimental protocol. Four neurons named “neuron 0”, “neuron 1”, “neuron 2” and “neuron 3” receive Poisson spike trains as input. Neuron 0 is considered the “main neuron” that receives fully unpredictable Poisson spike train inputs, while the other neurons exhibit redundancy with this initial spike train (90% for neuron 1, 50% for neuron 2 and 10% for neuron 3). Neuron 0 grows inhibitory connections to neurons 1 to 3 that are learnt via an STDP rule on 140 samples for 10 epochs, while the excitatory weights are set to the same constant value of 15 mV with a neural threshold at 10 mV. The inhibitory connections undergo competition through L1-normalization (see Methods). b Evolution of the inhibitory synaptic weights during training, averaged across 3 networks. The shaded area denotes the standard error. c Raster plot of the network’s responses without inhibition (top) and with inhibition (bottom) on an example test sample. Responses of neurons 1 and 2 are significantly suppressed when inhibition is present, in contrast to the case without inhibition. d Suppression of neurons 1–3 averaged over 3 networks. Error bars denote the standard error.
The network is trained on 140 samples of spike trains of one second length and tested on 35 additional samples (see Methods). The iSTDP rule causes the inhibitory synapses to grow stronger the higher the predictability of the postsynaptic spikes (Fig. 2b). The resulting inhibition onto neurons that exhibit highly predictable spikes can lead to the suppression of these spikes when the inhibition is strong enough to overcome the excitatory driving input (Fig. 2c). Overall, this leads to some highly predictable spikes being removed from the network (Fig. 2d).
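A minimal, self-contained re-implementation of this toy experiment is sketched below. For brevity, it trains on one fresh 1 s sample per epoch rather than the 140 samples used in our protocol; all constants besides the 15 mV excitatory weight and 10 mV threshold are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, rate = 1000, 0.02               # time steps of 1 ms; Poisson prob. per step
exc_w, thresh = 15.0, 10.0         # excitatory weight and threshold (mV, Fig. 2a)
lr, tau, budget = 0.05, 20.0, 15.0 # illustrative learning rate, trace const., L1 budget
shared = [0.9, 0.5, 0.1]           # predictable spike fraction of neurons 1-3
w_inh = np.full(3, 1.0)            # inhibitory weights from neuron 0 onto neurons 1-3

for epoch in range(10):
    pre = rng.random(T) < rate     # neuron 0's (fully unpredictable) spike train
    trace = 0.0
    for t in range(1, T):
        trace = trace * np.exp(-1.0 / tau) + float(pre[t - 1])
        for i, p in enumerate(shared):
            # With probability p the neuron is driven by neuron 0's spike,
            # delayed by 1 ms; otherwise by an independent Poisson input.
            driven = pre[t - 1] if rng.random() < p else rng.random() < rate
            v = exc_w * driven - w_inh[i] * pre[t - 1]  # membrane potential
            if v > thresh:         # postsynaptic spike: causal potentiation
                w_inh[i] += lr * trace
    w_inh *= budget / w_inh.sum()  # L1-normalization induces competition

print(w_inh)  # the weight onto the 90%-predictable neuron grows largest
```

Note how the mechanism is self-limiting: once a weight exceeds 5 mV it cancels the predictable spikes, causal pairings cease, and the weight stabilizes, mirroring the behavior illustrated in Fig. 1b.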
Simple and complex cell-like response properties arise from unsupervised learning on natural images
Given its biological inspiration, we wondered whether cells in the PCL network might reproduce findings about simple and complex cells in visual cortex. To this end, we train the network on a database of natural images (Fig. 3a, see Methods for details). After training, we observe the formation of simple and complex cell-like receptive fields in the network. Most simple cell receptive fields are well fit by Gabor functions (see examples in Fig. 3b and analysis in Supplementary Fig. 1a, b). Figure 3b (top right-hand side) visualizes learnt complex cell receptive fields. For each complex cell we plot the set of simple cell receptive fields that have the strongest connection to this complex cell from a particular retinal location. The brightness of each simple cell receptive field indicates the strength of the connection. Complex cells tend to pool responses from simple cells with similar orientation and frequency preferences (Supplementary Fig. 1c, d). In contrast, the phase and polarity of the pooled simple cell receptive fields are quite variable, as would be expected for biological complex cells.
To quantify the tuning properties of the PCL network’s simple and complex cells, we stimulate the network with sinusoidal counterphase gratings (Fig. 3a, for details see Methods) and record the neural responses. Figure 3c, d compares the tuning properties of simple and complex cells in the PCL network to the responses of the classic Gabor filter26,27 and energy models28, respectively, which are considered the standard models of biological simple and complex cells29.
Overall, both simple and complex cells in the PCL network show qualitatively similar tuning properties as their respective standard models. In particular, both simple and complex cells show clear tuning for orientation and spatial frequency. Importantly, however, only simple cells are tuned for a particular spatial phase, while the responses of the complex cells are approximately phase invariant. This phase invariance is a defining feature of biological complex cells and may reflect a general mechanism of invariance formation that ultimately leads to position-invariant object recognition in higher visual cortical areas. Similarly, complex cells exhibit the so-called “frequency-doubling” effect relative to simple cells when stimulated with sinusoidal counterphase gratings (see the response over time in Fig. 3c, d). This is another well-known reflection of the phase invariance of biological complex cells. The complex cells in the PCL network acquire these properties through learning via the STDP rule when exposed to natural images.
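For reference, a minimal sketch of these two standard models follows (all filter parameters are illustrative). A simple cell is modeled as a rectified linear Gabor filter, and a complex cell as the energy model, i.e., the sum of squared responses of a quadrature (90°-phase-shifted) Gabor pair; phase invariance follows from sin² + cos² = 1. For a counterphase grating modulated as cos(2πft), the energy follows cos²(2πft) = (1 + cos(4πft))/2, which oscillates at twice the stimulus frequency, reproducing frequency doubling:

```python
import numpy as np

def gabor(size=16, theta=0.0, freq=0.15, phase=0.0, sigma=3.0):
    """2-D Gabor filter, the standard simple cell model."""
    y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * freq * xr + phase))

def simple_response(img, theta=0.0, phase=0.0):
    """Rectified linear Gabor response (phase-sensitive)."""
    return max(float(np.sum(img * gabor(theta=theta, phase=phase))), 0.0)

def complex_response(img, theta=0.0):
    """Energy model: squared quadrature pair (phase-invariant)."""
    q0 = np.sum(img * gabor(theta=theta, phase=0.0))
    q1 = np.sum(img * gabor(theta=theta, phase=np.pi / 2))
    return float(q0**2 + q1**2)

# Counterphase grating: spatial sinusoid whose contrast reverses over time.
# The complex response peaks twice per cycle (frequency doubling), while
# the rectified simple response peaks once per cycle.
y, x = np.mgrid[-8:8, -8:8]
for t in np.linspace(0.0, 1.0, 9):
    grating = np.cos(2 * np.pi * 0.15 * x) * np.cos(2 * np.pi * t)
    print(f"t={t:.3f}  simple={simple_response(grating):9.2f}  "
          f"complex={complex_response(grating):11.2f}")
```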
PCL network reproduces surround suppression, orientation-tuned suppression and cross-orientation suppression effects
Biological simple cells are known to exhibit various classical and nonclassical receptive field effects. For example, when a neuron’s classical receptive field and its surround are excited simultaneously, simple cells exhibit so-called surround suppression30 and orientation-tuned suppression31. Rao and Ballard’s classic predictive coding model has offered an explanation for such phenomena10, but it remains open whether spiking predictive coding approaches, including PCL, exhibit similar effects. Testing the PCL network with test stimuli for surround suppression, orientation-tuned suppression, and cross-orientation suppression (Fig. 4a, see Methods for details), we find that it qualitatively reproduces experimental findings from primary visual cortex (Fig. 4b–d). First, simple cells in the PCL network exhibit surround suppression when stimulated by gratings of increasing size (Fig. 4b). Second, simple cells exhibit orientation-tuned suppression when stimulated with gratings of different orientations in their classical receptive field vs. surround (Fig. 4c). Third, simple cells exhibit cross-orientation suppression when confronted with a superposition of gratings of different orientations (Fig. 4d).
Fig. 4: Predictive Coding Light gives rise to surround suppression effects as observed in primary visual cortex. a Test stimuli and evaluation protocol. A PCL network is excited with gratings of increasing size to reveal surround suppression, gratings of distinct orientations between center and surround for orientation-tuned suppression and a superposition of gratings of distinct orientations for cross-orientation suppression. b Surround suppression in a PCL network compared to data from macaque V1 (reproduced from30). Error bars denote one standard error across trials. Responses are averaged over all 64 simple cells and 2 trials. c Orientation-tuned suppression in PCL network compared to data from cat V1 (reproduced from31). d Cross-orientation suppression in PCL network compared to data from cat V1 (reproduced from58). Rightmost panels in (b–d) show simple cell responses for different types of inhibitory connections being either present or absent.
We wondered how much each of the inhibitory connection types in the PCL network contributes to these effects. To investigate this, we selectively disabled one or the other connection type (Fig. 4b–d, right). The results reveal that all inhibitory connection types contribute to the reduction in network activity. Abolishing the distant lateral inhibition has only a mild effect on the results in all stimulus conditions (compare blue vs. red curves). In contrast, results can change drastically if the top-down inhibition is removed (orange curves in b, c). The local lateral inhibition also plays a major role (compare green and yellow curves in c).
Next, we wondered to what extent suppression effects may occur in the complex cells of the PCL network. Interestingly, we found that complex cells also exhibit orientation-tuned and cross-orientation suppression, but not surround suppression (Supplementary Fig. 2). Overall, responses of complex cells are reduced due to the distant lateral and top-down inhibition onto the simple cells that drive them. However, a complex cell’s response does not diminish when its nonclassical receptive field is excited, as would be expected in surround suppression. This is likely due to the absence of distant lateral and top-down inhibitory inputs onto complex cells; such inputs would be present in deeper instantiations of the PCL network.
Predictive Coding Light establishes a favorable energy-information trade-off
By removing the most predictable and therefore most redundant spikes, PCL aims to save energy with only moderate loss of information. To characterize the network’s ability to trade off a reduction in spiking activity (which we use as a proxy for energy savings) against information loss, we test the PCL network on two visual classification problems, the DVS128 Gesture benchmark32 and the N-MNIST problem33 (Fig. 5a, b). We compare the PCL network to a version where inhibitory connections have been disabled and instead the same number of spikes is removed randomly (Fig. 5b), to reveal whether PCL retains more information for the same energy expenditure (number of spikes).
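The control condition can be sketched as follows, assuming spike trains are represented as boolean neuron-by-time arrays (a hypothetical helper for illustration, not our exact implementation):

```python
import numpy as np

def random_removal_control(spikes, n_removed, rng=np.random.default_rng()):
    """Control: delete as many spikes as PCL's inhibition removed,
    but at uniformly random positions.

    spikes: boolean array of shape (n_neurons, n_time_bins) from the
    network without inhibition. n_removed must not exceed the total
    number of spikes.
    """
    flat = spikes.copy().ravel()
    idx = np.flatnonzero(flat)                        # positions of all spikes
    drop = rng.choice(idx, size=n_removed, replace=False)
    flat[drop] = False
    return flat.reshape(spikes.shape)
```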
Fig. 5: PCL achieves substantial reduction of spiking with only small to moderate loss of downstream task performance. a Example of event-based inputs from the DVS128 Gesture dataset and the N-MNIST dataset. b Experimental protocol. The inhibitory connectivity in the PCL network is fine-tuned on inputs from the respective datasets. Afterwards an SVM is trained to classify the visual input sequences from the spike patterns of the simple cells. c Removal of most predictable spikes in the PCL network vs. random removal of the same number of spikes in the control condition. d Effect of different amounts of inhibition on classification accuracy (left y-axis) for gesture (top) and N-MNIST datasets (bottom). 100% of inhibition corresponds to our default parameters. Grey bars show the median number of spikes (right y-axis). Results show averages over 3 networks with inhibition trained from different random seeds. Shaded regions denote one standard error across networks.
We begin with the PCL network trained on natural images. We fine-tune its distant lateral and top-down inhibition on the two classification datasets. Then we train a classifier (linear support vector machine, see Methods) on the (binned) spike trains of the PCL network’s simple (and/or complex) cells. By training on the natural images, the PCL network learns visual representations that improve classification accuracy for both simple and complex cells (Supplementary Fig. 3).
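A minimal sketch of this readout stage is given below; the bin widths and classifier settings actually used are described in Methods, all names here are illustrative, and the feature-scaling step is a common default rather than part of our protocol:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

def bin_spike_train(spike_times, neuron_ids, n_neurons, duration, bin_width):
    """Convert a spike train into a flattened (n_neurons x n_bins) count vector."""
    n_bins = max(int(np.ceil(duration / bin_width)), 1)
    counts = np.zeros((n_neurons, n_bins))
    bins = np.minimum((np.asarray(spike_times) / bin_width).astype(int), n_bins - 1)
    np.add.at(counts, (np.asarray(neuron_ids, dtype=int), bins), 1)
    return counts.ravel()

# X_train/X_test: one binned count vector per sample; y_*: class labels.
# clf = make_pipeline(StandardScaler(), LinearSVC())
# clf.fit(X_train, y_train)
# accuracy = clf.score(X_test, y_test)
```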
We systematically vary the amount of inhibition by scaling the fine-tuned inhibitory weights to reveal the PCL network’s trade-off between energy savings and information loss. Figure 5d confirms that more inhibition reduces spiking activity in the PCL network, down to a lower bound. More importantly, the classification results reveal that the PCL network consistently outperforms the random spike removal scheme. For the gesture dataset, performance remains high over the entire range of inhibition levels tested, while for N-MNIST, performance drops gradually with increasing levels of inhibition (Fig. 5d). We also consider classifiers trained on the activity of only the complex cells or the combination of simple and complex cells (Supplementary Fig. 4). These experiments confirm that for the N-MNIST dataset, higher levels of inhibition tend to gradually impair classification performance, whereas for the gesture dataset, substantial reductions in spiking activity are possible with only marginal loss of performance. They also reveal that classification based on the highly compressed complex cell representations is worse than that based on simple cell activities, while classification based on the combined simple and complex cell activities can outperform classification based on simple cell activities alone.
We also test the effect of different bin widths when binning the spike trains (see Methods) on classification performance. For the N-MNIST dataset, we observe that performance does not critically depend on the chosen bin width. In contrast, for the gesture dataset performance is best for a coarse binning of the entire 4 s spike train into a single bin (Supplementary Table 1).
To elucidate how the PCL network manages to retain a high amount of information about the input despite massively reducing the number of spikes, we analyze the firing statistics of units in the PCL network (Fig. 6). We observe that the PCL network exhibits increased population sparseness. We measure population sparseness as the number of active neurons, i.e., neurons that fire at least one spike in response to a particular stimulus. A smaller number of active neurons corresponds to a higher population sparseness. We compare the number of active neurons for the PCL network, networks without recurrent inhibition, and control networks with random spike removal (Fig. 6 and Supplementary Table 2). Figure 6 (left-hand side) shows example spike trains of the different networks for three example gestures. It is apparent that the PCL network uses a smaller number of active neurons in each case. This observation is confirmed quantitatively in Fig. 6 (right-hand side), where we plot the number of active neurons in each condition. Supplementary Table 2 confirms the statistical significance of this finding for all gestures with just one exception: for the “other gesture” class, which is composed of many different and unrelated gestures, sparsity is not significantly increased by the PCL network. We suspect that this is due to a lack of systematic and learnable structure in this class. In conclusion, the PCL network provides an efficient code by reducing the number of neurons actively involved in the representation of any particular stimulus.
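As used here, the measure reduces to a simple count; a one-function sketch:

```python
import numpy as np

def n_active_neurons(spike_counts):
    """Population sparseness proxy: neurons firing at least one spike
    in response to a stimulus.

    spike_counts: array of shape (n_neurons,) with per-neuron spike
    counts; fewer active neurons means a sparser population code.
    """
    return int(np.count_nonzero(spike_counts))
```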
Fig. 6: PCL’s simple cells show increased population sparseness. Raster plots of a network without distant lateral and top-down inhibition, a PCL network, and a control network in which spikes are removed randomly, for three different gestures. The right side shows the average number of active neurons (firing at least one spike) for 77 samples of each gesture. Error bars denote one standard error.
We compare the classification performance of the PCL network (without suppression) to that of other spiking neural networks that have been tested on the same or similar tasks. Supplementary Table 3 compares different spiking neural network approaches that have been applied to digit recognition (using the N-MNIST, MNIST and MNIST-DVS datasets). Comparisons of methods tested on different datasets are generally difficult, but we have still included MNIST and MNIST-DVS in order to cover more approaches. Supervised methods generally tend to show the strongest performance (top part of table). Among the unsupervised approaches (bottom part of table), those that rely on biologically implausible non-local learning rules show particularly strong performance. Among the models that rely only on local learning rules and that have been tested on N-MNIST, PCL performs best, even though its feature representation has only been trained on natural images.
Results of different approaches on the DVS128 Gesture dataset are compared in Supplementary Table 4. Again, supervised approaches (top part of the table) tend to show the strongest performance. Among unsupervised approaches based on local learning rules, only the model by Nadafian et al.34 shows slightly better results than PCL (90.52% accuracy vs. 89.12%). Again, it is important to keep in mind that the PCL network was only trained on natural images and its architecture and parameters have not been optimized for any classification task.
Overall, our results indicate that the PCL network, by suppressing the most predictable spikes, establishes a favorable trade-off between energy consumption and expressiveness of the learned representation and permits strong downstream classification performance on two challenging benchmark problems.
Discussion
We have presented Predictive Coding Light (PCL), a hierarchical spiking neural network for unsupervised representation learning. In sharp contrast to conventional predictive coding models, where only prediction errors are transmitted to higher levels within a processing hierarchy, PCL suppresses only the most predictable spikes and transmits a compressed representation of the input to higher processing stages. In this sense, it is also related to sparse coding approaches35,36,37. This approach also obviates the need for a dedicated population of neurons for encoding prediction errors24,38.
We have demonstrated that a PCL network trained on natural images reproduces many experimental observations about information processing in the primary visual cortex of mammals, including simple and complex cell receptive fields, surround suppression, orientation-tuned suppression, and cross-orientation suppression. Furthermore, we have demonstrated that the PCL network, by suppressing the most predictable spikes, achieves substantial energy savings at the cost of only small to moderate reductions in downstream classification performance on two challenging benchmark problems.
Other spiking predictive coding models can be divided into different classes depending on how prediction errors are encoded23. One class encodes errors with explicit prediction error neurons21,39,40. Another class of approaches represents prediction errors in the membrane potentials of the spiking neurons22,41,42. In contrast, PCL encodes prediction errors implicitly within representational neurons that generate more or fewer spikes depending on how predictable these spikes would be. A few other models have also studied how plastic lateral and top-down connections can give rise to prediction error-like responses43,44. However, contrary to all of the aforementioned models, the PCL network has been shown to mimic key aspects of information processing in mammalian V1, including the different suppression effects.
Our approach also avoids a frequently overlooked problem of conventional predictive coding. When a lower-level brain area projects to multiple higher-level areas and receives predictive feedback from all of them, these predictions will generally differ. After all, these areas receive inputs from different sets of lower-level areas and therefore have different “views” of the world. This means that the lower-level area would have to calculate separate prediction errors for all of these higher-level areas and target them specifically to each one. While this may not be impossible in principle, it would pose a formidable challenge given the high degree of connectedness between cortical areas45.
Interestingly, some of the observed biological properties of the PCL network such as the emergence of simple cell-like receptive fields have been previously observed in supervised deep learning approaches such as convolutional neural networks (CNNs)46 and, more relevant to this study, in other unsupervised approaches including sparse coding35, predictive coding10, and independent component analysis47 models. While most work in this area has considered non-spiking neural networks, a few spiking network models have also replicated simple cell-like receptive fields6,7,48,49,50. However, the PCL network also learns complex cell-like receptive fields in an unsupervised fashion based on a generic, local, spike timing-based learning rule, in addition to exhibiting the aforementioned suppression effects.
Although the PCL network reproduces many findings about primary visual cortex (at least at a qualitative level), it is not meant to be a faithful model of the underlying neurobiology. One shortcoming in this respect is the violation of Dale’s law by units in the PCL network, i.e., they excite some of their targets and inhibit others. Other shortcomings are the absence of recurrent excitation and feedforward inhibition. Nevertheless, the PCL network offers an arguably minimal description of how a wide range of visual cortex phenomena can arise through generic spike timing-based learning rules. It remains to be seen whether other features of information processing in primary visual cortex can also be demonstrated in PCL networks.
Interestingly, the PCL network’s mechanism of suppressing highly predictable spikes could also serve as a basis for explaining sensory attenuation effects observed across different sensory modalities51. Sensory attenuation refers to the phenomenon that self-generated sensory inputs tend to evoke weaker neural responses than unpredictable, externally generated sensory inputs. Specifically, corollary discharges of motor commands generated in motor areas might well predict specific sensory consequences. Therefore, these corollary discharges could be used to predict and suppress sensory evoked spikes in sensory areas, analogous to our illustration in Fig. 2.
The reliance on generic local learning rules makes PCL an interesting candidate architecture for implementation on neuromorphic hardware for energy-efficient information processing and continual learning. The violation of Dale’s law by the PCL network may even be seen as a strength in this context, as it obviates the duplication of precious resources (separate excitatory and inhibitory units with identical tuning that might otherwise be required). An important open challenge is to extend the approach to deep architectures with many layers. This could pave the way for many interesting real-world applications. Furthermore, the incorporation of recurrent excitation could instill additional desirable computational features such as fading memory, pattern completion, and evidence accumulation. However, maintaining healthy network dynamics when incorporating recurrent excitatory connections that learn with local learning rules remains a formidable challenge.
Methods
Network architecture
In the PCL network all feedforward connections are excitatory, while all recurrent or feedback connections are inhibitory (Fig. 1a). A unit in the PCL network can therefore excite some of its targets and inhibit others, i.e., the PCL network does not respect Dale’s law. Tables 1 and 2 respectively summarize the network’s connectivity and parameters of the model neurons. All synaptic connections in the PCL network are learned via a generic, biologically plausible spike timing-dependent learning rule (described below).
Simple cell layer
Units in the simple cell layer receive events from the event camera and excite units in the complex cell layer. Simple cells also receive three kinds of inhibition. The first is a local lateral inhibition from other simple cells with identical receptive field locations. This mutual inhibition helps simple cells develop diverse tuning properties. The second is a distant lateral inhibition, which allows the network to learn to suppress spikes that are highly predictable from the surrounding context. The third is a top-down inhibition from the complex cell layer. Similar to the distant lateral inhibition, it enables the network to learn to suppress highly predictable simple cell spikes based on the compressed and more abstract representation in the complex cell layer.
Complex cell layer
Units in the complex cell layer are driven by excitatory inputs from the simple cell layer. Similar to the simple cells, complex cells also use a local lateral inhibition scheme to facilitate the learning of a broad range of tuning properties. Furthermore, complex cells send inhibitory feedback connections to simple cells to cancel highly predictable spikes.
Weight sharing
The PCL network uses weight sharing for the excitatory and local lateral inhibitory connections formed by both simple and complex cells. The weight sharing is implemented such that whenever the excitatory or local inhibitory weights of one neuron are changed via the strictly local STDP rule (described below), all other neurons encoding the same feature (but at different retinal locations) also get their weights modified to the same value. Specifically, after each individual neuronal weight update, all excitatory and local lateral inhibitory synapses are further updated as follows:
$$\forall z,\ \forall j \neq k:\quad w_i^{z,j} \leftarrow w_i^{z,k},$$

(1)

where $z$ denotes the feature channel, $j$ gives the retinal location of a neuron, $k$ is the retinal location of the neuron that received the last synaptic weight update, and $w_i$ denotes an excitatory or local lateral inhibitory synaptic weight indexed by $i$. While such weight sharing is not biologically plausible, it reduces the number of trainable parameters. Other biologically inspired models52,53 also report the use of such a weight sharing scheme.
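A minimal sketch of Eq. (1), assuming the shared weights are stored as an array indexed by (feature channel, retinal location, synapse index); this storage layout is an assumption for illustration, not our exact implementation:

```python
import numpy as np

def share_weights(w, z, k):
    """Broadcast the STDP update of the neuron at retinal location k in
    feature channel z to all other retinal locations (Eq. 1).

    w: weights of shape (n_channels, n_locations, n_synapses). Copying
    row k onto every location j makes all neurons of channel z identical
    (the copy onto j == k is a no-op).
    """
    w[z, :, :] = w[z, k, :]
    return w
```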