Introduction
Learning is a fundamental ability of the brain to adapt to dynamically changing circumstances. In general, learning performance depends on a large variety of factors in the neural systems1,2 and on the content to be learned3. Learning is achieved by the alteration of neural dynamics driven by the change in synaptic connectivity4. In turn, the change in connectivity is regulated by the neural dynamics through an activity-dependent plasticity rule (namely, the learning rule). Thus, learning results from the interactive changes in both neural dynamics and synaptic connectivity. To better understand the mechanisms of learning, it is crucial to investigate how the interaction between the changes in neural activities and connectivity leads to learning.
A broad range of recent experimental studies has suggested a positive correlation between the speed of learning and the temporal variability of neural activities3,5,6 and behaviors7,8 before learning. For instance, brain-computer interface (BCI) studies3,5,6 have demonstrated that the greater the variability of spontaneous neural activity (namely, the neural activity without an external input or task engagement) before learning in monkeys, the higher the learning speed. Other studies have shown that behavioral variabilities before learning (the variabilities in arm movement trajectories7,9 and birdsong patterns10) are correlated with the speeds of learning new behaviors (new arm-reaching and new song patterns). This raises an important question: how do the fluctuations of neural activities impact the learning process, and, in particular, is there a quantitative relationship between the learning speed and the fluctuations of the neural activities before learning, regardless of the specific task?
In statistical physics, the degree of change in a system’s state in response to an external force is proportional to the variance of the state fluctuations in the absence of that force, as pioneered by Einstein11 and established as the fluctuation-response relationship12,13,14,15. In neural systems, neural activities fluctuate, i.e., they vary in time around a mean stationary level, even in the absence of an external stimulus. Such fluctuations without input are called spontaneous fluctuations. In the spirit of the fluctuation-response relation, one may then expect that the change in the neural state due to learning, i.e., the learning speed, is correlated with the spontaneous fluctuations of neural activities before learning.
Of course, neural dynamics are far from thermal equilibrium, and statistical physics is not directly applicable. Nevertheless, in neuroscience and machine learning, relationships between fluctuation and response have been proposed following this concept. In neuroscience, theoretical16,17,18 and experimental19 studies investigated the relation between the spontaneous fluctuations in a neural system and the system’s response to a weak external stimulus: the susceptibility of spiking neuron models16,18 and rate neuron models16,17 to the stimulus was evaluated together with the correlations of spontaneous neural activities, possibly relating to the fluctuation-response relation in statistical physics. Other studies20,21,22 in the machine learning field discussed the fluctuation-dissipation relation in the stochastic gradient descent (SGD) learning method. They treated the learning process as Langevin dynamics of the connection weights of the network and demonstrated that the dissipation term can be evaluated from the correlation of the weights during learning. In these studies, however, the fluctuations in the neural activities, which are our focus here, were not analyzed. Thus, although these studies explored the fluctuations of neural activities and weight parameters, the quantitative relationship between the learning speed and the spontaneous fluctuations of neural activities remains to be elucidated.
In the present study, we first theoretically derive a general relationship between the learning speed and the fluctuations of the neural activity before learning. This relationship is broadly applicable to associative memory under Hebb-type learning, I/O mapping in perceptron learning, and other sophisticated machine learning algorithms. In particular, for Hebb-type learning, the learning speed is determined by the spontaneous fluctuations in the direction of the target and by the squared neural response to the input. We then verify this relationship numerically using a Hebb-type learning rule for input/output target (I/O) maps and associative memories. Interestingly, we numerically confirm that the relationship holds well beyond the assumptions adopted in deriving the formulae: asymmetric or non-full-rank connectivity and the nonlinear regime of the learning process. Hence, the formula we derive is valid over a large range of learning in neural networks in general. In addition to associative memories, we validate the relationship for more complex sequence-generation tasks. A straightforward implication of the formula is that learning a specific input/output mapping is quicker when the direction of the target or the input aligns with the direction of larger variance of the spontaneous fluctuations, which is consistent with the experimental findings3. The relationship provides a general basis for understanding how the geometrical relationship between the spontaneous dynamics before learning and the task-relevant directions determines learning speed, and can be applied to a wide range of human and animal learning tasks.
Results
The fluctuation before learning determines the learning speed in neural systems: fluctuation-learning relationship
We investigate the general relationship between spontaneous neural dynamics and the speed of learning I/O maps, which include associative memory as a special case. To this end, we use a rate-coding neural network model with neural activities, termed x, and an initial synaptic connectivity between neurons represented by a matrix J. The neural network is trained to generate a static target output pattern ξ in the presence of the associated static input pattern η under internal noise, as shown in Fig. 1A. Here, the output of the network is given by the neural activities x themselves. It is postulated that, in the presence of the input pattern η, the stationary pattern of the neural activities x gives the target pattern ξ after learning. In the theoretical derivation, we focus on the initial stage of learning: the initial connectivity J is changed to J + ΔJ within Δt in the learning process, resulting in a change in neural activities called Δx*, as shown in Fig. 1B–E. Here, the learning speed is defined by Δx*/Δt or its absolute value. In this section, we derive the relation between this learning speed and the spontaneous fluctuations for small ΔJ and Δx*.
Fig. 1: Schematic image for the relation between the spontaneous fluctuations and the learning speed.
A (Left) The neural network used in this study. (Right) The formulae we derive. B–E Schematic images of the process we consider in this study. The yellow landscape schematically represents the landscape of the neural dynamics that is changed through learning. The colored lines on the gray plane are contour lines of the schematic landscape. Generally, such a landscape view is not correct for asymmetric connectivity; we numerically verify the formulae even in that case.
Our network model is composed of N neurons whose activities x evolve as widely used neural dynamics with an activation function ϕ(x);
$$\tau \dot{{\boldsymbol{x}}}=\phi (J{\boldsymbol{x}}+\gamma {\boldsymbol{\eta }})-{\boldsymbol{x}}+{\boldsymbol{\zeta }},$$
(1)
where ϕ(x) is a sigmoid function, and τ is the time scale of the neural dynamics, set to unity for simplicity. x is referred to as the neural activities or neural state, and ζ is internal white Gaussian noise; both are N-dimensional vectors. ζ satisfies < ζ(t1)ζT(t2) > = 2Dδ(t1 − t2)I, where < ⋯ > indicates the temporal average, ζT is the transpose of ζ, D represents the noise strength, and δ(x) is the Dirac delta function. γ represents the strength of the external input. Here, the spontaneous neural activity is defined as the neural activity without the input η (γ = 0), and the spontaneous fluctuation is the fluctuation of the spontaneous neural activity due to the internal noise, whose magnitude is measured by its (co-)variance. The learning process changes J to make the network memorize an I/O (η/ξ) map, where both η and ξ are also N-dimensional vectors.
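As a minimal sketch (not the paper's code), the dynamics of Eq. (1) with τ = 1 and ϕ(u) = tanh(βu) can be integrated by the Euler-Maruyama method; all parameter values below are illustrative assumptions:

```python
import numpy as np

def simulate(J, eta, gamma=0.0, beta=1.0, D=0.01, dt=0.01, T=200.0, seed=0):
    """Euler-Maruyama integration of Eq. (1) with tau = 1 and phi(u) = tanh(beta*u).
    The internal noise zeta has covariance 2*D*delta(t1 - t2)*I, so each Euler
    step adds Gaussian increments of standard deviation sqrt(2*D*dt)."""
    rng = np.random.default_rng(seed)
    N = J.shape[0]
    x = np.zeros(N)
    traj = np.empty((int(T / dt), N))
    for k in range(traj.shape[0]):
        drift = np.tanh(beta * (J @ x + gamma * eta)) - x
        x = x + drift * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal(N)
        traj[k] = x
    return traj
```

With γ = 0 this produces the spontaneous fluctuations whose covariance, Cov(x), enters the formulae derived below.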
We assume that the time scale of the neural dynamics τ is much smaller than that of learning, denoted by τJ, whereas the time scale of the noise is much smaller than τ, and we consider the regime in which the neural state converges to a fixed point. Thus, the neural state converges to a fixed point on the time scale τ for a given connectivity J and then changes adiabatically through the change of J on the time scale τJ(≫τ), the so-called adiabatic assumption. Here (and in the following analysis), convergence to a fixed point means that the neural activity averaged over a short time (i.e., with the noise-induced fluctuations averaged out) converges to it. We set the initial J to be a random matrix whose elements have variance 1/N (consequently, J is almost surely full rank).
By applying the input η to the network, the neural state departs from a fixed point in the spontaneous dynamics and converges to another fixed point, denoted by xr, under a given initial J, as shown in Fig. 1B, C. xr satisfies
$${{\boldsymbol{x}}}_{{\boldsymbol{r}}}=\phi (J{{\boldsymbol{x}}}_{{\boldsymbol{r}}}+\gamma {\boldsymbol{\eta }}).$$
(2)
Here, the noise term is neglected because xr is a stationary state wherein the noise effect is temporally averaged out. In the following analysis of processes much slower than the noise, such as the learning process, we focus on the dynamics in the absence of the noise effect.
After convergence to xr, the connectivity is changed to J + ΔJ within Δt and the neural state converges to the new fixed point of the dynamics for the connectivity J + ΔJ, termed x* as shown in Fig. 1D. Here, the specific form of ΔJ is determined by the learning rule. We find that the deviation between x* and xr, represented by Δx*, is derived as the fixed point of the dynamics,
$$\frac{d({\boldsymbol{x}}-{{\boldsymbol{x}}}_{{\boldsymbol{r}}})}{dt}=-(1-BJ)({\boldsymbol{x}}-{{\boldsymbol{x}}}_{{\boldsymbol{r}}})+B\Delta J{{\boldsymbol{x}}}_{{\boldsymbol{r}}}+{\boldsymbol{\zeta }},$$
(3)
which is obtained by replacing J with J + ΔJ in Eq. (1) and then linearizing ϕ((J + ΔJ)x + γη) around Jxr + γη (see “Derivation of FLR1” in Methods for details). Here, we define B as the diagonal matrix diag(β1, β2, ⋯, βN) with βi ≜ ϕ′(Jxr + γη)i; in other words, each diagonal element is the derivative of the corresponding neuron’s activation function evaluated at Jxr + γη. This equation represents Langevin dynamics with the external force BΔJxr. The change in the neural state caused by the change in the connectivity is represented by the fixed point of Eq. (3) and formally written as
$$\Delta {{\boldsymbol{x}}}^{*}={(1-BJ)}^{-1}B\Delta J{{\boldsymbol{x}}}_{{\boldsymbol{r}}},$$
(4)
as illustrated in Fig. 1B. Note that, following the adiabatic assumption described above, the neural state rapidly converges to xr in response to the input. Subsequently, as J changes to J + ΔJ over the learning time Δt, the converged state adiabatically shifts to xr + Δx*.
To evaluate (1 − BJ)−1 from the neural activity before learning, we consider the Langevin dynamics given by
$$\frac{d({\boldsymbol{x}}-{{\boldsymbol{x}}}_{{\boldsymbol{r}}})}{dt}=-(1-BJ)({\boldsymbol{x}}-{{\boldsymbol{x}}}_{{\boldsymbol{r}}})+{\boldsymbol{\zeta }},$$
(5)
which corresponds to the linearized dynamics of Eq. (1) around xr, in the same manner as Eq. (3). Eq. (5) describes the linearized dynamics around xr before learning, whereas Eq. (3) describes the linearized dynamics after the change in connectivity, J → J + ΔJ.
(1 − BJ)−1 is estimated by the covariance matrix of the neural dynamics in the presence of the input according to Eq. (5), termed Cov(x)inp, as (1 − BJ)−1 = D−1Cov(x)inp (see “Derivation of FLR1” for details). By substituting this evaluation into Eq. (4) and dividing by Δt, we obtain a general relationship that relates the learning speed to the fluctuation, namely the covariance, of the activity in the presence of the input before learning:
$$\begin{array}{r}\frac{\Delta {{\boldsymbol{x}}}^{*}}{\Delta t}={D}^{-1}Cov{({\boldsymbol{x}})}_{{\rm{inp}}}B\frac{\Delta J}{\Delta t}{{\boldsymbol{x}}}_{{\boldsymbol{r}}},\end{array}$$
(6)
named FLR1. This relationship shows that the change in the neural state Δx* driven by the synaptic change ΔJ is proportional to the covariance of the neural activity before learning. It does not depend on the learning rule, because the effect of the learning rule is contained in ΔJ. As shown in “Derivation of FLR1” in Methods, this relationship is obtained for a symmetric BJ matrix (or symmetric J if B is scalar, which holds in the linear regime of the neural dynamics in Eq. (1)); for an asymmetric random matrix, the relationship still holds if ∣B∣ is small (see “Derivation of FLR1” for details).
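The key ingredient of FLR1, (1 − BJ)−1 = D−1Cov(x)inp, can be checked directly for the linearized dynamics of Eq. (5): the stationary covariance of a Langevin process dx/dt = −Ax + ζ with ⟨ζζT⟩ = 2Dδ(t)I solves the Lyapunov equation AC + CAT = 2DI, which for symmetric A gives C = DA−1. A short sketch with illustrative parameter values:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

rng = np.random.default_rng(1)
N, D, beta = 100, 0.01, 0.3   # illustrative values, not the paper's

# Symmetric random J with element variance ~1/N, as assumed for the exact result.
G = rng.standard_normal((N, N)) / np.sqrt(N)
J = (G + G.T) / np.sqrt(2.0)
A = np.eye(N) - beta * J      # linearized drift matrix (1 - BJ), with B = beta*I

# Stationary covariance of dx/dt = -A x + zeta solves A C + C A^T = 2 D I.
C = solve_continuous_lyapunov(A, 2.0 * D * np.eye(N))

# FLR1 ingredient: for symmetric A, D^{-1} Cov(x) equals (1 - BJ)^{-1}.
assert np.allclose(C / D, np.linalg.inv(A))
```

The design choice here is to obtain the covariance analytically via the Lyapunov equation rather than by simulating noise, which removes sampling error from the check.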
The spontaneous fluctuations along the target and input directions determine the learning speed
Generally, ΔJ is determined by the neural state through a learning rule. To understand the relationship between the neural dynamics and Δx* further, we adopt a Hebb-type learning rule
$$\Delta J={({\tau }_{J}N)}^{-1}{\boldsymbol{f}}({\boldsymbol{x}}){{\boldsymbol{g}}}^{T}({\boldsymbol{x}})\Delta t$$
(7)
in this analysis. τJ is the time scale of the change in the connectivity. f(x) and g(x) are arbitrary N-dimensional functions determining the post- and pre-synaptic contributions in the Hebb-type learning, respectively. For simplicity, we consider the case g(x) = x; still, for an arbitrary function g(x), the following analysis holds with ∣x∣2 replaced by gT(x)x.
This form widely covers various types of learning rules4, e.g., those for associative memory23 and I/O (η/ξ) mapping. For auto-associative memory, such as the Hopfield model, the input pattern η is set identical to the target pattern ξ with f(x) = x; that is, the target itself is applied to the network as input. Associative memory with the Hopfield network is not the main concern of this paper, but the theoretical relation between the spontaneous fluctuations and the learning speed remains valid, as shown in the Supplementary Information.
For I/O mapping or hetero-associative memory, we use the perceptron-like learning with f(x) = ξ − x, where the learning rule drives x toward the target ξ, since ΔJx ∝ (ξ − x)∣x∣2. In fact, it has been demonstrated that this learning rule can memorize sufficiently many I/O maps24,25,26. In the following, we focus on I/O mappings and train a network to generate a target output ξ in the presence of an input η. Thus, we mainly analyze the perceptron-like learning
$$dJ/dt=(1/{\tau }_{J}N)({\boldsymbol{\xi }}-{\boldsymbol{x}}){{\boldsymbol{x}}}^{T}.$$
(8)
In this section, we consider the initial stage of the learning process; consequently, the neural state x stays around xr, which is much smaller than ξ, so that
$$\Delta J \sim {({\tau }_{J}N)}^{-1}{\boldsymbol{\xi }}{{\boldsymbol{x}}}_{{\boldsymbol{r}}}^{T}\Delta t.$$
(9)
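As a hypothetical sketch of the perceptron-like rule (Eq. (8)) and its early-stage approximation (Eq. (9)); the step size dt and τJ below are illustrative:

```python
import numpy as np

def hebb_step(J, x, xi, tau_J=100.0, dt=0.1):
    """One Euler step of the perceptron-like rule of Eq. (8):
    dJ/dt = (1/(tau_J * N)) * (xi - x) x^T."""
    N = J.shape[0]
    return J + np.outer(xi - x, x) * dt / (tau_J * N)
```

Early in learning, x ≈ xr is much smaller than ξ, so this step reduces to ΔJ ≈ ξ xrT Δt/(τJ N), the approximation used in Eq. (9).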
Now we transform FLR1 further by using Eq. (9). In this transformation, we assume that the initial connectivity J is symmetric (this is later extended to the asymmetric case) and that the input is small, so that the response xr lies in the linear regime of ϕ(x). By substituting Eq. (9) into FLR1, the initial change Δx*, and thus the initial learning speed Δx*/Δt, are determined by the response to the external input xr and the covariance of the neural activities around xr as
$$\frac{\Delta {{\boldsymbol{x}}}^{*}}{\Delta t}=\frac{| {{\boldsymbol{x}}}_{{\boldsymbol{r}}}{| }^{2}}{DN{\tau }_{J}}Cov{({\boldsymbol{x}})}_{{\rm{inp}}}B{\boldsymbol{\xi }}.$$
(10)
Based on the small-input assumption, Cov(x)inp is approximated by the covariance matrix of the spontaneous activities Cov(x), and B is replaced by β = ϕ′(0). For instance, if the tanh function is adopted, ϕ(x) = tanh(βx), β is the gain parameter. Then we obtain a new relationship between the neural activities and the learning speed, named FLR2 (see “Derivation of FLR2” in Methods):
$$\frac{\Delta {{\boldsymbol{x}}}^{*}}{\Delta t}=\frac{\beta | {{\boldsymbol{x}}}_{{\boldsymbol{r}}}{| }^{2}}{DN{\tau }_{J}}Va{r}_{{\boldsymbol{\xi }}}({\boldsymbol{x}}){\boldsymbol{\xi }},$$
(11)
where the scalar Varξ(x) ≜ (< (ξTx)2 > − < ξTx >2)/∣ξ∣2 indicates the variance of the spontaneous activities along the ξ direction. This formula means that the direction of the change in the neural state due to learning is aligned with the target vector, and the rate of change (namely, the learning speed) is proportional to the response to the input and to the variance of the spontaneous neural activities along the target direction before learning. This formula holds exactly for ξ parallel to an eigenvector of J, and approximately for a random ξ for sufficiently large N. Further, this formula holds even for an asymmetric network when β is small, as shown in “Derivation of FLR2 in the asymmetric matrix” in Methods.
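The projected variance Varξ(x) can be estimated from a sampled trajectory; a minimal sketch (the row-per-sample layout is our assumption):

```python
import numpy as np

def var_along(traj, xi):
    """Var_xi(x) = (<(xi^T x)^2> - <xi^T x>^2) / |xi|^2: the variance of the
    activity projected onto direction xi.  traj holds one sample x(t) per row."""
    proj = traj @ xi          # xi^T x(t) for each sample
    return proj.var() / (xi @ xi)
```

For isotropic fluctuations with per-neuron variance v, this returns ≈ v for any direction ξ; anisotropy in Cov(x) makes it direction-dependent, which is exactly what FLR2 exploits.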
By applying the same assumptions adopted to obtain FLR2 (Eq. (11)), we also represent xr (see “Derivation of FLR2” in Methods) by
$$\begin{array}{r}{{\boldsymbol{x}}}_{{\boldsymbol{r}}}=\frac{\beta \gamma }{D}Va{r}_{{\boldsymbol{\eta }}}({\boldsymbol{x}}){\boldsymbol{\eta }}.\end{array}$$
(12)
Then, the formula FLR2 has another form FLR2’ as
$$\begin{array}{r}\frac{\Delta {{\boldsymbol{x}}}^{*}}{\Delta t}=\frac{\beta }{DN{\tau }_{J}}{\left(\frac{\beta \gamma }{D}\right)}^{2}{(Va{r}_{{\boldsymbol{\eta }}}({\boldsymbol{x}})| {\boldsymbol{\eta }}| )}^{2}Va{r}_{{\boldsymbol{\xi }}}({\boldsymbol{x}}){\boldsymbol{\xi }}.\end{array}$$
(13)
In contrast to FLR2 (Eq. (11)), this formula is expressed solely in terms of the fluctuations of the spontaneous neural activity, without explicit reference to the response. It shows that the learning speed is proportional to the spontaneous fluctuations before learning.
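As a plain numerical reading of FLR2’ (Eq. (13)), the magnitude of the theoretical speed can be evaluated directly from the fluctuation statistics; the function name and arguments below are our own labels:

```python
def s_th_prime(beta, gamma, D, N, tau_J, var_eta, var_xi, eta_norm, xi_norm):
    """|Delta x*|/Delta t from FLR2' (Eq. (13)): only the spontaneous-fluctuation
    variances along eta and xi enter, not the measured response x_r."""
    return (beta / (D * N * tau_J)) * (beta * gamma / D) ** 2 \
        * (var_eta * eta_norm) ** 2 * var_xi * xi_norm
```

Because every factor can be measured from spontaneous activity alone, the prediction requires no probing of the network with the input before learning.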
Here, we summarize the conceptual differences across the fluctuation-learning relations (FLR1, 2). First, we derived the general relationship between the learning speed and the spontaneous fluctuations in the formula FLR1, which applies to any learning rule. When we specified the Hebb-type learning rule Eq. (7) under the small-input assumption (namely, within the linear regime of ϕ), we obtained the formula FLR2 (Eq. (11)), which indicates that the learning speed is determined by the variance of the spontaneous fluctuations along the target and by the response. Finally, the formula FLR2’ (Eq. (13)) shows that the learning speed is determined only by the variances of the spontaneous fluctuations along the target and input directions, without using the response. All these relations show how the variance of spontaneous neural fluctuations determines the learning speed. Although these formulae rigorously hold for symmetric J, they still approximately hold for the asymmetric case if β is small. In the following sections, we examine the validity of these formulae for the (initial) learning speed ∣Δx*∣/Δt, comparing the numerically measured learning speed, referred to as s, with the theoretically evaluated learning speeds, referred to as sth and s′th, based on FLR2 and FLR2’, respectively.
Fluctuation-learning relationship is validated in a random network model
To validate the above formulae FLR2 (Eq. (11)) and FLR2’ (Eq. (13)) numerically, we explored three neural network models with Eqs. (1) and (8) that differ only in the initial network J: a random symmetric Gaussian matrix (random symmetric network) model, a random asymmetric Gaussian matrix (random asymmetric network), and a pre-embedded association model (pre-embedded network)27. The random symmetric network model satisfies both the symmetry (in the linear regime of the neural dynamics) and full-rank properties assumed to derive FLR1 and FLR2 rigorously, while the random asymmetric network satisfies only the full-rank property. The pre-embedded network satisfies neither property (see “Neural network models” in Methods for details).
Here, we analyze the random symmetric network model. We examine whether the measured learning speed s matches the theoretical learning speeds sth and s′th determined by FLR2 and FLR2’, respectively (see Fig. 2A and Methods for the details of the speed measurement). We consider two cases of I/O maps: (i) the eigenvector map, in which the input and target are eigenvectors of the connectivity matrix, and (ii) the random map, in which they are random patterns. In the latter, the learning speed need not exactly match the theoretical ones, due to the approximation of the covariance matrix by the diagonal variance matrix; consequently, the validity of FLR2 and FLR2’ is not completely assured there.
Fig. 2: The learning speed for the eigenvector and random maps in the random symmetric matrix model.
A The learning process for 5 different maps. The overlaps of the neural state with the targets, xTξ/N, are plotted. The inset is an expansion of (A). Up to t = 200, only the neural dynamics run without the learning process (i.e., only x evolves without the change of J) in order to measure the response to the input before learning. B, D The measured learning speed s for different values of β. s is plotted against the two theoretical learning speeds, sth of FLR2 (Eq. (11)) and s′th of FLR2’ (Eq. (13)), by cross and circle markers, respectively. The learning speeds sth and s′th for 10 maps are plotted for each value of β. C, E TL−1 is plotted against sth and s′th by crosses and circles, respectively. The learning speeds sth and s′th for 10 maps are plotted for each value of β. B, C for the eigenvector map, and D, E for the random map. F The learning speed s as a function of β is plotted for the eigenvector map by open circles and for the random map by filled circles.
First, for the eigenvector maps, we measured the learning speed for different values of β. The measured learning speed s is plotted in Fig. 2B against the theoretical ones sth and s′th. For all values of β, s agrees with both theoretical values. The learning speed is greater and more widely distributed as β increases, reflecting the increase in the mean amplitude and the spread of the fluctuations with β (see Supplementary Note S1). Further, we numerically explored the validity of our formulae FLR2 and FLR2’ for larger input strength, beyond the weak-input assumption in the derivation of FLR2. We compared s with the theoretical values sth and s′th for γ = 0.1 and 1 and found that s agrees with both. See Supplementary Note S2 for detailed analysis.
So far, we have focused on the initial learning speed, satisfying the linear approximation of ΔJ assumed to derive the formulae. Now we ask whether the learning speed under a large change in J is also evaluated by the formulae FLR2 (Eq. (11)) and FLR2’ (Eq. (13)). To answer this question, we numerically calculated the time taken for learning to be completed, TL. This time to complete is defined as the time at which the neural activity is sufficiently close to the target; specifically, when the overlap of the neural activities with the target reaches 0.75, as shown in Fig. 2A (see “Measurement of the learning speed” in Methods for details). Then, we compared TL to sth and s′th, as shown in Fig. 2C. We found that sth and s′th are proportional to TL−1, meaning that TL−1 is successfully estimated by FLR2 and FLR2’. Note that sth and s′th are the learning speeds in the initial stage of learning, while the speed is not constant throughout the entire learning process; thus, the proportionality between sth, s′th, and TL−1 is not obvious. These results show that FLR2 and FLR2’ hold beyond the linear approximation of the change in J.
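The completion-time measurement can be sketched as a threshold crossing of the overlap trace (the 0.75 threshold is from Fig. 2A; the array layout is our assumption):

```python
import numpy as np

def time_to_complete(overlaps, times, threshold=0.75):
    """T_L: first time at which the overlap x^T xi / N reaches the threshold;
    returns np.inf if learning never completes within the recorded trace."""
    hit = np.nonzero(np.asarray(overlaps) >= threshold)[0]
    return times[hit[0]] if hit.size else np.inf
```

Plotting 1/TL against sth (or s′th) then tests whether the initial-stage prediction extends to the whole learning process.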
For the random maps, we also examined the validity of FLR2 and FLR2’, as shown in Fig. 2. In the random maps, the derivation of the learning speed sth is not exact, in contrast to the eigenvector maps, because of the transformation of Cov(x)ξ into Varξ(x)ξ (see “Derivation of FLR2” in Methods); consequently, the validity of these formulae is not completely assured. Still, as in the eigenvector map case, we found that the measured speed s agreed with the theoretical ones sth and s′th, as well as with TL−1, even though the covariance matrix of the spontaneous fluctuations is only approximately evaluated as the variance in the derivation of FLR2 and FLR2’ for the random maps. These results show that our formulae for the relationship between spontaneous fluctuations and learning speed are broadly valid, independent of the input and target directions.
Finally, we note the difference between the random and eigenvector maps, shown in Fig. 2F. Although in both cases the measured speed s increases with β according to FLR2 and FLR2’, the speed for the eigenvector maps has a wider distribution than that for the random maps. For the eigenvector map, the variance of the spontaneous activities along the target and input directions is widely distributed (see Supplementary Note S1 for details). By contrast, in the random map, the variance along the target and the input is limited, due to averaging over the contributions of many eigenvectors.
In addition to the random symmetric network model, we analyzed FLR2 and FLR2’ in a random asymmetric network model. We examined whether the measured learning speed s agrees with the theoretical ones sth and s′th, as well as with TL−1, beyond the symmetric case (see Supplementary Note S3-2 for details). In this model, we investigated the same two cases of learned maps, the eigenvector map and the random map. As shown in Fig. S5A, D, FLR2 and FLR2’ hold well, similarly to the random symmetric matrix model. Further, TL−1 is also proportional to the theoretical sth and s′th.
Fluctuation-learning relationship is validated for the pre-embedded network, which goes beyond the assumptions of the theoretical derivation
The dynamics in the random network model have no a priori structure before learning a new map, whereas the neural system usually has structured dynamics shaped by learning many patterns before a new pattern is learned. The neural system learns a new pattern depending on the relation between the new and already learned patterns28,29. To examine whether the learning speed formula is valid in this case, we analyze the neural network model27 in which I/O maps are pre-embedded before a new map is learned. In this model, to pre-embed input ημ / target ξμ maps (μ = 1, 2, ⋯, αN), J is constructed as follows27:
$$J=(1/N)\mathop{\Sigma }\limits_{\mu=1}^{\alpha N}({{\boldsymbol{\xi }}}^{{\boldsymbol{\mu }}}-{{\boldsymbol{\eta }}}^{{\boldsymbol{\mu }}}){({{\boldsymbol{\xi }}}^{{\boldsymbol{\mu }}}+{{\boldsymbol{\eta }}}^{{\boldsymbol{\mu }}})}^{T},$$
(14)
where ημ and ξμ are N-dimensional random vectors (see “Methods” for details). Here, α represents the memory load factor. The connectivity matrix J in Eq. (14) is not full rank, and consequently the formulae obtained above do not apply to this model directly.
Our previous studies have demonstrated that this J emerges through the learning of ημ/ξμ maps according to Eq. (8)24,26. Thus, the connectivity Eq. (14) corresponds to the state after learning such αN maps and is suitable for studying how these pre-embedded maps affect the learning of a new map of η and ξ. For clarity, we refer to the pre-embedded inputs and targets as ημ and ξμ, while the new pair to be learned is denoted simply as η and ξ without a superscript.
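The construction of Eq. (14) can be sketched as a sum of outer products; random ±1 patterns are an assumption here, as the paper's exact pattern statistics are given in its Methods:

```python
import numpy as np

def pre_embedded_J(N, alpha, seed=0):
    """Connectivity of Eq. (14): J = (1/N) sum_mu (xi^mu - eta^mu)(xi^mu + eta^mu)^T
    over alpha*N pre-embedded maps."""
    rng = np.random.default_rng(seed)
    P = int(alpha * N)
    xi = rng.choice([-1.0, 1.0], size=(P, N))   # targets xi^mu, one per row
    eta = rng.choice([-1.0, 1.0], size=(P, N))  # inputs eta^mu, one per row
    J = (xi - eta).T @ (xi + eta) / N           # sum of outer products
    return J, xi, eta
```

By construction rank(J) ≤ αN < N, so J is not full rank, consistent with the remark above that the formulae do not apply to this model directly.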
Before examining the validity of FLR2 (Eq. (11)) and FLR2’ (Eq. (13)), we analyzed the spontaneous fluctuations and found that they are anisotropic: the variance of the spontaneous fluctuation along the pre-embedded targets ξμ, Varξμ(x), is larger than those along the pre-embedded inputs ημ, Varημ(x), and along a random direction, Varζ(x) (see Supplementary Note S4 for details). These variances increase with α and β. Given this anisotropy, we analyze two cases of I/O maps: (i) the remap case, in which a new map to be learned is set up by randomly recombining the pre-embedded ημ and ξμ patterns, and (ii) the random map case, in which a new map is chosen from random patterns orthogonal to the pre-embedded patterns. We first investigate the learning speed for the remap case.
First, we examine the validity of the formulae by computing the learning speed s. Figure 3A shows s and sth for different values of β with α = 0.1. The values of the learning speed for each β are distributed along the diagonal line. Thus, the formula FLR2 (Eq. (11)) is valid for a broad range of β (see Supplementary Note S4-2 for additional analysis), although the distribution for β = 1.4 deviates from the diagonal slightly more than those for smaller β. β = 1.4 is close to the boundary below which the fixed point at the origin is stable, where the assumptions behind the learning speed formula nearly break down. Figure 3B shows that s also agrees with sth as α is varied. Compared to the distribution of s across values of β, that across α is narrower; that is, a change in α alters the learning speed, but its effect is smaller than that of a change in β.
Fig. 3: The learning speed in the pre-embedded model.
The learning speed in the remap case (A–C) and the random map case (D–F) are plotted. A, D The learning speed s is plotted against the theoretical ones sth and s′th by cross and circle markers, respectively, for different values of β with α = 0.1. The color code is shown in the right bar. B, E s against sth is plotted for different values of α with β = 0.6. The color code is shown in the right bar. C, F The inverse of the learning time to complete, TL−1, is plotted against sth. The color code is the same as in (A, D).
Next, we measured the learning time to complete, TL, in a similar manner as for the random network model and plotted it against sth in Fig. 3C. TL−1 is clearly proportional to sth, although the proportionality is somewhat weaker than that for the random network model.
In addition to the remap case, the random map case is examined in Fig. 3D–F, where the formula FLR2 and the proportionality of TL−1 are also confirmed, including the dependence on α and β. For the random maps in the pre-embedded network model, we verify the formulae FLR2 and FLR2’ by measuring the learning speed s. Figure 3D plots s against the theoretical ones sth and s′th for α = 0.1, demonstrating that s agrees with sth and s′th for almost all values of β, except β = 1.4, which is close to the critical value of β shown in Fig. S6A.
Besides the dependence on β, we investigated the dependence of the learning speed on α. Figure 3E shows that the learning speed s increases with α, in agreement with the theoretical value sth. This result shows that the formula FLR2 is valid for different α.
We plot TL−1 against sth in Fig. 3F. As shown, TL−1 is proportional to sth. Thus, the learning time to complete is also estimated well by the spontaneous fluctuation in the direction of the target and the response, according to FLR2.
Altogether, these results show that in the random map, the learning speed is predicted (somewhat more weakly than in the remap case, but still clearly) by the spontaneous fluctuations for various values of β and α.
We found that the measured learning speed s for the remap case is larger than that for the random map case. For both cases, our results show that, even in the pre-embedded model, which goes beyond the assumptions on J, the predictions of the formulae FLR2 (Eq. (11)) and FLR2’ (Eq. (13)) are validated: with the increase in α and β, the spontaneous fluctuations increase, leading to faster learning. Furthermore, the larger fluctuations along the target direction result in a higher learning speed for the remap case, as compared to the random map case with smaller fluctuations. These observations agree well with the formulae in the pre-embedded model.
Finally, besides the pre-embedded model, our formulae hold in the standard Hopfield network model, as demonstrated in Supplementary Note S5.
The fluctuation-learning relationship is validated for complex tasks to generate time-dependent sequences
So far, we have validated the fluctuation–learning relationship for static associative memories in symmetric and asymmetric random networks, as well as in pre-embedded networks corresponding to the sequential learning of multiple memories. Now we ask whether the relationship holds beyond static tasks. To clarify this point