1 Introduction
The functional central limit theorem (FCLT), also known as Donsker’s invariance principle [16], asserts that for a sequence ((X_n)_{n \ge 0}) of i.i.d. random variables with zero mean and unit variance, the random process
$$ W_N(t) = N^{-1/2} \sum _{n = 0}^{ \lfloor N t \rfloor - 1 } X_n, \quad t \in [0,1], $$
converges in distribution to a standard Brownian motion ((Z(t))_{t \in [0,1]}) with respect to the Skorokhod topology. Barbour [5] observed that Stein’s method [49], initially developed for error estimation in the central limit theorem, can be adapted to obtain rates of convergence in the FCLT with respect to integral probability metrics of sufficiently smooth test functions.
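As a concrete reading of the definition of (W_N), the following Python sketch evaluates the rescaled partial-sum path at given times from a finite sample (X_0, \ldots , X_{N-1}). The function name `donsker_process` is hypothetical. The Rademacher sample in the usage lines is only an illustrative choice of i.i.d. variables with zero mean and unit variance.

```python
import numpy as np


def donsker_process(x, t):
    """Evaluate W_N(t) = N^{-1/2} * sum_{n=0}^{floor(N t) - 1} x[n].

    x: the N increments X_0, ..., X_{N-1}
    t: a time or an array of times in [0, 1]
    """
    x = np.asarray(x, dtype=float)
    n_steps = x.size
    # partial[k] = X_0 + ... + X_{k-1}, with partial[0] = 0 (empty sum).
    partial = np.concatenate(([0.0], np.cumsum(x)))
    k = np.floor(n_steps * np.asarray(t, dtype=float)).astype(int)
    return partial[k] / np.sqrt(n_steps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sample = rng.choice([-1.0, 1.0], size=10_000)  # i.i.d. Rademacher steps
    times = np.linspace(0.0, 1.0, 5)
    print(donsker_process(sample, times))
```

The path is piecewise constant with jumps at the grid points k/N, which is why it lives in the càdlàg space D rather than in C[0,1].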
Consider an integral probability metric of the form
$$\begin{aligned} d_{{\mathfrak G}}(\mu _N, \nu ) := \sup _{g \in {\mathfrak G}} | \mu _N( g ) - \nu (g) |, \end{aligned}$$
(1)
where ({\mathfrak G}) is some class of real-valued test functions, ((\mu _N)_{N \ge 0}) is a sequence of probability distributions, and (\nu ) is a known target distribution used to approximate (\mu _N). Here, (\nu (g)) denotes the expectation of g with respect to (\nu ). The core idea behind Stein’s method can be summarized as follows: to estimate (1), a linear operator (\mathcal {A}), called a Stein operator, together with a class of functions ({\mathfrak F}(\mathcal {A})) is constructed such that (a) the identity (\nu ( \mathcal {A}f ) = 0) holds for each (f \in {\mathfrak F}(\mathcal {A})); and (b) for each (g \in {\mathfrak G}), there exists a solution (f \in {\mathfrak F}(\mathcal {A})) to the Stein equation
$$\begin{aligned} \mathcal {A}f (w) = g(w) - \nu ( g ). \end{aligned}$$
(2)
Integrating both sides of (2) with respect to (\mu _N) gives (\mu _N(g) - \nu (g) = \mu _N(\mathcal {A}f)). This reformulates the original problem of estimating (1) as one of estimating (\sup _{f \in {\mathfrak F}(\mathcal {A})} | \mu _N ( \mathcal {A}f ) |), for which various techniques have been developed in the extensive literature on Stein’s method. For continuous target distributions, these techniques commonly involve Taylor expansions, coupling methods, and Malliavin calculus. A notable feature of Stein’s method, particularly from the viewpoint of applications, is its flexibility in handling various dependence structures, such as local dependence [9, Chapter 9] and weak dependence [47, Chapter III]. For further background on Stein’s method and its applications in different probabilistic frameworks, we refer the reader to the monographs [9, 43] and the surveys [35, 48].
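To make property (a) concrete in the simplest setting, the Python sketch below checks numerically that the standard normal law satisfies (\textbf{E}[f'(Z) - Z f(Z)] = 0), i.e. the identity (\nu (\mathcal {A}f) = 0) for the one-dimensional operator (\mathcal {A}f(w) = f'(w) - w f(w)). It uses NumPy's Gauss–HermiteE quadrature (`numpy.polynomial.hermite_e.hermegauss`), whose weight function is (e^{-x^2/2}). The helper names and the quadrature degree are our own choices.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def normal_expectation(g, degree=60):
    """Approximate E[g(Z)] for Z ~ N(0, 1) by Gauss-HermiteE quadrature."""
    nodes, weights = hermegauss(degree)
    # The weights integrate against exp(-x^2 / 2); dividing by sqrt(2 pi)
    # turns this into an expectation under the standard normal law.
    return float(np.sum(weights * g(nodes)) / np.sqrt(2.0 * np.pi))


def stein_residual(f, df, degree=60):
    """E[f'(Z) - Z f(Z)], which vanishes for suitable f when Z ~ N(0, 1)."""
    return normal_expectation(lambda x: df(x) - x * f(x), degree)


if __name__ == "__main__":
    print(stein_residual(np.sin, np.cos))
    print(stein_residual(np.tanh, lambda x: 1.0 / np.cosh(x) ** 2))
```

Replacing Z by a random variable with a different law generally makes this residual nonzero; quantifying that residual is exactly the reformulated problem described above.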
In the special case where the target distribution (\nu ) is the Wiener measure, Barbour considered a class ({\mathscr {M}}) (defined below in Section 2.1) of test functions (g : D \rightarrow {\mathbb R}) on the space (D := D([0,1], {\mathbb R})) of càdlàg functions (w : [0,1] \rightarrow {\mathbb R}) equipped with the sup-norm, where g is twice Fréchet differentiable with a Lipschitz continuous second derivative and satisfies appropriate growth constraints. To construct a Stein operator (\mathcal {A}), Barbour used Markov generator theory, defining (\mathcal {A}) as the infinitesimal generator of a Markov process that solves an SDE with stationary distribution (\nu ). Using the transition semigroup associated with (\mathcal {A}), he identified the solution to (2) and analyzed its properties. Employing Taylor expansions in combination with the leave-one-out approach, he derived an error bound of the form
$$\begin{aligned} | \textbf{E}[ g( W_N ) ] - \textbf{E}[ g(Z) ] | \le C N^{-1/2} \Vert g \Vert ( \sqrt{ \log (N) } + \textbf{E}[ |X_1|^3 ] ), \end{aligned}$$
(3)
where (C > 0) is an absolute constant and (\Vert g \Vert ) is a suitable norm of g. Recently, Kasprzak [25] extended Barbour’s results [5] to the approximation of vector-valued processes by multivariate correlated Brownian motion, where the covariance matrix need not be the identity and the approximated process may exhibit certain dependence structures. Barbour, Ross, and Zheng [7] derived Gaussian smoothing inequalities that, in the case of error bounds such as (3) obtained via Barbour’s method, allow the class of test functions to be extended at the cost of reduced precision in the estimates.
Our aim in this study is to adapt Barbour’s technique to extend error bounds such as (3) to cases where the process ((X_n)) is generated by a deterministic dynamical system with good mixing properties. We restrict our considerations to self-normed càdlàg paths (W_N(t)) (see (17)) and approximations by the univariate standard Brownian motion, with generalizations (such as those considered in [25]) left for future exploration (Footnote 1). Our study is, in spirit, a continuation of previous studies [24, 34] that dealt with Gaussian approximation of dynamical systems using Stein’s method. An observation from Barbour’s work [5] is that, once Stein’s method for Brownian approximation has been properly set up, a rate of convergence in Donsker’s theorem follows with little effort beyond that needed in the Gaussian case. Based on results from the present study and those in [24, 34], we find that this observation continues to hold for a broad class of chaotic dynamical systems.
The problem of estimating rates of convergence in FCLTs for dynamical systems has been addressed in [2, 15, 19, 20, 22, 36, 44]. Using an explicit construction of an approximating Gaussian sequence together with a Fuk–Nagaev-type deviation inequality, Dedecker, Merlevède, and Rio [15] established rates of order (O(N^{-1/4}(\log N)^{1/4})) for the Wasserstein-2 metric in the case of Young towers with return times to the base having a finite fourth moment. A series of works has been devoted to the adaptation of martingale approximation techniques, starting with Antoniou and Melbourne [2], who derived convergence rates with respect to the Prokhorov metric. For systems modeled by Young towers with exponential tails of return times to the base, such as planar periodic dispersing billiards and unimodal maps satisfying the Collet–Eckmann condition, their method yields rates of order (O(N^{-1/4+\delta })) for arbitrarily small (\delta > 0). For Young towers with polynomial tails, including interval maps with neutral fixed points as in the Pomeau–Manneville scenario [46], the rates depend on the “degree of nonuniformity.” In a setting similar to [2], Liu and Wang [36] obtained rates for Wasserstein-p metrics, and Paviato [44] established rates in multidimensional FCLTs with respect to Prokhorov and Wasserstein-1 metrics. In [37], Liu and Wang derived convergence rates for Wasserstein-p metrics in the FCLT for uniformly expanding nonautonomous dynamical systems described by concatenations (T_n \circ \cdots \circ T_1) of varying maps (T_i : X \rightarrow X), such as those in the setting of Conze and Raugi [13]. Their rates are (O(\sigma _N^{-1/2+\delta })) for arbitrarily small (\delta > 0), where (\sigma _N^2) denotes the variance of the partial sum process. Under linear growth of (\sigma _N^2), this corresponds to (O(N^{-1/4+\delta })).
The primary contribution of our work is the method which, under certain weak dependence criteria called functional correlation bounds (see Definition 2.8), yields an error bound in a self-normed FCLT with respect to a metric as in (1). The result applies to test functions (g : D \rightarrow {\mathbb R}) that belong to Barbour’s class ({\mathscr {M}}) and satisfy an additional smoothness condition (see (5)). The error decays at the rate (O(N^{-1/2})), provided that the decay rate of functional correlations associated with separately Lipschitz functions has a finite first moment and that the variance of the partial sum (\sum _{n=0}^{N-1} X_n) grows linearly as (N \rightarrow \infty ). We consider two types of applications: (1) stationary processes of the form (X_n = f \circ T^n) with (T^n = T^{n-1} \circ T), where (T : M \rightarrow M) is a measure-preserving transformation of a probability space ((M, \mathcal {F}, \mu )) and (f : M \rightarrow {\mathbb R}) is a suitably regular observable; and (2) nonstationary processes of the form (X_n = f_n \circ T_n \circ \cdots \circ T_1), where ((T_n)) is a deterministic or random sequence of transformations and ((f_n)) is a sequence of regular observables.
Prior to our work, Fleming [19] addressed the problem of estimating the rate of convergence in FCLTs for ergodic measure-preserving transformations satisfying weak dependence criteria, also called functional correlation bounds, which are closely related to, yet different from, those considered here (compare [19, Definition 3.2.2] with (FCB)). Using an approach based on Bernstein’s blocking technique, he derived convergence rates in the multivariate FCLT with respect to the Wasserstein-1 metric of Lipschitz continuous test functions, at best of order (O(N^{-1/4 + \delta })), where the rate of convergence depends on the decay rate of the functional correlation bounds. In the recent work [20], similar techniques are used to derive Wasserstein-p rates in multivariate FCLTs. In the case of Young towers with superpolynomial tails, rates of order (O(n^{-\kappa })) for all (\kappa < 1/4) and (p < \infty ) are established.
Previously, functional correlation bounds were derived in [31, 33, 34] for piecewise uniformly expanding interval maps, Pomeau–Manneville-type interval maps, and a class of dispersing Sinai billiards. By combining these bounds with our abstract theorem (Theorem 2.11), we establish order (O(N^{-1/2})) error bounds in FCLTs for these systems. To our knowledge, all of these results are new. Additionally, we analyze an interval map (T : [-1,1] \rightarrow [-1,1] ) introduced by Pikovsky [45], which features two neutral fixed points and an unbounded derivative. The neutrality of the fixed points and the degree of the singularity are governed by a single parameter (\gamma \in (1, \infty )). By [12, 14], this map admits a Young tower with a first return map whose tails decay at the rate (O(n^{- \gamma / ( \gamma - 1)})). An immediate consequence (see e.g. [2, 40]) is that, for (\gamma < 2), the process (X_n = f \circ T^n) where f is Lipschitz continuous satisfies the FCLT. We prove that ((X_n)) satisfies a functional correlation bound with a polynomial rate that matches the decay rate of usual auto-correlations from [14]. This leads to new error bounds in the FCLT for parameters (\gamma < 5/3), which decay at the order (O(N^{-1/2})) when (\gamma < 3/2).
Remark 1.1
After circulating a preliminary version of this manuscript, we learned from Yeor Hafouta about the interesting related work in [23, Chapter 1], which applies Barbour’s method [5] in the context of weakly dependent processes. In particular, [23, Theorem 1.6.2] provides error bounds in the FCLT for nonconventional sums of the form (\sum _{n=1}^{ \lfloor Nt\rfloor } F( \xi _n, \xi _{2n}, \ldots , \xi _{\ell n})), where ((\xi _n)) satisfies certain stationarity and mixing conditions and F is a sufficiently regular function. Dynamical applications include topologically mixing subshifts of finite type and systems with corresponding symbolic representations, such as smooth Axiom A diffeomorphisms. While the metric used in [23] does not impose the smoothness condition (5), the established error estimates decay at a rate slower than (O(N^{-1/2})), depending on the rate of mixing.
1.1 Structure of the Paper
In Section 2, we present our main result (Theorem 2.11) concerning the rate of convergence in the FCLT with respect to an integral probability metric of smooth test functions for sequences of uniformly bounded real-valued random variables. The proof, given in Section 5 and Appendix A, relies on Barbour’s method [5] which is reviewed in Section 4. Our hypothesis, similar to [24, 34], is a functional correlation bound with a sufficiently fast polynomial rate of decay. Examples of dynamical systems that satisfy such bounds are discussed in Section 3. For the intermittent map of Pikovsky [45], we prove a polynomial functional correlation bound in Section 6.
2 Abstract theorem
2.1 Spaces of test functions (\mathscr {L}), (\mathscr {M}), and (\mathscr {M}_0)
Denote by D the space of all càdlàg functions (w : [0,1] \rightarrow {\mathbb R}) equipped with the sup-norm (\Vert w \Vert _\infty = \sup _{ t \in [0,1] } | w(t)|). Given a function (f : D \rightarrow {\mathbb R}), by (f^{(k)}) we mean the kth Fréchet derivative of f, which is a map (f^{(k)} : D \rightarrow \mathcal {L}( D^k , {\mathbb R})) from D to the space (\mathcal {L}( D^k, {\mathbb R})) of all continuous multilinear maps from (D^k) to ({\mathbb R}). The k-linear norm of (A \in \mathcal {L}( D^k, {\mathbb R})) is defined by
$$ \Vert A \Vert = \sup _{ \Vert w_i \Vert _\infty \le 1 , \forall i = 1,\ldots , k } | A[w_1,\ldots , w_k] |, $$
where (A[w_1,\ldots , w_k]) denotes A applied to the arguments (w_1,\ldots , w_k \in D).
Following [5], let ({\mathscr {L}}) be the Banach space of all continuous functions (g : D \rightarrow {\mathbb R}) for which the norm
$$ \Vert g \Vert _{{\mathscr {L}}} := \sup _{w \in D} \frac{ | g(w) | }{ 1 + \Vert w \Vert _\infty ^3 } $$
is finite. Let ({\mathscr {M}}\subset {\mathscr {L}}) be the subcollection of all twice Fréchet differentiable functions (g \in {\mathscr {L}}) that satisfy
$$\begin{aligned} \sup _{w,h \in D, \, h \ne 0} \frac{\Vert g''(w + h) - g''(w) \Vert }{ \Vert h \Vert _\infty } < \infty . \end{aligned}$$
(4)
A norm on ({\mathscr {M}}) can be defined as follows:
Proposition 2.1
(See equation (2.7) in [5]). For every (g \in {\mathscr {M}}), define
$$\begin{aligned} \Vert g \Vert _{\mathscr {M}}&= \sup _{w \in D} \frac{ | g(w) | }{ 1 + \Vert w \Vert _\infty ^3 } + \sup _{w \in D} \frac{ \Vert g' (w) \Vert }{ 1 + \Vert w \Vert _\infty ^2 } + \sup _{w \in D} \frac{ \Vert g'' (w) \Vert }{ 1 + \Vert w \Vert _\infty } \\ &\quad + \sup _{w,h \in D, \, h \ne 0} \frac{ \Vert g''(w + h) - g''(w) \Vert }{ \Vert h \Vert _\infty }. \end{aligned}$$
Then, for all (g \in {\mathscr {M}}), we have (\Vert g \Vert _{\mathscr {M}}< \infty ).
Remark 2.2
For a test function (g \in {\mathscr {M}}), the solution (\phi (g)) to the Stein equation (SE), described in Section 4, satisfies (\phi (g) \in {\mathscr {M}}); see Lemma 4.3.
To facilitate the adaptation of Stein’s method to Brownian approximations of dynamical systems, an additional regularity assumption is imposed on (g \in {\mathscr {M}}). Let ({\mathscr {M}}_0 = {\mathscr {M}}_{0}(C_0) \subset {\mathscr {M}}) consist of all (g \in {\mathscr {M}}) that satisfy the following smoothness condition introduced in [5]:
$$\begin{aligned} \sup _{w \in D} | g''(w)[J_r, J_s - J_t] | \le C_0 \Vert g \Vert _{{\mathscr {M}}} |t-s|^{1/2} \quad \forall r,s,t \in [0,1], \end{aligned}$$
(5)
where
$$\begin{aligned} J_{ \alpha }(t) = \left\{ \begin{array}{ll} 1, & {\text {if }} t \ge \alpha , \\ 0, & {\text {if }} t < \alpha . \end{array}\right. \end{aligned}$$
(6)
Remark 2.3
Condition (5) is used in two places: Lemma 4.2 and Proposition 5.2. For Lemma 4.2, which concerns the definition of the Stein operator (\mathcal {A}), the following weaker condition suffices:
$$\begin{aligned} \lim _{n \rightarrow \infty } \int _0^1 g''(w)\bigl [ J_{ \lfloor nt\rfloor /n }^{(2)} \bigr ] \, dt = \int _0^1 g''(w) \bigl [ J_t^{(2)} \bigr ] \, dt \quad \forall w \in D. \end{aligned}$$
The formulation of (5) is specifically tailored for Proposition 5.2, which is instrumental for obtaining the error bound in Theorem 2.11. Various relaxations of (5), such as weakening the modulus of continuity (|t-s|^{1/2}) or allowing (| g''(w)[J_r, J_s - J_t] | ) to grow with (\Vert w \Vert _\infty ), could be considered. However, these would affect the error bounds in Theorem 2.11 and the rates of convergence in the FCLTs discussed in Section 3.
Example 2.4
The collection ({\mathscr {M}}) contains functions of the form
$$ g(w) = \int _0^1 \kappa (t, w(t)) \, d m(t) $$
where (\kappa : [0,1] \times {\mathbb R}\rightarrow {\mathbb R}) is a measurable function such that (\kappa (t, \cdot ) \in C^2) for all (t \in [0,1]),
$$\begin{aligned}&\sup _{t \in [0,1]} | \kappa (t,0)| + \sup _{t \in [0,1]} | \partial _2 \kappa (t,0)| + \sup _{t \in [0,1]} | \partial _2^2 \kappa (t,0)| \le C_{\kappa }, \\ &\sup _{t \in [0,1]} | \partial _2^2 \kappa (t, x ) - \partial _2^2 \kappa (t,y) | \le C_{\kappa } |x - y| \quad \forall x,y \in {\mathbb R}, \end{aligned}$$
and m is a probability measure on [0, 1]. If we further require that
$$ m([t,s]) \le C_m |t - s|^{1/2} \quad {\text {and}} \quad \sup _{t \in [0,1], x \in {\mathbb R}} | \partial _2^2 \kappa (t,x)| \le C_{\kappa }', $$
then ({\mathscr {M}}_0(C_0)) contains g for (C_0 = 2 C_m C_\kappa '). Indeed, the second Fréchet derivative of g is given by
$$ g''(w)[h_1, h_2] = \int _0^1 \partial _2^2 \kappa (t, w(t)) h_1(t) h_2(t) \, d m(t). $$
Therefore,
$$\begin{aligned} | g''(w)[J_r, J_s - J_t] |&\le C_{\kappa }' m([s,t]) \le C_{\kappa }' C_m |t - s |^{1/2} \le \Vert g \Vert _{{\mathscr {M}}} 2 C_{\kappa }' C_m |t - s |^{1/2}. \end{aligned}$$
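A simple member of this family is (g(w) = \int _0^1 w(t)^2 \, dt), i.e. (\kappa (t,x) = x^2) and m the Lebesgue measure on [0,1], for which one may take (C_m = 1) and (C_\kappa ' = 2). When the (X_n) are i.i.d. with zero mean and unit variance, Fubini's theorem gives (\textbf{E}[W_N(t)^2] = \lfloor Nt \rfloor /N) and (\textbf{E}[Z(t)^2] = t), so both expectations can be computed exactly. The Python sketch below (our own illustration, with hypothetical helper names) does this in rational arithmetic and finds an error of exactly (1/(2N)) for this single test function; it says nothing about the supremum over the whole class.

```python
from fractions import Fraction


def mean_g_of_W(n_steps):
    """E[g(W_N)] for g(w) = int_0^1 w(t)^2 dt and i.i.d. X_n with mean 0, variance 1.

    E[W_N(t)^2] = floor(N t) / N is a step function equal to k / N on
    [k/N, (k+1)/N), so its integral is a finite sum, computed exactly here.
    """
    return sum(Fraction(k, n_steps) * Fraction(1, n_steps) for k in range(n_steps))


def mean_g_of_Z():
    """E[g(Z)] = int_0^1 E[Z(t)^2] dt = int_0^1 t dt = 1/2."""
    return Fraction(1, 2)


if __name__ == "__main__":
    for n_steps in (1, 10, 100, 1000):
        # The difference equals 1/(2N) for every N.
        print(n_steps, mean_g_of_Z() - mean_g_of_W(n_steps))
```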
Example 2.5
For (t,s \in [0,1]), the following function, which can be used to identify the covariance structure of the limiting process in the FCLT, belongs to ({\mathscr {M}}):
$$ g(w) = w(t)w(s). $$
However, g does not belong to ({\mathscr {M}}_0(C_0)) for any (C_0), as it does not satisfy (5). On the other hand, ({\mathscr {M}}_0(C_0)) for suitably large (C_0 = C_0(\varepsilon )) does contain the following smooth approximation of g:
$$ g_\varepsilon (w) = \int _0^1 \int _0^1 K_\varepsilon ( t - u ) K_\varepsilon ( s - v ) w (u) w(v) \, du \, dv, $$
where (K(x) = ( 2 \pi )^{-1/2} e^{-x^2/2}) is a Gaussian kernel and (K_\varepsilon (x) = \varepsilon ^{-1} K(x/\varepsilon )). Since (g_\varepsilon ) is the product of the two continuous linear functionals (w \mapsto \int _0^1 K_\varepsilon (t-u) w(u) \, du) and (w \mapsto \int _0^1 K_\varepsilon (s-v) w(v) \, dv), its second Fréchet derivative is
$$ g_\varepsilon ''(w)[h_1, h_2] = \int _0^1 \int _0^1 K_\varepsilon ( t - u ) K_\varepsilon ( s - v ) \bigl ( h_1 (u) h_2(v) + h_2(u) h_1(v) \bigr ) \, du \, dv. $$
An elementary computation yields
$$\begin{aligned} | g_\varepsilon ''(w)[J_r, J_s - J_t] | \le C_\varepsilon \Vert g_\varepsilon \Vert _{{\mathscr {M}}} |s -t| \end{aligned}$$
for some constant (C_\varepsilon > 0) depending on (\varepsilon ). Since (|s-t| \le |s-t|^{1/2}) for (s,t \in [0,1]), this implies (5) with (C_0 = C_\varepsilon ).
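The elementary computation can be checked numerically. Writing (A_c(h) = \int _0^1 K_\varepsilon (c - u) h(u) \, du), we have (g_\varepsilon (w) = A_t(w) A_s(w)), and the second derivative of a product of two continuous linear functionals is (A_t(h_1)A_s(h_2) + A_t(h_2)A_s(h_1)), independently of w. Using (|A_c(J_r)| \le 1) and (|A_c(J_a - J_b)| \le |a-b| \sup K_\varepsilon = |a-b|/(\varepsilon \sqrt{2\pi })) gives the explicit bound (2|a-b|/(\varepsilon \sqrt{2\pi })), before normalizing by (\Vert g_\varepsilon \Vert _{\mathscr {M}}). The Python sketch below evaluates the derivative in closed form through the Gaussian distribution function and checks that bound on a grid. The helper names are hypothetical, and we write (a, b) for the jump points to avoid a clash with the fixed times t, s of this example.

```python
import itertools
import math


def gauss_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kernel_mass(center, lo, hi, eps):
    """Signed integral of u -> K_eps(center - u) over [lo, hi] (negative if hi < lo)."""
    return gauss_cdf((center - lo) / eps) - gauss_cdf((center - hi) / eps)


def g_eps_second_derivative(t, s, r, a, b, eps):
    """g_eps''(w)[J_r, J_a - J_b] for g_eps(w) = A_t(w) A_s(w); it does not depend on w."""
    at_jr, as_jr = kernel_mass(t, r, 1.0, eps), kernel_mass(s, r, 1.0, eps)  # J_r = 1 on [r, 1]
    at_d, as_d = kernel_mass(t, a, b, eps), kernel_mass(s, a, b, eps)        # J_a - J_b = +-1 between a and b
    return at_jr * as_d + at_d * as_jr


def elementary_bound(a, b, eps):
    """2 |a - b| / (eps sqrt(2 pi)), from |A_c(J_r)| <= 1 and sup K_eps = 1 / (eps sqrt(2 pi))."""
    return 2.0 * abs(a - b) / (eps * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    worst = 0.0
    for t, s, r, a, b in itertools.product((0.1, 0.5, 0.9), repeat=5):
        value = abs(g_eps_second_derivative(t, s, r, a, b, eps=0.05))
        worst = max(worst, value - elementary_bound(a, b, 0.05))
    print(worst)  # expected to be <= 0
```

The constant grows like (\varepsilon ^{-1}), which is consistent with the statement that (C_0) must be taken large when (\varepsilon ) is small.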
Example 2.6
Let (\Theta \sim N(0,1)) and (U \sim \textrm{Uniform}(0,1)) be random variables independent of Z, where Z is a standard Brownian motion. For (\varepsilon , \delta > 0) and (h : D \rightarrow {\mathbb R}), define
$$ h_{\varepsilon , \delta }(w) = \textbf{E}[ h ( w_\varepsilon + \delta Z + \delta \Theta ) ], $$
where (w_\varepsilon ) is given by
$$\begin{aligned} w_\varepsilon (s) = \textbf{E}[ w ( s + \varepsilon U ) ], \end{aligned}$$
(7)
with the convention that (w(t) = w(0)) if (t < 0) and (w(t) = w(1)) if (t > 1). It was proved in [7, Lemma 1.11] that for any Skorokhod-measurable function h satisfying (\sup _{w \in D} |h(w)| \le 1),
$$ h_{\varepsilon , \delta } \in \mathscr {M}_0(C_0) \quad {\text {with}} \quad C_0 = O \bigl ( \varepsilon ^{-2} \delta ^{-2} \bigr ). $$
The functions (h_{\varepsilon , \delta }) were used in [7] to derive Gaussian smoothing inequalities, which can be applied to extend FCLT error bounds from (g \in \mathscr {M}_0) to broader classes of test functions, including bounded Lipschitz functions [7, equation (1.15)] and indicators of Skorokhod-measurable sets [7, equation (1.14)], albeit at the cost of reduced precision in the estimates.
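The time-smoothing (7) is the first ingredient of (h_{\varepsilon , \delta }). The Python sketch below approximates (w_\varepsilon (s)) by a midpoint rule in U, implementing the boundary convention of (7) by clipping times to [0, 1]; it does not attempt the Gaussian perturbation by (\delta Z + \delta \Theta ). The names `smooth_path`, `step`, and `smoothed_step_exact` and the number of quadrature points are our own choices. For the jump path (J_\alpha ) from (6) with (\alpha \in (0,1]), the exact value is (w_\varepsilon (s) = 1 - \min \{1, \max \{0, (\alpha - s)/\varepsilon \}\}), which the sketch uses as a reference.

```python
import numpy as np


def smooth_path(w, s, eps, n_points=2000):
    """Approximate w_eps(s) = E[w(s + eps * U)], U ~ Uniform(0, 1).

    w must accept a NumPy array of times. Times outside [0, 1] are clipped,
    which implements w(t) = w(0) for t < 0 and w(t) = w(1) for t > 1.
    """
    u = (np.arange(n_points) + 0.5) / n_points   # midpoints of [0, 1]
    times = np.clip(s + eps * u, 0.0, 1.0)
    return float(np.mean(w(times)))


def step(alpha):
    """The jump path J_alpha from (6)."""
    return lambda t: (np.asarray(t) >= alpha).astype(float)


def smoothed_step_exact(alpha, s, eps):
    """Closed form of (J_alpha)_eps(s) for alpha in (0, 1]."""
    return 1.0 - min(1.0, max(0.0, (alpha - s) / eps))


if __name__ == "__main__":
    for s in (0.3, 0.45, 0.5, 0.9):
        print(s, smooth_path(step(0.5), s, 0.1), smoothed_step_exact(0.5, s, 0.1))
```

Smoothing a jump path in time turns it into a Lipschitz ramp of width (\varepsilon ), which is the mechanism that makes (h_{\varepsilon , \delta }) regular enough to lie in (\mathscr {M}_0).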
2.2 Functional correlation bound
Let (X_0, \ldots , X_{N-1}), (N \ge 1), be real-valued random variables defined on a probability space ((M, \mathcal {F}, \mu )) such that
$$\begin{aligned} \Vert X_n \Vert _\infty \le L \quad {\text {and}} \quad \mu (X_n) = 0, \quad {\text {for all }}{0 \le n < N,} \end{aligned}$$
(8)
for some constant (L > 0), where (\mu (X_n)) denotes the expectation of (X_n) with respect to (\mu ) (Footnote 2). For a finite non-empty subset
$$I = \{i_1, \ldots , i_n\} \subset {\mathbb Z}_+ \cap [0, N-1]$$
of indices (i_1< \ldots < i_n), we denote by (\nu _I) the joint distribution of the subsequence ((X_i)_{i \in I}). That is, (\nu _I) is a probability measure on ([-L,L]^{n}) characterized by the identity
$$\begin{aligned} \int _{[-L,L]^n} h \, d \nu _I = \int _{M} h( X_{i_1} , \ldots , X_{i_n} ) \, d \mu \end{aligned}$$
for bounded measurable (h : [-L,L]^n \rightarrow {\mathbb R}). In what follows, we consider unions ( I = \cup _{1 \le k \le K} I_k) of index sets (I_k = \{i_{p_{k-1} + 1}, \ldots , i_{p_k} \}) with (i_{p_{k-1} + 1}< \ldots < i_{p_k}), where (0 = p_0 < p_1 < \cdots < p_K = n). We will always assume that the sets are disjoint and ordered, in the sense that the gap between (I_k) and (I_{k+1}) satisfies
$$ g_{k} = i_{p_k + 1} - i_{p_k} > 0 \quad \forall k = 1, \ldots , K-1. $$
For brevity, we shall henceforth write (I_1< \cdots < I_K) to express these conventions.
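The ordering convention and the gaps (g_k) can be expressed as a short helper. The following Python sketch (the name `block_gaps` is hypothetical) computes (g_k = \min I_{k+1} - \max I_k), which equals (i_{p_k + 1} - i_{p_k}), and rejects index sets that are empty or not ordered in the above sense.

```python
def block_gaps(blocks):
    """Gaps g_k = min(I_{k+1}) - max(I_k) for index sets I_1 < ... < I_K.

    Raises ValueError if a set is empty or two consecutive sets are not
    disjoint and ordered (i.e. some gap is not positive).
    """
    if any(len(block) == 0 for block in blocks):
        raise ValueError("index sets must be non-empty")
    gaps = []
    for k in range(len(blocks) - 1):
        gap = min(blocks[k + 1]) - max(blocks[k])
        if gap <= 0:
            raise ValueError(f"I_{k + 1} and I_{k + 2} are not ordered")
        gaps.append(gap)
    return gaps


if __name__ == "__main__":
    print(block_gaps([{0, 1}, {4}, {7, 9}]))   # [3, 3]
```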
Definition 2.7
Let (\vartheta \in (0,1]). Given a function (F : [-L,L]^k \rightarrow {\mathbb R}), where (k \ge 1), we define
$$ [F]_{\vartheta } = \max _{1 \le i \le k} \sup _{x \in [-L,L]^k} \sup _{a \ne a'} \frac{|F(x(a/i)) - F(x(a'/i))|}{|a - a'|^{\vartheta }}. $$
Here, (x(a/i) \in [-L, L]^k) denotes the vector obtained from (x \in [-L,L]^k) by replacing the ith component (x_i) with (a \in [-L, L]). We say that F is separately Hölder continuous with exponent (\vartheta ) if ([F]_{\vartheta } < \infty ), and we define (\Vert F\Vert _{\vartheta } = \Vert F\Vert _\infty + [F]_{\vartheta }). Moreover, if ([F]_{\vartheta } < \infty ) holds with (\vartheta = 1), we say that F is separately Lipschitz continuous and write (\Vert F\Vert _{{\textrm{Lip}}} = \Vert F\Vert _1).
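The seminorm ([F]_\vartheta ) varies one coordinate at a time while freezing the others. The Python sketch below computes a grid estimate of it, and of (\Vert F \Vert _{\textrm{Lip}}), by brute force. Since only grid points are tried, the result is a lower bound for the true supremum. The function names and grid size are our own illustrative choices, not part of the definition.

```python
import itertools

import numpy as np


def separate_holder_seminorm(F, k, L, theta, grid_size=9):
    """Grid lower estimate of [F]_theta from Definition 2.7."""
    grid = np.linspace(-L, L, grid_size)
    best = 0.0
    for point in itertools.product(grid, repeat=k):
        x = np.array(point)
        for i in range(k):                       # coordinate being replaced
            for a, a_prime in itertools.combinations(grid, 2):
                xa, xb = x.copy(), x.copy()
                xa[i], xb[i] = a, a_prime        # x(a/i) and x(a'/i)
                ratio = abs(F(xa) - F(xb)) / abs(a - a_prime) ** theta
                best = max(best, ratio)
    return best


def lip_norm_estimate(F, k, L, grid_size=9):
    """Grid estimate of ||F||_Lip = ||F||_inf + [F]_1."""
    grid = np.linspace(-L, L, grid_size)
    sup_norm = max(abs(F(np.array(p))) for p in itertools.product(grid, repeat=k))
    return sup_norm + separate_holder_seminorm(F, k, L, 1.0, grid_size)


if __name__ == "__main__":
    product = lambda x: x[0] * x[1]              # separately Lipschitz with [F]_1 = L
    print(lip_norm_estimate(product, k=2, L=1.0))
```

The product (x_1 x_2) illustrates why separate Lipschitz continuity is the natural notion here: it is Lipschitz in each coordinate with constant L, even though its joint behavior is quadratic.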
Definition 2.8
We say that ((X_n)_{0 \le n < N}) satisfies a functional correlation bound with rate function (R : \{1,2,\ldots \} \rightarrow {\mathbb R}_+) and constant (C_*) if the following holds: whenever (I \subset {\mathbb Z}_+ \cap [0, N -1]) and (I_1 < I_2) are such that (I = I_1 \cup I_2), and (F : [-L, L]^{|I|} \rightarrow {\mathbb R}) is a separately Lipschitz continuous function, then
$$\begin{aligned} \biggl | \int F \, d \nu _{I} - \int F \, d ( \nu _{I_1} \otimes \nu _{I_2} ) \biggr | \le C_* \Vert F \Vert _{ \text {Lip} } R(g_1). \end{aligned}$$
(FCB)
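As a toy illustration of the quantity controlled in (FCB) (our own example, not one of the dynamical systems treated here), consider a stationary two-state Markov chain on (\{-1, +1\}) that flips sign with probability p at each step. For single-site blocks (I_1 = \{0\}) and (I_2 = \{g\}), the left-hand side of (FCB) is computed exactly from the g-step transition matrix, and a short calculation shows it is at most (|1-2p|^{g} [F]_1). The Python sketch below, with the hypothetical helper `two_point_fcb_gap`, checks this; it covers only the simplest pair of blocks, whereas (FCB) requires all of them.

```python
import numpy as np

STATES = np.array([-1.0, 1.0])


def two_point_fcb_gap(F, p, gap):
    """|int F d nu_I - int F d(nu_{I_1} x nu_{I_2})| for I_1 = {0}, I_2 = {gap}.

    Toy process: stationary symmetric two-state Markov chain on {-1, +1}
    that flips sign with probability p at each step.
    """
    P = np.array([[1.0 - p, p], [p, 1.0 - p]])
    pi = np.array([0.5, 0.5])                    # stationary distribution
    P_gap = np.linalg.matrix_power(P, gap)       # law of X_gap given X_0
    joint = sum(pi[i] * P_gap[i, j] * F(STATES[i], STATES[j])
                for i in range(2) for j in range(2))
    independent = sum(pi[i] * pi[j] * F(STATES[i], STATES[j])
                      for i in range(2) for j in range(2))
    return abs(joint - independent)


if __name__ == "__main__":
    for gap in range(1, 6):
        # For F(x, y) = x y the gap equals (1 - 2p)^gap exactly.
        print(gap, two_point_fcb_gap(lambda x, y: x * y, 0.2, gap), 0.6 ** gap)
```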
By induction, (FCB) readily extends to the case of K index sets ((K-1) gaps) as follows.
Proposition 2.9
If ((X_n)_{0\le n< N}) satisfies the functional correlation bound with rate function (R : \{1,2,\ldots \} \rightarrow {\mathbb R}_+) and constant (C_*), then the following holds for all (K \ge 2): whenever (I \subset {\mathbb Z}_+ \cap [0, N -1]) and (I_1< \cdots < I_K) are such that (I = \cup _{k=1}^K I_k ), and (F : [-L, L]^{|I|} \rightarrow {\mathbb R}) is a separately Lipschitz continuous function,
$$\begin{aligned} \biggl | \int F \, d \nu _{I} - \int F \, d ( \nu _{I_1} \otimes \cdots \otimes \nu _{I_{ K } } ) \biggr | \le C_* \Vert F \Vert _{ \text {Lip} } \sum _{k=1}^{K-1} R(g_k). \end{aligned}$$
(FCB’)
Proof
This is done by induction with respect to (K\ge 2). The base case (K = 2) is given by the assumption that (FCB) holds. We now suppose that (FCB’) holds for (K-1), where (K > 2). Applying (FCB) to the two ordered index sets (\cup _{i=1}^{K-1} I_i) and (I_K), whose gap is (g_{K-1}), we have
$$\begin{aligned} \biggl |\int F \ d\nu _I-\int F \ d(\nu _{\cup _{i=1}^{K-1}I_i}\otimes \nu _{I_K}) \biggr | \le C_{*}\Vert F \Vert _{ \text {Lip} }R(g_{K-1}). \end{aligned}$$
(9)
Note that
$$\begin{aligned} \int F \ d(\nu _{\cup _{i=1}^{K-1}I_i}\otimes \nu _{I_K}) = \int \tilde{F} \, d \nu _{\cup _{i=1}^{K-1}I_i}, \end{aligned}$$
where (\tilde{F} : [-L, L]^{ |I| - |I_K| } \rightarrow {\mathbb R}) is defined by
$$\begin{aligned} \tilde{F}(y_1, \ldots , y_{ p_{K-1} } ) = \int _M F( y_1, \ldots , y_{ p_{K - 1 } } , X_{i_{p_{K-1}+1}}(x_K), \ldots ,X_{i_{p_{K}}}(x_K)) \, d \mu (x_K). \end{aligned}$$
Since (\tilde{F}) is separately Lipschitz continuous with
$$ \Vert \tilde{F} \Vert _{{\textrm{Lip}}} \le \Vert F \Vert _{\textrm{Lip}}, $$
it follows from the induction hypothesis that
$$\begin{aligned} \biggl | \int \tilde{F} \, d\nu _{\cup _{i=1}^{K-1}I_i} - \int \tilde{F} \, d(\nu _{I_1}\otimes \cdots \otimes \nu _{I_{K-1}}) \biggr | \le C_{*}\Vert F\Vert _{ {\textrm{Lip}}} \sum _{i=1}^{K-2}R(g_{i}). \end{aligned}$$
(10)
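Since (\int \tilde{F} \, d(\nu _{I_1}\otimes \cdots \otimes \nu _{I_{K-1}}) = \int F \, d(\nu _{I_1}\otimes \cdots \otimes \nu _{I_{K}})), combining (9) and (10) via the triangle inequality yields (FCB’) for K, which completes the proof. (\square )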
Remark 2.10
Correlation decay conditions such as (FCB) arise quite naturally in applications of Stein’s method to distributional approximations of dynamical systems. Let us illustrate this in the context of normal approximations. In the case of the standard normal distribution N(0, 1), the Stein equation is given by
$$\begin{aligned} f’(w) - w f(w) = h(w) - \Phi (h), \end{aligned}$$
(11)
where (\Phi (h)) denotes the expectation of h with respect to N(0, 1). Let ((T, M, \mathcal {F}, \mu )) be a measure-preserving transformation, and let (\varphi : M \rightarrow {\mathbb R}) be a bounded measurable observable with (\mu (\varphi ) = 0). Consider the partial sum ( V = b^{-1} \sum _{n=0}^{N-1} X_n ), where (X_n = \varphi \circ T^n), (b = \sqrt{\text {Var}_\mu ( \sum _{n=0}^{N-1} X_n )}), and it is assumed that (b > 0). Here, (T^n = T \circ T^{n-1}) with (T^0) being the identity map. Solving (11) for a given test function h and taking expectations, we have that
$$ | \mu [h(V)] - \Phi (h) | \le | \mu [f’(V) - V f(V)] |. $$
It is well known (see [9]) that for a Lipschitz continuous h, the solution f to (11) has bounded derivatives up to second order. Introducing the punctured sums (V_{n,K} := b^{-1} \sum _{0 \le i < N, \, |i - n| > K} X_i), we can decompose ( \mu [V f(V) - f'(V)] ) as follows:
$$\begin{aligned}&\mu [V f(V) - f'(V)] \nonumber \\&= b^{-1} \mu \biggl ( \sum _{n=0}^{N-1} X_n (f(V) - f(V_{n,K}) - f'(V)(V - V_{n,K})) \biggr ) \end{aligned}$$
(12)
$$\begin{aligned}&\quad + \mu \biggl [ \biggl ( b^{-2} \sum _{n=0}^{N-1} \sum _{ 0 \le m < N, \, |n-m| \le K } X_n X_m - 1 \biggr ) f'(V) \biggr ] \end{aligned}$$
(13)
$$\begin{aligned}&\quad + b^{-1} \mu \biggl ( \sum _{n=0}^{N-1} X_n f(V_{n,K}) \biggr ). \end{aligned}$$
(14)
Controlling (12) and (13) involves Taylor expansions, along with mixing properties of the dynamics. For simplicity, let us focus on (14). Observe that, if ((X_n)) in (14) is replaced with a sequence of centered and independent random variables, then (14) vanishes as soon as (K \ge 1). However, for a dependent sequence, (14) can be large. Using (\mu (X_n) = \mu (\varphi ) = 0), we can write
$$ \mu ( X_n f(V_{n,K}) ) = \int F \, d \nu _{I} - \int F \, d (\nu _{I_1} \otimes \nu _{I_2} \otimes \nu _{I_3}), $$
where (I = \cup _{i=1}^3 I_i) with
$$ I_1 = \{0 \le i< N : i< n - K\}, \quad I_2 = \{n\}, \quad I_3 = \{0 \le i < N : i > n + K\}, $$
and (F : [- \Vert \varphi \Vert _\infty , \Vert \varphi \Vert _\infty ]^{|I|} \rightarrow {\mathbb R}) is a separately Lipschitz continuous function with (\Vert F \Vert _{{\textrm{Lip}}} \le C \Vert \varphi \Vert _\infty \Vert f \Vert _{{\textrm{Lip}}}) for some absolute constant (C > 0). Assuming (FCB’), we find that
$$ |(14)| \le N b^{-1} C_* \Vert F \Vert _{ {\textrm{Lip}}} 2 R(K + 1) \le N b^{-1} C C_* \Vert \varphi \Vert _\infty \Vert f \Vert _{{\textrm{Lip}}} 2 R(K + 1), $$
where it is natural to expect that R(n) decays rapidly as (n \rightarrow \infty ) if T is a suitably chaotic dynamical system and (\varphi ) is sufficiently regular.
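The decomposition into (12), (13), and (14) is an algebraic identity valid for any process and any differentiable f, provided the punctured sum is normalized as (V_{n,K} = b^{-1} \sum _{|i-n|>K} X_i), so that (V - V_{n,K}) is on the scale of V, and the local term in (13) carries the factor (b^{-2}). The Python sketch below checks the identity by exact enumeration of all (2^N) paths of a toy stationary two-state Markov chain on (\{-1,+1\}) with flip probability p, using (f = \tanh ) as a stand-in for a Stein solution; both choices, and the helper names, are illustrative assumptions. It also confirms that term (14) vanishes when (p = 1/2), i.e. when the (X_n) are i.i.d. signs.

```python
import itertools
import math

import numpy as np


def chain_paths(N, p):
    """All 2**N paths of the stationary symmetric two-state chain on {-1, +1}
    (flip probability p), together with their probabilities."""
    paths = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
    probs = np.full(len(paths), 0.5)             # uniform stationary start
    for n in range(N - 1):
        flips = paths[:, n] != paths[:, n + 1]
        probs *= np.where(flips, p, 1.0 - p)
    return paths, probs


def stein_terms(N, p, K, f, df):
    """Return mu[V f(V) - f'(V)] and the terms (12), (13), (14)."""
    X, w = chain_paths(N, p)
    S = X.sum(axis=1)
    b = math.sqrt(np.dot(w, S ** 2))             # sqrt(Var(S)), since E[S] = 0
    V = S / b
    idx = np.arange(N)
    t12 = t14 = 0.0
    local = np.zeros(len(w))                     # sum_n sum_{|n-m|<=K} X_n X_m
    for n in range(N):
        near = np.abs(idx - n) <= K
        V_punct = X[:, ~near].sum(axis=1) / b    # V_{n,K}, normalized by b
        t12 += np.dot(w, X[:, n] * (f(V) - f(V_punct) - df(V) * (V - V_punct))) / b
        t14 += np.dot(w, X[:, n] * f(V_punct)) / b
        local += X[:, n] * X[:, near].sum(axis=1)
    t13 = np.dot(w, (local / b ** 2 - 1.0) * df(V))
    lhs = np.dot(w, V * f(V) - df(V))
    return lhs, t12, t13, t14


def dtanh(v):
    return 1.0 / np.cosh(v) ** 2


if __name__ == "__main__":
    lhs, t12, t13, t14 = stein_terms(N=8, p=0.2, K=1, f=np.tanh, df=dtanh)
    print(lhs, t12 + t13 + t14)                  # equal up to rounding
```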
2.3 Main result: Brownian approximation under (FCB)
Let (X_0, \ldots , X_{N-1}) be real-valued random variables as in (8). Define the following quantities:
$$\begin{aligned}&\sigma _{i,j} = \mu (X_iX_j), \quad \beta _i = \sum _{j = 0}^{N-1} \sigma _{i,j}, \quad B_k = \sum _{i=0}^{k-1} \beta _i = \sum _{i = 0}^{k-1} \sum _{j=0}^{N-1} \sigma _{i,j}, \end{aligned}$$
where we adopt the conventions (\beta _0 = B_0 = 0). Note that
$$\begin{aligned} B := B\