Have you ever struggled to analyze your time series as a data scientist? Have you ever wondered whether signal processing could make your life easier?
If yes — stay with me. This article is made for you. 🙂
Working with real-world time series can be… painful. Financial curves, ECG traces, neural signals: they often look like chaotic spikes with no structure at all.
In data science, we tend to rely on classical statistical preprocessing: seasonal decomposition, detrending, smoothing, moving averages… These techniques are useful, but they come with strong assumptions that are rarely valid in practice. And when those assumptions fail, your machine learning model may underperform or fail to generalize.
Today, we’ll explore a family of methods that are rarely taught in data-science training, yet they can completely transform how you work with time data.
On Today’s Menu 🍔
🍰 Why traditional methods struggle with real-world time series
🍛 How signal-processing tools can help
🍔 How Empirical Mode Decomposition (EMD) works and where it fails
The “classic” preprocessing techniques I mentioned above are good starting points, but as I said, they rely on fixed assumptions about how a signal should behave.
Most of them assume that the signal is stationary, meaning its statistical properties (mean, variance, spectral content) stay constant over time.
But in reality, most real signals are:
- non-stationary (their frequency content evolves; see the sketch after this list)
- non-linear (they cannot be explained by simple additive components)
- noisy
- mixed with multiple oscillations at once
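To see what “non-stationary” means in practice, here is a minimal sketch of a chirp: a single sinusoid whose frequency drifts over time, so no single period or frequency band can describe it. (This assumes NumPy, SciPy, and Matplotlib are installed.)
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import chirp
t = np.linspace(0, 2, 1000)
x = chirp(t, f0=2, t1=2, f1=20)  # Frequency sweeps from 2 Hz to 20 Hz over 2 s
plt.plot(t, x)
plt.title('Chirp: a textbook non-stationary signal')
plt.xlabel('Time (s)')
plt.show()
Any method that assumes a constant frequency (a single seasonal period, a fixed band-pass filter) will misrepresent a signal like this.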
So… what exactly is a “signal”?
A signal is simply any quantity that varies over time (what we usually call a time series in data science).
Some examples:
- ❤️ ECG or EEG — biomedical/brain signals
- 🌋 Seismic activity — geophysics
- 🖥️ CPU usage — system monitoring
- 💹 Stock prices, volatility, order flow — finance
- 🌦️ Temperature or humidity — climate science
- 🎧 Audio waveforms — speech & sound analysis
Figure 1: Example of Magnetoencephalography (MEG) signal data. (Image by author)
Signals are everywhere. And almost all of them violate the assumptions of classical time-series models.
They are rarely “clean.” What I mean is that a single signal is usually a mixture of several processes happening at the same time.
Inside one signal, you can often find:
- slow trends
- periodic oscillations
- short bursts
- random noise
- hidden rhythms you can’t see directly
👉 Now imagine you could separate all of these components — directly from the data — without assuming stationarity, without specifying frequency bands, and without forcing the signal into a predefined basis.
That’s the promise of data-driven signal decomposition.
This article is Part 1 of a 3-article series on adaptive decomposition:
- EMD — Empirical Mode Decomposition (today)
- VMD — Variational Mode Decomposition (Part 2)
- MVMD — Multivariate VMD (Part 3)
Each method is more powerful and more stable than the previous one — and by the end of the series, you’ll understand how signal-processing methods can extract clean, interpretable components.
Empirical Mode Decomposition
Empirical Mode Decomposition was introduced by Huang et al. (1998) as part of the Hilbert–Huang Transform. Its goal is simple but powerful: take a signal and split it into a set of clean oscillatory components, called Intrinsic Mode Functions (IMFs).
Each IMF corresponds to an oscillation present in your signal, from the fastest to the slowest trends.
Take a look at Figure 2 below: At the top, you see the original signal. Below it, you see several IMFs — each one capturing a different “layer” of oscillation hidden inside the data.
- IMF₁ contains the fastest variations
- IMF₂ captures a slightly slower rhythm
- …
- The last IMF + residual represent the slow trend or baseline
Some IMFs will be useful for your machine learning task; others may correspond to noise, artifacts, or irrelevant oscillations.
Figure 2: Original signal (top) and 5 IMFs (bottom), ordered from high-frequency to low-frequency components. (Image by author)
What is the Math behind EMD?
Any signal x(t) is decomposed by EMD as:

$$x(t) = \sum_{i=1}^{N} C_i(t) + r(t)$$
Where:
- Ci(t) are the Intrinsic Mode Functions (IMFs)
- IMF₁ captures the fastest oscillations
- IMF₂ captures a slower oscillation, and so on…
- r(t) is the residual — the slow trend or baseline
- Adding all IMFs + the residual reconstructs the original signal exactly.
An IMF is a clean oscillation obtained directly from the data. It must satisfy two simple properties:
- The number of zero crossings equals the number of extrema (or differs by at most one) → The oscillation is well-behaved.
- The mean of the upper and lower envelopes is approximately zero → The oscillation is locally symmetric, with no long-term information.
These two rules make IMFs fundamentally data-driven and adaptive, unlike Fourier or wavelets, which force the signal into predetermined shapes.
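For intuition, here is a rough numerical check of the first condition. The helpers count_zero_crossings and count_extrema are hypothetical names written for this sketch, not library functions, and sign-change counting is only approximate when samples land exactly on zero.
import numpy as np
def count_zero_crossings(x):
    # A zero crossing occurs wherever consecutive samples change sign
    return int(np.sum(np.diff(np.sign(x)) != 0))
def count_extrema(x):
    # Local maxima and minima: points where the first difference changes sign
    dx = np.diff(x)
    return int(np.sum(np.diff(np.sign(dx)) != 0))
# A pure 5 Hz sine over 1 s is a valid IMF: the two counts should match (±1)
imf_candidate = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 500, endpoint=False))
print(count_zero_crossings(imf_candidate), count_extrema(imf_candidate))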
The intuition behind the EMD Algorithm
The EMD algorithm is surprisingly intuitive. Here’s the extraction loop:
1. Start with your signal
2. Find all local maxima and minima
3. Interpolate them to form an upper and a lower envelope (see Figure 3)
4. Compute the mean of both envelopes
5. Subtract this mean from the signal → This gives you a “candidate IMF.”
6. Then check the two IMF conditions:
- Does it have the same number of zero crossings and extrema?
- Is the mean of its envelopes approximately zero?
If yes → You have extracted IMF₁. If no → You repeat the process (called sifting) until it meets the criteria.
7. Once you obtain IMF₁ (the fastest oscillation):
- You subtract it from the original signal,
- The remainder becomes the new signal,
- And you repeat the process to extract IMF₂, IMF₃, …
This continues until there is no meaningful oscillation left. What remains is the residual trend r(t).
Figure 3: One iteration of the EMD. Top: Original signal (blue). Middle: Upper and lower envelopes (red). Bottom: Local mean (black). (Image by author)
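To make the loop concrete, here is a minimal sketch of a single sifting iteration, assuming SciPy for extrema detection and cubic-spline envelopes. Real implementations (like PyEMD, used below) also handle boundary effects and degenerate cases that this sketch ignores.
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline
def sift_once(t, x):
    # Steps 2-3: locate local extrema and spline them into two envelopes
    max_idx = argrelextrema(x, np.greater)[0]
    min_idx = argrelextrema(x, np.less)[0]
    upper = CubicSpline(t[max_idx], x[max_idx])(t)
    lower = CubicSpline(t[min_idx], x[min_idx])(t)
    # Steps 4-5: subtract the local mean of the envelopes
    local_mean = (upper + lower) / 2
    return x - local_mean  # Candidate IMF, to be tested against the two conditions
In a full EMD, this function is applied repeatedly to its own output until the candidate satisfies the IMF conditions; the accepted IMF is then subtracted and the loop restarts on the remainder.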
EMD in Practice
To really understand how EMD works, let’s create our own synthetic signal.
We’ll mix three components:
- A low-frequency oscillation (around 5 Hz)
- A high-frequency oscillation (around 30 Hz)
- A bit of random white noise
Once everything is summed into one single messy signal, we’ll apply the EMD method.
import numpy as np
import matplotlib.pyplot as plt
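# Fix the random seed so the noise (and therefore the IMFs) are reproducible
np.random.seed(42)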
# --- Parameters ---
Fs = 500 # Sampling frequency (Hz)
t_end = 2 # Duration in seconds
N = Fs * t_end # Total number of samples
t = np.linspace(0, t_end, N, endpoint=False)
# --- Components ---
# 1. Low-frequency component (Alpha-band equivalent)
f1 = 5
s1 = 2 * np.sin(2 * np.pi * f1 * t)
# 2. High-frequency component (Gamma-band equivalent)
f2 = 30
s2 = 1.5 * np.sin(2 * np.pi * f2 * t)
# 3. White noise
noise = 0.5 * np.random.randn(N)
# --- Composite Signal ---
signal = s1 + s2 + noise
# Plot the synthetic signal
plt.figure(figsize=(12, 4))
plt.plot(t, signal)
plt.title(f'Synthetic Signal (Components at {f1} Hz and {f2} Hz)')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
plt.grid(True)
plt.tight_layout()
plt.show()
Figure 4: A Synthetic Signal Containing Multiple Frequencies. (Image by author)
An important detail:
EMD automatically chooses the number of IMFs. It keeps decomposing the signal until a stopping criterion is reached — typically when:
- no more oscillatory structure can be extracted
- or the residual becomes a monotonic trend
- or the sifting process stabilizes
(You can also set a maximum number of IMFs if needed, but the algorithm naturally stops on its own.)
from PyEMD import EMD
# Initialize EMD
emd = EMD()
IMFs = emd.emd(signal, max_imf=10)
# Plot Original Signal and IMFs
fig, axes = plt.subplots(IMFs.shape[0] + 1, 1, figsize=(10, 2 * IMFs.shape[0]))
fig.suptitle('EMD Decomposition Results', fontsize=14)
axes[0].plot(t, signal)
axes[0].set_title('Original Signal')
axes[0].set_xlim(t[0], t[-1])
axes[0].grid(True)
for n, imf in enumerate(IMFs):
    axes[n + 1].plot(t, imf, 'g')
    axes[n + 1].set_title(f"IMF {n+1}")
    axes[n + 1].set_xlim(t[0], t[-1])
    axes[n + 1].grid(True)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
Figure 5: EMD Decomposition of the Synthetic Signal. (Image by author)
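One quick sanity check: EMD is additive, so the components returned by PyEMD should sum back to the original signal (PyEMD returns the residue as the last row of the array; emd.get_imfs_and_residue() separates them if you prefer).
# EMD is exactly additive: all components sum back to the input signal
reconstruction = IMFs.sum(axis=0)
print(np.allclose(reconstruction, signal))  # Expected: True
# A simple preprocessing idea: drop IMF 1 (the fastest, often noise-dominated)
denoised = IMFs[1:].sum(axis=0)
This additivity is what makes EMD convenient for feature engineering: you can keep or discard individual IMFs and still work on the original time axis.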
EMD Limitations
EMD is powerful, but it has several weaknesses:
- Mode mixing: different frequencies can end up in the same IMF.
- Oversplitting: EMD decides the number of IMFs on its own and can extract too many.
- Noise sensitivity: small noise changes can completely alter the IMFs.
- No solid mathematical foundation: results are not guaranteed to be stable or unique.
Because of these limitations, several improved versions exist (EEMD, CEEMDAN), but they remain empirical.
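If you want to try one of those variants, PyEMD (the same package used above) also provides an EEMD implementation. A minimal sketch, reusing the signal from earlier:
from PyEMD import EEMD
# EEMD averages EMD runs over many noise-perturbed copies of the signal,
# which mitigates mode mixing at the cost of extra computation
eemd = EEMD(trials=100)  # Number of noise realizations to average over
eIMFs = eemd.eemd(signal, T=t)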
This is exactly why methods like VMD were created — and this is what we’ll explore in the next article of this series.