Moving data visualisation from a Python notebook to a web browser usually demands a painful compromise: you either pay for a heavy GPU backend or you force the user to wait while JavaScript struggles through iterative algorithms.
This article explores a third option: Sine Landmark Reduction (SLR).
SLR is a deterministic, linear-time alternative to t-SNE designed specifically for the browser. It bypasses the heavy optimisation loops of traditional methods by using trilateration against a fixed topological skeleton. The result? A method fast enough to power Thingbook’s DriftMind stack, capable of mapping 9,000 datapoints (at 50 dimensions) into 3D space in under two seconds.
We will cover:
- Why t-SNE/UMAP are a poor fit for the browser
- The idea of landmarks instead of all-pairs distances
- How to build a synthetic “sine skeleton” in high-D
- How linearised trilateration turns distances into coordinates
- Two important refinements: alpha scaling and distance warping
- A compact Python implementation of SLR you can experiment with today
Why Most Dimensionality-Reduction Techniques Fail in Resource-Limited Environments (Such as Your Browser)
Methods like t-SNE and UMAP are excellent for static, offline exploration, but they are poorly suited to rapid, iterative visual inspection. During exploratory analysis, many decisions depend on immediate feedback: insights that guide the next analytical step or help you assess the state of a complex dataset. When the underlying method cannot deliver results interactively because of memory or compute constraints, the entire exploratory workflow breaks down. Several factors explain this limitation:
- They rely on iterative optimisation (gradient descent style loops).
- They typically need to compare many or all points to each other, leading to O(N²) complexity.
- For 10,000 points, that’s on the order of 100 million pairwise interactions.
- In a browser, that means dropped frames, frozen UIs, or shipping the entire dataset to a backend.
For an interactive tool where users drag-and-drop a CSV and expect something to appear almost instantly, we need something closer to O(N): one pass over the points, not N passes.
This is the design target for SLR: linear time, deterministic, analytic, and simple enough to run in plain JavaScript or WebAssembly.
Landmarks Instead of All-Pairs
The core idea is simple: instead of comparing every point to every other point, compare every point to a small, fixed set of k landmarks. Imagine placing k = 100 “radio towers” in high-dimensional space, spread out so they cover it well. To embed a new point x, you don’t ask
“how far am I from every other point?”
You only ask:
“How far am I from each of these 100 towers?”
If we do this for N points, the work is O(N × k). With k fixed and relatively small, this is effectively O(N).
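To make the O(N × k) claim concrete, here is a minimal sketch (my own illustration, not part of the SLR implementation) of the only expensive step: computing the distances from N points to k landmarks in a single vectorised pass.

```python
import numpy as np

def distances_to_landmarks(X, landmarks):
    """Distances from N points to k landmarks.

    O(N * k * n) work, i.e. linear in N for fixed k and feature count n.
    """
    diff = X[:, None, :] - landmarks[None, :, :]  # shape: (N, k, n)
    return np.sqrt(np.sum(diff**2, axis=2))       # shape: (N, k)
```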
The key questions then become:
- How do we choose good landmarks, i.e. where do we place the towers?
- Given only distances to these landmarks, how do we reconstruct a low-dimensional coordinate?
SLR answers these with:
- A sine-based synthetic skeleton or data-derived skeleton
- A fast, analytic linearised trilateration step inspired by GPS localisation
Building a Synthetic Sine Skeleton
The first way to obtain landmarks is to invent them synthetically, without using any real data points. SLR defines a smooth path through the n-dimensional space using independent sine waves on each dimension:
γ(t) = ( a_1 sin(ω_1 t + φ_1), a_2 sin(ω_2 t + φ_2), …, a_n sin(ω_n t + φ_n) )
Each coordinate uses its own amplitude, frequency, and phase, drawn from simple uniform distributions. In code:
```python
import numpy as np

class SineLandmarkReduction:
    def __init__(self, n_components=2, n_landmarks=50,
                 random_state=42, synthetic_landmarks=False):
        self.n_components = n_components
        self.k = n_landmarks
        self.synthetic_landmarks = synthetic_landmarks
        self.rng = np.random.RandomState(random_state)

    def _gamma(self, t, a, omega, phi):
        """Synthetic sine path function γ(t)."""
        # result shape: (n_dims, n_landmarks)
        return a[:, None] * np.sin(omega[:, None] * t + phi[:, None])
```
To build the landmarks:
```python
a = self.rng.uniform(0.5, 2.0, n_features)
omega = self.rng.uniform(0.5, 1.5, n_features)
phi = self.rng.uniform(0, 2 * np.pi, n_features)
t = np.linspace(0, 2 * np.pi, self.k)
self.L_high = self._gamma(t, a, omega, phi).T  # shape: (k, n_features)
```
We then sample k points along this path to define the high-dimensional landmarks:
L_j = γ(t_j),  t_j = 2π · j / (k − 1),  j = 0, 1, …, k − 1
Because the curve winds smoothly through the space, the landmarks form a well-spread, continuous loop when projected to 3D subspaces (a small visualisation sketch appears at the end of this section).
Why sine waves?
- Extremely cheap to compute
- Deterministic given a random seed
- Naturally explore the space without clustering in arbitrary regions
This mode is ideal when you want a stable, model-driven skeleton that is independent of the dataset.
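To build intuition for the skeleton’s shape, the following sketch (my own illustration, assuming matplotlib and scikit-learn are available) generates a sine skeleton in 50 dimensions and projects it to 3D with PCA:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
n_features, k = 50, 100

# Same construction as _gamma above: one sine wave per dimension
a = rng.uniform(0.5, 2.0, n_features)
omega = rng.uniform(0.5, 1.5, n_features)
phi = rng.uniform(0, 2 * np.pi, n_features)
t = np.linspace(0, 2 * np.pi, k)
L_high = (a[:, None] * np.sin(omega[:, None] * t + phi[:, None])).T  # (k, 50)

# Project the skeleton to 3D to inspect its shape
L_3d = PCA(n_components=3).fit_transform(L_high)
ax = plt.figure().add_subplot(projection="3d")
ax.plot(L_3d[:, 0], L_3d[:, 1], L_3d[:, 2], "o-", markersize=3)
ax.set_title("Sine skeleton projected to 3D")
plt.show()
```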
Data-Derived Landmarks and Hybrid Normalisation
SLR also lets you skip the synthetic skeleton and extract the landmarks from the dataset itself, without losing the O(N) complexity. Essentially, this mode lets the landmarks adapt to the data. This is useful when the dataset has a strong cluster structure or an interesting topology that a synthetic curve might miss.
The trick is a hybrid normalisation strategy:
1. Raw selection
- Select landmark indices from the unnormalised data.
- High-variance features dominate cluster structure; staying in raw scale helps us capture that.
2. Normalised computation
- After selecting the landmarks, normalise both X and the selected landmarks (e.g. StandardScaler).
- All features then contribute fairly to Euclidean distances during trilateration.
In the implementation:
```python
if self.synthetic_landmarks:
    # synthetic branch (see previous section)
    ...
else:
    # 1. Select k landmark indices from raw data
    idx = self.rng.choice(n_samples, size=self.k, replace=False)
    self.L_high = X[idx].copy()  # raw landmarks

    # 2. Fit PCA skeleton on raw landmarks
    L_low_raw = pca.fit_transform(self.L_high)

    # 3. Normalise X and landmarks for distance computations
    X_scaled = scaler.fit_transform(X)
    self.L_high = scaler.transform(self.L_high)
```
We now have high-D landmarks L ∈ R^(k×n). To embed everything in e.g. 2D or 3D, we first map the landmarks themselves to a low dimension using PCA:
1. Centre the Data: Centre the landmark matrix L to obtain L̄.
2. Compute Covariance: Compute the covariance matrix Σ = (1 / (k − 1)) · L̄ᵀ · L̄
3. Form the Projection Matrix: Take the top m eigenvectors of Σ to form the projection matrix W_m.
4. Project to Low Dimensions: Obtain low-D landmark coordinates L′_raw = L̄ · W_m
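These four steps are exactly what a standard PCA performs. Here is a minimal numpy sketch (the function name pca_project is mine; the reference implementation simply calls sklearn’s PCA):

```python
import numpy as np

def pca_project(L, m):
    """Minimal sketch of the four PCA steps above for a (k, n) landmark matrix."""
    L_bar = L - L.mean(axis=0)                       # 1. centre
    Sigma = (L_bar.T @ L_bar) / (L.shape[0] - 1)     # 2. covariance
    eigvals, eigvecs = np.linalg.eigh(Sigma)         # 3. eigendecomposition
    W_m = eigvecs[:, np.argsort(eigvals)[::-1][:m]]  # top-m eigenvectors
    return L_bar @ W_m                               # 4. project to low-D
```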
Because the distance computations later are done in the normalised feature space, we match scales using an RMS ratio:
```python
rms_high = np.sqrt(np.mean([
    np.linalg.norm(self.L_high[i] - self.L_high[j])**2
    for i in range(self.k) for j in range(i + 1, self.k)
]))
rms_low = np.sqrt(np.mean([
    np.linalg.norm(L_low_raw[i] - L_low_raw[j])**2
    for i in range(self.k) for j in range(i + 1, self.k)
]))
self.L_low = L_low_raw * (rms_high / rms_low)
This ensures:
- The shape of the low-D skeleton matches the raw data geometry (for data-derived landmarks).
- The scale of the skeleton is compatible with the normalised distance metric.
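As a side note, the double loop above is O(k²) at the Python level, which is cheap for small k. If scipy is available, a vectorised equivalent avoids the loops entirely (a sketch; pdist returns all k(k − 1)/2 pairwise distances directly):

```python
import numpy as np
from scipy.spatial.distance import pdist

# Same RMS ratio as above, without Python-level loops
rms_high = np.sqrt(np.mean(pdist(self.L_high) ** 2))
rms_low = np.sqrt(np.mean(pdist(L_low_raw) ** 2))
self.L_low = L_low_raw * (rms_high / rms_low)
```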
Linearised Trilateration: The GPS Analogy
With landmarks defined both in high-D and low-D, the core embedding step becomes:
Given a point x, find a low-D point y whose distances to the low-D landmarks match the high-D distances from x to the high-D landmarks.
Let:
- L_j: The high-D landmarks (after scaling).
- L′_j: The low-D landmarks (self.L_low).
- δ²_j: The squared distance from x to landmark j.
We ideally want:
‖y − L′_j‖² = δ²_j  for every landmark j = 0, …, k − 1
This is a classic trilateration problem (think GPS). Naively, it is non-linear in y. SLR’s key trick is to linearise it.
Take the equation for landmark 0 and subtract it from each equation j:
‖y − L′_j‖² − ‖y − L′_0‖² = δ²_j − δ²_0
Expanding both norms and cancelling the quadratic yᵀy terms gives an equation that is linear in y:

2 (L′_j − L′_0)ᵀ y = ‖L′_j‖² − ‖L′_0‖² − (δ²_j − δ²_0)
Stacking these for j = 1…k-1 yields:
A y = b,  where row j of A is 2 (L′_j − L′_0)ᵀ and b_j = ‖L′_j‖² − ‖L′_0‖² − (δ²_j − δ²_0)
Where:
- A depends only on low-D landmarks.
- b depends on the measured distances δ²_j for a given x.
In code, we precompute A and its pseudoinverse once:

```python
# Precompute solver
self.L0_low = self.L_low[0]
self.A = 2 * (self.L_low[1:] - self.L0_low)  # shape: (k-1, m)
self.A_pinv = np.linalg.pinv(self.A)         # Moore–Penrose pseudoinverse
self.L_low_sq_norms = np.sum(self.L_low**2, axis=1)
```
Then, for a batch of points X:
```python
# Squared distances in high-D
diff = X_scaled[:, np.newaxis, :] - self.L_high[np.newaxis, :, :]
delta_sq = np.sum(diff**2, axis=2)  # shape: (n_samples, k)

# Right-hand side b for all points
term1 = self.L_low_sq_norms[1:] - self.L_low_sq_norms[0]  # shape: (k-1,)
term2 = delta_sq[:, 1:] - delta_sq[:, 0:1]                # broadcast
b = term1 - term2  # (n_samples, k-1)

# Initial low-D coordinates (first pass)
Y_raw = b @ self.A_pinv.T  # (n_samples, m)
```

Because A is fixed, embedding a new point is just:
- Compute k squared distances.
- Build b.
- One matrix multiplication.
This is exactly the type of workload that scales linearly with N and runs comfortably in the browser.
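This pattern also gives SLR a natural out-of-sample mapping. Here is a sketch of what such a method could look like (the name transform and its exact placement are my assumptions; the reference code in this article only shows fit_transform):

```python
def transform(self, X_new):
    """Sketch: embed unseen points with the precomputed solver.

    Assumes fit_transform has already run, so self.L_high, self.A_pinv
    and self.L_low_sq_norms exist, and that X_new is scaled consistently
    with the training data.
    """
    diff = X_new[:, np.newaxis, :] - self.L_high[np.newaxis, :, :]
    delta_sq = np.sum(diff**2, axis=2)                # k squared distances
    term1 = self.L_low_sq_norms[1:] - self.L_low_sq_norms[0]
    b = term1 - (delta_sq[:, 1:] - delta_sq[:, 0:1])  # build b
    return b @ self.A_pinv.T                          # one matrix multiply
```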
Alpha Refinement: Fixing Global Scale
In practice, we do a two-pass mapping. The first pass gives Y_raw, but some global scale mismatch can remain between high-D and low-D distances.
SLR introduces a global scalar α that best aligns the two:
α = ( Σ_i Σ_j d_low(i, j) · d_high(i, j) ) / ( Σ_i Σ_j d_high(i, j)² )

where d_low(i, j) is the distance from the first-pass coordinate Y_raw[i] to low-D landmark j, and d_high(i, j) is the corresponding high-D distance. This is the closed-form least-squares fit of d_low ≈ α · d_high.
In code:
```python
# Distances from first-pass coordinates to the low-D landmarks
diff_low = Y_raw[:, np.newaxis, :] - self.L_low[np.newaxis, :, :]
dist_low = np.linalg.norm(diff_low, axis=2)
dist_high = np.sqrt(delta_sq)

# Closed-form least-squares scale
alpha = np.sum(dist_low * dist_high) / np.sum(delta_sq)
```
We then rescale the high-D distances and solve again:
```python
delta_sq_corrected = (alpha**2) * delta_sq
term2_corr = delta_sq_corrected[:, 1:] - delta_sq_corrected[:, 0:1]
b_corr = term1 - term2_corr
Y_final = b_corr @ self.A_pinv.T
```
This simple, non-iterative correction significantly improves embedding quality while keeping the whole procedure analytic and fast.
Distance Warping: Tuning Locality vs Global Geometry
t-SNE is popular because it exaggerates local structure: clusters become very tight and well separated. Pure trilateration, on the other hand, preserves global geometry more faithfully.
SLR adds a knob to interpolate between these behaviours via distance warping:
δ_j → δ_j^p,  with 0 < p ≤ 1 (the warped distances replace the originals before trilateration)
- p = 1.0 → pure global geometry
- p ≈ 0.5 → stronger local neighbourhoods
- p ≈ 0.33 → visually similar separation to t-SNE
In the reference implementation, the warp is applied right after computing distances:
```python
# delta_sq shape: (k,)
delta_sq = np.sum((x - self.L_high)**2, axis=1)

# Nonlinear locality warp
delta = np.sqrt(delta_sq)  # Euclidean distances
p = 0.5                    # try 0.5; smaller p -> stronger locality
delta = delta ** p
delta_sq = delta ** 2
```
This gives you t-SNE-like cluster separation while preserving the deterministic, analytic nature of SLR.
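The snippet above operates on a single point x. In the vectorised pipeline the same warp can be applied to the whole (n_samples, k) matrix of squared distances; a one-line sketch (where it slots in, right after computing delta_sq and before building b, is my assumption):

```python
# Batched warp: (delta ** p) ** 2 == (delta ** 2) ** p,
# so warping squared distances is a single elementwise power.
p = 0.5
delta_sq = delta_sq ** p
```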
Putting It Together: The fit_transform Pipeline
The full fit_transform method ties everything together: scaling, landmark construction, PCA, trilateration, alpha refinement, and final embedding.
Here is a condensed view (non-essential boilerplate omitted):
```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def fit_transform(self, X):
    scaler = StandardScaler()
    n_samples, n_features = X.shape
    pca = PCA(n_components=self.n_components)

    if self.synthetic_landmarks:
        # Synthetic sine landmarks
        a = self.rng.uniform(0.5, 2.0, n_features)
        omega = self.rng.uniform(0.5, 1.5, n_features)
        phi = self.rng.uniform(0, 2 * np.pi, n_features)
        t = np.linspace(0, 2 * np.pi, self.k)
        self.L_high = self._gamma(t, a, omega, phi).T
        L_low_raw = pca.fit_transform(self.L_high)
        X_scaled = scaler.fit_transform(X)
    else:
        # Data-derived landmarks
        idx = self.rng.choice(n_samples, size=self.k, replace=False)
        self.L_high = X[idx].copy()
        L_low_raw = pca.fit_transform(self.L_high)
        X_scaled = scaler.fit_transform(X)
        self.L_high = scaler.transform(self.L_high)

    # RMS scaling of low-D skeleton
    ...  # (rms_high / rms_low scaling as shown earlier)

    # Precompute trilateration solver
    self.L0_low = self.L_low[0]
    self.A = 2 * (self.L_low[1:] - self.L0_low)
    self.A_pinv = np.linalg.pinv(self.A)
    self.L_low_sq_norms = np.sum(self.L_low**2, axis=1)

    # Vectorised first pass (Y_raw)
    diff = X_scaled[:, np.newaxis, :] - self.L_high[np.newaxis, :, :]
    delta_sq = np.sum(diff**2, axis=2)
    term1 = self.L_low_sq_norms[1:] - self.L_low_sq_norms[0]
    term2 = delta_sq[:, 1:] - delta_sq[:, 0:1]
    b = term1 - term2
    Y_raw = b @ self.A_pinv.T

    # Alpha refinement and second pass
    diff_low = Y_raw[:, np.newaxis, :] - self.L_low[np.newaxis, :, :]
    dist_low = np.linalg.norm(diff_low, axis=2)
    dist_high = np.sqrt(delta_sq)
    alpha = np.sum(dist_low * dist_high) / np.sum(delta_sq)
    delta_sq_corrected = (alpha**2) * delta_sq
    term2_corr = delta_sq_corrected[:, 1:] - delta_sq_corrected[:, 0:1]
    b_corr = term1 - term2_corr
    Y_final = b_corr @ self.A_pinv.T

    return Y_final
```
From the browser’s perspective, once the landmarks and pseudoinverse are precomputed, embedding new points is just distance computations + one matrix multiply.
Example: SLR vs t-SNE on a 5-Cluster Dataset
To evaluate SLR against a familiar baseline, consider 5 Gaussian clusters in a 20-dimensional space (5,000 points), embedded once with SLR (with an appropriate p) and once with t-SNE on the same data.
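If you want to reproduce the comparison yourself, here is a hedged timing sketch using scikit-learn’s TSNE (exact numbers depend on your machine and library versions):

```python
import time
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE

X, y = make_blobs(n_samples=5000, n_features=20, centers=5,
                  cluster_std=0.80, random_state=42)

t0 = time.perf_counter()
Y_slr = SineLandmarkReduction(n_components=2, n_landmarks=50,
                              random_state=42).fit_transform(X)
print(f"SLR:   {time.perf_counter() - t0:.2f}s")

t0 = time.perf_counter()
Y_tsne = TSNE(n_components=2, random_state=42).fit_transform(X)
print(f"t-SNE: {time.perf_counter() - t0:.2f}s")
```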
You get a flavour of the trade-off:
t-SNE
- Strong local separation
- Poor global interpretability
- Stochastic, run-to-run variability
- Iterative, quadratic, and slower
SLR
- Deterministic layouts
- Preserved global structure
- Tunable locality via p
- Linear time, analytic, out-of-sample mapping
A concise comparison:

| | t-SNE | SLR |
| --- | --- | --- |
| Determinism | Stochastic, run-to-run variability | Deterministic |
| Global structure | Poorly preserved | Preserved |
| Locality | Strong, fixed | Tunable via p |
| Complexity | Iterative, O(N²) | Analytic, O(N × k) |
| Out-of-sample points | No native mapping | Direct analytic mapping |
Minimal Usage Example
Here is a minimal example showing how to use SineLandmarkReduction on a synthetic dataset and plot the result:
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

X, y = make_blobs(
    n_samples=5000, n_features=20, centers=5,
    cluster_std=0.80, random_state=42
)

slr = SineLandmarkReduction(
    n_components=2, n_landmarks=50, random_state=42,
    synthetic_landmarks=False  # or True, depending on your use case
)
Y = slr.fit_transform(X)

plt.figure(figsize=(7, 6))
scatter = plt.scatter(Y[:, 0], Y[:, 1], c=y, s=8, alpha=0.8)
plt.xlabel("SLR Dimension 1")
plt.ylabel("SLR Dimension 2")
plt.title("Sine Landmark Reduction (SLR) Visualization")
plt.colorbar(scatter, label="Cluster Label")
plt.show()
```

Replace the synthetic dataset with your own feature matrix and you have a ready-to-use, deterministic embedding.
In a browser deployment, the same ideas can be ported to JavaScript or WebAssembly with minimal changes: the algorithm itself only needs basic linear algebra and a pseudoinverse.
Closing Thoughts
Sine Landmark Reduction (SLR) is designed from first principles for environments where:
- Latency matters (interactive drag-and-drop exploration).
- Resources are constrained (no GPU, no heavy backend).
- Reproducibility is a feature (deterministic embeddings across runs).
By:
- Constructing a synthetic or data-driven landmark skeleton,
- Projecting it via PCA with RMS scaling,
- Using linearised trilateration for an analytic solution,
- Adding alpha refinement for scale consistency, and
- Exposing distance warping as a control for locality,
SLR offers a fast, deterministic, and interpretable alternative to t-SNE/UMAP in browser-native contexts.
It is already powering Thingbook’s interactive Data Explorer, but the underlying idea is more general: if you can formalise your embedding problem in terms of distances to a small set of stable landmarks, you can achieve real-time dimensionality reduction with simple linear algebra.
If you want to experiment further, the Python implementation shown here can be dropped into your own notebooks or adapted to a JavaScript/WebAssembly stack to enable SLR directly in your web applications.