Market Structure | Active

Correlated Market Clustering (Spectral + K-Means)

A clustering pipeline that groups markets by how their daily belief changes move together, using correlation, spectral embedding, and k-means. This model is designed to surface resilient hedges with measurable statistical confidence, and it is already showing promise in live research.

Created 2/8/2025

Specification narrative

Detailed walk-through with math-ready notation.

Overview

Objective: Cluster markets based on correlated daily belief changes, not raw price levels. This makes clusters reflect how markets move together in response to news.

1. Start with daily probability closes

Each market has a daily close price $p_t$ between 0 and 1. We convert probabilities to log-odds to stabilize behavior near 0 and 1:

$\text{logit}(p_t) = \ln\left(\frac{p_t}{1 - p_t}\right)$

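To avoid infinities at $p_t = 0$ or $p_t = 1$, we clamp each probability before taking the logit, using a small $\varepsilon > 0$: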

$p_t = \min(\max(p_t, \varepsilon), 1 - \varepsilon)$
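A minimal sketch of the clamp and log-odds transform in Python. The value of `EPS` and the helper names `clamp` and `logit` are assumptions made for illustration; the specification does not state which $\varepsilon$ the pipeline uses.

```python
import math

EPS = 1e-6  # assumed clamp width; the specification does not state a value


def clamp(p, eps=EPS):
    """Keep a probability inside [eps, 1 - eps] so the logit stays finite."""
    return min(max(p, eps), 1.0 - eps)


def logit(p, eps=EPS):
    """Log-odds of a clamped probability."""
    q = clamp(p, eps)
    return math.log(q / (1.0 - q))


# Closes at exactly 0 or 1 would give infinite log-odds without the clamp.
closes = [0.0, 0.25, 0.5, 0.75, 1.0]
log_odds = [logit(p) for p in closes]
```

The transform is symmetric around 0.5, so `logit(0.25)` and `logit(0.75)` differ only in sign, and a close of exactly 0 or 1 maps to a large but finite value.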

2. Turn prices into daily returns

We cluster on changes, not levels:

$r_t = \text{logit}(p_t) - \text{logit}(p_{t-1})$

Interpretation: We measure how belief moved today, not whether belief is high or low.
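The return series can be sketched as a first difference of log-odds. The function name `logit_returns` and the default `eps` are illustrative assumptions, not names or values from the specification.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of a probability clamped to [eps, 1 - eps]."""
    q = min(max(p, eps), 1.0 - eps)
    return math.log(q / (1.0 - q))


def logit_returns(closes, eps=1e-6):
    """r_t = logit(p_t) - logit(p_{t-1}); one fewer element than the input."""
    z = [logit(p, eps) for p in closes]
    return [z[t] - z[t - 1] for t in range(1, len(z))]


# The same 5-point move counts for more near the extremes than near 0.5.
mid = logit_returns([0.50, 0.55])[0]
tail = logit_returns([0.90, 0.95])[0]
```

Here `tail` is larger than `mid`: in log-odds terms, moving from 0.90 to 0.95 is a bigger belief change than moving from 0.50 to 0.55.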

3. Compare markets by correlation of returns

For every pair of markets we compute Pearson correlation:

$C[i, j] = \text{corr}(r_i, r_j)$

$C[i, j] = 1$ means the two markets move in lockstep.
$C[i, j] = 0$ means no linear relationship.
$C[i, j] < 0$ means they move in opposite directions (hedge-like).

We keep only positive correlations for clustering.
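As an illustration, the pairwise Pearson matrix can be computed with NumPy's `np.corrcoef`. The synthetic `returns` matrix below (rows are days, columns are markets) is made-up data used only to show the shape of the computation.

```python
import numpy as np

# Hypothetical return matrix: rows are days, columns are markets.
rng = np.random.default_rng(0)
driver = rng.normal(size=250)
returns = np.column_stack([
    driver + 0.1 * rng.normal(size=250),   # market 0 follows the driver
    driver + 0.1 * rng.normal(size=250),   # market 1 follows the driver too
    -driver + 0.1 * rng.normal(size=250),  # market 2 moves against it
])

# rowvar=False tells NumPy that each column is one variable (one market).
C = np.corrcoef(returns, rowvar=False)
```

Markets 0 and 1 come out strongly positively correlated, while market 2 is strongly negatively correlated with both. A market whose returns are constant over the window has zero variance and yields NaN entries, so such markets would need to be filtered out beforehand.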

4. Build a similarity matrix

We convert correlation into a similarity matrix:

$S[i, j] = \max(C[i, j], 0)$

Negative correlations become 0 so they do not pull markets together.
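The thresholding step is a single element-wise maximum. A small sketch with a hand-written correlation matrix (the numbers are illustrative only):

```python
import numpy as np

C = np.array([
    [1.0, 0.8, -0.6],
    [0.8, 1.0, -0.5],
    [-0.6, -0.5, 1.0],
])

# Element-wise max(C[i, j], 0): negative correlations become zero weight.
S = np.maximum(C, 0.0)
```

After this step market 2 has no similarity to markets 0 and 1, so nothing pulls it toward their cluster.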

5. Spectral embedding

We treat $S$ as a network and embed it into low-dimensional coordinates:

Build a graph Laplacian from $S$
Compute eigenvectors
Use those eigenvectors as market coordinates

Markets that move together land close to one another.
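The three steps above can be sketched with NumPy. This is one possible reading, not necessarily the pipeline's implementation: it assumes the symmetric normalized Laplacian $L = I - D^{-1/2} S D^{-1/2}$, keeps the eigenvectors with the smallest eigenvalues, and row-normalizes the result. The source does not say which Laplacian variant or normalization is used, and `spectral_embedding` is a hypothetical helper name.

```python
import numpy as np


def spectral_embedding(S, n_components):
    """Embed nodes using eigenvectors of the symmetric normalized Laplacian.

    Assumption: L = I - D^{-1/2} S D^{-1/2}; the source does not say which
    Laplacian variant is used.
    """
    S = np.asarray(S, dtype=float)
    degree = S.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))  # guard isolated nodes
    L = np.eye(len(S)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, vecs = np.linalg.eigh(L)
    coords = vecs[:, :n_components]
    # Row-normalize so each market sits on the unit sphere
    # (in the style of Ng-Jordan-Weiss spectral clustering).
    norms = np.linalg.norm(coords, axis=1, keepdims=True)
    return coords / np.maximum(norms, 1e-12)


# Two blocks of markets that only correlate within their own block.
S = np.array([
    [1.0, 0.9, 0.0, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.8],
    [0.0, 0.0, 0.8, 1.0],
])
X = spectral_embedding(S, n_components=2)
```

In this toy case the two markets in each block receive the same coordinates, and the two blocks land on orthogonal directions, which is what makes the later k-means step easy.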

6. K-means clustering

Once markets are points, we cluster them with k-means to minimize within-cluster distance.
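For reference, a plain Lloyd's-algorithm k-means in NumPy is sketched below. The random initialization, the fixed `seed`, and the iteration cap are assumptions for illustration; the source does not describe which k-means implementation or initialization scheme the pipeline uses.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers, inertia)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster empties out
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and inertia (within-cluster sum of squared distances).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return labels, centers, inertia


# Two tight, well-separated groups of embedded points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels, centers, inertia = kmeans(X, k=2)
```

The returned `inertia` is the quantity k-means minimizes, and it is also what an elbow analysis compares across values of $k$.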

7. Choosing the number of clusters (elbow method)

We run k-means for $k = 2, 3, \ldots, K$, compute the inertia (within-cluster sum of squared distances) for each $k$, and select the elbow point where additional clusters stop meaningfully improving the fit. The elbow trace is stored for auditing and visualization.
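One way to locate the elbow automatically is to score each candidate $k$ by the discrete second difference of the inertia curve and take the largest. This criterion is an assumption: the source does not say how the elbow is chosen, and `pick_elbow` and the inertia values below are hypothetical.

```python
import numpy as np


def pick_elbow(ks, inertias):
    """Return the k where the inertia curve bends most sharply.

    Assumption: the bend is scored by the discrete second difference
    inertia[k-1] - 2*inertia[k] + inertia[k+1], with evenly spaced k.
    """
    ks = np.asarray(ks)
    inertias = np.asarray(inertias, dtype=float)
    if len(ks) < 3:
        raise ValueError("need at least three values of k to locate an elbow")
    curvature = inertias[:-2] - 2.0 * inertias[1:-1] + inertias[2:]
    # Endpoints have no second difference, so only interior k are candidates.
    return int(ks[1:-1][curvature.argmax()])


# Hypothetical elbow trace: big gains up to k = 4, small gains after.
ks = [2, 3, 4, 5, 6, 7]
inertias = [100.0, 60.0, 25.0, 22.0, 20.0, 19.0]
best_k = pick_elbow(ks, inertias)
```

With this trace the sharpest bend is at $k = 4$. Keeping the full `(ks, inertias)` trace alongside the chosen value matches the auditing use described above.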

What these clusters mean

A cluster is a group of markets that move together in belief changes. This often reflects shared information flow or macro drivers, not necessarily similar wording.

Summary

We cluster markets based on correlated daily belief changes using spectral embedding + k-means, so you can see groups of markets that react similarly to news and events.

Key formulas

These equations are referenced throughout the specification.

Log-odds transform

$\text{logit}(p_t) = \ln\left(\frac{p_t}{1 - p_t}\right)$

Clamped probability

$p_t = \min(\max(p_t, \varepsilon), 1 - \varepsilon)$

Return series

$r_t = \text{logit}(p_t) - \text{logit}(p_{t-1})$

Correlation matrix

$C[i, j] = \text{corr}(r_i, r_j)$

Similarity matrix

$S[i, j] = \max(C[i, j], 0)$