Erdos-Szekeres Theorem

This post is dedicated to Erdos-Szekeres Theorem, which says:

ES Thm. (Monotone sequence)

For any positive integer $n$, any sequence of $n^2 + 1$ distinct real numbers contains a monotone subsequence of length at least $n+1$.

This means that within any sufficiently long sequence, there is either an increasing subsequence or a decreasing subsequence of a certain minimum length.

There’s another version of the theorem statement coming from the 1935 paper by Paul Erdos and George Szekeres¹, which concerns combinatorial geometry. It says:

ES Thm. (combinatorial geometry)

For every positive integer $n$ among every $N=\binom{2 n-4}{n-2}+1 \sim 4^{n} / \sqrt{n}$ points $p_1, \ldots, p_N \in \mathbb{R}^2$, where $p_i = (x_i , y_i)$ and $x_1 < x_2 < \ldots < x_N$, there is a convex configuration or a concave configuation of at least $n$ points.

This means that there are indices $i_1 < \ldots < i_n$ such that the slopes of the segments $p_{i_j} p_{i_{j+1}}, j = 1, \ldots, n-1$ are either all nondecreasing or all nonincreasing.

These two statements can be traced back to Ramsey’s theory.

Ramsey Thm.

For every $k,r,n \in \mathbb{N}$ there exists a number $N \in \mathbb{N}$ such that for every coloring of the $k$-tuples of the $[N]$ by $r$ colors there is a subset $T \subseteq [N]$ of size $n$ such that all $k$-tuples of elements of $T$ have the same color. The Ramsey number, $R^r_n (k)$ is the smallest such $N$.

I might have lost you here already. The connections between these three theorems are pretty vague to me at first. But let me make an analogy to clarify that ES theorem actually makes a special case of Ramsey theorem.

First, Ramsey’s theorem can be translated using graph theory language. Let’s focus on the case where we are coloring pairs of points, so $k=2$. The theorem then says that if you take a complete graph $K_N$ with a large enough number of vertices $N$, and you color every edge with one of $r$ colors, you are guaranteed to find a complete subgraph $K_n$ (a clique of size $n$) where all edges have the same color.

Let’s see how this applies to the Monotone Sequence Theorem.

Imagine there is a graph that connects the elements of our sequence of $N = n^2+1$ distinct numbers, ${x_1, x_2, \ldots, x_N}$.

Vertices: Let the vertices of a complete graph $K_N$ be the numbers $x_1, \ldots, x_N$.
Edges: The edges are all the pairs $(x_i, x_j)$ where $i < j$.
Colors: We will use two colors ($r=2$), let’s call them “blue” and “red”.
- Color the edge $(x_i, x_j)$ blue if $x_i < x_j$ (the sequence is “increasing” at this step).
- Color the edge $(x_i, x_j)$ red if $x_i > x_j$ (the sequence is “decreasing” at this step).

Since every pair of numbers is either increasing or decreasing, every edge in our complete graph $K_N$ is colored. Now, what does Ramsey’s Theorem promise us? It guarantees that there exists a monochromatic clique of a certain size. What is that in our context?

A blue clique of size $n+1$ is a subset of vertices ${x_{i_1}, x_{i_2}, \ldots, x_{i_{n+1}}}$ with $i_1 < i_2 < \ldots < i_{n+1}$ such that every edge between them is blue. This means for any pair $j < k$, the edge $(x_{i_j}, x_{i_k})$ is blue, which implies $x_{i_j} < x_{i_k}$. This is precisely an increasing subsequence of length $n+1$.
A red clique of size $n+1$ is a subset of vertices ${x_{i_1}, x_{i_2}, \ldots, x_{i_{n+1}}}$ with $i_1 < i_2 < \ldots < i_{n+1}$ such that every edge between them is red. This means for any pair $j < k$, the edge $(x_{i_j}, x_{i_k})$ is red, which implies $x_{i_j} > x_{i_k}$. This is precisely a decreasing subsequence of length $n+1$.

Ramsey’s Theorem for two colors, $R(n+1, n+1)$, gives us an upper bound on the sequence length $N$ needed to guarantee one of these structures. The Erdős-Szekeres Theorem provides a much tighter bound, $N = n^2 + 1$, but the underlying principle is the same: in a sufficiently large system, order must emerge.

A Simple Proof

While the connection to Ramsey Theory is fundamental, there is a wonderfully simple proof of the Monotone Sequence Theorem that feels almost like magic. It’s one of my all-time favorites.

Let’s take our sequence $X = (x_1, x_2, \ldots, x_{n^2+1})$. For each number $x_k$ in the sequence, let’s assign it a special label—a pair of integers $(a_k, b_k)$. We define them like this:

$a_k$ = the length of the longest increasing subsequence that ends with $x_k$.
$b_k$ = the length of the longest decreasing subsequence that ends with $x_k$.

Now, here’s the crucial insight: every single one of these labels is unique.

Let’s think about why this has to be true. Pick any two numbers from our sequence, say $x_i$ and $x_j$, where $x_i$ comes before $x_j$ (so $i < j$). Since all the numbers are distinct, there are only two possibilities:

If $x_i$ is smaller than $x_j$: We can take the longest increasing subsequence ending at $x_i$ (which we know has length $a_i$) and just tack $x_j$ onto the end. This gives us a new, longer increasing subsequence that ends at $x_j$. Its length is $a_i + 1$. This means the longest possible increasing subsequence ending at $x_j$ must be at least this long, so $a_j \ge a_i + 1$. Right away, we know $a_i$ and $a_j$ can’t be the same.
If $x_i$ is greater than $x_j$: We can do the same trick with the decreasing subsequences. Take the longest decreasing subsequence ending at $x_i$ (length $b_i$) and add $x_j$ to it. This creates a decreasing subsequence ending at $x_j$ of length $b_i + 1$. So, we must have $b_j \ge b_i + 1$, which means $b_i$ and $b_j$ can’t be the same.

In either scenario, the label $(a_i, b_i)$ is different from the label $(a_j, b_j)$. Since this applies to any pair of numbers in our sequence, it means all $n^2+1$ labels are unique.

This is where the argument comes together. Let’s assume, just for a moment, that the theorem is wrong and there is no monotone subsequence of length $n+1$. What would that mean for our labels?

It would mean that the length of any increasing subsequence is at most $n$, and the length of any decreasing subsequence is also at most $n$. This puts a very tight limit on the possible values in our labels: $$1 \le a_k \le n$$ $$1 \le b_k \le n$$

So, if our assumption is true, how many different labels could we possibly create? The first number, $a_k$, can be anything from 1 to $n$. The second number, $b_k$, can also be anything from 1 to $n$. This gives us a total of $n \times n = n^2$ possible unique labels.

But wait. We have $n^2+1$ numbers in our sequence, and we just proved that every single one of them generates a completely unique label. This leads to a logical contradiction: we have $n^2+1$ unique items, but we only have $n^2$ available slots for them to fit into. That’s simply not possible.

By contradiction, there must be at least one monotone subsequence of length $n+1$ or greater, hence the proof.

Erdös, P. and Szekeres, G., 1935. A combinatorial problem in geometry. Compositio mathematica, 2, pp.463-470. ↩︎