MIT 6.7220/15.084 — Nonlinear Optimization (Spring ‘25) Thu, Feb 20th 2025
Lecture 5
The driver of duality: Separation
Instructor: Prof. Gabriele Farina ( gfarina@mit.edu)★
In Lecture 3 we guessed the expression for the normal cone to the intersection of halfspaces.
We then saw that our guess, natural in its graphical intuition, quite surprisingly implied
strong linear programming duality almost immediately. In light of this, whatever proof
technique confirms our guessed expression for the normal cone rightfully deserves our
attention, as it must encode the very grain of duality.
As it turns out, the key idea behind the proof is the concept of separation.
L5.1 Separating a point from a closed convex set
An important property of any convex set Ω is that whenever a point 𝑦 is not in Ω, we
can separate 𝑦 from Ω using a hyperplane. In other words, flat separating surfaces are
enough to certify that a point 𝑦 ∉ Ω.
Theorem L5.1. Let Ω ⊆ ℝ𝑛 be a nonempty, closed, and convex set, and let 𝑦 ∈ ℝ𝑛 be a
point. If 𝑦 ∉ Ω, then there exist 𝑢 ∈ ℝ𝑛, 𝑣 ∈ ℝ such that
⟨𝑢, 𝑦⟩ < 𝑣, and ⟨𝑢, 𝑥⟩ ≥ 𝑣 ∀𝑥 ∈ Ω.
Proof. The proof of the result rests on a very simple idea: the separating hyperplane will
be chosen orthogonal to the segment that connects 𝑦 to its projection 𝑥∗ onto Ω, and it
will be positioned so that it passes through 𝑥∗. We now make the argument formal.
First, since Ω is nonempty and closed, a Euclidean projection 𝑥∗ of 𝑦 onto Ω exists,¹ as
we discussed in Lecture 1. In other words, the nonlinear optimization problem
[Figure: the point 𝑦, its projection 𝑥∗ onto Ω, and the separating direction 𝑢.]
min𝑥  ½‖𝑥 − 𝑦‖₂²   s.t.  𝑥 ∈ Ω
must have at least a solution 𝑥∗ ∈ Ω. Furthermore, since the objective function is
differentiable and Ω is convex, from the first-order optimality conditions (see Lecture 2)
we know that
⟨𝑥∗ − 𝑦, 𝑥 − 𝑥∗⟩ ≥ 0 ∀𝑥 ∈ Ω. (1)
Let now
𝑢 ≔ 𝑥∗ − 𝑦, [▷ this is the direction that connects 𝑦 to 𝑥∗]
and 𝑣 ≔ ⟨𝑢, 𝑥∗⟩. [▷ so that the halfspace boundary passes through 𝑥∗]
Note that 𝑢 ≠ 0, since 𝑥∗ ∈ Ω but 𝑦 ∉ Ω. So, ‖𝑢‖ > 0 and therefore
⟨𝑢, 𝑦⟩ = ⟨𝑢, 𝑥∗ − 𝑢⟩ = 𝑣 − ‖𝑢‖₂² < 𝑣.
Thus, to complete the proof, we now need to show that ⟨𝑢, 𝑥⟩ ≥ 𝑣 for all 𝑥 ∈ Ω. But this
is exactly what (1) guarantees, since 𝑢 = 𝑥∗ − 𝑦 and 𝑣 = ⟨𝑢, 𝑥∗⟩ by definition. □
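To make the construction in the proof concrete, here is a minimal Python sketch (an
illustration added to these notes, with hypothetical function names) for the special case
where Ω is a Euclidean ball, so that the projection 𝑥∗ has a closed form.

import numpy as np

def separate_from_ball(y, center, radius):
    # Separating hyperplane for the ball Omega = {x : ||x - center||_2 <= radius},
    # following the construction in the proof of Theorem L5.1.
    # Returns (u, v) with <u, y> < v and <u, x> >= v for all x in Omega,
    # or None if y already belongs to Omega.
    y, center = np.asarray(y, float), np.asarray(center, float)
    gap = y - center
    dist = np.linalg.norm(gap)
    if dist <= radius:
        return None                        # y is in Omega: no separation possible
    x_star = center + radius * gap / dist  # Euclidean projection of y onto the ball
    u = x_star - y                         # direction connecting y to its projection
    v = float(u @ x_star)                  # boundary passes through the projection
    return u, v

# Quick sanity check: points sampled inside the ball all satisfy <u, x> >= v.
rng = np.random.default_rng(0)
y = np.array([3.0, -4.0, 1.0])
u, v = separate_from_ball(y, center=np.zeros(3), radius=1.0)
samples = rng.normal(size=(1000, 3))
samples /= np.maximum(np.linalg.norm(samples, axis=1, keepdims=True), 1.0)
assert u @ y < v and np.all(samples @ u >= v - 1e-9)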
The result above might not seem like much. After all, the proof is pretty straightforward,
and the geometric intuition strong enough that one might be tempted to just take it for
granted. Instead, the consequences of the result are deep, far-reaching, and intimately tied
to some of the most significant breakthroughs in mathematical optimization theory.
Remark L5.1. The result of Theorem L5.1 holds even if we insist on only having strict
inequalities, that is ⟨𝑢, 𝑦⟩ < 𝑣 and ⟨𝑢, 𝑥⟩ > 𝑣 for all 𝑥 ∈ Ω. We can see this in two ways:
• Graphically, in the proof we could have chosen 𝑣 so that the halfspace would pass
through the midpoint of the line connecting 𝑦 and 𝑥∗.
• Algebraically, let 𝑢, 𝑣 be as in Theorem L5.1. We will show that we can always
perturb 𝑣 to make both inequalities hold strictly. The key is the observation that
⟨𝑢, 𝑦⟩ = ½⟨𝑢, 𝑦⟩ + ½⟨𝑢, 𝑦⟩ < ½(𝑣 + ⟨𝑢, 𝑦⟩),
⟨𝑢, 𝑥⟩ ≥ 𝑣 = ½(𝑣 + 𝑣) > ½(𝑣 + ⟨𝑢, 𝑦⟩)   ∀𝑥 ∈ Ω.
Hence, in both cases the scalar 𝑣′ ≔ ½(𝑣 + ⟨𝑢, 𝑦⟩) satisfies ⟨𝑢, 𝑦⟩ < 𝑣′ and
⟨𝑢, 𝑥⟩ > 𝑣′ for all 𝑥 ∈ Ω.
L5.1.1 Separating a point from a convex cone
Before we prove the expression for the normal cone at the intersection of halfspaces, we will
find it helpful to use the following corollary of separation for convex cones. A cone is a set
with the property that the ray {𝜆 ⋅ 𝑥 : 𝜆 ≥ 0} generated by any point 𝑥 in the set is fully
contained in the set.
Definition L5.1 (Cone). A set 𝑆 is a cone if, for any 𝑥 ∈ 𝑆 and 𝜆 ∈ ℝ≥0, the point 𝜆 ⋅
𝑥 ∈ 𝑆.
Convex cones are among the simplest convex sets, and they appear all the time in
optimization theory.² In particular, in the next theorem we show that separation of a point
from a nonempty closed convex cone can always be achieved using a hyperplane passing
through the origin.
Theorem L5.2. Let 𝑆 ⊆ ℝ𝑛 be a nonempty closed convex cone, and 𝑦 ∉ 𝑆 be a point in
ℝ𝑛. Then, there exists a hyperplane passing through the origin that separates 𝑦 from
𝑆; formally, there exists 𝑢 ∈ ℝ𝑛 such that
⟨𝑢, 𝑦⟩ < 0 and ⟨𝑢, 𝑥⟩ ≥ 0 ∀𝑥 ∈ 𝑆.
Proof. We already know from Theorem L5.1 that there exist 𝑢 ∈ ℝ𝑛, 𝑣 ∈ ℝ such that
⟨𝑢, 𝑦⟩ < 𝑣 and ⟨𝑢, 𝑥⟩ ≥ 𝑣 ∀𝑥 ∈ 𝑆. (2)
Consider any point 𝑎 ∈ 𝑆. By definition of cone, 𝜆 ⋅ 𝑎 ∈ 𝑆 for all 𝜆 ≥ 0. Thus, the
separation condition on the right in (2) implies that 𝑣 ≤ 𝜆 ⋅ ⟨𝑢, 𝑎⟩ for all 𝜆 ≥ 0. In
particular, by plugging in 𝜆 = 0, we find that 𝑣 ≤ 0, yielding ⟨𝑢, 𝑦⟩ < 𝑣 ≤ 0. Furthermore,
dividing by 𝜆 > 0 we find that
⟨𝑢, 𝑎⟩ ≥ 𝑣/𝜆   ∀𝜆 > 0   ⟹   ⟨𝑢, 𝑎⟩ ≥ lim_{𝜆→∞} 𝑣/𝜆 = 0.
Since 𝑎 ∈ 𝑆 was arbitrary, the statement follows. □
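As a concrete instance of conic separation (again an added illustration, with a hypothetical
function name), consider the nonnegative orthant: the projection of 𝑦 is obtained by
clamping negative entries to zero, and the resulting direction 𝑢 = 𝑥∗ − 𝑦 gives a hyperplane
through the origin.

import numpy as np

def separate_from_orthant(y):
    # Separate y from the cone S = {x : x >= 0} with a hyperplane through the origin.
    # The projection of y onto S is x* = max(y, 0), so u = x* - y = max(-y, 0).
    # Then <u, y> < 0 whenever y has a negative entry, while <u, x> >= 0 for all
    # x >= 0 since u >= 0 entrywise, exactly as in Theorem L5.2.
    y = np.asarray(y, float)
    if np.all(y >= 0):
        return None                      # y already belongs to the cone
    return np.maximum(-y, 0.0)

u = separate_from_orthant(np.array([1.0, -2.0, 0.5]))
assert u is not None and u @ np.array([1.0, -2.0, 0.5]) < 0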
L5.2 A second look at the normal cone of linear constraints
In Lecture 3, we considered normal cones at the intersection
of halfspaces. On that occasion, we drew a picture and
convinced ourselves that the normal cone at a point was given
by the conic hull of the normal directions of the active
halfspaces (see the figure on the right).
This led to the following guess, which was left unproven.
[Figure: the normal cone 𝒩Ω(𝑥) at a point 𝑥 of Ω, generated by the active constraint directions 𝑎1 and 𝑎2.]
Theorem L5.3. Let Ω ⊆ ℝ𝑛 be defined as the intersection of 𝑚 linear inequalities
Ω ≔ {𝑥 ∈ ℝ𝑛 : 𝐴𝑥 ≤ 𝑏},   where 𝐴 ∈ ℝ𝑚×𝑛 is the matrix whose rows are 𝑎⊤1 , ..., 𝑎⊤𝑚, and 𝑏 ∈ ℝ𝑚.
Given a point 𝑥 ∈ Ω, define the index set of the “active” constraints
𝐼(𝑥) ≔ {𝑗 ∈ {1, ..., 𝑚} : 𝑎⊤𝑗 𝑥 = 𝑏𝑗}.
Then, the normal cone at any 𝑥 ∈ Ω is given by
𝒩Ω(𝑥) = { ∑𝑗∈𝐼(𝑥) 𝜆𝑗𝑎𝑗 : 𝜆𝑗 ≥ 0 } = {𝐴⊤𝜆 : 𝜆⊤(𝑏 − 𝐴𝑥) = 0, 𝜆 ∈ ℝ𝑚≥0},
where the second equality rewrites the condition 𝑗 ∈ 𝐼(𝑥) via complementary slackness
(see Lecture 3).
We now give the proof of the above result. We will do so by invoking the machinery of
separation to argue that a direction outside of the claimed cone must form an acute angle
with at least one direction along which one can move from 𝑥 while remaining inside Ω.
Proof of Theorem L5.3. Fix any 𝑥 ∈ Ω and let
𝒞(𝑥) ≔ { ∑𝑗∈𝐼(𝑥) 𝜆𝑗𝑎𝑗 : 𝜆𝑗 ≥ 0 }.
We will show that 𝒩Ω(𝑥) = 𝒞(𝑥) by proving the two directions of inclusion separately.
• We start by showing that any 𝑑 ∈ 𝒞(𝑥) belongs to 𝒩Ω(𝑥), that is,
⟨𝑑, 𝑦 − 𝑥⟩ ≤ 0 for all 𝑦 ∈ Ω.
Let 𝑑 be expressed as ∑𝑗∈𝐼(𝑥) 𝜆𝑗𝑎𝑗 with 𝜆𝑗 ≥ 0. Then, for any 𝑦 ∈ Ω,
⟨ ∑𝑗∈𝐼(𝑥) 𝜆𝑗𝑎𝑗, 𝑦 − 𝑥⟩ = ∑𝑗∈𝐼(𝑥) 𝜆𝑗⟨𝑎𝑗, 𝑦 − 𝑥⟩
                       = ∑𝑗∈𝐼(𝑥) 𝜆𝑗(⟨𝑎𝑗, 𝑦⟩ − 𝑏𝑗)   (by definition of 𝐼(𝑥), ⟨𝑎𝑗, 𝑥⟩ = 𝑏𝑗)
                       ≤ ∑𝑗∈𝐼(𝑥) 𝜆𝑗(𝑏𝑗 − 𝑏𝑗) = 0.    (since 𝑦 ∈ Ω and 𝜆𝑗 ≥ 0)
This shows that 𝑑 ∈ 𝒩Ω(𝑥) and concludes the proof of this direction of the inclusion.
• We now look at the other direction. Take any 𝑑 ∉ 𝒞(𝑥). Since 𝒞(𝑥) is a nonempty
closed convex cone [▷ you should verify this claim], by the conic separation result
of Theorem L5.2, there must exist 𝑢 ∈ ℝ𝑛 such that
⟨𝑢, 𝑑⟩ < 0, and ⟨𝑢, 𝑎⟩ ≥ 0 ∀𝑎 ∈ 𝒞(𝑥). (3)
We argue that for 𝛿 > 0 small enough, the point 𝑦 ≔ 𝑥 − 𝛿 ⋅ 𝑢 belongs to Ω. We do
so by showing that it satisfies all the inequalities 𝑎⊤𝑗 𝑥 ≤ 𝑏𝑗 that define Ω:
‣ if 𝑗 ∈ 𝐼(𝑥), then ⟨𝑎𝑗, 𝑥 − 𝛿 ⋅ 𝑢⟩ = 𝑏𝑗 − 𝛿 ⋅ ⟨𝑎𝑗, 𝑢⟩ ≤ 𝑏𝑗 since ⟨𝑎𝑗, 𝑢⟩ ≥ 0 by (3).
‣ if 𝑗 ∉ 𝐼(𝑥), then 𝑏𝑗 − ⟨𝑎𝑗, 𝑥⟩ > 0. By continuity, small enough perturbations of
𝑥, in any direction, will not affect the strict inequality.
Thus, starting from 𝑥, moving by a small enough amount 𝛿 in the direction −𝑢 keeps
us inside Ω. We now argue that this displacement forms a strictly positive inner
product with 𝑑. Indeed, note that from (3)
⟨𝑑, 𝑦 − 𝑥⟩ = ⟨𝑑, −𝛿 ⋅ 𝑢⟩ = −𝛿 ⋅ ⟨𝑑, 𝑢⟩ > 0.
This shows that 𝑑 ∉ 𝒞(𝑥) ⟹ 𝑑 ∉ 𝒩Ω(𝑥), completing the proof. □
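Theorem L5.3 also gives a computational recipe: deciding whether a given direction 𝑑
belongs to 𝒩Ω(𝑥) amounts to a linear feasibility problem over the multipliers 𝜆𝑗. Below is
an illustrative Python sketch (not part of the lecture; it assumes SciPy's linprog is
available, and all names are hypothetical).

import numpy as np
from scipy.optimize import linprog

def in_normal_cone(A, b, x, d, tol=1e-8):
    # Check whether d lies in N_Omega(x) for Omega = {x : A x <= b}, using the
    # characterization of Theorem L5.3: d must be a conic combination of the
    # active constraint directions a_j, j in I(x).
    A, b = np.asarray(A, float), np.asarray(b, float)
    active = np.abs(A @ x - b) <= tol          # index set I(x)
    A_active = A[active]
    if A_active.shape[0] == 0:
        return np.linalg.norm(d) <= tol        # interior point: normal cone is {0}
    # Feasibility LP: find lambda >= 0 with A_active^T lambda = d (objective irrelevant)
    res = linprog(c=np.zeros(A_active.shape[0]),
                  A_eq=A_active.T, b_eq=d,
                  bounds=(0, None), method="highs")
    return res.status == 0

# Omega = nonnegative quadrant, written as -x <= 0; at the corner x = (0, 0)
A = -np.eye(2)
b = np.zeros(2)
print(in_normal_cone(A, b, x=np.zeros(2), d=np.array([-1.0, -0.5])))  # True
print(in_normal_cone(A, b, x=np.zeros(2), d=np.array([1.0, 0.0])))    # False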
L5.3 Separation oracles
The result established in Theorem L5.1 justifies the following definition.
Definition L5.2 ((Strong) separation oracle). Let Ω ⊆ ℝ𝑛 be convex and closed. A strong
separation oracle for Ω is an algorithm that, given any point 𝑦 ∈ ℝ𝑛, correctly outputs
one of the following:
• “𝑦 ∈ Ω”, or
• “(𝑦 ∉ Ω, 𝑢)”, where the vector 𝑢 ∈ ℝ𝑛 is such that
⟨𝑢, 𝑦⟩ < ⟨𝑢, 𝑥⟩ ∀𝑥 ∈ Ω.
L5.3.1 Finding separating hyperplanes in practice
Theorem L5.1 guarantees the existence of a separating hyperplane. In many problems of
interest, constructing a separation oracle is simple.
Example L5.1 (Separation oracle for a convex polytope). Let Ω be a convex polytope,
that is, the convex set defined by the intersection of a finite number of halfspaces (linear
inequalities)
Ω ≔ {𝑥 ∈ ℝ𝑛 : 𝐴𝑥 ≤ 𝑏},   where 𝐴 ∈ ℝ𝑚×𝑛 is the matrix whose rows are 𝑎⊤1 , ..., 𝑎⊤𝑚, and 𝑏 ∈ ℝ𝑚.
Then, given a point 𝑦 ∈ ℝ𝑛, we can implement a separation oracle as follows:
• if 𝐴𝑦 ≤ 𝑏, return “𝑦 ∈ Ω”;
• else, at least one of the inequalities 𝑎⊤𝑗 𝑦 ≤ 𝑏𝑗, 𝑗 ∈ {1, ..., 𝑚} is violated. In other
words, there exists 𝑗 such that 𝑎⊤𝑗 𝑦 > 𝑏𝑗, while by definition of Ω, 𝑎⊤𝑗 𝑥 ≤ 𝑏𝑗 for all
𝑥 ∈ Ω. This shows that “(𝑦 ∉ Ω, −𝑎𝑗)” is a valid response.
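A direct Python transcription of this oracle (an added illustration with hypothetical
names) could look as follows.

import numpy as np

def polytope_separation_oracle(A, b, y, tol=1e-9):
    # Strong separation oracle for Omega = {x : A x <= b} as in Example L5.1.
    # Returns None if y is in Omega; otherwise a vector u = -a_j (for a violated
    # row j) satisfying <u, y> < <u, x> for all x in Omega.
    A, b = np.asarray(A, float), np.asarray(b, float)
    violations = A @ y - b
    j = int(np.argmax(violations))
    if violations[j] <= tol:
        return None
    return -A[j]

# Omega = unit box [0, 1]^2, written as x <= 1 and -x <= 0
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.concatenate([np.ones(2), np.zeros(2)])
print(polytope_separation_oracle(A, b, np.array([0.5, 0.5])))  # None: the point is in Omega
print(polytope_separation_oracle(A, b, np.array([2.0, 0.5])))  # [-1. -0.]: separated by -a_1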
Remark L5.2. Example L5.1 shows that whenever we have a finite number 𝑚 of
inequalities, a separation oracle for the polytope defined by those inequalities can be
implemented in time that depends linearly on 𝑚 and the dimension of the embedding
space. This result establishes a blanket guarantee, but depending on the structure of the
inequalities, one can sometimes do better and get away with sublinear complexity in 𝑚.
In some cases, one might even be able to construct an efficient separation oracle for
feasible sets described by an infinite number of inequalities!
We proceed with another classic example of a feasible set that admits a simple separation
oracle.
Example L5.2 (Separation oracle for the semidefinite cone). Let Ω = {𝑀 ∈ ℝ𝑛×𝑛 : 𝑀 ⪰
0} be the set of positive semidefinite matrices, that is, all symmetric matrices such that
𝑣⊤𝑀𝑣 ≥ 0 for all 𝑣 ∈ ℝ𝑛—or, equivalently, such that all of 𝑀’s eigenvalues are nonnegative.
Then, given a point 𝑌 ∈ ℝ𝑛×𝑛, we can implement a separation oracle as follows:
• if 𝑌 is not symmetric, then there exist 𝑖, 𝑗 ∈ {1, ..., 𝑛} such that 𝑌𝑖𝑗 < 𝑌𝑗𝑖; return
“(𝑌 ∉ Ω, 𝐸𝑖𝑗 − 𝐸𝑗𝑖)”, where 𝐸𝑖𝑗 is the matrix of all zeros, except in position 𝑖, 𝑗
where it has a 1.
• else, if 𝑌 is symmetric, we can compute all of its eigenvalues and eigenvectors.
If one eigenvalue is negative, then the corresponding eigenvector 𝑤 must be such
that 𝑤⊤𝑌 𝑤 = ⟨𝑌 , 𝑤𝑤⊤⟩ < 0. Hence, return “(𝑌 ∉ Ω, 𝑤𝑤⊤)”.
• otherwise, return “𝑌 ∈ Ω”.
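A short Python sketch of this oracle (an added illustration; it uses NumPy's symmetric
eigendecomposition, and the names are hypothetical) is given below.

import numpy as np

def psd_separation_oracle(Y, tol=1e-9):
    # Strong separation oracle for the positive semidefinite cone (Example L5.2),
    # using the trace inner product <A, B> = trace(A^T B).
    # Returns None if Y is in the cone, else a matrix U with <U, Y> < 0 <= <U, X>
    # for every X in the cone.
    Y = np.asarray(Y, float)
    asym = Y - Y.T
    if np.max(np.abs(asym)) > tol:
        i, j = np.unravel_index(np.argmin(asym), asym.shape)   # entry with Y_ij < Y_ji
        U = np.zeros_like(Y)
        U[i, j], U[j, i] = 1.0, -1.0                           # U = E_ij - E_ji
        return U
    eigvals, eigvecs = np.linalg.eigh(Y)                       # Y is symmetric here
    if eigvals[0] < -tol:
        w = eigvecs[:, 0]                      # eigenvector of the most negative eigenvalue
        return np.outer(w, w)                  # <w w^T, Y> = w^T Y w < 0
    return None

Y = np.array([[1.0, 2.0], [2.0, 1.0]])         # symmetric but indefinite (eigenvalues -1, 3)
print(psd_separation_oracle(Y))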
As we show next, a fundamental result in optimization theory reveals that under mild
hypotheses, if the feasible set admits an efficient separation oracle and the objective function
is convex, then an (approximately) optimal solution can be computed efficiently.
L5.4 Optimization via separation
In a major breakthrough in mathematical optimization, Khachiyan, L. G. [Kha80] proposed
a polynomial-time algorithm that uses separation oracles to find the minimum of a linear
function. The algorithm, which goes under the name of ellipsoid method, is actually more
general, and applies to arbitrary convex objectives on sets for which separation oracles are
available. The result builds on top of previous work by Šor, N. Z. [Šor77] and Yudin, D. B.,
& Nemirovskii, A. S. [YN76].
In particular, Khachiyan’s result was the first to show that linear programming problems
can be solved in polynomial time. This was an unexpected result at the time; in fact, the
complexity of linear programming had been conjectured not to be polynomial (more on this
in the next section). The result of Khachiyan stirred so much enthusiasm in the research
community that the New York Times even featured it on its front page.
Despite the enthusiasm, the ellipsoid method turned out to be very impractical. Still, it is
a great theoretical idea, and its consequences are pervasive.
Figure 1: https://timesmachine.nytimes.com/timesmachine/1979/11/07/issue.html
L5.4.1 The intuition behind the ellipsoid method
Formalizing the details of the ellipsoid method is rather complex. A major source of difficulty
is the fact that the algorithm needs to approximate square roots using fractions to be
implementable on a finite-precision machine, and that requires all sorts of tricky analyses
to show that the approximation error can indeed be kept under control. These details are
certainly important, but are notoriously tedious, and fundamentally they are just that: details. If you
are curious to read a formal account, I recommend the authoritative book by Grötschel,
M., Lovász, L., & Schrijver, A. [GLS93]. For this lecture, we just focus on the idea behind
the ellipsoid method.
The idea behind the ellipsoid method is rather elegant. At its core, it is a generalization
of binary search from one dimension to multiple dimensions. At every iteration of the
algorithm, the space is “cut” by using a separating hyperplane.
■ Feasibility. To build intuition, ignore for now the objective function, and consider the
following problem: given a separation oracle for Ω (closed and convex), either find 𝑥 ∈ Ω,
or determine that Ω is empty. You are given two radii:
• the radius 𝑅 > 0 guarantees that if Ω is not empty, then Ω ∩ 𝔹𝑅(0) ≠ ∅;
• the radius 𝑟 > 0 guarantees that if Ω is not empty, then it contains a ball of radius 𝑟
in its interior.
If this problem were one-dimensional, then Ω would be either empty or an interval, and
a separation oracle would be an algorithm that, given any 𝑦 ∈ ℝ, would return whether
𝑦 ∈ Ω, or one of the statements “𝑦 is too small” / “𝑦 is too large”. Solving the problem
now appears easy: start from the interval [−𝑅, 𝑅], and perform a binary search using the
separation oracle to guide the search. Once the length of the search interval drops below
2𝑟 without the oracle ever having reported a feasible point, we can conclude that Ω is empty.
The ellipsoid method generalizes this idea to multiple dimensions. At every iteration, it
keeps track of a “search space” (the generalization of the search interval above). Then, it
queries the separation oracle for the center 𝑐𝑡 of this search space. If the point does not
belong to Ω and the separation oracle returns a separating direction 𝑢 ∈ ℝ𝑛, then the
search space is cut by keeping only its intersection with the halfspace
{𝑥 ∈ ℝ𝑛 : ⟨𝑢, 𝑥 − 𝑐𝑡⟩ ≥ 0}, which is guaranteed to contain Ω. The process continues until
the volume of the search space becomes smaller than that of a ball of radius 𝑟, at which
point we declare Ω empty. The reason why this method is called the “ellipsoid method” is
that the search space in the multi-dimensional case is not kept in the form of an interval,
but rather as an ellipsoid. This is mostly for computational reasons, since we need an
internal representation of the search domain that is convenient to update.
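To make the feasibility version concrete, here is a rough Python sketch (an added
illustration, assuming exact arithmetic and using the standard central-cut ellipsoid update;
the stopping rule and the box example are purely illustrative, and all names are
hypothetical).

import numpy as np

def ellipsoid_cut(c, P, g):
    # Smallest ellipsoid containing {x : (x - c)^T P^{-1} (x - c) <= 1} intersected
    # with the halfspace {x : <g, x - c> <= 0} (standard central-cut update).
    n = c.size
    gt = (P @ g) / np.sqrt(g @ P @ g)
    c_new = c - gt / (n + 1)
    P_new = (n ** 2 / (n ** 2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(gt, gt))
    return c_new, P_new

def ellipsoid_feasibility(oracle, n, R, r):
    # Feasibility-only ellipsoid method: `oracle(c)` returns None if c is in Omega,
    # else a direction u with <u, c> < <u, x> for all x in Omega, so that Omega is
    # contained in {x : <u, x - c> >= 0}.
    c, P = np.zeros(n), (R ** 2) * np.eye(n)         # E_1 = ball of radius R around 0
    max_iter = int(np.ceil(2 * n * (n + 1) * np.log(R / r))) + 10
    for _ in range(max_iter):
        u = oracle(c)
        if u is None:
            return c                                 # feasible point found
        c, P = ellipsoid_cut(c, P, -u)               # keep the side containing Omega
    return None                                      # volume too small: declare Omega empty

# Example: Omega = box [0.4, 0.6]^2 (contains a ball of radius 0.1)
def box_oracle(c, lo=0.4, hi=0.6):
    for i in range(c.size):
        if c[i] > hi:
            e = np.zeros(c.size); e[i] = -1.0; return e    # Omega lies in {x : x_i <= hi}
        if c[i] < lo:
            e = np.zeros(c.size); e[i] = 1.0; return e     # Omega lies in {x : x_i >= lo}
    return None

print(ellipsoid_feasibility(box_oracle, n=2, R=10.0, r=0.1))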
■ Incorporating the objective. The above idea can be extended to incorporate an objective
function 𝑓(𝑥). To do that, we will need to start cutting not only the search ellipsoid, but
also the feasible set, to make sure we end up at the optimum. In other words, you can think
of this extended ellipsoid method as having “two modes”: while it has not found a feasible
point in Ω, it cuts the search ellipsoid; then, once feasible points are found, it also cuts the
feasible set to exclude all points whose objective value exceeds the value at the current
center. A code sketch of the resulting procedure is given after the list below.
• Initialize at time 𝑡 = 1 with the starting center 𝑐1 ≔ 0 ∈ ℝ𝑛, starting ellipsoid ℰ1 ≔
𝔹𝑅(0), and starting feasible set Ω1 ≔ Ω.
• At each time 𝑡, we ask a separation oracle for Ω𝑡 whether the center 𝑐𝑡 ∈ ℝ𝑛 of the
search ellipsoid ℰ𝑡 belongs to Ω𝑡 or not.³ There are only two cases:
‣ If the center 𝑐𝑡 is not feasible, then set Ω𝑡+1 ≔ Ω𝑡, and cut the search space by
setting ℰ𝑡+1 to an ellipsoid that contains the intersection between ℰ𝑡 and the
halfspace containing Ω𝑡 returned by the separation oracle.
‣ If the center 𝑐𝑡 is feasible, then we know for sure that all points 𝑥 ∈ Ω𝑡 such that
⟨∇𝑓(𝑐𝑡), 𝑥 − 𝑐𝑡⟩ ≥ 0 are such that 𝑓(𝑥) ≥ 𝑓(𝑐𝑡). This follows trivially from the
linear lower bound property of convex functions (Theorem L4.1 of Lecture 4):
⟨∇𝑓(𝑐𝑡), 𝑥 − 𝑐𝑡⟩ ≥ 0 ⟹ 𝑓(𝑥) ≥ 𝑓(𝑐𝑡) + ⟨∇𝑓(𝑐𝑡), 𝑥 − 𝑐𝑡⟩ ≥ 𝑓(𝑐𝑡).
Hence, we can cut both the search ellipsoid ℰ𝑡 and the feasible set Ω𝑡 by considering
their intersection with the halfspace 𝐻𝑡 ≔ {𝑥 ∈ ℝ𝑛 : ⟨∇𝑓(𝑐𝑡), 𝑥 − 𝑐𝑡⟩ ≤ 0}.
In particular, we set Ω𝑡+1 ≔ Ω𝑡 ∩ 𝐻𝑡, and set ℰ𝑡+1 to a smaller ellipsoid that
contains ℰ𝑡 ∩ 𝐻𝑡.
‣ Finally, after the volume of the search ellipsoid has become sufficiently small (this
happens after 𝑇 = 𝑂(𝑛² log(𝑅/𝑟)) iterations), we output the following:
– If we never encountered a center 𝑐𝑡 that was feasible, then we report that Ω
was infeasible.
– Else, we output the 𝑐𝑡 that minimizes 𝑓, out of those that were feasible.
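The following rough Python sketch (an added illustration, again assuming exact arithmetic
and reusing ellipsoid_cut and box_oracle from the feasibility sketch above; the quadratic
objective is purely illustrative) implements the two modes described in the list.

import numpy as np

def ellipsoid_minimize(f, grad_f, oracle, n, R, T):
    # Two-mode ellipsoid method from Section L5.4.1: feasibility cuts come from the
    # separation oracle for Omega, objective cuts from the gradient of f at feasible
    # centers. Returns the best feasible center encountered (or None).
    c, P = np.zeros(n), (R ** 2) * np.eye(n)
    objective_cuts = []                   # stored halfspaces {x : <g, x - c0> <= 0}
    best_val, best_pt = np.inf, None
    for _ in range(T):
        u = oracle(c)
        if u is not None:
            g = -u                        # Omega_t lies in {x : <u, x - c> >= 0}
        else:
            g = next((gc for (gc, c0) in objective_cuts if gc @ (c - c0) > 0), None)
            if g is None:                 # c is feasible for Omega_t
                val = f(c)
                if val < best_val:
                    best_val, best_pt = val, c.copy()
                g = grad_f(c)             # objective cut: exclude {x : f(x) >= f(c)}
                if np.linalg.norm(g) <= 1e-12:
                    return c.copy()       # gradient ~ 0 at a feasible point
                objective_cuts.append((g.copy(), c.copy()))
        c, P = ellipsoid_cut(c, P, g)
    return best_pt

# Example: minimize ||x - (0.5, 0.5)||^2 over the box [0, 1]^2
target = np.array([0.5, 0.5])
x_best = ellipsoid_minimize(f=lambda x: float(np.sum((x - target) ** 2)),
                            grad_f=lambda x: 2.0 * (x - target),
                            oracle=lambda c: box_oracle(c, lo=0.0, hi=1.0),
                            n=2, R=4.0, T=200)
print(x_best)   # should be close to (0.5, 0.5)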
Assuming that we can ignore all sorts of tedious rounding issues, the following guarantee
can be shown [Gup20].
Theorem L5.4. Let 𝑅 and 𝑟 be as above, and let the range of the function 𝑓 on
Ω be bounded by [−𝐵, 𝐵]. Then, the ellipsoid method described above, run for 𝑇 ≥
2𝑛² log(𝑅/𝑟) steps, either correctly reports that Ω = ∅, or produces a point 𝑥∗ such that
𝑓(𝑥∗) ≤ 𝑓(𝑥) + (2𝐵𝑅/𝑟) exp(−𝑇/(2𝑛(𝑛 + 1)))   ∀𝑥 ∈ Ω.
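As a quick worked consequence of the bound (a rearrangement using only the quantities
appearing in Theorem L5.4): the additive error is at most a target accuracy 𝜀 > 0 as soon as
(2𝐵𝑅/𝑟) exp(−𝑇/(2𝑛(𝑛 + 1))) ≤ 𝜀   ⟺   𝑇 ≥ 2𝑛(𝑛 + 1) log(2𝐵𝑅/(𝑟𝜀)),
so the number of iterations needed grows only logarithmically in 𝐵, 𝑅, 1/𝑟, and 1/𝜀, and
quadratically in the dimension 𝑛.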
L5.4.2 Takeaway message: Separation implies optimization
If you squint your eyes, what the ellipsoid method proves constructively is the following:
if we know how to construct a separation oracle for a set Ω, then we can optimize over Ω.
Of course, this is a bit of a simplification (and there are all sorts of little conditions here
and there as we have seen above), but nonetheless it is a good first approximation of the
general message.
In a later lecture, we will discuss how the opposite direction is also known to be true, even
when by “optimization” we simply mean optimization of linear objective functions.
Further readings and bibliography
If you want to read more about the ellipsoid method, the book by Grötschel, M., Lovász, L.,
& Schrijver, A. [GLS93] is a standard and accessible reference on the topic. The bound on the
approximation error incurred by the ellipsoid method was taken from Gupta, A. [Gup20].
[HL01] Hiriart-Urruty, J.-B., & Lemaréchal, C. (2001). Fundamentals of Convex Analysis.
Springer. https://link.springer.com/book/10.1007/978-3-642-56468-0
[Kha80] Khachiyan, L. G. (1980). Polynomial algorithms in linear programming. USSR
Computational Mathematics and Mathematical Physics, 20(1), 53–72.
[Šor77] Šor, N. Z. (1977). Cut-off method with space extension in convex programming
problems. Cybernetics, 13(1), 94–96.
[YN76] Yudin, D. B., & Nemirovskii, A. S. (1976). Informational complexity and efficient
methods for the solution of convex extremal problems. Matekon, 13(2), 22–45.
[GLS93] Grötschel, M., Lovász, L., & Schrijver, A. (1993). Geometric Algorithms and
Combinatorial Optimization. Springer. https://link.springer.com/book/10.1007/
978-3-642-78240-4
[Gup20] Gupta, A. (2020). The Centroid and Ellipsoid Algorithms. https://www.cs.cmu.
edu/~15850/notes/lec21.pdf
Changelog
• Feb 20, 2025: Added Remark L5.1 and edited Definition L5.2.
• Feb 24, 2025: Changed compact → closed in Definition L5.2. (Thanks https://
piazza.com/class/m6lg9aspoutda/post/m7jzmkp7jyv6xd)
★These notes are class material that has not undergone formal peer review. The TAs and I are grateful
for any reports of typos.
¹In fact, it is easy to prove that the projection is unique (see strict convexity, Lecture 4). However, we
do not need uniqueness for the argument that follows.
²Hiriart-Urruty, J.-B., & Lemaréchal, C. [HL01], referring to convex cones, write: “they are important in
convex analysis (the “unilateral” realm of inequalities), just as subspaces are important in linear analysis
(the “bilateral” realm of equalities)”.
³There is a caveat here: technically, we are assuming as given a separation oracle for Ω, not Ω𝑡. Yet,
because Ω𝑡 is obtained from Ω by intersecting with halfspaces, it is easy to see that one separation oracle for
Ω𝑡 can be constructed efficiently starting from that for Ω and the description of the intersected hyperplanes.
Try working out the details!