MIT 6.7220/15.084 — Nonlinear Optimization (Spring ‘25) Tue, Mar 4th 2025
Lecture 8
Lagrangian duality
Instructor: Prof. Gabriele Farina ( gfarina@mit.edu)★
We have known since Lecture 3, when we introduced convexity, that the first-order optimality
condition
−∇𝑓(𝑥) ∈ 𝒩Ω(𝑥)
is both necessary and sufficient for optimality in convex optimization problems—that is,
problems in which the objective function 𝑓 is convex, and the feasible set Ω is convex. With
the machinery of Lecture 7, we now know that if Ω is expressed via functional constraints,
and Slater’s condition holds (or all the constraints are linear), the KKT conditions are both
necessary and sufficient (and in fact, the normal cone to the linearization at each point
matches the normal cone to the original feasible set).
In other words, consider a convex optimization problem of the form
(𝑃) ≔  min_𝑥 𝑓(𝑥)   s.t.   𝑔𝑗(𝑥) ≤ 0, 𝑗 = 1, ..., 𝑟;   ℎ𝑖(𝑥) = 0, 𝑖 = 1, ..., 𝑠,
where 𝑓 : ℝ𝑛 → ℝ and each 𝑔𝑗 : ℝ𝑛 → ℝ are convex, and each ℎ𝑖 : ℝ𝑛 → ℝ is affine,
under the assumption that either all 𝑔𝑗 are affine, or that Slater’s condition holds at all
points, that is, there exists a point 𝑥0 such that 𝑔𝑗(𝑥0) < 0 for all 𝑗 = 1, ..., 𝑟. Then, we
know from Lecture 7 that the following two statements are equivalent:
A  The point 𝑥∗ is optimal for (P)
⟺
B  The point 𝑥∗ ∈ Ω admits 𝜆∗ ∈ ℝ𝑟≥0, 𝜇∗ ∈ ℝ𝑠 such that
−∇𝑓(𝑥∗) = ∑_{𝑗=1}^{𝑟} 𝜆∗𝑗 ∇𝑔𝑗(𝑥∗) + ∑_{𝑖=1}^{𝑠} 𝜇∗𝑖 ∇ℎ𝑖(𝑥∗),   𝜆∗𝑗 𝑔𝑗(𝑥∗) = 0 ∀𝑗 ∈ {1, ..., 𝑟}.
(As a reminder, the implication A ⟹ B is necessity of KKT conditions and comes from
constraint qualification. The reverse direction B ⟹ A is sufficiency of the KKT conditions
and requires convexity of 𝑓 and 𝑔𝑗 and the affinity of ℎ𝑖.)
In this lecture, we ponder the depth of this statement by interpreting it from three different
points of view. As usual, we will use the letter Ω to denote the feasible set of (P), that is,
Ω ≔ {𝑥 ∈ ℝ𝑛 : 𝑔𝑗(𝑥) ≤ 0, ℎ𝑖(𝑥) = 0 for all 𝑖 = 1, ..., 𝑠, 𝑗 = 1, ..., 𝑟}.
L8.1 Point of view #1: Certification of optimality
In a sense, the multipliers 𝜆∗, 𝜇∗ are nothing but a reflection of the expression for the normal
cone to the linearization of the feasible set: it just so happens that the normal cone to linear
constraints can be written as a linear combination of vectors, and 𝜆∗, 𝜇∗ are simply these
combination coefficients. However, an aspect worth paying attention to is the very different
conceptual flavor of the two statements A and B :
• The statement A is asserting that 𝑥∗ is better than all other points in the feasible
set. It is a “for all” kind of statement.
• The statement B is asserting that 𝑥∗ admits one set of multipliers 𝜆∗, 𝜇∗ with certain
properties that are easy to check. It is an “exists” kind of statement.
This equivalence implies, for example, that verifying whether a point is optimal is easy, at
least assuming that all gradients of all functions involved only use rational numbers and a
reasonable number of bits in their encoding. Indeed, if someone claims that 𝑥∗ is optimal, we
could ask them to show us the multipliers 𝜆∗, 𝜇∗. This ability to efficiently certify optimality
is not at all trivial.
In fact, assuming that all gradients of all functions involved only use rational numbers and
a reasonable number of bits in their encoding, we can go even further. Given a putative
optimal solution 𝑥∗, we could compute ∇𝑓(𝑥∗) and the gradients ∇𝑔𝑗(𝑥∗), ∇ℎ𝑖(𝑥∗) of the
active constraints, and determine whether suitable 𝜆∗ ∈ ℝ𝑟≥0, 𝜇∗ ∈ ℝ𝑠 exist by using a linear
program. This shows that we would not even need to ask for multipliers, but we could
instead come up with them ourselves.
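To make the certification idea concrete, here is a minimal sketch on a toy instance of my own choosing (the problem, the candidate point, and the tolerances are assumptions, not from the notes). With a single constraint, a least-squares solve plays the role of the linear program mentioned above:

```python
import numpy as np

# Hypothetical instance: min (x1-1)^2 + (x2-1)^2  s.t.  g(x) = x1 + x2 - 1 <= 0.
x_star = np.array([0.5, 0.5])          # candidate optimal point
grad_f = 2 * (x_star - 1.0)            # ∇f(x*) = (-1, -1)
grad_g = np.array([1.0, 1.0])          # ∇g(x*)
g_val = x_star.sum() - 1.0             # g(x*) = 0, so the constraint is active

# Solve -∇f(x*) = λ ∇g(x*) in the least-squares sense; with several
# constraints one would instead solve a small linear feasibility program.
lam, *_ = np.linalg.lstsq(grad_g.reshape(-1, 1), -grad_f, rcond=None)
lam = float(lam[0])

residual = np.linalg.norm(grad_f + lam * grad_g)
certified = (lam >= 0) and residual < 1e-9 and abs(lam * g_val) < 1e-9
print(lam, certified)   # λ* = 1.0 and the KKT certificate checks out
```

The same recipe extends to any number of constraints: stack the gradients of the active constraints as columns and check feasibility of the resulting linear system over 𝜆 ≥ 0, 𝜇 free.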
L8.2 Point of view #2: From constraints to penalizations
By rearranging the terms, the gradient equality in B can be written as
∇𝑓(𝑥∗) + ∑_{𝑗=1}^{𝑟} 𝜆∗𝑗 ∇𝑔𝑗(𝑥∗) + ∑_{𝑖=1}^{𝑠} 𝜇∗𝑖 ∇ℎ𝑖(𝑥∗) = 0.
It is evident that the left-hand side has the flavor of the gradient of a function. In particular,
it is the gradient with respect to 𝑥 and evaluated at the point (𝑥∗; 𝜆∗, 𝜇∗) of the function
ℒ : ℝ𝑛 × (ℝ𝑟≥0 × ℝ𝑠) → ℝ,   ℒ(𝑥; 𝜆, 𝜇) ≔ 𝑓(𝑥) + ∑_{𝑗=1}^{𝑟} 𝜆𝑗 𝑔𝑗(𝑥) + ∑_{𝑖=1}^{𝑠} 𝜇𝑖 ℎ𝑖(𝑥).
The function ℒ is called the Lagrangian function associated with problem (P).
Given any 𝜇 ∈ ℝ𝑠 and 𝜆 ∈ ℝ𝑟≥0, the function ℒ(𝑥; 𝜆, 𝜇) is a convex function of 𝑥 ∈ ℝ𝑛. Since
ℝ𝑛 is open and the Lagrangian is convex in 𝑥, we know from Lecture 3 that the condition
∇𝑥ℒ(𝑥; 𝜆, 𝜇) = 0 is both necessary and sufficient for (i.e., it is equivalent to) optimality of
𝑥 in the minimization of ℒ(𝑥; 𝜆, 𝜇) over ℝ𝑛.
In light of the above observation, we can add a third equivalent characterization C to our
block diagram with A and B above, as follows.
A  The point 𝑥∗ is optimal for (P)
⟺
B  The point 𝑥∗ ∈ Ω admits 𝜆∗ ∈ ℝ𝑟≥0, 𝜇∗ ∈ ℝ𝑠 such that
−∇𝑓(𝑥∗) = ∑_{𝑗=1}^{𝑟} 𝜆∗𝑗 ∇𝑔𝑗(𝑥∗) + ∑_{𝑖=1}^{𝑠} 𝜇∗𝑖 ∇ℎ𝑖(𝑥∗),   𝜆∗𝑗 𝑔𝑗(𝑥∗) = 0 ∀𝑗 ∈ {1, ..., 𝑟}.
⟺
C  There exist penalization coefficients 𝜆∗ ∈ ℝ𝑟≥0, 𝜇∗ ∈ ℝ𝑠 such that 𝑥∗ is a minimizer of ℒ(𝑥; 𝜆∗, 𝜇∗) over ℝ𝑛, 𝑥∗ ∈ Ω, and 𝜆∗𝑗 𝑔𝑗(𝑥∗) = 0 ∀𝑗 ∈ {1, ..., 𝑟}.
To appreciate why C is itself worthy of attention, we need to discuss the interpretation of
the Lagrangian function.
L8.2.1 Interpretation of the Lagrangian function
The Lagrangian function is defined not on the feasible set Ω, but on the entire space ℝ𝑛. It
can be thought of as a relaxation of the original problem, in which, instead of constraining
𝑥 to satisfy all constraints 𝑔𝑗(𝑥) ≤ 0 and ℎ𝑖(𝑥) = 0, it simply penalizes those 𝑥 that do not
satisfy the constraints by adding in the objective a penalty term 𝜆𝑗𝑔𝑗(𝑥) for each constraint
𝑔𝑗(𝑥) ≤ 0 and a penalty term 𝜇𝑖ℎ𝑖(𝑥) for each equality constraint. The coefficients 𝜆 ∈ ℝ𝑟≥0, 𝜇 ∈ ℝ𝑠 control the magnitude of the penalization.
To build intuition, let’s consider two extremes: zero penalization and infinite penalization.
When the amount of penalization is zero, the Lagrangian function is simply the objective
function 𝑓 considered as a function on ℝ𝑛, that is, ignoring all constraints. In this case, it
is clear that the infimum of ℒ(𝑥; 0, 0) over ℝ𝑛 is a lower bound on value(𝑃), and in general
we expect that if a minimum exists, it might very well lie outside of the feasible set Ω
of (P). As we increase the penalization, the Lagrangian function becomes more and more
sensitive to the constraints, and the minimum of ℒ moves towards the feasible set Ω. A
key observation is that, since Slater's condition holds, as 𝜆 → +∞ the minimum of
ℒ is eventually attained at points that satisfy the inequality constraints strictly.
In other words, as the penalization coefficients 𝜆, 𝜇 are varied, the Lagrangian ℒ(𝑥; 𝜆, 𝜇)
interpolates between two different priorities: minimizing the original objective function 𝑓,
at the cost of potentially violating the constrained set Ω; and finding any point in the
(relative) interior of the feasible set Ω, at the cost of ignoring the objective function 𝑓(𝑥).
Example L8.1. As a concrete example, consider the optimization problem
min_𝑥 𝑓(𝑥) ≔ (𝑥1 − 2)² + (𝑥2 + 1)²   s.t.   𝑔(𝑥) ≔ 𝑥1⁴ + (𝑥1 − 𝑥2)² − (𝑥1 + 1) ≤ 0,   𝑥 ∈ ℝ².
This problem is convex and satisfies Slater's condition [▷ You should check this!]. The figure below shows the trajectory of the minimizers of the Lagrangian functions
arg min_{𝑥∈ℝ𝑛} {ℒ(𝑥, 𝜆) ≔ 𝑓(𝑥) + 𝜆𝑔(𝑥)}
as the penalization coefficient 𝜆 ≥ 0 varies from 0 to +∞.
[Figure: trajectory of the minimizers of ℒ(𝑥, 𝜆) in the (𝑥1, 𝑥2) plane, from the unconstrained minimizer at 𝜆 = 0 into the feasible set Ω as 𝜆 → +∞.]
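The trajectory in the figure can be reproduced numerically. The sketch below (step size, iteration budget, and the two sampled values of 𝜆 are my own choices, not from the notes) minimizes ℒ(·, 𝜆) for Example L8.1 by plain gradient descent, which suffices here since ℒ(·, 𝜆) is strongly convex:

```python
import numpy as np

def grad_L(x, lam):
    # Gradient of the Lagrangian of Example L8.1 at (x, lam).
    x1, x2 = x
    gf = np.array([2 * (x1 - 2), 2 * (x2 + 1)])                     # ∇f
    gg = np.array([4 * x1**3 + 2 * (x1 - x2) - 1, -2 * (x1 - x2)])  # ∇g
    return gf + lam * gg

def minimize_lagrangian(lam, steps=60_000, eta=3e-4):
    x = np.zeros(2)                  # plain gradient descent; converges since
    for _ in range(steps):           # ℒ(·, λ) is strongly convex
        x -= eta * grad_L(x, lam)
    return x

g = lambda x: x[0]**4 + (x[0] - x[1])**2 - (x[0] + 1)

x0 = minimize_lagrangian(0.0)    # ≈ (2, -1): unconstrained minimizer, infeasible
x50 = minimize_lagrangian(50.0)  # large penalty: pushed strictly inside Ω
print(x0, g(x0) > 0, g(x50) < 0)
```

Sweeping 𝜆 over a grid and collecting the minimizers traces the curve from the figure.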
L8.2.2 Existence of finite penalization coefficients
Given the tension between the two priorities encoded by the Lagrangian function and
discussed above, it is natural to ask:
Is there an intermediate choice of penalization coefficients 𝜇, 𝜆 that balances between the
two priorities in a way that makes minimizing the Lagrangian equivalent to solving the
original problem (P)?
This is the content of the following theorem and the meaning of C .
Theorem L8.1. If the constrained problem (P) has a minimizer and attains optimal value value(𝑃), there exist finite penalization coefficients (𝜆∗, 𝜇∗) ∈ ℝ𝑟≥0 × ℝ𝑠 such that:
1. any optimal solution 𝑥∗ of the original problem is a minimizer of ℒ(𝑥; 𝜆∗, 𝜇∗); and
2. the value of min𝑥∈ℝ𝑛 ℒ(𝑥; 𝜆∗, 𝜇∗) is exactly equal to value(𝑃 ).
Furthermore,
3. if 𝑥∗ ∈ ℝ𝑛 is a minimizer of ℒ(𝑥; 𝜆∗, 𝜇∗), and it satisfies 𝑥∗ ∈ Ω and 𝜆∗𝑗 𝑔𝑗(𝑥∗) = 0 for all 𝑗 = 1, ..., 𝑟, then it is also an optimal solution of (P).
Proof.
1. This is immediate from the implication A ⟹ C .
2. Let 𝑥∗ be any minimizer of (P). From the implication A ⟹ C , we know that there exist coefficients 𝜆∗ ∈ ℝ𝑟≥0, 𝜇∗ ∈ ℝ𝑠 such that 𝑥∗ is a minimizer of the penalized objective function ℝ𝑛 ∋ 𝑥 ↦ ℒ(𝑥; 𝜆∗, 𝜇∗), and 𝜆∗𝑗 𝑔𝑗(𝑥∗) = 0 for all 𝑗 = 1, ..., 𝑟. Hence,
min_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆∗, 𝜇∗) = ℒ(𝑥∗; 𝜆∗, 𝜇∗) = 𝑓(𝑥∗) + ∑_{𝑗=1}^{𝑟} 𝜆∗𝑗 𝑔𝑗(𝑥∗) + ∑_{𝑖=1}^{𝑠} 𝜇∗𝑖 ℎ𝑖(𝑥∗) = 𝑓(𝑥∗),
where in the last equality we used the fact that ℎ𝑖(𝑥∗) = 0 (since 𝑥∗ ∈ Ω), and the complementary slackness conditions 𝜆∗𝑗 𝑔𝑗(𝑥∗) = 0.
3. This is nothing but the implication C ⟹ A . □
Remark L8.1. Theorem L8.1 asserts that if the Lagrangian is set up with suitable penal-
ization coefficients 𝜆∗, 𝜇∗, the optimal value of the constrained problem (P) matches
the optimal value of the Lagrangian ℝ𝑛 ∋ 𝑥 ↦ ℒ(𝑥; 𝜆∗, 𝜇∗). Furthermore, all optimal
points of the original problem (P) are minimizers for the Lagrangian.
However, when multiple optimal solutions exist for the Lagrangian, it is possible that
not all of them are optimal for the original problem. As a simple example, consider the
linear program
min_𝑥 2𝑥2   s.t.   −𝑥1 − 𝑥2 ≤ 0,   𝑥1 − 𝑥2 ≤ 0   ⟶   ℒ(𝑥; 𝜆) = 2𝑥2 + 𝜆1(−𝑥1 − 𝑥2) + 𝜆2(𝑥1 − 𝑥2).
The point 𝑥∗ = (0, 0) is optimal for the problem, and achieves the optimal value 0 of
the objective. It admits the unique choice of Lagrange multipliers 𝜆∗ = (1, 1). For that
choice, however, the Lagrangian becomes
ℒ(𝑥; 𝜆∗) = 0.
Indeed, the value of the linear program is 0, but the Lagrangian has infinitely many minimizers, and so we cannot extract the optimal solution 𝑥∗ just by looking at the minimizers of ℒ.
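A quick numeric check of this remark (the random sample is my own device): with 𝜆∗ = (1, 1), the penalty terms cancel the objective identically, so every 𝑥 ∈ ℝ² minimizes the Lagrangian even though only 𝑥∗ = (0, 0) solves the LP:

```python
import numpy as np

rng = np.random.default_rng(0)

def L(x, lam):
    # Lagrangian of the remark's LP: objective 2*x2, constraints
    # -x1 - x2 <= 0 and x1 - x2 <= 0.
    return 2 * x[1] + lam[0] * (-x[0] - x[1]) + lam[1] * (x[0] - x[1])

lam_star = np.array([1.0, 1.0])
xs = rng.normal(size=(1000, 2))
vals = np.array([L(x, lam_star) for x in xs])
print(np.abs(vals).max())   # ≈ 0 up to rounding: ℒ(·; λ*) vanishes identically
```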
L8.3 Point of view #3: A natural dual problem
Theorem L8.1 guarantees that if (P) admits an optimal solution, then there exist penal-
ization coefficients 𝜆∗, 𝜇∗ such that min𝑥∈ℝ𝑛 ℒ(𝑥; 𝜆∗, 𝜇∗) is exactly equal to the value of the
original problem. This shows that
sup_{𝜆∈ℝ𝑟≥0, 𝜇∈ℝ𝑠} inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆, 𝜇) ≥ value(𝑃).
As it turns out, the reverse inequality is also true, and in fact it holds much more generally.
Theorem L8.2 (Weak duality). For any choice of penalization coefficients 𝜆 ∈ ℝ𝑟≥0, 𝜇 ∈ ℝ𝑠,
inf_𝑥 ℒ(𝑥; 𝜆, 𝜇) ≤ 𝑓(𝑥̃)   ∀𝑥̃ ∈ Ω.
In fact, the inequality holds for any minimization problem with functional constraints, that is, even ignoring the requirement that 𝑔𝑗 is convex, that ℎ𝑖 is affine, and any constraint qualification. As a direct consequence, if (P) admits an optimal solution, then
inf_𝑥 ℒ(𝑥; 𝜆, 𝜇) ≤ value(𝑃).
Proof. Let 𝜆 ∈ ℝ𝑟≥0, 𝜇 ∈ ℝ𝑠 be arbitrary, and let 𝑥̃ be any point feasible for (P). Clearly,
inf_𝑥 ℒ(𝑥; 𝜆, 𝜇) ≤ ℒ(𝑥̃; 𝜆, 𝜇) = 𝑓(𝑥̃) + ∑_{𝑗=1}^{𝑟} 𝜆𝑗 𝑔𝑗(𝑥̃) + ∑_{𝑖=1}^{𝑠} 𝜇𝑖 ℎ𝑖(𝑥̃) ≤ 𝑓(𝑥̃),
where we used the fact that ℎ𝑖(𝑥̃) = 0 and 𝑔𝑗(𝑥̃) ≤ 0 for all 𝑖 ∈ {1, ..., 𝑠} and 𝑗 ∈ {1, ..., 𝑟} since 𝑥̃ is feasible for (P). □
The weak duality theorem (Theorem L8.2) shows that minimizing the Lagrangian ℒ with
respect to 𝑥 with any choice of penalization coefficients yields a lower bound on the value
of (P); this fact is sometimes useful to come up with bounds for the optimal value of a
problem.
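As a small illustration on an instance of my own (not from the notes): for min 𝑥² subject to 1 − 𝑥 ≤ 0, whose value is 1, the dual function 𝜆 ↦ inf_𝑥 {𝑥² + 𝜆(1 − 𝑥)} = 𝜆 − 𝜆²/4 is a valid lower bound for every 𝜆 ≥ 0 and is tight at 𝜆 = 2:

```python
# Hypothetical instance: min x^2  s.t.  g(x) = 1 - x <= 0, so value(P) = 1.
# The Lagrangian x^2 + lam*(1 - x) is minimized at x = lam/2, giving the
# dual function q(lam) = lam - lam**2 / 4.
def dual(lam):
    x = lam / 2.0                     # unconstrained minimizer of the Lagrangian
    return x**2 + lam * (1.0 - x)     # = lam - lam^2/4

value_P = 1.0
bounds = [dual(lam) for lam in (0.0, 0.5, 1.0, 2.0, 3.0)]
print(bounds)   # every entry is <= 1, and dual(2.0) is exactly 1.0 (tight)
```

Any single evaluation of the dual function already certifies a bound; maximizing over 𝜆 simply produces the best such certificate.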
When taken together, Theorem L8.1 and Theorem L8.2 imply the following corollary.
Corollary L8.1 (Strong duality). If (P) admits a minimizer, the optimization problem
(𝐷) ≔  max_{𝜆,𝜇} inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆, 𝜇)   s.t.   𝜆 ∈ ℝ𝑟≥0, 𝜇 ∈ ℝ𝑠
admits an optimal solution 𝜆∗, 𝜇∗, and matches the value of the original problem (P).
The problem (D) is called the (Lagrangian) dual problem of (P).
Example L8.2. The dual problem corresponding to the constrained optimization problem
in Example L8.1 is by definition
max_𝜆  inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆)   s.t.   𝜆 ≥ 0.
The plot below shows the dual objective inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆).
[Figure: the dual objective inf_𝑥 ℒ(𝑥, 𝜆), a concave function of 𝜆, plotted for 𝜆 ∈ [0, 4].]
As a final remark, we point out that whenever (P) has a solution, the value at optimality
satisfies
value(𝑃) = min_{𝑥∈ℝ𝑛} sup_{𝜆∈ℝ𝑟≥0, 𝜇∈ℝ𝑠} ℒ(𝑥; 𝜆, 𝜇),
since it is immediate to check that, given 𝑥,
sup_{𝜆∈ℝ𝑟≥0, 𝜇∈ℝ𝑠} ℒ(𝑥; 𝜆, 𝜇) = 𝑓(𝑥) if ℎ𝑖(𝑥) = 0 and 𝑔𝑗(𝑥) ≤ 0 for all 𝑖, 𝑗, and +∞ otherwise.
Hence, the strong duality theorem (Corollary L8.1) can be quite succinctly stated as follows:
if (P) has an optimal solution, then
max_{𝜆∈ℝ𝑟≥0, 𝜇∈ℝ𝑠} inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆, 𝜇) = min_{𝑥∈ℝ𝑛} sup_{𝜆∈ℝ𝑟≥0, 𝜇∈ℝ𝑠} ℒ(𝑥; 𝜆, 𝜇),
where the left-hand side is value(𝐷) and the right-hand side is value(𝑃).
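The case distinction for the inner supremum can be sanity-checked on the constraint of Example L8.1 (the finite grid of multipliers is my own device; the true supremum ranges over all 𝜆 ≥ 0):

```python
def f(x1, x2):
    return (x1 - 2)**2 + (x2 + 1)**2

def g(x1, x2):
    return x1**4 + (x1 - x2)**2 - (x1 + 1)

def sup_over_grid(x1, x2, lams):
    # Supremum of the Lagrangian over a finite grid of multipliers lambda >= 0.
    return max(f(x1, x2) + lam * g(x1, x2) for lam in lams)

lams = [0.0, 1.0, 10.0, 100.0, 1000.0]
feasible = (0.0, 0.0)        # g(0, 0) = -1 < 0: strictly feasible
infeasible = (2.0, -1.0)     # g(2, -1) = 22 > 0: infeasible

print(sup_over_grid(*feasible, lams) == f(*feasible))   # True: penalty never helps
print(sup_over_grid(*infeasible, lams))                 # grows without bound in lambda
```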
L8.4 Examples of dual problems
In general, the dual objective function inf𝑥∈ℝ𝑛 ℒ(𝑥; 𝜆, 𝜇) might not be a nice, closed-form
expression. However, in some cases we know that we can massage it into a nice, tractable
form. We show two important examples below.
L8.4.1 Example #1: Dual of a linear program
As a sanity check, we can verify that the Lagrangian dual problem to a linear program
recovers the usual notion of linear programming duality. As a reminder, a standalone
proof of linear programming duality was given in Lecture 3 as a direct consequence of the
characterization of the normal cone to the intersection of halfspaces.
Consider the primal linear program
min_𝑥 𝑐⊤𝑥   s.t.   𝐴𝑥 ≤ 𝑏,
where 𝑐 ∈ ℝ𝑛, 𝐴 ∈ ℝ𝑚×𝑛, and 𝑏 ∈ ℝ𝑚. The Lagrangian function is given by
ℒ : ℝ𝑛 × ℝ𝑚≥0 → ℝ,   ℒ(𝑥; 𝜆) = 𝑐⊤𝑥 + 𝜆⊤(𝐴𝑥 − 𝑏).
The dual problem in this case is therefore
max_{𝜆∈ℝ𝑚≥0} inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆) = max_{𝜆∈ℝ𝑚≥0} inf_{𝑥∈ℝ𝑛} {𝑐⊤𝑥 + 𝜆⊤(𝐴𝑥 − 𝑏)} = max_{𝜆∈ℝ𝑚≥0} {−𝜆⊤𝑏 + inf_{𝑥∈ℝ𝑛} (𝐴⊤𝜆 + 𝑐)⊤𝑥}.
If a value of 𝜆 such that 𝐴⊤𝜆 + 𝑐 ≠ 0 is picked, the inner infimum is −∞. Hence, the
only interesting choices of 𝜆 ≥ 0 are those such that 𝐴⊤𝜆 + 𝑐 = 0, leading to the equivalent
formulation of the dual problem as
max_𝜆  −𝜆⊤𝑏   s.t.   𝐴⊤𝜆 = −𝑐,  𝜆 ≥ 0,
as expected.
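On a tiny instance of my own choosing (small enough that both optima are computable by hand), the pair of problems certifies itself: 𝑥∗ = (1, 1) is primal feasible, 𝜆∗ = (1, 1) is dual feasible, and the two objective values coincide at −2:

```python
import numpy as np

# Hypothetical LP: min c^T x  s.t.  A x <= b, with A = I, b = (1, 1), c = (-1, -1).
A = np.eye(2)
b = np.array([1.0, 1.0])
c = np.array([-1.0, -1.0])

x_star = np.array([1.0, 1.0])     # primal candidate
lam_star = np.array([1.0, 1.0])   # dual candidate

primal_feasible = bool(np.all(A @ x_star <= b + 1e-12))
dual_feasible = np.allclose(A.T @ lam_star, -c) and bool(np.all(lam_star >= 0))
primal_value = c @ x_star         # -2.0
dual_value = -lam_star @ b        # -2.0: matches, so both points are optimal
print(primal_feasible, dual_feasible, primal_value, dual_value)
```

By weak duality, matching objective values between a primal-feasible and a dual-feasible point certify that both are optimal; no solver is needed.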
L8.4.2 Example #2: Dual of a convex quadratic program
As a second example, we can investigate the dual problem of a convex quadratic program
min_𝑥  ½ 𝑥⊤𝑄𝑥 + 𝑐⊤𝑥   s.t.   𝐴𝑥 ≤ 𝑏,
where 𝑄 ∈ ℝ𝑛×𝑛 is symmetric positive definite, 𝑐 ∈ ℝ𝑛, 𝐴 ∈ ℝ𝑚×𝑛, and 𝑏 ∈ ℝ𝑚. The Lagrangian function is given by
ℒ : ℝ𝑛 × ℝ𝑚≥0 → ℝ,   ℒ(𝑥; 𝜆) = ½ 𝑥⊤𝑄𝑥 + 𝑐⊤𝑥 + 𝜆⊤(𝐴𝑥 − 𝑏).
The dual problem in this case is therefore
max_{𝜆∈ℝ𝑚≥0} inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆) = max_{𝜆∈ℝ𝑚≥0} inf_{𝑥∈ℝ𝑛} {½ 𝑥⊤𝑄𝑥 + 𝑐⊤𝑥 + 𝜆⊤(𝐴𝑥 − 𝑏)} = max_{𝜆∈ℝ𝑚≥0} {−𝜆⊤𝑏 + inf_{𝑥∈ℝ𝑛} ½ 𝑥⊤𝑄𝑥 + (𝑐 + 𝐴⊤𝜆)⊤𝑥}.
The inner infimum is bounded from below if and only if it has a minimizer, which is the case if and only if there exists a point 𝑥 ∈ ℝ𝑛 that satisfies the first-order optimality conditions, i.e., if there exists a point 𝑥 ∈ ℝ𝑛 such that
0 = ∇𝑥(½ 𝑥⊤𝑄𝑥 + 𝑐⊤𝑥 + 𝜆⊤(𝐴𝑥 − 𝑏)) = 𝑄𝑥 + 𝑐 + 𝐴⊤𝜆   ⟺   𝑥 = −𝑄⁻¹(𝑐 + 𝐴⊤𝜆),
where we used the fact that a positive definite matrix 𝑄 is invertible. Substituting this back
into the expression for the dual problem, we obtain
max_{𝜆∈ℝ𝑚≥0} inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆) = max_{𝜆∈ℝ𝑚≥0} {−𝜆⊤𝑏 − ½ (𝑐 + 𝐴⊤𝜆)⊤𝑄⁻¹(𝑐 + 𝐴⊤𝜆)} = max_{𝜆∈ℝ𝑚≥0} {−½ 𝜆⊤(𝐴𝑄⁻¹𝐴⊤)𝜆 − (𝑏 + 𝐴𝑄⁻¹𝑐)⊤𝜆 − ½ 𝑐⊤𝑄⁻¹𝑐}.
In other words, the dual problem is
max_𝜆  −½ 𝜆⊤(𝐴𝑄⁻¹𝐴⊤)𝜆 − (𝑏 + 𝐴𝑄⁻¹𝑐)⊤𝜆 − ½ 𝑐⊤𝑄⁻¹𝑐   s.t.   𝜆 ≥ 0,
which is another quadratic problem.
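The derivation can be spot-checked numerically on a small instance of my own (𝑄 = 2𝐼, 𝑐 = (−2, −2), a single constraint 𝑥1 + 𝑥2 ≤ 1): the primal optimum 𝑥∗ = (1/2, 1/2) has value −3/2, and the dual objective above attains the same value at 𝜆∗ = 1:

```python
import numpy as np

# Hypothetical QP: min (1/2) x^T Q x + c^T x  s.t.  A x <= b.
Q = 2.0 * np.eye(2)
c = np.array([-2.0, -2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
Qinv = np.linalg.inv(Q)

def dual_obj(lam):
    # Dual objective -lam^T b - (1/2)(c + A^T lam)^T Q^{-1} (c + A^T lam).
    lam = np.atleast_1d(lam)
    v = c + A.T @ lam
    return float(-lam @ b - 0.5 * v @ Qinv @ v)

x_star = np.array([0.5, 0.5])                            # primal optimum (by hand)
primal_value = 0.5 * x_star @ Q @ x_star + c @ x_star    # = -1.5

grid = np.linspace(0.0, 5.0, 5001)   # coarse search over lambda >= 0
best = max(dual_obj(l) for l in grid)
print(primal_value, dual_obj(1.0), best)   # all ≈ -1.5: strong duality holds
```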
L8.5 Failure of duality
Lagrangian duality is nothing but a reflection of the characterization of the normal cone to
the intersection of the convex constraints, which is given by the KKT idea of linearizing the
feasible set. As we discussed in Lecture 7, such a linearization usually gives an expression for
the normal cone to the original feasible set, but sometimes it fails to do so. Unsurprisingly,
when constraint qualifications fail, the Lagrangian dual problem and the original problem
might not yield the same value.
When the KKT conditions are not necessary, the only result discussed today that continues to hold is weak duality (Theorem L8.2), which guarantees that
sup_{𝜆∈ℝ𝑟≥0, 𝜇∈ℝ𝑠} inf_{𝑥∈ℝ𝑛} ℒ(𝑥; 𝜆, 𝜇) ≤ value(𝑃).
In particular, it is entirely possible that sup_{𝜆∈ℝ𝑟≥0, 𝜇∈ℝ𝑠} inf_𝑥 ℒ(𝑥; 𝜆, 𝜇) is strictly lower than value(𝑃). When this happens, the Lagrangian dual problem is said to have a duality gap.
Changelog
• Mar 4, 2025: Fixed typo in domain of lambda, mu in Theorem L8.1.
• Mar 6, 2025: Fixed typo in dual of quadratic problem.
★These notes are class material that has not undergone formal peer review. The TAs and I are grateful
for any reports of typos.