By: Sadik
1  Inverse and Implicit Mapping Theorems: Motivation and Heuristics
Let’s start with a motivating example. Say some nefarious agent gives you the
equation and asks you to solve
y^5 + yx^2 + 2y + 3 = 0
for y as a function of x. Even though it’s hard to write down an explicit
formula for y as a function of x, i.e. solve the equation, if you were to sit down
and numerically find pairs (x, y) satisfying the equation and plot them in the
plane (or use Mathematica...), you’d see that it looks like y can be written down
as a function of x, i.e. it passes the so-called “vertical line test”. Contrast this
to the case of x^2 + y^2 = 1, a circle, where you can’t write y as a function of
x. The question then becomes, what conditions do you need on the defining
equation to ensure that such an “implicit function” exists? And if it does exist,
given that the defining equation is sufficiently regular (i.e. you can differentiate
it a bunch of times), can we prove that the implicit function also shares its
regularity?
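Here is a minimal numerical sketch of that experiment (assuming NumPy is available; solve_for_y is just an illustrative helper name, and I use np.roots in place of an actual plot):

import numpy as np

def solve_for_y(x):
    # Real roots y of y^5 + y*x^2 + 2*y + 3 = 0 for a fixed x, found from the
    # coefficient list [1, 0, 0, 0, x^2 + 2, 3] (highest degree first).
    roots = np.roots([1, 0, 0, 0, x**2 + 2, 3])
    return [r.real for r in roots if abs(r.imag) < 1e-9]

for x in np.linspace(-3, 3, 13):
    print(f"x = {x:+.2f}   real y's: {solve_for_y(x)}")

Every x on the grid comes back with exactly one real y (up to the numerical tolerance), which is the numerical version of the curve passing the vertical line test.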
An important thing to note is that all of the results that we’ll end up obtaining will be local results, in the sense that close to a point (x0 , y0 ) on the
defining curve, under certain conditions, we’ll be able to show that you can
solve for y as a function of x near this point, but our results won’t be able to
say anything about solving for y as a function of x globally. To be concrete, in the
above case of the circle x^2 + y^2 = 1, we’ll be able to show that as long as y0 ≠ 0,
close to (x0, y0) on the circle, you can solve for y as a function of x, but this
clearly doesn’t extend to the whole circle.
Moving on from the motivation stage (but still in the pre-proof stage) let’s
try to figure out what conditions we might need to guarantee the existence
of a differentiable implicit function. Generalizing somewhat, say you have a
differentiable function G : R2 → R which defines an equation G(x, y) = 0. Let’s
also say that you can solve for y, so there exists a differentiable function φ(x) s.t.
G(x, y) = 0 iff y = φ(x). If this were the case, we would have G(x, φ(x)) = 0.
Now, let’s try differentiating this equation with respect to x. Using the Chain
Rule, we’d get
(∂G/∂x)(∂x/∂x) + (∂G/∂y) φ'(x) = 0

and if ∂G/∂y ≠ 0, we could say

φ'(x) = − (∂G/∂x) / (∂G/∂y)
so it seems like ∂G/∂y ≠ 0 is related to the implicit function existing and being
differentiable. BUT NOTE this is not a necessary condition for the
existence and differentiability of an implicit function φ(x) (as Edwards seems
to suggest), as the example G(x, y) = x^3 − y^3 shows (think about it at the origin...). However, it will be a crucial condition for the theorem we’ll state later on.
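As a quick sanity check (my own illustration, applying the heuristic above to the opening example): take G(x, y) = y^5 + yx^2 + 2y + 3. Then

∂G/∂y = 5y^4 + x^2 + 2 ≥ 2 > 0

everywhere, so the condition never fails, and the formula above gives

φ'(x) = − (∂G/∂x) / (∂G/∂y) = − 2xy / (5y^4 + x^2 + 2)

which is consistent with the picture of a single smooth branch you get from the plotting experiment.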
Still in heuristic mode, let’s look at a consequence of being able to solve equations like G(x, y) = 0 (given other conditions of course). If we take G(x, y) =
x − f(y) for some function f, then if we can find a φ(x) like above s.t.
G(x, y) = 0 iff y = φ(x), we would have G(x, φ(x)) = 0 = x − f(φ(x)) →
f(φ(x)) = x, i.e. it becomes apparent that φ is f’s inverse. Hence our ability
to solve for implicitly defined equations seems to allow us to find inverses to
functions as well. With this connection in mind, let’s see what conditions we
might need to guarantee the existence/regularity of local inverses to functions.
First, the simple example of x^2. This function doesn’t have an inverse in
any neighborhood around the origin (Why?) and its derivative at the origin
is 0. Hmm, coincidence? I think not... So maybe requiring the derivative of
the function to be non-zero at a point is important to guaranteeing an inverse
function (Again, this condition isn’t necessary for an inverse to exist, as x^3
demonstrates near the origin, but if you require the inverse to also be differentiable at the point, the condition becomes necessary).
But is this enough? That is, if we have a differentiable function with nonzero derivative at a point, is it locally bijective (i.e. does it have a local inverse)?
Well, I probably wouldn’t have asked these questions if the answer was yes...so
in fact the claim is false. An illuminating counterexample (although most surely
not the only one) is given by
x + sgn(x) · x^{1.5} · sin(1/x)
and is 0 at x = 0. If you plot this in Mathematica/Wolfram Alpha (try it!),
it doesn’t pass the horizontal line test in a neighborhood near the origin (or
doesn’t seem to... I chose this example since its formula is fairly simple, but
to construct a rock-solid counterexample it’s easier to do something along the
lines of what was done in class) but we can compute its derivative at x = 0,
like we did in an earlier pset for similar functions, and it equals 1, which is
surely non-zero. So what’s the issue that’s causing the lack of injectivity near
the origin? Well, if you compute the derivative of this function (and/or plot it),
it isn’t continuous at the origin, which suggests that it’s the lack of continuity
of the derivative that’s ruining our chances of getting a local inverse.
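A quick numerical check (a sketch under assumptions: NumPy is available, and I’m reading the formula as f(x) = x + sgn(x) · |x|^{1.5} · sin(1/x) with f(0) = 0, the absolute value keeping things real for x < 0):

import numpy as np

# The candidate counterexample, sampled only at nonzero points so we never
# evaluate 1/x at 0 (the function is separately defined to be 0 there).
def f(x):
    return x + np.sign(x) * np.abs(x) ** 1.5 * np.sin(1.0 / x)

xs = np.linspace(1e-3, 1e-2, 100_001)   # a tiny interval just to the right of 0
ys = f(xs)
drops = np.diff(ys) < 0                 # steps where f decreases
print("fraction of decreasing steps:", drops.mean())
print("monotone on this interval?", not drops.any())

Since a continuous injective function on an interval must be monotone, any decreasing step found here already rules out injectivity on that interval, even though the difference quotient at 0 tends to 1.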
2  Actual Statement of the Theorems...
Ok, now that we’ve gone through the exploratory/heuristic stages and we have
an idea of what is needed to guarantee existence/regularity of implicit functions/inverses, I’m going to skip the months/years where we struggle to figure
out the precise statement of the theorems and their proofs...instead here are the
basic cases of the Inverse and Implicit Mapping Theorems.
1-D Inverse Function Theorem: Given a function f : R → R that is
C^1 (i.e. differentiable and the derivative is continuous) in a neighborhood of
a point a ∈ R AND f'(a) ≠ 0, then there exists a C^1 function g defined in a
neighborhood of f(a) and neighborhoods U and V of a and f(a) respectively
s.t. g(f(x)) = x for x ∈ U and f(g(y)) = y for y ∈ V.
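For a concrete instance (my own illustration, not one from Edwards): take f(x) = x^2 and a = 1. Then f'(1) = 2 ≠ 0, so the theorem produces a C^1 local inverse, namely g(y) = √y on a neighborhood of f(1) = 1, even though x^2 has no inverse on any neighborhood of the origin, exactly as discussed in the previous section.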
1-equation Implicit Mapping Theorem: Given a C^1 function G : R^{n+1} → R
in a neighborhood of a point (x0, y0) (x0 ∈ R^n and y0 ∈ R) AND ∂G/∂y (x0, y0) ≠ 0,
then ∃ a C^1 function φ : R^n → R and a local neighborhood U of (x0, y0) s.t.
G(x, y) = 0 iff y = φ(x) for (x, y) ∈ U.
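To tie this back to the motivating circle (again my own illustration): take G(x, y) = x^2 + y^2 − 1, so ∂G/∂y = 2y, which is non-zero exactly when y0 ≠ 0. Near any point of the circle off the x-axis the theorem therefore gives y = φ(x) locally (explicitly, φ(x) = √(1 − x^2) or −√(1 − x^2), matching the sign of y0), while at (±1, 0) the hypothesis fails, which is consistent with the earlier observation that you can’t solve for y near those points.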
and for good measure, here are the general, multivariable results.
Inverse Mapping Theorem: Given a function f : R^n → R^n that is C^1
(i.e. differentiable and the derivative is continuous) in a neighborhood of a point
a ∈ R^n AND f'(a) is an invertible linear map, then there exists a C^1 mapping
g defined in a neighborhood of f(a) and neighborhoods U and V of a and f(a)
respectively s.t. g(f(x)) = x for x ∈ U and f(g(y)) = y for y ∈ V.
Note here that derivatives are now linear maps/matrices w.r.t a given basis,
so when we say the derivative is continuous, the simplest way to think about
it is that the entries of the derivative matrix each vary continuously with the
point the derivative is being evaluated at (Even though this conceptualization
is basis dependent, it coincides with the more abstract definition).
Implicit Mapping Theorem: Given a C^1 function G : R^{n+m} → R^m in
a neighborhood of a point (x0, y0) (x0 ∈ R^n and y0 ∈ R^m) AND ∂G/∂y (x0, y0) is
invertible, then ∃ a C^1 function φ : R^n → R^m and a local neighborhood U of
(x0, y0) s.t. G(x, y) = 0 iff y = φ(x) for (x, y) ∈ U.
A simple way to think about what ∂G/∂y means is that if you look at the matrix
of G', an m x (n + m) matrix, then ∂G/∂y corresponds to the matrix formed by
the last m columns of G', which makes it an m x m matrix (See Edwards for
the actual basis-independent definition and how to work with these generalized
partials). And the heuristic I use to remember the dimensions of the spaces
(like which one is m + n, is the target space m or n, what is φ mapping between,
etc.) is to think of G as defining m equations of your n + m variables, so setting
all these equations to 0 takes out m degrees of freedom from the equations, so
there are n degrees of freedom left. Hence, the last m variables should be completely dependent on the first n variables, i.e. y ∈ Rm should depend on x ∈ Rn .
So the proof of the Inverse Mapping Theorem will be given in the next lecture, but it turns out that one can derive the Implicit Mapping Theorem from
the Inverse Mapping Theorem AND vice-versa (i.e. we only have to prove one
of them later).
To see why this is the case, assume that we know that the Implicit Mapping Theorem is true. If we let G(x, y) = x − f(y) for x, y ∈ R^n, and if the
conditions of the Inverse Mapping Theorem are satisfied for f near a point a, then G is C^1
(since f is) and ∂G/∂y = −f'(y), which is invertible at the relevant point (x0, y0) = (f(a), a) (again, by assumption). Hence we can apply the Implicit Mapping Theorem to G(x, y), which
gives us locally a C^1 mapping g : R^n → R^n s.t. G(x, y) = 0 iff y = g(x), i.e.
G(x, g(x)) = x − f(g(x)) = 0 → f(g(x)) = x, giving us g as f’s local inverse.
For the other direction, assume that we know the Inverse Mapping Theorem
is true. Assuming the conditions of the Implicit Mapping Theorem are satisfied,
define the function f(x, y) = (x, G(x, y)), i.e. f : R^{n+m} → R^{n+m}. Since G is C^1,
it follows that f is a composition of C^1 mappings and hence is C^1. Computing
the derivative of f and writing its matrix in block-form, we get
[ ∂x/∂x    ∂x/∂y ]     [  I_n       0_{n,m} ]
[ ∂G/∂x    ∂G/∂y ]  =  [  ∂G/∂x     ∂G/∂y   ]
where I_n is the n x n identity matrix (again, if you’re worried about why I
can still do this even though x and y are vectors, check out Edwards, or think
about everything componentwise). The determinant of this matrix is equal to
det(∂G/∂y) (work this out entry-wise using the permutation formula for the determinant if this calculation isn’t apparent), so it is non-zero (by assumption
again), hence f'(x0, y0) is invertible.
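If the entry-wise computation isn’t apparent, here is the short version (a standard block-matrix fact, not anything special to this proof): in the permutation expansion of the determinant, any non-zero term must pick the diagonal 1’s from the first n rows (every other entry in those rows is 0), and what remains is exactly the permutation expansion of det(∂G/∂y), so

det [  I_n      0_{n,m} ]  =  det(I_n) · det(∂G/∂y)  =  det(∂G/∂y)
    [ ∂G/∂x     ∂G/∂y   ]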
Applying the Inverse Mapping Theorem to f, we get locally a C^1 inverse
mapping h(x, y) = (h1(x, y), h2(x, y)) to f (here h1 and h2 are just the “coordinate” functions of h) so that h(f(x, y)) = (x, y). It follows by definition that
h(x, G(x, y)) = (x, y).
Now, if G(x, y) = 0, then h(x, 0) = (h1 (x, 0), h2 (x, 0)) = (x, y) and entrywise, we see that h1 (x, 0) = x and y = h2 (x, 0). So if we take φ(x) = h2 (x, 0)
to be our C 1 implicitly defined function, we’ve shown that G(x, y) = 0 implies
that y = φ(x). For the other direction, since f (h(x, y)) = (x, y) as well, we have
(h1 (x, y), G(h1 (x, y), h2 (x, y))) = (x, y)
so letting y = 0 (possible since f (x0 , y0 ) = (x0 , 0), so points with y = 0
are in a neighborhood of (x0 , 0)) and using our previous results, we get that
(x, G(x, φ(x))) = (x, 0) → G(x, φ(x)) = 0. I.e. if y = φ(x), then G(x, y) =
G(x, φ(x)) = 0, and we’re done.
In the above discussion I ignored a lot of issues about the neighborhoods
where things were defined, but if you’re interested, Edwards’ proofs cover that
much better. And the main takeaway is that the Inverse and Implicit Mapping
Theorems are equivalent (once you know the general forms that work in any
dimension).
3  The Banach Contraction Principle
As I said, the Inverse Mapping Theorem will be proven in the next lecture, but
we’ll prove here a very important, general principle that can be used all over
the place to establish local existence of things (inverses, zeroes, solutions to
differential equations, etc.) and that we’ll use in the proof of the IMT, namely
the Banach Contraction Principle. It usually even gives you an algorithm
to compute the thing you’re trying to show exists, as we’ll see in a minute.
I’ll give the theorem and proof in its most general setting (in the context of
metric spaces) but we’ll only end up using it in R^n, so don’t worry too much if
it seems a bit weird at first.
The setup is as follows: You have a complete metric space X, and a contraction mapping φ : X → X that has the property that ∃k < 1 s.t.
d(φ(x), φ(y)) ≤ k · d(x, y) ∀x, y ∈ X (i.e. it contracts the distance between
any two points, hence the name). The theorem then states that ∃ a unique
fixed point x ∈ X, i.e. a point such that φ(x) = x, and it can be obtained by
taking any point x0 ∈ X, defining a sequence of points xn+1 = φ(xn ), and then
finding the limit of this sequence (it’ll turn out to converge), i.e. you take any
point in the space, and then repeatedly apply the contraction map to it.
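To make the algorithm concrete, here is a minimal Python sketch (my own illustration; banach_iterate is just an illustrative name, and the example map is cos on X = [0, 1]; that it really is a contraction there is checked right after the usage tips below):

import math

def banach_iterate(phi, x0, tol=1e-12, max_iter=1000):
    # Repeatedly apply phi starting from x0 until successive iterates agree.
    x = x0
    for _ in range(max_iter):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# phi = cos maps the complete metric space X = [0, 1] into itself and is a
# contraction there (|cos'(x)| = |sin x| <= sin 1 < 1), so the theorem gives a
# unique fixed point, and the iteration converges to it from any starting point.
fixed = banach_iterate(math.cos, 0.5)
print(fixed, math.cos(fixed))   # both are approximately 0.7390851332151607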
Before we get to the proof, let me give a few tips on usage. Where we’ll be
using it, Euclidean spaces, we’ll normally have our candidate contraction map φ
already figured out based on its fixed point equation, φ(x) = x, leading to something we desire (an inverse, a root, etc.). The next thing to do is to find a suitable
closed set (remember, subsets of R^n whose induced metric space is complete are exactly the
closed sets) that the map preserves. This is a crucial, subtle thing to do, as the
theorem only applies if φ maps X to itself, so be wary of this (I forget this step all
the time...). To show that it’s a contraction, for our purposes, we’ll usually have
to resort to the Multivariable Mean-Value Inequality. If you don’t know what
this is, that’s fine (but check out Edwards’ section on it, because it’s fairly important to know on at least some level), the idea is similar to the 1-dimensional
case. In that case, if you have a differentiable function on R and the derivative
is bounded in absolute value by a constant k < 1 everywhere in the region of interest, note that
the MVT says that |f(b) − f(a)| = |f'(c)(b − a)| = |f'(c)| · |b − a| ≤ k|b − a|, i.e.
it’s a contraction. So in general, in our applications, something akin to this will
have to be done to show that a map is a contraction.
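To make those two steps concrete for the cos example used in the sketch above (my own illustration): cos is decreasing on [0, 1], so cos([0, 1]) = [cos 1, 1] ⊆ [0, 1], which is the “closed set the map preserves” step; and the MVT gives |cos b − cos a| = |sin c| · |b − a| ≤ (sin 1) · |b − a| for a, b ∈ [0, 1], so k = sin 1 ≈ 0.84 < 1 works as the contraction constant.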
And lastly, note that any contraction φ satisfies the weaker condition that ∃C s.t. d(φ(x), φ(y)) ≤
C · d(x, y) (we’re not requiring C to be less than 1 here), which is known as
Lipschitz continuity. The word continuous is in the name, because as you
probably already guessed, such mappings are continuous (Why?). However the
converse is not true, i.e. continuity does not imply Lipschitz continuity (Can
you think of an example of such a function? Don’t overthink this one...).
Ok, here goes the proof. Let’s first show that a fixed point of φ exists, and
later we’ll show it’s unique. To find it, we execute the algorithm I mentioned
earlier: take some point x0 ∈ X and define a sequence of points via xn+1 =
φ(xn). Now we need to worry about whether this sequence actually converges. But
since X is complete, this is the same as asking if it’s Cauchy (which is a bit
easier b/c we don’t need to know what the limit is explicitly). To see that it’s
Cauchy, we need to get a bound on terms like d(xn+m , xn ). How do we do this?
Well, a little trickiness and the Triangle Inequality goes a long way...so first
note that we have the following inequality via the fact that φ is a contraction
mapping
d(xn+1 , xn ) = d(φ(xn ), φ(xn−1 )) ≤ k · d(xn , xn−1 )
and doing this trick repeatedly, we see that
d(xn+1, xn) ≤ k^n · d(x1, x0)
Now, repeated use of the Triangle Inequality tells us that
d(xn+m , xn ) ≤ d(xn+m , xn+m−1 ) + d(xn+m−1 , xn ) ≤ ... ≤
d(xn+m , xn+m−1 ) + ... + d(xn+i , xn+i−1 ) + ... + d(xn+1 , xn )
and so using the previous inequalities we got, we get
d(xn+m, xn) ≤ k^{n+m−1} d(x1, x0) + ... + k^n d(x1, x0) = k^n d(x1, x0) · (1 + k + ... + k^{m−1})
but the geometric series in the above equation is definitely less than the
infinite series 1 + k + k^2 + ... = 1/(1 − k), so we get the inequality

d(xn+m, xn) ≤ (k^n/(1 − k)) · d(x1, x0)

and this is precisely what we need to show that the sequence is Cauchy,
because given any ε > 0, we can choose N so large that (k^N/(1 − k)) · d(x1, x0) < ε, in
which case ∀ n, m > N,

d(xn, xm) ≤ (k^{min(n,m)}/(1 − k)) · d(x1, x0) ≤ (k^N/(1 − k)) · d(x1, x0) < ε

since k^n is a decreasing function of n.
Ok, so we know that this sequence xn is Cauchy, and thus converges to a
limit, say x∗ . So what? Well, that limit is probably the fixed point...but how
do we check this? We note that
φ(x∗) = φ(lim_{n→∞} xn) = lim_{n→∞} φ(xn) = lim_{n→∞} xn+1 = x∗
where we used the fact that φ is continuous, and thus respects convergent
sequences (i.e. it’s sequentially continuous).
Nice, we’ve almost got everything! The final thing to show is that this fixed
point is unique. Doing the usual thing, let’s say it wasn’t unique, and that there
existed 2 distinct fixed points x∗ and x∗∗ . Hmm, I wonder what would happen
if we applied φ to them... I mean, it’s supposed to bring them closer, but they’re
fixed points...well let’s see
d(x∗ , x∗∗ ) = d(φ(x∗ ), φ(x∗∗ )) ≤ k · d(x∗ , x∗∗ )
but wait, k < 1, and this is telling us that d(x∗ , x∗∗ ) ≤ k · d(x∗ , x∗∗ ), which
could only be possible if d(x∗ , x∗∗ ) = 0, i.e. if x∗ = x∗∗ , contrary to our assumption. Hence the fixed point is unique.
That’s all folks. Tune in next time for the proof of the Inverse Mapping
Theorem.
P.S. If you have any comments/feedback/suggestions on my lecture notes,
I’d love to hear about them so I can improve to help you guys learn better.
E-mail would be preferable (smoke signals would not be...). Thanks!