*E = mc*^{2} on a pub serviette

or, *E = mc*^{2} *on a bar napkin*, for all you American yahoos


Ever been haunting a pub on a cloudy night and some jerk
starts an argument about relativity? Or maybe you want to impress the brainy
banshee or nerdy nymph over sips of Jägermeister? Described here is a very
concise derivation of Albert Einstein’s famous equation, *E =
mc*^{2}. It’s concise enough that it can be written right at
the pub on a paper napkin (or a couple napkins if they’re small).

This page is inspired by the fact that it’s surprisingly difficult to find a good derivation of this equation. Einstein’s original 1905 papers don’t technically contain the equation *E = mc*^{2}; they get close, but they never actually come out and say “*E = mc*^{2}”. The corresponding equations in Einstein’s papers are actually quite ugly, leaving you to do some significant simplifying on your own if you want to end with the elegant *E = mc*^{2}. Moreover, even fine relativity textbooks, about which I have no complaints in any other regard, often stumble and fall short on this mass-energy derivation. I find numerous university-level textbooks that use circular arguments that don’t really hold weight when it comes to deriving this famous relationship.

What we have done at the Shady Crypt is use a number of sources, combine the respective material, and make extensive modifications ourselves, to form a derivation that is concise and simple enough to be written on two sides of a paper serviette (napkin). You will have to write small, but it is possible.

Any text below typed in red
is an essential part, and should be written on the napkin. Of course, it’s
important that you *understand* what it
is you are writing. Your derivations won’t be worth much at the pub if they
can’t stand up to questioning. So a generous amount of explanations are given,
not typed in red, along with the essentials. Any text below that is not typed
in red needn’t be written on the napkin.

Part I of our derivation starts with the postulate: The speed of light is always measured to be constant, i.e. the same value, regardless of the relative velocity of the measuring device. Not too long before Einstein wrote his original paper on what we now call “special relativity” [1], Albert Michelson and Edward Morley performed an experiment in an attempt to measure Earth’s velocity through the ether, and found that the speed of light is always measured to be the same, no matter what speed or direction the Earth is moving. One might think that this Michelson-Morley experiment is what primarily inspired Einstein to create special relativity. But it turns out not to be true.

In Einstein’s education, he studied electricity and magnetism (among other things). He was troubled by a basic concept in electromagnetic theory. We at the Shady Crypt have also studied classical electrodynamics, and can attest that this particular concept doesn’t sit well with us either. The concept goes something like this: When calculating the magnetic force between two moving electrons (suppose they are moving together, side by side), the process is as follows:

(1) The first moving electron creates a magnetic field, which can be described mathematically.

(2) The second moving electron feels a force as it moves through the magnetic field created by the first electron.

(Switch which electron you start with to find the force felt by the other electron.)

So there are actually 3 velocities that we need to pay attention to: the respective velocities of the two electrons, and the velocity of the field, which in classical electrodynamics is always zero.

But if you think about it, this doesn’t really make any
sense. If the two electrons are standing still, no magnetic force is felt by
either electron. But if both electrons are next to each other, moving together
at the same velocity and same direction, there *is* a force felt, because each electron is moving with respect to a
zero-velocity magnetic field. But what constitutes zero velocity? The ground?
Well, the ground is on the Earth, and the Earth is rotating and revolving
around the Sun, and the whole solar system is moving with respect to other
celestial objects, so what the heck *isn’t*
moving? And who says that the electrons are the ones that are really moving in
the first place? Since everything else is moving, maybe they are the ones
standing still. Who’s to say? And it’s an important distinction, because if you
choose a different “reference” of what is standing still, you get different
answers when you calculate the magnetic forces. (The electric force tends to
push the pair of electrons apart. When the pair is moving, the magnetic force
tends to pull them together, but to a smaller extent. The strength of the
magnetic force increases to the same level as the electric force, only when the
electrons’ speed increases to the speed of light.)

It doesn’t quite end there. As a student struggles through the mathematics of electromagnetic field theory, he or she eventually comes to Maxwell’s equations. Maxwell’s equations are the “holy grail” of electromagnetic field theory. They consist of 4 separate, simply elegant equations, and together they can describe all aspects of classical electromagnetic field theory. But they still don’t answer the question of “what is it that *isn’t* moving?” Using Maxwell’s equations, one can calculate the speed of light. But the speed of light relative to what? The Earth? The Sun? Something else? Who’s to say? Enter Albert Einstein.

Einstein surmised that it didn’t matter how fast the instrumentation used to measure the speed of light was moving. Every laboratory will come up with the same value, regardless of each laboratory’s relative velocity. But how could this be? If a laboratory in a train is moving at 99% the speed of light, almost keeping up with a ray of light just ahead of it, how could the train measure the ray to be moving away from the train at 100% the speed of light? The only answer is this: time is moving slower in the train, relative to the ground where the velocity of the train is being measured. (By the same token, instruments in the train will actually measure time moving slower on the ground, which is also true. Both are equally valid measurements. While this may seem like a horrible paradox, it is easily resolved by analyzing a spacetime diagram or similar tool, but that’s outside the scope of what we want to write on the napkin.)

So let’s go back to our derivation. We create a mathematical system such that as something moves faster through space, it moves slower through time, relative to the laboratory that is measuring the velocities. We will define two different measures (lengths) of time. The variable *t* is used to represent a length of time according to the clocks in the laboratory that is measuring the velocities of things. The variable $\tau$ (the Greek letter tau) is used to indicate the amount of time that passes, according to the clocks in the moving object. We call *t* *regular time*, and we call $\tau$ *proper time*. We call the moving object the *moving frame of reference*, or *moving frame* for short. We call the laboratory that is measuring the velocities of things the *inertial frame*. It is assumed that the inertial frame is not accelerating.

In summary,

Inertial frame: The [non-accelerating] laboratory keeping track of the various velocities of objects.

Moving frame: Anything moving relative to the Inertial frame (may or may not be accelerating).

*t*: regular time. A time interval measured by the clocks in the inertial frame.

$\tau$: proper time. A time interval measured by the clocks in the moving frame.

The mathematical system can be created rather simply. In doing so we define Minkowski spacetime. During his education, Einstein was a student of Hermann Minkowski. In Einstein’s original paper on special relativity [1], Einstein treated each direction, including the time direction, separately. They were interrelated, but still separate. He went so far as to combine the different equations into large matrices, but they were still cumbersome to say the least. A few years later, Minkowski, when giving lectures on relativity, began to introduce the concept of 4-dimensional spacetime. You, I and everything else rocket through spacetime with a speed equal to the speed of light: no faster, no slower. Because everything’s speed through spacetime is a constant, if something moves faster through space, it necessarily moves slower through time. The relationship between velocities through the space and time dimensions is something akin to the Pythagorean theorem, $a^2 + b^2 = c^2$. Minkowski spacetime gives the same results as Einstein’s methods, but is much more compact. We will use Minkowski spacetime on our serviette.

Simply treat time as any other dimension of space, and
define some minor differences of the time dimension as described a little
later. So to start, we have 4-dimensional spacetime.
The different directions (as seen in the inertial frame) are *ct*, *x*, *y*, and *z*, where *c* is the
speed of light. We multiply time by *c*, so
that *ct* has units of length, just
like the other directions.

Vectors in 3 dimensions we denote with an arrow above the variable’s symbol, such as the 3-velocity $\vec{v}$. Four-dimensional spacetime vectors, called 4-vectors, are denoted using a squiggly underneath, such as the position 4-vector $\underset{\sim}{x}$. 4-vectors take the form

$$\underset{\sim}{x} = (ct,\ x,\ y,\ z) \qquad (1)$$

or in shorthand form,

$$\underset{\sim}{x} = (ct,\ \vec{x}) \qquad (2)$$

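Now we define the inner product, or dot product, of two 4-vectors. For this definition, suppose we have two arbitrary 4-vectors,

$$\underset{\sim}{A} = (A_t,\ A_x,\ A_y,\ A_z), \qquad \underset{\sim}{B} = (B_t,\ B_x,\ B_y,\ B_z), \qquad (3)$$

then,

$$\underset{\sim}{A} \cdot \underset{\sim}{B} = \sum_{\mu,\nu} \eta_{\mu\nu} A_\mu B_\nu = -A_t B_t + A_x B_x + A_y B_y + A_z B_z \qquad (4)$$

where,

$$\eta = \mathrm{diag}(-1,\ 1,\ 1,\ 1) \qquad (5)$$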

Now you might be saying, “whoa, this is getting too complicated.” But it’s not as bad as it looks. All we’re saying here is that the dot product of two 4-vectors is the same idea as the dot product of two 3-vectors, except the time component gets a negative sign. In other words, when you take the dot product (inner product) of two 4-vectors, you multiply the respective time components together, and give that a negative sign; multiply the *x* components together; then the *y* components; then the *z* components; and finally add all the resulting products together.

As an example, suppose we have

_{}, _{},

then the dot product of the two vectors is

_{}.
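The dot-product rule described above (negate the product of the time components, then add the products of the space components) can be sketched in a few lines of Python. This is only an illustration: the function name `minkowski_dot` and the sample numbers are our own assumptions and do not come from the napkin.

```python
def minkowski_dot(a, b):
    """Dot product of two 4-vectors given as (time, x, y, z) tuples.

    The product of the time components gets a negative sign; the
    products of the x, y and z components are added as usual.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("4-vectors must have exactly four components")
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


# Illustrative values only (hypothetical, not taken from the original page).
A = (1.0, 2.0, 3.0, 4.0)
B = (5.0, 6.0, 7.0, 8.0)
print(minkowski_dot(A, B))  # -5 + 12 + 21 + 32 = 60.0
```

Note that a 4-vector dotted with itself can come out negative, which an ordinary 3-vector dot product never does.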

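Since each component is perpendicular to the other components, we can also express 4-vectors in terms of differentials,

$$d\underset{\sim}{x} = (c\,dt,\ dx,\ dy,\ dz) \qquad (6)$$

and

$$d\underset{\sim}{x} \cdot d\underset{\sim}{x} = -c^2\,dt^2 + dx^2 + dy^2 + dz^2 \qquad (7)$$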

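And now we define $d\tau$. Remember $\tau$? $\tau$ is the *proper time*, i.e. a time interval as measured by clocks in the moving frame. And $d\tau$ is essentially the differential length of a spacetime 4-vector, in units of time.

$$c^2\,d\tau^2 \equiv -\,d\underset{\sim}{x} \cdot d\underset{\sim}{x}$$

$$d\tau^2 = -\frac{1}{c^2}\,d\underset{\sim}{x} \cdot d\underset{\sim}{x}. \qquad (8)$$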

So far we have represented our spacetime 4-vectors in units of length. To accomplish this, regular time *t* was converted to units of length by multiplying it by *c*. But we could have just as easily kept everything in units of time, and *divided* *x*, *y*, and *z* by *c*. Had we done so, $d\tau$ would simply be the imaginary length of that differential position 4-vector.

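Of course what we’ve really been interested in all along is the relationship between $dt$ and $d\tau$. Particularly, we are interested in determining $dt/d\tau$. This represents the amount of time it takes in the inertial frame for the clock in the moving frame to advance one unit of time. Combining equations (7) and (8) gives us,

$$c^2\,d\tau^2 = c^2\,dt^2 - dx^2 - dy^2 - dz^2 \qquad (9)$$

Dividing by $dt^2$ produces

$$c^2\left(\frac{d\tau}{dt}\right)^2 = c^2 - \left(\frac{dx}{dt}\right)^2 - \left(\frac{dy}{dt}\right)^2 - \left(\frac{dz}{dt}\right)^2$$

$$c^2\left(\frac{d\tau}{dt}\right)^2 = c^2 - v^2 \qquad (10)$$

where $\vec{v}$ is the moving frame’s 3-velocity, relative to the inertial frame (it is understood that $v^2 = \vec{v} \cdot \vec{v}$ is the magnitude squared; a scalar quantity). Dividing by $c^2$ gives us

$$\left(\frac{d\tau}{dt}\right)^2 = 1 - \frac{v^2}{c^2} \qquad (11)$$

taking the square root gives us

$$\frac{d\tau}{dt} = \sqrt{1 - \frac{v^2}{c^2}} \qquad (12)$$

and after inverting we have

$$\frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2/c^2}}. \qquad (13)$$

This relationship is used so often, we give it a special designation, $\gamma$ (the Greek letter gamma).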
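To get a feel for the factor gamma, here is a minimal Python sketch that evaluates dt/dτ = 1/√(1 − v²/c²) at a few speeds. The function name `gamma` and the chosen list of speeds are our own assumptions for illustration.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def gamma(v, c=C):
    """Return dt/dtau = 1 / sqrt(1 - v^2/c^2) for a speed v below c."""
    beta_sq = (v / c) ** 2
    if beta_sq >= 1.0:
        raise ValueError("speed must be strictly less than c")
    return 1.0 / math.sqrt(1.0 - beta_sq)


# Illustrative speeds, expressed as fractions of c.
for fraction in (0.0, 0.1, 0.5, 0.9, 0.99):
    print(f"v = {fraction:4.2f} c  ->  gamma = {gamma(fraction * C):.4f}")
```

At everyday speeds gamma is indistinguishable from 1, while at 99% of the speed of light it is roughly 7.09: about seven seconds pass on the inertial frame’s clocks for every second on the moving clock.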

It’s worthwhile to point out that so far, we really haven’t *derived* anything. All that we have done is *create* a mathematical system such that all observers measure the speed of light to be the same value, regardless of each observer’s relative velocity. We started out with the postulate that all observers get the same value when measuring the speed of light, and we have *created* a mathematical system that does just that. So far, that is all we have done. Now we can move forward and get to the nitty-gritty of special relativity.

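Next we define the *velocity 4-vector*, $\underset{\sim}{u}$. The velocity 4-vector is the derivative of the position 4-vector with respect to proper time, $\tau$.

$$\underset{\sim}{u} = \frac{d\underset{\sim}{x}}{d\tau} = \left(c\,\frac{dt}{d\tau},\ \frac{dx}{d\tau},\ \frac{dy}{d\tau},\ \frac{dz}{d\tau}\right).$$

It is a good idea here to point out a trick that we will use often. Note that $\frac{d}{d\tau} = \frac{dt}{d\tau}\frac{d}{dt} = \gamma\,\frac{d}{dt}$. Using this trick, the above equation reduces to

$$\underset{\sim}{u} = \left(\gamma c,\ \gamma\frac{dx}{dt},\ \gamma\frac{dy}{dt},\ \gamma\frac{dz}{dt}\right)$$

or more succinctly,

$$\underset{\sim}{u} = \gamma\,(c,\ \vec{v}) \qquad (14)$$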

$\underset{\sim}{u}$, the velocity 4-vector, represents the speed and direction that something is traveling through 4-dimensional spacetime. I’ll leave it as an exercise to you to calculate the speed (magnitude of the velocity), but I’ll tell you the answer. The speed that anything travels through spacetime is always the speed of light, regardless of the object’s 3-velocity (the answer comes out to $c\sqrt{-1}$, but since we are only concerned with the magnitude [and not the phase], the *speed* reduces to *c*). So the important information that $\underset{\sim}{u}$ gives us is the object’s *direction* through spacetime, not forgetting that regular time is one possible direction.
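The exercise above can also be checked numerically. The sketch below builds a velocity 4-vector of the form γ(c, v) for an arbitrary 3-velocity and dots it with itself using the sign convention of this page (time component negative). The function names and the sample 3-velocity are our own assumptions.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def four_velocity(v3, c=C):
    """Return gamma * (c, vx, vy, vz) for a 3-velocity v3 = (vx, vy, vz)."""
    vx, vy, vz = v3
    speed_sq = vx * vx + vy * vy + vz * vz
    g = 1.0 / math.sqrt(1.0 - speed_sq / (c * c))
    return (g * c, g * vx, g * vy, g * vz)


def minkowski_dot(a, b):
    """4-vector dot product with a negative sign on the time term."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


# Hypothetical 3-velocity with magnitude 0.5 c.
u = four_velocity((0.3 * C, 0.4 * C, 0.0))
norm_sq = minkowski_dot(u, u)
print(norm_sq / C**2)  # very close to -1, i.e. u.u = -c^2
```

The squared length comes out as −c² whatever 3-velocity is chosen, so its square root has magnitude *c*, as claimed.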

Next we define the *momentum 4-vector*, $\underset{\sim}{p}$. Just like the momentum 3-vector, the momentum 4-vector is simply the velocity vector multiplied by the mass,

$$\underset{\sim}{p} = m\,\underset{\sim}{u} = \gamma m\,(c,\ \vec{v}) \qquad (15)$$

If you continue to use and pursue special relativity beyond the napkin, you will find that $\underset{\sim}{p}$ is commonly called the energy-momentum 4-vector. This is because the time component is proportional to the total energy of the system. Of course, we haven’t derived that yet, so we’ll just keep on calling it the *momentum* 4-vector for now. But on a side note, I have actually witnessed *E = mc*^{2} “derivations” that essentially stop about here and claim something to the effect of, “…since $\underset{\sim}{p}$ is *called* the energy-momentum 4-vector, *E* is therefore equal to $\gamma mc^2$.” Ya gotta be kidding me. For shame.

Next we define the force 4-vector, $\underset{\sim}{f}$. The force 4-vector is the derivative of the momentum 4-vector with respect to proper time, $\tau$.

$$\underset{\sim}{f} = \frac{d\underset{\sim}{p}}{d\tau} = \frac{d}{d\tau}\Big[\gamma m\,(c,\ \vec{v})\Big].$$

Using our trick for equation (14), this simplifies to

$$\underset{\sim}{f} = \gamma\,\frac{d}{dt}\Big[\gamma m\,(c,\ \vec{v})\Big] = \gamma\left(mc\,\dot{\gamma},\ \frac{d}{dt}(\gamma m\vec{v})\right)$$

where dotted variables represent derivatives with respect to normal time $t$ (not proper time, $\tau$). More compactly, this equation can be written as

$$\underset{\sim}{f} = \gamma\,(mc\,\dot{\gamma},\ \vec{F}) \qquad (16)$$

where $\vec{F}$ is the force 3-vector, the time derivative of $\gamma m\vec{v}$ (with respect to normal time, $t$),

$$\vec{F} = \frac{d}{dt}(\gamma m\vec{v}) = m\dot{\gamma}\,\vec{v} + \gamma m\,\vec{a}, \qquad (17)$$

where $\vec{a} = \dot{\vec{v}}$ is the acceleration 3-vector. Notice here that we’ve had to modify the force 3-vector from $\vec{F} = m\vec{a}$. This is because as an object approaches the speed of light, it becomes harder and harder to push it such that it goes even faster. We must abandon Isaac Newton’s $\vec{F} = \frac{d}{dt}(m\vec{v})$ in favor of the relativistic version, $\vec{F} = \frac{d}{dt}(\gamma m\vec{v})$. Einstein, in his original special relativity paper [1], used a mathematically equivalent, yet conceptually different terminology. Einstein kept the $\vec{F} = m\vec{a}$ relationship, and accounted for the relativistic effects by modifying the definition of mass. As a matter of fact, according to Einstein’s original paper, every moving object has 2 masses associated with it, longitudinal mass and transverse mass (terms also used by Max Abraham and Hendrik Lorentz, a few years prior). Yes: one object; two different masses. Many (most?) contemporary physicists have abandoned the concept of relativistic mass, and have instead accepted the relativistic modification of the force 3-vector. We shall do the same. (The momentum 3-vector faces a similar modification, $\vec{p} = \gamma m\vec{v}$, but we don’t explicitly use the momentum 3-vector on our napkin, thus it’s not discussed here.)

It can be shown that

$$\underset{\sim}{f} \cdot \underset{\sim}{u} = 0. \qquad (18)$$

i.e., the dot product of the force and velocity 4-vectors is zero. Proof:

NOTE: This might be a fine time to pick up a second pub serviette (bar napkin) for the proof, and then come back to the original napkin later. If you can fit it on the original serviette, then fine, but it does take up precious space. If you attempt to explain it away verbally rather than show the explicit proof, here is some advice that might help. It makes sense that $\underset{\sim}{f} \cdot \underset{\sim}{u} = 0$. Stop and think about it for a moment. As we already discussed, you, I and everything else in the universe are rocketing through 4-dimensional spacetime with a speed (magnitude) of precisely the speed of light; no faster, no slower. It’s not possible to change the magnitude of an object’s speed through 4-dimensional spacetime, only the direction. So if any 4-force is applied to an object, it is guaranteed that the component of the force parallel to the object’s velocity 4-vector is zero. That is exactly how to interpret $\underset{\sim}{f} \cdot \underset{\sim}{u} = 0$. All that being said, you should still be prepared to work out the following proof if requested.

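Combining equations (16), (17), and taking the dot product with (14) yields

$$\underset{\sim}{f} \cdot \underset{\sim}{u} = \gamma\,(mc\,\dot{\gamma},\ m\dot{\gamma}\,\vec{v} + \gamma m\,\vec{a}) \cdot \gamma\,(c,\ \vec{v})$$

$$= \gamma^2\left[-mc^2\dot{\gamma} + m\dot{\gamma}\,v^2 + \gamma m\,\vec{a} \cdot \vec{v}\right]$$

$$= \gamma^2 m\left[-c^2\dot{\gamma}\left(1 - \frac{v^2}{c^2}\right) + \gamma\,\vec{a} \cdot \vec{v}\right]$$

$$= \gamma^2 m\left[-\frac{c^2\dot{\gamma}}{\gamma^2} + \gamma\,\vec{a} \cdot \vec{v}\right].$$

Since

$$\dot{\gamma} = \frac{d}{dt}\left(1 - \frac{v^2}{c^2}\right)^{-1/2} = \gamma^3\,\frac{\vec{a} \cdot \vec{v}}{c^2},$$

the bracket becomes

$$\underset{\sim}{f} \cdot \underset{\sim}{u} = \gamma^2 m\left[-\gamma\,\vec{a} \cdot \vec{v} + \gamma\,\vec{a} \cdot \vec{v}\right] = 0.$$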
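If someone at the pub doubts the algebra, the claim that the force and velocity 4-vectors are perpendicular can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: motion is restricted to the *x*-axis, units are chosen so that *c* = 1 and *m* = 1, the velocity profile `velocity(t)` is a made-up example, and the time derivative of the momentum 4-vector is approximated with a central finite difference instead of the analytic expression.

```python
import math

C = 1.0  # assumption: units in which c = 1
M = 1.0  # assumption: unit mass


def velocity(t):
    """Hypothetical 1-D velocity profile that always stays below c."""
    return 0.8 * C * math.tanh(t)


def gamma(v):
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def four_velocity(t):
    """(time, x) components of gamma * (c, v); y and z are zero here."""
    v = velocity(t)
    g = gamma(v)
    return (g * C, g * v)


def four_momentum(t):
    """Mass times the velocity 4-vector."""
    return tuple(M * component for component in four_velocity(t))


def four_force(t, h=1e-5):
    """gamma * d(p)/dt, with d/dt taken as a central finite difference."""
    g = gamma(velocity(t))
    p_plus = four_momentum(t + h)
    p_minus = four_momentum(t - h)
    return tuple(g * (pp - pm) / (2.0 * h) for pp, pm in zip(p_plus, p_minus))


def minkowski_dot(a, b):
    """Dot product in one time and one space dimension."""
    return -a[0] * b[0] + a[1] * b[1]


for t in (0.0, 0.5, 1.0, 2.0):
    print(t, minkowski_dot(four_force(t), four_velocity(t)))
```

Each printed dot product should be zero up to the small error of the finite-difference step, even though the individual components of the force 4-vector are not small.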

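So now, we are completely confident that $\underset{\sim}{f} \cdot \underset{\sim}{u} = 0$. Next we substitute equations (14) and (16) into equation (18) and we get a surprising result.

$$\gamma\,(mc\,\dot{\gamma},\ \vec{F}) \cdot \gamma\,(c,\ \vec{v}) = \gamma^2\left(-mc^2\dot{\gamma} + \vec{F} \cdot \vec{v}\right) = 0 \qquad (19)$$

dividing by $\gamma^2$ and rearranging, we have

$$\vec{F} \cdot \vec{v} = mc^2\,\dot{\gamma} \qquad (20)$$

but we know from classical physics, by nearly the very definition of kinetic energy, *K.E.*, that

$$\frac{d(K.E.)}{dt} = \vec{F} \cdot \vec{v}. \qquad (21)$$

Combining equations (20) and (21), and integrating yields

$$K.E. = \int mc^2\,\dot{\gamma}\,dt$$

$$K.E. = \gamma mc^2 + K_{constant} \qquad (22)$$

where $K_{constant}$ is an arbitrary constant until we apply our initial conditions. We know that when the 3-velocity is 0, $\gamma = 1$ and the kinetic energy is 0, so

$$0 = mc^2 + K_{constant} \quad\Longrightarrow\quad K_{constant} = -mc^2 \qquad (23)$$

So we have the relativistic equation for kinetic energy,

$$K.E. = \gamma mc^2 - mc^2 = (\gamma - 1)\,mc^2 \qquad (24)$$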
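As a sanity check on the relativistic kinetic energy, K.E. = (γ − 1)mc², the Python sketch below compares it with the classical ½mv² at a few speeds. The function names, the 1 kg test mass and the list of speeds are our own assumptions for illustration.

```python
import math

C = 299_792_458.0  # speed of light in metres per second


def relativistic_ke(m, v, c=C):
    """Kinetic energy (gamma - 1) * m * c^2."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return (g - 1.0) * m * c**2


def classical_ke(m, v):
    """Newtonian kinetic energy (1/2) * m * v^2."""
    return 0.5 * m * v**2


m = 1.0  # hypothetical 1 kg test mass
for fraction in (0.001, 0.01, 0.1, 0.5, 0.9):
    v = fraction * C
    ratio = relativistic_ke(m, v) / classical_ke(m, v)
    print(f"v = {fraction:5.3f} c  ->  relativistic / classical = {ratio:.6f}")
```

The ratio sits very close to 1 at low speeds and grows steadily as the speed approaches *c*, which is the behaviour one expects if the relativistic formula contains the classical one as its low-speed limit.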

At this point we know we are on the right track. If we take the low-speed expansion $\gamma \approx 1 + \frac{v^2}{2c^2}$, we find that the relativistic kinetic energy reduces to the classical $\frac{1}{2}mv^2$ for *v* near 0. However, it is unfortunate and disheartening that the vast majority of sources attempting to derive Einstein’s famous *E = mc*^{2} equation stop here. Their argument is that the *total*
energy of the system is

Thus begins part II of our derivation. In the collective opinion of the Shady Crypt, the most concise argument discussing mass-energy transitions comes from Albert Einstein himself in his second 1905 paper discussing relativity [2]. Although the rest of the proof shown below is based on Einstein’s work, the terminology and details have been significantly modified to better fit with the *E = mc*^{2} on a pub serviette.

Imagine an apparatus, in an inertial frame, containing an isotropically (same in all directions) radiating light source. For every ray of light, there is also a ray of light in the opposite direction.

A given pair of rays (opposite directions) corresponds to a unit of energy *E*, when measured in the apparatus’s frame of reference. Each ray of a given pair contains energy of ½ *E*. When measured in the apparatus’s frame of reference, the total energy of a given pair of rays is

$$E_s = \tfrac{1}{2}E + \tfrac{1}{2}E = E \qquad (25)$$

where $E_s$ is the energy of the pair, as measured in the same frame of reference as the apparatus, which we call the stationary frame.

_{}

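Now imagine a spaceship moving toward the apparatus on the *x*-axis, at speed *v*. The spaceship measures the energy of the same pair of light rays to be

$$E_m = \tfrac{1}{2}E\,\gamma\left(1 + \frac{v}{c}\cos\theta\right) + \tfrac{1}{2}E\,\gamma\left(1 - \frac{v}{c}\cos\theta\right) = \gamma E \qquad (26)$$

where $E_m$ is the energy of the pair, measured in the moving frame, and $\theta$ is the angle between the pair of rays and the *x*-axis, measured in the apparatus’s frame.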

It was known back in 1905 that the energy of a “light ray” is proportional to its frequency. This relationship was established by Albert Einstein himself in another 1905 paper [3], involving the photoelectric effect. (Max Planck technically derived the relationship, but it was Einstein who nailed its significance.) The principles described in this paper are not only of particular interest here, but (ironically) would become part and parcel of the foundation of quantum mechanics. But it’s not important to really know anything about quantum mechanics for the *E = mc*^{2} on a pub serviette derivation. Notice that I have not used the term *photon* at all in this derivation. The term *light ray* is sufficient for this exercise. For this derivation, all that is necessary to take away from Einstein’s 1905 photoelectric effect paper is that, all else being equal, the energy of a light ray is proportional to its frequency.

So the stationary observer measures *E* energy for a given pair of light rays, and the moving observer measures $\gamma E$ energy for the exact same rays. But energy is energy. One can’t measure different values of energy in different frames of reference or conservation of energy would be violated; unless there is something else going on related to energy in the different frames. And there is. In the stationary frame, the apparatus has no kinetic energy relative to the stationary observer. But the apparatus does have kinetic energy according to the observer in the moving frame. So the difference between $E_m$ and $E_s$ must equal the reduction in that kinetic energy, $\Delta K.E.$,

$$\Delta K.E. = E_m - E_s = \gamma E - E$$

which after factoring becomes

$$\Delta K.E. = (\gamma - 1)\,E \qquad (27)$$
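The fact that the moving observer’s total for a pair of rays does not depend on the direction of the pair can be checked numerically. The sketch below assumes the standard relativistic Doppler factors γ(1 ± (v/c) cos θ) for the two opposite rays, with θ measured in the apparatus’s frame; the function name `pair_energy_moving` and the sample values (E = 1, v = 0.6 c) are our own assumptions.

```python
import math


def pair_energy_moving(E, beta, theta):
    """Energy of one pair of opposite rays as seen from the moving frame.

    E     : energy of the pair in the apparatus's frame
    beta  : v / c of the moving observer
    theta : angle of the pair to the x-axis, in the apparatus's frame
    """
    g = 1.0 / math.sqrt(1.0 - beta**2)
    ray_toward = 0.5 * E * g * (1.0 + beta * math.cos(theta))
    ray_away = 0.5 * E * g * (1.0 - beta * math.cos(theta))
    return ray_toward + ray_away


E = 1.0     # illustrative unit of energy
beta = 0.6  # illustrative speed; gamma is 1.25 here
for theta_deg in (0, 30, 60, 90):
    E_m = pair_energy_moving(E, beta, math.radians(theta_deg))
    print(theta_deg, E_m, E_m - E)
```

For every angle the pair’s energy comes out as γE (1.25 in this example), so the difference from the stationary value is (γ − 1)E, independent of direction.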

Keep in mind that we are not talking about the kinetic energy of the entire apparatus, only the kinetic energy of the mass associated with the pair of light rays that were measured*. Since the moving observer measures higher radiated energy, the associated component of the apparatus’s kinetic energy must have been reduced by the same amount (conservation of energy). The only conclusion that follows from equation (27) is that if the apparatus gives off energy *E*, its mass *must* decrease by a corresponding amount as a result. Specifically, we obtain that amount by combining equations (27) and (24),

$$(\gamma - 1)\,E = (\gamma - 1)\,mc^2$$

or simply,

$$E = mc^2. \qquad (28)$$

*(It is a subtle yet important consideration that the angle be measured in the apparatus’s frame. If an object is radiating isotropically in its own frame, it is not radiating isotropically in other frames due to a process called “relativistic beaming.” In our derivation, we are talking about the same two light rays regardless of what is measuring them. We could take relativistic beaming into account and integrate the apparatus’s energy in all directions. That would give the same result we obtain but be more complicated than what we want to show on our napkin. The approach we used here mirrors Einstein’s original work.)

References:

[1] *On the Electrodynamics of Moving Bodies*, A. Einstein, Annalen der Physik, 1905.

[2] *Does the Inertia of a Body Depend Upon Its Energy Content?*, A. Einstein, 1905.

[3] *Concerning an Heuristic Point of View Toward the Emission and Transformation of Light*, A. Einstein, Annalen der Physik, 1905.

[4] *Gravity: An Introduction to Einstein’s General Relativity*, James B. Hartle, Pearson Education, Inc., 2003.