E = mc² on a pub serviette
or, E = mc² on a bar napkin, for all you American yahoos
Ever been haunting a pub on a cloudy night and some jerk starts an argument about relativity? Or maybe you want to impress the brainy banshee or nerdy nymph over sips of Jägermeister? Described here is a very concise derivation of Albert Einstein’s famous equation, E = mc². It’s concise enough that it can be written right at the pub on a paper napkin (or a couple of napkins if they’re small).
This page is inspired by the fact that it’s surprisingly difficult to find a good derivation of this equation. Even Einstein’s original 1905 papers don’t technically contain the equation E = mc². They get close, but they never actually come out and say “E = mc²”. The corresponding equations in Einstein’s papers are actually quite ugly, leaving you to do some significant simplifying on your own if you want to end up with the elegant E = mc². Moreover, even fine relativity textbooks, which I have no complaints about in any other regard, often stumble and fall short on the derivation of this mass-energy relationship. I find numerous university-level textbooks that use circular arguments that don’t really hold up when it comes to deriving this famous relationship.
What we have done at the Shady Crypt is use a number of sources, combine the respective material, and make extensive modifications ourselves, to form a derivation that is concise and simple enough to be written on two sides of a paper serviette (napkin). You will have to write small, but it is possible.
Any text below typed in red is an essential part and should be written on the napkin. Of course, it’s important that you understand what it is you are writing. Your derivations won’t be worth much at the pub if they can’t stand up to questioning. So a generous amount of explanation, not typed in red, is given along with the essentials. Any text below that is not typed in red needn’t be written on the napkin.
Part I of our derivation starts with the postulate: The speed of light is always measured to be constant, i.e. the same value, regardless of the relative velocity of the measuring device. Not too long before Einstein wrote his original paper on what we now call “special relativity” [1], Albert Michelson and Edward Morley performed an experiment in an attempt to measure Earth’s velocity through the ether. They found that the speed of light is always measured to be the same, no matter how fast or in what direction the Earth is moving. One might think that this Michelson-Morley experiment is what primarily inspired Einstein to create special relativity. But that turns out not to be the case.
In Einstein’s education, he studied electricity and magnetism (among other things). He was troubled by a basic concept in electromagnetic theory. We at the Shady Crypt have also studied classical electrodynamics, and can attest that this particular concept doesn’t sit well with us either. The concept goes something like this: When calculating the magnetic force between two moving electrons (suppose they are moving together, side by side), the process is as follows:
(1) The first moving electron creates a magnetic field, which can be described mathematically.
(2) The second moving electron feels a force as it moves through the magnetic field created by the first electron.
(Switch which electron you start with to find the force felt by the other electron.)
So there are actually three velocities that we need to pay attention to: the respective velocities of the two electrons, and the velocity of the field, which in classical electrodynamics is always zero.
But if you think about it, this doesn’t really make any sense. If the two electrons are standing still, no magnetic force is felt by either electron. But if both electrons are next to each other, moving together at the same speed in the same direction, there is a force felt, because each electron is moving with respect to a zero-velocity magnetic field. But what constitutes zero velocity? The ground? Well, the ground is on the Earth, and the Earth is rotating and revolving around the Sun, and the whole solar system is moving with respect to other celestial objects, so what the heck isn’t moving? And who says that the electrons are the ones that are really moving in the first place? Since everything else is moving, maybe they are the ones standing still. Who’s to say? And it’s an important distinction, because if you choose a different “reference” for what is standing still, you get different answers when you calculate the magnetic forces. (The electric force tends to push the pair of electrons apart. When the pair is moving, the magnetic force tends to pull them together, but to a smaller extent. The magnetic force would only reach the same strength as the electric force if the electrons’ speed reached the speed of light.)
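The parenthetical claim about the two forces can be checked numerically. What follows is a minimal sketch. It assumes the standard textbook fields of a uniformly moving point charge, evaluated perpendicular to its motion, which this page does not itself derive. The constant names, the 1 nm separation and the helper `forces_side_by_side` are all illustrative choices.

```python
import math

C = 299_792_458.0          # speed of light, m/s
EPS0 = 8.8541878128e-12    # vacuum permittivity, F/m
MU0 = 1.0 / (EPS0 * C**2)  # vacuum permeability, chosen so mu0 * eps0 = 1/c^2
Q = 1.602176634e-19        # elementary charge magnitude, C


def forces_side_by_side(v, r):
    """Lab-frame electric and magnetic force magnitudes (N) on one electron
    from a second electron a distance r (m) away, both moving at speed v (m/s)
    perpendicular to the line joining them."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    # Fields of a uniformly moving point charge, evaluated perpendicular to its motion.
    e_field = gamma * Q / (4.0 * math.pi * EPS0 * r**2)
    b_field = MU0 * gamma * Q * v / (4.0 * math.pi * r**2)
    f_electric = Q * e_field      # pushes the electrons apart
    f_magnetic = Q * v * b_field  # pulls them together
    return f_electric, f_magnetic


for fraction in (0.1, 0.5, 0.9, 0.99):
    f_e, f_m = forces_side_by_side(fraction * C, 1.0e-9)
    print(f"v = {fraction}c: magnetic / electric = {f_m / f_e:.4f}")
```

The ratio comes out to (v/c)², so the magnetic pull only catches up with the electric push as v approaches c.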
It doesn’t quite end there. As students struggle through the mathematics of electromagnetic field theory, they eventually come to Maxwell’s equations. Maxwell’s equations are the “holy grail” of electromagnetic field theory. They consist of four separate, elegantly simple equations, and together they can describe all aspects of classical electromagnetic field theory. But they still don’t answer the question of “what is it that isn’t moving?” Using Maxwell’s equations, one can calculate the speed of light. But the speed of light relative to what? The Earth? The Sun? Something else? Who’s to say? Enter Albert Einstein.
Einstein surmised that it didn’t matter how fast the instruments measuring the speed of light were moving. Every laboratory will come up with the same value, regardless of each laboratory’s relative velocity. But how could this be? Suppose a laboratory in a train is moving at 99% the speed of light, almost keeping up with a ray of light just ahead of it. How could the train measure the ray to be moving away from the train at 100% the speed of light? The only answer is this: time is moving slower in the train, relative to the ground where the velocity of the train is being measured. (By the same token, instruments in the train will actually measure time moving slower on the ground, which is also true. Both are equally valid measurements. While this may seem like a horrible paradox, it is easily resolved by analyzing a spacetime diagram or similar tool, but that’s outside the scope of what we want to write on the napkin.)
So let’s go back to our derivation. We create a mathematical system such that as something moves faster through space, it moves slower through time, relative to the laboratory that is measuring the velocities. We will define two different measures (lengths) of time. The variable t represents a length of time according to the clocks in the laboratory that is measuring the velocities of things. The variable $\tau$ (the Greek letter tau) represents the amount of time that passes according to the clocks in the moving object. We call t regular time, and we call $\tau$ proper time. We call the moving object the moving frame of reference, or moving frame for short. We call the laboratory that is measuring the velocities of things the inertial frame. It is assumed that the inertial frame is not accelerating.
In summary,
Inertial frame: The [non-accelerating] laboratory keeping track of the various velocities of objects.
Moving frame: Anything moving relative to the inertial frame (may or may not be accelerating).
$t$: regular time. A time interval measured by the clocks in the inertial frame.
$\tau$: proper time. A time interval measured by the clocks in the moving frame.
The mathematical system can be created rather simply. In doing so we define Minkowski spacetime. During his education, Einstein was a student of Hermann Minkowski. In Einstein’s original paper on special relativity [1], Einstein treated each direction, including the time direction, separately. They were interrelated, but still separate. He went so far as to combine the different equations into large matrices, but they were still cumbersome, to say the least. A few years later, Minkowski began to introduce the concept of 4-dimensional spacetime in his lectures on relativity. You, I and everything else rocket through spacetime with a magnitude equal to the speed of light, no faster, no slower. Because everything’s speed through spacetime is a constant, if something moves faster through space, it necessarily moves slower through time. The way velocities through the space and time dimensions relate is something akin to the Pythagorean theorem. Minkowski spacetime gives the same results as Einstein’s methods, but is much more compact. We will use Minkowski spacetime on our serviette.
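Here is one way to make "akin to the Pythagorean theorem" concrete. The sketch below treats c·dτ/dt, the rate at which a moving clock ticks expressed as a speed, as the "speed through time". It then combines that with the speed through space v the way a right triangle combines its legs. This reading of the napkin's picture is an illustrative assumption, and the helper name `rate_through_time` is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def rate_through_time(v):
    """c * d(tau)/dt: how fast a clock moving at speed v advances, expressed
    as a speed (m/s) so it can be compared directly with motion through space."""
    return C * math.sqrt(1.0 - (v / C) ** 2)


for fraction in (0.0, 0.5, 0.9, 0.999):
    v = fraction * C
    time_part = rate_through_time(v)
    # Pythagorean-style combination of the "time leg" and the "space leg".
    total = math.hypot(time_part, v)
    print(f"v = {fraction:5.3f}c  time part = {time_part / C:.4f}c  total = {total / C:.4f}c")
```

Every line reports a total of 1.0000c. Speed through space is traded for speed through time, and the combined magnitude never changes.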
Simply treat time as any other dimension of space, and define some minor differences of the time dimension as described a little later. So to start, we have 4-dimensional spacetime. The different directions (as seen in the inertial frame) are ct, x, y, and z, where c is the speed of light. We multiply time by c, so that ct has units of length, just like the other directions.
Vectors in three dimensions we denote with an arrow above the variable’s symbol, such as the 3-velocity $\vec{v}$. Four-dimensional spacetime vectors, called 4-vectors, are denoted using a squiggle underneath, such as the position 4-vector $\underset{\sim}{x}$. 4-vectors take the form
$$\underset{\sim}{x} = (ct,\; x,\; y,\; z) \tag{1}$$
or in shorthand form,
$$\underset{\sim}{x} = (ct,\; \vec{x}) \tag{2}$$
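Now we define the inner product, or dot product, of two 4-vectors. For this definition, suppose we have two arbitrary 4-vectors,
$$\underset{\sim}{a} = (a^t,\; a^x,\; a^y,\; a^z), \qquad \underset{\sim}{b} = (b^t,\; b^x,\; b^y,\; b^z), \tag{3}$$
then,
$$\underset{\sim}{a}\cdot\underset{\sim}{b} = \sum_{\alpha}\sum_{\beta}\eta_{\alpha\beta}\,a^{\alpha}b^{\beta}, \tag{4}$$
where the indices $\alpha$ and $\beta$ each run over $t, x, y, z$, and
$$\eta_{\alpha\beta} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{5}$$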
Now you might be saying, “whoa, this is getting too complicated.” But it’s not as bad as it looks. All we’re saying here is that the dot product of two 4-vectors is the same idea as the dot product of two 3-vectors, except the time component gets a negative sign. In other words, when you take the dot product (inner product) of two 4-vectors, you multiply the respective time components together and give that a negative sign. Then you multiply the x components together, then the y components, then the z components. Finally, you add all the results together.
Written out in full, the dot product of the two arbitrary 4-vectors in (3) is
$$\underset{\sim}{a}\cdot\underset{\sim}{b} = -a^t b^t + a^x b^x + a^y b^y + a^z b^z.$$
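The same recipe fits in a few lines of code. This is a minimal sketch using the (−, +, +, +) signature of equation (5). The component values below are hypothetical and were chosen only for illustration; `dot4` is likewise a hypothetical helper name.

```python
# Metric signature (-, +, +, +): only the time component gets the minus sign.
ETA = (-1.0, 1.0, 1.0, 1.0)


def dot4(a, b):
    """Inner product of two 4-vectors given as (t, x, y, z) component tuples."""
    return sum(eta * ai * bi for eta, ai, bi in zip(ETA, a, b))


# Hypothetical example vectors, chosen only for illustration.
a = (2.0, 1.0, 0.0, 3.0)
b = (1.0, 4.0, 5.0, -1.0)
print(dot4(a, b))  # -(2)(1) + (1)(4) + (0)(5) + (3)(-1) = -1.0
```

Note that a 4-vector's dot product with itself can be negative, which is the source of the "imaginary length" discussed later.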
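Since each component is perpendicular to the other components, we can also express 4-vectors in terms of differentials.
$$d\underset{\sim}{x} = (c\,dt,\; dx,\; dy,\; dz) \tag{6}$$
and
$$d\underset{\sim}{x}\cdot d\underset{\sim}{x} = -(c\,dt)^2 + dx^2 + dy^2 + dz^2 \tag{7}$$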
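And now we define $d\tau$. Remember $\tau$? $\tau$ is the proper time, i.e. a time interval as measured by clocks in the moving frame. And $d\tau$ is essentially the differential length of a spacetime 4-vector, in units of time.
$$d\tau^2 = -\frac{1}{c^2}\,d\underset{\sim}{x}\cdot d\underset{\sim}{x} \tag{8}$$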
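Equation (8) can be turned into a small calculator for the time that elapses on a moving clock. The sketch below is minimal and assumes straight-line motion over the interval. It rejects displacements that would require moving faster than light; the function name `proper_time` is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def proper_time(dt, dx, dy=0.0, dz=0.0):
    """Proper time d(tau), in seconds, on a clock that moves by (dx, dy, dz)
    metres while the inertial frame's clocks advance dt seconds.
    Implements d(tau)^2 = -(1/c^2) dx.dx with dx = (c dt, dx, dy, dz)."""
    interval = -(C * dt) ** 2 + dx**2 + dy**2 + dz**2  # the 4-vector dot product
    if interval > 0:
        raise ValueError("displacement is faster than light; no proper time")
    return math.sqrt(-interval) / C


# A clock carried at 0.6c for one second of inertial-frame time:
print(proper_time(1.0, 0.6 * C))  # approximately 0.8 s
```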
So far we have represented our spacetime 4-vectors in units of length. To accomplish this, regular time t was converted to units of length by multiplying it by c. But we could have just as easily kept everything in units of time and divided x, y, and z by c. Had we done so, the length of that differential position 4-vector would simply be $i\,d\tau$, an imaginary length (here $i = \sqrt{-1}$).
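Of course, what we’ve really been interested in all along is the relationship between $dt$ and $d\tau$. In particular, we are interested in determining $dt/d\tau$. This represents the amount of time it takes in the inertial frame for the clock in the moving frame to advance one unit of time. Combining equations (7) and (8) gives us
$$c^2\,d\tau^2 = c^2\,dt^2 - dx^2 - dy^2 - dz^2 \tag{9}$$
Dividing by $dt^2$ produces
$$c^2\left(\frac{d\tau}{dt}\right)^2 = c^2 - \left(\frac{dx}{dt}\right)^2 - \left(\frac{dy}{dt}\right)^2 - \left(\frac{dz}{dt}\right)^2 = c^2 - v^2 \tag{10}$$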
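where $\vec{v}$ is the moving frame’s 3-velocity, relative to the inertial frame (it is understood that $v^2$ is the magnitude squared, a scalar quantity). Dividing by $c^2$ gives us
$$\left(\frac{d\tau}{dt}\right)^2 = 1 - \frac{v^2}{c^2} \tag{11}$$
Taking the square root gives us
$$\frac{d\tau}{dt} = \sqrt{1 - \frac{v^2}{c^2}} \tag{12}$$
and after inverting we have
$$\frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2/c^2}}. \tag{13}$$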
This relationship is used so often that we give it a special designation, $\gamma$ (the Greek letter gamma):
$$\gamma \equiv \frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2/c^2}}.$$
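To get a feel for γ, here is a minimal sketch that evaluates equation (13) for a few speeds, including the 99% train mentioned earlier. The function name `gamma` and the chosen speeds are illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def gamma(v):
    """Lorentz factor dt/d(tau) = 1 / sqrt(1 - v^2/c^2) for a speed v in m/s."""
    if abs(v) >= C:
        raise ValueError("speed must be below the speed of light")
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


# Inertial-frame seconds that pass while the moving clock advances one second.
for fraction in (0.0, 0.1, 0.6, 0.99):
    print(f"v = {fraction}c -> gamma = {gamma(fraction * C):.4f}")
```

At 0.6c, γ is exactly 1.25. At 0.99c it is about 7.09, so roughly seven ground seconds pass for every second on the train's clocks.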
It’s worthwhile to point out that so far, we really haven’t derived anything. All that we have done is create a mathematical system such that all observers measure the speed of light to be the same value, regardless of each observer’s relative velocity. We started out with the postulate that all observers get the same value when measuring the speed of light, and we have created a mathematical system that does just that. So far, that is all we have done. Now we can move forward and get to the nitty-gritty of special relativity.
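Next we define the velocity 4-vector, $\underset{\sim}{u}$. The velocity 4-vector is the derivative of the position 4-vector with respect to proper time, $\tau$.
$$\underset{\sim}{u} = \frac{d\underset{\sim}{x}}{d\tau} = \left(c\,\frac{dt}{d\tau},\; \frac{dx}{d\tau},\; \frac{dy}{d\tau},\; \frac{dz}{d\tau}\right).$$
It is a good idea here to point out a trick that we will use often. Note that $\frac{d}{d\tau} = \frac{dt}{d\tau}\,\frac{d}{dt} = \gamma\,\frac{d}{dt}$. Using this trick, the above equation reduces to
$$\underset{\sim}{u} = \left(\gamma c,\; \gamma\frac{dx}{dt},\; \gamma\frac{dy}{dt},\; \gamma\frac{dz}{dt}\right),$$
or more succinctly,
$$\underset{\sim}{u} = \gamma\,(c,\; \vec{v}) \tag{14}$$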
$\underset{\sim}{u}$, the velocity 4-vector, represents the speed and direction that something is traveling through 4-dimensional spacetime. I’ll leave it as an exercise for you to calculate the speed (magnitude of the velocity), but I’ll tell you the answer. The speed at which anything travels through spacetime is always the speed of light, regardless of the object’s 3-velocity. (The answer comes out to $ic$, but since we are only concerned with the magnitude [and not the phase], the speed reduces to c.) So the important information that $\underset{\sim}{u}$ gives us is the object’s direction through spacetime, not forgetting that regular time is one possible direction.
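If you would rather not do that exercise by hand, here is a quick numerical check. This minimal sketch builds $\underset{\sim}{u}$ from equation (14) for a few arbitrary 3-velocities and confirms that $\underset{\sim}{u}\cdot\underset{\sim}{u} = -c^2$ every time. The magnitude is therefore $\sqrt{-c^2} = ic$. The helper names are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def four_velocity(vx, vy, vz):
    """u = gamma * (c, vx, vy, vz), equation (14), from a 3-velocity in m/s."""
    g = 1.0 / math.sqrt(1.0 - (vx * vx + vy * vy + vz * vz) / C**2)
    return (g * C, g * vx, g * vy, g * vz)


def dot4(a, b):
    """Minkowski inner product with signature (-, +, +, +)."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]


for v3 in [(0.0, 0.0, 0.0), (0.3 * C, 0.0, 0.0), (0.5 * C, 0.4 * C, 0.1 * C)]:
    u = four_velocity(*v3)
    # u.u should be -c^2 for every 3-velocity, so this prints approximately -1
    print(dot4(u, u) / C**2)
```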
Next we define the momentum 4-vector, $\underset{\sim}{p}$. Just like the momentum 3-vector, the momentum 4-vector is simply the velocity vector multiplied by the mass, m.
$$\underset{\sim}{p} = m\,\underset{\sim}{u} = \gamma m\,(c,\; \vec{v}) \tag{15}$$
If you continue to use and pursue special relativity beyond the napkin, you will find that $\underset{\sim}{p}$ is commonly called the energy-momentum 4-vector. This is because the time component is proportional to the total energy of the system. Of course, we haven’t derived that yet, so we’ll just keep on calling it the momentum 4-vector for now. But on a side note, I have actually witnessed “derivations” that essentially stop about here and claim something to the effect of, “…since $\underset{\sim}{p}$ is called the energy-momentum 4-vector, E is therefore equal to $\gamma m c^2$.” Ya gotta be kidding me. For shame.
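Next we define the force 4-vector, $\underset{\sim}{f}$. The force 4-vector is the derivative of the momentum 4-vector with respect to proper time, $\tau$:
$$\underset{\sim}{f} = \frac{d\underset{\sim}{p}}{d\tau} = \frac{d}{d\tau}\left(\gamma m c,\; \gamma m\vec{v}\right).$$
Using the same trick we used for equation (14), this simplifies to
$$\underset{\sim}{f} = \gamma\left(m c\,\dot{\gamma},\; m\dot{\gamma}\,\vec{v} + \gamma m\,\dot{\vec{v}}\right),$$
where dotted variables represent derivatives with respect to normal time $t$ (not proper time, $\tau$). More compactly, this equation can be written as
$$\underset{\sim}{f} = \gamma\left(m c\,\dot{\gamma},\; \vec{F}\right) \tag{16}$$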
where $\vec{F}$ is the force 3-vector, the time derivative of $\gamma m\vec{v}$ (with respect to normal time, $t$),
$$\vec{F} = \frac{d}{dt}\left(\gamma m\vec{v}\right) = m\dot{\gamma}\,\vec{v} + \gamma m\,\vec{a}, \tag{17}$$
where $\vec{a} = \dot{\vec{v}}$ is the acceleration 3-vector. Notice here that we’ve had to modify the force 3-vector from $\vec{F} = m\vec{a}$. This is because as an object approaches the speed of light, it becomes harder and harder to push it such that it goes even faster. We must abandon Isaac Newton’s $\vec{F} = m\vec{a}$ in favor of the relativistic version, $\vec{F} = \frac{d}{dt}(\gamma m\vec{v})$. Einstein, in his original special relativity paper [1], used a mathematically equivalent, yet conceptually different, terminology. Einstein kept the $\vec{F} = m\vec{a}$ relationship and accounted for the relativistic effects by modifying the definition of mass. As a matter of fact, according to Einstein’s original paper, every moving object has two masses associated with it, longitudinal mass and transverse mass (terms also used by Max Abraham and Hendrik Lorentz a few years prior). Yes: one object; two different masses. Many (most?) contemporary physicists have abandoned the concept of relativistic mass and have instead accepted the relativistic modification of the force 3-vector. We shall do the same. (The momentum 3-vector faces a similar modification, $\vec{p} = \gamma m\vec{v}$, but we don’t explicitly use the momentum 3-vector on our napkin, so it’s not discussed here.)
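How much harder does it get to push? Here is a minimal sketch for the special case where the push is parallel to the motion and the mass m is constant. It estimates $\vec{F} = \frac{d}{dt}(\gamma m\vec{v})$ with a central finite difference and compares the result with Newton's ma. For this parallel case the ratio works out to γ³, which is the factor behind the old "longitudinal mass". The helper names and the 1 kg, 1 m/s² numbers are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def gamma(v):
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def relativistic_force(m, v, a, h=1.0):
    """F = d(gamma m v)/dt for straight-line motion, estimated with a central
    difference: the speed is v at time t and changes at rate a (m/s^2);
    h is the time step in seconds."""
    p_plus = gamma(v + a * h) * m * (v + a * h)
    p_minus = gamma(v - a * h) * m * (v - a * h)
    return (p_plus - p_minus) / (2.0 * h)


m, a = 1.0, 1.0  # 1 kg pushed to gain 1 m/s every second
for fraction in (0.01, 0.5, 0.9):
    v = fraction * C
    F = relativistic_force(m, v, a)
    print(f"v = {fraction}c: F / (m a) = {F / (m * a):.4f}, gamma^3 = {gamma(v) ** 3:.4f}")
```

At everyday speeds the ratio is essentially 1 and Newton is fine. At 0.9c you need roughly twelve times the Newtonian force for the same acceleration.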
It can be shown that
$$\underset{\sim}{f}\cdot\underset{\sim}{u} = 0, \tag{18}$$
i.e., the dot product of the force and velocity 4-vectors is zero. Proof:
NOTE: This might be a fine time to pick up a second pub serviette (bar napkin) for the proof, and then come back to the original napkin later. If you can fit it on the original serviette, then fine, but it does take up precious space. If you attempt to explain it away verbally rather than show the explicit proof, here is some advice that might help. It makes sense that $\underset{\sim}{f}\cdot\underset{\sim}{u} = 0$. Stop and think about it for a moment. As we already discussed, you, I and everything else in the universe are rocketing through 4-dimensional spacetime with a speed (magnitude) of precisely the speed of light, no faster, no slower. It’s not possible to change the magnitude of an object’s speed through 4-dimensional spacetime, only the direction. So if any 4-force is applied to an object, it is guaranteed that the component of the force parallel to the object’s velocity 4-vector is zero. That is exactly how to interpret $\underset{\sim}{f}\cdot\underset{\sim}{u} = 0$. All that being said, you should still be prepared to work out the following proof if requested.
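Combining equations (16) and (17), and taking the dot product with (14), yields
$$\underset{\sim}{f}\cdot\underset{\sim}{u} = -\gamma^2 m c^2\dot{\gamma} + \gamma^2\,\vec{F}\cdot\vec{v}$$
$$= -\gamma^2 m c^2\dot{\gamma} + \gamma^2\left(m\dot{\gamma}\,v^2 + \gamma m\,\vec{a}\cdot\vec{v}\right)$$
$$= \gamma^2 m\left[-\dot{\gamma}\,(c^2 - v^2) + \gamma\,\vec{a}\cdot\vec{v}\right].$$
Differentiating $\gamma = (1 - v^2/c^2)^{-1/2}$ gives $\dot{\gamma} = \gamma^3\,\vec{v}\cdot\vec{a}/c^2$, so that $\dot{\gamma}\,(c^2 - v^2) = \gamma\,\vec{v}\cdot\vec{a}$, and therefore
$$\underset{\sim}{f}\cdot\underset{\sim}{u} = \gamma^2 m\left[-\gamma\,\vec{v}\cdot\vec{a} + \gamma\,\vec{a}\cdot\vec{v}\right] = 0.$$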
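For a skeptic at the bar, a numerical spot check can back up the algebra. This minimal sketch builds $\underset{\sim}{u}$ from (14) and $\underset{\sim}{f}$ from (16) and (17), using $\dot{\gamma} = \gamma^3\,\vec{v}\cdot\vec{a}/c^2$. It then returns their dot product divided by the size of the individual terms. The mass, velocity and acceleration values are arbitrary, and the function name is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def dot3(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def relative_f_dot_u(m, v, a):
    """Build u from (14) and f from (16)-(17) for mass m, 3-velocity v (m/s)
    and 3-acceleration a (m/s^2); return f.u divided by the size of its terms,
    which should be zero up to rounding error."""
    g = 1.0 / math.sqrt(1.0 - dot3(v, v) / C**2)
    g_dot = g**3 * dot3(v, a) / C**2                           # d(gamma)/dt
    F = [m * g_dot * vi + g * m * ai for vi, ai in zip(v, a)]  # equation (17)
    f = [g * m * C * g_dot] + [g * Fi for Fi in F]             # equation (16)
    u = [g * C] + [g * vi for vi in v]                         # equation (14)
    terms = [-f[0] * u[0]] + [fi * ui for fi, ui in zip(f[1:], u[1:])]
    scale = sum(abs(t) for t in terms)
    return 0.0 if scale == 0.0 else sum(terms) / scale


print(relative_f_dot_u(2.0, (0.6 * C, 0.2 * C, 0.0), (3.0, -1.0, 5.0)))  # ~0
```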
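So now we are completely confident that $\underset{\sim}{f}\cdot\underset{\sim}{u} = 0$. Next we substitute equations (14) and (16) into equation (18), and we get a surprising result:
$$-\gamma^2 m c^2\,\dot{\gamma} + \gamma^2\,\vec{F}\cdot\vec{v} = 0 \tag{19}$$
Dividing by $\gamma^2$ and rearranging, we have
$$\vec{F}\cdot\vec{v} = m c^2\,\dot{\gamma} = \frac{d}{dt}\left(\gamma m c^2\right) \tag{20}$$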
But we know from classical physics, by nearly the very definition of kinetic energy, K.E., that
$$\frac{d(\text{K.E.})}{dt} = \vec{F}\cdot\vec{v}. \tag{21}$$
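Combining equations (20) and (21), and integrating, yields
$$\text{K.E.} = \gamma m c^2 + K_{\text{constant}} \tag{22}$$
where $K_{\text{constant}}$ is an arbitrary constant until we apply our initial conditions. We know that when the 3-velocity is 0, K.E. must be zero. When the velocity 3-vector is zero, $\gamma = 1$. Therefore,
$$K_{\text{constant}} = -m c^2 \tag{23}$$
So we have the relativistic equation for kinetic energy,
$$\text{K.E.} = \gamma m c^2 - m c^2 = (\gamma - 1)\,m c^2 \tag{24}$$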
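Here is a quick sanity check of equation (24) against the classical ½mv². This minimal sketch computes γ − 1 as `expm1(-0.5 * log1p(-beta**2))`, a rearrangement chosen to avoid losing precision at small speeds. The function names and the 1 kg mass are illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def kinetic_energy_relativistic(m, v):
    """Equation (24): K.E. = (gamma - 1) m c^2, in joules."""
    beta_sq = (v / C) ** 2
    # gamma - 1 = exp(-0.5 * ln(1 - beta^2)) - 1, computed without cancellation.
    gamma_minus_one = math.expm1(-0.5 * math.log1p(-beta_sq))
    return gamma_minus_one * m * C**2


def kinetic_energy_classical(m, v):
    return 0.5 * m * v**2


m = 1.0  # kg
for fraction in (1e-3, 0.1, 0.5, 0.9):
    v = fraction * C
    ratio = kinetic_energy_relativistic(m, v) / kinetic_energy_classical(m, v)
    print(f"v = {fraction}c: relativistic / classical = {ratio:.6f}")
```

The ratio is essentially 1 at small speeds and grows without bound as v approaches c. This matches the series $(\gamma - 1)mc^2 = \tfrac{1}{2}mv^2 + \tfrac{3}{8}mv^4/c^2 + \dots$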
At this point we know we are on the right track. If we take the Taylor series expansion of $(\gamma - 1)mc^2$ in powers of $v/c$, we find that it reduces to the classical $\tfrac{1}{2}mv^2$ for v near 0. However, it is unfortunate and disheartening that the vast majority of sources attempting to derive Einstein’s famous $E = mc^2$ equation stop here. Their argument is that the total energy of the system is $\gamma mc^2$, and since $(\gamma - 1)mc^2$ reduces to $\tfrac{1}{2}mv^2$ for v near 0, the rest of the energy must be in the mass such that $E = mc^2$. But no, no, no. Hold on. We have not proven (so far) that the total energy equals $\gamma mc^2$. The “claim” turns out to be correct, as we shall soon show. But in the meantime, the only thing we’ve derived so far deals with the kinetic energy. We haven’t derived anything regarding even the concept of total energy. We are not finished yet. We haven’t even begun discussing matter-energy transitions. So that is where we must head now.
Thus begins part II of our derivation. In the collective opinion of the Shady Crypt, the most concise argument discussing mass-energy transitions comes from Albert Einstein himself in his second 1905 paper discussing relativity [2]. Although the rest of the proof shown below is based on Einstein’s work, the terminology and details have been significantly modified to better fit the E = mc² on a pub serviette derivation.
Imagine an apparatus, in an inertial frame, containing an isotropically (same in all directions) radiating light source. For every ray of light, there is also a ray of light in the opposite direction.
A given pair of rays (in opposite directions) corresponds to a unit of energy E, when measured in the apparatus’s frame of reference. Each ray of a given pair contains energy ½E. When measured in the apparatus’s frame of reference, the total energy of a given pair of rays is
$$E_s = \tfrac{1}{2}E + \tfrac{1}{2}E = E \tag{25}$$
where $E_s$ is the energy of the pair, as measured in the same frame of reference as the apparatus, which we call the stationary frame.
Now imagine a spaceship moving toward the apparatus along the x-axis at speed v. The spaceship measures the energy of the same pair of light rays to be
$$E_m = \tfrac{1}{2}E\,\gamma\left(1 + \frac{v}{c}\cos\phi\right) + \tfrac{1}{2}E\,\gamma\left(1 - \frac{v}{c}\cos\phi\right) = \gamma E \tag{26}$$
where $E_m$ is the energy of the pair measured in the moving frame, and $\phi$ is the angle of the pair, as measured in the stationary frame, away from the axis of the spaceship’s movement. In equation (26), the $\left(1 \pm \frac{v}{c}\cos\phi\right)$ terms come from the Doppler effect. As the ship moves toward the light ray heading in its direction, the wavefronts get scrunched up, increasing the frequency of that ray. For the other ray, moving away from the spaceship, the frequency is decreased. In both terms, there is a factor of $\gamma$. The $\gamma$ is due to the fact that the spaceship is moving, and there is time dilation involved. The clocks in the spaceship are running slower, so the spaceship will measure more wavefronts per unit time by its own clocks than one would measure using the stationary frame’s clocks. The difference in frequency due to time dilation remains when adding up the energy of the two light rays, while the Doppler terms cancel.
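The cancellation of the Doppler terms is easy to see numerically. This minimal sketch evaluates equation (26) at several angles for a ship moving at 0.6c, where γ = 1.25. It assumes the pair's total energy in the stationary frame is E = 1, in arbitrary units. The function name is hypothetical.

```python
import math


def pair_energy_moving_frame(E, beta, phi):
    """Equation (26): energy of a back-to-back pair of rays (total E in the
    apparatus frame) as measured from a ship moving at speed beta*c; phi is
    the pair's angle from the ship's direction of motion, in radians."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    blue = 0.5 * E * gamma * (1.0 + beta * math.cos(phi))  # shifted up
    red = 0.5 * E * gamma * (1.0 - beta * math.cos(phi))   # shifted down
    return blue + red


E, beta = 1.0, 0.6  # gamma = 1.25 for beta = 0.6
for phi_deg in (0, 30, 60, 90, 135):
    print(phi_deg, pair_energy_moving_frame(E, beta, math.radians(phi_deg)))
```

Whatever the angle, the result is γE (1.25 here, up to rounding). Only the time-dilation factor survives the sum.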
It was known back in 1905 that the energy of a “light ray” is proportional to its frequency. This relationship was established by Albert Einstein himself in another 1905 paper [3], involving the photoelectric effect. (Max Planck technically derived the relationship, but it was Einstein who nailed its significance.) The principles described in that paper are not only of particular interest here, but (ironically) would become part and parcel of the foundation of quantum mechanics. But you don’t need to know anything about quantum mechanics for the E = mc² on a pub serviette derivation. Notice that I have not used the term photon at all in this derivation. The term light ray is sufficient for this exercise. All you need to take away from Einstein’s 1905 photoelectric effect paper is that, all else being equal, the energy of a light ray is proportional to its frequency.
So the stationary observer measures energy $E$ for a given pair of light rays, and the moving observer measures energy $\gamma E$ for the exact same rays. But energy is energy. One can’t measure different values of energy in different frames of reference, or conservation of energy would be violated, unless there is something else going on related to energy in the different frames. And there is. In the stationary frame, the apparatus has no kinetic energy relative to the stationary observer. But the apparatus does have kinetic energy according to the observer in the moving frame. So the difference between $E_m$ and $E_s$ must be related to the kinetic energy, K.E., of the apparatus.
$$E_m - E_s = \gamma E - E$$
which after factoring becomes
$$E_m - E_s = (\gamma - 1)\,E \tag{27}$$
Keep in mind that we are not talking about the kinetic energy of the entire apparatus, only the kinetic energy of the mass associated with the pair of light rays that were measured*. Since the moving observer measures higher radiated energy, the associated component of the apparatus’s kinetic energy must have been reduced by the same amount (conservation of energy). The only conclusion that follows from equation (27) is that if the apparatus gives off energy E, its mass must decrease by a corresponding amount as a result. Calling that amount m, we obtain it by combining equations (27) and (24),
$$(\gamma - 1)\,E = (\gamma - 1)\,m c^2$$
or simply,
$$E = m c^2. \tag{28}$$
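For a sense of scale, this minimal sketch solves equation (28) for the mass an apparatus loses when it radiates a hypothetical 1 kJ. It also repeats the kinetic-energy bookkeeping of (24) and (27), which balances for every ship speed once m = E/c². The function name and the 1 kJ figure are illustrative.

```python
import math

C = 299_792_458.0  # speed of light, m/s (exact SI value)


def mass_lost(E):
    """Equation (28) solved for the mass: m = E / c^2 (kg) for energy E in joules."""
    return E / C**2


E = 1.0e3  # hypothetical: apparatus radiates 1 kJ
m = mass_lost(E)
print(f"mass lost: {m:.3e} kg")  # about 1.113e-14 kg

# Bookkeeping from (24) and (27): for any ship speed,
# (gamma - 1) E should equal (gamma - 1) m c^2.
for beta in (0.1, 0.5, 0.9):
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    print(beta, (gamma - 1.0) * E, (gamma - 1.0) * m * C**2)
```

The tiny mass is why nobody noticed the effect before 1905: ordinary amounts of radiated energy correspond to unmeasurably small changes in mass.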
*(It is a subtle yet important consideration that the angle be measured in the apparatus’s frame. If an object is radiating isotropically in its own frame, it is not radiating isotropically in other frames due to a process called “relativistic beaming.” In our derivation, we are talking about the same two light rays regardless of what is measuring them. We could take relativistic beaming into account and integrate the apparatus’s energy in all directions. That would give the same result we obtain, but would be more complicated than what we want to show on our napkin. The approach we used here mirrors Einstein’s original work.)
References:
[1] On the Electrodynamics of Moving Bodies, A. Einstein, Annalen der Physik.
[2] Does the Inertia of a Body Depend Upon Its Energy Content?, A. Einstein.
[3] Concerning an Heuristic Point of View Toward the Emission and Transformation of Light, A. Einstein, Annalen der Physik.
[4] Gravity: An Introduction to Einstein’s General Relativity, James B. Hartle, Pearson Education, Inc., 2003.