E = mc² on a pub serviette

or, E = mc² on a bar napkin, for all you American yahoos

Ever been haunting a pub on a cloudy night and some jerk starts an argument about relativity? Or maybe you want to impress the brainy banshee or nerdy nymph over sips of Jägermeister? Described here is a very concise derivation of Albert Einstein’s famous equation, E = mc². It’s concise enough that it can be written right at the pub on a paper napkin (or a couple of napkins if they’re small).

This page is inspired by the fact that it’s surprisingly difficult to find a good derivation of this equation. Einstein’s original 1905 papers don’t technically contain the equation E = mc² at all. They get close, but they never actually come out and say “E = mc²”. The corresponding equations in Einstein’s papers are quite ugly, leaving you to do significant simplifying on your own if you want to end up with the elegant E = mc². Moreover, even fine relativity textbooks, about which I have no complaints in any other regard, often stumble and fall short on this mass-energy derivation. I have found numerous university-level textbooks that rely on circular arguments, which don’t really hold up, when it comes to deriving this famous relationship.

What we have done at the Shady Crypt is take a number of sources, combine the respective material, and make extensive modifications of our own, to form a derivation concise and simple enough to be written on two sides of a paper serviette (napkin). You will have to write small, but it is possible.

Any text below typed in red is an essential part and should be written on the napkin. Of course, it’s important that you understand what you are writing. Your derivation won’t be worth much at the pub if it can’t stand up to questioning. So generous explanations, not typed in red, are given along with the essentials. Any text below that is not typed in red needn’t be written on the napkin.

# Derivation

Part I of our derivation starts with the postulate: the speed of light is always measured to be constant, i.e. the same value, regardless of the relative velocity of the measuring device. Not too long before Einstein wrote his original paper on what we now call “special relativity” [1], Albert Michelson and Edward Morley performed an experiment in an attempt to measure Earth’s velocity through the ether. They found that the speed of light is always measured to be the same, no matter the speed or direction of the Earth’s motion. One might think that this Michelson-Morley experiment is what primarily inspired Einstein to create special relativity. But that turns out not to be the case.

During his education, Einstein studied electricity and magnetism (among other things). He was troubled by a basic concept in electromagnetic theory. We at the Shady Crypt have also studied classical electrodynamics, and can attest that this particular concept doesn’t sit well with us either. The concept goes something like this. To calculate the magnetic force between two moving electrons (suppose they are moving together, side by side), the process is as follows:

(1)  The first moving electron creates a magnetic field, which can be described mathematically.

(2)  The second moving electron feels a force as it moves through the magnetic field created by the first electron.

(Switch which electron you start with to find the force felt by the other electron.)

So there are actually three velocities that we need to pay attention to: the respective velocities of the two electrons, and the velocity of the field, which in classical electrodynamics is always zero.

But if you think about it, this doesn’t really make any sense. If the two electrons are standing still, no magnetic force is felt by either electron. But if both electrons are next to each other, moving together at the same velocity in the same direction, there is a force felt, because each electron is moving with respect to a zero-velocity magnetic field. But what constitutes zero velocity? The ground? Well, the ground is on the Earth, and the Earth is rotating and revolving around the Sun, and the whole solar system is moving with respect to other celestial objects, so what the heck isn’t moving? And who says that the electrons are the ones that are really moving in the first place? Since everything else is moving, maybe they are the ones standing still. Who’s to say? And it’s an important distinction, because if you choose a different “reference” for what is standing still, you get different answers when you calculate the magnetic forces. (The electric force tends to push the pair of electrons apart. When the pair is moving, the magnetic force tends to pull them together, but to a smaller extent. The magnetic force would only become as strong as the electric force if the electrons’ speed reached the speed of light.)
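The parenthetical claim at the end of the paragraph above can be checked numerically. The Python sketch below rests on one assumption. It uses the standard textbook fields of a uniformly moving point charge at a point directly abeam of it: electric field $\gamma kq/r^2$ and magnetic field $vE/c^2$. Neither formula is derived on the napkin. The function name and the 1 µm separation are arbitrary choices for the illustration.

```python
import math

C = 299_792_458.0           # speed of light, m/s
K_COULOMB = 8.9875517923e9  # Coulomb constant, N m^2 / C^2
Q_E = 1.602176634e-19       # elementary charge, C

def side_by_side_forces(v, r):
    """Lab-frame forces between two electrons moving side by side at the same
    velocity v (m/s), separated by r (m) perpendicular to their motion."""
    gamma = 1.0 / math.sqrt(1.0 - (v / C) ** 2)
    e_field = gamma * K_COULOMB * Q_E / r**2  # E of a moving charge, straight abeam (assumed formula)
    b_field = v * e_field / C**2              # B = v E / c^2 at the same point (assumed formula)
    f_electric = Q_E * e_field                # repulsive: pushes the pair apart
    f_magnetic = Q_E * v * b_field            # attractive: pulls the pair together
    return f_electric, f_magnetic

for frac in (0.0, 0.1, 0.5, 0.9, 0.99):
    fe, fm = side_by_side_forces(frac * C, 1e-6)
    print(f"v = {frac:4.2f} c   F_magnetic / F_electric = {fm / fe:.4f}")
```

Whatever the separation, the ratio of magnetic to electric force works out to $(v/c)^2$. It is 0.25 at half the speed of light, for example, so the two forces only balance as $v$ approaches $c$.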

It doesn’t quite end there. As a student struggles through the mathematics of electromagnetic field theory, one eventually comes to Maxwell’s equations. Maxwell’s equations are the “holy grail” of electromagnetic field theory. They consist of four separate, elegantly simple equations, and together they describe all aspects of classical electromagnetic field theory. But they still don’t answer the question “what is it that isn’t moving?” Using Maxwell’s equations, one can calculate the speed of light. But the speed of light relative to what? The Earth? The Sun? Something else? Who’s to say? Enter Albert Einstein.

Einstein surmised that it didn’t matter how fast the instruments measuring the speed of light were moving. Every laboratory will come up with the same value, regardless of the laboratory’s velocity. But how could this be? Suppose a laboratory in a train is moving at 99% of the speed of light, almost keeping up with a ray of light just ahead of it. How could the train measure the ray to be moving away from the train at 100% of the speed of light? The only answer is this: time moves slower in the train, relative to the ground where the velocity of the train is being measured. (By the same token, instruments in the train will measure time moving slower on the ground, which is also true. Both are equally valid measurements. While this may seem like a horrible paradox, it is easily resolved by analyzing a spacetime diagram or a similar tool, but that’s outside the scope of what we want to write on the napkin.)

So let’s go back to our derivation. We create a mathematical system such that as something moves faster through space, it moves slower through time, relative to the laboratory that is measuring the velocities. We will define two different measures (lengths) of time. The variable $t$ represents a length of time according to the clocks in the laboratory that is measuring the velocities of things. The variable $\tau$ (the Greek letter tau) indicates the amount of time that passes according to the clocks in the moving object. We call $t$ regular time, and we call $\tau$ proper time. We call the moving object the moving frame of reference, or moving frame for short. We call the laboratory that is measuring the velocities of things the inertial frame. It is assumed that the inertial frame is not accelerating.

In summary,

Inertial frame: the [non-accelerating] laboratory keeping track of the various velocities of objects.

Moving frame: anything moving relative to the inertial frame (may or may not be accelerating).

$t$: regular time. A time interval measured by the clocks in the inertial frame.

$\tau$: proper time. A time interval measured by the clocks in the moving frame.

The mathematical system can be created rather simply. In doing so we define Minkowski spacetime. During his education, Einstein was a student of Hermann Minkowski. In his original paper on special relativity [1], Einstein treated each direction, including the time direction, separately. They were interrelated, but still separate. He went so far as to combine the different equations into large matrices, but they were still cumbersome, to say the least. A few years later Minkowski, when giving lectures on relativity, began to introduce the concept of 4-dimensional spacetime. You, I and everything else rocket through spacetime with a speed equal to the speed of light, no faster, no slower. Because everything’s speed through spacetime is constant, if something moves faster through space, it necessarily moves slower through time. The way velocities through the space and time dimensions relate is akin to the Pythagorean theorem: $v^2 + \left(c\,\frac{d\tau}{dt}\right)^2 = c^2$, where $v$ is the speed through space and $c\,d\tau/dt$ plays the role of the speed through time. Minkowski spacetime gives the same results as Einstein’s methods, but is much more compact. We will use Minkowski spacetime on our serviette.

Simply treat time as any other dimension of space, and define some minor differences of the time dimension as described a little later. So to start, we have 4-dimensional spacetime. The different directions (as seen in the inertial frame) are ct, x, y, and z, where c is the speed of light. We multiply time by c, so that ct has units of length, just like the other directions.

Vectors in three dimensions are denoted with an arrow above the variable’s symbol, such as the 3-velocity $\vec{v}$. Four-dimensional spacetime vectors, called 4-vectors, are denoted with a squiggle underneath, such as the position 4-vector $\underset{\sim}{x}$. 4-vectors take the form

$$\underset{\sim}{x} = (ct,\ x,\ y,\ z) \tag{1}$$

or in shorthand form,

$$\underset{\sim}{x} = (ct,\ \vec{x}) \tag{2}$$

Now we define the inner product, or dot product of two 4-vectors. For this definition, suppose we have two arbitrary 4-vectors,

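$$\underset{\sim}{A} = (A^0,\ A^1,\ A^2,\ A^3), \qquad \underset{\sim}{B} = (B^0,\ B^1,\ B^2,\ B^3), \tag{3}$$

then,

$$\underset{\sim}{A}\cdot\underset{\sim}{B} = \sum_{\alpha=0}^{3}\sum_{\beta=0}^{3}\eta_{\alpha\beta}A^{\alpha}B^{\beta} = -A^0B^0 + A^1B^1 + A^2B^2 + A^3B^3 \tag{4}$$

where the index 0 labels the time component, and

$$\eta_{\alpha\beta} = \begin{pmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \tag{5}$$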

Now you might be saying, “whoa, this is getting too complicated.” But it’s not as bad as it looks. All we’re saying here is that the dot product of two 4-vectors is the same idea as the dot product of two 3-vectors, except the time component gets a negative sign. In other words, when you take the dot product (inner product) of two 4-vectors, you multiply the respective time components together and give that a negative sign. Then you multiply the x components together, then the y components, then the z components, and finally add all the results together.

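As an example, suppose we have

$$\underset{\sim}{A} = (1,\ 2,\ 3,\ 4), \qquad \underset{\sim}{B} = (5,\ 6,\ 7,\ 8),$$

then the dot product of the two vectors is

$$\underset{\sim}{A}\cdot\underset{\sim}{B} = -(1)(5) + (2)(6) + (3)(7) + (4)(8) = 60.$$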
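The recipe above fits in a few lines of Python. This is a minimal sketch. The name `dot4` and the convention of storing the time ($ct$) component first are choices made for this illustration.

```python
# Minkowski metric (equation 5) with signature (-, +, +, +).
# Only its diagonal is nonzero, so a single loop over components suffices.
# Index 0 of every 4-vector is its time (ct) component.
ETA = (-1.0, 1.0, 1.0, 1.0)

def dot4(a, b):
    """Inner product of two 4-vectors, equation (4): the time parts get a minus sign."""
    return sum(eta * ai * bi for eta, ai, bi in zip(ETA, a, b))

print(dot4((1, 2, 3, 4), (5, 6, 7, 8)))  # 60.0
```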

Since each component is perpendicular to other components, we can also express 4-vectors in terms of differentials.

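$$d\underset{\sim}{x} = (c\,dt,\ dx,\ dy,\ dz) \tag{6}$$

and

$$d\underset{\sim}{x}\cdot d\underset{\sim}{x} = -c^2dt^2 + dx^2 + dy^2 + dz^2 \tag{7}$$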

And now we define $d\tau$. Remember $\tau$? $\tau$ is the proper time, i.e. a time interval as measured by clocks in the moving frame. And $d\tau$ is essentially the differential length of a spacetime 4-vector, in units of time.

$$d\tau^2 = -\frac{d\underset{\sim}{x}\cdot d\underset{\sim}{x}}{c^2} \tag{8}$$
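Equation (8) can be turned into a small calculation. The sketch below assumes a clock moving at constant velocity, in which case the differential relation also holds for finite intervals. The helper names `dot4` and `proper_time` are hypothetical.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def dot4(a, b):
    """Minkowski inner product, signature (-, +, +, +), time component first."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

def proper_time(dx4):
    """Proper time for a timelike displacement (c dt, dx, dy, dz), equation (8):
    c^2 dtau^2 = -(dx . dx)."""
    interval = dot4(dx4, dx4)  # negative for anything moving slower than light
    return math.sqrt(-interval) / C

# A clock moving along x at 0.6 c, watched for one second of inertial-frame time.
dt = 1.0
dx4 = (C * dt, 0.6 * C * dt, 0.0, 0.0)
print(proper_time(dx4))  # about 0.8
```

So one second of inertial-frame time corresponds to only about 0.8 seconds on a clock moving at 0.6c.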

So far we have represented our spacetime 4-vectors in units of length. To accomplish this, regular time $t$ was converted to units of length by multiplying it by $c$. But we could just as easily have kept everything in units of time, and divided $x$, $y$, and $z$ by $c$. Had we done so, the length of that differential position 4-vector would simply be the imaginary quantity $i\,d\tau$.

Of course, what we’ve really been interested in all along is the relationship between $t$ and $\tau$. In particular, we want to determine $dt/d\tau$. This is the amount of time that passes in the inertial frame while the clock in the moving frame advances one unit of time. Combining equations (7) and (8) gives us,

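$$-c^2d\tau^2 = -c^2dt^2 + dx^2 + dy^2 + dz^2 \tag{9}$$

Dividing by $dt^2$ produces

$$-c^2\left(\frac{d\tau}{dt}\right)^2 = -c^2 + \left(\frac{dx}{dt}\right)^2 + \left(\frac{dy}{dt}\right)^2 + \left(\frac{dz}{dt}\right)^2,$$

$$-c^2\left(\frac{d\tau}{dt}\right)^2 = -c^2 + v^2 \tag{10}$$

where $\vec{v}$ is the moving frame’s 3-velocity relative to the inertial frame (it is understood that $v^2$ is the magnitude squared, a scalar quantity). Dividing by $-c^2$ gives us

$$\left(\frac{d\tau}{dt}\right)^2 = 1 - \frac{v^2}{c^2} \tag{11}$$

taking the square root gives us

$$\frac{d\tau}{dt} = \sqrt{1 - \frac{v^2}{c^2}} \tag{12}$$

and after inverting we have

$$\frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2/c^2}} \tag{13}$$

This relationship is used so often that we give it a special designation, $\gamma$ (the Greek letter gamma):

$$\gamma \equiv \frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2/c^2}}$$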
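To get a feel for $\gamma$, here is a short Python sketch that evaluates equation (13) at a few speeds. It also checks the “Pythagorean” bookkeeping of equation (11), $(c\,d\tau/dt)^2 + v^2 = c^2$. The function name `gamma` is an illustrative choice.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def gamma(v):
    """Lorentz factor dt/dtau = 1 / sqrt(1 - v^2/c^2), equation (13)."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)

for frac in (0.0, 0.5, 0.9, 0.99):
    v = frac * C
    dtau_dt = 1.0 / gamma(v)
    # Equation (11) rearranged: (c dtau/dt)^2 + v^2 should equal c^2.
    pythagoras = ((C * dtau_dt) ** 2 + v**2) / C**2
    print(f"v = {frac:4.2f} c   gamma = {gamma(v):7.4f}   check = {pythagoras:.12f}")
```

At 99% of the speed of light, the speed of the train in the earlier example, $\gamma \approx 7.09$. Roughly seven seconds pass on the ground clocks for every second on the train’s clocks.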

It’s worthwhile to point out that so far, we really haven’t derived anything. All that we have done is create a mathematical system such that all observers measure the speed of light to be the same value, regardless of each observer’s relative velocity. We started out with the postulate that all observers get the same value when measuring the speed of light, and we have created a mathematical system that does just that. So far, that is all we have done. Now we can move forward and get to the nitty-gritty of special relativity.

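Next we define the velocity 4-vector, $\underset{\sim}{u}$. The velocity 4-vector is the derivative of the position 4-vector with respect to proper time, $\tau$.

$$\underset{\sim}{u} = \frac{d\underset{\sim}{x}}{d\tau} = \left(\frac{d(ct)}{d\tau},\ \frac{dx}{d\tau},\ \frac{dy}{d\tau},\ \frac{dz}{d\tau}\right)$$

It is a good idea here to point out a trick that we will use often. Note that $\dfrac{d}{d\tau} = \dfrac{dt}{d\tau}\dfrac{d}{dt} = \gamma\dfrac{d}{dt}$. Using this trick, the above equation reduces to

$$\underset{\sim}{u} = \gamma\left(c,\ \frac{dx}{dt},\ \frac{dy}{dt},\ \frac{dz}{dt}\right)$$

or more succinctly,

$$\underset{\sim}{u} = \gamma\,(c,\ \vec{v}) \tag{14}$$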

$\underset{\sim}{u}$, the velocity 4-vector, represents the speed and direction that something is traveling through 4-dimensional spacetime. I’ll leave it as an exercise for you to calculate the speed (the magnitude of the velocity), but I’ll tell you the answer. The speed at which anything travels through spacetime is always the speed of light, regardless of the object’s 3-velocity. (The answer comes out to $ic$, but since we are only concerned with the magnitude [and not the phase], the speed reduces to $c$.) So the important information that $\underset{\sim}{u}$ gives us is the object’s direction through spacetime, not forgetting that regular time is one possible direction.
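If you’d rather let a computer do the exercise, this Python sketch builds $\underset{\sim}{u}$ from equation (14) for a few arbitrary 3-velocities and evaluates $\underset{\sim}{u}\cdot\underset{\sim}{u}$. The helper names are illustrative. The result is always $-c^2$, whose square root is $ic$, a magnitude of $c$.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def velocity_4vector(vx, vy, vz):
    """u = gamma (c, v) from equation (14); 3-velocity components in m/s."""
    v2 = vx * vx + vy * vy + vz * vz
    g = 1.0 / math.sqrt(1.0 - v2 / C**2)
    return (g * C, g * vx, g * vy, g * vz)

def dot4(a, b):
    """Minkowski inner product, signature (-, +, +, +), time component first."""
    return -a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

for v3 in [(0.0, 0.0, 0.0), (0.3 * C, 0.0, 0.0), (0.5 * C, 0.6 * C, 0.1 * C)]:
    u = velocity_4vector(*v3)
    # u.u / c^2 should be -1 for every 3-velocity slower than light.
    print(dot4(u, u) / C**2)
```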

Next we define the momentum 4-vector, $\underset{\sim}{p}$. Just like the momentum 3-vector, the momentum 4-vector is simply the velocity vector multiplied by the mass, $m$.

$$\underset{\sim}{p} = m\,\underset{\sim}{u} = \gamma m\,(c,\ \vec{v}) \tag{15}$$

If you continue to use and pursue special relativity beyond the napkin, you will find that $\underset{\sim}{p}$ is commonly called the energy-momentum 4-vector. This is because its time component is proportional to the total energy of the system. Of course, we haven’t derived that yet, so we’ll just keep calling it the momentum 4-vector for now. But on a side note, I have actually witnessed “derivations” that essentially stop about here and claim something to the effect of, “…since $\underset{\sim}{p}$ is called the energy-momentum 4-vector, $E$ is therefore equal to $\gamma mc^2$.” Ya gotta be kidding me. For shame.

Next we define the force 4-vector, $\underset{\sim}{f}$. The force 4-vector is the derivative of the momentum 4-vector with respect to proper time, $\tau$.

$$\underset{\sim}{f} = \frac{d\underset{\sim}{p}}{d\tau} = \frac{d}{d\tau}\left[\gamma m\,(c,\ \vec{v})\right]$$

Using the same trick we used for equation (14), and treating the mass $m$ as a constant, this simplifies to

$$\underset{\sim}{f} = \gamma\frac{d}{dt}\left[\gamma m\,(c,\ \vec{v})\right] = \gamma\left(mc\,\dot{\gamma},\ m\dot{\gamma}\,\vec{v} + \gamma m\,\dot{\vec{v}}\right)$$

where dotted variables represent derivatives with respect to normal time $t$ (not proper time $\tau$). More compactly, this equation can be written as

$$\underset{\sim}{f} = \gamma\left(mc\,\dot{\gamma},\ \vec{F}\right) \tag{16}$$

where $\vec{F}$ is the force 3-vector, the time derivative of $\gamma m\vec{v}$ (with respect to normal time $t$),

$$\vec{F} = \frac{d}{dt}\left(\gamma m\vec{v}\right) = m\dot{\gamma}\,\vec{v} + \gamma m\,\vec{a} \tag{17}$$

where $\vec{a} = \dot{\vec{v}}$ is the acceleration 3-vector. Notice here that we’ve had to modify the force 3-vector from Newton’s classical $\vec{F} = m\vec{a}$. This is because as an object approaches the speed of light, it becomes harder and harder to push it so that it goes even faster. We must abandon Isaac Newton’s $\vec{F} = m\vec{a}$ in favor of the relativistic version, $\vec{F} = \frac{d}{dt}(\gamma m\vec{v})$. Einstein, in his original special relativity paper [1], used mathematically equivalent yet conceptually different terminology. Einstein kept Newton’s classical $\vec{F} = m\vec{a}$ relationship and accounted for the relativistic effects by modifying the definition of mass. As a matter of fact, according to Einstein’s original paper, every moving object has two masses associated with it, longitudinal mass and transverse mass (terms also used by Max Abraham and Hendrik Lorentz a few years prior). Yes: one object, two different masses. Many (most?) contemporary physicists have abandoned the concept of relativistic mass and have instead accepted the relativistic modification of the force 3-vector. We shall do the same. (The momentum 3-vector faces a similar modification, $\vec{p} = \gamma m\vec{v}$, but we don’t explicitly use the momentum 3-vector on our napkin, so it’s not discussed here.)
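Equation (17) can be explored numerically to see why pushing gets harder near the speed of light. The sketch below evaluates $\vec{F} = m\dot{\gamma}\,\vec{v} + \gamma m\,\vec{a}$ with $\dot{\gamma} = \gamma^3\,\vec{v}\cdot\vec{a}/c^2$, which follows from differentiating $\gamma(t)$. The object moves at $0.8c$ and is accelerated either along or across its motion. The helper name is hypothetical and the mass is held constant.

```python
import math

C = 299_792_458.0  # speed of light, m/s

def force_3vector(m, v, a):
    """F = d(gamma m v)/dt = m*gamma_dot*v + gamma*m*a (equation 17), m held constant.
    gamma_dot = gamma^3 (v . a) / c^2 comes from differentiating gamma(t)."""
    v2 = sum(vi * vi for vi in v)
    g = 1.0 / math.sqrt(1.0 - v2 / C**2)
    g_dot = g**3 * sum(vi * ai for vi, ai in zip(v, a)) / C**2
    return tuple(m * g_dot * vi + g * m * ai for vi, ai in zip(v, a))

m, accel = 1.0, 1.0                              # 1 kg, 1 m/s^2
v = (0.8 * C, 0.0, 0.0)                          # moving along x at 0.8 c, so gamma = 5/3
g = 1.0 / math.sqrt(1.0 - 0.8**2)
f_par = force_3vector(m, v, (accel, 0.0, 0.0))   # accelerate along the motion
f_perp = force_3vector(m, v, (0.0, accel, 0.0))  # accelerate sideways
print(f_par[0] / accel, g**3 * m)   # along the motion: F/a = gamma^3 m (about 4.63)
print(f_perp[1] / accel, g * m)     # sideways:         F/a = gamma m   (about 1.67)
```

The ratio $F/a$ comes out as $\gamma^3 m$ along the motion and $\gamma m$ across it. Direction-dependent ratios like these are what the older longitudinal and transverse mass language was describing. Authors of that era used differing conventions for the transverse case, though, so don’t read these two numbers as Einstein’s exact 1905 formulas.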

It can be shown that

$$\underset{\sim}{f}\cdot\underset{\sim}{u} = 0 \tag{18}$$

i.e., the dot product of the force and velocity 4-vectors is zero. Proof:

NOTE: This might be a fine time to pick up a second pub serviette (bar napkin) for the proof, and then come back to the original napkin later. If you can fit it on the original serviette, then fine, but it does take up precious space. If you attempt to explain it away verbally rather than show the explicit proof, here is some advice that might help. It makes sense that $\underset{\sim}{f}\cdot\underset{\sim}{u} = 0$. Stop and think about it for a moment. As we already discussed, you, I and everything else in the universe are rocketing through 4-dimensional spacetime with a speed (magnitude) of precisely the speed of light, no faster, no slower. It’s not possible to change the magnitude of an object’s speed through 4-dimensional spacetime, only its direction. So if any 4-force is applied to an object, the component of the force parallel to the object’s velocity 4-vector is guaranteed to be zero. That is exactly how to interpret $\underset{\sim}{f}\cdot\underset{\sim}{u} = 0$. All that being said, you should still be prepared to work out the following proof if requested.

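Combining equations (16) and (17), and taking the dot product with (14), yields

$$\underset{\sim}{f}\cdot\underset{\sim}{u} = \gamma\left(mc\,\dot{\gamma},\ m\dot{\gamma}\,\vec{v} + \gamma m\,\vec{a}\right)\cdot\gamma\left(c,\ \vec{v}\right)$$

$$= \gamma^2 m\left(-c^2\dot{\gamma} + \dot{\gamma}\,v^2 + \gamma\,\vec{a}\cdot\vec{v}\right) = \gamma^2 m\left(-c^2\left(1 - \frac{v^2}{c^2}\right)\dot{\gamma} + \gamma\,\vec{a}\cdot\vec{v}\right)$$

Differentiating $\gamma = (1 - v^2/c^2)^{-1/2}$ with respect to $t$ gives $\dot{\gamma} = \gamma^3\,\vec{v}\cdot\vec{a}/c^2$. Also, $1 - v^2/c^2 = 1/\gamma^2$. So

$$= \gamma^2 m\left(-\frac{c^2}{\gamma^2}\cdot\frac{\gamma^3\,\vec{v}\cdot\vec{a}}{c^2} + \gamma\,\vec{a}\cdot\vec{v}\right)$$

$$= \gamma^2 m\left(-\gamma\,\vec{v}\cdot\vec{a} + \gamma\,\vec{a}\cdot\vec{v}\right) = 0.$$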
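For the skeptic across the table, the proof can also be spot-checked numerically. This Python sketch is an illustration, not part of the napkin. It builds $\underset{\sim}{u}$ and $\underset{\sim}{f}$ from equations (14), (16) and (17) for random velocities and accelerations. It then reports $\underset{\sim}{f}\cdot\underset{\sim}{u}$ relative to the size of its largest terms. The random ranges, the seed and the 2 kg mass are arbitrary.

```python
import math
import random

C = 299_792_458.0  # speed of light, m/s

def f_dot_u_residual(m, v, a):
    """Build u (eq. 14) and f (eqs. 16 and 17) from a 3-velocity v and a
    3-acceleration a, then return f.u divided by the size of its largest terms."""
    v2 = sum(x * x for x in v)
    g = 1.0 / math.sqrt(1.0 - v2 / C**2)
    g_dot = g**3 * sum(x * y for x, y in zip(v, a)) / C**2          # d(gamma)/dt
    force3 = [m * g_dot * vi + g * m * ai for vi, ai in zip(v, a)]  # eq. (17)
    f = [g * m * C * g_dot] + [g * fi for fi in force3]             # eq. (16)
    u = [g * C] + [g * vi for vi in v]                              # eq. (14)
    f_dot_u = -f[0] * u[0] + sum(fi * ui for fi, ui in zip(f[1:], u[1:]))
    scale = g**5 * m * C * math.sqrt(sum(x * x for x in a))
    return f_dot_u / scale

random.seed(1)
worst = 0.0
for _ in range(1000):
    v = [random.uniform(-0.5, 0.5) * C for _ in range(3)]  # |v| < 0.87 c
    a = [random.uniform(-1e3, 1e3) for _ in range(3)]      # m/s^2
    worst = max(worst, abs(f_dot_u_residual(2.0, v, a)))
print(f"largest relative |f.u|: {worst:.1e}")  # round-off level
```

The residual stays at floating-point round-off level, which is what $\underset{\sim}{f}\cdot\underset{\sim}{u} = 0$ predicts.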

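So now we are completely confident that $\underset{\sim}{f}\cdot\underset{\sim}{u} = 0$. Next we substitute equations (14) and (16) into equation (18), and we get a surprising result.

$$\gamma\left(mc\,\dot{\gamma},\ \vec{F}\right)\cdot\gamma\left(c,\ \vec{v}\right) = \gamma^2\left(-mc^2\dot{\gamma} + \vec{F}\cdot\vec{v}\right) = 0 \tag{19}$$

Dividing by $\gamma^2$ and rearranging, we have

$$\vec{F}\cdot\vec{v} = mc^2\,\dot{\gamma} = \frac{d}{dt}\left(\gamma mc^2\right) \tag{20}$$

But we know from classical physics, by nearly the very definition of kinetic energy, K.E., that the rate at which a force does work on an object, $\vec{F}\cdot\vec{v}$, is the rate at which its kinetic energy changes:

$$\frac{d(\text{K.E.})}{dt} = \vec{F}\cdot\vec{v} \tag{21}$$

Combining equations (20) and (21), and integrating, yields

$$\frac{d(\text{K.E.})}{dt} = \frac{d}{dt}\left(\gamma mc^2\right),$$

$$\text{K.E.} = \gamma mc^2 + K_{constant} \tag{22}$$

where $K_{constant}$ is an arbitrary constant until we apply our initial conditions. We know that when the 3-velocity is 0, K.E. must be zero. When the velocity 3-vector is zero, $\gamma = 1$. Therefore,

$$K_{constant} = -mc^2 \tag{23}$$

So we have the relativistic equation for kinetic energy,

$$\text{K.E.} = \gamma mc^2 - mc^2 = (\gamma - 1)\,mc^2 \tag{24}$$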
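Equation (24) can be cross-checked without any algebra by simulating a push. The sketch below assumes a constant 3-force along $x$ acting on a particle that starts at rest. By equation (17), its momentum $\gamma m v$ then grows linearly in time. The sketch adds up the work $\int \vec{F}\cdot\vec{v}\,dt$ from equation (21) with the trapezoid rule and compares it with $(\gamma - 1)mc^2$. Units with $c = m = F = 1$ are an arbitrary choice made for readability.

```python
import math

def work_vs_kinetic_energy(force, mass, c, duration, steps=100_000):
    """Push a particle from rest with a constant 3-force along x (equation 17:
    F = d(gamma m v)/dt) and compare the work done, the integral of F v dt
    from equation (21), with (gamma - 1) m c^2 from equation (24)."""
    dt = duration / steps
    work = 0.0
    prev_power = 0.0  # F * v at t = 0, since the particle starts at rest
    gamma = 1.0
    for i in range(1, steps + 1):
        p = force * i * dt                              # gamma m v grows linearly when F is constant
        gamma = math.sqrt(1.0 + (p / (mass * c)) ** 2)  # follows from p = gamma m v
        v = p / (gamma * mass)                          # always below c
        power = force * v
        work += 0.5 * (prev_power + power) * dt         # trapezoid rule
        prev_power = power
    kinetic = (gamma - 1.0) * mass * c**2
    return work, kinetic

# Units chosen so that c = m = F = 1 (purely for readability).
w, k = work_vs_kinetic_energy(force=1.0, mass=1.0, c=1.0, duration=2.0)
print(w, k)  # both close to sqrt(5) - 1, about 1.236
```

The two numbers agree to many decimal places. The particle ends up at about 0.89c, where Newton’s $\tfrac{1}{2}mv^2$ (about 0.4 in these units) would be far off.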

At this point we know we are on the right track. If we take the Taylor series expansion of $(\gamma - 1)mc^2$, we find that it reduces to the classical $\tfrac{1}{2}mv^2$ for $v$ near 0. However, it is unfortunate and disheartening that the vast majority of sources attempting to derive Einstein’s famous equation stop here. Their argument is that the total energy of the system is $\gamma mc^2$. Since $(\gamma - 1)mc^2$ reduces to $\tfrac{1}{2}mv^2$ for $v$ near 0, they say, the rest of the energy must be in the mass, such that $E = mc^2$. But no, no, no. Hold on. We have not proven (so far) that the total energy equals $\gamma mc^2$. The “claim” turns out to be correct, as we shall soon show. But in the meantime, the only thing we’ve derived deals with the kinetic energy. We haven’t derived anything regarding even the concept of total energy. We are not finished yet. We haven’t even begun discussing matter-energy transitions. So that is where we must head now.
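For readers who want to see the Taylor series written out, here it is. It uses the binomial series for $(1 - x)^{-1/2}$ with $x = v^2/c^2$:

```latex
\gamma = \left(1 - \frac{v^2}{c^2}\right)^{-1/2}
       = 1 + \frac{1}{2}\frac{v^2}{c^2} + \frac{3}{8}\frac{v^4}{c^4} + \frac{5}{16}\frac{v^6}{c^6} + \cdots

(\gamma - 1)\,mc^2 = \frac{1}{2}mv^2 + \frac{3}{8}\frac{mv^4}{c^2} + \frac{5}{16}\frac{mv^6}{c^4} + \cdots
```

Every correction term carries extra powers of $v^2/c^2$. At $v = 0.1c$, for instance, the second term is only 0.75% of the first.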

Thus begins Part II of our derivation. In the collective opinion of the Shady Crypt, the most concise argument discussing mass-energy transitions comes from Albert Einstein himself, in his second 1905 paper discussing relativity [2]. The rest of the proof shown below is based on Einstein’s work, but the terminology and details have been significantly modified to better fit the E = mc² on a pub serviette derivation.

Imagine an apparatus, in an inertial frame, containing an isotropically (same in all directions) radiating light source. For every ray of light, there is also a ray of light in the opposite direction.

A given pair of rays (in opposite directions) corresponds to a unit of energy $E$, when measured in the apparatus’s frame of reference. Each ray of a given pair contains energy $\tfrac{1}{2}E$. When measured in the apparatus’s frame of reference, the total energy of a given pair of rays is

$$E_s = \tfrac{1}{2}E + \tfrac{1}{2}E = E \tag{25}$$

where $E_s$ is the energy of the pair, as measured in the same frame of reference as the apparatus, which we call the stationary frame.

Now imagine a spaceship moving toward the apparatus along the x-axis at speed $v$. The spaceship measures the energy of the same pair of light rays to be

$$E_m = \tfrac{1}{2}E\,\gamma\left(1 + \frac{v}{c}\cos\phi\right) + \tfrac{1}{2}E\,\gamma\left(1 - \frac{v}{c}\cos\phi\right) = \gamma E \tag{26}$$

where $E_m$ is the energy of the pair measured in the moving frame, and $\phi$ is the angle of the pair, as measured in the stationary frame, away from the axis of the spaceship’s movement. In equation (26), the $\left(1 \pm \frac{v}{c}\cos\phi\right)$ terms come from the Doppler effect. As the ship moves toward the light ray heading in its direction, the wavefronts get scrunched up, increasing the frequency of that ray. For the other ray, moving away from the spaceship, the frequency is decreased. Both terms contain a factor of $\gamma$. The $\gamma$ is due to the fact that the spaceship is moving, so time dilation is involved: the clocks in the spaceship run slower. So the spaceship will count more wavefronts per unit time by its own clocks than one would count using the stationary frame’s clocks. The difference in frequency due to time dilation remains when adding up the energy of the two light rays.
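Equation (26) is easy to check numerically. This Python sketch adds the two Doppler-shifted ray energies for several angles; the function name is hypothetical. Which ray gets the $1 + \frac{v}{c}\cos\phi$ factor depends on how the angle is oriented, which is an assumption here, but the sum does not care.

```python
import math

def moving_frame_pair_energy(E, beta, phi):
    """Equation (26): energy of a back-to-back pair of rays (E/2 each in the
    apparatus frame) as measured from a ship moving at speed beta * c.
    phi is the pair's angle from the ship's line of motion (apparatus frame)."""
    gamma = 1.0 / math.sqrt(1.0 - beta**2)
    toward = 0.5 * E * gamma * (1.0 + beta * math.cos(phi))  # ray with the (1 + beta cos phi) factor
    away = 0.5 * E * gamma * (1.0 - beta * math.cos(phi))    # ray with the (1 - beta cos phi) factor
    return toward + away

for phi_deg in (0, 30, 60, 90):
    Em = moving_frame_pair_energy(1.0, 0.6, math.radians(phi_deg))
    print(phi_deg, round(Em, 12))  # 1.25, i.e. gamma at 0.6 c, for every angle
```

The angle-dependent parts cancel in the sum, leaving $\gamma E$ for every $\phi$.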

It was known back in 1905 that the energy of a “light ray” is proportional to its frequency. This relationship was established by Albert Einstein himself in another 1905 paper [3], involving the photoelectric effect. (Max Planck technically derived the relationship, but it was Einstein who nailed down its significance.) The principles described in that paper are of particular interest here, and (ironically) they would also become part and parcel of the foundation of quantum mechanics. But you don’t need to know anything about quantum mechanics for the E = mc² on a pub serviette derivation. Notice that I have not used the term photon at all in this derivation. The term light ray is sufficient for this exercise. All you need to take away from Einstein’s 1905 photoelectric-effect paper is that, all else being equal, the energy of a light ray is proportional to its frequency.

So the stationary observer measures energy $E$ for a given pair of light rays, and the moving observer measures energy $\gamma E$ for the exact same rays. But energy is energy. One can’t measure different values of energy in different frames of reference, or conservation of energy would be violated, unless there is something else going on related to energy in the different frames. And there is. In the stationary frame, the apparatus has no kinetic energy relative to the stationary observer. But the apparatus does have kinetic energy according to the observer in the moving frame. So the difference between $E_m$ and $E_s$ must be related to the kinetic energy, K.E., of the apparatus.

$$\text{K.E.} = E_m - E_s = \gamma E - E$$

which after factoring becomes

$$\text{K.E.} = (\gamma - 1)\,E \tag{27}$$

Keep in mind that we are not talking about the kinetic energy of the entire apparatus, only the kinetic energy of the mass associated with the pair of light rays that were measured*. Since the moving observer measures higher radiated energy, the associated component of the apparatus’s kinetic energy must have been reduced by the same amount (conservation of energy). The only conclusion that follows from equation (27) is that if the apparatus gives off energy $E$, its mass must decrease by a corresponding amount as a result. Specifically, we obtain that amount, call it $m$, by combining equations (27) and (24),

$$(\gamma - 1)\,E = (\gamma - 1)\,mc^2$$

or simply,

$$E = mc^2 \tag{28}$$

*(It is a subtle yet important consideration that the angle be measured in the apparatus’s frame. If an object is radiating isotropically in its own frame, it is not radiating isotropically in other frames, due to a process called “relativistic beaming.” In our derivation, we are talking about the same two light rays regardless of what is measuring them. We could take relativistic beaming into account and integrate the apparatus’s energy in all directions. That would give the same result we obtain, but would be more complicated than what we want to show on our napkin. The approach we used here mirrors Einstein’s original work.)
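To put a number on equation (28), here is a tiny Python sketch. The scenario is made up for illustration: a hypothetical battery-powered light source radiating 100 W for one hour. The only physics used is $m = E/c^2$.

```python
C = 299_792_458.0  # speed of light, m/s

def mass_lost(energy_joules):
    """Equation (28) read backwards: emitting energy E lowers the mass by E / c^2."""
    return energy_joules / C**2

# Hypothetical self-contained source: 100 W radiated for one hour.
energy = 100.0 * 3600.0
print(f"{mass_lost(energy):.3e} kg")  # about 4.006e-12 kg
```

That is about four nanograms, far too small to notice on any pub scale.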

References:

[1] On the Electrodynamics of Moving Bodies, A. Einstein, Annalen der Physik, June 30, 1905.

[2] Does the Inertia of a Body Depend Upon Its Energy Content?, A. Einstein, Annalen der Physik, Sept. 27, 1905.

[3] Concerning an Heuristic Point of View Toward the Emission and Transformation of Light, A. Einstein, Annalen der Physik, Mar. 18, 1905.

[4] Gravity: An Introduction to Einstein’s General Relativity, James B. Hartle, Pearson Education, Inc., 2003.
