# Jay's Blog

## What is the Axiom of Choice?

One of the easiest ways to start a (friendly) fight in a group of mathematicians is to bring up the axiom of choice. This axiom has a really interesting place in the foundations of mathematics, and I wanted to see if I can explain what it means and why it’s controversial. As a bonus, we’ll get some insight into what an axiom is and how to think about them, and about how we use math to think about the actual world.

The axiom seems pretty simple at first:

Axiom of Choice: Given a collection of (non-empty) sets, we can choose one element from each set.1

Most people find this principle pretty inoffensive, or even obviously right, on first contact. But it’s extremely controversial and produces strong emotions; and unusually for a mathematical debate, there’s essentially no hope of a clear resolution. And I want to try to explain why.

### Easy choices

One reason the axiom of choice can sound trivial is that there are a lot of superficially similar rules that are totally fine; the controversial bit is subtle. So here are a few things that don’t cause controversy:

• If we have one set, we can definitely pick an element from it. The axiom of choice says if we have a collection of sets, we can pick one element from each set simultaneously.
• But if we can pick an element from one set, can’t we pick an element from the first set, and then the second set, and then the third, etc.? Eventually we’ll pick an element from each set.

This works if we only have a finite collection of sets. So if I have five sets, I can pick one element from each set, by picking an element from the first set, then the second set, then the third, then the fourth, then the fifth. This is sometimes known as the axiom of finite choice. And no one argues about this.

But that approach doesn’t work if we have infinitely many sets.2 If we pick elements from one set at a time, we’ll never get to all the sets; there will still be infinitely many left. This infinitude of sets is where the real problem lies. (And things get worse if we have an uncountably infinite collection of sets, which is too many to even put in order!)

• Even if we have an infinite collection of sets, we might be able to pick an element from each set. If the sets have a nice enough pattern to them, we can give an explicit rule that lets us pick an element from each set consistently. For instance, if we have a bunch of sets of positive integers, we can always say something like “pick the smallest number in each set”.

But not every collection of sets allows a deterministic rule like this.3 The axiom of choice says that we can choose an element from each set, even if we can’t describe a rule for making that choice. If we have infinitely many pairs of shoes we don’t need the axiom of choice, since we can just take the left shoe from each pair; but if we have infinitely many pairs of socks, we do need the axiom of choice.4

### What’s the problem?

The axiom of choice has weird effects precisely because it is so unlimited. It tells us that given any infinite collection of infinite sets, we can pick one option from each set, even if the sets are too big to really understand, and even if we don’t have any extra structure to guide us.

We can see how this matters by looking at a classic logic puzzle, and then taking it to infinity.

##### The (finite) hat puzzle

Imagine a game show host5 is going to line you up with 99 other people, and give each of you a hat to wear, which is either black or white. You can see everyone in front of you, including the colors of their hats; you can’t see your own hat, nor can you see anyone behind you.

Starting at the back of the line, the host will ask each person to guess whether their own hat is black or white. You’ll be able to hear the guesses, and whether they’re right or wrong.

Before the game starts, you all get a few minutes to talk and plan out your strategy. What should you do to get as many correct guesses as possible?

Stop and take a minute to think about this one. It doesn’t require any fancy mathematics, just a cute trick that’s surprisingly useful in other contexts.

Drawing by nightmargin on Tumblr

As a hint, you can do really, really well. A simple approach that isn’t too bad is to have each odd-numbered person announce the color of the hat in front of them. This guarantees 50 right answers, and on average will get 75. But we can do much better than that.

The person in the back of the line (call them $A$) doesn’t have any information, so there’s no possible way to guarantee they’ll get it right. But we can make sure everyone else wins. $A$ can count up all the black hats in front of them and figure out if the number is even or odd. If it’s even, they’ll say “white”; if it’s odd, they’ll say “black”.

Now the second person $B$ now knows whether $A$ saw an even or odd number of black hats. But $B$ can count up all the black hats they see. If $A$ sees an even number of hats, but $B$ sees an odd number, that means $B$ must be wearing the remaining black hat.

The process continues down the line. $C$ can tell whether $A$ saw an even or odd number of black hats, and can also tell whether $B$ was wearing black or white. Between that information, and seeing all the hats in front of them, $C$ can figure out their own hat color.

(This sounds like it gets complicated very quickly, but we can streamline it. Count up all the black hats in front of you, and then add 1 to the number every time someone behind you says “black”. When the host reaches you, if the number is even you’re wearing a white hat, and if it’s odd you’re wearing a black hat.)

This exact algorithm is used by a lot of computer systems, especially when transmitting data over noisy connections. Computers store information in bytes, which are strings of eight bits. But often they will only use seven of the bits to store information (for instance, in standard ASCII encoding there are 128 possible characters, represented as a 7-bit number). In transmission, the eighth bit can be used as a parity bit, which will be 1 if the other digits include an even number of “1”s, and 0 if they include an odd number of “1”s.

Thus every byte should have an odd number of “1”s, and if any byte has an even number of “1”s the system knows it contains an error. In our solutions $A$ is effectively providing a parity bit for the string of hat colors, letting each player infer the information they don’t have: the color of their own hat.

##### The uncountable hat puzzle

That puzzle is fun, and the solution is clever, but there’s nothing especially paradoxical or brain-breaking about it. And it doesn’t involve the axiom of choice at all. But we can write a harder version that does use the axiom of choice, and has truly ridiculous results.6

Suppose the game host now gets an infinite line of people, so each person can see an infinite collection of people in front of them. (Let’s assume there is a first person in the line, so it’s not infinite in both directions; you have infinitely many people in front of you, but only finitely many behind.) And instead of black or white hats, we’ll write a random real number on each person’s hat: you could have 3 or 7, or $5.234$ or $\pi^e$ or $\Gamma(3.5^{7.2e^2})$. And just to make it harder, you can’t even hear what happens behind you.

This looks plainly impossible. No one who can see your hat can communicate with you at all. Even if they could, there are more possible hat labels than there are people in line. It seems like everyone working together wouldn’t be able to guarantee even one right answer. But if we can use the axiom of choice, we can guarantee that infinitely many people get the right answer—and even better, only finitely many people will get it wrong. In our endless infinite line, there will be a last wrong person; all the endless people in front of them will guess right.

How can this possibly work? First we’ll think about the set of all possible sequences7 of real numbers. (If we’re being fancy we might call this set $\mathbb{R}^{\mathbb{N}}$.) We’ll say that two sequences are equivalent if they’re only different in finitely many places. So the sequences $\Big( 1,2,3,4,5,6, \dots \Big)$ and $\Big( 17, 2000 \pi, -\frac{345}{e}, 4, 5, 6, \dots \Big)$ are equivalent, but $\Big( 1,0,3,0,5,0, \dots \Big)$ isn’t equivalent to either of them.

This gives us what’s called an equivalence relation on the set of real sequences. Equivalence relations are a widely useful tool, and I might write about them some other time, but for right now the important thing is that they partition the set, or subdivide it into smaller sets of things that are all equivalent to each other. Each thing will be in one and only one smaller set, which we call an equivalence class.

In our case, this means we’ve taken the set of all sequences of real numbers, and split it up into a bunch of equivalence classes of sequences. Every sequence belongs to exactly one equivalence class. And within each equivalence class, all the sequences are equivalent to each other—which means that they only have finitely many differences from each other.

Now we use the axiom of choice. We can choose one representative sequence from each equivalence class, and have everyone memorize this set of chosen sequences. When we all line up, I can see everyone in front of me, so there are only finitely many people I can’t see. There’s only one sequence on my list that can possibly be equivalent to this one.

Now when the host reaches me, I don’t know what’s happened behind me. I don’t know the exact sequence of hat labels. But I don’t need to! I know which equivlence class the sequence is in, and I know which representative sequence we chose for that equivalence class. So I can tell the host the number for my position from the representative sequence that we chose.

I might not be right; I have no way to know until the host tells me. But since we’re all using the same representative sequence that we chose earlier, and the sequence is only different from the “true” sequence finitely many times, an infinite number of us will get answer correctly. And only a finite number will fail.

### What does it do for us?

The hat puzzle is obviously a little contrived, but the axiom of choice has a lot of surprising and sometimes disconcerting implications that are relevant to other fields of math. Some of these consequences are apparent paradoxes; others are things we would very much like to be true, and make the axiom of choice extremely useful.

##### Zorn’s lemma

Zorn’s Lemma is probably the most common use of the axiom of choice, but it’s a little tricky to explain. The formal statement is short enough:

Zorn’s Lemma: Every non-empty partially ordered set in which every totally ordered subset has an upper bound contains at least one maximal element.

But it’s not super obvious what this means. The basic idea is that if we have some set where

• We can compare two elements and sometimes decide which one is “larger”;
• but sometimes neither element counts as “larger”;
• and we can never have an infinite collection of successively larger elements; any time we have an infinite collection of successively larger elements, there’s some other element bigger than all of them (thanks to Sniffnoy for the correction);

then there must be a “largest” element.8

This is surprisingly useful, for one very specific reason: we can build up solutions to our problems step by step, and have a guarantee that we’ll finish. This is a tool we want to use all the time in math. We even tried it earlier: if we have a collection of sets, we can choose an element from the first one, and then the second one, and then the third one….

The problem we ran into is that this will eventually let us choose one element from each of a thousand sets, or a million, or a billion. But we have no guarantee that we can “eventually” choose from each of an infinite, possibly uncountable, collection of sets. Zorn’s lemma solves this exact problem for us, and lets us extend these constructions to infinity. And often when we’re defining functions on an infinite set, that’s exactly what we want to do.

Zorn’s lemma has one more important consequence: it is equivalent to the axiom of choice. We can use the axiom of choice to prove Zorn’s lemma; but we can also use Zorn’s lemma to prove the axiom of choice (by extending the axiom of finite choice to infinity, in exactly the way we were just discussing). We can’t duck the axiom-of-choice question by just making Zorn’s lemma into an axiom; the two are a package deal. If we want the power of Zorn’s lemma, we’re stuck with the axiom of choice and all the weirdness it implies.

##### Well-ordering

The axiom of choice is obviously true, the well-ordering principle obviously false, and who can tell about Zorn’s lemma?

Jerry Bona

These equivalences are a recurring theme in discussions of the axiom of choice. Another non-obviously equivalent statement is the Well-Ordering Principle, which says we can put any set $X$ in a definite order, so that any subset has a “first” element. This is much stranger than it probably sounds. For instance, it’s really easy to put the real numbers in order, but most subsets won’t have a first element. (What’s the smallest real number? What’s the smallest positive real number? What’s the smallest number greater than 3?)

In fact, the fact that the usual order on the real numbers is not a well-ordering is a traditional source of internet math flame wars. There have been many forum threads and blog comment threads arguing endlessly about whether the infinitely repeating decimal $.\bar{9}$ is actually equal to $1$. (Yes, it is.)

Skeptics often suggest that maybe $.\bar{9}$ isn’t quite $1$, but just very close. Maybe it’s the last number before $1$, the biggest number smaller that $1$. But with the normal order for the reals, no such number exists. The reals are not well-ordered.

But with the axiom of choice, we can make up some other order for the real numbers, where every set has a first number. In fact, for any set, we can look at all the subsets and choose a first element for each once. We need to make sure that we do this consistently, but if we’re careful that’s not a problem, and so we can create a well-ordering on any set.

So what happens if we do this to the real numbers? There’s no real way to describe it—which is exactly why it requires the axiom of choice! You can make your favorite list of numbers and “choose” those to be first; the real difficulty is the need to make infinitely many choices. The axiom of choice lets us do this, but only in a totally non-explicit way that we can’t describe concretely.

But the most famous consequence of the axiom of choice, which probably deserves its own post, is the Banach-Tarski paradox. Banach-Tarski says that if we have a solid three-dimensional ball, we can split it into five non-overlapping sets, rearrange these sets without any stretching or bending, and finish with two balls, each identical to the original ball.9

That means we’ve doubled the volume of our stuff just by moving the pieces around, which seems, um, implausible. We definitely can’t do that with a real ball. But with the axiom of choice, we can define “pieces” of the ball that are so strange that they don’t really have sizes at all. If we put them together one way, we get one volume; if we put them together a different way, we get a different volume. But the components don’t have a well-defined volume, so this is logically consistent. (And thus not actually a paradox, despite the name!)

##### A bunch of other things

There’s a long list of statements that are equivalent to the axiom of choice. They show up in fields all over math, and algebra, analysis, and topology all become much simpler if these things are true:

• Every vector space has a basis
• A product of non-empty sets is non-empty
• Every set can be made into a group
• The product of compact topological spaces is compact
• Tarski’s theorem: If $A$ is an infinite set, there’s a bijection between $A$ and $A \times A$

Since these are all equivalences, we can prove axiom of choice with any one of them. If you believe any of these statements, you’re stuck believing all of them—and the axiom of choice as well, with all its bizarre ball-cloning hat-identifying implications.

### So…is it true?

Tarski…tried to publish his theorem (stated above) in the Comptes Rendus Acad. Sci. Paris but Fréchet and Lebesgue refused to present it. Fréchet wrote that an implication between two well known propositions is not a new result. Lebesgue wrote that an implication between two false propositions is of no interest. And Tarski said that after this misadventure he never tried to publish in the Comptes Rendus.

Jan Mycielski, A System of Axioms of Set Theory for the Rationalists

The big question is: should we believe any of these statements?

That might be a surprising question. Isn’t the whole point of math to have definitive, objectively correct answers? Either we can prove a result is true, or we can’t. We don’t generally ask whether we feel like believing a theorem. We proved it; we’re stuck with it.

But axioms are a little different. We need to decide on our axioms before we can prove things at all—or even decide what counts as a proof. Just like we can’t use a recipe to decide whether we want to make a cake or a cheeseburger, we can’t prove that an axiom is “correct”.

What we can do is look at a cake recipe, see what we’d have to do, and decide that maybe we don’t feel like making a cake after all. And we can look at what an axiom allows us to prove, and decide that maybe we don’t like those results and should pick some different axioms that don’t allow them.

##### The Zermelo-Fraenkel Axioms

The standard system of axioms we use in math is called Zermelo-Fraenkel Set Theory, or just ZF. These are the rules we use as the base for all our work. If we can use them to prove a statement, we say just it’s proven; if a statement contradicts the ZF axioms, we’ve disproven it.

If the axiom of choice contradicted ZF, then we could forget about it and move on with our lives. But in 1938 Kurt Gödel proved that this isn’t the case: you can have fully consistent systems that respect both the ZF axioms and the axiom of choice.

Similarly, if we could prove the axiom of choice from the ZF axioms, we would have to either accept it as true, or completely rework all the foundations of math10. But we can’t do that either. And this is more than just acknowledging that we haven’t proved it yet: in 1963 Paul Cohen invented a technique called forcing to prove that if ZF is consistent, then we can never prove the axiom of choice from the rest of the ZF axioms.

This combination of results feels a little weird, because it’s so different from the way we usually approach math. Math has a reputation for black-and-white thinking11: there’s a right answer to every question, and other answers are wrong. But here I’m telling you that there is no right answer. We can accept or reject the axiom of choice, and it works equally well either way.

##### Independence is normal

But this is actually perfectly normal! Suppose I asked you “are triangles isosceles?” The right answer isn’t “yes” or “no”: it depends on the triangle. And there are some theorems we can prove about isosceles triangles, like “if a triangle is isosceles, it has two equal angles”. And there are different theorems we can prove about non-isosceles triangles. The “axiom of isosceles-ness” is independent from the definition of a triangle.

But that might sound a little glib; no one talks about triangles like that. A better example is Euclidean geometry. When Euclid gave his formalization of geometry in Elements, he began with five axioms (or “postulates”, as you might have called them in high school geometry). The fifth (and final) postulate, called the parallel postulate, proved to be rather awkward.

Parallel postulate: There is at most one line that can be drawn parallel to another given one through an external point.12

This axiom is extremely important to geometry, but is much more complex and less self-evident than the other four axioms, which are statements like “all right angles are equal” and “we can draw a line connecting any two points”. Two millennia of mathematicians tried to remove this awkward complexity by proving the parallel postulate just from Euclid’s other axioms.

Then in the 1800s, we finally solved this problem—in the other direction. Euclidean geometry, including the parallel postulate, is completely consistent; but it’s also consistent to work with non-Euclidean geometries, in which the parallel postulate is false. Mathematicians constructed models of elliptic geometry, in which there are no parallel lines, and of hyperbolic geometry, in which parallel lines are not unique.

What is a model? It’s just something that obeys all the axioms. So the work we do in high school, with pencil and paper on a flat surface, is a model of Euclidean geometry. It follows all five axioms, and any theorem that follows from the Euclidean axioms will be true of our pencil-and-paper work.

But if we work on the surface of a sphere, we get a model of non-Euclidean elliptic geometry. We can define a line to be a great circle, a circle that goes fully around a sphere the long way. Any two points lie on exactly one great circle, so these “lines” obey Euclid’s first four axioms. But with a little bit of playing around, you can see that any pair of great circles will intersect in two points. This model doesn’t have any parallel lines at all.

The solid curves are great circles. The solid blue curve is the equator.
The dashed curves aren’t great circles, so they don’t count as lines.

We can also build models of hyperbolic geometries, but they’re a little harder to describe. But just one of these models is enough to know that we can’t prove the parallel postulate from Euclid’s other axioms—at least, not unless the other axioms are themselves contradictory. Nor can we disprove it. We have to decide if we want to use the parallel postulate.

This is exactly what Gödel and Cohen did for the axiom of choice. Gödel constructed a model of ZF set theory with choice; Cohen constructed a model of ZF set theory without choice. So we have to decide if we want to use the axiom of choice. And this brings us back to the same question: what are we trying to describe? Is the world we want to study a model of ZF with choice, or without?

##### How do we choose?

To decide if we should adopt an axiom, we need to know what our goals are, and what we’re trying to describe. Euclidean geometry is good for arranging furniture in my room, but it’s bad for planning long-range flights, for which the fact that we live on a sphere matters.

Plane flight paths don’t look like straight lines on a flat map.
On a sphere we see they really are the shortest, “straightest” path.

We should ask the same question about the axiom of choice: what are we trying to describe? Does the axiom of choice bring us closer to describing the world accurately, or farther away? Is the world we want to study a model of ZF with choice, or without?

The obvious answer is that the axiom of choice has absurd and unrealistic results. In the real world we can’t slice up one billiard ball and assemble the pieces into two billiard balls, or save infinitely many people in the hat puzzle. So if the axiom of choice says we can, it must not be describing the real world.

But this argument isn’t terribly persuasive, because every single thing about the uncountable hat puzzle is physically absurd. Even the setup is ridiculous: we can’t have an infinite line of people, and if we were somehow put in an infinite line, we wouldn’t be able to see all the people in it, let alone the numbers on their hats.

The step where we use the axiom of choice is even more unrealistic. We take the uncountably infinite set of real sequences; we partition it into an uncountably infinite collection of infinite sets of sequences; and then we ask everyone to memorize an (infinite!) sequence from each of these infinitely many infinite sets.

I’d have a hard time remembering one list of a hundred numbers. Memorizing a thousand lists of a thousand numbers is extremely unlikely; memorizing infinitely many lists of infinitely many numbers is flatly impossible. And that’s before we ask how we can communicate the lists we’ve chosen to each other, so that each of the (infinitely many) people memorize the same infinite collection of infinite lists.

The Banach-Tarski argument isn’t any better. It splits the ball into only five pieces,sure, but each of those pieces is infinitely complex, enough so that you can’t concretely describe their shapes, let alone actually cut a ball into those pieces. The informal explanation that “you can slice a ball into five pieces and reassemble those pieces into two balls” is not true, because there’s no real way to produce the pieces you need.13

In the real world we never see infinite sets. We pretend some sets are infinite because it makes our lives easier. But any principle that only kicks in at infinity will never make contact with the reality.

Einstein probably didn’t say this, but it’s a good line.

### Not as crazy as it seems

This might feel like it’s dodging the question, though. If infinity is fake, why should we use axioms that only matter for infinity? And if we are going to say things about infinity, shouldn’t they make sense?

Maybe it’s fine for a physicist to dismiss mathematical abstractions as unphysical and thus irrelevant. But math is about reasoning through the consequences of abstract hypotheticals! If we’re going to adopt an foundational principle like the axiom of choice, we should really mean that we believe it in every abstract hypothetical situation we’re going to apply it in.

But after we realize how infinity works, our absurd results look somewhat more reasonable.14 Our “successful” strategy in the infinite hat game actually doesn’t give us all that much. Sure, only finitely many people lose; some person in the line will be the last to answer wrong. But what would this look like in practice?

You could imagine the first hundred people all getting the question wrong. But that’s okay; only finitely many people will get it wrong. Then the first thousand people all get it wrong. But we know that at some point a last person will get it wrong and everyone left will get it right. A million people all get it wrong. Everyone gets bored. The game show host decides to leave. And sure enough, only finitely many people ever answered the question wrong!

The axiom of choice argument somehow doesn’t do anything after a finite number of answers. You could have the first million, or the first trillion, people all get the question wrong, and that wouldn’t contradict our proof. All the weirdness happens out at infinity—and we already know that infinity is deeply weird.

### What’s the point?

The axiom of choice is logically independent of our axioms for set theory, so we can’t ever prove it true or false. And it says deeply strange things about deeply strange situations that can never really happen. So why does it matter?

##### Infinity is fake but useful

The answer is the same as the reason we use infinity at all. Everything we’ve ever seen is finite and discrete: objects are made out of atoms, and even if space and time aren’t truly quantized, our ability to measure them definitely is. But it’s extremely convenient to pretend that reality is continuous, which allows us to solve problems with calculus and other clever math tricks. If the world is “close enough” to being continuous, our answers will be good enough for whatever we’re doing.

Any infinity we care about will come from a limit of finite things. I can measure the width of my office in meters, or centimeters, or millimeters. With the right equipment I could measure it in micrometers or nanometers. I can’t ever measure it with infinite precision, but I can imagine doing that. And it’s really convenient to say the width is a real number, rather than to insist that it must really be some integer number of picometers

This exact reasoning is basically how all of calculus works. If I want to know how fast my car is going in miles per hour, I can measure the distance it travels in miles over the course of an hour. Or I can see how many miles it goes in a minute, and multiply by sixty. I could measure the number of miles it goes in a second, and multiply by 3600 (or more realistically, measure the number of feet it goes in a second, and multiply by 3600/5280).

But what is the speed “right now”? We imagine taking measurements over these shorter and shorter intervals; in the limit, when our interval is “infinitely short”, we get the instantaneous velocity. And that’s a derivative, which is an extremely powerful tool for doing math and physics.

But we can’t actually measure the distance traveled in an infinitely small window of time. (Nor can we measure the infinitely small time itself.) We’re taking some real, physical, finite measurements. We can measure how far a car goes in one second, multiply by 3600/5280, and then display that number on the dashboard. But the infinite version is something we only imagine.

### Just relax

If we’re trying to model the world, any infinite set we have to deal with will be a limit of finite sets. And any infinite family of infinite sets will be a limit of finite families of finite sets. And we know we have choice for finite sets of finite sets. So we can always get choice for these specific infinite sets, if we really need it—just by taking the limit of the elements we chose from our finite families.

What the axiom of choice says is: don’t worry about it. You don’t have to explain how your family of sets came from a finite family. You don’t have to explain how you’re choosing elements. We’ll just assume you can make it work somehow.

That’s what axioms are for. They tell us what we want to just assume we can do, without really explaining how. Our axioms are a list of things we don’t want to have to think about. And in practice, we don’t have to think about whether we can make choices. Any time it really matters, we can.

1. We can be more formal by phrasing this in terms of choice functions: given a collection of sets $\mathcal{X} = \{A\}$ there is a function $f : \mathcal{X} \to \bigcup_{A \in \mathcal{X}} A$ such that $f(A) \in A$ for each $A \in \mathcal{X}$. But I want to keep the discussion as readable as possible if you’re not comfortable with the language of formal set theory.

2. Using this sort of process on an infinite set is called transfinite induction. If we allow transfinite induction then we get the axiom of choice for free. But the axiom of choice also implies that we can do transfinite induction; the two concepts are logically equivalent. Transfinite induction can sometimes allow us to make choices without the axiom, but only if we can put our sets in some order. Conversely, the axiom of choice allows us to use transfinite induction in cases we otherwise couldn’t.

Thanks to Sniffnoy for a helpful correction here.

3. The set of real numbers doesn’t have a smallest element or a largest element. Nor does the set of positive real numbers, or the set of numbers between zero and one. So if we have a colleciton of sets of real numbers, the rule we used for sets of positive integers doesn’t work.

4. This example was originally offered by Bertrand Russell.

5. The classic version of the puzzle features a sadistic prison warden. While that setup is traditional, it seems unnecessarily violent, so I’ve replaced it with something friendlier.

7. If you don’t know what a sequence is, just think of this as an infinite list.

8. Sometimes there can be more than one largest element, which is a little weird. But since some pairs of elements can’t be compared, you can have multiple elements that don’t have anything above them. Imagine a company with two presidents: each of them is the highest-ranking person at the company. And that’s why we say “a” largest element rather than “the” largest.

9. The more general result is: given any two three-dimensional objects $A$ and $B$, we can partition $A$ into a finite collection of sets, and then rearrange those sets to get precisely $B$. In the special case people usually quote, $A$ is “a ball” and $B$ is “two balls”.

10. We’ve actually done that before. At the beginning of the 20th century, Bertrand Russell and others found deep contradictions in the naive version of set theory in use at the time, and the ZF axioms were developed to avoid those problems. But we’d rather avoid doing it again.

11. I don’t like this reputation in any context. Mathematical thinking creates tons of space for nuance and subtlety and shades of grey. But that’s probably a different essay.

12. This version is more precisely known as Playfair’s axiom. Euclid’s phrasing (translated from Greek) was “if a straight line falling on two straight lines make the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which the angles are less than two right angles.” But Playfair’s axiom is much simpler to state, and the two statements are equivalent.

13. Feynman has a story about this in his memoir. A math grad student described the Banach-Tarski paradox to him, and he bet that it was made up, rather than a real theorem. He was able to wriggle out of losing by pointing out that the grad student had described cutting up an orange, and you can’t slice a physical object made up of atoms infinitely finely.

14. This is a common mathematical rhetorical trick. Earlier I was trying to convince you that the implications of the axiom of choice were really weird. Now I’m going to try to convince you that they’re perfectly reasonable. This exact two-step happens quite a lot in math exposition. I suspect this is due partially to the demands of pedagogy, and partly to the way we form our mathematical intuition.

Tags: math teaching models set theory axiom of choice philosophy of math