<h1 id="pascals-wager-medicine-and-the-limits-of-formal-reasoning">Pascal’s Wager, Medicine, and the Limits of Formal Reasoning</h1>
<p>Scott Alexander at Astral Codex Ten has a good post recently thinking about what he calls <a href="https://astralcodexten.substack.com/p/pascalian-medicine">Pascalian Medicine</a>. As always the entire post is worth reading, but here’s an excerpt:</p>
<blockquote>
<p>Another way of looking at this is that I must think there’s a 25% chance Vitamin D works, and a 10% chance ivermectin does. Both substances are generally safe with few side effects. So (as many commenters brought up) there’s a <a href="https://en.wikipedia.org/wiki/Pascal%27s_wager">Pascal’s Wager</a> like argument that someone with COVID should take both. The downside is some mild inconvenience and cost (both drugs together probably cost $20 for a week-long course). The upside is a well-below-50% but still pretty substantial probability that they could save my life.</p>
</blockquote>
<blockquote>
<p>…</p>
</blockquote>
<blockquote>
<p>But why stop there? Sure, take twenty untested chemicals for COVID. But there are almost as many poorly-tested supplements that purport to treat depression. The cold! The flu! Diabetes! Some of these have known side effects, but others are about as safe as we can ever prove anything to be. Maybe we should be taking twenty untested supplements for every condition!</p>
</blockquote>
<p>Scott doesn’t seem to believe we should do this, but he’s trying to figure out the actual flaw in this reasoning. The most convincing argument he comes up with is based on how unreliable modern medical studies are, and how easy it is to generate spurious positive results.</p>
<blockquote>
<p>I think ivermectin doesn’t work. I think that it looks like it works, because it has lots of positive studies and a few big-name endorsements. But our current scientific method is so weak and error-prone that any chemical which gets raised to researchers’ attentions and studied in depth will get approximately this amount of positive results and buzz. Look through the thirty different chemicals featured on the sidebar of the ivmmeta site if you don’t believe me.</p>
</blockquote>
<blockquote>
<p>…</p>
</blockquote>
<blockquote>
<p>Probably what I’m doing wrong here is saying that ivermectin having some decent studies raises its probability of working to 5%. I should just say 0.1% or 0.01% or whatever my prior on a randomly-selected medication treating a randomly-selected disease is (higher than you’d think, based on the argument from antibiotics).</p>
</blockquote>
<blockquote>
<p>From the Outside View, this argument seems strong. From the Inside View, I have a lot of trouble looking at a bunch of studies apparently supporting a thing, and no contrary evidence against the thing besides my own skepticism, and saying there’s a less than 1% chance that thing is true.</p>
</blockquote>
<p>The <a href="https://www.lesswrong.com/tag/inside-outside-view">Outside View</a> argument here is <em>completely right</em>, and is a great illustration of the limitations of Bayesian reasoning that I talked about <a href="/blog/paradigms-and-priors/#anomalies-and-bayes">here</a> and <a href="https://jaydaigle.net/blog/overview-of-bayesian-inference/">here</a>.</p>
<h3 id="unknown-unknowns">Unknown Unknowns</h3>
<p>The basic argument for Pascalian medicine goes: okay, suppose ivermectin has a 10% chance of reducing covid mortality by 10%. About a thousand people are dying of covid every <del>week</del> day<strong title="I originally misread the CDC page and interpreted the weekly average of daily numbers as weekly numbers. I've edited the piece throughout to reflect the true numbers, but it doesn't change any of the conclusions, since the same error happened to every rate I discussed in the piece."><sup id="fnref:edit"><a href="#fn:edit" class="footnote">1</a></sup></strong> in the US <a href="https://www.cdc.gov/coronavirus/2019-ncov/covid-data/covidview/index.html">according to the CDC weekly tracker</a>, so the expected benefit of giving all our covid patients ivermectin is something like saving ten lives per day.<strong title="There would also be benefits from fewer people being hospitalized, fewer people suffering long-term health consequences, fewer people being miserable and bedridden for a week, etc. I'm going to talk about deaths pretty exclusively because it's easier to talk about just one number."><sup id="fnref:1"><a href="#fn:1" class="footnote">2</a></sup></strong></p>
<p>Even if you think the probability ivermectin works is only something like 1%, that still adds up to one life saved per day. Since ivermectin is cheap, and “generally safe with few side effects”, an expected value of “saves one life per day” looks pretty good! So maybe we should prescribe it out of an abundance of caution.<strong title="This is very different from claims that ivermectin is a miracle cure, and we should take that instead of getting vaccinated. Ivermectin is at best mildly beneficial; vaccines are safe and effective and you should get a booster shot if you haven't already. We're talking about whether the small possibility of a minor benefit from ivermectin makes it worth taking."><sup id="fnref:2"><a href="#fn:2" class="footnote">3</a></sup></strong></p>
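The arithmetic here is simple enough to sketch explicitly. Everything below is one of the post's made-up estimates, not real clinical data, and the names are mine, purely illustrative:

```python
# Back-of-the-envelope expected value of a long-shot drug,
# using the made-up numbers from the text.
DEATHS_PER_DAY = 1000  # rough US covid deaths per day (CDC tracker)

def expected_lives_saved(p_works: float, mortality_reduction: float) -> float:
    """Expected daily lives saved if every covid patient takes the drug."""
    return DEATHS_PER_DAY * p_works * mortality_reduction

print(expected_lives_saved(0.10, 0.10))  # 10% chance of a 10% cut: ~10 lives/day
print(expected_lives_saved(0.01, 0.10))  # 1% chance of a 10% cut: ~1 life/day
```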
<p>And then we make the same argument about, apparently, twenty other drugs, and we’re taking a crazy drug cocktail. (Scott calls this the Insanity Wolf position.) So it looks like something has gone wrong. But what?</p>
<p style="text-align: center;"><img src="/assets/blog/pascalian/insanity_wolf.jpeg" alt="Insanity Wolf meme: &quot;TAKE EVERY MEDICATION ALL THE TIME BECOME INFINITELY HEALTHY, LIVE FOREVER&quot;" /></p>
<p>We made a basic, common error that really isn’t fully avoidable: we took a bunch of stuff we can’t measure, and decided it didn’t matter. “Generally safe with few side effects” isn’t the same as “perfectly safe”, and “cheap” isn’t the same as “free”. And something like ninety thousand people get covid in the US every day; to save that one life we’re probably giving drugs to tens of thousands of people. How confident are we that our drugs won’t hurt any of them? Especially if we give an Insanity Wolf-style twenty-drug cocktail?</p>
<p>Scott discusses this idea, of course. But I think he seriously underestimates the problem of unknown unknowns here. For well-understood drugs with large probable benefits, the unknown unknowns don’t matter very much. But for long-shot possible payoffs, like with ivermectin, unknown unknowns present a real, unavoidable problem. And the theoretically, mathematically correct response is to throw up our hands and take the Outside View instead.</p>
<h3 id="three-example-drugs">Three Example Drugs</h3>
<p>I want to take a look at three different drugs and do some illustrative calculations for the possible risks and benefits.</p>
<h5 id="paxlovid">Paxlovid</h5>
<p>There are always unknown unknowns, but in many cases we can put bounds on how good, or bad, things can be. <a href="https://en.wikipedia.org/wiki/PF-07321332">Paxlovid</a>, Pfizer’s new antiviral pill, provides a good example of this reasoning. In trials, Paxlovid <a href="https://www.pfizer.com/news/press-release/press-release-detail/pfizers-novel-covid-19-oral-antiviral-treatment-candidate">cut covid hospitalizations and deaths by about 90%</a>.<strong title="These numbers are reported a little weirdly. Looking at the study, it seems like Paxlovid cut hospitalizations by 85%, from 41/612 to 6/607; it cut deaths by 100% from 10/612 to 0/607. I think the 90% figure is the extent to which it cut (hospitalizations plus deaths), since that math checks out, but that's a slightly weird metric to judge by."><sup id="fnref:3"><a href="#fn:3" class="footnote">4</a></sup></strong> Let’s assume that’s a wildly optimistic overestimate, and give it a 50% chance of cutting deaths by 50%. Then in expectation that’s going to save a couple hundred lives each day.</p>
<p>What are the risks? This is a new drug so it’s hard to know what they are; all we know is that (1) Pfizer didn’t expect the side effects to be too bad, based on prior knowledge of this drug class, and (2) they didn’t notice anything too dramatic in the trial they ran. That doesn’t tell us how bad the side effects are, but it does put limits on them: if Paxlovid killed 1% of the people who took it, we’d know.</p>
<p>But suppose Paxlovid kills .1% of everyone who takes it. That’s about as high as it could go without us probably having noticed already, since the trial administered it to about 600 people and none of them died. (And realistically if it killed .1% of people, way more than that would have severe side effects and we probably would have noticed.) If we give Paxlovid to everyone in the US who gets covid, that’s about 90,000 people a day, and Paxlovid would kill 90 people a day. And that’s less than the couple hundred lives it would save.</p>
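Here is that worst-case comparison as a quick sketch. All of the inputs are the deliberately pessimistic guesses from the last two paragraphs, not real data:

```python
# Deliberately pessimistic Paxlovid estimate, using only numbers from the text.
DEATHS_PER_DAY = 1000    # rough US covid deaths per day
CASES_PER_DAY = 90_000   # rough US covid cases per day

# Lowball benefit: 50% chance the drug cuts deaths by 50%.
expected_saved = DEATHS_PER_DAY * 0.5 * 0.5
# Worst plausible harm: the drug kills 0.1% of everyone who takes it.
expected_killed = CASES_PER_DAY * 0.001

print(expected_saved, expected_killed)  # ~250 saved vs ~90 lost: still net positive
```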
<p>Now, all of these numbers are <em>extremely handwavy</em>. But I chose them to make Paxlovid look as bad as reasonably possible, and it still comes out looking pretty good. My estimate of the benefit of Paxlovid was a huge lowball; it’s probably going to save closer to 800 lives a day than 200 if we manage to give it to everybody. And on the other hand, I’d be shocked if it’s anywhere <em>near</em> as dangerous as I assumed in the last paragraph. Sure, there’s some minuscule chance that it’s really really dangerous but only several years after you take it, but since that’s not how these drugs usually work we can round that off to zero.</p>
<p>The benefit of Paxlovid is large enough that it outweighs any vaguely reasonable estimate of the costs. And we don’t need any especially fancy calculations to see that.</p>
<p style="text-align: center;"><img src="https://imgs.xkcd.com/comics/statistics.png" alt="https://xkcd.com/2400 Statistics. &quot;Statistics tip: always try to get data that's good enough that you don't need to do statistics on it.&quot;" /></p>
<p style="text-align: center"><em>We could make basically the same argument about vaccines, except the worst plausible numbers look even better than for Paxlovid.</em></p>
<h5 id="tylenol">Tylenol</h5>
<p>We can run a similar analysis with common everyday drugs like Tylenol. Scott observes that “We don’t fret over the unknown unknowns of Benadryl or Tylenol or whatever, even though we know their benefits are minor.” But by the same token, we are also reasonably confident that the unknown unknown costs of those drugs are minor. If Tylenol killed .1% of patients who took it, or even .01%, <em>we would know</em>. (And in fact we know Tylenol can cause liver damage, and that is a thing we very much do fret over.) Sure, unknown harms always could exist. But in this case we can be pretty confident that they have to be really small.</p>
<p>Apparently a new potentially deadly side effect of Tylenol was discovered in 2013. If I’m reading the FDA report correctly, they believe that <a href="https://www.fda.gov/drugs/drug-safety-and-availability/fda-drug-safety-communication-fda-warns-rare-serious-skin-reactions-pain-relieverfever-reducer">one person has died</a> from this side effect since 1969. That’s the scale of side effect that can slip under the radar for a drug as widely taken and studied as Tylenol.</p>
<p>Tylenol could have unknown unknowns, but they won’t be <em>very</em> unknown.</p>
<h5 id="back-to-ivermectin">Back to Ivermectin</h5>
<p>Now compare this with the ivermectin situation. Let’s suppose we give ivermectin a 10% chance of being effective, with a benefit of reducing deaths by 20%. (The Together trial found a non-significant effect of about 10%, so let’s double that.) Then in expectation we’re reducing deaths by about 2%, which is 20 lives saved per day if we give it to everyone.</p>
<p>How many people would ivermectin have to kill to net out negative? If we give it to 90,000 people every day, then 20 deaths is about .02% of them. So does ivermectin kill about .02% of the people who take it? My guess is, probably not. But that seems a lot more within the realm of “maybe, it’s hard to be sure”.</p>
<p>We also reach the point where a lot of our ass-pull assumptions start to really matter. We said “maybe ivermectin has a 10% chance of working”. Scott’s the expert, not me, but that seems high to me. (Do you really think that one in ten drugs with vague but mildly-promising data in preliminary trials pans out?) If we say ivermectin has a 1% chance of reducing deaths by 20%, then our expected value is two lives per day.</p>
<p>This could still pencil out as a good trade, but with benefits so small (and uncertain) it could easily not be worth it. Especially if we account for the guaranteed annoyance of taking a pill and the common minor side effects we know ivermectin has.</p>
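The break-even arithmetic is easy to write down. The inputs are the same made-up estimates as above, and the function name is purely illustrative:

```python
# Side-effect death rate at which a long-shot drug stops being worth it,
# using the text's made-up numbers.
DEATHS_PER_DAY = 1000
CASES_PER_DAY = 90_000

def breakeven_fatality_rate(p_works: float, mortality_reduction: float) -> float:
    """Death rate from side effects at which expected harm cancels expected benefit."""
    expected_saved = DEATHS_PER_DAY * p_works * mortality_reduction
    return expected_saved / CASES_PER_DAY

print(breakeven_fatality_rate(0.10, 0.20))  # ~0.00022, i.e. about .02%
print(breakeven_fatality_rate(0.01, 0.20))  # ~0.000022: ten times less room for error
```

Notice how sensitive the answer is to the prior: dropping “10% chance it works” to “1% chance” shrinks the margin of safety by a factor of ten, which is exactly why the made-up inputs matter so much.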
<h3 id="the-problem-with-made-up-numbers">The Problem with Made-Up Numbers</h3>
<p>But the larger point here is that <em>all this math is bullshit</em>. Are the odds of ivermectin working 10%? 1%? .01%? Where did that number come from? What do we mean by “working”—is it a 5% improvement? A 50% improvement?<strong title="There are systematic ways of estimating this, but they would all require numbers for &quot;how inflated do you expect non-significant effect sizes in published studies to be?&quot; If you spend a lot of time with the medical literature you might have a number to put here; I don't."><sup id="fnref:4"><a href="#fn:4" class="footnote">5</a></sup></strong> And at the same time, I don’t have real odds for “negative side effects”, which covers a lot of ground. (Scott himself points out that the odds of ivermectin unexpectedly killing you are definitely not zero.) And all this is the simple version of the calculation, where we don’t try to weigh things like “fever from covid might last one day less?” versus “ivermectin can cause fever?”</p>
<p>Scott argued many years ago that <a href="https://slatestarcodex.com/2013/05/02/if-its-worth-doing-its-worth-doing-with-made-up-statistics/">if it’s worth doing, it’s worth doing with made-up statistics</a>. And I don’t really disagree with that essay. Doing experimental calculations with made-up numbers can give us information, and I certainly think the analysis of Paxlovid that I did above tells us something useful. But to learn anything from these calculations, we need our made-up numbers to at least vaguely reflect reality.</p>
<p>Scott wrote:</p>
<blockquote>
<p>Remember the <a href="http://yudkowsky.net/rational/bayes">Bayes mammogram problem</a>? The correct answer is 7.8%; most doctors (and others) intuitively feel like the answer should be about 80%. So doctors – who are specifically trained in having good intuitive judgment about diseases – are wrong by an order of magnitude….But suppose some doctor’s internet is down (you have NO IDEA how much doctors secretly rely on the Internet) and she can’t remember the prevalence of breast cancer. If the doctor thinks her guess will be off by less than an order of magnitude, then making up a number and plugging it into Bayes will be more accurate than just using a gut feeling about how likely the test is to work.</p>
</blockquote>
<p>And this is right, but the caveat at the end is critical. If you have a good estimate of the prevalence of breast cancer, and a bad estimate of the chance of a false positive, then you can use the first number to get a better estimate of the second. But if you have a really good idea of the false positive rate (maybe you’ve seen thousands of positive results and learned which ones turned out to be false positives), but a shaky idea of the prevalence of breast cancer (hell, I have no idea how likely some lump is to be cancerous), you’ll be better off going with your intuition for how accurate the test is—and using that to estimate breast cancer prevalence!</p>
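For concreteness, here is the mammogram calculation, using the standard inputs from the linked essay (1% prevalence, 80% sensitivity, 9.6% false-positive rate); the function is just a direct transcription of Bayes’ rule:

```python
def posterior(prior: float, sensitivity: float, false_pos: float) -> float:
    """P(disease | positive test), by Bayes' rule."""
    true_positives = prior * sensitivity          # sick and correctly flagged
    false_alarms = (1 - prior) * false_pos        # healthy but flagged anyway
    return true_positives / (true_positives + false_alarms)

# 1% prevalence, 80% sensitivity, 9.6% false-positive rate.
print(round(posterior(0.01, 0.80, 0.096), 3))  # 0.078 -- the counterintuitive 7.8%
```

The point of the surrounding paragraph is that this formula is symmetric in its inputs: a good `prior` plus a bad guess at `false_pos` can be run forward, but a good `false_pos` plus a bad `prior` is better run in reverse.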
<p>Scott says that “varying the value of the ‘unknown unknowns’ term until it says whatever justifies our pre-existing intuitions is the coward’s way out.” And this is one of the rare cases where I think he’s completely, unequivocally wrong. This isn’t the coward’s way out; it’s the only thing we can possibly do.</p>
<h3 id="reflective-equilibrium">Reflective Equilibrium</h3>
<p>If you find a convincing argument that generates an unlikely conclusion, you can accept the unlikely conclusion, you can decide that the premises of the argument were flawed, <em>or</em> you can decide the argument itself doesn’t work. If I collect some data, do some statistics, and calculate that taking Tylenol will cut my lifespan by thirty years, I don’t immediately throw away all my Tylenol—I look for where I screwed up my math. And that’s the correct, and rational, response.</p>
<p>If you think A is true and B is false, and find an argument that A implies B, you have three choices: you can decide A is false after all; you can decide B is true after all; or you can decide that the argument actually isn’t valid. Or you can adopt some probabilistic combination: it’s perfectly consistent to believe A is 60% likely to be true, B 60% likely to be false, and the argument 60% likely to be correct. But fundamentally you have to make a choice about which of the three pieces to adjust, and by how much.<strong title="David Chapman calls this [meta-rational reasoning](https://twitter.com/Meaningness/status/1463632030059544576). I see where he's coming from but think that's an unnecessarily complex and provocative way of talking about it."><sup id="fnref:5"><a href="#fn:5" class="footnote">6</a></sup></strong></p>
<p style="text-align: center;"><img src="/assets/blog/pascalian/two-answers.jpg" alt="Picture of kitten raising two paws: &quot;i has two ansers. which you want?&quot;" /></p>
<p>In the case of ivermectin, we have some data from some studies. We have an Inside View argument that, based on expected values computed from that data, taking ivermectin is probably worth it. And we have the Outside View argument that taking random long-shot drugs is not a great idea. And we have to reconcile these somehow.</p>
<p>First, we could reject, or disbelieve, the data. And we totally did that: a bunch of ivermectin studies are fraudulent or incompetent, and Scott <a href="https://astralcodexten.substack.com/p/ivermectin-much-more-than-you-wanted">argues pretty convincingly</a> that some of the honest, competent studies are really picking up the benefits of killing off intestinal parasites. But even after doing that, we’re left with the Pascalian argument: ivermectin probably doesn’t work, but it might, and the costs of taking it are low, so we might as well. Do we listen to that argument, or to our gut belief that this can’t be a good idea?</p>
<p>A common trap that smart, math-oriented people fall into is thinking that the argument with numbers and calculations must be the better one. The Inside View argument did some math, and multiplied some percentages, and came up with an expected value; the Outside View argument comes from a fuzzy intuitive sense that medicine Doesn’t Work That Way. So the mathy argument should win out.</p>
<p style="text-align: center;"><img src="/assets/blog/pascalian/peanuts-opinion.gif" alt="Peanuts comic. &quot;How are you doing in school these days, Charlie Brown?&quot; &quot;Oh, fairly well, I guess...I'm having most of my trouble in arithmetic..&quot; &quot;I should think you'd like arithmetic...it's a very precise subject..&quot; &quot;That's just the trouble. I'm at my best in something where the answers are mostly a matter of opinion!&quot;" width="100%" /></p>
<p>But in this case, we were doing calculations with numbers that were, you might remember, completely made up. Sure, the Outside View argument reflects a fuzzy intuitive sense of whether a random potential cure is likely to help us. The Inside View argument, on the other hand, reflects a fuzzy intuitive sense of whether ivermectin is likely to protect us from covid.</p>
<p>The only real difference is that we took the second fuzzy intuition, put a fuzzy number on it, and plugged it into some cost-benefit analysis formulas. And no matter what fancy formulas we use, they can never make our starting numbers <em>less</em> fuzzy. Given the choice between a fuzzy intuition, and an equally fuzzy intuition that we’ve done math to, I’m inclined to trust the first one. With fewer steps, there are fewer ways to screw up.</p>
<h3 id="finding-the-error">Finding the Error</h3>
<p>At this point I think we’ve reached roughly Scott’s position at the end of his essay. The Outside View argument is winning out in practice, but we haven’t articulated any specific problems with the Inside View argument. And this is uncomfortable, because <em>they can’t both be right</em>. We can say it’s more likely we screwed up the more complicated, mathier argument. But <em>how</em> did we screw it up?</p>
<p>And on reflection, the answer is that we’re confusing two different arguments. I think that “Sure, go ahead and take ivermectin, it probably won’t help but it might, and it probably won’t hurt either” is a pretty reasonable position, and was even more reasonable six months ago, when we knew less than we do now.<strong title="Again, &quot;Ivermectin is a miracle cure, take that instead of getting vaccinated&quot; is, in fact, a completely and totally nonsense position. And many public &quot;ivermectin advocates&quot; are saying that, and they are wrong. But that's not what we're talking about here."><sup id="fnref:6"><a href="#fn:6" class="footnote">7</a></sup></strong></p>
<p>I know a bunch of people who take Vitamin C, even though it’s not clear that accomplishes anything. I myself flip-flop between taking a multivitamin because it seems like it might make me healthier, and not taking a multivitamin because there’s no real evidence that it does. Taking ivermectin in case it’s helpful doesn’t really seem that different.</p>
<p>No, the crazy position is when we go full Insanity Wolf and take twenty different long-shot cures at once. <em>That</em> was the conclusion that seemed like it couldn’t possibly hold up, at least for me. And that’s <em>also</em> the point where it really does seem like the unknown unknowns start piling up. There are twenty different drugs that could all possibly cause negative side effects. There are 190 potential two-drug interactions and over a thousand potential three-drug interactions, and even if interactions are, in Scott’s words, “rarer than laypeople think”, that seems like a lot of room for something weird to happen.</p>
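Those interaction counts are just binomial coefficients, which a two-line check confirms:

```python
import math

drugs = 20
pairs = math.comb(drugs, 2)     # possible two-drug interactions
triples = math.comb(drugs, 3)   # possible three-drug interactions
print(pairs, triples)  # 190 1140
```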
<p>So this is how we screwed up. We said these drugs are cheap and generally safe. But in order to make our math reasonable, we rounded “generally safe” down to “safe”, and ignored the risks entirely. As long as the risks are small enough, that works fine; but at some point we cross a threshold where we can’t just ignore all the downsides when doing our calculations.</p>
<p>Is taking twenty drugs over that threshold? I don’t know, but it seems likely. Taking that many drugs <em>probably</em> won’t hurt you, but it might! And it will definitely be expensive and annoying, and a lot of those drugs have common mild-but-unpleasant side effects. And the potential benefits are relatively small, and relatively unlikely; it’s easy for them to be swamped by all these downsides.</p>
<p>But now we’re talking about the interaction of hundreds of numbers that are both small and uncertain. We can’t get away with ignoring the risks, but we can’t realistically quantify them either. All we can do is make some half-assed guesses, and our conclusions will change a lot depending on exactly which guesses we make. So we <a href="https://twitter.com/ProfJayDaigle/status/1463598150585888775">can’t do a useful Inside View calculation at all</a>. Instead we’re basically forced to rely on the Outside View argument: taking twenty pills every day that probably don’t even work seems kinda dumb.</p>
<p>But then why take ivermectin specifically, rather than Vitamin D or curcumin or some other possible treatment? I dunno. You’re buying a long-shot lottery ticket. Pick your favorite number and hope it pays out.</p>
<h3 id="the-takeaway">The Takeaway</h3>
<p>A back-of-the-envelope cost-benefit analysis tells us that taking ivermectin for covid might have positive expected value. If we follow that logic to its conclusion, we wind up taking twenty different supplements and this seems like it can’t be wise.</p>
<p>A blinkered view of rationality tells us to ignore our intuition and follow the math. A more expansive view realizes that if the numbers we’re plugging into our cost-benefit analysis are shakier than that intuition, then we should take the intuition seriously. Cost-benefit analyses and other “mathematically rational” tools are only as good as the numbers and arguments that we bring to them.</p>
<p>But even with shaky numbers, we can learn things from comparing our intuitions with the result of our calculations. Figuring out <em>why</em> we get two different answers can teach us a lot about our reasoning, and help us figure out where we went wrong. Taking the full Insanity Wolf cocktail really seems qualitatively different from picking your favorite long-shot drug, but the way we set up our math hid that from us.</p>
<p>Finally: please get vaccinated, and get your booster shot. And if you have a choice between Paxlovid and ivermectin, you should probably take the Paxlovid.</p>
<hr />
<p><em>Questions about cost-benefit analysis, or where the math breaks down? Do you know something I missed? Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a> or leave a comment below.</em></p>
<div class="footnotes">
<ol>
<li id="fn:edit">
<p>I originally misread the CDC page and interpreted the weekly average of daily numbers as weekly numbers. I’ve edited the piece throughout to reflect the true numbers, but it doesn’t change any of the conclusions, since the same error happened to every rate I discussed in the piece. <a href="#fnref:edit" class="reversefootnote">↩</a></p>
</li>
<li id="fn:1">
<p>There would also be benefits from fewer people being hospitalized, fewer people suffering long-term health consequences, fewer people being miserable and bedridden for a week, etc. I’m going to talk about deaths pretty exclusively because it’s easier to talk about just one number. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>This is very different from claims that ivermectin is a miracle cure, and we should take that instead of getting vaccinated. Ivermectin is at best mildly beneficial; vaccines are safe and effective and you should get a booster shot if you haven’t already. We’re talking about whether the small possibility of a minor benefit from ivermectin makes it worth taking. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>These numbers are reported a little weirdly. Looking at the study, it seems like Paxlovid cut hospitalizations by 85%, from 41/612 to 6/607; it cut deaths by 100% from 10/612 to 0/607. I think the 90% figure is the extent to which it cut (hospitalizations plus deaths), since that math checks out, but that’s a slightly weird metric to judge by. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>There are systematic ways of estimating this, but they would all require numbers for “how inflated do you expect non-significant effect sizes in published studies to be?” If you spend a lot of time with the medical literature you might have a number to put here; I don’t. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>David Chapman calls this <a href="https://twitter.com/Meaningness/status/1463632030059544576">meta-rational reasoning</a>. I see where he’s coming from but think that’s an unnecessarily complex and provocative way of talking about it. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>Again, “Ivermectin is a miracle cure, take that instead of getting vaccinated” is, in fact, a completely and totally nonsense position. And many public “ivermectin advocates” are saying that, and they are wrong. But that’s not what we’re talking about here. <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>

<h1 id="more-thoughts-on-the-axiom-of-choice">More Thoughts on the Axiom of Choice</h1>
<p>I got a lot of good, interesting comments on my recent <a href="https://jaydaigle.net/blog/what-is-the-axiom-of-choice/">post on the axiom of choice</a> (both on the post itself, and in this <a href="https://news.ycombinator.com/item?id=27836406">very good Hacker News thread</a>). I wanted to answer some common questions and share the most interesting thing I learned.</p>
<h3 id="cant-we-just-pick-at-random">Can’t we just pick at random?</h3>
<p>A lot of people asked why we can’t just avoid the whole problem of the axiom of choice by picking set elements randomly. Because obviously we can just make a bunch of random choices, right? If there’s no limit to what the choices have to look like then there’s no problem.</p>
<p>If you believe that, then you believe the axiom of choice. “We can pick some element from each set, without being fussy about which one we get” is just what the axiom of choice says. And that’s fine. A lot of people believe the axiom of choice! But it’s not an alternative to the axiom of choice; it is the axiom of choice.</p>
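For readers who like seeing this written down formally: in Lean 4’s core library, <code>Classical.choice</code> is exactly this principle. A sketch (the phrasing is mine, not anything from the comments):

```lean
-- Lean 4 sketch (illustrative). `Classical.choice` is the axiom of choice
-- in Lean's core: from a bare proof that a type is nonempty, it produces
-- an element, with no rule at all for *which* element you get.
example (α : Type) (h : Nonempty α) : α := Classical.choice h

-- The set-theoretic phrasing: given a family of nonempty sets,
-- we can simultaneously pick one element from each.
example {ι : Type} {S : ι → Type} (h : ∀ i, Nonempty (S i)) :
    Nonempty (∀ i, S i) :=
  ⟨fun i => Classical.choice (h i)⟩
```

Note that nothing in either statement constrains the element chosen; “pick one, without being fussy about which” is the entire content of the axiom.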
<p>The fact that this “just pick at random” idea seems so facially compelling, or “obvious”, is a big part of why many mathematicians want to accept the axiom of choice. It just seems like we should be able to make a bunch of choices at once, if we’re not picky about which choices we make. It’s only when they are shown the really bizarre implications of getting to make those choices that most people start questioning whether the axiom makes sense.</p>
<h3 id="why-do-we-want-to-believe-the-axiom-of-choice">Why do we want to believe the axiom of choice?</h3>
<p>Another recurring question asked why we <em>should</em> want to believe the axiom of choice. It has a lot of bizarre consequences. In the last post I argued that those consequences aren’t as troubling as they seem, but they’re still weird. Why can’t we just dumpster the axiom of choice and avoid all of them?</p>
<p>One reason is the intuitive plausibility of the “just pick at random” idea. The goal of an axiomatic system is to formalize our list of “basic moves we should be able to make”. The ZF axioms include things like the <a href="https://en.wikipedia.org/wiki/Axiom_of_extensionality">axiom of extensionality</a>, which says that two sets are equal if they have the same elements, and the <a href="https://en.wikipedia.org/wiki/Axiom_of_pairing">axiom of pairing</a>, which says that if \(A\) and \(B\) are sets then we can talk about the set \( \{A, B\} \). These aren’t weird exotic ideas. They’re just things we should be able to do with collections of things. They’re part of the intuition that the word “set” is trying to formalize.</p>
<p>You could see the axiom of choice as something like this—something in our basic, intuitive understanding of what a “set” is, that pre-exists formal definitions. It’s pretty easy to convince people that “choose an element from each set” is a reasonable thing to be able to do. The only problem is that it leads to absurd results like Banach-Tarski or the solution to the Infinite Hats puzzle. But if we satisfy ourselves that those absurdities aren’t a real problem, we return to “this seems like a thing we should be able to do”.</p>
<h3 id="but-really-why-do-we-want-to-believe-the-axiom-of-choice">But really, why do we <em>want</em> to believe the axiom of choice?</h3>
<p>On the other hand, that’s not a very strong reason to really care about the axiom of choice. At best, that leaves us at “why shouldn’t we, it doesn’t hurt anything”, which could just as easily be “why should we, it doesn’t help?” We <em>care</em> about the axiom of choice, and put up with the peripheral weirdness, because it lets us prove a <a href="https://en.wikipedia.org/wiki/Axiom_of_choice#Weaker_forms">variety of other results we care about</a>. These include:</p>
<ul>
<li>Every Hilbert space has an orthonormal basis (so we can put coordinates on function spaces);</li>
<li>Every field has an algebraic closure (very important in number theory—in my research I often wanted to talk about “the algebraic closure” of some large field, and that implicitly relies on the axiom of choice);</li>
<li>The union of countably many countable sets is countable;</li>
<li><a href="https://en.wikipedia.org/wiki/Hahn%E2%80%93Banach_theorem">The Hahn-Banach theorem</a> (lets us extend linear functionals and guarantees that dual spaces are “interesting”);</li>
<li><a href="https://en.wikipedia.org/wiki/G%C3%B6del's_completeness_theorem">Gödel’s completeness theorem</a> for first-order logic;</li>
<li><a href="https://en.wikipedia.org/wiki/Baire_category_theorem">The Baire category theorem</a>, which I don’t even want to try to summarize but which shows up constantly in functional analysis.</li>
</ul>
<p>All of these results are really useful in their respective fields, and we need the axiom of choice to prove them. And that’s a true “need”: these are all provable from ZFC but not from ZF.</p>
<p>These statements aren’t equivalent to the axiom of choice. If we wanted, we could take the above list as a list of new <em>axioms</em> to attach to ZF, and then we wouldn’t be stuck with choice. But that is a really strange and ad-hoc list of foundational axioms. It feels much better to take the one axiom—the axiom of choice, which is reasonably foundational and sounds plausible enough on its own—and get all these consequences for free.</p>
<h3 id="shoenfields-theorem-you-only-need-the-axiom-of-choice-for-weird-things">Shoenfield’s Theorem: You only need the axiom of choice for weird things</h3>
<p>But the coolest thing I learned about after writing the last post is <a href="https://en.wikipedia.org/wiki/Absoluteness#Shoenfield's_absoluteness_theorem">Shoenfield’s Absoluteness Theorem</a>. The statement of this theorem is pretty dense and I don’t think I completely understand it, but it has really nice implications for the axiom of choice.</p>
<p>In the last post I said that the axiom of choice just doesn’t cause problems as long as we’re not getting too far away from finite sets. That’s true even of half the results in the previous section:</p>
<ul>
<li>We need the axiom of choice to show that <em>every</em> field has an algebraic closure, but not to show that the rationals do.</li>
<li>We need the axiom of choice to show that <em>every</em> Hilbert space has an orthonormal basis, but not to show that Fourier theory gives an orthonormal basis for \(L^2([-\pi,\pi])\).</li>
<li>We need the axiom of choice to prove the Baire Category Theorem for every complete metric space, but not to prove it for the real numbers or the real function space \(L^2(\mathbb{R}^n)\).</li>
</ul>
<p>Shoenfield’s theorem helps tell us exactly when the axiom of choice is actually going to matter.</p>
<p>In the last post we talked about <em>models</em> of the ZF axioms, which are collections of sets that obey all the rules. Given a model, Kurt Gödel defined something called the <a href="https://en.wikipedia.org/wiki/Constructible_universe">constructible universe</a>, a sort of smaller model, contained in the original one, that can be built up explicitly from smaller pieces. The constructible universe usually doesn’t contain everything in the original model, but it will in some sense contain all the simple, explicitly describable things in the original model.</p>
<p>But the constructible universe has some extra nice properties. One is that the constructible universe will always satisfy the axiom of choice, even if the original model did not!<strong title="This is how Gödel proved that the axiom of choice must be consistent with the ZF axioms: the constructible universe gives us a model of ZF that also satisfies the axiom of choice."><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></strong> Specifically, since we construct the universe in a specific <em>order</em>, everything we’ve constructed can be <a href="/blog/what-is-the-axiom-of-choice/#well-ordering">well-ordered</a>, which implies the axiom of choice. So any theorem that relies on the axiom of choice is automatically true as long as we’re only talking about sets in the constructible universe.</p>
<p>Shoenfield’s theorem extends that result even further. If you have a sufficiently simple question (for a <a href="https://en.wikipedia.org/wiki/Analytical_hierarchy">precise definition of sufficiently simple</a>), then the original model and the constructible universe must give the same answer. Since the axiom of choice always holds in the constructible universe, the answers to these simple questions can’t depend on whether you accept the axiom of choice or not.</p>
<p>What does that mean? Any simple-enough result that you can prove with the axiom of choice, you can also prove without it. That includes everything about Peano arithmetic and basic number theory, and also everything about the <a href="https://news.ycombinator.com/item?id=27855515">correctness of explicit computable algorithms</a>. It also includes <a href="https://en.wikipedia.org/wiki/Axiom_of_choice#cite_ref-16">\(P = NP\) and the Riemann Hypothesis</a>, and a number of other major unsolved problems.</p>
<p>There are questions that the axiom of choice really does matter for. But Gödel and Shoenfield’s results show that they have to be pretty far removed from anything finite or concretely constructible. So in practice, we can use the axiom of choice as a tool to make our work simpler, knowing that it won’t screw up anything practical that really matters.</p>
<hr />
<p><em>Do you have other questions about the axiom of choice? Another cool fact I don’t know about? Or some other math topic you’d like me to explain? Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a> or leave a comment below.</em></p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p>This is how Gödel proved that the axiom of choice must be consistent with the ZF axioms: the constructible universe gives us a model of ZF that also satisfies the axiom of choice. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<h2 id="what-is-the-axiom-of-choice">What is the Axiom of Choice?</h2>
<p><em>Jay Daigle · 2021-07-14 · <a href="https://jaydaigle.net/blog/what-is-the-axiom-of-choice">jaydaigle.net/blog/what-is-the-axiom-of-choice</a></em></p>
<p>One of the easiest ways to start a (friendly) fight in a group of mathematicians is to bring up the <a href="https://en.wikipedia.org/wiki/Axiom_of_choice">axiom of choice</a>. This axiom has a really interesting place in the foundations of mathematics, and I wanted to see if I can explain what it means and why it’s controversial. As a bonus, we’ll get some insight into what an axiom <em>is</em> and how to think about them, and about how we use math to think about the actual world.</p>
<p style="text-align: center;"><a href="https://xkcd.com/982"><img src="https://imgs.xkcd.com/comics/set_theory.png" alt="xkcd 982: &quot;The axiom of choice allows you to select one element from each set in a collection—and have it executed as an example to the others&quot;" /></a></p>
<p>The axiom seems pretty simple at first:</p>
<blockquote>
<p><strong>Axiom of Choice:</strong> Given a collection of (non-empty) sets, we can choose one element from each set.<strong title="We can be more formal by phrasing this in terms of _choice functions_: given a collection of sets X, there is a function f : X → ⋃X such that f(A) ∈ A for each A ∈ X. But I want to keep the discussion as readable as possible if you're not comfortable with the language of formal set theory."><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></strong></p>
</blockquote>
<p>Most people find this principle pretty inoffensive, or even obviously right, on first contact. But it’s extremely controversial and produces strong emotions; and unusually for a mathematical debate, there’s essentially no hope of a clear resolution. And I want to try to explain why.</p>
<h3 id="easy-choices">Easy choices</h3>
<p>One reason the axiom of choice can <em>sound</em> trivial is that there are a lot of superficially similar rules that are totally fine; the controversial bit is subtle. So here are a few things that don’t cause controversy:</p>
<ul>
<li>If we have one set, we can definitely pick an element from it. The axiom of choice says if we have a collection of sets, we can pick one element from each set simultaneously.</li>
<li>
<p>But if we can pick an element from one set, can’t we pick an element from the first set, and then the second set, and then the third, etc.? Eventually we’ll pick an element from each set.</p>
<p>This works if we only have a <em>finite</em> collection of sets. So if I have five sets, I can pick one element from each set, by picking an element from the first set, then the second set, then the third, then the fourth, then the fifth. This is sometimes known as the <strong>axiom of finite choice</strong>. And no one argues about this.</p>
<p>But that approach doesn’t work if we have infinitely many sets.<strong title="Using this sort of process on an infinite set is called transfinite induction. Transfinite induction can sometimes allow us to make choices without the axiom, but only if we can put our sets in some order. Conversely, the axiom of choice allows us to use transfinite induction in cases we otherwise couldn't. (Corrected from earlier version; thanks to Sniffnoy for the correction)"><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></strong> If we pick elements from one set at a time, we’ll never get to all the sets; there will still be infinitely many left. This infinitude of sets is where the real problem lies. (And things get worse if we have an <a href="https://en.wikipedia.org/wiki/Uncountable_set">uncountably infinite</a> collection of sets, which is too many to even put in order!)</p>
</li>
</ul>
<p style="text-align: center;"><img src="/assets/blog/aoc/count_over_eleventy.jpg" alt="A kitten holding up its paws like it's counting. &quot;Ai can count ober elebenty. Look see? Elebenty one elebenty two elebenty free...&quot;" /></p>
<ul>
<li>
<p>Even if we have an infinite collection of sets, we <em>might</em> be able to pick an element from each set. If the sets have a nice enough pattern to them, we can give an explicit rule that lets us pick an element from each set consistently. For instance, if we have a bunch of sets of positive integers, we can always say something like “pick the smallest number in each set”.</p>
<p>But not every collection of sets allows a deterministic rule like this.<strong title="The set of real numbers doesn't have a smallest element or a largest element. Nor does the set of positive real numbers, or the set of numbers between zero and one. So if we have a collection of sets of real numbers, the rule we used for sets of positive integers doesn't work."><sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup> </strong> The axiom of choice says that we can choose an element from each set, even if we can’t describe a rule for making that choice. If we have infinitely many pairs of shoes we don’t need the axiom of choice, since we can just take the left shoe from each pair; but if we have infinitely many pairs of socks, we do need the axiom of choice.<strong title="This example was originally offered by Bertrand Russell. "><sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup></strong></p>
</li>
</ul>
<h3 id="whats-the-problem">What’s the problem?</h3>
<p>The axiom of choice has weird effects precisely because it is so unlimited. It tells us that given any infinite collection of infinite sets, we can pick one option from each set, even if the sets are too big to really understand, and even if we don’t have any extra structure to guide us.</p>
<p>We can see how this matters by looking at a classic logic puzzle, and then taking it to infinity.</p>
<h5 id="the-finite-hat-puzzle">The (finite) hat puzzle</h5>
<p>Imagine a game show host<strong title="The _classic_ version of the puzzle features a sadistic prison warden. While that setup is traditional, it seems unnecessarily violent, so I've replaced it with something friendlier."><sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup></strong> is going to line you up with 99 other people, and give each of you a hat to wear, which is either black or white. You can see everyone in front of you, including the colors of their hats; you can’t see your own hat, nor can you see anyone behind you.</p>
<p>Starting at the back of the line, the host will ask each person to guess whether their own hat is black or white. You’ll be able to hear the guesses, and whether they’re right or wrong.</p>
<p>Before the game starts, you all get a few minutes to talk and plan out your strategy. What should you do to get as many correct guesses as possible?</p>
<p>Stop and take a minute to think about this one. It doesn’t require any fancy mathematics, just a cute trick that’s surprisingly useful in other contexts.</p>
<p style="text-align: center;"><img src="https://64.media.tumblr.com/a6eec2d9352742626fe1fbe09b668cec/tumblr_nvpzxaBNRN1qgomego1_500.png" alt="Papyrus from Undertale dressed as Professor Layton: &quot;Human would you like a puzzle?&quot; Small child: &quot;Not really&quot; Papyrus: &quot;Too bad you're getting a puzzle&quot;" />
<em>Drawing by <a href="https://nightmargin.tumblr.com/post/130512412496/professor-skeleton-and-the-mystery-of-why-is">nightmargin</a> on Tumblr</em></p>
<p>As a hint, you can do really, really well. A simple approach that isn’t too bad is to have each odd-numbered person announce the color of the hat in front of them. This guarantees 50 right answers, and on average will get 75. But we can do much better than that.</p>
<p>Ready?</p>
<p>The person in the back of the line (call them \(A\)) doesn’t have any information, so there’s no possible way to guarantee they’ll get it right. But we can make sure everyone else wins. \(A\) can count up all the black hats in front of them and figure out if the number is even or odd. If it’s even, they’ll say “white”; if it’s odd, they’ll say “black”.</p>
<p>The second person \(B\) now knows whether \(A\) saw an even or odd number of black hats. But \(B\) can count up all the black hats <em>they</em> see. If \(A\) saw an even number of black hats but \(B\) sees an odd number, then \(B\) must be wearing a black hat.</p>
<p>The process continues down the line. \(C\) can tell whether \(A\) saw an even or odd number of black hats, and can also tell whether \(B\) was wearing black or white. Between that information, and seeing all the hats in front of them, \(C\) can figure out their own hat color.</p>
<p>(This sounds like it gets complicated very quickly, but we can streamline it. Count up all the black hats in front of you, and then add 1 to the number every time someone behind you says “black”. When the host reaches you, if the number is even you’re wearing a white hat, and if it’s odd you’re wearing a black hat.)</p>
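<p>Since the streamlined rule is completely mechanical, we can sanity-check it in code. This is a sketch of my own (not from the original puzzle write-up), with 0 for white and 1 for black, and the back of the line first:</p>

```python
import random

def play(hats):
    """Simulate the parity strategy: hats is a list of 0 (white) and
    1 (black), with the back of the line (person A) at index 0."""
    guesses = []
    heard_black = 0                      # "black" calls heard so far
    for i in range(len(hats)):
        in_front = sum(hats[i + 1:])     # black hats this player can see
        if i == 0:
            guess = in_front % 2         # A announces the parity: odd -> "black"
        else:
            guess = (in_front + heard_black) % 2  # odd total -> my hat is black
        guesses.append(guess)
        heard_black += guess
    return guesses

hats = [random.randint(0, 1) for _ in range(100)]
# Everyone except possibly the first person is guaranteed correct.
assert play(hats)[1:] == hats[1:]
```

<p>The first player’s guess is really an announcement, so it’s only right by luck; everyone after them is right with certainty.</p>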
<p>This exact algorithm is used by a lot of computer systems, especially when transmitting data over noisy connections. Computers store information in bytes, which are strings of eight bits. But often they will only use seven of the bits to store information (for instance, in standard <a href="http://rabbit.eng.miami.edu/info/ascii.html">ASCII encoding</a> there are 128 possible characters, represented as a 7-bit number). In transmission, the eighth bit can be used as a <a href="https://en.wikipedia.org/wiki/Parity_bit">parity bit</a>, which will be 1 if the other digits include an even number of “1”s, and 0 if they include an odd number of “1”s.</p>
<p>Thus every byte should have an odd number of “1”s, and if any byte has an even number of “1”s the system knows it contains an error. In our solutions \(A\) is effectively providing a parity bit for the string of hat colors, letting each player infer the information they don’t have: the color of their own hat.</p>
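<p>The parity-bit scheme itself is only a few lines of code. A hedged sketch (the function names here are my own, not from any particular library), using the odd-parity convention described above:</p>

```python
def add_parity_bit(bits7):
    """Append an eighth bit so the full byte has an odd number of 1s."""
    return bits7 + [0 if sum(bits7) % 2 == 1 else 1]

def looks_valid(byte):
    """Under odd parity, a received byte with an even number of 1s
    must contain an error."""
    return sum(byte) % 2 == 1

byte = add_parity_bit([1, 0, 1, 1, 0, 0, 1])  # four 1s, so parity bit is 1
assert looks_valid(byte)

corrupted = byte[:]
corrupted[2] ^= 1        # flip a single bit "in transit"
assert not looks_valid(corrupted)
```

<p>Like the hat strategy, this detects any single error but can’t locate it; flipping two bits restores the parity and slips through, which is why systems that need more robustness use stronger error-correcting codes.</p>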
<h5 id="the-uncountable-hat-puzzle">The uncountable hat puzzle</h5>
<p>That puzzle is fun, and the solution is clever, but there’s nothing especially paradoxical or brain-breaking about it. And it doesn’t involve the axiom of choice at all. But we can write a harder version that does use the axiom of choice, and has truly ridiculous results.<strong title="I think I first heard about this version from Greg Muller at https://cornellmath.wordpress.com/2007/09/13/the-axiom-of-choice-is-wrong/"><sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup> </strong></p>
<p style="text-align: center;"><img src="/assets/blog/aoc/this-puzzle-reminds-me-of-a-puzzle.png" alt="Professor Layton's head: &quot;Doing a puzzle? That reminds me of a puzzle!&quot;" /></p>
<p>Suppose the game host now gets an infinite line of people, so each person can see an infinite collection of people in front of them. (Let’s assume there is a <em>first</em> person in the line, so it’s not infinite in both directions; you have infinitely many people in front of you, but only finitely many behind.) And instead of black or white hats, we’ll write a random real number on each person’s hat: you could have 3 or 7, or \(5.234\) or \(\pi^e\) or \(\Gamma(3.5^{7.2e^2})\). And just to make it harder, you can’t even hear what happens behind you.</p>
<p>This looks plainly impossible. No one who can see your hat can communicate with you at all. Even if they could, there are <a href="https://en.wikipedia.org/wiki/Cantor's_diagonal_argument">more possible hat labels</a> than there are people in line. It seems like everyone working together wouldn’t be able to guarantee even one right answer. But if we can use the axiom of choice, we can guarantee that infinitely many people get the right answer—and even better, only finitely many people will get it wrong. In our endless infinite line, there will be a <em>last</em> wrong person; all the endless people in front of them will guess right.</p>
<p>How can this possibly work? First we’ll think about the set of all possible sequences<strong title="If you don't know what a sequence is, just think of this as an infinite list."><sup id="fnref:7"><a href="#fn:7" class="footnote">7</a></sup></strong> of real numbers. (If we’re being fancy we might call this set \(\mathbb{R}^{\mathbb{N}}\).) We’ll say that two sequences are equivalent if they’re only different in finitely many places. So the sequences \( \Big( 1,2,3,4,5,6, \dots \Big) \) and \( \Big( 17, 2000 \pi, -\frac{345}{e}, 4, 5, 6, \dots \Big) \) are equivalent, but \( \Big( 1,0,3,0,5,0, \dots \Big) \) isn’t equivalent to either of them.</p>
<p>This gives us what’s called an <a href="https://en.wikipedia.org/wiki/Equivalence_relation">equivalence relation</a> on the set of real sequences. Equivalence relations are a widely useful tool, and I might write about them some other time, but for right now the important thing is that they <em>partition</em> the set, or subdivide it into smaller sets of things that are all equivalent to each other. Each thing will be in one and only one smaller set, which we call an <em>equivalence class</em>.</p>
<p>In our case, this means we’ve taken the set of all sequences of real numbers, and split it up into a bunch of equivalence classes of sequences. Every sequence belongs to exactly one equivalence class. And within each equivalence class, all the sequences are equivalent to each other—which means that they only have finitely many differences from each other.</p>
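<p>We can’t store a truly arbitrary infinite sequence in a computer, but the special case of “a fixed rule plus finitely many exceptions” is representable, and it makes the equivalence concrete. A toy model of my own, using the example sequences above:</p>

```python
import math

def make_seq(base, exceptions):
    """Model a sequence as a base rule plus finitely many exceptions.
    Any two sequences built over the same base differ in at most
    finitely many places, so they lie in the same equivalence class."""
    return lambda n: exceptions.get(n, base(n))

def naturals(n):
    return n + 1          # the sequence 1, 2, 3, 4, 5, 6, ...

s = make_seq(naturals, {})
t = make_seq(naturals, {0: 17, 1: 2000 * math.pi, 2: -345 / math.e})

# s and t disagree only at the exceptional positions...
assert [n for n in range(100) if s(n) != t(n)] == [0, 1, 2]
# ...and agree everywhere past them.
assert all(s(n) == t(n) for n in range(3, 1000))
```

<p>The third sequence from the text, \( (1,0,3,0,5,0,\dots) \), differs from <code>naturals</code> in infinitely many places, so no finite exception set over this base captures it: it lies in a different equivalence class. Note that this representation builds the classes in by construction; deciding equivalence for genuinely arbitrary sequences would mean checking infinitely many positions.</p>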
<p>Now we use the axiom of choice. We can <em>choose</em> one representative sequence from each equivalence class, and have everyone memorize this set of chosen sequences. When we all line up, I can see everyone in front of me, so there are only finitely many people I can’t see. There’s only one sequence on my list that can possibly be equivalent to this one.</p>
<p>Now when the host reaches me, I don’t know what’s happened behind me. I don’t know the exact sequence of hat labels. But I don’t need to! I know which equivalence class the sequence is in, and I know which representative sequence we chose for that equivalence class. So I can tell the host the number for my position from the representative sequence that we chose.</p>
<p>I might not be right; I have no way to know until the host tells me. But since we’re all using the <em>same</em> representative sequence that we chose earlier, and that sequence differs from the “true” sequence in only finitely many places, an infinite number of us will answer correctly. And only a finite number will fail.</p>
<h3 id="what-does-it-do-for-us">What does it do for us?</h3>
<p>The hat puzzle is obviously a little contrived, but the axiom of choice has a lot of surprising and sometimes disconcerting implications that are relevant to other fields of math. Some of these consequences are apparent paradoxes; others are things we would very much like to be true, and make the axiom of choice extremely useful.</p>
<h5 id="zorns-lemma">Zorn’s lemma</h5>
<p style="text-align: center;"><img src="/assets/blog/aoc/zorns_lemon.png" alt="What's yellow, sour, and equivalent to the axiom of choice? Zorn's Lemon!" /></p>
<p>Zorn’s Lemma is probably the most common use of the axiom of choice, but it’s a little tricky to explain. The formal statement is short enough:</p>
<blockquote>
<p><strong>Zorn’s Lemma:</strong> Every non-empty partially ordered set in which every totally ordered subset has an upper bound contains at least one maximal element.</p>
</blockquote>
<p>But it’s not super obvious what this means. The basic idea is that if we have some set where</p>
<ul>
<li>We can compare two elements and sometimes decide which one is “larger”;</li>
<li>but sometimes neither element counts as “larger”;</li>
<li><del>and we can never have an infinite collection of successively larger elements;</del>
any time we have an infinite collection of successively larger elements, there’s some other element bigger than all of them (thanks to Sniffnoy for the correction);</li>
</ul>
<p>then there must be a “largest” element.<strong title="Sometimes there can be _more than one_ largest element, which is a little weird. But since some pairs of elements can't be compared, you can have multiple elements that don't have anything above them. Imagine a company with two presidents: each of them is a highest-ranking person at the company. And that's why we say 'a' largest element rather than 'the' largest."><sup id="fnref:8"><a href="#fn:8" class="footnote">8</a></sup></strong></p>
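<p>Zorn’s lemma only has teeth for infinite sets, but the notions of a partial order and a maximal element (including the possibility of several maximal elements at once) can be illustrated with a finite toy example. This sketch is mine, using divisibility as the order:</p>

```python
def maximal_elements(elems, le):
    """Elements with nothing strictly above them in the partial order."""
    return [x for x in elems
            if not any(le(x, y) and x != y for y in elems)]

def divides(a, b):
    return b % a == 0     # "a <= b" means "a divides b"

# 2 divides 4, 3 divides 6, and 4 divides 8, so none of those three is
# maximal. But 5, 6, and 8 are pairwise incomparable, so all three are.
assert sorted(maximal_elements([2, 3, 4, 5, 6, 8], divides)) == [5, 6, 8]
```

<p>Just as in the two-presidents example from the footnote, a partial order can have several maximal elements, which is why the lemma promises “a” maximal element rather than “the” one.</p>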
<p>This is surprisingly useful, for one very specific reason: we can build up solutions to our problems step by step, and have a guarantee that we’ll finish. This is a tool we want to use all the time in math. We even tried it earlier: if we have a collection of sets, we can choose an element from the first one, and then the second one, and then the third one….</p>
<p>The problem we ran into is that this will eventually let us choose one element from each of a thousand sets, or a million, or a billion. But we have no guarantee that we can “eventually” choose from each of an infinite, possibly uncountable, collection of sets. Zorn’s lemma <a href="https://gowers.wordpress.com/2008/08/12/how-to-use-zorns-lemma/">solves this exact problem for us</a>, and lets us extend these constructions to infinity. And often when we’re defining functions on an infinite set, that’s exactly what we want to do.</p>
<p>Zorn’s lemma has one more important consequence: it is <em>equivalent</em> to the axiom of choice. We can use the axiom of choice to prove Zorn’s lemma; but we can also use Zorn’s lemma to prove the axiom of choice (by extending the axiom of finite choice to infinity, in exactly the way we were just discussing). We can’t duck the axiom-of-choice question by just making Zorn’s lemma into an axiom; the two are a package deal. If we want the power of Zorn’s lemma, we’re stuck with the axiom of choice and all the weirdness it implies.</p>
<h5 id="well-ordering"><a name="well-ordering">Well-ordering</a></h5>
<blockquote>
<blockquote>
<p>The axiom of choice is obviously true, the well-ordering principle obviously false, and who can tell about Zorn’s lemma?</p>
</blockquote>
</blockquote>
<blockquote>
<blockquote>
<blockquote>
<p><a href="https://books.google.com/books?id=eqUv3Bcd56EC&q=Bona#v=snippet&q=Bona&f=false">Jerry Bona</a></p>
</blockquote>
</blockquote>
</blockquote>
<p>These equivalences are a recurring theme in discussions of the axiom of choice. Another non-obviously equivalent statement is the Well-Ordering Principle, which says we can put any set \(X\) in a <a href="https://en.wikipedia.org/wiki/Well-order">definite order</a>, so that any subset has a “first” element. This is much stranger than it probably sounds. For instance, it’s really easy to put the real numbers in order, but most subsets won’t have a first element. (What’s the smallest real number? What’s the smallest positive real number? What’s the smallest number greater than 3?)</p>
<p>The fact that the usual order on the real numbers is <em>not</em> a well-ordering is a traditional source of internet math flame wars. There have been many <a href="https://forums.whirlpool.net.au/thread/9nxvlq19">forum threads</a> and <a href="https://polymathematics.typepad.com/polymath/2006/06/no_im_sorry_it_.html">blog comment threads</a> arguing endlessly about whether the infinitely repeating decimal \(.\bar{9}\) is actually equal to \(1\). (Yes, it is.)</p>
<p>Skeptics often suggest that maybe \(.\bar{9}\) isn’t <em>quite</em> \(1\), but just very close. Maybe it’s the last number before \(1\), the biggest number smaller than \(1\). But with the normal order for the reals, no such number exists. The reals are not well-ordered.</p>
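<p>For what it’s worth, the standard argument that the skeptics are wrong fits in a few lines:</p>

```latex
\begin{align*}
x &= 0.\bar{9} \\
10x &= 9.\bar{9} = 9 + 0.\bar{9} = 9 + x \\
9x &= 9 \\
x &= 1.
\end{align*}
```

<p>More carefully, \(0.\bar{9}\) is defined as the limit \(\sum_{k=1}^{\infty} 9 \cdot 10^{-k}\), and that geometric series sums to exactly \(1\), not to some number just before \(1\).</p>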
<p>But with the axiom of choice, we can make up some <em>other</em> order for the real numbers, where every set has a first number. In fact, for any set, we can look at all the subsets and choose a first element for each one. We need to make sure that we do this consistently, but if we’re careful that’s not a problem, and so we can create a well-ordering on any set.</p>
<p>So what happens if we do this to the real numbers? There’s no real way to describe it—which is exactly why it requires the axiom of choice! You can make your favorite list of numbers and “choose” those to be first; the real difficulty is the need to make infinitely many choices. The axiom of choice lets us do this, but only in a totally non-explicit way that we can’t describe concretely.</p>
<h5 id="the-banach-tarski-paradox">The Banach-Tarski “paradox”</h5>
<p style="text-align: center;"><a href="https://xkcd.com/804"><img src="/assets/blog/aoc/xkcd_pumpkin_carving_edit.png" alt="xkcd 804: Pumpkin Carving. &quot;I carved and carved, and the next thing I knew I had _two_ pumpkins.&quot; &quot;I _told_ you not to take the axiom of choice.&quot;" /></a></p>
<p>But the most famous consequence of the axiom of choice, which probably deserves its own post, is the <a href="https://en.wikipedia.org/wiki/Banach%E2%80%93Tarski_paradox">Banach-Tarski paradox</a>. Banach-Tarski says that if we have a solid three-dimensional ball, we can split it into five non-overlapping sets, rearrange these sets without any stretching or bending, and finish with two balls, each identical to the original ball.<strong title="The more general result is: given any two three-dimensional objects A and B, we can partition A into a finite collection of sets, and then rearrange those sets to get precisely B. In the special case people usually quote, A is 'a ball' and B is 'two balls'."><sup id="fnref:9"><a href="#fn:9" class="footnote">9</a></sup></strong></p>
<p>That means we’ve doubled the volume of our stuff just by moving the pieces around, which seems, um, implausible. We definitely can’t do that with a real ball. But with the axiom of choice, we can define “pieces” of the ball that are so strange that they don’t really have sizes at all. If we put them together one way, we get one volume; if we put them together a different way, we get a different volume. But the components don’t have a well-defined volume, so this is logically consistent. (And thus not actually a paradox, despite the name!)</p>
<h5 id="a-bunch-of-other-things">A bunch of other things</h5>
<p>There’s a <a href="https://en.wikipedia.org/wiki/Axiom_of_choice#Equivalents">long list of statements</a> that are equivalent to the axiom of choice. They show up in fields all over math, and algebra, analysis, and topology all become much simpler if these things are true:</p>
<ul>
<li>Every vector space has a basis</li>
<li>A product of non-empty sets is non-empty</li>
<li>Every set can be made into a group</li>
<li>The product of compact topological spaces is compact</li>
<li><a name="tarski">Tarski’s theorem:</a> If \(A\) is an infinite set, there’s a bijection between \(A\) and \( A \times A \)</li>
</ul>
<p>Since these are all equivalences, we can prove the axiom of choice from any one of them. If you believe <em>any</em> of these statements, you’re stuck believing all of them—and the axiom of choice as well, with all its bizarre ball-cloning hat-identifying implications.</p>
<h3 id="sois-it-true">So…is it true?</h3>
<blockquote>
<blockquote>
<p>Tarski…tried to publish his theorem (<a href="#tarski">stated above</a>) in the <em>Comptes Rendus Acad. Sci. Paris</em> but Fréchet and Lebesgue refused to present it. Fréchet wrote that an implication between two well known propositions is not a new result. Lebesgue wrote that an implication between two false propositions is of no interest. And Tarski said that after this misadventure he never tried to publish in the <em>Comptes Rendus</em>.</p>
</blockquote>
</blockquote>
<blockquote>
<blockquote>
<blockquote>
<p>Jan Mycielski, <a href="http://www.ams.org/notices/200602/fea-mycielski.pdf"><em>A System of Axioms of Set Theory for the Rationalists</em></a></p>
</blockquote>
</blockquote>
</blockquote>
<p>The big question is: <em>should</em> we believe any of these statements?</p>
<p>That might be a surprising question. Isn’t the whole point of math to have definitive, objectively correct answers? Either we can prove a result is true, or we can’t. We don’t generally ask whether we feel like believing a theorem. We proved it; we’re stuck with it.</p>
<p>But <em>axioms</em> are a little different. We need to decide on our axioms before we can prove things at all—or even decide what counts as a proof. Just like we can’t use a recipe to decide whether we want to make a cake or a cheeseburger, we can’t prove that an axiom is “correct”.</p>
<p>What we can do is look at a cake recipe, see what we’d have to do, and decide that maybe we don’t feel like making a cake after all. And we can look at what an axiom allows us to prove, and decide that maybe we don’t like those results and should pick some different axioms that don’t allow them.</p>
<h5 id="the-zermelo-fraenkel-axioms">The Zermelo-Fraenkel Axioms</h5>
<p>The standard system of axioms we use in math is called <a href="https://en.wikipedia.org/wiki/Zermelo%E2%80%93Fraenkel_set_theory">Zermelo-Fraenkel Set Theory</a>, or just ZF. These are the rules we use as the base for all our work. If we can use them to prove a statement, we just say it’s proven; if a statement contradicts the ZF axioms, we’ve disproven it.</p>
<p style="text-align: center;"><img src="/assets/blog/aoc/set-theory-is-enough-theory-already.jpg" alt="Grumpy Cat says: Set Theory / is enough theory already" /></p>
<p>If the axiom of choice contradicted ZF, then we could forget about it and move on with our lives. But in 1938 Kurt Gödel proved that this isn’t the case: you can have fully consistent systems that respect both the ZF axioms and the axiom of choice.</p>
<p>Similarly, if we could prove the axiom of choice from the ZF axioms, we would have to either accept it as true, or completely rework all the foundations of math<strong title="We've actually done that before. At the beginning of the 20th century, Bertrand Russell and others found deep contradictions in the naive version of set theory in use at the time, and the ZF axioms were developed to avoid those problems. But we'd rather avoid doing it again."><sup id="fnref:10"><a href="#fn:10" class="footnote">10</a></sup></strong>. But we can’t do that either. And this is more than just acknowledging that we haven’t proved it <em>yet</em>: in 1963 Paul Cohen invented a technique called forcing to prove that if ZF is consistent, then we can never prove the axiom of choice from the rest of the ZF axioms.</p>
<p>This combination of results feels a little weird, because it’s so different from the way we usually approach math. Math has a reputation for black-and-white thinking<strong title="I don't like this reputation in any context. Mathematical thinking creates tons of space for nuance and subtlety and shades of gray. But that's probably a different essay."><sup id="fnref:11"><a href="#fn:11" class="footnote">11</a></sup></strong>: there’s a right answer to every question, and other answers are wrong. But here I’m telling you that there is no right answer. We can accept or reject the axiom of choice, and it works equally well either way.</p>
<h5 id="independence-is-normal">Independence is normal</h5>
<p>But this is actually perfectly normal! Suppose I asked you “are triangles isosceles?” The right answer isn’t “yes” <em>or</em> “no”: it depends on the triangle. And there are some theorems we can prove about isosceles triangles, like “if a triangle is isosceles, it has two equal angles”. And there are different theorems we can prove about non-isosceles triangles. The “axiom of isosceles-ness” is independent from the definition of a triangle.</p>
<p>But that might sound a little glib; no one talks about triangles like that. A better example is Euclidean geometry. When Euclid gave his formalization of geometry in <em>Elements</em>, he began with <a href="https://en.wikipedia.org/wiki/Euclidean_geometry#Axioms">five axioms</a> (or “postulates”, as you might have called them in high school geometry). The fifth (and final) postulate, called the <a href="https://en.wikipedia.org/wiki/Parallel_postulate">parallel postulate</a>, proved to be rather awkward.</p>
<blockquote>
<p><strong><a href="https://en.wikipedia.org/wiki/Parallel_postulate">Parallel postulate</a>:</strong> There is at most one line that can be drawn parallel to another given one through an external point.<strong title="This version is more precisely known as Playfair's axiom. Euclid's phrasing (translated from Greek) was 'if a straight line falling on two straight lines make the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which the angles are less than two right angles.' But Playfair's axiom is much simpler to state, and the two statements are equivalent."><sup id="fnref:12"><a href="#fn:12" class="footnote">12</a></sup></strong></p>
</blockquote>
<p>This axiom is extremely important to geometry, but is much more complex and less self-evident than the other four axioms, which are statements like “all right angles are equal” and “we can draw a line connecting any two points”. Two millennia of mathematicians tried to remove this awkward complexity by proving the parallel postulate just from Euclid’s other axioms.</p>
<p>Then in the 1800s, we finally solved this problem—in the other direction. Euclidean geometry, including the parallel postulate, is completely consistent; but it’s also consistent to work with <em>non</em>-Euclidean geometries, in which the parallel postulate is false. Mathematicians constructed <a href="https://en.wikipedia.org/wiki/Non-Euclidean_geometry#Models_of_non-Euclidean_geometry">models</a> of elliptic geometry, in which there are no parallel lines, and of hyperbolic geometry, in which parallel lines are not unique.</p>
<p>What is a model? It’s just something that obeys all the axioms. So the work we do in high school, with pencil and paper on a flat surface, is a model of Euclidean geometry. It follows all five axioms, and any theorem that follows from the Euclidean axioms will be true of our pencil-and-paper work.</p>
<p>But if we work on the surface of a sphere, we get a model of non-Euclidean elliptic geometry. We can define a line to be a <a href="https://en.wikipedia.org/wiki/Great_circle">great circle</a>, a circle that goes fully around a sphere the long way. Any two points lie on exactly one great circle, so these “lines” obey Euclid’s first four axioms. But with a little bit of playing around, you can see that any pair of great circles will intersect in two points. This model doesn’t have any parallel lines at all.</p>
<p style="text-align: center;"><img src="/assets/blog/aoc/Grosskreis.svg" alt="Image of a sphere, with great circles marked." /></p>
<p style="text-align: center"><em>The solid curves are great circles. The solid blue curve is the equator.</em> <br />
<em>The dashed curves aren’t great circles, so they don’t count as lines.</em> <br />
<em>Adapted from <a href="https://commons.wikimedia.org/wiki/File:Grosskreis.svg">Wikimedia Commons</a></em></p>
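You can check the “any two great circles meet in two points” claim numerically: represent each great circle by the unit normal vector of its plane (every great circle is the sphere’s intersection with a plane through the origin), and the cross product of the two normals points at the pair of antipodal intersection points. A minimal sketch (my own toy check, with two arbitrarily chosen circles):

```python
import numpy as np

# A great circle is the unit sphere's intersection with a plane through
# the origin; identify each circle with its plane's unit normal vector.
n1 = np.array([0.0, 0.0, 1.0])               # the equator
n2 = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)  # a tilted great circle

# The two planes meet in the line along n1 x n2; that line pierces the
# sphere at two antipodal points, which lie on both great circles.
p = np.cross(n1, n2)
p = p / np.linalg.norm(p)

for point in (p, -p):
    assert abs(np.dot(point, n1)) < 1e-12          # on the first plane
    assert abs(np.dot(point, n2)) < 1e-12          # on the second plane
    assert abs(np.linalg.norm(point) - 1) < 1e-12  # on the sphere
```

The only way this fails is if the two normals are parallel, i.e. the two “lines” are actually the same great circle, which is exactly why the model has no parallel lines.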
<p>We can also build <a href="https://en.wikipedia.org/wiki/Poincar%C3%A9_disk_model">models of hyperbolic geometries</a>, but they’re a little harder to describe. But just one of these models is enough to know that we can’t prove the parallel postulate from Euclid’s other axioms—at least, not unless the other axioms are themselves contradictory. Nor can we disprove it. We have to <em>decide</em> if we want to use the parallel postulate.</p>
<p>This is exactly what Gödel and Cohen did for the axiom of choice. Gödel constructed a model of ZF set theory with choice; Cohen constructed a model of ZF set theory without choice. So we have to decide if we want to use the axiom of choice. And this brings us back to the same question: what are we trying to describe? Is the world we want to study a model of ZF with choice, or without?</p>
<h5 id="how-do-we-choose">How do we choose?</h5>
<p>To decide if we should adopt an axiom, we need to know what our goals are, and what we’re trying to describe. Euclidean geometry is good for arranging furniture in my room, but it’s bad for planning long-range flights, for which the fact that we live on a sphere matters.</p>
<p style="text-align: center;"><img src="/assets/blog/aoc/great_circle_routes.png" alt="A diagram of a great circle flight path. First on a rectangular/planar projection, where it doesn't look like a straight line; then on a sphere, where it does." /></p>
<p style="text-align: center"><em>Plane flight paths don’t look like straight lines on a flat map.</em> <br />
<em>On a sphere we see they really are the shortest, “straightest” path.</em> <br />
<em>Adapted from <a href="https://commons.wikimedia.org/wiki/File:Different_map_projections.png">Wikimedia Commons</a> CC-BY-SA-3.0</em></p>
<p>We should ask the same question about the axiom of choice: what are we trying to describe? Does the axiom of choice bring us closer to describing the world accurately, or farther away? Is the world we want to study a model of ZF with choice, or without?</p>
<p>The obvious answer is that the axiom of choice has absurd and unrealistic results. In the real world we can’t slice up one billiard ball and assemble the pieces into two billiard balls, or save infinitely many people in the hat puzzle. So if the axiom of choice says we can, it must not be describing the real world.</p>
<p>But this argument isn’t terribly persuasive, because every single thing about the uncountable hat puzzle is physically absurd. Even the setup is ridiculous: we can’t have an infinite line of people, and if we were somehow put in an infinite line, we wouldn’t be able to see all the people in it, let alone the numbers on their hats.</p>
<p>The step where we use the axiom of choice is even more unrealistic. We take the uncountably infinite set of real sequences; we partition it into an uncountably infinite collection of infinite sets of sequences; and then we ask everyone to memorize an (infinite!) sequence from each of these infinitely many infinite sets.</p>
<p>I’d have a hard time remembering one list of a hundred numbers. Memorizing a thousand lists of a thousand numbers is extremely unlikely; memorizing infinitely many lists of infinitely many numbers is flatly impossible. And that’s before we ask how we can communicate the lists we’ve chosen to each other, so that each of the (infinitely many) people memorize the <em>same</em> infinite collection of infinite lists.</p>
<p>The Banach-Tarski argument isn’t any better. It splits the ball into only five pieces, sure, but each of those pieces is infinitely complex, enough so that you can’t concretely describe their shapes, let alone actually cut a ball into those pieces. The informal explanation that “you can slice a ball into five pieces and reassemble those pieces into two balls” is not true, because there’s no real way to produce the pieces you need.<strong title="Feynman has a story about this in his memoir. A math grad student described the Banach-Tarski paradox to him, and he bet that it was made up, rather than a real theorem. He was able to wriggle out of losing by pointing out that the grad student had described cutting up an _orange_, and you can't slice a physical object made up of atoms infinitely finely."><sup id="fnref:13"><a href="#fn:13" class="footnote">13</a></sup></strong></p>
<p>In the real world we <em>never see infinite sets</em>. We pretend some sets are infinite because it makes our lives easier. But any principle that <em>only</em> kicks in at infinity will never make contact with reality.</p>
<p style="text-align: center"><img src="/assets/blog/aoc/einstein_stupidity.jpeg" alt="Picture of Einstein: Two things are infinite: the universe and human stupidity; and I'm not sure about the universe." height="50%" width="50%" /></p>
<p style="text-align: center"><em>Einstein <a href="https://quoteinvestigator.com/2010/05/04/universe-einstein">probably didn’t say this</a>, but it’s a good line.</em></p>
<h3 id="not-as-crazy-as-it-seems">Not as crazy as it seems</h3>
<p>This might feel like it’s dodging the question, though. If infinity is fake, why should we use axioms that only matter for infinity? And if we are going to say things about infinity, shouldn’t they make sense?</p>
<p>Maybe it’s fine for a physicist to dismiss mathematical abstractions as unphysical and thus irrelevant. But math is about reasoning through the consequences of abstract hypotheticals! If we’re going to adopt a foundational principle like the axiom of choice, we should really mean that we believe it in every abstract hypothetical situation we’re going to apply it in.</p>
<p>But after we realize how infinity works, our absurd results look somewhat more reasonable.<strong title="This is a common mathematical rhetorical trick. Earlier I was trying to convince you that the implications of the axiom of choice were really weird. Now I'm going to try to convince you that they're perfectly reasonable. This exact two-step happens quite a lot in math exposition. I suspect this is due partially to the demands of pedagogy, and partly to the way we form our mathematical intuition."><sup id="fnref:14"><a href="#fn:14" class="footnote">14</a></sup></strong> Our “successful” strategy in the infinite hat game actually doesn’t give us all that much. Sure, only finitely many people lose; some person in the line will be the last to answer wrong. But what would this look like in practice?</p>
<p>You could imagine the first hundred people all getting the question wrong. But that’s okay; only finitely many people will get it wrong. Then the first thousand people all get it wrong. But we know that at some point a last person will get it wrong and everyone left will get it right. A million people all get it wrong. Everyone gets bored. The game show host decides to leave. And sure enough, only finitely many people ever answered the question wrong!</p>
<p>The axiom of choice argument somehow doesn’t do anything after a finite number of answers. You could have the first million, or the first trillion, people all get the question wrong, and that wouldn’t contradict our proof. All the weirdness happens out at infinity—and we already know that infinity is deeply weird.</p>
<h3 id="whats-the-point">What’s the point?</h3>
<p>The axiom of choice is logically independent of our axioms for set theory, so we can’t ever prove it true or false. And it says deeply strange things about deeply strange situations that can never really happen. So why does it matter?</p>
<h5 id="infinity-is-fake-but-useful">Infinity is fake <em>but useful</em></h5>
<p>The answer is the same as the reason we use infinity at all. Everything we’ve ever seen is finite and discrete: objects are made out of atoms, and even if space and time aren’t truly quantized, our ability to measure them definitely is. But it’s extremely convenient to pretend that reality is continuous, which allows us to solve problems with calculus and other clever math tricks. If the world is “close enough” to being continuous, our answers will be good enough for whatever we’re doing.</p>
<p>Any infinity we care about will come from a limit of finite things. I can measure the width of my office in meters, or centimeters, or millimeters. With the right equipment I could measure it in micrometers or nanometers. I can’t ever measure it with infinite precision, but I can <em>imagine</em> doing that. And it’s really convenient to say the width is a real number, rather than to insist that it must <em>really</em> be some integer number of picometers.</p>
<p>This exact reasoning is basically how all of calculus works. If I want to know how fast my car is going in miles per hour, I can measure the distance it travels in miles over the course of an hour. Or I can see how many miles it goes in a minute, and multiply by sixty. I could measure the number of miles it goes in a second, and multiply by 3600 (or more realistically, measure the number of <em>feet</em> it goes in a second, and multiply by 3600/5280).</p>
<p>But what is the speed “right now”? We imagine taking measurements over these shorter and shorter intervals; in the limit, when our interval is “infinitely short”, we get the instantaneous velocity. And that’s a derivative, which is an extremely powerful tool for doing math and physics.</p>
<p>But we can’t <em>actually</em> measure the distance traveled in an infinitely small window of time. (Nor can we measure the infinitely small time itself.) We’re taking some real, physical, finite measurements. We can measure how far a car goes in one second, multiply by 3600/5280, and then display that number on the dashboard. But the infinite version is something we only imagine.</p>
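That shrinking-interval picture is easy to see numerically. A minimal sketch, using an arbitrary position function s(t) = t² (my example, not one from the post): each average speed is a real, finite measurement, and the averages home in on the idealized “infinitely short interval” value.

```python
# Average speed over shrinking finite intervals approaches the
# instantaneous speed (the derivative). The position function is an
# arbitrary toy example: s(t) = t**2, so s'(3) = 6.
def s(t):
    return t ** 2

t = 3.0
for dt in (1.0, 0.1, 0.001, 1e-6):
    avg_speed = (s(t + dt) - s(t)) / dt  # a real, finite measurement
    print(f"dt = {dt:<8} average speed = {avg_speed:.6f}")

# The averages approach 6, the derivative s'(3) -- the value we can
# imagine but never directly measure.
```

Every number the loop prints comes from a finite difference; the derivative itself only appears as the limit we imagine the sequence heading toward.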
<h3 id="just-relax">Just relax</h3>
<p>If we’re trying to model the world, any infinite set we have to deal with will be a limit of finite sets. And any infinite family of infinite sets will be a limit of finite families of finite sets. And we know we have choice for finite sets of finite sets. So we can always get choice for these specific infinite sets, if we really need it—just by taking the limit of the elements we chose from our finite families.</p>
<p>What the axiom of choice says is: don’t worry about it. You don’t have to explain <em>how</em> your family of sets came from a finite family. You don’t have to explain <em>how</em> you’re choosing elements. We’ll just assume you can make it work somehow.</p>
<p>That’s what axioms are for. They tell us what we want to just assume we can do, without really explaining how. Our axioms are a list of things we don’t want to have to think about. And in practice, we don’t have to think about whether we can make choices. Any time it really matters, we can.</p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p>We can be more formal by phrasing this in terms of <em>choice functions</em>: given a collection of sets \(\mathcal{X} = \{A\}\) there is a function \(f : \mathcal{X} \to \bigcup_{A \in \mathcal{X}} A\) such that \(f(A) \in A \) for each \(A \in \mathcal{X} \). But I want to keep the discussion as readable as possible if you’re not comfortable with the language of formal set theory. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>Using this sort of process on an infinite set is called <a href="https://en.wikipedia.org/wiki/Transfinite_induction">transfinite induction</a>. <del>If we allow transfinite induction then we get the axiom of choice for free. But the axiom of choice also implies that we can do transfinite induction; the two concepts are logically equivalent.</del> Transfinite induction can sometimes allow us to make choices without the axiom, but only if we can put our sets in some order. Conversely, the axiom of choice allows us to <a href="https://en.wikipedia.org/wiki/Transfinite_induction#Relationship_to_the_axiom_of_choice">use transfinite induction in cases we otherwise couldn’t</a>.</p>
<p>Thanks to Sniffnoy for a helpful correction here. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>The set of real numbers doesn’t have a smallest element or a largest element. Nor does the set of positive real numbers, or the set of numbers between zero and one. So if we have a collection of sets of real numbers, the rule we used for sets of positive integers doesn’t work. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>This example was originally offered by Bertrand Russell. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>The <em>classic</em> version of the puzzle features a sadistic prison warden. While that setup is traditional, it seems unnecessarily violent, so I’ve replaced it with something friendlier. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>I think I first heard about this version from <a href="https://cornellmath.wordpress.com/2007/09/13/the-axiom-of-choice-is-wrong/">Greg Muller</a>. <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
<li id="fn:7">
<p>If you don’t know what a sequence is, just think of this as an infinite list. <a href="#fnref:7" class="reversefootnote">↩</a></p>
</li>
<li id="fn:8">
<p>Sometimes there can be <em>more than one</em> largest element, which is a little weird. But since some pairs of elements can’t be compared, you can have multiple elements that don’t have anything above them. Imagine a company with two presidents: each of them is the highest-ranking person at the company. And that’s why we say “a” largest element rather than “the” largest. <a href="#fnref:8" class="reversefootnote">↩</a></p>
</li>
<li id="fn:9">
<p>The more general result is: given any two three-dimensional objects \(A\) and \(B\), we can partition \(A\) into a finite collection of sets, and then rearrange those sets to get precisely \(B\). In the special case people usually quote, \(A\) is “a ball” and \(B\) is “two balls”. <a href="#fnref:9" class="reversefootnote">↩</a></p>
</li>
<li id="fn:10">
<p>We’ve actually done that before. At the beginning of the 20th century, Bertrand Russell and others found deep contradictions in the naive version of set theory in use at the time, and the ZF axioms were developed to avoid those problems. But we’d rather avoid doing it again. <a href="#fnref:10" class="reversefootnote">↩</a></p>
</li>
<li id="fn:11">
<p>I don’t like this reputation in any context. Mathematical thinking creates tons of space for nuance and subtlety and shades of grey. But that’s probably a different essay. <a href="#fnref:11" class="reversefootnote">↩</a></p>
</li>
<li id="fn:12">
<p>This version is more precisely known as <a href="https://en.wikipedia.org/wiki/Playfair's_axiom">Playfair’s axiom</a>. Euclid’s phrasing (translated from Greek) was “if a straight line falling on two straight lines make the interior angles on the same side less than two right angles, the two straight lines, if produced indefinitely, meet on that side on which the angles are less than two right angles.” But Playfair’s axiom is much simpler to state, and the two statements are equivalent. <a href="#fnref:12" class="reversefootnote">↩</a></p>
</li>
<li id="fn:13">
<p>Feynman has a story about this in <a href="https://en.wikipedia.org/wiki/Surely_You're_Joking,_Mr._Feynman!">his memoir</a>. A math grad student described the Banach-Tarski paradox to him, and he bet that it was made up, rather than a real theorem. He was able to wriggle out of losing by pointing out that the grad student had described cutting up an <em>orange</em>, and you can’t slice a physical object made up of atoms infinitely finely. <a href="#fnref:13" class="reversefootnote">↩</a></p>
</li>
<li id="fn:14">
<p>This is a common mathematical rhetorical trick. Earlier I was trying to convince you that the implications of the axiom of choice were really weird. Now I’m going to try to convince you that they’re perfectly reasonable. This exact two-step happens quite a lot in math exposition. I suspect this is due partially to the demands of pedagogy, and partly to the way we form our mathematical intuition. <a href="#fnref:14" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Jay DaigleOne of the easiest ways to start a (friendly) fight in a group of mathematicians is to bring up the axiom of choice. I'll explain what it is, why it's so controversial, and hopefully shed some light on how we choose axiomatic systems and what that means for the math we do.Lockdown Recipes&colon; Red Beans and Rice2020-05-25T00:00:00-07:002020-05-25T00:00:00-07:00https://jaydaigle.net/blog/lockdown-recipes-red-beans<p>Since we’re all stuck at home and cooking more than usual, I wanted to share one of my favorite recipes from my childhood, which is also especially suited to our current stuck-at-home ways.<strong title="Yeah, it would have made even more sense to post this two months ago. But two months ago I was trying to figure out how to teach three math classes over the internet instead of recipeblogging."><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></strong></p>
<p><img src="/assets/blog/recipes/red_beans_and_rice.jpg" alt="A bowl of red beans and rice" style="width:250px; float:right" /></p>
<p><a href="https://en.wikipedia.org/wiki/Red_beans_and_rice">Red Beans and Rice</a> is a traditional Louisiana Creole dish. It’s cheap and extremely easy and low effort to make. The one major downside is that it takes several hours of simmering (that don’t require any attention); in normal times that’s a major disadvantage, but if you’re working from home that’s not a problem at all.</p>
<p>In fact, this dish was originally a solution to a working-from-home dilemma that Louisiana cooks faced. Monday was laundry day, and the women of the house were so busy doing the wash that they couldn’t spend all day tending food on the stove. So this hands-off dish became a traditional Monday dinner.</p>
<p>There are a <em>lot</em> of ways you can vary this dish. I’ll give two straightforward recipes: one for the traditional stovetop method, and a faster pressure-cooker method that I use during busier times that takes less time, both in prep and in waiting. But I also want to talk about what some of the steps are doing, and how you can change things up to get different flavor profiles if you want.</p>
<h3 id="ingredients">Ingredients</h3>
<h5 id="aromatics">Aromatics</h5>
<ul>
<li>1/2 stick butter</li>
<li>1 chopped onion</li>
<li>4-5 ribs celery</li>
<li>1-2 chopped bell peppers</li>
<li>3 cloves garlic, finely chopped</li>
<li>Tablespoon chopped parsley</li>
<li>Teaspoon chopped thyme</li>
</ul>
<h5 id="body">Body</h5>
<ul>
<li>1 pound dried red kidney beans</li>
<li>1-2 pounds smoked or andouille sausage, sliced into bite-size pieces</li>
<li>6 oz tomato paste (one small can)</li>
</ul>
<h5 id="seasoning">Seasoning</h5>
<ul>
<li>2 bay leaves</li>
<li>1/4 cup brown sugar</li>
<li>1 tablespoon mustard</li>
<li>1 teaspoon paprika</li>
<li>Salt and cayenne pepper to taste</li>
</ul>
<h3 id="traditional-red-beans">Traditional red beans</h3>
<ol>
<li>In a large (at least two gallon) pot, melt the butter over medium heat. Sweat the onions, celery, and bell peppers for 5-10 minutes, until soft and the onions are translucent.</li>
<li>Add garlic, parsley, and thyme and sauté for a couple minutes more, until soft.</li>
<li>Rinse kidney beans and add them to the pot. Add water (or stock) until the beans are covered by an inch or two, and heat to a high simmer. Cover pot and leave to simmer.</li>
<li>After a half hour or so, add meat and tomato paste, and stir to combine. Return to a simmer and cover.</li>
<li>After another hour, add seasonings. Return to a simmer and cover again.</li>
<li>Once every hour or so, check on the pot. Top it off with extra liquid if it’s starting to run low, and scrape the bottom a bit to make sure nothing is sticking to the bottom.</li>
<li>After six to eight hours, the beans should be basically disintegrated: you’ll see the shells floating in the liquid, but the insides of the beans will have dissolved into the liquid base and formed a rich, thick paste. At this point you might want to taste it and adjust seasonings to your preference.</li>
<li>Serve over rice.</li>
</ol>
<h3 id="pressure-cooker-red-beans">Pressure cooker red beans</h3>
<p>Rinse the red beans. Then dump all the ingredients in the pressure cooker. Cook on high pressure for two hours, then simmer until consistency is good. Serve over rice.</p>
<p>(See how easy that was?)</p>
<h3 id="variations">Variations</h3>
<h5 id="aromatics-1">Aromatics</h5>
<p>Onions, celery, and bell peppers are the traditional base for New Orleans stocks and soups, known as the “<a href="https://en.wikipedia.org/wiki/Holy_trinity_(cuisine)">Holy Trinity</a>”. They serve the same role as the French <a href="https://en.wikipedia.org/wiki/Mirepoix_(cuisine)">mirepoix</a> (onions, celery, and carrots) or the Spanish <a href="https://en.wikipedia.org/wiki/Sofrito">sofrito</a> (garlic, onion, peppers, and tomatoes). If you like those other flavor profiles more, you can substitute a different aromatic base. You can also use whatever fat you like for the sautéing.</p>
<p>Some people like to brown their aromatics, while others like to gently sweat them without browning. The flavor profiles are slightly different, so take your pick.</p>
<p>If you want to speed things up a bit, you can sweat your aromatics in a separate skillet while starting the boil on the red beans. I often find this easier to manage, not needing to stir the aromatics in the giant stock pot, but it does require a second pan.</p>
<h5 id="body-1">Body</h5>
<p>The most important aspect here is the kidney beans. It is <em>very important</em> that they stay at a full boil for at least half an hour; kidney beans <a href="https://en.wikipedia.org/wiki/Kidney_bean#Toxicity">are toxic</a> and it takes a good boiling to break those toxins down.</p>
<p>A lot of people like to soak their beans overnight before cooking with them. This makes the toxins break down a bit easier, and also makes them cook faster; it probably cuts the cooking time from eight hours or so down to six. It changes the flavor in a way I don’t like, so I don’t do it. But you might prefer that flavor!</p>
<p>You can definitely substitute in other beans, but you’ll get a different texture. Kidney beans are extremely tough and starchy and give the stock a nice body when completely broken down.</p>
<p>I like the flavor effect of adding a can of tomato paste, but it’s not especially traditional. This is totally optional.</p>
<p>Because the red beans add body, this broth works just fine with plain water. But if you have stock in your kitchen it can add extra layers of flavor and body to your dish. I generally start with homemade stock, and top it off with water as the cooking continues.</p>
<h5 id="meat">Meat</h5>
<p>You can flavor this broth with nearly any meat you have. Traditionally, the cook would use the leftover bones from the Sunday roast to flavor the red bean broth on Monday. If you happen to have some chicken or pork bones left over, you can do <em>far</em> worse than adding them to the pot.</p>
<p>When I’m doing it in the pressure cooker, I often like to take a 3-4 pound bone-in pork shoulder and add that in place of the sausage. I get the broth richness from the bone, and the meat of the pork shoulder falls off into the stew nicely. I haven’t tried this in the traditional method but I’m sure it would work.</p>
<p>If you do use pre-chopped meat like sausage, you can brown it in a separate pan for extra flavor. Extra steps and an extra pan, but extra flavor; your call whether it’s worth it.</p>
<p>Andouille sausage is probably the most standard sausage choice right now. It’s spicy, so you may want something milder. It’s also a bit more expensive than I tend to want to go for this dish; the sausage can easily be more than half the cost of the entire dish. My default option is Hillshire Farms smoked sausage, but you can use whichever firm sausage you like.</p>
<p>And the dish does work fine with no meat at all, if you’d prefer a vegetarian option. Replace the butter with oil and you can make it vegan.</p>
<h5 id="seasoning-1">Seasoning</h5>
<p>This is really flexible. To be honest, I primarily season with a healthy dose of Tony Chachere’s spice mix. I also add the sugar, and either a dollop of oyster sauce or a pinch of MSG powder.</p>
<p>But there are of course lots of options here. I don’t think the mustard is super traditional, but I very much like the effect.</p>
<p>Almost any spices you like can go here. I suspect coriander would be good. Swap out the cayenne pepper for black pepper, or for Tabasco sauce (very traditional in New Orleans food). Or you could change up the flavor profile entirely and push it towards your favorite cuisine. Use an Italian spice blend, or a Mexican blend, or an Indian blend, whatever strikes your fancy. And if you find something that works really well—let me know!</p>
<p><em>Did you make this? What did you think? Do you have a favorite lockdown recipe to share? Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a> or leave a comment below.</em></p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Yeah, it would have made even more sense to post this two months ago. But two months ago I was trying to <a href="https://jaydaigle.net/blog/online-teaching-in-the-time-of-coronavirus/">figure out how to teach three math classes over the internet</a> instead of recipeblogging. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Jay DaigleSince we're all stuck at home and cooking more than usual, I wanted to share one of my favorite recipes from my childhood, which is also especially suited to our current stuck-at-home ways. Red Beans and Rice is a traditional Louisiana Creole dish. It's cheap and extremely easy and low effort to make. The one major downside is that it takes several hours of simmering (that don't require any attention); in normal times that's a major disadvantage, but if you're working from home that's not a problem at all.The SIR Model of Epidemics2020-03-27T00:00:00-07:002020-03-27T00:00:00-07:00https://jaydaigle.net/blog/the-sir-model-of-epidemics<script src="https://sagecell.sagemath.org/static/embedded_sagecell.js"></script>
<script>sagecell.makeSagecell({"inputLocation": ".sage"});</script>
<p>For <em>some</em> reason, a lot of people have gotten really interested in epidemiology lately. Myself included.</p>
<p><img src="/assets/blog/sir/coronavirus.jpg" alt="Picture of a coronavirus, by Alissa Eckert, MS and Dan Higgins, MAMS, courtesy of the CDC" class="center" style="width:350px" /></p>
<p style="text-align: center"><em>I have no idea why.</em></p>
<p>Now, I’m not an epidemiologist. I don’t study infectious diseases. But I do know a little about how mathematical models work, so I wanted to explain how one of the common, simple epidemiological models works. This model isn’t anywhere near good enough to make concrete predictions about what’s going to happen. But it <em>can</em> give some basic intuition about how epidemics progress, and provide some context for what the experts are saying.</p>
<hr />
<p><strong>Disclaimer:</strong> I don’t study epidemics, and I don’t even study differential equation models like this one. I’m basically an interested amateur. I’m going to try my best not to make any predictions, or say anything specific about COVID-19. I don’t know what’s going to happen, and you shouldn’t listen to my guesses, or the guesses of anyone else who isn’t an actual epidemiologist.</p>
<hr />
<h2 id="the-sir-model">The SIR Model</h2>
<h3 id="parameters">Parameters</h3>
<p>The SIR model divides the population into three groups, which give the model its name:</p>
<ul>
<li>$S$ is the number of <strong>S</strong>usceptible people in the population. These are people who aren’t sick yet, but could get sick in the future.</li>
<li>$I$ is the number of <strong>I</strong>nfected people. These are the people who are sick<strong title="Or people who are asymptomatic carriers. This model doesn't worry about who actually gets a fever and starts coughing, just who carries the virus and can maybe infect others."><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></strong> right now.</li>
<li>$R$ is the number of people who have <strong>R</strong>ecovered from the virus. They are immune and can’t get sick again.</li>
<li>We also will use $N$ for the total number of people. So $N = S+ I + R$.</li>
</ul>
<p><img src="/assets/blog/sir/knight.jpg" alt="Picture of a Knight, by Paul Mercuri (1860)" class="center" style="width:400px" /></p>
<p style="text-align:center"><em>Not that kind of “sir”.</em></p>
<p>For the purposes of this model, we assume that the total number of people, $N$, doesn’t change. But the number of people in each $S,I,R$ group is changing all the time: susceptible people get infected, and infected people recover. So we write $S(t)$ for the number of susceptible people “at time $t$”—which is just a fancy way of saying that $S(3)$ means the number of susceptible people on the third day.</p>
<h3 id="change-over-time">Change Over Time</h3>
<p>In order to model how these groups evolve over time, we need to know how often those two changes happen. How quickly do sick people recover? And how quickly do susceptible people get sick?</p>
<p>The first question, in this model, is simple. Each infected person has a chance of recovering each day, which we call $\gamma$. So if the average person is sick for two weeks, we have $\gamma = \frac{1}{14}$. And on each day, $\gamma I$ sick people recover from the virus.</p>
<p>The second question is a little trickier. There are basically three things that determine how likely a susceptible person is to get sick: how many people they encounter in a day, what fraction of those people are sick, and how likely a sick person is to transmit the disease. The middle factor, the fraction of people who are sick, is $\frac{I}{N}$. We could think about the other two separately, but for mathematical convenience we group them together and call them $\beta$.</p>
<p>So the chance that a given susceptible person gets sick on each day is $\beta \frac{I}{N}$.<strong title="If we're being fancy, we say that the chance of getting sick is proportional to I/N and that β is the constant of proportionality. But if you're not used to differential equations already I'm not sure that tells you very much."><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></strong> And thus the total number of people who get sick each day is $\beta \frac{I}{N} S$.</p>
<p>If these letters look scary, it might help to realize that you’ve probably spent a lot of time lately thinking about $\beta$—although you probably didn’t call it that. The parameter $\beta$ measures how likely you are to get sick. You can decrease it by reducing the number of people you encounter in a day, through “social distancing” (or <a href="https://www.washingtonpost.com/lifestyle/wellness/social-distancing-coronavirus-physical-distancing/2020/03/25/a4d4b8bc-6ecf-11ea-aa80-c2470c6b2034_story.html">physical distancing</a>). And you can decrease it by improved hygiene—better handwashing, not touching your face, and sterilizing common surfaces.</p>
<p>There’s one more number we can extract from this model, which you might have heard of. In a population with no resistance to the disease (so $I$ and $R$ are both small, and we can pretend that $S=N$), a sick person will infect $\beta$ people each day, and will be sick for $\frac{1}{\gamma}$ days, and so will infect a total of $\frac{\beta}{\gamma}$ people. We call this ratio $R_0$; you may have seen in the news that the $R_0$ for COVID-19 is probably about $2.5$.</p>
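<p>To make that arithmetic concrete, here’s a tiny computation of $R_0$ from $\beta$ and $\gamma$. The numbers are assumptions for illustration only, not estimates for any real disease.</p>

```python
# Illustrative parameters only -- not estimates for any real disease.
gamma = 1 / 14      # daily recovery rate: average illness lasts 14 days
beta = 0.2          # average transmissions per infected person per day
R0 = beta / gamma   # total people one sick person infects, on average
print(round(R0, 1))  # 2.8
```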
<p><img src="/assets/blog/sir/file-20200128-120039-bogv2t.png" alt="A graph demonstrating exponential growth when R0 = 2" class="center" style="width:377px;" /></p>
<p style="text-align: center;"><em>When $\beta$ is twice as big as $\gamma$, things can get bad very quickly. From <a href="https://theconversation.com/r0-how-scientists-quantify-the-intensity-of-an-outbreak-like-coronavirus-and-predict-the-pandemics-spread-130777">The Conversation</a>, licensed under <a href="http://creativecommons.org/licenses/by-nd/4.0/">CC BY-ND</a></em></p>
<h3 id="assumptions-and-limitations">Assumptions and Limitations</h3>
<p>Like all models, this is a dramatic oversimplification of the real world. Simplification is good, because it means we can actually understand what the model says, and use that to improve our intuitions. But we do need to stay aware of some of the things we’re leaving out, and think about whether they matter.</p>
<p><strong>First</strong>: the model assumes a static population: no one is born and no one dies. This is obviously <em>wrong</em> but it shouldn’t matter too much over the months-long timescale that we’re thinking about here. On the other hand, if you want to model years of disease progression, then you might need to include terms for new susceptible people being born, and for people from all three groups dying.</p>
<p><strong>Second</strong>: the model assumes that recovery gives permanent immunity. Everyone who’s infected will eventually transition to recovered, and recovered people never lose their immunity and become susceptible again. I don’t think we know yet how many people develop immunity after getting COVID-19, or how long that immunity lasts.</p>
<p>But it seems basically reasonable to assume that most people will get immunity for at least several months; in this model we’re simplifying that to assume “all” of them do. And since we’re only trying to model the next several months, it doesn’t matter for our purposes whether immunity will last for one year or ten.</p>
<p><strong>Third</strong>: we assumed that $\beta$ and $\gamma$ are constants, and not changing over time. But a lot of the response to the coronavirus has been designed to decrease $\beta$—and the extent of those changes may vary over time. People will be more or less careful as they get more or less worried, as the disease gets worse or better. And people might just get restless from staying home all the time and start being sloppier. An improved testing regime might also decrease $\beta$, and better treatments could improve $\gamma$.</p>
<p>But the model leaves $\beta$ and $\gamma$ the same at all times. So we can imagine it as describing what would happen if we didn’t change our lifestyle or do anything in response to the virus.</p>
<p><strong>Finally</strong>: the first two assumptions, combined, mean that the susceptible population can only decrease, and the recovered population can only increase. Since we also hold $\beta$ and $\gamma$ constant, this model of the pandemic will only have one peak. It will never predict periodic or seasonal resurgences of infection, like we see with the flu.</p>
<p><img src="/assets/blog/sir/CDC-influenza-pneumonia-deaths-2015-01-10.gif" alt="graph of flu deaths, 2010 - 2014" class="center" /></p>
<p style="text-align: center;"><em>A graph of flu deaths per week, peaking each winter, from the CDC. The vanilla SIR model will never produce this sort of periodic seasonal pattern.</em></p>
<p><img src="https://miro.medium.com/max/2000/1*ok3NLISRGvK-4SQyDA5KTg.png" alt="stylized graph of possible COVID-19 trajectories" class="center" style="width:500px;" /></p>
<p style="text-align: center;"><em>This green curve imagines a “dance” where we suppress coronavirus infections through an aggressive quarantine, and then spend months alternately relaxing the quarantine until infections get too high, and then tightening it again until infections fall back down. The SIR model doesn’t allow this sort of dynamic variation of $\beta$ and can never produce the green curve.</em></p>
<h3 id="the-whole-system">The Whole System</h3>
<p>If we put all this together we get a <em>system of ordinary nonlinear differential equations</em>. A differential equation is an equation that talks about how quickly something changes; in these equations, we have the rates at which the number of susceptible, infected, and recovered people change. “Ordinary” means that there’s only one input variable: $S$, $I$, and $R$ all change with time, but we’re not taking location as an input or anything. “Nonlinear” means that our equations aren’t in a specific “linear” form that’s really easy to work with.</p>
<p><img src="/assets/blog/sir/13974391215433.jpg" alt="Photo of a Kitten" class="center" style="width:479px" /></p>
<p style="text-align: center"><em>Calling these equations a “nonlinear system” is a lot like calling this kitten a “nondog animal”. It’s not wrong, but it’s kind of weirdly specific if you’re not at a dog show.</em></p>
<p>If you took calculus, you might remember that we often write $\frac{dS}{dt}$ to mean the rate at which $S$ is changing over time. Roughly speaking, it’s the change in the total number of susceptible people over the course of a day. We know that $S$ is decreasing, since susceptible people get sick but we’re assuming that people don’t <em>become</em> susceptible, so $\frac{dS}{dt}$ is negative. And specifically, we worked out that $\frac{dS}{dt}$ is $-\beta \frac{IS}{N}$, since that’s the number of people who get sick each day.</p>
<p>Similarly, we saw that $\frac{dR}{dt}$ is $\gamma I$, the number of people who recover each day. And $\frac{dI}{dt}$ is the number of people who get sick minus the number who recover. All together this gives us:</p>
<p>\begin{align}
\frac{dS}{dt} &= - \beta \frac{IS}{N} \\
\frac{dI}{dt} &= \beta \frac{IS}{N} - \gamma I \\
\frac{dR}{dt} &= \gamma I
\end{align}</p>
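<p>One sanity check on these equations: the three right-hand sides sum to zero, which is exactly the statement that the total population $N = S + I + R$ stays constant. A quick sketch, with arbitrary illustrative numbers:</p>

```python
def sir_rhs(S, I, R, beta, gamma, N):
    """The right-hand sides of the three SIR equations."""
    dS = -beta * I * S / N
    dI = beta * I * S / N - gamma * I
    dR = gamma * I
    return dS, dI, dR

# The terms cancel in pairs, so the total population is conserved.
dS, dI, dR = sir_rhs(S=900, I=100, R=0, beta=0.2, gamma=0.07, N=1000)
print(abs(dS + dI + dR) < 1e-9)  # True
```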
<hr />
<h2 id="what-did-we-learn">What Did We Learn?</h2>
<p>Now that we have this model, what’s the point? We can actually do a few different things with a model like this. If we want, we can write down an <a href="https://arxiv.org/abs/1403.2160">exact formula</a> that tells us how many people will be sick on each day. Unfortunately, the exact formula isn’t actually all that helpful. The paper I linked includes lovely equations like</p>
<script type="math/tex; mode=display">z(\psi )= e^{-\mu\int_1^{\psi } \frac{ e^{\Psi (\xi )}}{\xi } \, d\xi } \left[\int_1^{\psi } e^{\Psi (\chi )+\mu\int_1^{\chi } \frac{ e^{\Psi (\xi )}}{\xi } \, d\xi } \, d\chi
-\int_1^{\gamma N_2} e^{\Psi (\chi )+\mu\int_1^{\chi } \frac{ e^{\Psi (\xi )}}{\xi } \, d\xi } \, d\chi +N_3 e^{\mu\int_1^{\gamma N_2} \frac{
e^{\Psi (\xi )}}{\xi } \, d\xi }\right].</script>
<p>And I don’t want to touch a formula that looks like that any more than you do.</p>
<p>Even if the formula were nicer, it wouldn’t be all that useful. Getting an exact solution to the equations doesn’t mean we know exactly how many people are going to get sick. Like all models, this one is a gross oversimplification of the real world. It’s not useful for making exact predictions; and if you want predictions that are <em>kinda</em> accurate, you should talk to the epidemiological experts, who have much more complicated models and much better data.</p>
<h3 id="qualitative-judgments">Qualitative Judgments</h3>
<p>But this model does give us a qualitative sense of how epidemics progress. For instance, in the very early stages of the epidemic, almost everyone will be susceptible. So we can make the further simplifying assumption that $S \approx N$ and, neglecting the comparatively small number of early recoveries, get the equation
<script type="math/tex">\frac{dI}{dt} = \beta I.</script>
This is <a href="https://jaydaigle.net/blog/a-neat-argument-for-the-uniqueness-of-e-x/">famously</a> the equation for <a href="https://en.wikipedia.org/wiki/Exponential_growth">exponential growth</a>. And indeed, graphs of new coronavirus infections seem to start nearly perfectly exponential.</p>
<p><img src="https://cdn.i24news.tv/uploads/49/ba/a9/51/db/2f/9b/b6/08/0e/96/64/95/71/70/7f/49baa951db2f9bb6080e96649571707f.png" alt="Comparison of reported Chinese cases with exponential curve" class="center" style="width:320px;" /></p>
<p style="text-align: center;"><em>This graph <a href="https://www.i24news.tv/en/news/international/asia-pacific/1580327226-analysis-at-current-rate-china-virus-could-infect-over-25-000-by-february">from I24 news</a> of reported infections in China almost perfectly matches the exponential curve.</em></p>
<p><img src="https://static01.nyt.com/images/2020/03/20/science/virus-log-chart-1584728689795/virus-log-chart-1584728689795-facebookJumbo.jpg" alt="Linear and logarithmic scale plots of US and Italian coronavirus cases" style="width:600px;" class="center" /></p>
<p style="text-align: center;"><em>This <a href="https://www.nytimes.com/2020/03/20/health/coronavirus-data-logarithm-chart.html">New York Times graph</a> shows the exponential curves in both the US and Italy on the left. The right-hand logarithmic plots look like nearly straight lines, which also reflects the exponential growth pattern.</em></p>
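<p>One handy consequence of exponential growth: if infections grow like $I_0 e^{rt}$, the case count doubles every $\frac{\ln 2}{r}$ days, no matter how large it currently is. A quick check, with a made-up growth rate (an assumption for illustration, not an estimate):</p>

```python
import math

r = 0.2  # assumed early daily growth rate, for illustration only
doubling_time = math.log(2) / r
print(round(doubling_time, 1))  # 3.5 -- cases double about every 3.5 days
```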
<p>As the epidemic progresses, the numbers of infected and recovered people climb. Each sick person will infect fewer additional people, since more of the people they meet are immune. We can see this in the model: the number of people who get infected each day is $\beta \frac{S}{N} I$. After many people have gotten sick, $\frac{S}{N}$ goes down and so fewer people get infected for a given value of $I$.</p>
<p>The epidemic will peak when people are recovering at least as fast as they get sick. This happens when $\beta \frac{IS}{N} \leq \gamma I$, and thus once $S \leq \frac{\gamma}{\beta} N$. Remember that $\frac{\beta}{\gamma}$ was our magic number $R_0$, so by the peak of the epidemic, only one person out of every $R_0$ people will have avoided getting sick.</p>
<p>If the estimates of $R_0 \approx 2.5$ are correct, this would mean that the epidemic would peak when something like 60% of the population had gotten sick. And remember, that’s not the end of the epidemic; that’s just the worst part. It would slowly get weaker from that time on, until it eventually fizzles.</p>
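<p>That peak-of-epidemic arithmetic is short enough to check directly, using the same illustrative $R_0 \approx 2.5$:</p>

```python
R0 = 2.5                            # illustrative value from the text
susceptible_at_peak = 1 / R0        # fraction who have avoided infection
ever_infected_at_peak = 1 - 1 / R0  # fraction sick at some point, by the peak
print(susceptible_at_peak, ever_infected_at_peak)  # 0.4 0.6
```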
<p>(These are <em>not predictions</em>, for many reasons. I’m not an epidemiologist. Any real epidemiologist would be using a much more sophisticated model than this one to try to make real predictions. Don’t pay attention to the specific numbers I use here. But you can get a qualitative sense of what changing these numbers would do—and have more context for understanding what the real experts tell you.)</p>
<p><img src="/assets/blog/sir/imperial_projections_chart.png" alt="Chart" class="center" style="width:448px;" /></p>
<p style="text-align: center"><em>Predictions from actual experts use a ton of data and consider a huge range of possibilities, and generally look like <a href="https://spiral.imperial.ac.uk:8443/handle/10044/1/77482">this table</a> from a team at Imperial College London.</em></p>
<h3 id="numeric-simulations">Numeric Simulations</h3>
<p>There’s one more thing that toy models like this can do. We can use them to run numeric simulations (using <a href="https://en.wikipedia.org/wiki/Euler_method">Euler’s method</a> or something similar). We can see what would happen under our assumptions, and how the results change if we vary those assumptions.</p>
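<p>Here’s a bare-bones version of such a simulation in plain Python, using simple fixed-step Euler updates. It’s a sketch, not a serious solver: real work would use an adaptive ODE method like the one in the Sage code below. The parameters are illustrative only.</p>

```python
# Euler's-method sketch of the SIR model; parameters are illustrative only.
beta, gamma = 0.2, 0.07          # transmission and recovery rates
N = 300_000_000                  # total population
S, I, R = N - 100_000, 100_000, 0
dt = 0.1                         # step size, in days
peak_I = 0.0
for _ in range(int(400 / dt)):   # simulate 400 days
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    S, I = S + dS * dt, I + dI * dt
    R = N - S - I                # total population is conserved
    peak_I = max(peak_I, I)
print(f"peak prevalence: {peak_I / N:.0%}, ever infected: {R / N:.0%}")
```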
<p>Below is some code for the SIR model written in SageMath. (I borrowed the code from <a href="https://sage.math.clemson.edu:34567/home/pub/161/">this page</a> at Clemson; I believe the code was written by <a href="http://people.oregonstate.edu/~medlockj/">Jan Medlock</a>.) I’ve primed it with $\gamma = .07$, which means that people are sick for two weeks on average, and $\beta = .2$, which gives us an $R_0$ of about $2.8$.</p>
<p>If you just click “Evaluate”, you’ll see what happens if we run this model using those values of $\beta$ and $\gamma$ over the next 400 days. It’s pretty grim; the epidemic peaks two months out with over a quarter of the country sick at once (the red curve), and in six months well over 80% of the country has fallen ill at some point (the blue curve).<strong title=" Reminder: I don't believe that this will happen, for many reasons. And you shouldn't listen to me if I did. Numbers are for illustrative purposes only and should not be construed as epidemiological advice."><sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup></strong></p>
<p>But with this widget you can play with those assumptions. What happens if we find a way to cure people faster, so $\gamma$ goes down? What if we lower $\beta$, by physical distancing or improved hygiene? The graph improves dramatically. And you can change up all the numbers if you want to. Play around, and see what you learn.</p>
<p>And stay safe out there.</p>
<div class="sage">
<script type="text/x-sage">
# Transmission rate
beta = 0.20
# Recovery rate
gamma = 0.07
# Population size
N = 300000000
# Initial infections
IInit = 100000
SInit = N - IInit
RInit = 0
R0 = beta / gamma
show(r'R_0 = %g' % R0)
# End time
tMax = 400
# Standard SIR model
def ODE_RHS(t, Y):
(S, I, R) = Y
dS = - beta * S * I / N
dI = beta * S * I / N - gamma * I
dR = gamma * I
return (dS, dI, dR)
# Set up numerical solution of ODE
solver = ode_solver(function = ODE_RHS,
y_0 = (SInit, IInit, RInit),
t_span = (0, tMax),
algorithm = 'rk8pd')
# Numerically solve
solver.ode_solve(num_points = 1000)
# Plot solution
show(
plot(solver.interpolate_solution(i = 0), 0, tMax, legend_label = 'S(t)', color = 'green')
+ plot(solver.interpolate_solution(i = 1), 0, tMax, legend_label = 'I(t)', color = 'red')
+ plot(solver.interpolate_solution(i = 2), 0, tMax, legend_label = 'R(t)', color = 'blue')
)
# code from https://sage.math.clemson.edu:34567/home/pub/161/
# Thanks to Jan Medlock
</script>
</div>
<p><em>Have a question about the SIR model? Have other good resources on this to point people at? Or did you catch a mistake? Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a> or leave a comment below.</em></p>
<p><em>And take care of yourself.</em></p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Or people who are asymptomatic carriers. This model doesn’t worry about who actually gets a fever and starts coughing, just who carries the virus and can maybe infect others. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>If we’re being fancy, we say that the chance of getting sick is proportional to $\frac{I}{N}$ and that $\beta$ is the constant of proportionality. But if you’re not used to differential equations already I’m not sure that tells you very much. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>Reminder: I don’t believe that this will happen, for many reasons. And you shouldn’t listen to me if I did. Numbers are for illustrative purposes only and should not be construed as epidemiological advice. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Jay DaigleFor some reason, a lot of people have gotten really interested in epidemiology lately. Myself included. Now, I'm not an epidemiologist. I don't study infectious diseases. But I do know a little about how mathematical models work, so I wanted to explain how one of the common, simple epidemiological models works. This model isn't anywhere near good enough to make concrete predictions about what's going to happen. But it _can_ give some basic intuition about how epidemics progress, and provide some context for what the experts are saying.Online Teaching in the Time of Coronavirus2020-03-14T00:00:00-07:002020-03-14T00:00:00-07:00https://jaydaigle.net/blog/online-teaching-in-the-time-of-coronavirus<p>I’ve been spending a lot of the past week looking at different options for transitioning my teaching online for the rest of the term. There are certainly people far more expert at online instruction than I am, but I wanted to share some of my thoughts and what I’ve found.</p>
<h2 id="handling-assignments">Handling Assignments</h2>
<h3 id="online-assignment-options">Online Assignment Options</h3>
<p>There are a lot of options for doing homework online. Many of these products (like WebAssign) have temporarily made everything freely available. I’m sure some of them are good, but I don’t know much about them.</p>
<p>This term I’ve been experimenting with using <a href="https://webwork.maa.org/">the MAA’s WeBWork system</a>, which has been going quite well. If you can administer your own server it’s completely free; if you can’t, the MAA will give you one trial class and then charge $200 per course you want to host. I don’t know how willing they are to start these up mid-semester, though. WeBWork is hardly a solution to everything, but it works very well for questions with numerical or algebraic answers.</p>
<p>(With WeBWork you can even give assignments that have to be completed inside a narrow window–say, an assignment that is only answerable between 2 and 3:30 on Thursday. So we could maybe use this to somewhat replace tests. Though again, not perfectly.)</p>
<h3 id="written-homework">Written Homework</h3>
<p>Of course, some assignments really need to include a written component. Written homework can probably just be photographed (or scanned) with a mobile phone; I expect most of our students have access to some sort of digital camera. I don’t know anything about the scanning apps, but I know they exist. I have in fact graded photographed homework before, and my student graders have expressed a willingness to do this for the rest of the term.</p>
<p>We can also consider encouraging our students, especially in upper-division classes, to start using LaTeX for more assignments. That’s an unreasonable imposition on Calc 1 students but most of the people in the upper-level classes have probably been exposed to it, and it would make a lot of this much simpler. No scanning, no photographing, just emailing in PDFs.</p>
<h2 id="lectures-and-office-hours">Lectures and Office Hours</h2>
<p>I purchased a writing tablet for my computer. This is a peripheral that plugs into your computer and allows you to write/draw with a pen. I specifically ordered a Huion 1060 Plus, which gives a 10x6 writing area and <a href="https://amazon.com/gp/product/B01FTE9HS2/">goes for $70 on Amazon</a>. I haven’t gotten to test it yet, so don’t consider that quite a recommendation. The other thing that gets highly recommended is the <a href="https://amazon.com/Wacom-Drawing-Software-Included-CTL4100/dp/B079HL9YSF">Wacom Intuos</a>, which is supposed to be somewhat nicer but also gives a much smaller writing surface (something like 6x4), so if you write big this might not be comfortable.</p>
<p>I’ve been looking into options to stream lectures and other content. There are really two things I want to do here: the first is to have video conferences where I can stream lectures and share my screen to show written notes, LaTeX’d notes, Mathematica notebooks, etc. The second is to create a persistent space for student interactions. I’d like to create a space where even when I’m not “holding a lecture” or “having office hours”, my students can still ask questions—of each other and of me.</p>
<h3 id="discord">Discord</h3>
<p>I’ve been doing the second thing with Discord for my research group for the past year or so. It works pretty well. You create a room with a bunch of channels and all messages in a channel stay permanently (unless deleted by a moderator). You can scroll up to see what people have talked about in the past. Makes it great for students to have conversations with you and each other, and other students can see what happened in them. (There’s also a private messaging feature, of course.)</p>
<p>Discord is also good for voice calls, and has a screen sharing feature. Both of them worked very smoothly when I tried them, except the screensharing has some limitations that I believe are Linux-specific (in particular, in my multi-monitor setup I can share one window, or my entire desktop, but I can’t share exactly one monitor, which is something I would like to do). I’ve been in touch with <a href="http://www-personal.umich.edu/~speyer/">David Speyer</a>, who’s written up a bunch of thoughts about Discord <a href="https://academia.stackexchange.com/questions/145389/using-discord-to-support-online-teaching/145390#145390">here, with a basic tutorial for setting it up</a>.</p>
<p>One thing about Discord that is both good and bad is that many of our students use it already. (It was designed for online videogame playing, and is now a widely used chat and voice program.) This is good because our students are already familiar with the program and how to use it. It may be bad because that means our students often already have screen names and identities on Discord that they may want to keep separated from their academic/professional personas. If we use some software they have not used before, they can create fresh accounts and keep their online personas appropriately segmented.</p>
<h3 id="oxys-suggestions-bluejeans-and-moodle">Oxy’s Suggestions: BlueJeans and Moodle</h3>
<p>My institution made some software recommendations. BlueJeans is the recommended videoconferencing software. I’ve played around with it a bit and it seems serviceable but not great. (Again, it has some specific issues with Linux that are more or less dealbreakers for me, as well.) One thing I miss from it is that it’s designed for video calls/conferences, but it doesn’t have the capacity to create a persistent chat room. So if I want that persistent interaction space, I’d need to use a second tool; I’d prefer to run everything on one platform if I can.</p>
<p>Moodle has a tool for creating chat rooms, but it’s <em>awful</em>. Do not want. It’s still a good place to post assignments and such if you don’t already have a place to post them and your institution uses Moodle. (If your institution uses some other learning management software, I can’t say much; Moodle is the only one I’ve ever used.)</p>
<h3 id="zoom-videoconferencing">Zoom Videoconferencing</h3>
<p>I’ve been leaning towards a videoconferencing solution called Zoom. The screensharing works great, and the recording feature works great. There’s an ability to create a shared whiteboard space, that I and students can both write on, which seems helpful for virtual office hours.</p>
<p>Zoom has the ability to create a persistent chatroom, and it worked very smoothly in some testing I did today with a couple of my undergraduates. (One of them reported that it “felt really slick”, which is a good sign; most of the experience was pretty seamless.) The videoconferencing can work without anyone making an account, I think, but the persistent chat room would require all our students to make (free) accounts. Anyone with a Gmail account can just log in with that, so that might not be a large barrier.</p>
<p>One major downside is that videoconferences are limited to 40 minutes. They’ve been relaxing this limit for schools and in affected areas, so I don’t know how much of a limitation it would be in practice. But I also think we could just start again at the end of the 40-minute period if we needed to. (Or maybe just keep formal lectures below forty minutes; it’s hard to ask students to pay attention that long anyway. If you’re posting recorded video, suggestions seem to be to keep each one under ten minutes.)</p>
<h2 id="closing-thoughts">Closing thoughts</h2>
<p>There are a bunch of other resources floating around to help you; I’ve looked at several but unfortunately haven’t been keeping a list. But if you poke around on Twitter or elsewhere there are many people more informed than I am who will offer help!</p>
<p>I know the MAA has a <a href="https://twitter.com/mathcirque/status/1238119797747068929?s=09">recorded online chat on online teaching</a>, though I haven’t looked at it yet.</p>
<p>But the most important thing is not to get hung up on perfection. I didn’t plan to teach my courses remotely this term, and I’m sure they will suffer for lack of direct instructional contact. But that’s okay! And I’m going to be honest with my students about this.</p>
<p>This is a really unfortunate way to finish out the semester. It sucks. But I’m going to do what I can to make it only suck a medium amount. And I hope my students will bear with me and help to make this only medium suck.</p>
<p>We’ll get through this.</p>
<hr />
<p><em>I’d love to hear any ideas or feedback you have about moving to online instruction. And I’m happy to answer any questions I can—we’re in this together.</em> <em>Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a>, or leave a comment below.</em></p>Jay DaigleI’ve been spending a lot of the past week looking at different options for transitioning my teaching online for the rest of the term. There are certainly people far more expert at online instruction than I am, but I wanted to share some of my thoughts and what I’ve found.2019 Spring Class Reflections: Calculus2019-07-10T00:00:00-07:002019-07-10T00:00:00-07:00https://jaydaigle.net/blog/spring-2019-class-reflections-calculus<p>Now that the term is over, I want to reflect a bit on the courses I taught, what worked well, and what I might want to do differently next time. (Honestly, it probably would have been more useful to write this sooner after finishing the courses, when they were fresher in my mind. But I don’t have a time machine, so I can’t do much about that now.) In this post I’ll talk about my calculus class; I’ll try to write about the others soon.</p>
<h3 id="my-previous-course-design-had-limited-success">My previous course design had limited success</h3>
<p>Math 114 at Occidental is intended for students, usually freshmen, who have seen calculus before but haven’t mastered the material sufficiently to be ready for calculus 2. This has the advantage that everyone in the course is familiar with the basic ideas, and that I can sometimes reference ideas we haven’t talked about yet to help justify what we do in the early parts of the course. It also has the disadvantage that my students arrive with a lot of preconceptions and confusions about the subject.<strong title="And a lot of anxiety. After all, the typical student in this course took calculus in high school and then failed the AP exam; they've all had at least one not-great experience with the material."><sup id="fnref:anxiety"><a href="#fn:anxiety" class="footnote">1</a></sup></strong></p>
<p>It also means we have extra time to cover topics that are interesting or useful or just help explain the ideas of calculus better, even if those topics aren’t strictly necessary as preparation for calculus 2.</p>
<p>In past years I had used this extra time to do the epsilon-delta definition of limits. I’m still proud of having successfully taught many freshmen to write clean epsilon-delta proofs. But over time I came to the conclusion that this wasn’t the best use of class time.</p>
<p>I had wanted the epsilon-delta proofs section to accomplish two things: help my students learn to write and reason more clearly, and give them a taste of what higher math was like. Neither goal was a complete failure, but neither was really a success either.</p>
<ul>
<li>
<p>My students got better at writing proofs, but I don’t think they learned this in a way that transferred skills to their other writing and communication. Beginner proofs tend to be written in a very restrictive, formal organization, effectively following a template. This template looks like it does for a reason, and is useful as a baseline for people to grow from. But in practice my students were just repeating the template to me instead of growing beyond it, so I don’t think they were gaining much.</p>
</li>
<li>
<p>And my students got a taste of higher math, but I’m pretty sure it was an unfortunately bitter taste. Epsilon-delta proofs are actually pretty complicated things and especially hard for novice proof-writers to execute successfully, so they don’t make a great first experience in proofs.</p>
</li>
<li>
<p>Making things worse, it tends to be really unclear why we need to prove any of these things. Most of the limit facts that come up in a first calculus course are “obviously true,” and so the effort we’re putting in often doesn’t feel like it’s actually accomplishing anything.<strong title="This same problem arises even in upper-division analysis courses. My undergraduate analysis professor Sandy Grabiner used to say that the point of a first analysis course is to prove that most of the time, what happens is exactly what you would expect to happen, and the second analysis course starts talking about the exceptions. But we tend to hope that our upperclassmen math majors at least are willing to bear with us through the proofs by that point."><sup id="fnref:analysis"><a href="#fn:analysis" class="footnote">2</a></sup></strong> Proofs often come across as a particularly obnoxious hoop that I’m making my students jump through to satisfy some perverse math–professor urge. <a href="https://mathwithbaddrawings.com/2019/01/09/a-brief-case-against-limits/">Ben Orlin</a> makes this case pretty clearly: calculus 1 students haven’t run into any of the problems that epsilon-delta proofs were invented to solve, and so they seem like an unnecessary runaround.</p>
</li>
</ul>
<ul>
<li>Most of all, it actually took quite a lot of time to do this well! Getting freshmen with no proof experience to the point where they could mostly write epsilon-delta proofs took a good three weeks out of a thirteen-week course. That’s a huge chunk of the course, and needs to be accomplishing a lot to justify itself. An epsilon-delta approach to limits just wasn’t worth the time and effort we were putting into it.</li>
</ul>
<h3 id="an-approximate-approach">An approximate approach</h3>
<p>Over time I realized that my course had gotten less focused on using the formal limits ideas anyway. I had drifted more and more to talking about two big ideas once we got out of the limits section: models and approximation.</p>
<p><em>Models</em> are the big idea I’ve been thinking about lately.<strong title="You can read a lot more of my thoughts about this in my post on word problems at https://jaydaigle.net/blog/why-word-problems/, for instance."><sup id="fnref:word-problems"><a href="#fn:word-problems" class="footnote">3</a></sup></strong> On its own terms, math is a purely abstract enterprise; to use math to understand the world we need to have some model of how the world can be described mathematically. This modeling is a really important skill for any field where you’re expected to apply math to solve problems—and the same skills can help us reason about situations with no explicit mathematical model.</p>
<p><em>Approximation</em> is the big idea of calculus. This is true on a surface level, where we can think of limits as taking an “infinitely good” approximation of the value of a function at a point, and derivatives are an approximation of the rate of change. But it’s also the case that many of the applications of calculus and especially of derivatives have to do with notions of approximation.</p>
<p>After some wrestling with both ideas, I decided to take the latter approach in this term’s course. It meshed well with the way I tend to think about the ideas in calculus 1, and the way I had been explaining them to students. So I reorganized my course into five sections.</p>
<ol>
<li><strong>Zero-order approximations:</strong> Continuity and limits. We can think of a continuous function as one where $f(a)$ is a good approximation of $f(x)$ when $x$ is close to $a$. A lot of the facts about limits we need to learn are answers to questions that arise naturally when we want to approximate various functions. And “discontinuities” make sense as “points where approximation is hard for some reason”.</li>
<li><strong>First-order approximations:</strong> Derivatives. We started with the linear approximation formula $f(x) = f(a) + m(x-a)$ and asked what value of $m$ would make this the best possible approximation. A little rearrangement gives the definition of derivative, but now that definition is the answer to a question, not a definition just dropped on our heads from the sky. We want to be able to compute derivatives <em>so that</em> we can approximate functions easily, and as a bonus we can reinterpret all of this geometrically, in terms of the tangent line.</li>
<li><strong>Modeling:</strong> Word problems and differential equations. We reinterpret the derivative a third time as an answer to the problem of average versus “instantaneous” speed, and then as the answer to all sorts of concrete “rate of change” problems. We can talk about the idea of differential equations, and practice turning descriptions of situations into toy mathematical models with derivatives. We can’t solve these equations explicitly without integrals, but we can <em>approximate</em> solutions using Euler’s method, and get a good definition of the function $e^x$ in the bargain. Implicit derivatives and related rates also show up here, using derivatives in a different type of model.</li>
<li><strong>Inverse Problems:</strong> Inverse functions and antiderivatives. We take all the questions we’ve asked and turn them around. We define inverse functions, especially the logarithm and inverse trig functions, and use the inverse function theorem to find their derivatives. We can use the intermediate value theorem and Newton’s method to approximate the solutions to equations. We finish by defining the antiderivative as the (not-quite) inverse of the derivative.</li>
<li><strong>Second-order approximations:</strong> The second derivative allows us to find the best <em>quadratic</em> approximation to a given function. This is a natural setting for thinking about extreme value problems, so we cover all the optimization topics, along with Rolle’s theorem and the mean value theorem, and then put all this information together to sketch graphs of functions. We finished up with brief explanations of Taylor series and of imaginary numbers.</li>
</ol>
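<p>The “best $m$” idea behind the first-order section is easy to see numerically. Here’s a quick Python sketch of my own (not course material; the function name is mine) showing how the difference-quotient slopes settle down to a single value, which is the derivative:</p>

```python
# The "best m" in f(x) ≈ f(a) + m(x - a) is the limit of the difference quotient.
def slope_estimate(f, a, h):
    """Slope of the secant line through (a, f(a)) and (a + h, f(a + h))."""
    return (f(a + h) - f(a)) / h

# For f(x) = x^2 at a = 3, the estimates approach 6 as h shrinks:
f = lambda x: x**2
for h in [0.1, 0.01, 0.001]:
    print(slope_estimate(f, 3, h))  # approaches 6
```

Letting $h \to 0$ in this computation is exactly the definition of the derivative, arrived at as the answer to the approximation question.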
<h3 id="most-of-it-worked-pretty-well">Most of it worked pretty well.</h3>
<p>This course was basically successful, but there are lots of ways to improve it. I think my students both had a more comfortable experience and gained a much better understanding of some of the core ideas of calculus, especially the basic idea of linear approximation.</p>
<p>The first section, on limits, was okay. It’s still a little awkward, and I’m tempted by Ben’s approach of starting with derivatives entirely. But I really liked the way it started, with making the point that $\sqrt{5}$ is “about 2”. This simplest-possible approximation made a good anchor for the course, and reinforces the sort of basic numeracy we need to understand almost any numerical information we encounter. I still need to do a bit more work on the logical flow and transitions, and the idea of limits at infinity is important but doesn’t sit in here entirely comfortably.</p>
<p>The section on derivatives and first-order approximations worked wonderfully. This is the section that contains many of the ideas driving this course approach, and I’ve used many of them before, so it makes sense that this worked well.</p>
<p>The section on inverse functions again worked pretty well. It’s pretty easy to justify “solving equations” to students in a math class, and “this equation is too hard so let’s find a way to avoid solving it” is pretty compelling.</p>
<p>And finally the section on the second-order stuff felt pretty strong as well, but could still be improved. While in my head I have a clear picture relating “approximation with a parabola” to the maxima and minima of a function, I don’t know that it came across clearly in the class. And I was feeling a little time pressure by this point; I really wish I had had an extra couple of days of class time.</p>
<h3 id="modeling-is-hard">Modeling is hard</h3>
<p>But the section on modeling needs a lot of work. A lot of the ideas that I wanted to include in here aren’t things I’ve ever taught before, so the material is still a little rough. I also got really sick right when this section was starting, so my preparation probably wasn’t as good as it could have been.</p>
<p>In particular, I wasn’t very satisfied with the section on describing real-world situations in terms of models, and coming up with differential equations. I showed a bunch of examples but don’t know that we really got a clear grasp on the underlying principles as a class. And my homework questions on this modeling process probably contained a bit too much “right answer and wrong answer” for a topic that’s as inherently fuzzy as modeling.</p>
<p>I’m toying with the idea of assigning some problems where I ask students to <em>argue</em> for some modeling choices they make—handle it less like there’s one correct model, and more like there are a bunch of defensible choices. But I don’t know how well I can get that to fit into the calculus class and the framework of first- or second-order ODEs.<strong title="It probably doesn't help that I never actually studied ODEs in any way, so I don't have many of my own examples to draw on."><sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup></strong> (Maybe I should do some modeling that doesn’t involve derivatives, since understanding modeling is a goal in its own right.)</p>
<p>I also wish I could fit the mean value theorem into the discussion of speed, but proving it really requires a lot of ideas I wanted to hold off on until later. Maybe I should state and explain it here, but then prove it later when the proof comes up for other reasons.</p>
<p>One thing I <em>did</em> really like in this section is the way I introduced the exponential $e^x$ as the solution to the initial value problem $y’ = y, y(0)=1$. This makes $e$ seem less like a number we made up to torture math students, and more like the answer to a question people would naturally ask.</p>
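<p>That definition is easy to play with numerically. Here’s a minimal Python sketch of my own (an illustration, not a class assignment) of Euler’s method for $y’ = y,\ y(0) = 1$: each step multiplies $y$ by $(1+h)$, and $y(1)$ approaches $e$ as the steps shrink.</p>

```python
# Euler's method for the initial value problem y' = y, y(0) = 1.
# Each step advances y by h * y' = h * y, so y(1) approximates e.
def euler_exp(t, steps):
    h = t / steps
    y = 1.0
    for _ in range(steps):
        y += h * y  # since y' = y
    return y

print(euler_exp(1.0, 10))      # a crude approximation of e
print(euler_exp(1.0, 10_000))  # much closer to 2.71828...
```

With $t = 1$ this is just computing $(1 + 1/n)^n$, one of the classical definitions of $e$, which is part of why the approach feels so natural.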
<h3 id="final-thoughts">Final thoughts</h3>
<p>Overall, I feel pretty good about this redesign. I’m definitely not going back to the epsilon-delta definitions for this course any time soon, and I think this course will be really strong with a bit of work.</p>
<p>But there are a lot of ideas in the modeling topic that are important but that I don’t quite feel like I’m doing justice to yet. I need to go over that section carefully and figure out how to improve it.</p>
<p>I’m also thinking about moving <em>some</em> of my homework to an online portal. If we take all the “compute these eight derivatives” questions and have them automatically graded, I can use scarce human-grading time to give thorough comments on some more interesting conceptual questions.</p>
<p>To anyone who’s read this entire post, I’d love your feedback—on the course design as a whole, and on how to fix some of the problems I ran into. And if anyone is curious how I handled things, I’d be happy to share my course materials. You can find most of them <a href="https://jaydaigle.net/teaching/courses/2019-spring-114/">on the course page</a> but I’m happy to talk or share more if you’re interested!</p>
<hr />
<p><em>Have ideas about this course plan? Have questions about why I did things?</em> <em>Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a> or leave a comment below, and let me know!</em></p>
<div class="footnotes">
<ol>
<li id="fn:anxiety">
<p>And a lot of anxiety. After all, the typical student in this course took calculus in high school and then failed the AP exam; they’ve all had at least one not-great experience with the material. <a href="#fnref:anxiety" class="reversefootnote">↩</a></p>
</li>
<li id="fn:analysis">
<p>This same problem arises even in upper-division analysis courses. My undergraduate analysis professor Sandy Grabiner used to say that the point of a first analysis course is to prove that most of the time, what happens is exactly what you would expect to happen, and the second analysis course starts talking about the exceptions. But we tend to hope that our upperclassmen math majors at least are willing to bear with us through the proofs by that point. <a href="#fnref:analysis" class="reversefootnote">↩</a></p>
</li>
<li id="fn:word-problems">
<p>You can read a lot more of my thoughts about this in my <a href="https://jaydaigle.net/blog/why-word-problems/">post on word problems</a>, for instance. <a href="#fnref:word-problems" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>It probably doesn’t help that I never actually studied ODEs in any way, so I don’t have many of my own examples to draw on. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Jay DaigleNow that the term is over, I want to reflect a bit on the courses I taught, what worked well, and what I might want to do differently next time. (Honestly, it probably would have been more useful to write this sooner after finishing the courses, when they were fresher in my mind. But I don’t have a time machine, so I can’t do much about that now.) In this post I’ll talk about my calculus class; I’ll try to write about the others soon.An Overview of Bayesian Inference2019-02-20T00:00:00-08:002019-02-20T00:00:00-08:00https://jaydaigle.net/blog/overview-of-bayesian-inference<p>A few weeks ago I <a href="https://jaydaigle.net/blog/paradigms-and-priors/">wrote about Kuhn’s theory of paradigm shifts</a> and how it relates to Bayesian inference. In this post I want to back up a little bit and explain what Bayesian inference is, and eventually rediscover the idea of a paradigm shift just from understanding how Bayesian inference works.</p>
<p>Bayesian inference is important in its own right for many reasons beyond just improving our understanding of philosophy of science. Bayesianism is at its heart an extremely powerful mathematical method of using evidence to make predictions. Almost any time you see anyone making predictions that involve probabilities—whether that’s a projection of election results like the ones from <a href="https://fivethirtyeight.com/">FiveThirtyEight</a>, a prediction for the results of a big sports game, or just a weather forecast telling you the chances of rain tomorrow—you’re seeing the results of a Bayesian inference.</p>
<p>Bayesian inference is also the foundation of many machine learning and artificial intelligence tools. Amazon wants to predict how likely you are to buy things. Netflix wants to predict how likely you are to like a show. Image recognition programs want to predict whether that picture contains a bird. And self-driving cars want to predict whether they’re going to crash into that wall.</p>
<p>You’re using tools based on Bayesian inference every day, and probably at this very moment.<strong title="I'm old enough to remember the late nineties, when spam was such a big problem that email became almost unusable. These days when I complain about email spam it's usually my employer sending too many messages out through internal mailing lists; but there was a period in the nineties when for every legitimate email you'd get four or five filled with links to pr0n sites or trying to sell you v1@gr@ and c1@lis CHEAP!!! It was a major problem. Entire conferences were held on developing methods to defeat the spam problem. These days I see about one true spam message like that per _year_. And one major reason for that is the invention of effective spam filters using Bayesian inference to predict whether a given email is spam or legitimate. So you're using Bayesian tools right now purely by _not_ receiving dozens of unwanted pornographic pictures in your email inbox every day."><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></strong> So it’s worth understanding how they work.</p>
<hr />
<p>The basic idea of Bayesian inference is that we start with some <em>prior probability</em> that describes what we originally believe the world is like in terms of probability, by specifying the probabilities of various things happening. Then we make observations of the world, and update our beliefs, giving our conclusion as a <em>posterior probability</em>.</p>
<p>As a really simple example: suppose I tell you I’ve flipped a coin, but I don’t tell you how it landed. Your prior is probably a 50% chance that it shows heads, and a 50% chance that it shows tails. After you get to look at the coin, you update your prior beliefs to reflect your new knowledge. Your posterior probability says there is a 100% chance that it shows heads and a 0% chance that it shows tails.<strong title="This particular example is far too simple to really be worth setting up the Bayesian framework, but it gives a pretty direct and explicit demonstration of what all the pieces mean."><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></strong></p>
<p>The rule we use to update our beliefs is called <a href="https://en.wikipedia.org/wiki/Bayes_theorem">Bayes’s Theorem</a> (hence the name “Bayesian inference”). Specifically, we use the mathematical formula
\[
P(H |E) = \frac{ P(E|H) P(H)}{P(E)},
\]
where</p>
<ul>
<li>$H$ is some hypothesis we had—some thing we thought might maybe happen—and $P(H)$ is how likely we originally thought that hypothesis was.</li>
<li>$E$ is the <em>evidence</em> we just observed, and $P(E)$ is how likely we originally thought we were to see that evidence.</li>
<li>$P(E|H)$ is the most complicated bit to explain. It tells us, if we assume that our hypothesis $H$ is true, how likely we originally thought seeing the evidence $E$ would be. So it tells us what we would have thought <em>before</em> seeing the new evidence, if we had assumed the hypothesis $H$ was true.</li>
<li>$P(H|E)$ is the new, updated, posterior probability we give to the hypothesis $H$, <em>after</em> seeing the evidence $E$.</li>
</ul>
<p>Let’s work through a quick example. Suppose I have a coin, and you think that there’s a 50% chance it’s a fair coin, and a 50% chance that it actually has two heads. So we have $P(H_{fair}) = .5$ and $P(H_{unfair}) = .5$.</p>
<p>Now you flip the coin ten times, and it comes up heads all ten times. If the coin is fair, this is pretty unlikely! The probability of that happening is $\left(\frac{1}{2}\right)^{10} = \frac{1}{1024}$, so we have $P(E|H_{fair}) = \frac{1}{1024}$. But if the coin is two-headed, this will definitely happen; the probability of getting ten heads is 100%, or $1$. So when you see this, you probably conclude that the coin is unfair.</p>
<p>Now let’s work through that same chain of reasoning algebraically. If the coin is fair, the probability of seeing ten heads in a row is $\frac{1}{2^{10}} = \frac{1}{1024}$. And if the coin is unfair, the probability is 1. So if we think there’s a 50% chance the coin is fair, and a 50% chance it’s unfair, then the overall probability of seeing ten heads in a row is
\begin{align}
P(H_{fair}) \cdot P(E | H_{fair}) + P(H_{unfair}) \cdot P(E | H_{unfair}) \\<br />
= .5 \cdot \frac{1}{1024} + .5 \cdot 1 = \frac{1025}{2048} \approx .5005.
\end{align}</p>
<p>By Bayes’s Theorem, we have
\begin{align}
P(H_{fair} | E) &= \frac{ P(E | H_{fair}) P(H_{fair})}{P(E)} \\<br />
& = \frac{ \frac{1}{1024} \cdot .5}{\frac{1025}{2048}} = \frac{1}{1025} \\<br />
P(H_{unfair} | E) & = \frac{ P(E | H_{unfair}) P(H_{unfair})}{P(E)} \\<br />
&= \frac{1 \cdot \frac{1}{2}}{\frac{1025}{2048}} = \frac{1024}{1025}.
\end{align}
Thus we conclude that the probability the coin is fair is $\frac{1}{1025} \approx .001$, and the probability it is two-headed is $\frac{1024}{1025} \approx .999$. This matches what our intuition tells us: if it comes up ten heads in a row, it probably isn’t fair.</p>
<hr />
<p>But let’s tweak things a bit. Suppose I have a table with a thousand coins, and I tell you that all of them are fair except one two-headed one. You pick one at random, flip it ten times, and see ten heads. Now what do you think?</p>
<p>You have exactly the same <em>evidence</em>, but now your prior is different. Your prior tells you that $P(H_{fair}) = \frac{999}{1000}$ and $P(H_{unfair}) = \frac{1}{1000}$. We can do the same calculations as before. We have
\begin{align}
P(H_{fair}) \cdot P(E | H_{fair}) + P(H_{unfair}) \cdot P(E | H_{unfair}) \\<br />
= \frac{999}{1000} \cdot \frac{1}{1024} + \frac{1}{1000} \cdot 1
\approx .00198
\end{align}</p>
<p>\begin{align}
P(H_{fair} | E) &= \frac{ P(E | H_{fair}) P(H_{fair})}{P(E)} \\<br />
& = \frac{ \frac{1}{1024} \cdot \frac{999}{1000}}{.00198} \approx .494 \\<br />
P(H_{unfair} | E) & = \frac{ P(E | H_{unfair}) P(H_{unfair})}{P(E)} \\<br />
&= \frac{1 \cdot \frac{1}{1000}}{.00198} \approx .506.
\end{align}
So now you should think it’s about equally likely that your coin is fair or unfair. <strong title="The exact probabilities are 999/2023 and 1024/2023. As a bonus, try to see why having some of those exact numbers makes sense, and reassures us that we did this right."><sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup></strong></p>
<p>Why does this happen? If you have a fair coin, then seeing ten heads in a row is pretty unlikely. But having an unfair coin is <em>also</em> unlikely, because of the thousand coins you could have picked, only one was unfair. In this example those two unlikelinesses cancel out almost exactly, leaving us uncertain whether you got a (normal) fair coin and then a surprisingly unlikely result, or if you got a surprisingly unfair coin and then the normal, expected result.</p>
<p>In other words, you should definitely be somewhat surprised to see ten heads in a row. Remember, we worked out that your prior probability of seeing <em>that</em> is just $P(E) \approx .00198$—less than two tenths of a percent! But there are two different ways to get that unusual result, and you don’t know which of those unusual things happened.</p>
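<p>The exact fractions mentioned in the footnote are easy to confirm with Python’s <code>fractions</code> module (again a sketch of my own, using this example’s numbers):</p>

```python
from fractions import Fraction

# The thousand-coin table: prior 999/1000 fair, 1/1000 two-headed,
# evidence = ten heads in a row.
prior_fair = Fraction(999, 1000)
prior_unfair = Fraction(1, 1000)
like_fair = Fraction(1, 1024)   # P(ten heads | fair)
like_unfair = Fraction(1)       # P(ten heads | two-headed)

p_e = prior_fair * like_fair + prior_unfair * like_unfair
post_fair = prior_fair * like_fair / p_e
post_unfair = prior_unfair * like_unfair / p_e

print(post_fair, post_unfair)  # 999/2023 and 1024/2023, about .494 and .506
```

Exact arithmetic also makes it easy to see why the denominators come out the same: both posteriors are normalized by the same $P(E)$.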
<hr />
<p>Bayesian inference also does a good job of handling evidence that disproves one of your hypotheses. Suppose you have the same prior we were just discussing: $999$ fair coins, and one two-headed coin. What happens if you flip the coin once and it comes up <em>tails</em>?</p>
<p>Informally, we immediately realize that we can’t be flipping a two-headed coin. It came up tails, after all. So how does this work out in the math?</p>
<p>If the coin is fair, we have a $50\%$ chance of getting tails, and a $50\%$ chance of getting heads. If the coin is unfair, we have a $0\%$ chance of tails and a $100\%$ chance of heads. So we compute:
\begin{align}
P(H_{fair}) \cdot P(E | H_{fair}) + P(H_{unfair}) \cdot P(E | H_{unfair}) \\<br />
= \frac{999}{1000} \cdot \frac{1}{2} + \frac{1}{1000} \cdot 0
= \frac{999}{2000}
\end{align}</p>
<p>\begin{align}
P(H_{fair} | E) &= \frac{ P(E | H_{fair}) P(H_{fair})}{P(E)} \\<br />
& = \frac{ \frac{1}{2} \cdot \frac{999}{1000}}{\frac{999}{2000}} = 1 \\<br />
P(H_{unfair} | E) & = \frac{ P(E | H_{unfair}) P(H_{unfair})}{P(E)} \\<br />
&= \frac{0 \cdot \frac{1}{1000}}{\frac{999}{2000}} = 0.
\end{align}</p>
<p>Thus the math agrees with us: once we see a tails, the probability that we’re flipping a two-headed coin is zero.</p>
<hr />
<p>As long as everything behaves well, we can use these techniques to update our beliefs. In fact, this method is pretty powerful. We can prove that it is the best possible decision rule according to a few different sets of criteria<strong title="There are two really important results that occur to me. Cox's Theorem gives a collection of reasonable-sounding conditions, and proves that Bayesian inference is the only possible rule that satisfies them all. Dutch Book Arguments show that this inference rule protects you from making a collection of bets which are guaranteed to lose you money."><sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup></strong>; and there are pretty good guarantees about eventually converging to the right answer after collecting enough evidence.</p>
<p>But there are still a few ways Bayesian inference can go wrong.</p>
<p>What if you get tails and keep flipping the coin—and get ten tails in a row? We’ll still draw the same conclusion: the coin can’t be double-headed, so it’s definitely fair. (You can work through the equations on this if you like; they’ll look just like the last computation I did, but longer). And if we keep flipping and get a thousand tails in a row, or a million, our computation will still tell us yes, the coin is definitely fair.</p>
<p>But before we get to a million flips, we might start suspecting, pretty strongly, that the coin is <em>not</em> fair. When it comes up tails a thousand times in a row, we probably suspect that in fact the coin has two tails. <strong title="No, you can't just check this by looking at the coin. Because I said so. More seriously, it's pretty common to have experiments where you can see the results, but can't inspect the mechanism by which those results are reached. In a particle collider you can see the tracks of exiting particles, but you can't actually observe the collision. In an educational study, you can look at students' test results, but you can't look inside their brains and observe exactly when the learning happens. So it's useful for this thought experiment to assume we can see how the coin lands, but can never look at both sides at the same time."><sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup></strong> So why doesn’t the math reflect this at all?</p>
<p>In this case, we made a mistake at the very beginning. Our prior told us that there was a $99.9\%$ chance we had a fair coin, and a $.1\%$ chance that we had a coin with two heads. And that means that our prior left no room for the possibility that our coin did anything else. We said our prior was
\[
P(H_{fair}) = \frac{999}{1000} \qquad P(H_{unfair}) = \frac{1}{1000};
\]
but we really should have said
\[
P(H_{fair}) = \frac{999}{1000} \qquad P(H_{two\ heads}) = \frac{1}{1000} \qquad P(H_{two\ tails}) = 0.
\]
And since we started with the belief that a two-tailed coin was <em>impossible</em>, no amount of evidence will cause us to change our beliefs. Thus Bayesian inference follows the old rule of Sherlock Holmes: “when you have excluded the impossible, whatever remains, however improbable, must be the truth.”</p>
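<p>You can watch this trap happen in a few lines of Python (my own sketch, repeating the update from this example): with $P(H_{two\ tails}) = 0$ in the prior, even a thousand straight tails leaves that probability at exactly zero.</p>

```python
# A prior that assigns zero probability to "two tails" can never recover:
# the numerator P(E|H) * P(H) in Bayes's rule is always 0 for that hypothesis.
prior = {"fair": 0.999, "two_heads": 0.001, "two_tails": 0.0}
likelihood_tails = {"fair": 0.5, "two_heads": 0.0, "two_tails": 1.0}

for flip in range(1000):  # observe a thousand tails in a row
    p_e = sum(likelihood_tails[h] * prior[h] for h in prior)
    prior = {h: likelihood_tails[h] * prior[h] / p_e for h in prior}

print(prior)  # two_tails is still exactly 0.0; fair has gone to 1.0
```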
<hr />
<p>This example demonstrates both the power and the problems of doing Bayesian inference. The power is that it reflects what we already know. If something is known to be quite rare, then we probably didn’t just encounter it. (It’s more likely that I saw a random bear than a sasquatch—and that’s true even if sasquatch exist, since bear sightings are clearly more common). And if something is outright impossible, we don’t need to spend a lot of time thinking about the implications of it happening.</p>
<p>The problem is that in pure Bayesian inference, you’re trapped by your prior. If your prior thinks the “true” hypothesis is possible, then eventually, with enough evidence, you will conclude that the true hypothesis is extremely likely. But if your prior gives no probability to the true hypothesis, then no amount of evidence can ever change your mind. If we start out with $P(H) = 0$, then it is mathematically impossible to update your prior to believe that $H$ is possible.</p>
<p>But Douglas Adams neatly explained the flaw in the Sherlock Holmes principle in the voice of his character Dirk Gently:</p>
<blockquote>
<p>The impossible often has a kind of integrity to it which the merely improbable lacks. How often have you been presented with an apparently rational explanation of something that works in all respects other than one, which is that it is hopelessly improbable?…The first idea merely supposes that there is something we don’t know about, and God knows there are enough of those. The second, however, runs contrary to something fundamental and human which we do know about. We should therefore be very suspicious of it and all its specious rationality.</p>
</blockquote>
<p>In real life, when we see something we had thought was extremely improbable, we often reconsider our beliefs about what is possible. Maybe there’s some possibility we had originally dismissed, or not even considered, that makes our evidence look reasonable or even likely; and if we change our prior to include that possibility, suddenly our evidence makes sense. This is the “paradigm shift” I talked about in my <a href="https://jaydaigle.net/blog/paradigms-and-priors/">recent post on Thomas Kuhn</a>, and extremely unlikely evidence, like our extended series of tails, is a Kuhnian anomaly.</p>
<p>But rethinking your prior isn’t really allowed by the mathematics and machinery of Bayesian inference—it’s something <em>else</em>, something <em>outside</em> of the procedure, that we do to cover for the shortcomings of unaugmented Bayesianism.</p>
<hr />
<p>Let’s return to the coin-flipping thought experiment; there’s one other way it can go wrong that I want to tell you about. Suppose you fix your prior to acknowledge the possibility that the coin is two-headed <em>or</em> two-tailed. (We could even set up our prior to include the possibility that the coin is two-sided but biased—so that the coin comes up heads 70% of the time, say. I’m going to ignore this case completely because it makes the calculations a lot more complicated and doesn’t actually clarify anything. But it’s important that we <em>can</em> do that if we want to).<strong title="Gelman and Nolan have argued that it's not physically possible to bias a coin flip in this way. This is arguably another reason to ignore the possibility that a coin is biased. And if you believe Gelman and Nolan's argument, then you _should_ have a low or zero prior probability that the coin is biased. But the actual reason I'm ignoring it is to avoid computing integrals in public."><sup id="fnref:6"><a href="#fn:6" class="footnote">6</a></sup></strong></p>
<p>You assign the prior probabilities
\[
P(H_{fair}) = \frac{98}{100} \qquad P(H_{two\ heads}) = \frac{1}{100} \qquad P(H_{two\ tails}) = \frac{1}{100},
\]
giving a 1% chance of each possible double-sided coin. (This is a higher chance than you gave it before, but clearly when I give you these coins I’ve been messing with you, so you should probably be less certain of everything). You flip the coin.</p>
<p><a href="https://youtu.be/M0I-xm7iCBU?t=15">And it lands on its edge.</a></p>
<p>What does our rule of inference tell us now? We can try to do the same calculations we did before. The first thing we need to calculate is $P(E)$, which is easy. We started out by assuming this couldn’t happen, so the prior probability of seeing the coin landing on its side is zero!</p>
<p>(Algebraically, a fair coin has a 50% chance of heads and a 50% chance of tails. So if the coin is fair, then $P(E|H_{fair}) = 0$. But if the coin has a 100% chance of heads, then $P(E| H_{two\ heads}) = 0$. And if the coin has a 100% chance of tails, then $P(E| H_{two\ tails}) = 0$. Thus
\begin{align}
P(E) &= P(E|H_{fair}) \cdot P(H_{fair}) + P(E|H_{two\ heads}) \cdot P(H_{two\ heads}) + P(E|H_{two\ tails}) \cdot P(H_{two\ tails}) \\
& = 0 \cdot \frac{98}{100} + 0 \cdot \frac{1}{100} + 0 \cdot \frac{1}{100} = 0.
\end{align}
So we conclude that $P(E) = 0$).</p>
<p>Now we can actually calculate our new, updated, posterior probabilities—or can we? We have the formula that
\[
P(H_{fair} | E) = \frac{ P(E | H_{fair}) P(H_{fair})}{P(E)}.
\]
But with the probabilities we just calculated, this works out to
\[
P(H_{fair} | E) = \frac{ 0 \cdot \frac{98}{100}}{0} = \frac{0}{0}.
\]
And our calculation has broken down completely; $\frac{0}{0}$ isn’t a <em>number</em>, let alone a useful probability.</p>
<p>Even more so than the last example, this is a serious Kuhnian anomaly. If we ever try to update and get $\frac{0}{0}$ as a response, something has gone wrong. We had said that something was totally impossible, and then it happened. All we can do is back up and choose a new prior.</p>
<p>And Bayesian inference can’t tell us how to do that.</p>
<p>There are a few different ways people try to get around this problem. But that’s another post.</p>
<hr />
<p><em>Questions about this post? Was something confusing or unclear? Or are there other things you want to know about Bayesian reasoning?</em> <em>Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a> or leave a comment below, and let me know!</em></p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>I’m old enough to remember the late nineties, when spam was such a big problem that email became almost unusable. These days when I complain about email spam it’s usually my employer sending too many messages out through internal mailing lists; but there was a period in the nineties when for every legitimate email you’d get four or five filled with links to pr0n sites or trying to sell you v1@gr@ and c1@lis CHEAP!!! It was a major problem. Entire conferences were held on developing methods to defeat the spam problem.</p>
<p>These days I see about one true spam message like that per <em>year</em>. And one major reason for that is the invention of effective spam filters using Bayesian inference to predict whether a given email is spam or legitimate. So you’re using Bayesian tools right now purely by <em>not</em> receiving dozens of unwanted pornographic pictures in your email inbox every day. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>This particular example is far too simple to really be worth setting up the Bayesian framework, but it gives a pretty direct and explicit demonstration of what all the pieces mean. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>The exact probabilities are 999/2023 and 1024/2023. As a bonus, try to see why having some of those exact numbers makes sense, and reassures us that we did this right. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>I’m primarily thinking of two really important results here. <a href="https://en.wikipedia.org/wiki/Cox's_theorem">Cox’s Theorem</a> gives a collection of reasonable-sounding conditions, and proves that Bayesian inference is the only possible rule that satisfies them all. <a href="https://plato.stanford.edu/entries/dutch-book/">Dutch Book Arguments</a> show that this inference rule protects you from making a collection of bets which are guaranteed to lose you money. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>No, you can’t just check this by looking at the coin. Because I said so.</p>
<p>More seriously, it’s pretty common to have experiments where you can see the results, but can’t inspect the mechanism by which those results are reached. In a particle collider you can see the tracks of exiting particles, but you can’t actually observe the collision. In an educational study, you can look at students’ test results, but you can’t look inside their brains and observe exactly when the learning happens. So it’s useful for this thought experiment to assume we can see how the coin lands, but can never look at both sides at the same time. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
<li id="fn:6">
<p>Gelman and Nolan have argued that it’s <a href="https://www.tandfonline.com/doi/abs/10.1198/000313002605">not physically possible to bias a coin flip in this way</a>. This is arguably another reason to ignore the possibility that a coin is biased. And if you believe Gelman and Nolan’s argument, then you <em>should</em> have a low or zero prior probability that the coin is biased. But the actual reason I’m ignoring it is to avoid computing integrals in public. <a href="#fnref:6" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Jay DaigleA few weeks ago I wrote about Kuhn’s theory of paradigm shifts and how it relates to Bayesian inference. In this post I want to back up a little bit and explain what Bayesian inference is, and eventually rediscover the idea of a paradigm shift just from understanding how Bayesian inference works.Paradigms and Priors2019-01-15T00:00:00-08:002019-01-15T00:00:00-08:00https://jaydaigle.net/blog/paradigms-and-priors<p>Scott Alexander at <a href="https://slatestarcodex.com/">Slate Star Codex</a> has been blogging lately about Thomas Kuhn and the idea of paradigm shifts in science. This is a topic near and dear to my heart, so I wanted to take the opportunity to share some of my thoughts and answer some questions that Scott asked in his posts.</p>
<h3 id="the-big-idea">The Big Idea</h3>
<p>I’m going to start with my own rough summary of what I take from Kuhn’s work. But since this is all in response to Scott’s <a href="https://slatestarcodex.com/2019/01/08/book-review-the-structure-of-scientific-revolutions/">book review of <em>The Structure of Scientific Revolutions</em></a>, you may want to read his post first.</p>
<p>The main idea I draw from Kuhn’s work is that science and knowledge aren’t only, or even primarily, a collection of facts. Observing the world and incorporating evidence is <em>important</em> to learning about the world, but evidence can’t really be interpreted or used without a prior framework or model through which to interpret it. For example, check out <a href="https://twitter.com/OrbenAmy/status/1084856550383149057">this Twitter thread</a>: researchers were able to draw thousands of different and often mutually contradictory conclusions from a single data set by varying the theoretical assumptions they used to analyze it.</p>
<p>Kuhn also provided a response to <a href="https://en.wikipedia.org/wiki/Falsifiability#Falsificationism">Popperian falsificationism</a>. No theory can ever truly be falsified by observation, because you can force almost any observation to match most theories with enough special cases and extra rules added in. And it’s often quite difficult to tell whether a given extra rule is an important development in scientific knowledge, or merely motivated reasoning to protect a familiar theory. After all, if you claim that objects with different weights fall at the same speed, you then have to explain why that doesn’t apply to bowling balls and feathers.</p>
<p>This is often described as the <em>theory-ladenness of observation</em>. Even when we think we’re directly perceiving things, those perceptions are always mediated by our theories of how the world works and can’t be fully separated from them. This is most obvious when engaging in a complicated indirect experiment: there’s a lot of work going on between “I’m hearing a <a href="https://en.wikipedia.org/wiki/Geiger_counter">clicking sound</a> from this thing I’m holding in my hand” and “a bunch of atoms just ejected alpha particles from their nuclei”.</p>
<p>But even in more straightforward scenarios, any inference comes with a lot of theory behind it. I drop two things that weigh different amounts, and see that the heavier one falls faster—proof that Galileo was wrong!</p>
<p>Or even more mundanely: I look through my window when I wake up, see a puddle, and conclude that it rained overnight. Of course I’m relying on the assumption that when I look through my window I actually see what’s on the other side of it, and not, say, a clever science-fiction style holoscreen. But more importantly, my conclusion that it rained depends on a lot of assumptions I normally wouldn’t explicitly mention—that rain would leave a puddle, and that my patio would be dry if it hadn’t rained.</p>
<p>(In fact, I discovered several months after moving in that my air conditioner condensation tray overflows on hot days. So the presence of puddles doesn’t actually tell me that it rained overnight).</p>
<p>Even direct perception, what we can see right in front of us, is mediated by internal modeling our brains do to put our observations into some comprehensible context. This is why optical illusions work so well; they hijack the modeling assumptions of your perceptual system to make you “see” things that aren’t there.</p>
<p style="text-align:center"><img src="/assets/blog/scintillating-grid-illusion.png" alt="An example of the Scintillating Grid illusion." /></p>
<p style="text-align:center"><em>There are <a href="https://en.wikipedia.org/wiki/Grid_illusion#Scintillating_grid_illusion">no black dots in this picture</a>.</em> <br />
<em>Who are you going to believe: me, or your own eyes?</em></p>
<hr />
<h3 id="what-does-this-tell-us-about-science">What does this tell us about science?</h3>
<p>Kuhn divides scientific practice into three categories. The first he calls pre-science, where there is no generally accepted model to interpret observations. Most of life falls into this category—which makes sense, because most of life isn’t “science”. Subjects like history and psychology with multiple competing “schools” of thought are pre-scientific, because while there are a number of useful and informative models that we can use to understand parts of the subject, no single model provides a coherent shared context for all of our evidence. There is no unifying consensus perspective that basically explains everything we know.</p>
<p>A model that does achieve such a coherent consensus is called a <em>paradigm</em>. A paradigm is a theory that explains all the known evidence in a reasonable and satisfactory way. When there is a consensus paradigm, Kuhn says that we have “normal science”. And in normal science, the idea that scientists are just collecting more facts actually makes sense. Everyone is using the same underlying theory, so no one needs to spend time arguing about it; the work of science is just to collect more data to interpret within that theory.</p>
<p>But sometimes during the course of normal science you find <em>anomalies</em>, evidence that your paradigm can’t readily explain. If you have one or two anomalies, the best response is to assume that they really are anomalies—there’s something weird going on there, but it isn’t a problem for the paradigm.</p>
<p>A great example of an unimportant anomaly is the <a href="https://en.wikipedia.org/wiki/OPERA_experiment">OPERA experiment</a> from a few years ago that measured neutrinos traveling faster than the speed of light. This meant one of two things: either special relativity, a key component of the modern physics paradigm, was wrong; or there was an error somewhere in a delicate measurement process. Pretty much everyone assumed that the measurement was flawed, and pretty much everyone was right.</p>
<p>In contrast, sometimes the anomalies aren’t so easy to resolve. Scientists find more and more anomalies, more results that the dominant paradigm can’t explain. It becomes clear the paradigm is flawed, and can’t provide a satisfying explanation for the evidence. At this point people start experimenting with other models, and with luck, eventually find something new and different that explains all the evidence, old and new, normal and anomalous. A new paradigm takes over, and normal science returns.</p>
<p>(Notice that the old paradigm was never <em>falsified</em>, since you can always add epicycles to make the new data fit. In fact, the proverbial “epicycles” were added to the Ptolemaic model of the solar system to make it fit astronomical observations. In the early days of the Copernican model, it actually fit the evidence worse than the Ptolemaic model did—but it didn’t require the convoluted epicycles that made the Ptolemaic model work. Sabine Hossenfelder describes this process as, not falsification, but “implausification”: “a continuously adapted theory becomes increasingly difficult and arcane—not to say ugly—and eventually practitioners lose interest.”)</p>
<p>Importantly, Kuhn argued that two different paradigms would be <em>incommensurable</em>, so different from each other that communication between them is effectively impossible. I think this is sometimes overblown, but also often underestimated. Imagine trying to explain a modern medical diagnosis to someone who believes in four humors theory. Or remember how difficult it is to have conversations with someone whose politics are very different from your own; the background assumptions about how the world works are sometimes so different that it’s hard to agree even on basic facts.<strong title="If you're interested in the political angle on this more than the scientific, check out the talk I gave at TedxOccidentalCollege last year at https://www.youtube.com/watch?v=aTSrHfv9C94."><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></strong></p>
<h3 id="scotts-example-questions">Scott’s example questions</h3>
<p>Now I can turn to the very good questions Scott asks in section II of his book review.</p>
<blockquote>
<p>For example, consider three scientific papers I’ve looked at on this blog recently….What paradigm is each of these working from?</p>
</blockquote>
<p>As a preliminary note, if we’re maintaining the Kuhnian distinction between a paradigm on the one hand and a model or a school of thought on the other, it is plausible that none of these are working in true paradigms. One major difficulty in many fields, especially the social sciences, is that there isn’t a paradigm that unifies all our disparate strands of knowledge. But asking what possibly-incommensurable <em>model</em> or <em>theory</em> these papers are working from is still a useful and informative exercise.</p>
<p>I’m going to discuss the first study Scott mentions in a fair amount of depth, because it turned out I had a lot to say about it. I’ll follow that up by making briefer comments on his other two examples.</p>
<h4 id="cipriani-ioannidis-et-al">Cipriani, Ioannidis, et al.</h4>
<blockquote>
<p>– Cipriani, Ioannidis, et al perform a meta-analysis of antidepressant effect sizes and find that although almost all of them seem to work, amitriptyline works best.</p>
</blockquote>
<p>This is actually a great example of some of the ways paradigms and models shape science. The study is a meta-analysis of various antidepressants to assess their effectiveness. So what’s the underlying model here?</p>
<p>Probably the best answer is: “depression is a real thing that can be caused or alleviated by chemicals”. Think about how completely incoherent this entire study would seem to a Szasian who thinks that mental illnesses are just choices made by people with weird preferences, to a medieval farmer who thinks mental illnesses are caused by demonic possession, or to a natural-health advocate who thinks that “chemicals” are bad for you. The medical model of mental illness is powerful and influential enough that we often don’t even notice we’re relying on it, or that there are alternatives. But it’s not the only model that we could use.<strong title="In fact, this was my third or fourth answer in the first draft of this section. Then I looked at it again and realized it was by far the _best_ answer. That's how paradigms work: as long as everything is functioning normally, you don't even have to think about the fact that they're there."><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></strong></p>
<hr />
<p>While this is the best answer to Scott’s question, it’s not the only one. When Scott <a href="https://slatestarcodex.com/2018/02/26/ssc-journal-club-cipriani-on-antidepressants/">originally wrote about this study</a> he compared it to one he had done himself, which got very different results. Since they’re (mostly) studying the same drugs, in the same world, they “should” get similar results. But they don’t. Why not?</p>
<p>I’m not in any position to actually answer that question, since I don’t know much about psychiatric medications. But I <em>can</em> point out one very plausible reason: the studies made different modeling assumptions. And Scott highlights some of these assumptions himself in his analysis. For instance, he looks at the way Cipriani et al. control for possible bias in studies:</p>
<blockquote>
<p>I’m actually a little concerned about the exact way he did this. If a pharma company sponsored a trial, he called the pharma company’s drug’s results biased, and the comparison drugs unbiased….</p>
</blockquote>
<blockquote>
<p>But surely if Lundbeck wants to make Celexa look good [relative to clomipramine], they can either finagle the Celexa numbers upward, finagle the clomipramine numbers downward, or both. If you flag Celexa as high risk of being finagled upwards, but don’t flag clomipramine as at risk of being finagled downwards, I worry you’re likely to understate clomipramine’s case.</p>
</blockquote>
<blockquote>
<p>I make a big deal of this because about a dozen of the twenty clomipramine studies included in the analysis were very obviously pharma companies using clomipramine as the comparison for their own drug that they wanted to make look good; I suspect some of the non-obvious ones were too. If all of these are marked as “no risk of bias against clomipramine”, we’re going to have clomipramine come out looking pretty bad.</p>
</blockquote>
<p>Cipriani et al. had a model for which studies were producing reliable data, and fed it into their meta-analysis. Notice they aren’t denying or ignoring the numbers that were reported, but they <em>are</em> interpreting them differently based on background assumptions they have about the way studies work. And Scott is disagreeing with those assumptions and suggesting a different set of assumptions instead.</p>
<p>(For bonus points, look at <em>why</em> Scott flags this specific case. Cipriani et al. rated clomipramine badly, but Scott’s experience is that clomipramine is quite good. This is one of Kuhn’s paradigm-violating anomalies: the model says you should expect one result, but you observe another. Sometimes this causes you to question the observation; sometimes a drug that “everyone knows” is great actually doesn’t do very much. But sometimes it causes you to question the model instead.)</p>
<p>Scott’s model here isn’t really incommensurable with Cipriani et al.’s in a deep sense. But the difference in models does make <em>numbers</em> incommensurable. An odds ratio of 1.5 means something very different if your model expects it to be biased downwards than it does if you expect it to be neutral—or biased upwards. You can’t escape this sort of assumption just by “looking at the numbers”.</p>
<p>And this is true even though Scott and Cipriani et al. are largely working with the same sorts of models. They both believe in the medical model of mental illness. Their paradigm does include the idea that randomized controlled trials work, as Scott suggests in his piece. A bit more subtly, their shared paradigm also includes whatever instruments they use to measure antidepressant effectiveness. Since Cipriani et al. is actually a meta-analysis, they don’t address this directly. But each study they include is probably using some sort of questionnaire to assess how depressed people are. The numbers they get are only coherent or meaningful at all if you think that questionnaire is measuring something you care about.</p>
<hr />
<p>There’s one more paradigm choice here that I want to draw attention to, because it’s important, and because I know Scott is interested in it, and because we may be in the middle of a moderate paradigm shift right now.</p>
<p>Studies like this one tend to assume that a given drug will work about the same for everyone. And then people find that no antidepressant works consistently for everyone, and they all have small effect sizes, and conclude that maybe antidepressants aren’t very useful. But that’s hard to square with the fact that people regularly report massive benefits from going on antidepressants. We found an anomaly!</p>
<p>A number of researchers, including Scott himself, have suggested that any given person will respond well to some antidepressants and poorly to others. So when a study says that bupropion (or whatever) has a small effect on average, maybe that doesn’t mean bupropion isn’t helping anyone. Maybe instead it’s helping some people quite a lot, and it’s completely useless for other people, and so on average its effect is small but positive.</p>
<p>But this is a completely different way of thinking clinically and scientifically about these drugs. And it potentially undermines the entire idea behind meta-analyses like Cipriani et al. If our data is useless because we’re doing too much averaging, then averaging all our averages together isn’t really going to help. Maybe we should be doing something entirely different. We just need to figure out what.</p>
<h4 id="ceballos-ehrlich-et-al">Ceballos, Ehrlich et al.</h4>
<blockquote>
<p>– Ceballos, Ehrlich, et al calculate whether more species have become extinct recently than would be expected based on historical background rates; after finding almost 500 extinctions since 1900, they conclude they definitely have.</p>
</blockquote>
<p>I actually think Scott mostly answers his own questions here.</p>
<blockquote>
<p>As for the extinction paper, surely it can be attributed to some chain of thought starting with Cuvier’s catastrophism, passing through Lyell, and continuing on to the current day, based on the idea that the world has changed dramatically over its history and new species can arise and old ones disappear. But is that “the” paradigm of biology, or ecology, or whatever field Ceballos and Lyell are working in? Doesn’t it also depend on the idea of species, a different paradigm starting with Linnaeus and developed by zoologists over the ensuing centuries? It look like it dips into a bunch of different paradigms, but is not wholly within any.</p>
</blockquote>
<p>The paper is using a model where</p>
<ul>
<li>Species is a real and important distinction;</li>
<li>Species extinction is a thing that happens and matters;</li>
<li>Their calculated background rate for extinction is the relevant comparison.</li>
</ul>
<p>(You can in fact see a lot of their model/paradigm come through pretty clearly in the “Discussion” section of the paper— which is good writing practice.)</p>
<p>Scott seems concerned that it might dip into a whole bunch of paradigms, but I don’t think that’s really a problem. Any true unifying paradigm will include more than one big idea; on the other hand, if there isn’t a true paradigm, you’d expect research to sometimes dip into multiple models or schools of thought. My impression is that biology is closer to having a real paradigm than not, but I can’t say for sure.</p>
<h4 id="terrell-et-al">Terrell et al.</h4>
<blockquote>
<p>– Terrell et al examine contributions to open source projects and find that men are more likely to be accepted than women when adjusted for some measure of competence they believe is appropriate, suggesting a gender bias.</p>
</blockquote>
<p>Social science tends to be less paradigm-y than the physical sciences, and this sort of politically-charged sociological question is probably the least paradigm-y of all, in that there’s no well-developed overarching framework that can be used to explain and understand data. If you can look at a study and know that people will immediately start arguing about what it “really means”, there’s probably no paradigm.</p>
<p>There is, however, a model underlying any study like this, as there is for any sort of research. Here I’d summarize it something like:</p>
<ul>
<li>Gender is an interesting and important construct;</li>
<li>Acceptance rates for pull requests are a measure of (perceived) code quality;</li>
<li>Their program that evaluated “obvious gender cues” does a good job of evaluating gender as perceived by other GitHub users;</li>
<li>The “insider versus outsider” measure they report is important;</li>
<li>The confounders they check are important, and the confounders they don’t check aren’t.</li>
</ul>
<p>Basically, any time you get to do some comparisons and not others, or report some numbers and not others, you have to fall back on a model or paradigm to tell you which comparisons are actually important. Without some guiding model, you’d just have to report every number you measured in a giant table.</p>
<p>Now, sometimes people actually do this. They measure a whole bunch of data, and then they try to correlate everything with everything else, and see what pops up. This is <a href="https://statmodeling.stat.columbia.edu/2017/01/30/no-guru-no-method-no-teacher-just-nature-garden-forking-paths/">not usually good research practice</a>.</p>
<p>If you had exactly this same paper except, instead of “men and women” it covered “blondes and brunettes”, you’d probably be <em>able</em> to communicate the content of the paper to other people; but they’d probably look at you kind of funny, because why would that possibly matter?</p>
<h3 id="anomalies-and-bayes">Anomalies and Bayes</h3>
<p>Possibly the most interesting thing Scott has posted is his <a href="https://slatestarcodex.com/2019/01/10/paradigms-all-the-way-down/">Grand Unified Chart</a> relating Kuhnian theories to related ideas in other disciplines. The chart takes the Kuhnian ideas of “paradigm”, “data”, and “anomaly” and identifies equivalents from other fields. (I’ve flipped the order of the second and third columns here). In political discourse Scott relates them to “ideology”, “facts”, and “cognitive dissonance”; in psychology he relates them to “prediction”, “sense data”, and “surprisal”.</p>
<p>In the original version of the chart, several entries in the “anomalies” column were left blank. He has since filled some of them in, and removed a couple of other rows. I think his answer for the “Bayesian probability” row is wrong; but I think it’s interestingly wrong, in a way that effectively illuminates some of the philosophical and practical issues with Bayesian reasoning.</p>
<p>A quick informal refresher: in Bayesian inference, we start with some <em>prior probability</em> that describes what we originally believe the world is like, by specifying the probabilities of various things happening. Then we make observations of the world, and update our beliefs, giving our conclusion as a <em>posterior probability</em>.</p>
<p>The rule we use to update our beliefs is called <a href="https://en.wikipedia.org/wiki/Bayes_theorem">Bayes’s Theorem</a> (hence the name “Bayesian inference”). Specifically, we use the mathematical formula
\[
P(H |E) = \frac{ P(E|H) P(H)}{P(E)},
\]
where $P$ is the probability function, $H$ is some hypothesis, and $E$ is our new evidence.</p>
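<p>To make the formula concrete, here’s a small Python sketch (my own illustration, not from the original post) of a single Bayesian update for a coin that is either fair or two-headed, after observing one flip of heads:</p>

```python
def bayes_update(priors, likelihoods):
    """Return the posterior P(H|E) for each hypothesis H, given the
    prior probabilities P(H) and the likelihoods P(E|H)."""
    p_e = sum(likelihoods[h] * priors[h] for h in priors)  # total P(E)
    return {h: likelihoods[h] * priors[h] / p_e for h in priors}

# Prior: 99% the coin is fair, 1% it's two-headed.
priors = {"fair": 0.99, "two_heads": 0.01}

# Evidence: the coin came up heads. P(heads|fair) = 0.5,
# P(heads|two-headed) = 1.
likelihoods = {"fair": 0.5, "two_heads": 1.0}

posterior = bayes_update(priors, likelihoods)
print(posterior["two_heads"])  # roughly 0.0198, about double the prior
```

<p>A single heads roughly doubles the probability we assign to the two-headed hypothesis, which matches the intuition that heads is twice as likely under that hypothesis as under fairness.</p>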
<p>I have often drawn the same comparison Scott draws between a Kuhnian paradigm and a Bayesian prior. (They’re not exactly the same, and I’ll come back to this in a bit). And certainly Kuhnian “data” and Bayesian “evidence” correspond pretty well. But the Bayesian equivalent of the Kuhnian anomaly isn’t really the KL-divergence that Scott suggests.</p>
<p>KL-divergence is a mathematical way to measure how far apart two probability distributions are. So it’s an appropriate way to look at two priors and tell how different they are. But you never directly observe a probability distribution—just a collection of data points—so KL-divergence doesn’t tell you how surprising your data is. (Your prior does that on its own).</p>
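<p>For instance, here’s a direct computation of the KL-divergence between two priors over the three coin hypotheses (my own sketch, with made-up numbers):</p>

```python
from math import log

def kl_divergence(p, q):
    """D_KL(P || Q) = sum_x P(x) * log(P(x)/Q(x)).
    Terms with P(x) = 0 contribute nothing, so we skip them."""
    return sum(px * log(px / qx) for px, qx in zip(p, q) if px > 0)

# Two priors over {fair, two-headed, two-tailed}:
p = [0.98, 0.01, 0.01]  # quite confident the coin is fair
q = [0.90, 0.05, 0.05]  # a bit less confident
print(kl_divergence(p, q))  # a small positive number: the priors are close
```

<p>Note what the function takes as input: two distributions, not an observation. That’s the point here, since it can compare priors to each other but says nothing about how anomalous a particular data point is.</p>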
<p>But “surprising evidence” isn’t the same thing as an anomaly. If you make a new observation that was likely under your prior, you get an updated posterior probability and everything is fine. And if you make a new observation that was unlikely under your prior, you get an updated posterior probability and everything is fine. As long as the true<strong title=""True" isn't really the most accurate word to use here, but it works well enough and I want to avoid another thousand-word digression on the subject of metaphysics."><sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup></strong> hypothesis is in your prior at all, you’ll converge to it with enough evidence; that’s one of the great strengths of Bayesian inference. So even a very surprising observation doesn’t force you to rethink your model.</p>
<p>In contrast, if you make a new observation that was <em>impossible</em> under your prior, you hit a literal divide-by-zero error. If your prior says that $E$ can’t happen, then you can’t actually carry out the Bayesian update calculation, because Bayes’s rule tells you to divide by $P(E)$—which is zero. And this is the Bayesian equivalent of a Kuhnian anomaly.</p>
<p>We can imagine a <a href="https://en.wikipedia.org/wiki/Liar!_(short_story)">robot in an Asimov short story</a> encountering this situation, trying to divide by zero, and crashing fatally. But people aren’t quite so easy to crash, and an intelligently designed AI wouldn’t be either. We can do something that a simple Bayesian inference algorithm doesn’t allow: we can invent a new prior and start over from the beginning. We can shift paradigms.</p>
<hr />
<p>A theoretically perfect Bayesian inference algorithm would start with a <em>universal prior</em>—a prior that gives positive probability to every conceivable hypothesis and every describable piece of evidence. No observation would ever be impossible under the universal prior, so no update would require division by zero.</p>
<p>But it’s easier to talk about such a prior than it is to actually come up with one. The usual example I hear is the <a href="https://en.wikipedia.org/wiki/Solomonoff_induction">Solomonoff prior</a>, but it is known to be uncomputable. I would guess that any useful universal prior would be similarly uncomputable. But even if I’m wrong and a theoretically computable universal prior exists, there’s definitely no way we could actually carry out the infinitely many computations it would require.</p>
<p>Any practical use of Bayesian inference, or really any sort of analysis, has to restrict itself to considering only a few classes of hypotheses. And that means that sometimes, the “true” hypothesis <em>won’t be in your prior</em>. Your prior gives it a zero probability. And that means that as you run more experiments and collect more evidence, your results will look weirder and weirder. Eventually you might get one of those zero-probability results, those anomalies. And then you have to start over.</p>
<p>A lot of the work of science—the “normal” work—is accumulating more evidence and feeding it to the (metaphorical) Bayesian machine. But the most difficult and creative part is coming up with <em>better hypotheses</em> to include in the prior. Once the “true” hypothesis is in your prior, collecting more evidence will drive its probability up. But you need to add the hypothesis to your prior first. And that’s what a paradigm shift looks like.</p>
<hr />
<p>It’s important to remember that this is an analogy; a paradigm isn’t exactly the same thing as a prior. Just as “surprising evidence” isn’t an anomaly, two priors with slightly different probabilities put on some hypotheses aren’t operating in different paradigms.</p>
<p>Instead, a paradigm comes <em>before</em> your prior. Your paradigm tells you what counts as a hypothesis, what you should include in your prior and what you should leave out. You can have two different priors in the same paradigm; you can’t have the same prior in two different paradigms. Which is kind of what it means to say that different paradigms are incommensurable.</p>
<p>This is probably the biggest weakness of Bayesian inference, in practice. Bayes gives you a systematic way of evaluating the hypotheses you have based on the evidence you see. But it doesn’t help you figure out what sort of hypotheses you should be considering in the first place; you need some theoretical foundation to do that.</p>
<p>You need a paradigm.</p>
<hr />
<p><em>Have questions about philosophy of science? Questions about Bayesian inference? Want to tell me I got Kuhn completely wrong? Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a> or leave a comment below, and let me know!</em></p>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>If you’re interested in the political angle on this more than the scientific, check out the <a href="https://www.youtube.com/watch?v=aTSrHfv9C94">talk I gave at TedxOccidentalCollege last year</a>. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>In fact, this was my third or fourth answer in the first draft of this section. Then I looked at it again and realized it was by far the <em>best</em> answer. That’s how paradigms work: as long as everything is working normally, you don’t even have to think about the fact that they’re there. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>"True" isn’t really the most accurate word to use here, but it works well enough and I want to avoid another thousand-word digression on the subject of metaphysics. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Jay DaigleScott Alexander at Slate Star Codex has been blogging lately about Thomas Kuhn and the idea of paradigm shifts in science. This is a topic near and dear to my heart, so I wanted to take the opportunity to share some of my thoughts and answer some questions that Scott asked in his posts.Numerical Semigroups and Delta Sets2019-01-05T00:00:00-08:002019-01-05T00:00:00-08:00https://jaydaigle.net/blog/numerical-semigroups<p>In this post I want to outline my main research project, which involves non-unique factorization in numerical semigroups. I’m going to define semigroups and numerical semigroups; explain what non-unique factorization means; define the invariant I study, called the delta set; and talk about some of the specific questions I’m interested in.</p>
<h3 id="semigroups">Semigroups</h3>
<p>A <em>semigroup</em> is a set $S$ with one associative operation. This really just means we have a set of things, and some way of combining any two of them to get another. Semigroups generalize the more common idea of a <em>group</em>, which has an identity and inverses in addition to the associative operation. Every group is also a semigroup, but not every semigroup is a group.<strong title="There is also something called a &quot;monoid&quot;, which has an identity element but no inverses; thus every group is a monoid and every monoid is a semigroup. The presence of an identity element doesn't actually matter for any of the questions we're asking, so researchers use the terms &quot;semigroup&quot; and &quot;monoid&quot; more or less interchangeably."><sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup></strong></p>
<p>The simplest example of a semigroup is the natural numbers $\mathbb{N}$, with the operation of addition: we can add any two natural numbers together, but without negative numbers we don’t have any way to subtract, which would be an inverse. This is the free semigroup on one generator, which means we can get every element by starting with $1$ and adding it to itself some number of times.</p>
<p>Other examples of semigroups are:</p>
<ul>
<li>$\mathbb{N}^n, +$: ordered $n$-tuplets of natural numbers.</li>
<li>$\mathbb{N}, \times$: the natural numbers using multiplication as the operation. This has infinitely many generators, since we need to start with every prime number to get every possible natural number.</li>
<li>String Concatenation: we can take our set to be the set of all strings of English letters, and we combine two strings by just sticking the second one after the first.</li>
<li><a href="http://mathworld.wolfram.com/BlockMonoid.html">Block Monoids</a> are semigroups whose elements are lists of group elements that mulitiply out to zero under the operation of concatenation.</li>
</ul>
<p>Numerical semigroups, which are the main object I study, are formally defined as sub-semigroups of the natural numbers, but that phrase doesn’t actually explain a lot if you’re not already familiar with the field. Fortunately, I can explain what they actually are much less technically and more simply.</p>
<h3 id="numerical-semigroups">Numerical Semigroups</h3>
<p>We can define the numerical semigroup generated by $a_1, \dots, a_k$ to be the set of integers
\[
\langle a_1, \dots, a_k \rangle = \{ n_1 a_1 + \dots + n_k a_k : n_i \in \mathbb{Z}_{\geq 0} \}.
\]
In other words, our semigroup is the set of all the numbers you can get by adding up the generators some number of times, but without allowing subtraction.</p>
<p>I like to think about the <a href="https://arxiv.org/abs/1709.01606">Chicken McNugget semigroup</a> to explain this. When I was a kid, at McDonald’s you could get a 4-piece, 6-piece, or 9-piece order of Chicken McNuggets.<strong title="For some reason, they switched over to 4-, 6-, and 10-piece orders when I was a teenager. That semigroup is much less interesting, so I'm going to pretend that never happened."><sup id="fnref:2"><a href="#fn:2" class="footnote">2</a></sup></strong> And then we can ask: which numbers of nuggets is it possible to order?</p>
<p>You certainly can’t order one, two, or three nuggets. You can order four, but not five. You can order six, but not seven. You can get eight by ordering two 4-pieces, nine by ordering one 9-piece, and ten by ordering a 4-piece and a 6-piece. There’s no way to order exactly eleven nuggets, and it turns out we can get any number of nuggets past that exactly. (This makes eleven the <a href="https://en.wikipedia.org/wiki/Coin_problem">Frobenius number</a> for this semigroup). We can summarize all this in the table below:</p>
<p>\[
\begin{array}{cc}
1 & \text{not possible} \\\<br />
2 & \text{not possible} \\\<br />
3 & \text{not possible} \\\<br />
4 & = 1 \cdot 4 + 0 \cdot 6 + 0 \cdot 9 \\\<br />
5 & \text{not possible} \\\<br />
6 & = 0 \cdot 4 + 1 \cdot 6 + 0 \cdot 9 \\\<br />
7 & \text{not possible} \\\<br />
8 & = 2 \cdot 4 + 0 \cdot 6 + 0 \cdot 9 \\\<br />
9 & = 0 \cdot 4 + 0 \cdot 6 + 1 \cdot 9 \\\<br />
10 & = 1 \cdot 4 + 1 \cdot 6 + 0 \cdot 9 \\\<br />
11 & \text{not possible} \\\<br />
12 & = 3 \cdot 4 + 0 \cdot 6 + 0 \cdot 9 \\\<br />
& = 0 \cdot 4 + 2 \cdot 6 + 0 \cdot 9 \\\<br />
13 & = 1 \cdot 4 + 0 \cdot 6 + 1 \cdot 9
\end{array}
\]</p>
<p>Looking at this table you might notice something else: there are two rows for the number 12, because we can order 12 nuggets in two different ways: we can order three 4-piece orders, or two 6-piece orders. We call each of these ways of ordering twelve nuggets a <em>factorization</em> of 12 with respect to the generators $4,6,9$. And not only do we have two different factorizations of 12; they actually have different numbers of factors!</p>
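<p>This bookkeeping is easy to automate. Here is a brute-force Python sketch (the <code>factorizations</code> helper is my own illustration, not a standard library function) that enumerates every way to order $n$ nuggets from the generators $4, 6, 9$:</p>

```python
def factorizations(n, gens):
    """All ways to write n as a nonnegative integer combination of gens,
    returned as tuples of coefficients (one per generator)."""
    if not gens:
        return [()] if n == 0 else []
    g, rest = gens[0], gens[1:]
    return [(c,) + tail
            for c in range(n // g + 1)
            for tail in factorizations(n - c * g, rest)]

gens = [4, 6, 9]  # the Chicken McNugget generators
for n in range(1, 14):
    facts = factorizations(n, gens)
    print(n, facts if facts else "not possible")
```

<p>Running this reproduces the table: 1, 2, 3, 5, 7, and 11 print “not possible” (with 11 the last such number, the Frobenius number), and 12 is the first number with two factorizations, $(0,2,0)$ and $(3,0,0)$.</p>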
<p>If we look at larger numbers, the variety in factorizations becomes far greater. Consider this table of ways to factor 36:
\[
\begin{array}{cc}
\text{factorization} & \text{length} \\\<br />
9 \cdot 4 + 0 \cdot 6 + 0 \cdot 9 & 9 \\\<br />
6 \cdot 4 + 2 \cdot 6 + 0 \cdot 9 & 8 \\\<br />
3 \cdot 4 + 4 \cdot 6 + 0 \cdot 9 & 7 \\\<br />
3 \cdot 4 + 1 \cdot 6 + 2 \cdot 9 & 6 \\\<br />
0 \cdot 4 + 6 \cdot 6 + 0 \cdot 9 & 6 \\\<br />
0 \cdot 4 + 3 \cdot 6 + 2 \cdot 9 & 5 \\\<br />
0 \cdot 4 + 0 \cdot 6 + 4 \cdot 9 & 4
\end{array}
\]
We have seven distinct ways we can factor 36. The shortest has four factors and the longest has nine; every length in between is represented.</p>
<p>From here we can ask a number of questions. How many ways can we order a given number of chicken nuggets? How many different lengths can these factorizations have? What patterns can we find?</p>
<p>All this is very different from what we’re used to. When we factor integers into prime numbers, the <a href="https://en.wikipedia.org/wiki/Fundamental_theorem_of_arithmetic">Fundamental Theorem of Arithmetic</a> tells us that there is a unique way to do this. We generally learn this in grade school, and so from a very young age we’re used to having only one way to factor things. But this unique factorization property isn’t universal, and it doesn’t apply here.</p>
<p>Numerical semigroups essentially never have unique factorization. But we want to find ways to measure how not-unique their factorization is.</p>
<h3 id="the-delta-set">The Delta Set</h3>
<p>In my research I study something called the <em>delta set</em> of a semigroup. The delta set is a way of measuring how complicated the relationships among different factorizations can get.</p>
<p>For an element $x$ in a semigroup, we can look at all the factorizations of $x$, and then we can look at all the possible lengths of these factorizations. (In our example above, we had $\mathbf{L}(36) = \{4,5,6,7,8,9\}$; we don’t repeat the $6$ because we only care about which lengths are possible, and not how many times they occur). Then we can ask a bunch of questions about these sets of lengths.</p>
<p>A simple thing to compute is the <em>elasticity</em> of an element, which is just the ratio of the longest factorization to the shortest, and tells you how much the lengths can vary. (The elasticity of $36$ is $9/4$). A good exercise is to convince yourself that the largest elasticity of any element in a semigroup is the ratio of the largest generator to the smallest generator. (And thus that $36$ has the maximum possible elasticity for $\langle 4, 6, 9 \rangle$).</p>
<p>The delta set is a bit more complicated. The delta set of $x$ is the set of successive differences in lengths. So instead of looking at the shortest and longest factorizations, we look at all of them, and see what sort of gaps show up. (For our example, the delta set is just $\Delta(36) = \{1\}$, since there’s a factorization of each length between $4$ and $9$. If the set of lengths were $\{3,5,8,15\}$ then the delta set would be $\{2,3,7\}$).</p>
<p>We want to understand the whole semigroup, not just individual elements. So we often want to talk about the delta set of an entire semigroup, which is just the union of the delta sets of all the elements. So $\Delta(S)$ tells us what kind of gaps can appear in <em>any</em> set of lengths for any element of the semigroup. It turns out that for the Chicken McNugget semigroup $S = \langle 4,6,9 \rangle$, the delta set is just $\Delta(S) = \{1\}$. This means that the delta set of any element is just $\{1\}$, and thus that every set of lengths is a set of consecutive integers $\{n,n+1, \dots, n+k \}$.</p>
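<p>Both invariants are straightforward to compute by brute force. A self-contained Python sketch (the helper names here are my own, not standard terminology from any library):</p>

```python
from fractions import Fraction

def factorizations(n, gens):
    """All nonnegative integer combinations of gens summing to n."""
    if not gens:
        return [()] if n == 0 else []
    g, rest = gens[0], gens[1:]
    return [(c,) + t for c in range(n // g + 1)
            for t in factorizations(n - c * g, rest)]

def length_set(n, gens):
    """L(n): the sorted set of factorization lengths of n."""
    return sorted({sum(f) for f in factorizations(n, gens)})

def delta_set(n, gens):
    """Delta(n): the set of successive differences in the length set."""
    L = length_set(n, gens)
    return sorted({L[i + 1] - L[i] for i in range(len(L) - 1)})

gens = [4, 6, 9]
L = length_set(36, gens)
print(L)                          # [4, 5, 6, 7, 8, 9]
print(Fraction(L[-1], L[0]))      # elasticity 9/4
print(delta_set(36, gens))        # [1]
```

<p>This matches the computations above: $\mathbf{L}(36) = \{4,\dots,9\}$, elasticity $9/4$, and $\Delta(36) = \{1\}$.</p>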
<h3 id="what-do-we-know">What Do We Know?</h3>
<p>Delta sets can be a little tricky to compute. It’s fairly easy to show a number <em>is</em> in the delta set of a semigroup: find an element, calculate all the factorization lengths, and see that you have a gap of the desired size. But to show that a number is not in the delta set of the semigroup, you have to show that it isn’t in the delta set of any element, which is much trickier.</p>
<p>However, there are a few things we do know.</p>
<ul>
<li>
<p>The smallest element of the delta set is the greatest common divisor of the other elements of the delta set. This means that $\{2,3\}$ can’t be the delta set of any semigroup, since $2$ isn’t the GCD of $2$ and $3$.</p>
</li>
<li>
<p>If $S = \langle a, b \rangle$ is generated by exactly two elements, then $\Delta(S) = \{b - a\}$. More generally, if $S = \langle a, a+d, a+2d, \dots, a+kd \rangle$ then $\Delta(S) = \{d\}$. (We call such semigroups “arithmetic semigroups” since their generating set is an <a href="https://en.wikipedia.org/wiki/Arithmetic_progression">arithmetic progression</a>).</p>
</li>
<li>
<p>For any numerical semigroup $S$, there is a finite collection of (computable) elements called the <em>Betti elements</em>, and the maximum element of the delta set of $S$ is in the delta set of at least one of the Betti elements.</p>
</li>
<li>
<p>Finally and most importantly, the delta set is eventually periodic. This means that if you check the delta sets for a (possibly large but known) number of elements of the semigroup, you will see everything you can possibly see. This makes it possible to compute the delta set of any given semigroup and know you haven’t left anything out. <strong title="This result was originally proven by Scott Chapman, Rolf Hoyer, and Nathan Kaplan in 2008, during an undergraduate REU research program I was also participating in. But the original result had an unfortunately large bound, so using this to compute delta sets wasn't really practically feasible. In 2014, a paper by J. I. García-García, M. A. Moreno-Frías, and A. Vigneron-Tenorio improved the bound dramatically and made computation of delta sets feasible on personal computers."><sup id="fnref:3"><a href="#fn:3" class="footnote">3</a></sup></strong></p>
</li>
</ul>
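<p>These facts are easy to probe experimentally. The sketch below (again my own helper functions, not library routines) takes the union of the element delta sets up to a cutoff; checking up to a cutoff is a heuristic rather than a proof, but the eventual-periodicity result means a generous enough cutoff does see everything:</p>

```python
def factorizations(n, gens):
    """All nonnegative integer combinations of gens summing to n."""
    if not gens:
        return [()] if n == 0 else []
    g, rest = gens[0], gens[1:]
    return [(c,) + t for c in range(n // g + 1)
            for t in factorizations(n - c * g, rest)]

def semigroup_delta(gens, bound):
    """Union of the delta sets of all elements below bound."""
    deltas = set()
    for n in range(1, bound):
        L = sorted({sum(f) for f in factorizations(n, gens)})
        deltas |= {L[i + 1] - L[i] for i in range(len(L) - 1)}
    return sorted(deltas)

print(semigroup_delta([5, 8], 150))       # [3]: Delta(<a, b>) = {b - a}
print(semigroup_delta([3, 10, 17], 150))  # [7]: arithmetic with d = 7
print(semigroup_delta([4, 6, 9], 150))    # [1]
```
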
<p>But this is nearly everything that we really know about delta sets. There are a lot of open questions left, which primarily fall into two categories:</p>
<ol>
<li>
<p>For some nice category of semigroup, compute the delta set. We’ve already seen this question answered for semigroups generated by arithmetic sequences; we also have complete or partial answers for semigroups generated by <a href="https://pdfs.semanticscholar.org/a100/c2ba10554d593c6f7c59245f330117d5d2c6.pdf">generalized arithmetic sequences</a>, geometric sequences, and <a href="https://arxiv.org/abs/1503.05993">compound sequences</a>.</p>
</li>
<li>
<p>The <em>realization problem</em>: given a set of natural numbers, is it the delta set of some numerical semigroup? We don’t actually know a lot about this. About the only thing that we know <em>can’t</em> happen is a minimum element that isn’t the GCD of the set. But to show that something <em>can</em> happen, about all we can do is find a specific semigroup that has that delta set. There’s a lot of room to explore here.</p>
</li>
</ol>
<h3 id="non-minimal-generating-sets">Non-Minimal Generating Sets</h3>
<p>In my research I introduce one more complication. Earlier we talked about the Chicken McNugget semigroup, built from all the orders we can assemble out of 4, 6, or 9 chicken nuggets. But McDonald’s also offers a 20-piece order of chicken nuggets.<strong title="My parents would never let me order this when I was a child, and I'm still bitter."><sup id="fnref:4"><a href="#fn:4" class="footnote">4</a></sup></strong></p>
<p>From a purely algebraic perspective, this doesn’t change anything. Anything we can get with 20-piece orders, we can get with a combination of 4- and 6-piece orders, so we have the same set and the same operation, and thus the same semigroup. (We say that 20 isn’t “irreducible” because we can factor it into other simpler elements). So in this sense, nothing should change.</p>
<p>But the set of factorizations does change. If we replicate our earlier table of factorizations of 36 but now allow $20$ as a factor, we get
\[
\begin{array}{cc}
\text{factorization} & \text{length} \\\<br />
9 \cdot 4 + 0 \cdot 6 + 0 \cdot 9 + 0 \cdot 20 & 9 \\\<br />
6 \cdot 4 + 2 \cdot 6 + 0 \cdot 9 + 0 \cdot 20 & 8 \\\<br />
3 \cdot 4 + 4 \cdot 6 + 0 \cdot 9 + 0 \cdot 20 & 7 \\\<br />
3 \cdot 4 + 1 \cdot 6 + 2 \cdot 9 + 0 \cdot 20 & 6 \\\<br />
0 \cdot 4 + 6 \cdot 6 + 0 \cdot 9 + 0 \cdot 20 & 6 \\\<br />
0 \cdot 4 + 3 \cdot 6 + 2 \cdot 9 + 0 \cdot 20
& 5 \\\<br />
\color{blue}{4 \cdot 4 + 0 \cdot 6 + 0 \cdot 9 + 1 \cdot 20}
& \color{blue}{5} \\\<br />
0 \cdot 4 + 0 \cdot 6 + 4 \cdot 9 + 0 \cdot 20
& 4 \\\<br />
\color{blue}{1 \cdot 4 + 2 \cdot 6 + 0 \cdot 9 + 1 \cdot 20 }
& \color{blue}{4}
\end{array}
\]
The extra generator gives us the two additional factorizations in blue.</p>
<p>Now every question we asked about factorizations in numerical semigroups, we can ask again for factorizations with respect to our non-minimal generating set. For instance, we can ask for the delta set with respect to our generating set. For 36 above, we see that the delta set is still $\{1\}$, just as it was before; nothing has changed.</p>
<p>But let’s look instead at the element 20. With our old generating set of $4,6,9$, we can only get 20 nuggets in two ways. But with our non-minimal generating set, we have three different ways to order 20 nuggets: $20 = 5 \cdot 4 = 2 \cdot 4 + 2 \cdot 6 = 1 \cdot 20$. These three “factorizations” have lengths 5, 4, and 1, and a little experimentation will convince you that they’re the only possible factorizations. Therefore our set of lengths is $\mathbf{L}(20) = \{1,4,5\}$ and the delta set is $\Delta(20) = \{1,3\}$.</p>
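<p>The same brute-force enumeration confirms this (a self-contained sketch; the helper names are my own):</p>

```python
def factorizations(n, gens):
    """All nonnegative integer combinations of gens summing to n."""
    if not gens:
        return [()] if n == 0 else []
    g, rest = gens[0], gens[1:]
    return [(c,) + t for c in range(n // g + 1)
            for t in factorizations(n - c * g, rest)]

def length_set(n, gens):
    return sorted({sum(f) for f in factorizations(n, gens)})

def delta_set(n, gens):
    L = length_set(n, gens)
    return sorted({L[i + 1] - L[i] for i in range(len(L) - 1)})

gens = [4, 6, 9, 20]             # non-minimal: 20 = 2*4 + 2*6
print(factorizations(20, gens))  # [(0, 0, 0, 1), (2, 2, 0, 0), (5, 0, 0, 0)]
print(length_set(20, gens))      # [1, 4, 5]
print(delta_set(20, gens))       # [1, 3]
```
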
<p>This is a big change! With the original, minimal generating set, the delta set of the <em>entire semigroup</em> was $\{1\}$. There was no element with a length gap larger than 1. But by adding a new generator in, we can get an element whose delta set is $\{1,3\}$. And a little experimentation shows us that
\[
26 = 5 \cdot 4 + 1 \cdot 6 = 2 \cdot 4 + 3 \cdot 6
= 2 \cdot 4 + 2 \cdot 9 = 1 \cdot 6 + 1 \cdot 20
\]
and thus $\mathbf{L}(26) = \{2,4,5,6\}$ and $\Delta(26) = \{1,2\}$. So the delta set for the entire semigroup is $\{1,2,3\}$.<strong title="I haven't actually shown that you can't get a gap bigger than 3. But it's true."><sup id="fnref:5"><a href="#fn:5" class="footnote">5</a></sup></strong> We’ve gotten a different delta set for the exact same semigroup, but using a different set of generators.</p>
<p>This raises a number of questions for us to study. We can start with our previous two questions: given a semigroup (and a non-minimal set of generators), what is the delta set? And given a set, is it the delta set of some semigroup and non-minimal generating set? But we also have a new question: what happens to the delta set of a semigroup as we continually add things to the generating set? Can we make the delta set bigger? Can we make it smaller? What ways of adding generators produce interesting patterns?</p>
<p>There’s a lot of fertile ground here. A few questions have been answered already, in a <a href="https://www.tandfonline.com/doi/abs/10.1080/00927870903045165">paper I cowrote with Scott Chapman, Rolf Hoyer, and Nathan Kaplan in 2010</a>. For instance, it is always possible to force the delta set to be $\{1\}$ by adding more elements to the generating set. A couple other groups have done some work since then, but as far as I know, nothing else has been published.</p>
<p>But hopefully I’ve convinced you that there are quite a few interesting and unanswered questions in this field. Many of the answers should be accessible with a bit of work, and I hope to be able to provide some of them soon.</p>
<hr />
<p><em>Have a question about numerical semigroups? Factorization theory? My research? Tweet me <a href="https://twitter.com/profjaydaigle">@ProfJayDaigle</a> or leave a comment below.</em></p>
<hr />
<div class="footnotes">
<ol>
<li id="fn:1">
<p>There is also something called a “monoid”, which has an identity element but no inverses; thus every group is a monoid and every monoid is a semigroup. The presence of an identity element doesn’t actually matter for any of the questions we’re asking, so researchers use the terms “semigroup” and “monoid” more or less interchangeably. <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
<li id="fn:2">
<p>For some reason, they switched over to 4-, 6-, and 10-piece orders when I was a teenager. That semigroup is much less interesting, so I’m going to pretend that never happened. <a href="#fnref:2" class="reversefootnote">↩</a></p>
</li>
<li id="fn:3">
<p>This result was originally <a href="https://link.springer.com/article/10.1007%2Fs00010-008-2948-4">proven by Scott Chapman, Rolf Hoyer, and Nathan Kaplan in 2008</a>, during an undergraduate REU research program I was also participating in. But the original result had an unfortunately large bound, so using this to compute delta sets wasn’t really practically feasible. In 2014, a <a href="https://arxiv.org/abs/1406.0280">paper by J. I. García-García, M. A. Moreno-Frías, and A. Vigneron-Tenorio</a> improved the bound dramatically and made computation of delta sets feasible on personal computers. <a href="#fnref:3" class="reversefootnote">↩</a></p>
</li>
<li id="fn:4">
<p>My parents would never let me order this when I was a child, and I’m still bitter. <a href="#fnref:4" class="reversefootnote">↩</a></p>
</li>
<li id="fn:5">
<p>I haven’t actually shown that you can’t get a gap bigger than $3$. But it’s true. <a href="#fnref:5" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Jay DaigleIn this post I want to outline my main research project, which involves non-unique factorization in numerical semigroups. I’m going to define semigroups and numerical semigroups; explain what non-unique factorization means; define the invariant I study, called the delta set; and talk about some of the specific questions I’m interested in.