Jay's Blog

Math, Teaching, Literature, and Life

Recent Posts:

Subscribe via RSS

Twitter: @ProfJayDaigle

Lockdown Recipes: Red Beans and Rice

May 25, 2020

Since we’re all stuck at home and cooking more than usual, I wanted to share one of my favorite recipes from my childhood, which is also especially suited to our current stuck-at-home ways.1


Red Beans and Rice is a traditional Louisiana Creole dish. It’s cheap and extremely easy and low effort to make. The one major downside is that it takes several hours of simmering (that don’t require any attention); in normal times that’s a major disadvantage, but if you’re working from home that’s not a problem at all.

In fact, this dish was originally a solution to a working-from-home dilemma that Louisiana cooks faced. Monday was laundry day, and the women of the house were so busy doing the wash that they couldn’t spend all day tending food on the stove. So this hands-off dish became a traditional Monday dinner.

There are a lot of ways you can vary this dish. I’ll give two straightforward recipes: one for the traditional stovetop method, and a faster pressure-cooker method that I use during busier times that takes less time, both in prep and in waiting. But I also want to talk about what some of the steps are doing, and how you can change things up to get different flavor profiles if you want.



Traditional red beans

  1. In a large (at least two gallons) pot, melt the butter over medium heat. Sweat the onions, celery, and bell peppers for 5-10 minutes, until soft and onions are translucent.
  2. Add garlic, parsley, and thyme and sautee for a couple minutes more, until soft.
  3. Rinse kidney beans and add them to pot. Add water (or stock) until covered by an inch or two of water, and heat to a high simmer. Cover pot and leave to simmer.
  4. After a half hour or so, add meat and tomato paste, and stir to combine. Return to a simmer and cover.
  5. After another hour, add seasonings. Return to a simmer and cover again.
  6. Once every hour or so, check on the pot. Top it off with extra liquid if it’s starting to run low, and scrape the bottom a bit to make sure nothing is sticking to the bottom.
  7. After six to eight hours, the beans should be basically disintegrated: you’ll see the shells floating in the liquid, but the insides of the bean will have absorbed into the liquid base and formed a rich, thick paste. At this point you might want to taste it and adjust seasonings to your preference.
  8. Serve over rice.

Pressure cooker red beans

Rinse the red beans. Then dump all the ingredients in the pressure cooker. Cook on high pressure for two hours, then simmer until consistency is good. Serve over rice.

(See how easy that was?)



Onions, celery, and bell peppers are the traditional base for New Orleans stocks and soups, known as the “Holy Trinity”. They serve the same role as the French mirepoix (onions, celery, and carrots) or the Spanish sofrito (garlic, onion, peppers, and tomatoes). If you like those other flavor profiles more, you can substitute a different aromatic base. You can also use whatever fat you like for the sauteeing.

Some people like to brown their aromatics, while others like to gently sweat them without browning. The flavor profiles are slightly different, so take your pick.

If you want to speed things up a bit, you can sweat your aromatics in a separate skillet while starting the boil on the red beans. I often find this easier to manage, not needing to stir the aromatics in the giant stock pot, but it does require a second pan.


The most important aspect here is the kidney beans. It is very important that they stay at a full boil for at least half an hour; kidney beans are toxic and it takes a good boiling to break those toxins down.

A lot of people like to soak their beans overnight before cooking with them. This makes the toxins break down a bit easier, and also makes them cook faster; it probably cuts the cooking time from eight hours or so down to six. It changes the flavor in a way I don’t like, so I don’t do it. But you might prefer that flavor!

You can definitely substitute in other beans, but you’ll get a different texture. Kidney beans are extremely tough and starchy and give the stock a nice body when completely broken down.

I like the flavor effect of adding a can of tomato paste, but it’s not especially traditional. This is totally optional.

Because the red beans add body, this broth works just fine with plain water. But if you have stock in your kitchen it can add extra layers of flavor and body to your dish. I generally start with homemade stock, and top it off with water as the cooking continues.


You can flavor this broth with nearly any meat you have. Traditionally, the cook would use the leftover bones from the Sunday roast to flavor the red bean broth on Monday. If you happen to have some chicken or pork bones left over, you can do far worse than adding them to the pot.

When I’m doing it in the pressure cooker, I often like to take a 3-4 pound bone-in pork shoulder and add that in place of the sausage. I get the broth richness from the bone, and the meat of the pork shoulder falls off into the stew nicely. I haven’t tried this in the traditional method but I’m sure it would work.

If you do use pre-chopped meat like sausage, you can brown it in a separate pan for extra flavor. Extra steps and an extra pan, but extra flavor; your call whether it’s worth it.

Andouille sausage is probably the most standard sausage choice right now. It’s spicy, so you may want something milder. It’s also a bit more expensive than I tend to want to go for this dish; the sausage can easily be more than half the cost of the entire dish. My default option is Hillshire Farms smoked sausage, but you can use whichever firm sausage you like.

And the dish does work fine with no meat at all, if you’d prefer a vegetarian option. Replace the butter with oil and you can make it vegan.


This is really flexible. To be honest, I primarily season with a healthy dose of Tony Chachere’s spice mix. I also add the sugar, and either a dollop of oyster sauce or a pinch of MSG powder.

But there are of course lots of options here. I don’t think the mustard is super traditional, but I very much like the effect.

Almost any spices you like can go here. I suspect coriander would be good. Swap out the cayenne pepper for black pepper, or for Tabasco sauce (very traditional in New Orleans food). Or you could change up the flavor profile entirely and push it towards your favorite cuisine. Use an Italian spice blend, or a Mexican blend, or an Indian blend, whatever strikes your fancy. And if you find something that works really well—let me know!

Did you make this? What did you think? Do you have a favorite lockdown recipe to share? Tweet me @ProfJayDaigle or leave a comment below.

  1. Yeah, it would have made even more sense to post this two months ago. But two months ago I was trying to figure out how to teach three math classes over the internet instead of recipeblogging. 


The SIR Model of Epidemics

March 27, 2020

For some reason, a lot of people have gotten really interested in epidemiology lately. Myself included.

Picture of a coronavirus, by Alissa Eckert, MS and Dan Higgins, MAMS, courtesy of the CDC

I have no idea why.

Now, I’m not an epidemiologist. I don’t study infectious diseases. But I do know a little about how mathematical models work, so I wanted to explain how one of the common, simple epidemiological models works. This model isn’t anywhere near good enough to make concrete predictions about what’s going to happen. But it can give some basic intuition about how epidemics progress, and provide some context for what the experts are saying.

Disclaimer: I don’t study epidemics, and I don’t even study differential equation models like this one. I’m basically an interested amateur. I’m going to try my best not to make any predictions, or say anything specific about COVID-19. I don’t know what’s going to happen, and you shouldn’t listen to my guesses, or the guesses of anyone else who isn’t an actual epidemiologist.

The SIR Model


The SIR model divides the population into three groups, which give the model its name:

Picture of a Knight, by Paul Mercuri (1860)

Not that kind of “sir”.

For the purposes of this model, we assume that the total number of people, $N$, doesn’t change. But the number of people in each $S,I,R$ group is changing all the time: susceptible people get infected, and infected people recover. So we write $S(t)$ for the number of susceptible people “at time $t$”—which is just a fancy way of saying that $S(3)$ means the number of susceptible people on the third day.

Change Over Time

In order to model how these groups evolve over time, we need to know how often those two changes happen. How quickly do sick people recover? And how quickly do susceptible people get sick?

The first question, in this model, is simple. Each infected person has a chance of recovering each day, which we call $\gamma$. So if the average person is sick for two weeks, we have $\gamma = \frac{1}{14}$. And on each day, $\gamma I$ sick people recover from the virus.

The second question is a little trickier. There are basically three things that determine how likely a susceptible person is to get sick: how many people they encounter in a day, what fraction of those people are sick, and how likely a sick person is to transmit the disease. The middle factor, the fraction of people who are sick, is $\frac{I}{N}$. We could think about the other two separately, but for mathematical convenience we group them together and call them $\beta$.

So the chance that a given susceptible person gets sick on each day is $\beta \frac{I}{N}$.2 And thus the total number of people who get sick each day is $\beta \frac{I}{N} S$.

If these letters look scary, it might help to realize that you’ve probably spent a lot of time lately thinking about $\beta$—although you probably didn’t call it that. The parameter $\beta$ measures how likely you are to get sick. You can decrease it by reducing the number of people you encounter in a day, through “social distancing” (or physical distancing). And you can decrease it by improved hygiene—better handwashing, not touching your face, and sterilizing common surfaces.

There’s one more number we can extract from this model, which you might have heard of. In a population with no resistance to the disease (so $S$ and $I$ are both small, and we can pretend that $S=N$), a sick person will infect $\beta$ people each day, and will be sick for $\frac{1}{\gamma}$ days, and so will infect a total of $\frac{\beta}{\gamma}$ people. We call this ratio is $R_0$; you may have seen in the news that the $R_0$ for COVID-19 is probably about $2.5$.

A graph demonstrating exponential growth when R0 = 2

When $\beta$ is twice as big as $\gamma$, things can get bad very quickly. From The Conversation, licensed under CC BY-ND

Assumptions and Limitations

Like all models, this is a dramatic oversimplification of the real world. Simplifcation is good, because it means we can actually understand what the model says, and use that to improve our intuitions. But we do need to stay aware of some of the things we’re leaving out, and think about whether they matter.

First: the model assumes a static population: no one is born and no one dies. This is obviously wrong but it shouldn’t matter too much over the months-long timescale that we’re thinking about here. On the other hand, if you want to model years of disease progression, then you might need to include terms for new susceptible people being born, and for people from all three groups dying.

Second: the model assumes that recovery gives permanent immunity. Everyone who’s infected will eventually transition to recovered, and recovered people never lose their immunity and become susceptible again. I don’t think we know yet how many people develop immunity after getting COVID-19, or how long that immunity lasts.

But it seems basically reasonable to assume that most people will get immunity for at least several months; in this model we’re simplifying that to assume “all” of them do. And since we’re only trying to model the next several months, it doesn’t matter for our purposes whether immunity will last for one year or ten.

Third: we assumed that $\beta$ and $\gamma$ are constants, and not changing over time. But a lot of the response to the coronavirus has been designed to decrease $\beta$—and the extent of those changes may vary over time. People will be more or less careful as they get more or less worried, as the disease gets worse or better. And people might just get restless from staying home all the time and start being sloppier. An improved testing regime might also decrease $\beta$, and better treatments could improve $\gamma$.

But the model leaves $\beta$ and $\gamma$ the same at all times. So we can imagine it as describing what would happen if we didn’t change our lifestyle or do anything in response to the virus.

Finally: the first two factors, combined, mean that the susceptible population can only decrease, and the recovered population can only increase. Since we also hold $\beta$ and $\gamma$ constant, this model of the pandemic will only have one peak. It will never predict periodic or seasonal resurgences of infection, like we see with the flu.

graph of flu deaths, 2010 - 2014

A graph of flu deaths per week, peaking each winter, from the CDC. The vanilla SIR model will never produce this sort of periodic seasonal pattern.

stylized graph of possible COVID-19 trajectories

This green curve imagines a “dance” where we suppress coronavirus infections through an aggressive quarantine, and then spend months alternately relaxing the quarantine until infections get too high, and then tightening it again until infections fall back down. The SIR model doesn’t allow this sort of dynamic variation of $\beta$ and can never produce the green curve.

The Whole System

If we put all this together we get a system of ordinary nonlinear differential equations. A differential equation is an equation that talks about how quickly something changes; in these equations, we have the rates at which the number of susceptible, infected, and recovered people change. “Ordinary” means that there’s only one input variable; all the parameters change with time, but we’re not taking location as an input or anything. “Nonlinear” means that our equations aren’t in a specific “linear” form that’s really easy to work with.

Photo of a Kitten

Calling these equations a “nonlinear system” is a lot like calling this kitten a “nondog animal”. It’s not wrong, but it’s kind of weirdly specific if you’re not at a dog show.

If you took calculus, you might remember that we often write $\frac{dS}{dt}$ to mean the rate at which $S$ is changing over time. Roughly speaking, it’s the change in the total number of susceptible people over the course of a day. We know that $S$ is decreasing, since susceptible people get sick but we’re assuming that people don’t become susceptible, so $\frac{dS}{dt}$ is negative. And specifically, we worked out that $\frac{dS}{dt}$ is $-\beta \frac{IS}{N}$, since that’s the number of people who get sick each day.

Similarly, we saw that $\frac{dR}{dt}$ is $\gamma I$, the number of people who recover each day. And $\frac{dI}{dt}$ is the number of people who get sick minus the number who recover. All together this gives us:

\begin{align} \frac{dS}{dt} & = - \beta \frac{IS}{N} \\\
\frac{dI}{dt} &= \beta \frac{IS}{N} - \gamma I \\\
\frac{dR}{dt} & = \gamma I \end{align}

What Did We Learn?

Now that we have this model, what’s the point? We can actually do a few different things with a model like this. If we want, we can write down an exact formula that tells us how many people will be sick on each day. Unfortunately, the exact formula isn’t actually all that helpful. The paper I linked includes lovely equations like

And I don’t want to touch a formula that looks like that any more than you do.

Even if the formula were nicer, it wouldn’t be all that useful. Getting an exact solution to the equations doesn’t mean we know exactly how many people are going to get sick. Like all models, this one is a gross oversimplification of the real world. It’s not useful for making exact predictions; and if you want predictions that are kinda accurate, you should talk to the epidemiological experts, who have much more complicated models and much better data.

Qualitative Judgments

But this model does give us a qualitative sense of how epidemics progress. For instance, in the very early stages of the epidemic, almost everyone will be susceptible. So we can make a further simplifying assumption that $S = N, I = R =0$, and get the equation This is famously the equation for exponential growth. And indeed, graphs of new coronavirus infections seem to start nearly perfectly exponential.

Comparison of reported Chinese cases with exponential curve

This graph from I24 news of reported infections in China almost perfectly matches the exponential curve.

Linear and logarithmic scale plots of US and Italian coronavirus cases

This New York Times graph shows the exponential curves in both the US and Italy on the left. The right-hand logarithmic plots look nearly like straight lines, which which also reflects the exponential growth pattern.

As the epidemic progresses, the numbers of infected and recovered people climb. Each sick person will infect fewer additional people, since more of the people they meet are immune. We can see this in the model: the number of people who get infected each day is $\beta \frac{S}{N} I$. After many people have gotten sick, $\frac{S}{N}$ goes down and so fewer people get infected for a given value of $I$.

The epidemic will peak when people are recovering at least as fast as they get sick. This happens when $\beta \frac{IS}{N} \leq \gamma I$, and thus when $S = \frac{\gamma}{\beta} N$. Remember that $\frac{\beta}{\gamma}$ was our magic number $R_0$, so by the peak of the epidemic, only one person out of every $R_0$ people will have avoided getting sick.

If the estimates of $R_0 \approx 2.5$ are correct, this would mean that the epidemic would peak when something like 60% of the population had gotten sick. And remember, that’s not the end of the epidemic; that’s just the worst part. It would slowly get weaker from that time on, until it eventually fizzles.

(These are not predictions, for many reasons. I’m not an epidemiologist. Any real epidemiologist would be using a much more sophisticated model than this one to try to make real predictions. Don’t pay attention to the specific numbers I use here. But you can get a qualitative sense of what changing these numbers would do—and have more context for understanding what the real experts tell you.)


Predictions from actual experts use a ton of data and consider a huge range of possibilities, and generally look like this table from a team at Imperial College London.

Numeric Simulations

There’s one more thing that toy models like this can do. We can use them to run numeric simulations (using Euler’s method or something similar). We can see what would happen under our assumptions, and how the results change if we vary those assumptions.

Below is some code for the SIR model written in SageMath. (I borrowed the code from this page at Clemson; I believe the code was written by Jan Medlock.) I’ve primed it with $\gamma = .07$, which means that people are sick for two weeks on average, and $\beta = .2$, which gives us an $R_0$ of about $2.8$.

If you just click “Evaluate”, you’ll see what happens if we run this model using those values of $\beta$ and $\gamma$ over the next 400 days. It’s pretty grim; the epidemic peaks two months out with a sixth of the country sick at once (the red curve), and in six months well over 80% of the country has fallen ill at some point (the blue curve).3

But with this widget you can play with those assumptions. What happens if we find a way to cure people faster, so $\gamma$ goes down? What if we lower $\beta$, by physical distancing or improved hygiene? The graph improves dramatically. And you can change up all the numbers if you want to. Play around, and see what you learn.

And stay safe out there.

Have a question about the SIR model? Have other good resources on this to point people at? Or did you catch a mistake? Tweet me @ProfJayDaigle or leave a comment below.

And take care of yourself.

  1. Or people who are asymptomatic carriers. This model doesn’t worry about who actually gets a fever and starts coughing, just who carries the virus and can maybe infect others. 

  2. If we’re being fancy, we say that the chance of getting sick is proportional to $\frac{I}{N}$ and that $\beta$ is the constant of proportionality. But if you’re not used to differential equations already I’m not sure that tells you very much. 

  3. Reminder: I don’t believe that this will happen, for many reasons. And you shouldn’t listen to me if I did. Numbers are for illustrative purposes only and should not be construed as epidemiological advice. 


Online Teaching in the Time of Coronavirus

March 14, 2020

I’ve been spending a lot of the past week looking at different options for transitioning my teaching online for the rest of the term. There are certainly people far more expert at online instruction than I am, but I wanted to share some of my thoughts and what I’ve found.

Handling Assignments

Online Assignment Options

There are a lot of options for doing homework online. Many of these products (like WebAssign) have temporarily made everything freely available. I’m sure some of them are good, but I don’t know much about them.

This term I’ve been experimenting with using the MAA’s WeBWork system, which has been going quite well. If you can administer your own server it’s completely free; if you can’t, the MAA will give you one trial class and then charge $200 per course you want to host. I don’t know how willing they are to start these up mid-semester, though. WeBWork is hardly a solution to everything, but it works very well for questions with numerical or algebraic answers.

(With WeBWork you can even give assignments that have to be completed inside a narrow window–say, an assignment that is only answerable between 2 and 3:30 on Thursday. So we could maybe use this to somewhat replace tests. Though again, not perfectly.)

Written Homework

Of course, some assignments really need to include a written component. Written homework probably can just be photographed (or scanned) with a mobile phone; I expect most of our students have access to some sort of digital camera. I don’t know anything about the scanning apps but I know they exist. I have in fact graded photographed homework before, and my student graders have expressed a willingness to do this for the rest of the term.

We can also consider encouraging our students, especially in upper-division classes, to start using LaTeX for more assignments. That’s an unreasonable imposition on Calc 1 students but most of the people in the upper-level classes have probably been exposed to it, and it would make a lot of this much simpler. No scanning, no photographing, just emailing in PDFs.

Lectures and Office Hours

I purchased a writing tablet for my computer. This is a peripheral that plugs into your computer and allows you to write/draw with a pen. I specifically ordered a Huion 1060 Plus, which gives a 10x6 writing area and goes for $70 on Amazon. I haven’t gotten to test it yet, so don’t consider that quite a recommendation. The other thing that gets highly recommended is the Wacom Intuos, which is supposed to be somewhat nicer but also gives a much smaller writing surface (something like 6x4), so if you write big this might not be comfortable.

I’ve been looking into options to stream lectures and other content. There are really two things I want to do here: the first is to have video conferences where I can stream lectures and share my screen to show written notes, LaTeX’d notes, Mathematica notebooks, etc. The second is to create a persistent space for student interactions. I’d like to create a space where even when I’m not “holding a lecture” or “having office hours”, my students can still ask questions—of each other and of me.


I’ve been doing the second thing with Discord for my research group for the past year or so. It works pretty well. You create a room with a bunch of channels and all messages in a channel stay permanently (unless deleted by a moderator). You can scroll up to see what people have talked about in the past. Makes it great for students to have conversations with you and each other, and other students can see what happened in them. (There’s also a private messaging feature, of course.)

Discord is also good for voice calls, and has a screen sharing feature. Both of them worked very smoothly when I tried them, except the screensharing has some limitations that I believe are Linux-specific (in particular, in my multi-monitor setup I can share one window, or my entire desktop, but I can’t share exactly one monitor, which is something I would like to do). I’ve been in touch with David Speyer, who’s written up a bunch of thoughts about Discord here, with a basic tutorial for setting it up.

One thing about discord that is both good and bad is that many of our students use it already. (It was designed for online videogame playing, and is now a widely used chat and voice program.) This is good because our students are already familiar with the program and how to use it. It may be bad because that means our students often already have screen names and identities on Discord that they may want to keep separated from their academic/professional personas. If we use some software they have not used before, they can create fresh accounts and keep their online personas appropriately segmented.

Oxy’s Suggestions: BlueJeans and Moodle

My institution made some software recommendations. BlueJeans is the recommended videoconferencing software. I’ve played around with it a bit and it seems serviceable but not great. (Again, it has some specific issues with Linux that are more or less dealbreakers for me, as well.) One thing I miss from it is that it’s designed for video calls/conferences, but it doesn’t have the capacity to create a persistent chat room. So if I want that persistent interaction space, I’d need to use a second tool; I’d prefer to run everything on one platform if I can.

Moodle has a tool for creating chat rooms, but it’s awful. Do not want. It’s still a good place to post assignments and such if you don’t already have a place to post them and your institution uses Moodle. (If your institution uses some other learning management software, I can’t say much; Moodle is the only one I’ve ever used.)

Zoom Videoconferencing

I’ve been leaning towards a videoconferencing solution called Zoom. The screensharing works great, and the recording feature works great. There’s an ability to create a shared whiteboard space, that I and students can both write on, which seems helpful for virtual office hours.

Zoom has the ability to create a persistent chatroom, and it worked very smoothly in some testing I did today with a couple of my undergraduates. (One of them reported that it “felt really slick”, which is a good sign; most of the experience was pretty seamless.) The videoconferencing can work without anyone making an account, I think, but the persistent chat room would require all our students to make (free) accounts. Anyone with a Gmail account can just log in with that, so that might not be a large barrier.

One major downside is that videoconferences are limited to 40 minutes. They’ve been relaxing this for schools and in affected areas, so I don’t know how much this would be in practice. But I also think we could just start again at the end of the 40 minute period if we needed to. (Or maybe just keep formal lectures below forty minutes; it’s hard to ask students to pay attention that long anyway. If you’re posting recorded video suggestions seem to be to keep them under ten minutes.)

Closing thoughts

There are a bunch of other resources floating around to help you; I’ve looked at several but unfortunately haven’t been keeping a list. But if you poke around on Twitter or elsewhere there are many people more informed than I am who will offer help!

I know the MAA has a recorded online chat on online teaching, though I haven’t looked at it yet.

But the most important thing is not to get hung up on perfection. I didn’t plan to teach my courses remotely this term, and I’m sure they will suffer for lack of direct instructional contact. But that’s okay! And I’m going to be honest with my students about this.

This is a really unfortunate way to finish out the semester. It sucks. But I’m going to do what I can to make it only suck a medium amount. And I hope my students will bear with me and help to make this only medium suck.

We’ll get through this.

I’d love to hear any ideas or feedback you have about moving to online instruction. And I’m happy to answer any questions I can—we’re in this together. Tweet me @ProfJayDaigle, or leave a comment below.


2019 Spring Class Reflections: Calculus

July 10, 2019

Now that the term is over, I want to reflect a bit on the courses I taught, what worked well, and what I might want to do differently next time. (Honestly, it probably would have been more useful to write this sooner after finishing the courses, when they were fresher in my mind. But I don’t have a time machine, so I can’t do much about that now.) In this post I’ll talk about my calculus class; I’ll try to write about the others soon.

My previous course design had limited success

Math 114 at Occidental is intended for students, usually freshmen, who have seen calculus before but haven’t mastered the material sufficiently to be ready for calculus 2. This has the advantage that everyone in the course is familiar with the basic ideas, and that I can sometimes reference ideas we haven’t talked about yet to help justify what we do in the early parts of the course. It also has the disadvantage that my students arrive with a lot of preconceptions and confusions about the subject.1

It also means that we have extra time available to learn about extra topics that are interesting or useful or just help explain the ideas of calculus better, even if those topics aren’t really necessary to prepare for calculus 2.

In past years I had used this extra time to do the epsilon-delta definition of limits. I’m still proud of having successfully taught many freshmen to write clean epsilon-delta proofs. But over time I came to the conclusion that this wasn’t the best use of class time.

I had wanted the epsilon-delta proofs section to accomplish two things: help my students learn to write and reason more clearly, and give them a taste of what higher math was like. Neither of these goals were complete failures, but neither was really a success either.

An approximate approach

Over time I realized that my course had gotten less focused on using the formal limits ideas anyway. I had drifted more and more to talking about two big ideas once we got out of the limits section: models and approximation.

Models are the big idea I’ve been thinking about lately.3 On its own terms, math is a purely abstract enterprise; to use math to understand the world we need to have some model of how the world can be described mathematically. This modeling is a really important skill for any field where you’re expect to apply math to solve problems—and the same skills can help reason about situations with no explicit mathematical model.

Approximation is the big idea of calculus. This is true on a surface level, where we can think of limits as taking an “infinitely good” approximation of the value of a function at a point, and derivatives are an approximation of the rate of change. But it’s also the case that many of the applications of calculus and especially of derivatives have to do with notions of approximation.

After some wrestling with both ideas, I decided to take the latter approach in this term’s course. It meshed well with the way I tend to think about the ideas in calculus 1, and the way I had been explaining them to students. So I reorganized my course into five sections.

  1. Zero-order approximations: Continuity and limits. We can think of a continuous function as one where $f(a)$ is a good approximation of $f(x)$ when $x$ is close to $a$. A lot of the facts about limits we need to learn are answers to questions that arise naturally when we want to approximate various functions. And “discontinuities” make sense as “points where approximation is hard for some reason”.
  2. First-order approximations: Derivatives. We started with the linear approximation formula $f(x) = f(a) + m(x-a)$ and asked what value of $m$ would make this the best possible approximation. A little rearrangement gives the definition of derivative, but now that definition is the answer to a question, not a definition just dropped on our heads from the sky. We want to be able to compute derivatives so that we can approximate functions easily, and as a bonus we can reinterpret all of this geometrically, in terms of the tangent line.
  3. Modeling: Word problems and differential equations. We reinterpret the derivative a third time as an answer to the problem of average versus “instantaneous” speed, and then as the answer to all sorts of concrete “rate of change” problems. We can talk about the idea of differential equations, and practice turning descriptions of situations into toy mathematical models with derivatives. We can’t solve these equations explicitly without integrals, but we can approximate solutions using Euler’s method, and get a good definition of the function $e^x$ in the bargain. Implicit derivatives and related rates also show up here, using derivatives in a different type of model.
  4. Inverse Problems: Inverse functions and antiderivatives. We take all the questions we’ve asked and turn them around. We define inverse functions, especially the logarithm and inverse trig functions, and use the inverse function theorem to find their derivatives. We can use the intermediate value theorem and Newton’s method to approximate the solutions to equations. We finish by defining the antiderivative as the (not-quite) inverse of the derivative.
  5. Second-Order approximations: The second derivative allows us to find the best quadratic approximation to a given function. This is a natural setting for thinking about extreme value problems, so we cover all the optimization topics, along with Rolle’s theorem and the mean value theorem, and then put all this information together to sketch graphs of functions. We finished up with brief explanations of Taylor series and of imaginary numbers.

Most of it worked pretty well.

This course was basically successful, but there are lots of ways to improve it. I think my students both had a more comfortable experience and gained a much better understanding of some of the core ideas of calculus, especially the basic idea of linear approximation.

The first section, on limits, was okay. It’s still a little awkward, and I’m tempted to Ben’s approach of starting with derivatives entirely. But I really liked the way it started, with making the point that $\sqrt{5}$ is “about 2”. This simplest-possible-approximation made a good anchor for the course, and helps reinforce the sort of basic numeracy that helps us understand basically any numerical information we learn. I still need to do a bit more work on the logical flow and transitions, and the idea of limits at infinity is important but doesn’t sit in here entirely comfortably.

The section on derivatives and first-order approximations worked wonderfully. This is the section that contains many of the ideas driving this course approach, and I’ve used many of them before, so it makes sense that this worked well.

The section on inverse functions again worked pretty well. It’s pretty easy to justify “solving equations” to students in a math class, and “this equation is too hard so let’s find a way to avoid solving it” is pretty compelling.

And finally the section on the second-order stuff felt pretty strong as well, but could still be improved. While in my head I have a clear picture relating “approximation with a parabola” to the maxima and minima of a function, I don’t know that it came across clearly in the class. And I was feeling a little time pressure by this point; I really wish I had had an extra couple of days of class time.

Modeling is hard

But the section on modeling needs a lot of work. A lot of the ideas that I wanted to include in here aren’t things I’ve ever taught before, so the material is still a little rough. I also got really sick right when this section was starting, so my preparation probably wasn’t as good as it could have been.

In particular, I wasn’t very satisfied with the section on describing real-world situations in terms of models, and coming up with differential equations. I showed a bunch of examples but don’t know that we really got a clear grasp on the underlying principles as a class. And my homework questions on this modeling process probably contained a bit too much “right answer and wrong answer” for a topic that’s as inherently fuzzy as modeling.

I’m toying with the idea of assigning some problems where I ask students to argue for some modeling choices they make—handle it less like there’s one correct model, and more like there are a bunch of defensible choices. But I don’t know how well I can get that to fit in to the calculus class and the framework of first- or second-order ODEs.4 (Maybe I should do some modeling that doesn’t involve derivatives since understanding modeling is a goal on its own.)

I also wish I could fit the mean value theorem into the discussion of speed, but proving it really requires a lot of ideas I wanted to hold off on until later. Maybe I should state and explain it here, but then prove it later when the proof comes up for other reasons.

One thing I did really like in this section is the way I introduced the exponential $e^x$ as the solution to the initial value problem $y’ = y, y(0)=1$. This makes $e$ seem less like a number we made up to torture math students, and more like the answer to a question people would reasonably ask again.

Final thoughts

Overall, I feel pretty good about this redesign. I’m definitely not going back to the epsilon-delta definitions for this course any time soon, and I think this course will be really strong with a bit of work.

But there are a lot of ideas in the modeling topic that are important but that I don’t quite feel like I’m doing justice to yet. I need to go over that section carefully and figure out how to improve it.

I’m also thinking about moving some of my homework to an online portal. If we take all the “compute these eight derivatives” questions and have them automatically graded, I can use scarce human-grading time to give thorough comments on some more interesting conceptual questions.

To anyone who’s read this entire post, I’d love your feedback—on the course design as a whole, and on how to fix some of the problems I ran into. And if anyone is curious how I handled things, I’d be happy to share my course materials. You can find most of them on the course page but I’m happy to talk or share more if you’re interested!

Have ideas about this course plan? Have questions about why I did things? Tweet me @ProfJayDaigle or leave a comment below, and let me know!

  1. And a lot of anxiety. After all, the typical student in this course took calculus in high school and then failed the AP exam; they’ve all had at least one not-great experience with the material. 

  2. This same problem arises even in upper-division analysis courses. My undergraduate analysis professor Sandy Grabiner used to say that the point of a first analysis course is to prove that most of the time, what happens is exactly what you would expect to happen, and the second analysis course starts talking about the exceptions. But we tend to hope that our upper-classmen math majors at least are willing to bear with us through the proofs by that point. 

  3. You can read a lot more of my thoughts about this in my post on word problems, for instance. 

  4. It probably doesn’t help that I never actually studied ODEs in any way, so I don’t have many of my own examples to draw on. 


An Overview of Bayesian Inference

February 20, 2019

A few weeks ago I wrote about Kuhn’s theory of paradigm shifts and how it relates to Bayesian inference. In this post I want to back up a little bit and explain what Bayesian inference is, and eventually rediscover the idea of a paradigm shift just from understanding how Bayesian inference works.

Bayesian inference is important in its own right for many reasons beyond just improving our understanding of philosophy of science. Bayesianism is at its heart an extremely powerful mathematical method of using evidence to make predictions. Almost any time you see anyone making predictions that involve probabilities—whether that’s a projection of election results like the ones from FiveThirtyEight, a prediction for the results of a big sports game, or just a weather forecast telling you the chances of rain tomorrow—you’re seeing the results of a Bayesian inference.

Bayesian inference is also the foundation of many machine learning and artificial intelligence tools. Amazon wants to predict how likely you are to buy things. Netflix wants to predict how likely you are to like a show. Image recognition programs want to predict whether that picture contains a bird. And self-driving cars want to predict whether they’re going to crash into that wall.

You’re using tools based on Bayesian inference every day, and probably at this very moment.1 So it’s worth understanding how they work.

The basic idea of Bayesian inference is that we start with some prior probability that describes what we originally believe the world is like in terms of probability, by specifying the probabilities of various things happening. Then we make observations of the world, and update our beliefs, giving our conclusion as a posterior probability.

As a really simple example: suppose I tell you I’ve flipped a coin, but I don’t tell you how it landed. Your prior is probably a 50% chance that it shows heads, and a 50% chance that it shows tails. After you get to look at the coin, you update your prior beliefs to reflect your new knowledge. Your posterior probability says there is a 100% chance that it shows heads and a 0% chance that it shows tails.2

The rule we use to update our beliefs is called Bayes’s Theorem (hence the name “Bayesian inference”). Specifically, we use the mathematical formula \[ P(H |E) = \frac{ P(E|H) P(H)}{P(E)}, \] where

Let’s work through a quick example. Suppose I have a coin, and you think that there’s a 50% chance it’s a fair coin, and a 50% chance that it actually has two heads. So we have $P(H_{fair}) = .5$ and $P(H_{unfair}) = .5$.

Now you flip the coin ten times, and it comes up heads all ten times. If the coin is fair, this is pretty unlikely! The probability of that happening is $\frac{1}{2}^{10} = \frac{1}{1024}$, so we have $P(E|H_{fair}) = \frac{1}{1024}$. But if the coin is two-headed, this will definitely happen; the probability of getting ten heads is 100%, or $1$. So when you see this, you probably conclude that the coin is unfair.

Now let’s work through that same chain of reasoning algebraically. If the coin is fair, the probability of seeing ten heads in a row is $\frac{1}{2^{10}} = \frac{1}{1024}$. And if the coin is unfair, the probability is 1. So if we think there’s a 50% chance the coin is fair, and a 50% chance it’s unfair, then the overall probability of seeing ten heads in a row is \begin{align} P(H_{fair}) \cdot P(E | H_{fair}) + P(H_{unfair}) \cdot P(E | H_{unfair}) \\\ = .5 \cdot \frac{1}{1024} + .5 \cdot 1 = \frac{1025}{2048} \approx .5005. \end{align}

By Bayes’s Theorem, we have \begin{align} P(H_{fair} | E) &= \frac{ P(E | H_{fair}) P(H_{fair})}{P(E)} \\
& = \frac{ \frac{1}{1024} \cdot .5}{\frac{1025}{2048}} = \frac{1}{1025} \\
P(H_{unfair} | E) & = \frac{ P(E | H_{unfair}) P(H_{unfair})}{P(E)} \\
&= \frac{1 \cdot \frac{1}{2}}{\frac{1025}{2048}} = \frac{1024}{1025}. \end{align} Thus we conclude that the probability the coin is fair is $\frac{1}{1025} \approx .001$, and the probability it is two-headed is $\frac{1024}{1025} \approx .999$. This matches what our intuition tells us: if it comes up ten heads in a row, it probably isn’t fair.

But let’s tweak things a bit. Suppose I have a table with a thousand coins, and I tell you that all of them are fair except one two-headed one. You pick one at random, flip it ten times, and see ten heads. Now what do you think?

You have exactly the same evidence, but now your prior is different. Your prior tells you that $P(H_{fair}) = \frac{999}{1000}$ and $P(H_{unfair}) = \frac{1}{1000}$. We can do the same calculations as before. We have \begin{align} P(H_{fair}) \cdot P(E | H_{fair}) + P(H_{unfair}) \cdot P(E | H_{unfair}) \\
= \frac{999}{1000} \cdot \frac{1}{1024} + \frac{1}{1000} \cdot 1 \approx .00198 \end{align}

\begin{align} P(H_{fair} | E) &= \frac{ P(E | H_{fair}) P(H_{fair})}{P(E)} \\
& = \frac{ \frac{1}{1024} \cdot \frac{999}{1000}}{.00198} \approx .494 \\
P(H_{unfair} | E) & = \frac{ P(E | H_{unfair}) P(H_{unfair})}{P(E)} \\
&= \frac{1 \cdot \frac{1}{1000}}{.00198} \approx .506. \end{align} So now you should think it’s about equally likely that your coin is fair or unfair. 3

Why does this happen? If you have a fair coin, then seeing ten heads in a row is pretty unlikely. But having an unfair coin is also unlikely, because of the thousand coins you could have picked, only one was unfair. In this example those two unlikelinesses cancel out almost exactly, leaving us uncertain whether you got a (normal) fair coin and then a surprisingly unlikely result, or if you got a surprisingly unfair coin and then the normal, expected result.

In other words, you should definitely be somewhat surprised to see ten heads in a row. Remember, we worked out that your prior probability of seeing that is just $P(E) \approx .00198$—less than two tenths of a percent! But there are two different ways to get that unusual result, and you don’t know which of those unusual things happened.

Bayesian inference also does a good job of handling evidence that disproves one of your hypotheses. Suppose you have the same prior we were just discussing: $999$ fair coins, and one two-headed coin. What happens if you flip the coin once and it comes up tails?

Informally, we immediately realize that we can’t be flipping a two-headed coin. It came up tails, after all. So how does this work out in the math?

If the coin is fair, we have a $50\%$ chance of getting tails, and a $50\%$ chance of getting heads. If the coin is unfair, we have a $0\%$ chance of tails and a $100\%$ chance of heads. So we compute: \begin{align} P(H_{fair}) \cdot P(E | H_{fair}) + P(H_{unfair}) \cdot P(E | H_{unfair}) \\
= \frac{999}{1000} \cdot \frac{1}{2} + \frac{1}{1000} \cdot 0 = \frac{999}{2000} \end{align}

\begin{align} P(H_{fair} | E) &= \frac{ P(E | H_{fair}) P(H_{fair})}{P(E)} \\
& = \frac{ \frac{1}{2} \cdot \frac{999}{1000}}{\frac{999}{2000}} = 1 \\
P(H_{unfair} | E) & = \frac{ P(E | H_{unfair}) P(H_{unfair})}{P(E)} \\
&= \frac{0 \cdot \frac{1}{1000}}{\frac{999}{2000}} = 0. \end{align}

Thus the math agrees with us: once we see a tails, the probability that we’re flipping a two-headed coin is zero.

As long as everything behaves well, we can use these techniques to update our beliefs. In fact, this method is pretty powerful. We can prove that it is the best possible decision rule according to a few different sets of criteria4; and there are pretty good guarantees about eventually converging to the right answer after collecting enough evidence.

But there are still a few ways Bayesian inference can go wrong.

What if you get tails and keep flipping the coin—and get ten tails in a row? We’ll still draw the same conclusion: the coin can’t be double-headed, so it’s definitely fair. (You can work through the equations on this if you like; they’ll look just like the last computation I did, but longer). And if we keep flipping and get a thousand tails in a row, or a million, our computation will still tell us yes, the coin is definitely fair.

But before we get to a million flips, we might start suspecting, pretty strongly, that the coin is not fair. When it comes up tails a thousand times in a row, we probably suspect that in fact the coin has two tails. 5 So why doesn’t the math reflect this at all?

In this case, we made a mistake at the very beginning. Our prior told us that there was a $99.9\%$ chance we had a fair coin, and a $.1\%$ chance that we had a coin with two heads. And that means that our prior left no room for the possibility that our coin did anything else. We said our prior was \[ P(H_{fair}) = \frac{999}{1000} \qquad P(H_{unfair}) = \frac{1}{1000}; \] but we really should have said \[ P(H_{fair}) = \frac{999}{1000} \qquad P(H_{two\ heads}) = \frac{1}{1000} \qquad P(H_{two\ tails}) = 0. \] And since we started with the belief that a two-tailed coin was impossible, no amount of evidence will cause us to change our beliefs. Thus Bayesian inference follows the old rule of Sherlock Holmes: “when you have excluded the impossible, whatever remains, however improbable, must be the truth.”

This example demonstrates both the power and the problems of doing Bayesian inference. The power is that it reflects what we already know. If something is known to be quite rare, then we probably didn’t just encounter it. (It’s more likely that I saw a random bear than a sasquatch—and that’s true even if sasquatch exist, since bear sightings are clearly more common). And if something is outright impossible, we don’t need to spend a lot of time thinking about the implications of it happening.

The problem is that in pure Bayesian inference, you’re trapped by your prior. If your prior thinks the “true” hypothesis is possible, then eventually, with enough evidence, you will conclude that the true hypothesis is extremely likely. But if your prior gives no probability to the true hypothesis, then no amount of evidence can ever change your mind. If we start out with $P(H) = 0$, then it is mathematically impossible to update your prior to believe that $H$ is possible.

But Douglas Adams neatly explained the flaw in the Sherlock Holmes principle in the voice of his character Dirk Gently:

The impossible often has a kind of integrity to it which the merely improbable lacks. How often have you been presented with an apparently rational explanation of something that works in all respects other than one, which is that it is hopelessly improbable?…The first idea merely supposes that there is something we don’t know about, and God knows there are enough of those. The second, however, runs contrary to something fundamental and human which we do know about. We should therefore be very suspicious of it and all its specious rationality.

In real life, when we see something we had thought was extremely improbable, we often reconsider our beliefs about what is possible. Maybe there’s some possibility we had originally dismissed, or not even considered, that makes our evidence look reasonable or even likely; and if we change our prior to include that possibility, suddenly our evidence makes sense. This is the “paradigm shift” I talked about in my recent post on Thomas Kuhn, and extremely unlikely evidence, like our extended series of tails, is a Kuhnian anomaly.

But rethinking your prior isn’t really allowed by the mathematics and machinery of Bayesian inference—it’s something else, something outside of the procedure, that we do to cover for the shortcomings of unaugmented Bayesianism.

Let’s return to the coin-flipping thought experiment; there’s one other way it can go wrong that I want to tell you about. Suppose you fix your prior to acknowledge the possibility that is two-headed or two-tailed. (We could even set up our prior to include the possibility that the coin is two-sided but biased— so that the coin comes up head 70% of the time, say. I’m going to ignore this case completely because it makes the calculations a lot more complicated and doesn’t actually clarify anything. But it’s important that we can do that if we want to).6

You assign the prior probabilities \[ P(H_{fair}) = \frac{98}{100} \qquad P(H_{two\ heads}) = \frac{1}{100} \qquad P(H_{two\ tails}) = \frac{1}{100}, \] giving a 1% chance of each possible double-sided coin. (This is a higher chance than you gave it before, but clearly when I give you these coins I’ve been messing with you, so you should probably be less certain of everything). You flip the coin.

And it lands on its edge.

What does our rule of inference tell us now? We can try to do the same calculations we did before. The first thing we need to calculate is $P(E)$, which is easy. We started out by assuming this couldn’t happen, so the prior probability of seeing the coin landing on its side is zero!

(Algebraically, a fair coin has a 50% chance of heads and a 50% chance of tails. So if the coin is fair, then $P(E|H_{fair}) = 0$. But if the coin has a 100% chance of heads, then $P(E| H_{two\ heads}) = 0$. And if the coin has a 100% chance of tails, then $P(E| H_{two\ tails}) = 0$. Thus \begin{align} P(E) &= P(E|H_{fair}) \cdot P(H_{fair}) + P(E|H_{two\ heads}) \cdot P(H_{two\ heads}) + P(E|H_{two\ heads}) \cdot P(H_{two\ heads}) \\
& = 0 \cdot \frac{98}{100} + 0 \cdot \frac{1}{100} + 0 \cdot \frac{1}{100} = 0. \end{align} So we conclude that $P(E) = 0$).

Now we can actually calculate our new, updated, posterior probabilities—or can we? We have the formula that \[ P(H_{fair} | E) = \frac{ P(E | H_{fair}) P(H_{fair})}{P(E)}. \] But with the probabilities we just calculated, this works out to \[ P(H_{fair} | E) = \frac{ 0 \cdot \frac{98}{100}}{0} = \frac{0}{0}. \] And our calculation has broken down completely; $\frac{0}{0}$ isn’t a number, let alone a useful probability.

Even more so than the last example, this is a serious Kuhnian anomaly. If we ever try to update and get $\frac{0}{0}$ as a response, something has gone wrong. We had said that something was totally impossible, and then it happened. All we can do is back up and choose a new prior.

And Bayesian inference can’t tell us how to do that.

There are a few different ways people try to get around this problem. But that’s another post.

Questions about this post? Was something confusing or unclear? Or are there other things you want to know about Bayesian reasoning? Tweet me @ProfJayDaigle or leave a comment below, and let me know!

  1. I’m old enough to remember the late nineties, when spam was such a big problem that email became almost unusable. These days when I complain about email spam it’s usually my employer sending too many messages out through internal mailing lists; but there was a period in the nineties when for every legitimate email you’d get four or five filled with links to pr0n sites or trying to sell you v1@gr@ and c1@lis CHEAP!!! It was a major problem. Entire conferences were held on developing methods to defeat the spam problem.

    These days I see about one true spam message like that per year. And one major reason for that is the invention of effective spam filters using Bayesian inference to predict whether a given email is spam or legitimate. So you’re using Bayesian tools right now purely by not receiving dozens of unwanted pornographic pictures in your email inbox every day. 

  2. This particular example is far too simple to really be worth setting up the Bayesian framework, but it gives a pretty direct and explicit demonstration of what all the pieces mean. 

  3. The exact probabilities are 999/2023 and 1024/2023. As a bonus, try to see why having some of those exact numbers makes sense, and reassures us that we did this right. 

  4. I’m primarily thinking of two really important results here. Cox’s Theorem gives a collection of reasonable-sounding conditions, and proves that Bayesian inference is the only possible rule that satisfies them all. Dutch Book Arguments show that this inference rule protects you from making a collection of bets which are guaranteed to lose you money. 

  5. No, you can’t just check this by looking at the coin. Because I said so.

    More seriously, it’s pretty common to have experiments where you can see the results, but can’t inspect the mechanism by which those results are reached. In a particle collider you can see the tracks of exiting particles, but you can’t actually observe the collision. In an educational study, you can look at students’ test results, but you can’t look inside their brains and observe exactly when the learning happens. So it’s useful for this thought experiment to assume we can see how the coin lands, but can never look at both sides at the same time. 

  6. Gelman and Nolan have argued that it’s not physically possible to bias a coin flip in this way. This is arguably another reason to ignore the possibility that a coin is biased. And if you believe Gelman and Nolan’s argument, then you should have a low or zero prior probability that the coin is biased. But the actual reason I’m ignoring it is to avoid computing integrals in public.