Data

I wanted this man on my side. He had access to data.

Don DeLillo, White Noise

I track lots of things beyond art consumption—restaurants, haircut cadence, biking miles, illnesses—but my centerpiece project is my Friend CRM.

There's a CLT for maxes

How my quote randomizer led me to the Gumbel distribution

I recently added a quote randomizer to my website’s homepage. Upon each page load, the randomizer chooses a quote to display from a pool of about 35 options—usually from literature, but sometimes from the band Geese (which counts as literature, let’s be honest).

As I was testing the randomizer, repeatedly refreshing locally to see how the different lengths were getting formatted, I started to wonder about the average number of page reloads before a user sees all 35 quotes. Is it 35 times, 70 times, 100, 200, 500? It turns out that this maps exactly to a well-known problem in probability called the coupon collector’s problem—so called because you calculate how many cereal boxes you need to buy on average before you’ve collected all the different types of coupons within. Or yet another formulation: How many McDonald’s Happy Meals did you need to buy in the early ’90s before you collected all the Super Mario 3 toys? (I never did. Also this assumes that all four were on hand at the drive-thru, and you were randomly assigned one with equal probabilities for each. I just kept getting Luigi on a cloud.)
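As a rough sketch of the coupon collector's calculation for this case: if there are exactly n = 35 quotes and each page load picks one uniformly and independently (both assumptions; the real randomizer may behave differently), the expected number of loads is n times the n-th harmonic number, which works out to roughly 145. The function names below are hypothetical, chosen only for illustration.

```python
import random
from fractions import Fraction


def expected_loads(n: int) -> float:
    """Expected draws to see all n equally likely items: n * (1 + 1/2 + ... + 1/n)."""
    return float(n * sum(Fraction(1, k) for k in range(1, n + 1)))


def simulate_loads(n: int, rng: random.Random) -> int:
    """Count uniform random draws until every one of the n items has appeared."""
    seen = set()
    loads = 0
    while len(seen) < n:
        seen.add(rng.randrange(n))  # assumption: each load is an independent uniform pick
        loads += 1
    return loads


if __name__ == "__main__":
    n = 35  # assumed pool size; the post says "about 35"
    trials = 2_000
    rng = random.Random(0)
    average = sum(simulate_loads(n, rng) for _ in range(trials)) / trials
    print(f"exact expectation: {expected_loads(n):.1f}")
    print(f"simulated average over {trials} trials: {average:.1f}")
```

The simulated average should land near the exact value, though individual runs vary widely because the last few unseen quotes take a long time to turn up.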

Continue reading →

The Universal Baseball Association: Multisets, stars and bars, and the negative binomial distribution

Some 20th-century literature, some combinatorics

First edition of Robert Coover’s novel

In Robert Coover’s 1968 novel The Universal Baseball Association, Inc., J. Henry Waugh, Prop., the protagonist invents a fantasy baseball league whose outcomes hinge on the repeated roll of three dice, the main randomizer of his simulated world. When Coover explains the scheme for the first time, he declares these combinatorics:

Continue reading →

Friend CRM

Here’s the tweet thread from when I launched this project in the first year of COVID, explaining the original intent: namely, contact tracing and deliberate planning of hangouts when they got harder to do. And I never stopped tracking—the data is current through today!

It seems like a new article about the loneliness epidemic pops up virtually every month in mainstream publications. I’ve been amused to see when friend tracking specifically gets vindicated. Here’s just one example, from The New York Times Style Magazine, January 29, 2026: “Some even like to keep a log of their latest interactions with friends, to make sure too much time doesn’t elapse between catchups.”1

Continue reading →

Job tenure

I’ve had two instances where two back-to-back jobs really represented a single role: Northwestern (Kellogg) and University of Chicago (Chicago Booth) were essentially the same role under one research group, and Thinx and Kimberly-Clark were essentially the same role because of the latter’s acquisition of the former. In calculating my average tenure, which is 2.18 years (Typical millennial in tech? Who knows?), I thought it made the most sense to count each pair as a single job.

Continue reading →

Tweet times

In November 2024, The Economist published a visualization of the times of day at which Elon Musk tweets. I saw another Twitter user do the same with his own historical data export, and since I’m such a big Twitter fan, I had to make my own version as well.

Continue reading →