David G. Khachatrian

An optimization problem.

Quick! Given $p \in (0,1)$, $x \in \mathbb{R^+}$, and $g: \mathbb{R^+} \to \mathbb{R}$, what’s

subject to $f \in [0,1]$? To make the expression slightly more concrete, let’s say $g(x) = \ln(x)$. What’s $f^*$?

We’ll find out the answer soon enough, and with a lot less pain than one might expect, by solving an equivalent and simpler problem. First, a bit of background – where does this nasty-looking expression come from?

Betting on biased coin flips.

Let’s describe a hypothetical situation. Someone comes up to you and offers to bet on the results of the flips of a coin with probability $p$ of landing on heads. If you guess the coin flip correctly, your earnings equal the amount you bet; if you’re wrong, you lose your bet. You start off with $X_0 = x$ amount of money, and you value your money (i.e. have a utility function for money) according to the concave function $g(X) = \ln(X)$ – it doesn’t hurt to have more, but getting that marginal dollar doesn’t matter to you nearly as much when you have $1,000,000 to your name compared to $10. You have to play $N$ total games with this person if you choose to accept. You don’t want to make things too complicated for yourself – you’ll commit to one strategy for determining what fraction $f$ of your money you’ll bet in a round. What fraction $f$ should you bet to get the most out of the opportunity (i.e., maximize your expected utility) after all $N$ games?

Well, I didn’t exactly try to hide it – this hypothetical situation corresponds exactly with our beautiful expression at the top. And paying attention to the nature of our objective and how $X$ evolves over the rounds will help us solve our question with fewer headaches.

How the game unfolds.

Let’s consider how our pot grows over time. We’ll denote $X_i$ as the amount of money you have after the $i$’th round. Before you’ve played any rounds, you have the money in your pocket almost surely (barring any pickpockets or what have you), so we have

Let’s assume $p > 0.5$, in which case the optimal decision is always to bet on heads. Now, play a round, betting $f$ of your earnings on heads. What could be your outcome?

OK, what about the next round, for $X_2$? Well, we can enumerate the possibilities: there’s one way to have won twice (with probability $p^2$), two ways to have won once and lost once (with probability $p(1-p)$), and one way to have lost twice (with probability $(1-p)^2$). So:

You may have noticed the pattern already. After a total of $N$ games, we’ll have

for $i \in {0, 1, \cdots, N}$, corresponding to winning anywhere from $0$ and all $N$ games. Wrapping our $X_N$ with our utility function $g$ and calculating its expectation gives us our wonderful expression before, repeated below:

OK, great. But how does that help?

The trick, funnily enough: greed (in the algorithmic sense).

We wrote our $X_i$ somewhat explicitly above, but let’s write it recursively instead.

That is, each round looks “the same” as any other – you have some starting money (which just happens to be some random variable $X_i$) and you want to maximize your expected utility

Notice that $E[g(X_{i+1})]$ depends linearly on $E[g(X_{i})]$ regardless of $i$. No matter what round, given fixed $f$ and $p$, you’re always better off after the $i+1$’th round if you came in with as large an $X_i$ as possible (a sort of “first-order stochastic dominance” between the optimal play $X_i$ and suboptimal play $Y_i$).

That means we can wind this back all the way to our first round – if we did the first round optimally, we’ll play the second round optimally, and the third round, and the fourth round, and…

This is a case where a greedy algorithm leads to the optimal solution – which is convenient, because the global objective function isn’t very pleasant. So, comparing the objective function after $N$ rounds:

and after just one round:

we have $f^*_1 = f^*_2$.

The second version is much simpler to solve for: take the derivative with respect to $f$, set equal to zero, and try to isolate $f$. For arbitrary $g$, you have the optimal $f$ when

In our original formulation at the top, you could plug in whatever $g$ would happen to be and isolate $f$. Using a (somewhat) reasonable utility function $g(x) = \ln(x)$ (reasonable in that it’s concave – the blowup for $0 < x < 1$ might be a bit silly (maybe it’s your lucky dollar)), we can solve for $f$ explicitly and find that

The formula for this problem setup (with fixed $p$, $f$, and $g(x) = \ln(x)$) is called the Kelly criterion, specifically with $b = 1$:1 betting odds (the generalized for arbitrary $b$ isn’t much more difficult).

Regrouping.

Well, that wasn’t so bad, was it?

It turned out to be a good reminder to see whether there’s a simpler subproblem underlying the full problem. This idea is a staple in dynamic programming and optimal control theory (e.g., the Bellman equation). In this case, the optimal solution was an even simpler, “greedy” solution. The idea to try this sort of approach may come more naturally when considering explicit programming problems (e.g. fulfilling server requests/load balancing) – but one may not very often consider the idea when staring at mathematical expressions!

Duality is powerful. Finding connections between seemingly disparate objects, allows us to describe functions of time as functions of frequency, to convert problems of dynamics into problems of geometry – and indeed, to turn mathematical optimization problems into questions about biased coin flips. This exercise is just another reminder of the power of duality and its problem-solving utility.