The Cumulant Generating Function: From Moments to Max
Starting Point: Taylor Expansion
For a random variable $X$, the moment generating function (MGF) is $M(\beta) = \mathbb{E}[e^{\beta X}]$. Expanding the exponential:

$$M(\beta) = \mathbb{E}\!\left[\sum_{n=0}^{\infty} \frac{(\beta X)^n}{n!}\right] = \sum_{n=0}^{\infty} \frac{\beta^n}{n!}\,\mathbb{E}[X^n].$$
The coefficient of $\beta^n/n!$ is the $n$-th moment $\mathbb{E}[X^n]$ — one function “generates” the whole sequence. Differentiating $n$ times at $\beta = 0$ recovers $\mathbb{E}[X^n]$.
Taking the log gives the cumulant generating function (CGF):

$$K(\beta) = \log M(\beta) = \sum_{n=1}^{\infty} \kappa_n \frac{\beta^n}{n!}.$$
The cumulants $\kappa_n$ are cleaner than moments: $\kappa_1 = \mu$ is the mean, $\kappa_2 = \sigma^2$ is the variance, $\kappa_3$ encodes skewness, $\kappa_4$ kurtosis. They are additive over independent sums — a property raw moments lack — because the log turns the multiplicative MGF of a sum into addition: for independent $X$ and $Y$, $M_{X+Y}(\beta) = M_X(\beta)\,M_Y(\beta)$, so $K_{X+Y}(\beta) = K_X(\beta) + K_Y(\beta)$.
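The additivity property can be checked numerically. A minimal sketch (the data, the `empirical_cgf` helper, and the finite-difference extraction of the first cumulant are illustrative choices, not part of the original text): estimate $K(\beta) = \log \mathbb{E}[e^{\beta X}]$ from samples, recover $\kappa_1$ as a centered difference of $K$ at $\beta = 0$, and confirm that the first cumulant of an independent sum is the sum of the cumulants.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_cgf(samples, beta):
    """K(beta) = log E[exp(beta * X)], estimated from samples.
    Shifts by the max exponent (log-sum-exp trick) for stability."""
    z = beta * samples
    m = z.max()
    return m + np.log(np.mean(np.exp(z - m)))

def kappa1(samples, h=1e-3):
    """First cumulant (the mean) via a centered finite difference
    of the empirical CGF at beta = 0."""
    return (empirical_cgf(samples, h) - empirical_cgf(samples, -h)) / (2 * h)

# Two independent draws; cumulants of X + Y should add.
x = rng.normal(1.0, 2.0, 200_000)      # kappa_1 = 1.0
y = rng.exponential(3.0, 200_000)      # kappa_1 = 3.0

print(kappa1(x) + kappa1(y))  # ≈ 4.0
print(kappa1(x + y))          # ≈ 4.0 as well: additivity over independent sums
```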
Small β: Mean-Dominated
Apply this to a reward $R$ under policy $\pi$: $K_\pi(\beta) = \log \mathbb{E}_\pi[e^{\beta R}]$. For small $\beta$, the higher-order terms are $O(\beta^3)$ and the series effectively truncates:

$$K_\pi(\beta) \approx \beta\mu + \frac{\beta^2}{2}\sigma^2.$$

The leading term $\beta\mu$ is just the scaled mean reward — the standard RL objective. Higher cumulants are negligible.
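The truncation is easy to verify in a toy setting. A sketch under assumed inputs (Gaussian stand-in rewards, a hypothetical `cgf` helper): at small $\beta$, the empirical CGF and the two-term expansion $\beta\mu + \tfrac{\beta^2}{2}\sigma^2$ agree to high precision.

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.normal(0.5, 1.0, 100_000)  # toy stand-in for rewards under a policy

def cgf(beta):
    """Empirical CGF, stabilized by shifting out the max exponent."""
    z = beta * r
    m = z.max()
    return m + np.log(np.mean(np.exp(z - m)))

beta = 0.01  # small beta: mean term dominates
approx = beta * r.mean() + 0.5 * beta**2 * r.var()
print(cgf(beta), approx)  # nearly identical at this scale
```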
Large β: Max-Dominated
Do not use the Taylor series here (its truncation loses meaning at large $\beta$). Use the original definition. For a batch of rewards $r_1, \dots, r_N$ with empirical CGF $K(\beta) = \log \frac{1}{N}\sum_{i=1}^{N} e^{\beta r_i}$, factor out $e^{\beta r_{\max}}$:

$$K(\beta) = \beta r_{\max} + \log \frac{1}{N}\sum_{i=1}^{N} e^{\beta (r_i - r_{\max})}.$$
Each non-max term has exponent $\beta(r_i - r_{\max}) < 0$, decaying exponentially as $\beta$ grows. The sum collapses to the single $e^{0} = 1$ contribution from the max, so $K(\beta) \approx \beta r_{\max} - \log N$ and $\frac{1}{\beta}K(\beta) \to r_{\max}$. Suboptimal rewards are exponentially suppressed; the objective is dominated by the single best outcome.
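The collapse onto $r_{\max}$ is visible on a tiny batch. A sketch with made-up rewards (the batch values and the `cgf` helper are illustrative): $\frac{1}{\beta}K(\beta)$ climbs toward $r_{\max}$ as $\beta$ grows, with the residual gap $\frac{\log N}{\beta}$ shrinking like $1/\beta$.

```python
import numpy as np

r = np.array([0.1, 0.5, 0.9, 2.0])  # toy batch of rewards; r_max = 2.0

def cgf(beta):
    """Empirical CGF of the batch, with the max factored out
    exactly as in the identity above."""
    z = beta * r
    m = z.max()
    return m + np.log(np.mean(np.exp(z - m)))

for beta in [1.0, 10.0, 100.0]:
    print(beta, cgf(beta) / beta)  # approaches r_max = 2.0 as beta grows
```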
Connections: Softmax and Free Energy
Softmax. The tilted distribution $p_\beta(i) = e^{\beta r_i} / \sum_j e^{\beta r_j}$ — proportional to the gradient of $K(\beta)$ with respect to $r_i$ — is the softmax. $\beta$ controls hardness: $\beta = 0$ gives the uniform distribution, $\beta \to \infty$ gives a one-hot on the argmax. “Soft” does not mean “weak at large $\beta$”; it means “smooth, differentiable approximation to max at any $\beta$.”
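The two endpoints of the hardness dial can be checked directly. A minimal sketch (the reward batch is a made-up example): at $\beta = 0$ the tilted distribution is uniform, and at large $\beta$ nearly all mass sits on the argmax.

```python
import numpy as np

def tilted(r, beta):
    """Softmax: the tilted distribution p_i proportional to exp(beta * r_i),
    computed with a max shift for numerical stability."""
    z = beta * r - (beta * r).max()
    w = np.exp(z)
    return w / w.sum()

r = np.array([0.1, 0.5, 0.9, 2.0])
print(tilted(r, 0.0))   # uniform: [0.25, 0.25, 0.25, 0.25]
print(tilted(r, 50.0))  # nearly one-hot on the argmax (index 3)
```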
Free energy. In statistical mechanics, $F = -\frac{1}{\beta}\log Z$ with partition function $Z = \sum_i e^{-\beta E_i}$. Identify $\beta$ with the inverse temperature and $r_i = -E_i$; then $\log Z = -\beta F$, so $\frac{1}{\beta}K(\beta)$ is the negative free energy $-F$ (up to the constant $-\frac{1}{\beta}\log N$). The tilted distribution is the Boltzmann distribution $p_i \propto e^{-\beta E_i}$. Cooling a system ($\beta \to \infty$) collapses it onto the ground state — physically the same phenomenon as softmax sharpening to argmax.
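The dictionary between the two pictures fits in a few lines. A sketch with invented energy levels (the spectrum, and the `log_Z`/`boltzmann` helpers, are illustrative): under $r_i = -E_i$, $\log Z$ is a log-sum-exp of rewards, and cooling drives both $F$ toward the ground-state energy and the Boltzmann weights toward one-hot.

```python
import numpy as np

E = np.array([0.0, 0.5, 1.3, 2.0])  # toy energy levels; ground state E = 0
r = -E                               # identify reward with negative energy

def log_Z(beta):
    """log Z = log sum_i exp(-beta * E_i) = log sum_i exp(beta * r_i)."""
    z = beta * r
    m = z.max()
    return m + np.log(np.sum(np.exp(z - m)))

def boltzmann(beta):
    """Boltzmann distribution p_i proportional to exp(-beta * E_i)."""
    z = beta * r - (beta * r).max()
    w = np.exp(z)
    return w / w.sum()

for beta in [0.1, 1.0, 50.0]:
    F = -log_Z(beta) / beta          # free energy F = -(1/beta) log Z
    print(beta, F, boltzmann(beta))  # cooling: F -> min(E), p -> ground state
```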
Takeaway
One parameter, $\beta$, smoothly interpolates between the mean (small $\beta$, leading cumulant) and the max (large $\beta$, exponential suppression of suboptimal outcomes). That is why the CGF is the natural objective for discovery: tune $\beta$ to slide along this spectrum.