The Cumulant Generating Function: From Moments to Max
Starting Point: Taylor Expansion
For a random variable $X$, the moment generating function (MGF) is $M(\beta) = \mathbb{E}[e^{\beta X}]$. Expanding the exponential:

$$M(\beta) = \mathbb{E}\!\left[\sum_{n=0}^{\infty} \frac{(\beta X)^n}{n!}\right] = \sum_{n=0}^{\infty} \frac{\beta^n}{n!}\,\mathbb{E}[X^n].$$
The coefficient of $\beta^n/n!$ is the $n$-th moment $\mathbb{E}[X^n]$ — one function “generates” the whole sequence. Differentiating $n$ times at $\beta = 0$ recovers $\mathbb{E}[X^n]$.
Taking the log gives the cumulant generating function (CGF):

$$K(\beta) = \log M(\beta) = \sum_{n=1}^{\infty} \kappa_n \frac{\beta^n}{n!}.$$
The cumulants $\kappa_n$ are cleaner than moments: $\kappa_1 = \mu$ is the mean, $\kappa_2 = \sigma^2$ is the variance, $\kappa_3$ encodes skewness, $\kappa_4$ kurtosis. They are additive over independent sums — a property raw moments lack — because the log turns the multiplicative MGF of a sum into addition: for independent $X$ and $Y$, $M_{X+Y}(\beta) = M_X(\beta)\,M_Y(\beta)$, so $K_{X+Y}(\beta) = K_X(\beta) + K_Y(\beta)$.
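The additivity property can be checked numerically. A minimal sketch (the data, the `empirical_cgf` helper, and the finite-difference extraction of the first cumulant are illustrative choices, not part of the original text): estimate $K(\beta) = \log \mathbb{E}[e^{\beta X}]$ from samples, recover $\kappa_1$ as a centered difference of $K$ at $\beta = 0$, and confirm that the first cumulant of an independent sum is the sum of the cumulants.

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_cgf(samples, beta):
    """K(beta) = log E[exp(beta * X)], estimated from samples.
    Shifts by the max exponent (log-sum-exp trick) for stability."""
    z = beta * samples
    m = z.max()
    return m + np.log(np.mean(np.exp(z - m)))

def kappa1(samples, h=1e-3):
    """First cumulant (the mean) via a centered finite difference
    of the empirical CGF at beta = 0."""
    return (empirical_cgf(samples, h) - empirical_cgf(samples, -h)) / (2 * h)

# Two independent draws; cumulants of X + Y should add.
x = rng.normal(1.0, 2.0, 200_000)      # kappa_1 = 1.0
y = rng.exponential(3.0, 200_000)      # kappa_1 = 3.0

print(kappa1(x) + kappa1(y))  # ≈ 4.0
print(kappa1(x + y))          # ≈ 4.0 as well: additivity over independent sums
```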
Small β: Mean-Dominated
Apply this to a reward $R$ under policy $\pi$: $K_\pi(\beta) = \log \mathbb{E}_\pi[e^{\beta R}]$. For small $\beta$, the higher-order terms are $O(\beta^3)$ and the series effectively truncates:

$$K_\pi(\beta) \approx \beta\mu + \frac{\beta^2}{2}\sigma^2.$$

The leading term $\beta\mu$ is just the scaled mean reward — the standard RL objective. Higher cumulants are negligible.
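The truncation is easy to verify in a toy setting. A sketch under assumed inputs (Gaussian stand-in rewards, a hypothetical `cgf` helper): at small $\beta$, the empirical CGF and the two-term expansion $\beta\mu + \tfrac{\beta^2}{2}\sigma^2$ agree to high precision.

```python
import numpy as np

rng = np.random.default_rng(1)
r = rng.normal(0.5, 1.0, 100_000)  # toy stand-in for rewards under a policy

def cgf(beta):
    """Empirical CGF, stabilized by shifting out the max exponent."""
    z = beta * r
    m = z.max()
    return m + np.log(np.mean(np.exp(z - m)))

beta = 0.01  # small beta: mean term dominates
approx = beta * r.mean() + 0.5 * beta**2 * r.var()
print(cgf(beta), approx)  # nearly identical at this scale
```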
Large β: Max-Dominated
Do not use the Taylor series here (its truncation loses meaning at large $\beta$). Use the original definition. For a batch of rewards $r_1, \dots, r_N$ with empirical CGF $K(\beta) = \log \frac{1}{N}\sum_{i=1}^{N} e^{\beta r_i}$, factor out $e^{\beta r_{\max}}$:

$$K(\beta) = \beta r_{\max} + \log \frac{1}{N}\sum_{i=1}^{N} e^{\beta (r_i - r_{\max})}.$$
Each non-max term has exponent $\beta(r_i - r_{\max}) < 0$, decaying exponentially as $\beta$ grows. The sum collapses to the single $e^{0} = 1$ contribution from the max, so $K(\beta) \approx \beta r_{\max} - \log N$ and $\frac{1}{\beta}K(\beta) \to r_{\max}$. Suboptimal rewards are exponentially suppressed; the objective is dominated by the single best outcome.
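The collapse onto $r_{\max}$ is visible on a tiny batch. A sketch with made-up rewards (the batch values and the `cgf` helper are illustrative): $\frac{1}{\beta}K(\beta)$ climbs toward $r_{\max}$ as $\beta$ grows, with the residual gap $\frac{\log N}{\beta}$ shrinking like $1/\beta$.

```python
import numpy as np

r = np.array([0.1, 0.5, 0.9, 2.0])  # toy batch of rewards; r_max = 2.0

def cgf(beta):
    """Empirical CGF of the batch, with the max factored out
    exactly as in the identity above."""
    z = beta * r
    m = z.max()
    return m + np.log(np.mean(np.exp(z - m)))

for beta in [1.0, 10.0, 100.0]:
    print(beta, cgf(beta) / beta)  # approaches r_max = 2.0 as beta grows
```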
Connections: Softmax and Free Energy
Softmax. The tilted distribution $p_\beta(i) = e^{\beta r_i} / \sum_j e^{\beta r_j}$ — proportional to the gradient of $K(\beta)$ with respect to $r_i$ — is the softmax. $\beta$ controls hardness: $\beta = 0$ gives the uniform distribution, $\beta \to \infty$ gives a one-hot on the argmax. “Soft” does not mean “weak at large $\beta$”; it means “smooth, differentiable approximation to max at any $\beta$.”
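The two endpoints of the hardness dial can be checked directly. A minimal sketch (the reward batch is a made-up example): at $\beta = 0$ the tilted distribution is uniform, and at large $\beta$ nearly all mass sits on the argmax.

```python
import numpy as np

def tilted(r, beta):
    """Softmax: the tilted distribution p_i proportional to exp(beta * r_i),
    computed with a max shift for numerical stability."""
    z = beta * r - (beta * r).max()
    w = np.exp(z)
    return w / w.sum()

r = np.array([0.1, 0.5, 0.9, 2.0])
print(tilted(r, 0.0))   # uniform: [0.25, 0.25, 0.25, 0.25]
print(tilted(r, 50.0))  # nearly one-hot on the argmax (index 3)
```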
Free energy. In statistical mechanics, $F = -\frac{1}{\beta}\log Z$ with partition function $Z = \sum_i e^{-\beta E_i}$. Identify $\beta$ with the inverse temperature and $r_i = -E_i$; then $\log Z = -\beta F$, so $\frac{1}{\beta}K(\beta)$ is the negative free energy $-F$ (up to the constant $-\frac{1}{\beta}\log N$). The tilted distribution is the Boltzmann distribution $p_i \propto e^{-\beta E_i}$. Cooling a system ($\beta \to \infty$) collapses it onto the ground state — physically the same phenomenon as softmax sharpening to argmax.
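The dictionary between the two pictures fits in a few lines. A sketch with invented energy levels (the spectrum, and the `log_Z`/`boltzmann` helpers, are illustrative): under $r_i = -E_i$, $\log Z$ is a log-sum-exp of rewards, and cooling drives both $F$ toward the ground-state energy and the Boltzmann weights toward one-hot.

```python
import numpy as np

E = np.array([0.0, 0.5, 1.3, 2.0])  # toy energy levels; ground state E = 0
r = -E                               # identify reward with negative energy

def log_Z(beta):
    """log Z = log sum_i exp(-beta * E_i) = log sum_i exp(beta * r_i)."""
    z = beta * r
    m = z.max()
    return m + np.log(np.sum(np.exp(z - m)))

def boltzmann(beta):
    """Boltzmann distribution p_i proportional to exp(-beta * E_i)."""
    z = beta * r - (beta * r).max()
    w = np.exp(z)
    return w / w.sum()

for beta in [0.1, 1.0, 50.0]:
    F = -log_Z(beta) / beta          # free energy F = -(1/beta) log Z
    print(beta, F, boltzmann(beta))  # cooling: F -> min(E), p -> ground state
```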
Takeaway
One parameter, $\beta$, smoothly interpolates between the mean (small $\beta$, leading cumulant) and the max (large $\beta$, exponential suppression of suboptimal outcomes). That is why the CGF is the natural objective for discovery: tune $\beta$ to slide along this spectrum.