Money, Power laws, Compounding and Distributions

NOTES ON STATISTICS, PROBABILITY and MATHEMATICS

The distribution of returns: Geometric mean and the log-normal distribution:

Introduction:

In non-ergodic processes the current step is completely constrained by the previous step. If one’s wealth drops to zero on step $4,$ steps $5$ through $100$ are completely locked out. It is impossible to average one’s way out of a total loss because the path has hit an absorbing barrier.

Additive processes (like rolling a die to win or lose a flat $\$5)$ tend to be ergodic because the shocks don’t change the baseline leverage. Non-ergodic processes almost always involve multiplication or percentage scaling. The hallmark is that the system scales based on its current size ($W_{t+1} = W_t \times (1 + r)$). Because a $-40\%$ drop on a large base requires a $+66.7\%$ gain just to break even, the structural asymmetry of multiplication tears the individual time-path away from the linear group average.

A system over time in which the average outcome looks incredibly wealthy, but the typical individual experience is complete stagnation or decay, is non-ergodic. Over time, the probability distribution develops a massive, highly skewed right tail. The ensemble mean is pulled skyward by a tiny handful of extreme outliers, while the median trajectory continuously sinks toward zero. The group gets richer while the individual goes broke.

Particles in a gas chamber can bounce around and, given enough time, eventually visit every possible state in the room. In a non-ergodic process, certain states are one-way doors: ruin, bankruptcy, death, or structural failure. Once a trajectory hits an absorbing barrier (a state one cannot leave), it is permanently removed from the system. Because time moves in only one direction, we cannot pool the survival metrics of an entire room of people today to predict our own survival probability over the next $40$ years.

The most prominent structural equivalent of a non-ergodic system in condensed matter physics is glass (and the broader family of spin glasses and amorphous solids). If we look at a collection of thousands of independent fluid particles at a low temperature, statistical mechanics predicts they should optimize into a highly ordered, perfectly repeating crystalline lattice (this is the ensemble perspective). But if liquid is cooled incredibly rapidly into a glass (a process called vitrification), its viscosity skyrockets before it can crystallize. The individual molecules become structurally jammed into a disordered, high-energy local configuration (this is the time perspective).

The Ole Peters’ coin toss consists of “I toss a fair coin, and if it comes up heads I’ll add 50% to your current wealth; if it comes up tails I will take away 40% of your current wealth.”

To understand why the individual path collapses, we must look at how percentages interact over a single individual time path. In an additive random walk (like a standard physical diffusion process or a simple game of adding/subtracting $\$5$), steps do not talk to each other. If we win $\$5$ and lose $\$5$, we are exactly back to even. The path is symmetric.

In a multiplicative random walk, the steps are connected. If our portfolio experiences a $+50\%$ gain followed by a $-40\%$ loss, the sequential math yields: $(1 + 0.50) \times (1 - 0.40) = 1.50 \times 0.60 = 0.90.$ We did not break even. We lost $10\%$ of our capital.

Multiplication is strictly commutative; meaning order doesn’t matter for the end result of a single path — yet these paths are sequence-dependent or connected. The resolution to this paradox lies in separating the order of events from the structural asymmetry of the percentages themselves. Here is why commutativity remains perfectly intact, but why the game still functions as a psychological trap.

Because multiplication is commutative, the sequence of our returns inside a single timeline changes absolutely nothing about our terminal wealth. If we start with $\$1.00$ and experience a gain followed by a loss, we get $1.00 \times 1.50 \times 0.60 = 0.90,$ which is identical to first suffering a loss and then a gain: $1.00 \times 0.60 \times 1.50 = 0.90.$ If we play this game for $1,000$ steps, and we flip exactly $500$ Heads and $500$ Tails, it doesn’t matter if we flip all $500$ Heads first, all $500$ Tails first, or alternate them perfectly like a checkerboard. Our terminal bank account will be exactly the same:

\[W_{1000} = W_0 \times (1.50)^{500} \times (0.60)^{500} = W_0 \times (0.90)^{500} \approx 1.3 \times 10^{-23}\, W_0\]

So, within a single timeline, commutativity guarantees complete path invariance regarding the order of returns.
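A minimal sketch of this order-invariance, assuming the $1.50$ and $0.60$ multipliers from the coin toss; the function name `terminal_wealth` and the fixed shuffle seed are illustrative choices, not part of the original argument. The three orderings agree up to floating-point rounding:

```python
import math
import random

HEADS, TAILS = 1.50, 0.60  # multipliers for +50% and -40%

def terminal_wealth(multipliers, w0=1.0):
    """Multiply the starting wealth by every multiplier in the sequence."""
    return w0 * math.prod(multipliers)

heads_first = [HEADS] * 500 + [TAILS] * 500   # all heads, then all tails
alternating = [HEADS, TAILS] * 500            # checkerboard order
shuffled = heads_first[:]
random.Random(0).shuffle(shuffled)            # arbitrary order (seed is an assumption)

for name, seq in [("heads first", heads_first),
                  ("alternating", alternating),
                  ("shuffled", shuffled)]:
    print(f"{name:12s} -> {terminal_wealth(seq):.3e}")
```

All three lines print a value on the order of $10^{-23}$: the order of the flips is irrelevant, and only the count of heads and tails matters.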

When we say the steps are “connected” or “path-dependent,” we don’t mean that step $2$ knows what happened at step $1.$ We mean that the dollar value of the next step depends entirely on the current capital base. The illusion stems from our additive intuition. Human brains naturally hear $+50\%$ and $-40\%$ and intuitively average them out to $+5\%$. We expect a net positive drift because $+50$ is a bigger number than $-40$. But multiplication doesn’t care about the nominal size of the raw percentages; it operates on ratios: to mathematically erase a $-40\%$ drop (multiplying by $0.60$), we don’t need a $+40\%$ gain; we need to multiply by $\frac{1}{0.60} \approx 1.667$, which is a $+66.7\%$ gain. Because our $+50\%$ gain falls short of that required $+66.7\%$, the combination of a win and a loss is fundamentally a net-downward machine ($0.90$). Commutativity simply states that a destructive pairing destroys our capital regardless of whether the blow lands on the first flip or the second flip.

Because multiplication treats drops as devastating cuts to our compounding base, downward shocks are mathematically more powerful than upward shocks of the same scale. To recover from a $50\%$ drop, we don’t need a $50\%$ gain; we need a $100\%$ gain. This structural imbalance means that a single individual sequence of alternating up-and-down steps naturally exerts a downward gravitational pull on wealth. This structural downward pull affects individual paths and the collective ensemble in completely opposite ways. When a single individual plays this game for $n$ steps, their final wealth is a product of their specific sequence of shocks:

\[W_n = W_0 \prod_{t=1}^n (1 + r_t)\]

Taking the natural log of both sides transforms this into an additive timeline: $\log(W_n) = \log(W_0) + \sum_{t=1}^n \log(1 + r_t).$ By the Law of Large Numbers, as $n \to \infty$, the average performance of this individual timeline converges to the expected value of the log, which is negative ($\approx -0.0527$ per round, equivalent to a geometric rate $r_g \approx -5.1\%$):

\[\frac{1}{n}\log\left(\frac{W_n}{W_0}\right) \to \mathbb{E}[\log(1+r)] = \tfrac{1}{2}\log(1.5 \times 0.6) \approx -0.0527\]

This means that for any individual path, ruin is almost surely the destination over a long enough time horizon. The collective expectation ignores the individual time path entirely. It asks a completely different question: “If $1,000$ independent players start simultaneously, what is the average wealth across all players at step $n$?” Because the tosses are independent, the expectation ($\mathbb{E}$) of the product factorizes into a product of expectations, slicing across players rather than along any single path:

\[\mathbb{E}[W_n] = W_0 \prod_{t=1}^n \mathbb{E}[1 + r_t] = W_0 \times (1.05)^n\]

The ensemble average is pulled upward by a vanishingly small fraction of exponentially lucky paths. At step $1,000,$ a single player out of billions might hit an extraordinary streak of heads and accumulate a fortune so massive that, when averaged across the thousands of players who went broke, the group average looks highly profitable. The collective expectation gains because it can pool wealth across alternate universes, effectively short-circuiting the path-dependence that destroys the single individual locked into a single timeline.
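The split between the group average and the typical player can be sketched with a small Monte Carlo run. The player count, step count, and seed below are illustrative assumptions, and `simulate_players` is a hypothetical helper name rather than anything from Peters' work:

```python
import random
import statistics

def simulate_players(n_players=10_000, n_steps=100, w0=1.0, seed=0):
    """Final wealth of independent players who each toss the coin n_steps times."""
    rng = random.Random(seed)
    finals = []
    for _ in range(n_players):
        w = w0
        for _ in range(n_steps):
            w *= 1.50 if rng.random() < 0.5 else 0.60  # heads: +50%, tails: -40%
        finals.append(w)
    return finals

finals = simulate_players()
ensemble_mean = statistics.fmean(finals)
median = statistics.median(finals)
share_below_start = sum(w < 1.0 for w in finals) / len(finals)

print(f"theoretical E[W_n] = 1.05^100 : {1.05 ** 100:.1f}")
print(f"sample mean across players    : {ensemble_mean:.3f}")
print(f"median player                 : {median:.5f}")
print(f"share of players below W_0    : {share_below_start:.1%}")
```

The sample mean can sit well below the theoretical $1.05^{100}$, because the paths that carry the expectation are too rare to appear reliably among ten thousand players. The median player, by contrast, ends with a small fraction of the starting wealth.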

This asymmetry is the defining, real-world characteristic of the log-normal distribution: the stark mathematical divergence between the mean and the median of a log-normal curve. The log-normal distribution explains the entire ergodicity paradox.

If we look at a single person’s wealth after $n$ coin tosses, it is a product of random multipliers. Because multiplying independent factors is messy, we take the logarithm to turn it into an additive problem:

\[\log(W_n) = \log(W_0) + \sum_{t=1}^n \log(1 + r_t)\]

By the Central Limit Theorem, when you add together a large number of independent random variables (in this case, the individual logged shocks $\log(1+r_t)$), their sum converges toward a symmetric normal (Gaussian) Distribution. If the logarithm of wealth is normally distributed, then the raw wealth itself ($W_n$) must follow a log-normal distribution. A normal distribution is beautifully symmetric: its mean, median, and mode are all the exact same number. But when we exponentiate that symmetric bell curve back into raw dollar space, it warps into a highly skewed shape.
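As a worked sketch of that warping, assume (per the CLT argument) that $\log W_n$ is approximately normal with mean $\log W_0 + n\mu$ and variance $n s^2$, where $\mu = \mathbb{E}[\log(1+r)]$ and $s^2 = \mathrm{Var}[\log(1+r)]$ are symbols introduced here for illustration. The standard log-normal formulas then give:

\[\text{mode}(W_n) \approx W_0\, e^{n(\mu - s^2)}, \qquad \text{median}(W_n) \approx W_0\, e^{n\mu}, \qquad \mathbb{E}[W_n] \approx W_0\, e^{n(\mu + s^2/2)}\]

For the coin toss, $\mu = \tfrac{1}{2}\log(0.9) \approx -0.0527$ and $s^2 = \left(\tfrac{1}{2}\log(2.5)\right)^2 \approx 0.2099$, so the median shrinks at about $5.3\%$ per round in log terms while the mean grows at roughly $\mu + s^2/2 \approx +0.0523$. That is close to, but not exactly, the true $\log(1.05) \approx 0.0488$, because the normal approximation is least accurate in the far tail that dominates the mean.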

In physical phase space, an absorbing or capturing state is a configuration from which the system can never escape once it enters. In finance, these capturing states appear in two forms. The first is absolute ruin ($W = 0$): the strict absorbing barrier, equivalent to betting $100\%$ of our wealth on a coin toss and losing. In a multiplicative system, zero is an absolute black hole: $0 \times 1.50 = 0.$ Once a trajectory touches this boundary, the multiplier engine loses all fuel. The path cannot explore any other coordinates of wealth space ever again. It is permanently captured. The second, and more common, form is relative ruin $(W \to 0),$ in which, even if we only bet a fixed percentage of our wealth (so we never hit absolute zero), the log-scale center of gravity behaves as a capturing state. Because the geometric drift is negative ($-5.1\%$), the individual path is caught in a geometric tractor beam pulling it toward zero. As $n \to \infty$, our wealth asymptotically approaches zero ($W_n \to 0$). Once our portfolio drops from $\$1,000,000$ down to $\$0.01$, it is effectively captured by ruin. Even if we hit ten “Heads” in a row from that point, our wealth only climbs back to about $58$ cents ($0.01 \times 1.5^{10} \approx 0.58$). We are trapped in a microscopic pocket of the broader phase space, completely unable to ever climb back up and catch the exploding red line of the ensemble average.

So we started at multiplicative random processes to find non-ergodicity and log-normal distributions. We are now adjacent to fat tails. In 1953, an economist named David Champernowne proved that if we take a standard multiplicative random walk (which naturally wants to form a log-normal distribution) and simply inject a strict lower bound $W_{\min}$ that catches paths and bounces them back up, the steady-state distribution of the system converges exactly to a power-law:

\[P(W > w) \propto w^{-\alpha}\]

Paths trying to decay are stopped cold at $W_{\min}$. Because they cannot go lower, the random multiplicative shocks push them back upward. The wealth piles up against the floor, creating a dense reservoir of poor paths, while a lucky few are propelled into the stratosphere.
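A rough sketch of the floor mechanism follows. It applies the coin-toss multipliers and models the lower bound with the simple rule `max(w_min, w * m)`, which is an assumption of this example rather than Champernowne's exact formulation; the population size, horizon, and seed are likewise illustrative:

```python
import random

def simulate_with_floor(n_players=2_000, n_steps=1_000, w_min=1.0, seed=0):
    """Multiplicative random walk in which wealth can never fall below w_min."""
    rng = random.Random(seed)
    wealth = [w_min] * n_players
    for _ in range(n_steps):
        for i in range(n_players):
            m = 1.50 if rng.random() < 0.5 else 0.60
            wealth[i] = max(w_min, wealth[i] * m)  # the floor catches decaying paths
    return wealth

wealth = simulate_with_floor()
thresholds = (1, 10, 100, 1_000)
shares = [sum(w > t for w in wealth) / len(wealth) for t in thresholds]

for t, s in zip(thresholds, shares):
    print(f"P(W > {t:>5}) ~ {s:.3f}")
```

If the tail follows a power law, the printed shares should fall by a roughly constant factor for each tenfold increase in the threshold, which is a straight line on log-log axes.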

Another elegant way to stretch the log-normal narrative into a power-law is to introduce a finite lifespan for the paths, paired with the injection of brand-new paths. This is known as a birth-death or Yule process. Imagine $1,000$ players tossing coins, but with two new rules: 1. In every round, there is a small probability that a player gets eliminated or retires. 2. For every player that dies, a new player enters the game, starting back at baseline wealth ($W_0 = 1$). Because old paths constantly die out, no single path has an infinite amount of time to drift down to absolute zero. The system reaches a dynamic equilibrium. The paths that have survived for a very long time have experienced an enormous number of compounding multiplicative steps. Because multiplication grows exponentially, these ancient, surviving paths become cosmic outliers. When you combine a constant injection of young, baseline paths at the bottom with exponential growth for the rare, long-surviving paths at the top, the smooth log-normal curve stretches out and straightens into a strict, scale-free power-law.
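The two rules can be sketched as follows. The exit probability `p_exit = 0.02`, the population size, and the seed are hypothetical values chosen only for illustration, and the coin-toss multipliers stand in for the generic multiplicative shocks:

```python
import random
import statistics

def simulate_birth_death(n_players=2_000, n_steps=1_000, p_exit=0.02, w0=1.0, seed=0):
    """Each round a player either exits (and is replaced at w0) or tosses the coin."""
    rng = random.Random(seed)
    wealth = [w0] * n_players
    ages = [0] * n_players
    for _ in range(n_steps):
        for i in range(n_players):
            if rng.random() < p_exit:
                wealth[i], ages[i] = w0, 0      # rule 2: a newcomer starts at baseline
            else:
                wealth[i] *= 1.50 if rng.random() < 0.5 else 0.60
                ages[i] += 1
    return wealth, ages

wealth, ages = simulate_birth_death()
print(f"mean age of surviving paths : {statistics.fmean(ages):.1f} rounds")
print(f"oldest surviving path       : {max(ages)} rounds")
print(f"richest / median wealth     : {max(wealth) / statistics.median(wealth):.1f}")
```

Because every slot is refilled at baseline wealth, the population never drains to zero; the spread between the richest and the median path comes from the few slots that have compounded for many rounds.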

Time-dependent individual path versus ensemble expectation:

This post is a riff on money and portfolio-based mathematical issues, inspired by a blog post by Ole Peters on what he calls the infamous coin toss. The idea is presented in the chart below, which shows the discrepancy (under a fair-coin assumption) between the ensemble expectation and the individual time paths, which largely end in ruin.

I toss a fair coin, and if it comes up heads I’ll add 50% to your current wealth; if it comes up tails I will take away 40% of your current wealth. A fun thing to do in a lecture on the topic is to pause at this point and ask the audience if they’d like to take the gamble. Some will say yes, others no, and usually an interesting discussion of people’s motivations emerges. Often, the question comes up whether we’re allowed to repeat the gamble, and we will see that this leads naturally to the ergodicity problem.

The ergodicity problem, at least the part of it that is important to us, boils down to asking whether we get the same number when we average a fluctuating quantity over many different systems and when we average it over time. If we try this for the fluctuating wealth in the Peters coin toss the answer is no, and this has far-reaching consequences for economic theory.

A fair bet ($+50\%$ on heads, $-40\%$ on tails) has positive expected value, causing collective ensemble wealth to grow at $5\%$ per round while individual time-average trajectories decay toward ruin almost surely due to multiplicative effects: a lone individual faces self-destructive outcomes from repeated play, but a small group cooperating via risk pooling can access the beneficial collective growth that single trajectories cannot achieve. This ergodicity-breaking example, shown in the attached plot of diverging expected value versus individual experiences, underscores how non-ergodic processes create misalignments between personal incentives and group-level benefits in economics.

In the coin toss experiment, the arithmetic (ensemble) mean is $+5\%$ per round, leading to collective wealth growth. Yet the geometric (time-average) mean multiplier is $\sqrt{1.5 \times 0.6} \approx 0.9487,$ leading to $\sim 5.1\%$ decay per round. As a result, individual paths will almost surely tend to ruin, while the ensemble looks great.
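Both averages can be checked in a few lines. This is a minimal sketch that encodes the coin toss as (probability, return) pairs; the variable names are illustrative:

```python
import math

outcomes = [(0.5, 0.50), (0.5, -0.40)]  # (probability, return) for heads and tails

arithmetic_mean = sum(p * r for p, r in outcomes)              # ensemble average
expected_log = sum(p * math.log(1 + r) for p, r in outcomes)   # E[log(1 + r)]
geometric_multiplier = math.exp(expected_log)                  # equals sqrt(1.5 * 0.6)

print(f"arithmetic mean return : {arithmetic_mean:+.4f}")
print(f"E[log(1 + r)]          : {expected_log:+.4f}")
print(f"geometric multiplier   : {geometric_multiplier:.4f}")
print(f"geometric mean return  : {geometric_multiplier - 1:+.4f}")
```

The same game therefore has a positive arithmetic mean and a negative geometric mean; which one matters depends on whether wealth is pooled across players or compounded along one path.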

In the meantime, the stock market works long-term for many individuals (especially passive index investors) precisely because its parameters are fundamentally different from the coin-toss example — even though both are multiplicative random walks with a trend.

Stock market returns, modeled as geometric Brownian motion, are multiplicative and noisy. The positive drift from real economic growth (corporate earnings, dividends, productivity, innovation) is strong enough to overcome volatility: the arithmetic mean of the return is $\approx 10 - 12\%$ per year (historical S&P 500 nominal).

The arithmetic mean is simply the standard sample average of those individual annual returns. This represents the ensemble expectation — what we would expect to make if we picked a single year at random or pooled thousands of simultaneous portfolios:

\[r_a = \frac{1}{n} \sum_{i=1}^{n} r_i \approx 0.10 \text{ to } 0.12\]

Is $r$ the Same as $r_a$?

No, $r$ and $r_a$ are fundamentally different mathematical objects. Confusing the two is actually the exact mathematical oversight that hides the “non-ergodic trap” in finance.

Here is the breakdown of what each symbol represents:

The Variable $r$ (The Random Variable)

The symbol $r$ is a stochastic (random) variable. It represents the highly volatile, fluctuating return of the market in any single, isolated period.

It does not have a single value; it has a probability distribution.

In one year, $r$ might equal $+50\%$ (a massive boom).

In the next year, $r$ might equal $-50\%$ (a devastating crash).

When we write $\mathbb{E}[\log(1 + r)]$, we are taking the logarithm of $(1 + r)$ for every possible fluctuating outcome, and then calculating the average of those logged values.

The Constant $r_a$ (The Arithmetic Average)

The symbol $r_a$ is a constant, single number. It is the expected value (or long-run average) of that fluctuating variable $r$:

\[r_a = \mathbb{E}[r]\]

For a typical stock market index, $r_a$ is a fixed historical baseline, like $+10\%$ (or $0.10$).

It does not fluctuate. It is the steady, aggregated “group expectation” of the system.

Why This Distinction is Crucial: Jensen’s Inequality

If $r$ and $r_a$ were the same, we could just slide the expectation operator inside the logarithm:

\[\mathbb{E}[\log(1 + r)] \overset{?}{=} \log(1 + \mathbb{E}[r]) = \log(1 + r_a)\]

But because the logarithm is a downward-curving (concave) function, Jensen’s Inequality states that the average of the logs is strictly less than the log of the average whenever $r$ actually fluctuates:

\[\mathbb{E}[\log(1 + r)] < \log(1 + r_a)\]

If we use $r_a$ inside the log, we get a flat line that is completely blind to risk. But if we use the fluctuating variable $r$, the curvature of the log punishes the volatility. The Taylor expansion translates this exact mathematical gap for us:

\[\mathbb{E}[\log(1 + r)] \approx \mathbb{E}[r] - \frac{\mathbb{E}[r^2]}{2} \approx r_a - \frac{\sigma^2}{2}\]

Summary

$r$ is the bumpy, real-world roller coaster of yearly returns (e.g., $+50\%$, then $-50\%$).

$r_a$ is the smooth, deceptive average speed of the roller coaster (e.g., $0\%$).

Inside the expectation $\mathbb{E}[\log(1+r)]$, we must use the bumpy variable $r$ so that the math can physically calculate and subtract the volatility drag ($\sigma^2/2$).
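A quick numerical check of the gap, using the $+50\%$ / $-50\%$ roller coaster from the summary as two equally likely outcomes. This is a sketch with illustrative variable names:

```python
import math

returns = [0.50, -0.50]  # equally likely outcomes: a boom year and a crash year

r_a = sum(returns) / len(returns)                        # arithmetic mean, E[r]
variance = sum((r - r_a) ** 2 for r in returns) / len(returns)

mean_of_logs = sum(math.log(1 + r) for r in returns) / len(returns)  # E[log(1 + r)]
log_of_mean = math.log(1 + r_a)                                      # log(1 + E[r])
taylor_estimate = r_a - variance / 2                                 # r_a - sigma^2 / 2

print(f"log(1 + E[r])     : {log_of_mean:+.4f}")
print(f"E[log(1 + r)]     : {mean_of_logs:+.4f}")
print(f"r_a - sigma^2 / 2 : {taylor_estimate:+.4f}")
```

The log of the average is zero while the average of the logs is clearly negative. The second-order estimate lands in the right neighborhood but is only rough here, since returns of $\pm 50\%$ are not small.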

The volatility is $\approx 15-20\%$ per year.

\[\sigma = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (r_i - r_a)^2} \approx 0.15 \text{ to } 0.20\]

For low volatility values $(\sigma = 15\%$ or $0.15)$:

\[\small \text{Volatility Drag} = \frac{(0.15)^2}{2} = \frac{0.0225}{2} = 0.01125 = \mathbf{1.13\%}\]

For high volatility $(\sigma = 20\%$ or $0.20)$:

\[\small\text{Volatility Drag} = \frac{(0.20)^2}{2} = \frac{0.0400}{2} = 0.0200 = \mathbf{2.00\%}\]

The geometric (time-average) mean return is

\[\small\text{geometric mean} \approx \text{arithmetic mean} - (\sigma^2/2) \approx 8-11\%\]

per year (still strongly positive!).
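The rule of thumb can be tabulated across the quoted ranges. This sketch simply evaluates $r_a - \sigma^2/2$ for each combination; the helper name `geometric_estimate` is illustrative:

```python
def geometric_estimate(r_a, sigma):
    """Second-order estimate of the time-average growth rate: r_a - sigma^2 / 2."""
    return r_a - sigma ** 2 / 2

for r_a in (0.10, 0.12):
    for sigma in (0.15, 0.20):
        drag = sigma ** 2 / 2
        print(f"r_a = {r_a:.2f}  sigma = {sigma:.2f}  "
              f"drag = {drag:.4f}  r_g ~ {geometric_estimate(r_a, sigma):.4f}")
```

In every combination the drag is a fraction of the arithmetic mean, which is the key difference from the coin toss, where the drag exceeds the mean.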

Let’s check what is happening mathematically:

The wealth after several years is a product of multipliers:

\[\small W_{\text{final}} = W_0 \times (1+r_1) \times (1+r_2) \times \dots \times (1+r_n)\]

Calculating the average of a long string of multiplications is computationally messy. However, the logarithm has a unique property: $\log(A \times B) = \log(A) + \log(B)$. Taking the log of the wealth turns that chain of multiplication into simple additions:

\[\small \log(W_{\text{final}}) = \log(W_0) + \log(1+r_1) + \log(1+r_2) + \dots + \log(1+r_n)\]

To turn this addition into a dollar amount, we plug it into the exponential function:

\[W_{\text{final}} = W_0 \cdot e^{\text{sum of logs}}\]

The function $f(r) = 1 + r$ is a perfectly straight line. It has no memory of risk. But the function $f(r) = \log(1 + r)$ is concave (it curves downward). This curve is the mathematical representation of the fact that losses hurt more than gains help. If we gain $50\%$ and then lose $50\%,$ our arithmetic average is $0\%$ $(1 + 0.5 - 0.5 = 1).$ But our actual wealth is down $25\%$ $(1.5 \times 0.5 = 0.75)$.

To obtain the Taylor expansion we start at

\[\log(W_{\text{final}}) = \log(W_0) + \sum_{i=1}^{n} \log(1+r_i)\]

by the law of large numbers, as $n\to \infty$,

\[\text{Long-run growth rate } (r_g) = \lim_{n \to \infty} \frac{1}{n} \log\left(\frac{W_{\text{final}}}{W_0}\right) = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \log(1+r_i) = \mathbb{E}[\log(1+r)]\]

To understand why the long-term growth rate $r_g$ is mathematically equivalent to the expected value of the logarithm of $(1+r)$, we can break the derivation down into three logical steps:

By definition, a portfolio that compounds over $n$ periods at a continuously compounded growth rate $r_g$ turns the starting wealth $W_0$ into the final wealth $W_{\text{final}}$ according to:

\[W_{\text{final}} = W_0 e^{r_g n}\]

If we want to isolate the growth rate $r_g$ to see what the compounding engine averaged per period, we solve for $r_g$ using basic algebra:

\[\frac{W_{\text{final}}}{W_0} = e^{r_g n}\]

Taking the natural logarithm ($\log$) of both sides:

\[\log\left(\frac{W_{\text{final}}}{W_0}\right) = r_g n\]

Dividing by $n$ gives us the exact definition of the realized growth rate per period:

\[r_g = \frac{1}{n} \log\left(\frac{W_{\text{final}}}{W_0}\right)\]

We also know that the final wealth $W_{\text{final}}$ is physically generated by multiplying our starting wealth by a sequence of individual annual returns $(1 + r_i)$:

\[W_{\text{final}} = W_0 \prod_{i=1}^{n} (1 + r_i)\]

If we divide by $W_0$ and take the logarithm of both sides, the product rule of logarithms ($\log(A \times B) = \log(A) + \log(B)$) converts that chain of multiplications into a simple sum:

\[\log\left(\frac{W_{\text{final}}}{W_0}\right) = \log\left( \prod_{i=1}^{n} (1 + r_i) \right) = \sum_{i=1}^{n} \log(1 + r_i)\]

If we substitute this sum back into our definition of $r_g$:

\[r_g = \frac{1}{n} \sum_{i=1}^{n} \log(1 + r_i)\]

This formula is a time average. It is the average of the log-returns experienced by a single portfolio over a sequence of $n$ steps.

If we assume that the annual market returns $r_1, r_2, \dots, r_n$ are independent and identically distributed (i.i.d.) random variables, then their logarithmic counterparts:

\[Y_i = \log(1 + r_i)\]

are also a sequence of i.i.d. random variables.

The Strong Law of Large Numbers (LLN) states that if you take the sample average (time average) of a sequence of i.i.d. random variables over an increasingly long horizon, that average is mathematically guaranteed to converge to the true statistical expected value (ensemble average) as $n$ approaches infinity:

\[\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} Y_i = \mathbb{E}[Y]\]

Substituting $Y_i = \log(1+r_i)$ back into the LLN formula:

\[\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \log(1 + r_i) = \mathbb{E}[\log(1 + r)]\]

By linking these three steps, we get the complete chain of equalities:

\[\small\text{Long-run growth rate } (r_g) = \lim_{n \to \infty} \underbrace{\frac{1}{n} \log\left(\frac{W_{\text{final}}}{W_0}\right)}_{\text{Definition of } r_g} = \lim_{n \to \infty} \underbrace{\frac{1}{n} \sum_{i=1}^{n} \log(1 + r_i)}_{\text{Time Average of Log Returns}} \xrightarrow{\text{LLN}} \underbrace{\mathbb{E}[\log(1 + r)]}_{\text{Ensemble Expectation}}\]

If we look at raw wealth $W$, it is non-ergodic: the average of $1,000$ portfolios over $1$ year does not equal the long-term average of $1$ portfolio over $1,000$ years. However, by taking the logarithm of wealth, we turn multiplication into addition. Addition obeys the standard Law of Large Numbers. Therefore, while wealth itself is non-ergodic, the log-returns are ergodic. Because log-returns are ergodic, the time-average growth rate over a long horizon ($n \to \infty$) converges perfectly to the ensemble expectation of a single-step log-return, $\mathbb{E}[\log(1+r)]$.
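The convergence of the time average can be watched along one simulated path. This is a sketch using the coin-toss returns; the horizons and the seed are arbitrary illustrative choices:

```python
import math
import random

def time_average_growth(n_steps, seed=0):
    """Average log-return experienced by a single path over n_steps coin tosses."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_steps):
        r = 0.50 if rng.random() < 0.5 else -0.40
        total += math.log(1 + r)
    return total / n_steps

# Ensemble expectation of a single-step log-return
ensemble_expectation = 0.5 * math.log(1.5) + 0.5 * math.log(0.6)

for n in (100, 10_000, 200_000):
    print(f"n = {n:>7}: time average = {time_average_growth(n):+.4f}")
print(f"E[log(1 + r)]            = {ensemble_expectation:+.4f}")
```

Short horizons can land noticeably above or below the target, while the longest path should sit close to $\mathbb{E}[\log(1+r)]$.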

We don’t know $\mathbb{E}[\log(1+r)]$ off the top of our heads. We only know the average return ($r_a$) and the volatility ($\sigma$). We use the Taylor expansion to translate the log back into numbers we actually have. We cannot easily compute the expected value of a logarithm, $\mathbb{E}[\ln(1 + r)]$, but we can easily compute the expected value of returns, $\mathbb{E}[r]$, and their squares, $\mathbb{E}[r^2]$.

We are taking the Taylor expansion of the natural logarithm of $1$ plus the rate of return, $r$: $f(r) = \ln(1 + r).$ We want to expand this function around the point $r = 0$ (which represents a $0\%$ return, the baseline where nothing changes).

A Taylor series for any smooth function $f(r)$ centered at $a = 0$ (also known as a Maclaurin series) is defined as:

\[f(r) = f(0) + f'(0)r + \frac{f''(0)}{2!}r^2 + \frac{f'''(0)}{3!}r^3 + \frac{f^{(4)}(0)}{4!}r^4 + \cdots\]

Computing the derivatives:

To build the expansion, we evaluate $f(r)$ and its derivatives at the center point $r = 0$:

The base value:

\[f(r) = \ln(1 + r) \implies f(0) = \ln(1) = 0\]

The first derivative (slope at $0$):

\[f'(r) = \frac{d}{dr}[\ln(1+r)] = \frac{1}{1 + r}\]

\[f'(0) = \frac{1}{1 + 0} = 1\]

The second derivative (curvature at $0$):

\[f''(r) = \frac{d}{dr}\left[(1+r)^{-1}\right] = -(1+r)^{-2} = -\frac{1}{(1 + r)^2}\]

\[f''(0) = -\frac{1}{(1 + 0)^2} = -1\]

The third derivative:

\[f'''(r) = \frac{d}{dr}\left[-(1+r)^{-2}\right] = 2(1+r)^{-3} = \frac{2}{(1 + r)^3}\]

\[f'''(0) = \frac{2}{(1 + 0)^3} = 2\]

The fourth derivative:

\[f^{(4)}(r) = \frac{d}{dr}\left[2(1+r)^{-3}\right] = -6(1+r)^{-4} = -\frac{6}{(1 + r)^4}\]

\[f^{(4)}(0) = -\frac{6}{(1 + 0)^4} = -6\]

Assembling the series:

\[f(r) = f(0) + f'(0)r + \frac{f''(0)}{2!}r^2 + \frac{f'''(0)}{3!}r^3 + \frac{f^{(4)}(0)}{4!}r^4 + \cdots\]

\[f(r) = 0 + (1)r + \frac{-1}{2}r^2 + \frac{2}{6}r^3 + \frac{-6}{24}r^4 + \cdots\]

Simplifying the fractions ($2/6 = 1/3$ and $6/24 = 1/4$) yields the classic, elegant power series:

\[\ln(1 + r) = r - \frac{r^2}{2} + \frac{r^3}{3} - \frac{r^4}{4} + \cdots\]

In the financial derivation, we truncate (cut off) the infinite series after the quadratic term:

\[\ln(1 + r) \approx r - \frac{r^2}{2}\]

We do this for two practical, real-world reasons: 1. Real-world annual stock market returns $r$ usually sit between $-0.20$ and $+0.20$. When we raise a small fraction to a high power, it becomes microscopic. For example, if $r = 0.10$ ($10\%$ return): first term ($r$): $0.10$; second term ($r^2/2$): $0.005$ (highly relevant); third term ($r^3/3$): $0.00033$ (virtually negligible). 2. The term $r^2$ is the magic variable that lets us connect the log-return directly to volatility (variance). Since the expected value of squared deviations from the mean is the definition of variance ($\sigma^2$), keeping the second-order term allows us to mathematically calculate the “volatility drag” penalty.
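How much the truncation costs can be checked directly. The sketch below compares $\ln(1+r)$ with the partial sum of the series for a few sample returns; the helper name `log1p_taylor` and the sample values are illustrative:

```python
import math

def log1p_taylor(r, order=2):
    """Partial sum of the Maclaurin series r - r^2/2 + r^3/3 - ... up to r^order."""
    return sum((-1) ** (k + 1) * r ** k / k for k in range(1, order + 1))

for r in (-0.20, 0.10, 0.20, 0.50):
    exact = math.log1p(r)            # ln(1 + r)
    approx = log1p_taylor(r, order=2)
    print(f"r = {r:+.2f}: ln(1+r) = {exact:+.5f}, "
          f"r - r^2/2 = {approx:+.5f}, error = {exact - approx:+.5f}")
```

Within the usual range of annual returns the quadratic truncation is tight; for a coin-toss-sized return such as $+50\%$ the error is visibly larger, so the approximation should be read as a rule of thumb there.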

\[\log(1+r) \approx r - \frac{r^2}{2}\]

Taking the expectation of that expansion:

\[\mathbb{E}[\log(1+r)] \approx \mathbb{E}[r] - \frac{\mathbb{E}[r^2]}{2}\]

Since $\mathbb{E}[r^2] \approx \sigma^2$ (for small $r_a$), we get the growth rate:

\[r_g \approx r_a - \frac{\sigma^2}{2}\]

The value we just found, $r_g$, is the continuous growth rate. To see what our actual bank account looks like, we have to undo the log we took: We raise $e$ to the power of our growth rate multiplied by time ($n$):

\[W_{\text{final}} = W_0 \cdot e^{\left(r_a - \frac{\sigma^2}{2}\right)\,n}\]

The truncated expansion

\[\log(1 + r) \approx r - \frac{r^2}{2}\]

captures the volatility penalty. The $r$ is our gain, but the $-\frac{r^2}{2}$ is the cost of the curve: the specific amount our portfolio growth is dragged down by the volatility of the path.

We use the log because of the strong law of large numbers. It states that for a sum of random variables, the average will eventually settle on the expected value. Because we turned our wealth into a sum using logs, we can say that in the long run:

\[\small \text{Growth Rate} = \frac{1}{n} \sum \log(1+r_i) \xrightarrow{n \to \infty} \mathbb{E}[\log(1+r)]\]

Over a finite horizon of $n$ years, the realized growth rate is the sum of all those individual logs divided by time:

\[\text{Realized Growth Rate} = \frac{1}{n} \left[ \log(1+r_1) + \log(1+r_2) + \dots + \log(1+r_n) \right]\]

Because the Taylor expansion works for any small return, we can apply the expansion $r - \frac{r^2}{2}$ to every single year in that list:

Year 1: $\log(1+r_1) \approx r_1 - \frac{r_1^2}{2}$

Year 2: $\log(1+r_2) \approx r_2 - \frac{r_2^2}{2}$

$\dots$

Year $n$: $\log(1+r_n) \approx r_n - \frac{r_n^2}{2}$

Putting these yearly calculations together:

\[\small\text{Realized Growth Rate} \approx \underbrace{\frac{r_1 + r_2 + \dots + r_n}{n}}_{\text{Average of Returns}} - \frac{1}{2} \left( \underbrace{\frac{r_1^2 + r_2^2 + \dots + r_n^2}{n}}_{\text{Average of Squares}} \right)\]

As we invest over a long horizon ($n \to \infty$), the LLN kicks in. Those two sample averages naturally tend to their statistical expected values:

The first average (the average of returns) becomes the true arithmetic mean:

\[\frac{r_1 + r_2 + \dots + r_n}{n} \longrightarrow \mathbb{E}[r] = r_a\]

The second average (the average of squares) becomes the expected value of the squared returns:

\[\frac{r_1^2 + r_2^2 + \dots + r_n^2}{n} \longrightarrow \mathbb{E}[r^2] = \sigma^2 + r_a^2\]

If we didn’t use the log, we would be calculating $\mathbb{E}[1+r]$, which is the ensemble average. That tells us what happens to a collective or ensemble of people in one year. But the individual lives through a sequence of years. The log is what translates the collective’s luck into our personal reality.

We start with the Taylor expansion for a single realization of a return $r$:

\[\log(1 + r) = r - \frac{r^2}{2}+ \frac{r^3}{3}-\frac{r^4}{4}+\cdots\]

To find the long-run average, we truncate after the quadratic term and take the expected value $(\mathbb{E})$ of both sides. Because expectation is a linear operator, we can apply it to each term individually:

\[\mathbb{E}[\log(1 + r)] \approx \mathbb{E}[r] - \mathbb{E}\left[\frac{r^2}{2}\right]\]

\[\mathbb{E}[\log(1 + r)] \approx \mathbb{E}[r] - \frac{1}{2}\mathbb{E}[r^2]\]

The fundamental definition of variance says that it equals the second raw moment $\mathbb E[x^2]$ minus the first raw moment $\mathbb E[x]$ squared:

\[\sigma^2 = \mathbb{E}[r^2] - \left(\mathbb{E}[r]\right)^2\]

If we rearrange this to solve for $\mathbb{E}[r^2]$, we get:

\[\mathbb{E}[r^2] = \sigma^2 + (\mathbb{E}[r])^2\]

Now, substituting this back into the Taylor approximation

\[\mathbb{E}[\log(1 + r)] \approx \mathbb{E}[r] - \frac{1}{2}\left(\sigma^2 + (\mathbb{E}[r])^2\right)\]

In the context of the stock market, the expected return $\mathbb{E}[r]$ (let’s call it $r_a$) is usually a small decimal like $0.10.$ This means its square $(r_a)^2$ is tiny $(0.01).$ Compared to the variance $\sigma^2$ (which for a $20\%$ volatility market is $0.20^2 = 0.04$), the squared mean is often small enough to ignore for a clean rule of thumb. Dropping the $(r_a)^2$ term gives us the famous result:

\[\bbox[10px,border:2px solid red]{\mathbb{E}[\log(1 + r)] \approx r_a - \frac{\sigma^2}{2}}\]

The second term is the volatility penalty: a tax on our compounding. $r_a$ is the engine (the arithmetic mean), while $-\frac{\sigma^2}{2}$ is the friction (the volatility drag). $r_a$ is not simple interest, though it looks deceptively like it. In this context, $r_a$ (the arithmetic mean) is the expected value of the single-period return. If we look at the stock market for exactly one year, $r_a$ is the average of all possible outcomes for that year. If the media says “The S&P 500 has an average annual return of $12\%,$” they are usually summing up the annual returns of the last $90$ years and dividing by $90.$ That $12\%$ is our $r_a$.

The trap is that $r_a$ is an ensemble measure. It tells us, “If $1,000$ people invest for $1$ year, the average person will have $1 + r_a$.” However, it does not tell us, “If $1$ person invests for $1,000$ years, he will have $(1 + r_a)^{1000}$.” $r_a$ (arithmetic) is the average of the outcomes. In the coin toss $r_a$ is $+5\%.$ It looks profitable, but this is the ensemble average. $r_g$ (geometric) is the average of the growth (the geometric mean - see below). In that same coin toss, $r_g$ is $-5.1\%$ ($\sqrt{0.9} - 1 \approx -0.0513$), which guarantees that we are actually going broke. This is the time average.

Expressed as a per-period geometric mean return rather than as a log growth rate, $r_g$ would be:

\[\bbox[10px,border:2px solid red]{r_g=e^{\mathbb E[\log(1+r)]}-1}\]

$r$ (neither $r_a$ (arithmetic mean) nor $r_g$ (geometric mean)) is the raw, single-period random variable — the actual return of the market in any given single year (or the outcome of a single coin toss). It is a fluctuating variable that can take on many different values based on its underlying probability distribution.

We see again the equation

\[W_{\text{final}} = W_0 \cdot \left(e^{\mathbb{E}[\log(1+r)]}\right)^n \approx W_0 \cdot e^{\left(r_a - \frac{\sigma^2}{2}\right)\,n}\]

In Ole Peters’ classic “infamous coin toss” model, the game has two equally likely outcomes ($p = 0.5$ each):

Heads: $+50\%$ return ($r_H = +0.50$)

Tails: $-40\%$ return ($r_T = -0.40$)

To calculate the variance ($\sigma^2$) for this specific coin toss, we follow the exact statistical steps we just walked through, using discrete probabilities:

First, we find the arithmetic mean ($r_a$): the simple average return of a single flip, obtained by multiplying each outcome by its probability:

\[r_a = \mathbb{E}[r] = (0.5 \times 0.50) + (0.5 \times -0.40)\]

\[r_a = 0.25 - 0.20 = 0.05\]

The arithmetic expectation is positive. If $1,000$ people play this once, the group’s total pool of money grows by $5\%.$

Now we find the expected value of the squared returns ($\mathbb{E}[r^2]$):

We square each individual outcome and find the average of those squares:

\[\mathbb{E}[r^2] = (0.5 \times (0.50)^2) + (0.5 \times (-0.40)^2)\]

\[\mathbb{E}[r^2] = (0.5 \times 0.25) + (0.5 \times 0.16)\]

\[\mathbb{E}[r^2] = 0.125 + 0.080 = 0.205\]

Finally, we compute the variance ($\sigma^2$): we plug these into the variance formula, $\sigma^2 = \mathbb{E}[r^2] - (r_a)^2$:

\[\sigma^2 = 0.205 - (0.05)^2\]

\[\sigma^2 = 0.205 - 0.0025 = 0.2025\]

Now that we have computed our parameters for the coin toss ($r_a = 0.05$ and $\sigma^2 = 0.2025$), let’s watch the Taylor expansion reveal the individual’s long-run fate:

\[\mathbb{E}[\log(1+r)] \approx r_a - \frac{\sigma^2}{2}\]

\[\mathbb{E}[\log(1+r)] \approx 0.05 - \frac{0.2025}{2}\]

\[\mathbb{E}[\log(1+r)] \approx 0.05 - 0.10125 = -0.05125\]

Whether we use the exact calculation or the quick approximation, the result is clearly negative (roughly $-5.1\%$ to $-5.3\%$ per flip in log terms). Because the variance ($\sigma^2 = 0.2025$) is so large, the volatility drag ($\approx 10.1\%$) completely overwhelms the arithmetic engine ($5\%$). The volatility pushes our time-average growth rate below zero, so an individual player’s wealth almost surely decays toward zero as the flips accumulate, which neatly unmasks the ergodicity paradox.

As time ($n$) approaches infinity, a single, solitary timeline out of billions is mathematically sufficient to carry the entire group expectation on its back. The other billions of timelines can sit at absolute zero, completely ruined. This sounds like a statistical impossibility, but it is the inevitable destination of a compounding, non-ergodic system. Because the game is multiplicative, wealth doesn’t just fluctuate; it concentrates. Every time a lucky path flips “Heads,” its capital base widens, meaning its next dollar gain will be even larger. Conversely, every time a path flips “Tails,” its capital base shrinks, structurally neutering its ability to make large dollar gains in the future. As the steps compound from $60$ to $100,$ and then to $1,000,$ the distribution doesn’t just skew — it undergoes a profound phase transition where the number of surviving paths shrinks faster than the wealth grows.

Below is an example with $100$ paths highlighting the “winning” paths in red, alongside the ensemble expectation. The histogram is again consistent with a log-normal distribution.

Note that the true discrete geometric rate is $\ln(0.9487) \approx -0.0527$. The second-order Taylor expansion gives a close approximation, but because the coin toss features massive, discrete jumps ($+50\% / -40\%$), higher-order terms in the Taylor series expansion retain a small amount of residual weight. In continuous-time financial modeling where variations occur over infinitesimal steps ($dt$), these higher-order terms vanish completely, making the bridge formula exact.
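The coin-toss numbers above can be reproduced in a few lines. This is a minimal sketch using only the Python standard library; the variable names are ours, and it simply compares the second-order rule of thumb with the exact $\mathbb{E}[\log(1+r)]$ for the two outcomes.

```python
import math

# The two equally likely outcomes of the coin toss: (probability, return).
outcomes = [(0.5, 0.50), (0.5, -0.40)]

r_a = sum(p * r for p, r in outcomes)           # arithmetic mean E[r]
e_r2 = sum(p * r ** 2 for p, r in outcomes)     # E[r^2]
variance = e_r2 - r_a ** 2                      # sigma^2 = E[r^2] - (E[r])^2

taylor_log_growth = r_a - variance / 2          # rule of thumb: r_a - sigma^2 / 2
exact_log_growth = sum(p * math.log1p(r) for p, r in outcomes)  # exact E[log(1+r)]
r_g = math.exp(exact_log_growth) - 1            # geometric return per flip

print(f"r_a               = {r_a:.5f}")
print(f"sigma^2           = {variance:.5f}")
print(f"Taylor log growth = {taylor_log_growth:.5f}")
print(f"exact log growth  = {exact_log_growth:.5f}")
print(f"r_g               = {r_g:.5f}")
```

The gap between the two log-growth figures is the residual weight of the higher-order terms mentioned above.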

The average of logs ($\frac{1}{n} \sum \log(1+r_i)$) is simply the log of the geometric mean. If we take the arithmetic mean of the logarithms and exponentiate it, we get the exact same result as taking the geometric mean of the original growth factors.

\[\small \text{Geometric Mean} = e^{\text{Arithmetic Mean of Logs}}=\sqrt[n]{(1+r_1)(1+r_2)\dots(1+r_n)}, \qquad r_g = \text{Geometric Mean} - 1\]
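As a quick numerical check of this identity, here is a small sketch. The five returns are made up purely for illustration; the point is only that the two routes to the geometric mean agree and that the result sits below the arithmetic mean.

```python
import math

# Hypothetical single-period returns, chosen only for illustration.
returns = [0.10, -0.05, 0.02, 0.30, -0.20]
growth_factors = [1 + r for r in returns]
n = len(returns)

# Route 1: exponentiate the arithmetic mean of the logs.
mean_of_logs = sum(math.log(g) for g in growth_factors) / n
geo_from_logs = math.exp(mean_of_logs)

# Route 2: n-th root of the product of the growth factors.
geo_from_root = math.prod(growth_factors) ** (1 / n)

r_g = geo_from_logs - 1          # geometric (time-average) return
r_a = sum(returns) / n           # arithmetic (ensemble-style) return

print(f"geometric mean via logs: {geo_from_logs:.6f}")
print(f"geometric mean via root: {geo_from_root:.6f}")
print(f"r_g = {r_g:.4f}, r_a = {r_a:.4f}")
```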

The Taylor expansion and the $\mathbb{E}[\log(1+r)]$ are useful because the arithmetic mean ($r_a$) and variance ($\sigma^2$) are what we actually see in the world. Banks quote arithmetic interest rates; news outlets quote arithmetic market returns; analysts quote volatility (standard deviation). The formula $r_g \approx r_a - \frac{\sigma^2}{2}$ is the bridge.

A process is ergodic if its arithmetic mean (ensemble) is equal to its geometric mean (time).

Addition is ergodic: If 100 people each get $1,$ the average is $1.$ If 1 person gets $1$ every day for $100$ days, the average is $1.$

Compounding is not ergodic: As soon as we have a multiplicative process with randomness, the geometric mean must be lower than the arithmetic mean.
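The split between the ensemble average and the typical path can be seen in a short simulation. This is only a sketch: the ensemble size, the number of flips and the seed are arbitrary choices of ours, and the sample mean of such a skewed distribution is itself noisy.

```python
import random
import statistics

random.seed(0)        # fixed seed so the sketch is repeatable

N_PATHS = 10_000      # assumed ensemble size
N_FLIPS = 20          # assumed number of flips per path

def final_wealth(n_flips):
    """Wealth after n_flips of the +50% / -40% coin toss, starting from 1."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return 1.5 ** heads * 0.6 ** (n_flips - heads)

wealth = [final_wealth(N_FLIPS) for _ in range(N_PATHS)]

ensemble_mean = statistics.fmean(wealth)     # what the group average looks like
typical = statistics.median(wealth)          # what the typical player ends up with
share_below_start = sum(w < 1 for w in wealth) / N_PATHS

print(f"ensemble mean  : {ensemble_mean:.3f} (theory 1.05^n = {1.05 ** N_FLIPS:.3f})")
print(f"median path    : {typical:.3f} (theory 0.9^(n/2) = {0.9 ** (N_FLIPS / 2):.3f})")
print(f"share below 1  : {share_below_start:.1%}")
```

The mean is carried by a few lucky paths, while the median follows the time-average growth factor $\sqrt{1.5 \times 0.6}$ per flip.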

Connection to the log-normal distribution:

When we multiply a long string of random returns together (like in the coin toss or the stock market), the central limit theorem doesn’t produce a normal distribution of wealth. Instead, it produces a log-normal distribution. The log-normal is heavily skewed to the right. It has a long “tail” of extremely lucky outcomes (the few people who get very rich) and a huge “bulge” near zero (the many people who lose money). In a log-normal distribution, the three measures of “average” actually physically separate from each other: The mean (arithmetic) is pulled way to the right by a few lottery winners. This represents the ensemble average. The median (50th percentile) is significantly lower than the mean. This is what the typical individual experiences. The mode (the peak) is even lower than the median. It’s the most likely single outcome (often near zero in the coin toss).

The “infamous coin toss” is essentially a process where the mean stays high (5% growth), but the median and mode drift toward zero. Because we usually define “success” as the mean, we think the game is good. But because we actually live as the median individual, we experience ruin. This is why the Taylor expansion result ($r_g \approx r_a - \sigma^2/2$) is so powerful: it estimates how far the growth rate of the median (our reality) has fallen below that of the mean (the group’s average) due to the variance. There are other related distributions, such as the Cauchy, whose tails are so heavy that neither the variance nor the arithmetic mean exists mathematically.

If we take a sample of $n$ individual, independent annual returns ($r_1, r_2, \dots, r_n$) and calculate their standard arithmetic mean:

\[\bar{r} = \frac{r_1 + r_2 + \dots + r_n}{n}\]

the Central Limit Theorem (CLT) dictates that as our sample size $n$ grows, the distribution of that arithmetic mean $\bar{r}$ will converge to a Normal distribution. This is true regardless of the shape of the underlying distribution of the individual returns, as long as its variance is finite. If instead of adding the returns, we are looking at the average compound growth rate over time (the geometric mean), we are multiplying them:

\[\small \text{Geometric Mean} = \left( \prod_{i=1}^n (1 + r_i) \right)^{1/n}\]

If we take the natural log of this geometric mean, it transforms into an arithmetic average of log-returns:

\[\log(\text{Geometric Mean}) = \frac{1}{n} \sum_{i=1}^n \log(1 + r_i)\]

By applying the Central Limit Theorem to this sum, the distribution of the average log-return becomes Normal. Because the log of the geometric mean is normally distributed, if we exponentiate it to look at the actual growth factor, the distribution of that geometric mean is log-normal.

The log-normal distribution is seen when looking at $(1 + r)$, not $r$. Specifically, it is the total compounded multiplier of our wealth over time—which is built by multiplying a long string of these $(1+r)$ terms together — that forms a log-normal distribution.
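A rough way to see this is to compare the skewness of the compounded multiplier with the skewness of its logarithm. The sketch below assumes, purely for illustration, annual returns drawn uniformly between $-30\%$ and $+40\%$; the path count, horizon and seed are also our own choices.

```python
import math
import random
import statistics

random.seed(1)        # fixed seed so the sketch is repeatable

N_PATHS = 5_000       # assumed number of simulated investors
N_YEARS = 50          # assumed number of compounding periods

def skewness(xs):
    """Population skewness: the mean cubed z-score."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

def total_multiplier(n_years):
    """Product of (1 + r) over n_years, with r uniform on [-0.30, 0.40] (an assumption)."""
    w = 1.0
    for _ in range(n_years):
        w *= 1 + random.uniform(-0.30, 0.40)
    return w

multipliers = [total_multiplier(N_YEARS) for _ in range(N_PATHS)]
log_multipliers = [math.log(w) for w in multipliers]

skew_raw = skewness(multipliers)        # strongly right-skewed
skew_log = skewness(log_multipliers)    # close to zero, as for a Gaussian

print(f"skewness of multiplier      : {skew_raw:.2f}")
print(f"skewness of log(multiplier) : {skew_log:.2f}")
print(f"mean {statistics.fmean(multipliers):.2f} vs median {statistics.median(multipliers):.2f}")
```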

This is illustrated in the following plot. Notice the effect of the volatility drag: sliding the selection towards zero, turns the distribution of the results into a Gaussian.

Multiplicative processes, power laws, the gamma function and the Mellin transform:

The Mellin transform and the gamma function:

The Mellin transform of a locally integrable function $f(t)$ defined on the positive real axis is given by the integral:

\[\mathcal{M}\{f(t)\}(s) = \int_{0}^{\infty} t^{s-1} f(t) \, dt=\int_{0}^{\infty} t^s f(t) \, \frac{dt}t\]

where $\frac{dt}t$ is the Haar measure on the multiplicative group of positive real numbers.

The Mellin transform is often described as the natural transform for scale-invariant problems. While the Fourier transform deals with shifts (additions), the Mellin transform deals with scaling (multiplications).

Let’s say we have a function $f(t)$ that is a clean mix of polynomial degrees, like a Taylor series:

\[f(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3 + \dots = \sum_{k=0}^{\infty} a_k t^k\]

If we feed this function directly into the Mellin transform without changing anything:

\[\mathcal{M}\{f(t)\}(s) = \int_{0}^{\infty} t^{s-1} \left( a_0 + a_1 t + a_2 t^2 + \dots \right) dt\]

Focusing just on the behavior near zero, near the lower limit of integration ($t \to 0$), the integral of the $k$-th term looks like:

\[\int_{0}^{1} a_k t^{s-1+k} \, dt = \left[ a_k \frac{t^{s+k}}{s+k} \right]_0^1 = \frac{a_k}{s+k}\]

If we evaluate this transform at $s = -k$, the denominator becomes zero, and the entire Mellin transform blows up to infinity. This means the Mellin transform converts polynomial terms into vertical spikes (poles) located at the non-positive integers $s = 0, -1, -2, \dots$ on the complex plane. Because the Mellin transform automatically creates these spikes, we can read the exact composition of our original function directly off the map of its poles: The location of the pole tells us the degree of the polynomial ($t^k$ creates a pole at $s = -k$). The strength (or “residue”) of the pole tells us the coefficient ($a_k$) of that polynomial. If our goal is to use an integral transform that directly maps a function to its raw power series coefficients without the inverse flipping effect, we have to switch tools. The tool we’d be looking for is called the Borel transform.

The Mellin transform is suitable for multiplicative processes, which include the prototype of a multiplicative process: the factorial, or in continuous space, the gamma function. The gamma function is just the Mellin transform of the exponential decay function $f(t)=e^{-t}$:

\[\Gamma(s) = \mathcal{M}\{e^{-t}\}(s)=\int_{0}^{\infty} t^{s-1} e^{-t} \, dt\]
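The pole picture described earlier can be checked on this very example. Since $e^{-t} = \sum_k \frac{(-1)^k}{k!} t^k$, the poles of $\Gamma(s)$ at $s = -k$ should carry residues $\frac{(-1)^k}{k!}$. The sketch below estimates each residue numerically as $\varepsilon\,\Gamma(-k+\varepsilon)$ for a small $\varepsilon$; the helper name and the value of $\varepsilon$ are our own choices.

```python
import math

def gamma_residue_estimate(k, eps=1e-6):
    """Estimate the residue of Gamma at s = -k as eps * Gamma(-k + eps)."""
    return eps * math.gamma(-k + eps)

# Residues at s = 0, -1, -2, -3, -4 versus the Taylor coefficients of e^{-t}.
residues = [gamma_residue_estimate(k) for k in range(5)]
taylor_coeffs = [(-1) ** k / math.factorial(k) for k in range(5)]

for k, (res, coeff) in enumerate(zip(residues, taylor_coeffs)):
    print(f"s = {-k:2d}: residue ~ {res:+.6f}   a_k = {coeff:+.6f}")
```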

If we try to think about it in terms of shapes (matching a growing function to a growing function), it makes absolutely zero intuitive sense that we would have selected $e^{-x}$. The breakthrough happens when we stop looking at the shape of the functions, and start looking at the arithmetic operations that build them. The Gamma function isn’t a continuous factorial because of what $e^{-t}$ looks like; it is a continuous factorial because of what $e^{-t}$ does when you differentiate it. We want to get this behaviour:

\[n! = n \times (n-1) \times (n-2) \times \dots \times 1\]

To mimic this continuously using calculus, we need a machine that automatically spits out a chain of descending integers. What is the only operation in calculus that drops an exponent down by one and turns it into a multiplier? Differentiation.

\[\frac{d}{dt}(t^n) = n \cdot t^{n-1}\]

If we want to generate a full factorial, we need a process that allows us to differentiate a power over and over again, collecting those multipliers along the way:

\[n \cdot (n-1) \cdot (n-2) \dots\]

To run that differentiation machine smoothly inside an integral, we need a partner function that can handle being differentiated repeatedly without changing its own structure. We need a function that satisfies:

\[\frac{d}{dt} f(t) = \pm f(t)\]

Up to a constant factor, there are only two functions that do this: $e^t$ and $e^{-t}$. If we use $e^t$, our calculus machine explodes instantly at the boundary of infinity ($e^\infty = \infty$). It is a runaway engine with no brakes. But $e^{-t}$ has a beautiful, asymmetrical superpower: At the starting line ($t=0$), it is perfectly stable: $e^0 = 1$. At the finish line ($t \to \infty$), it acts as a perfect mathematical sink: $e^{-\infty} = 0$. Because $e^{-t}$ drops to zero at infinity, it creates a “containment field.” It allows us to use integration by parts to repeatedly differentiate our probing power $t^n$, dragging down $n$, then $n-1$, then $n-2$, all while keeping the boundary math perfectly clean and finite.

Let’s pick a concrete value, say $s = 4$. This means our Mellin probing function is $t^{s-1} = t^3$. We want to see how this probe interacts with the infinite power series of the negative exponential:

\[e^{-t} = 1 - t + \frac{t^2}{2!} - \frac{t^3}{3!} + \frac{t^4}{4!} - \frac{t^5}{5!} + \dots = \sum_{n=0}^{\infty} \frac{(-1)^n t^n}{n!}\]

If we try to multiply the probe $t^3$ directly into this series term-by-term and integrate from $0$ to $\infty$, we run into a major calculus obstacle: individual terms like $\int_0^\infty t^{n+3} dt$ diverge completely.

To see the true harmonic interaction without infinity blowing up the math, we have to look at the discrete, algebraic soul of the Mellin transform using a brilliant trick: integration by parts. The Gamma function for $s=4$ is:

\[\Gamma(4) = \int_0^\infty t^3 e^{-t} \, dt\]

Let’s apply integration by parts, setting $u = t^3$ (our probe) and $dv = e^{-t}dt$ (the target). This yields $du = 3t^2 dt$ and $v = -e^{-t}$:

\[\Gamma(4) = \left[ -t^3 e^{-t} \right]_0^\infty + \int_0^\infty (3t^2) e^{-t} \, dt\]

The boundary term $\left[ -t^3 e^{-t} \right]_0^\infty$ vanishes at both limits because $e^{-t}$ crushes $t^3$ at infinity, and $t^3$ is $0$ at zero. We are left with:

\[\Gamma(4) = 3 \int_0^\infty t^2 e^{-t} \, dt\]

Notice what just happened. The probe $t^3$ interacted with the exponential, transferred its power downward by differentiation, and left behind a scaling factor of $3.$

If we repeat this integration by parts two more times, the power of the probe keeps cascading down:

\[\Gamma(4) = 3 \times 2 \int_0^\infty t^1 e^{-t} \, dt\]

\[\Gamma(4) = 3 \times 2 \times 1 \int_0^\infty t^0 e^{-t} \, dt\]

At the very last step, the probing function has been stripped of its variable entirely ($t^0 = 1$). We are left with a pure accumulation of the scaling factors, multiplied by a fundamental baseline integral:

\[\Gamma(4) = (3 \times 2 \times 1) \int_0^\infty e^{-t} \, dt = 3! \cdot [ -e^{-t} ]_0^\infty = 3! \cdot (0 - (-1)) = 3!\]
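The cascade can be verified numerically. The sketch below uses a plain trapezoid rule on a truncated interval (the cutoff at $t = 60$ and the step count are our assumptions, chosen so the neglected tail is negligible) and checks each step of the chain $\Gamma(4) = 3\,\Gamma(3) = 3 \cdot 2\,\Gamma(2) = 3 \cdot 2 \cdot 1\,\Gamma(1)$.

```python
import math

def gamma_integral(s, upper=60.0, steps=60_000):
    """Trapezoid-rule estimate of the integral of t^(s-1) e^(-t) over [0, upper]."""
    h = upper / steps

    def f(t):
        return t ** (s - 1) * math.exp(-t)

    total = 0.5 * (f(0.0) + f(upper))
    total += sum(f(i * h) for i in range(1, steps))
    return total * h

# Numerical values of Gamma(1) ... Gamma(4).
values = {s: gamma_integral(s) for s in (1, 2, 3, 4)}

for s, v in values.items():
    print(f"Gamma({s}) ~ {v:.6f}   (s-1)! = {math.factorial(s - 1)}")
```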

Now, let’s look back at the power series of $e^{-t}$. The summand that matches the original degree of our probe ($t^3$) is:

\[\text{The } n=3 \text{ term} = -\frac{t^3}{3!}\]

The denominator of this specific target summand is exactly $3!$. The cascade of the Mellin probing function generated a numerator of exactly $3!$. When the probe $t^3$ extracts information from $e^{-t}$, it acts like a combinations generator. The process of integrating across the entire domain allows the $t^3$ probe to systematically peel back the layers of the exponential function via differentiation until it strikes the exact core layer — the $n=3$ harmonic component — perfectly canceling out its structural denominator ($3!$) and leaving behind the integer factorial.

Mellin transform and $p$-adics:

To analyze the zeta function, Riemann used standard calculus. In 1950 John Tate considered that the real number line may not be the right place to look at primes. This idea led to a brand-new geometry where a specific prime number is the center of the universe. This is where the $p$-adic numbers come in. On the standard real number line, numbers get small if they are close to zero ($0.1, 0.01, 0.001$). In the $3$-adic world (taking $p=3$ as an example), numbers get small in the ultrametric measure if they are divisible by high powers of $3.$ In this world, the number $3$ is close to zero. The number $9$ is even closer. The number $27$ is practically touching zero. Because of this bizarre rule for distance, the $p$-adic world doesn’t look like a smooth line. Instead of a continuous highway, the space looks like a vast collection of nested, concentric rings or shells around zero. The unit shell contains all the numbers that aren’t divisible by $3$ at all (like $1, 2, 4, 5, 7, 8$). The next shell closer to zero contains numbers divisible by $3$ but not by $9$ (like $3, 6, 12, 15$). The next shell even closer contains numbers divisible by $9$ but not by $27$ (like $9, 18, 36$). There are also outer shells for fractions like $\frac{1}{3}$ and $\frac{1}{9}$ that are far away from zero.

Let’s take a simple geometric function on the p-adic shells, feed it into the Mellin transform, and watch it magically transform into the local Euler factor — the fundamental building block of the Riemann Zeta function.

Let’s use our 3-adic world ($\mathbb{Q}_3$) for the calculation. We need a function $f(x)$ to analyze. Tate chose the simplest, most natural function possible for the p-adic integers: the indicator function of $\mathbb{Z}_3$ (let’s call it $\mathbf{1}_{\mathbb{Z}_3}$).

This function acts like a flat, geometric step: It outputs a $1$ if the number $x$ is a $3$-adic integer (meaning it lives on the unit shell or any inner shell closer to zero). It outputs a $0$ if the number $x$ has a $3$ in the denominator (meaning it lives on the outer fractional shells). Mathematically on the ground, this function is just a flat plateau covering the center of our shattered world.

Now we feed this step function into the Mellin transform. The transform multiplies our function by the “frequency” term $|x|_3^s$ and scans it using our multiplicative ruler $d^\times x$:

\[\mathcal{M}\{f\}(s) = \int_{\mathbb{Q}_3^\times} \mathbf{1}_{\mathbb{Z}_3}(x) \cdot |x|_3^s \, d^\times x\]

Because the function jumps to $0$ outside of the integers, the transform automatically discards all the outer fractional shells. The infinity of the integral vanishes, and the domain shrinks strictly to the non-zero $3$-adic integers:

\[\mathcal{M}\{f\}(s) = \int_{\mathbb{Z}_3 \setminus \{0\}} |x|_3^s \, d^\times x\]

Now, the transform performs its magic by scanning the remaining space shell by shell. Recall that the space of non-zero integers is just a collection of disjoint concentric rings stretching inward toward zero. We can break the single integral into an infinite sum of smaller integrals, one for each shell:

\[\mathcal{M}\{f\}(s) = \int_{\text{Shell } 0} |x|_3^s \, d^\times x + \int_{\text{Shell } 1} |x|_3^s \, d^\times x + \int_{\text{Shell } 2} |x|_3^s \, d^\times x + \dots\]

The calculus on each individual shell becomes simple:

On Shell $0$ (the units $\mathbb{Z}_3^\times$): Every number has size $|x| = 3^0 = 1$. With $d^\times x$ normalized so that the units have total mass $1$, the volume of the shell is $1$.

\[\int_{\text{Shell } 0} (1)^s \, d^\times x = 1 \cdot 1 = 1\]

On Shell $1$ (Numbers like $3, 6, 12$): Every number has size $|x| = 3^{-1}$. The multiplicative volume of this shell is also exactly $1$.

\[\int_{\text{Shell } 1} (3^{-1})^s \, d^\times x = 1 \cdot 3^{-s} = 3^{-s}\]

On Shell $2$ (numbers like $9, 18, 36$): Every number has size $|x| = 3^{-2}$. The volume is still $1$.

\[\int_{\text{Shell } 2} (3^{-2})^s \, d^\times x = 1 \cdot (3^{-s})^2 = 3^{-2s}\]

When we string the results of all these individual shells back together, the Mellin transform hands us an infinite geometric series:

\[\mathcal{M}\{f\}(s) = 1 + 3^{-s} + 3^{-2s} + 3^{-3s} + 3^{-4s} + \dots\]

If we apply the standard high-school algebra formula for a geometric series ($\sum r^n = \frac{1}{1-r}$, valid here because $|3^{-s}| < 1$ whenever $\operatorname{Re}(s) > 0$), this infinite chain collapses into a single, elegant complex function:

\[\mathcal{M}\{f\}(s) = \frac{1}{1 - 3^{-s}}\]
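The shell bookkeeping is easy to check with ordinary integers. In the sketch below, `valuation` and `padic_abs` are helper names of our own; the code sorts a few integers into shells by their $3$-adic valuation and compares a truncated shell-by-shell sum with the closed form at the sample value $s = 2$, taking each shell's multiplicative volume to be $1$ as in the text.

```python
def valuation(n, p=3):
    """Largest k such that p^k divides the non-zero integer n."""
    if n == 0:
        raise ValueError("0 has infinite valuation")
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k

def padic_abs(n, p=3):
    """p-adic absolute value |n|_p = p^(-valuation)."""
    return float(p) ** (-valuation(n, p))

# Which shell does each integer live on?
shell_of = {n: valuation(n) for n in (1, 2, 4, 3, 6, 12, 9, 18, 27)}

def euler_factor_partial(s, p=3, n_shells=60):
    """Sum of p^(-k s) over the first n_shells shells (each shell has volume 1)."""
    return sum(float(p) ** (-k * s) for k in range(n_shells))

s = 2.0
partial = euler_factor_partial(s)
closed_form = 1 / (1 - 3.0 ** (-s))

print(shell_of)
print(f"shell sum ~ {partial:.12f}, closed form = {closed_form:.12f}")
```

Note that $27 = 3^3$ lands on shell $3$, one step further in than $9$ and $18$.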

We started with a rigid, localized geometric step function ($\mathbf{1}_{\mathbb{Z}_3}$) living in a fractured, non-Archimedean space. By passing it through the Mellin transform, the fractured geometry completely evaporated, and out popped $\frac{1}{1 - 3^{-s}}$. This is exactly the 3-adic Euler factor of the Riemann Zeta Function!

\[\zeta(s) = \frac{1}{1 - 2^{-s}} \cdot \mathbf{\frac{1}{1 - 3^{-s}}} \cdot \frac{1}{1 - 5^{-s}} \dots\]

Tate proved that the analytic heart of the Zeta function isn’t some mysterious, artificial complex formula. It is simply the global “echo” you get when you use a multiplicative transform to scan the local, natural geometry of prime numbers.

Connection to finances and power-laws:

To get a permanent macroscale power law (where a tiny fraction of companies or individuals hold a massive, disproportionate share of the wealth), an economy only requires three basic conditions. If these three things are true, a power law is mathematically inevitable.

Multiplicative growth: small percentage changes must compound over time (growing $\$100$ by $5\%$ behaves differently from just adding a flat $\$5$ every day).
A kill rate (portfolio churn or corporate attrition): Companies or portfolios cannot live forever. There must be a constant clearing pressure—bankruptcy, acquisition, or retirement—that takes older, large entities out of the game and resets the clock with new, small ones.
Continuous reinvestment or rebirth: new players must enter at the bottom floor to keep the population stable while the old ones grow or die.

Here is exactly how the two distributions manage the growth machine and the death machine to create this structure:

Consider first the compounding part of the economy in isolation: the distribution that takes care of it is the $\text{log-normal}$ distribution. It is the standard Central Limit Theorem, but for multiplication. If a business experiences a random sequence of percentage shocks over its life (e.g., $+10\%$, $-5\%$, $+2\%$), its wealth over time is a chain of multiplications:

\[\small \text{Final Wealth} = \text{Starting Wealth} \times 1.10 \times 0.95 \times 1.02 \dots\]

If a group of businesses run this multiplicative race for a fixed amount of time, they will naturally form a log-normal curve. This curve is already skewed (it has a right-hand tail), but it is not a power law yet. Left completely alone, a pure $\small\color{red}{\text{log-normal}}$ distribution keeps spreading out indefinitely, and its tail eventually decays too fast to match a true, stable Pareto power law.

Portfolios do not get to compound forever. They are being actively affected by real-world friction. As we established, a single catastrophic market shock arrives randomly, following an $\small\color{red}{\text{exponential}}$ distribution. If a business has zero safety buffers, the very first shock kills it. Its lifespan is purely exponential. But a real business builds capital buffers, cash reserves, or lines of credit. Let’s say a business can survive exactly $k$ major shocks before it goes bankrupt. The time it takes for Shock $1$ to hit is exponential ($X_1$). The time between Shock $1$ and Shock $2$ is exponential ($X_2$). The time until the final, fatal $k$-th shock hits is $X_1 + X_2 + \dots + X_k$. This sum transforms the lifespan of the company from a simple exponential decay into a $\small\color{red}{\text{gamma}}$ distribution.

If $X_1, \dots, X_k$ are independent and exponentially distributed with parameter $\lambda$, then $X_1 + \dots + X_k$ is $\Gamma$ distributed with parameters $k$ and $\lambda$.
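A quick simulation makes this concrete. A gamma distribution with shape $k$ and rate $\lambda$ has mean $k/\lambda$ and variance $k/\lambda^2$, so the summed waiting times should reproduce those two numbers. The values of $k$, $\lambda$, the sample size and the seed below are arbitrary choices for this sketch.

```python
import random
import statistics

random.seed(2)        # fixed seed so the sketch is repeatable

K_SHOCKS = 3          # assumed number of shocks a firm can absorb
RATE = 2.0            # assumed shock arrival rate (lambda)
N_FIRMS = 20_000      # assumed number of simulated firms

# Lifespan = sum of K_SHOCKS independent exponential waiting times.
lifespans = [
    sum(random.expovariate(RATE) for _ in range(K_SHOCKS))
    for _ in range(N_FIRMS)
]

sample_mean = statistics.fmean(lifespans)
sample_var = statistics.pvariance(lifespans)

print(f"mean     : {sample_mean:.4f} (gamma theory k/lambda   = {K_SHOCKS / RATE:.4f})")
print(f"variance : {sample_var:.4f} (gamma theory k/lambda^2 = {K_SHOCKS / RATE ** 2:.4f})")
```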

The gamma distribution describes the demographics of age in the economy: it dictates exactly what percentage of businesses in the market are young, middle-aged, or old survivors.

The log-normal growth machine is trying to stretch the lucky survivors exponentially upward into infinite wealth over time. The gamma “kill” machine (attrition, churn) is constantly chopping down businesses based on their age, acting as a structural filter. Because of the gamma lifespan, there are millions of new, young, small companies at the bottom (governed by a $\small\color{red}{\text{Poisson}}$ point process (for timing) combined with a highly skewed distribution for entry size), a decent number of middle-aged medium companies, and only a tiny handful of ancient, lucky giants that have managed to navigate the gauntlet of shocks without hitting their $k$-th failure limit. Looking at a snapshot of this economy today, we see a mixture of ages. The gamma distribution dictates how many companies have reached a certain age, and the log-normal distribution dictates how much wealth a company accumulates if it survives to that age. When these two exponential forces collide, i.e. exponential compounding stretching wealth upward, and exponential/gamma lifespans cutting timelines short—they perfectly balance each other out. The time variable cancels out of the system entirely, and the distribution freezes into a permanent, stable Pareto power law. The power law is simply the steady-state signature of a system where wealth compounds multiplicatively, but the time allowed to compound is strictly rationed by a Gamma-distributed clock of ruin.
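The cancellation of time can be illustrated with a deliberately stripped-down sketch. It is a simplification of the mechanism above, not a full model: we assume every firm grows at the same deterministic rate $g$ (no log-normal noise) and dies after a purely exponential lifetime with rate $\lambda$ (the $k = 1$ case). Under those assumptions $P(W > w) = P(T > \ln w / g) = w^{-\lambda/g}$, an exact Pareto law with exponent $\alpha = \lambda / g$.

```python
import math
import random

random.seed(3)        # fixed seed so the sketch is repeatable

GROWTH = 0.5          # assumed deterministic growth rate g
KILL_RATE = 1.0       # assumed exponential death rate lambda
N_FIRMS = 20_000      # assumed number of simulated firms

# Each firm starts at wealth 1 and compounds until its random death time.
ages = [random.expovariate(KILL_RATE) for _ in range(N_FIRMS)]
wealth = [math.exp(GROWTH * t) for t in ages]

alpha_theory = KILL_RATE / GROWTH
# Maximum-likelihood estimate of the Pareto exponent with minimum wealth 1.
alpha_hat = N_FIRMS / sum(math.log(w) for w in wealth)

def survival(level):
    """Empirical fraction of firms with wealth above `level`."""
    return sum(w > level for w in wealth) / N_FIRMS

print(f"alpha: theory {alpha_theory:.2f}, estimate {alpha_hat:.3f}")
for level in (2, 4, 8):
    print(f"P(W > {level}) ~ {survival(level):.4f}   theory {level ** -alpha_theory:.4f}")
```

Neither the growth clock nor the death clock appears in the final law; only their ratio survives as the tail exponent.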

When we look at asset returns or growth, we don’t add them; we multiply them. If a portfolio grows by a factor of $1.5$ and then drops by a factor of $0.8$, the total change is $1.5 \times 0.8$. To find the average return, we use the geometric mean (multiplying the values and taking the root). But if we map those returns into “log-space” by taking $\ln(x)$, those multiplications become additions ($\ln(a \cdot b) = \ln(a) + \ln(b)$). The geometric mean in the original space becomes the familiar arithmetic mean in log-space.

The Mellin Transform is a Fourier Transform in log-space. The exact same mapping happens under the hood of the Mellin transform. If we take the definition of the Mellin transform of a function $f(x)$ (see below for the $\color{red}x$ (*) in the denominator):

\[\mathcal{M}\{f(x)\}(s) = \int_0^\infty f(x) x^{s} \frac{dx}{\color{red}x}\] and we make a logarithmic change of variables by letting $x = e^{-t}$ (which means $t = -\ln(x)$), the entire integral transforms into a standard Fourier (or bilateral Laplace) transform:

\[\int_{-\infty}^\infty f(e^{-t}) e^{-st} dt\]

By moving to log-space, multiplication ($x$) becomes addition ($t$), and the scale-invariant Mellin transform becomes the shift-invariant Fourier transform.

Think of the Fourier transform as operating on a standard linear ruler. On this ruler, the distance between $1$ and $2$ is the same as the distance between $100$ and $101$. It is a world governed by addition and translation (shifts). If we shift a function by adding a constant ($x \to x + a$), the Fourier transform handles this beautifully because its basis functions ($e^{i\omega x}$) turn shifts into simple phase rotations. The Mellin transform, however, lives on a logarithmic ruler. On this ruler, the distance between $1$ and $2$ is the same as the distance between $100$ and $200$. This is a world governed by multiplication and scaling. If we scale a variable by multiplying it ($x \to ax$), the Mellin transform handles it seamlessly because its basis functions ($x^{s-1}$) turn scaling into multiplication by the simple factor $a^{-s}$, that is, $\mathcal{M}\{f(ax)\}(s) = a^{-s}\,\mathcal{M}\{f(x)\}(s)$.
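Both statements can be checked numerically on $f(x) = e^{-x}$, whose Mellin transform is $\Gamma(s)$. The sketch below evaluates the transform entirely in log-space, as the integral of $f(e^{-t})\,e^{-st}$ over $t$, using a trapezoid rule on a truncated range (the range, step count, $s = 2.5$ and $a = 3$ are our assumptions), and then verifies that rescaling $x \to ax$ multiplies the result by $a^{-s}$.

```python
import math

def mellin_via_log_space(f, s, t_min=-10.0, t_max=60.0, steps=14_000):
    """Trapezoid estimate of the integral of f(e^{-t}) e^{-s t} dt over [t_min, t_max]."""
    h = (t_max - t_min) / steps

    def g(t):
        return f(math.exp(-t)) * math.exp(-s * t)

    total = 0.5 * (g(t_min) + g(t_max))
    total += sum(g(t_min + i * h) for i in range(1, steps))
    return total * h

s = 2.5   # sample exponent
a = 3.0   # sample scaling factor

mellin_plain = mellin_via_log_space(lambda x: math.exp(-x), s)
mellin_scaled = mellin_via_log_space(lambda x: math.exp(-a * x), s)

print(f"log-space integral : {mellin_plain:.10f}")
print(f"Gamma(s)           : {math.gamma(s):.10f}")
print(f"scaled transform   : {mellin_scaled:.10f}")
print(f"a^(-s) * Gamma(s)  : {a ** (-s) * math.gamma(s):.10f}")
```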

Power laws are the ultimate expression of scale invariance, and the Mellin transform is the natural mathematical tool used to analyze them. A power law is any relationship that fits the form:

\[f(x) = C x^{-\alpha}\]

The defining feature of a power law is that if we scale the input $x$ by a constant factor $a$, the function’s output changes only by a constant proportional factor. It doesn’t change its fundamental shape or behavior. Mathematically, if we replace $x$ with $ax$:

\[f(ax) = C (ax)^{-\alpha} = a^{-\alpha} \cdot (C x^{-\alpha}) = a^{-\alpha} f(x)\]

Scaling the argument by $a$ just multiplied the entire function by the constant $a^{-\alpha}$. The relative structure remains completely identical. If we look at a power law on a log-log plot, it forms a perfectly straight line. Whether we zoom in on the range from $1$ to $10,$ or zoom out to look at $10,000$ to $100,000,$ the slope ($-\alpha$) never changes. There is no characteristic scale or baseline unit that dictates the behavior of the system.
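Both properties are simple to confirm numerically. The constants $C = 2$ and $\alpha = 1.5$ in this sketch are arbitrary; it checks that rescaling the input multiplies the output by $a^{-\alpha}$ regardless of $x$, and that the log-log slope is the same in two very different ranges.

```python
import math

C = 2.0        # arbitrary prefactor for the sketch
ALPHA = 1.5    # arbitrary power-law exponent for the sketch

def f(x):
    """Pure power law f(x) = C x^(-alpha)."""
    return C * x ** (-ALPHA)

def loglog_slope(x1, x2):
    """Slope of log f against log x between x1 and x2."""
    return (math.log(f(x2)) - math.log(f(x1))) / (math.log(x2) - math.log(x1))

a = 7.0
# The ratio f(a x) / f(x) should equal a^(-alpha) at every x.
ratios = [f(a * x) / f(x) for x in (0.5, 3.0, 1_000.0)]

slope_small = loglog_slope(1, 10)
slope_large = loglog_slope(10_000, 100_000)

print(f"f(ax)/f(x) at three points: {ratios}")
print(f"a^(-alpha) = {a ** (-ALPHA):.6f}")
print(f"slope on [1, 10]          : {slope_small:.6f}")
print(f"slope on [10^4, 10^5]     : {slope_large:.6f}")
```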

The Mellin Transform is a power law detector. Just as the Fourier transform uses sines and cosines ($e^{i\omega x}$) as probes to find periodic cycles in a signal, the Mellin transform uses power laws ($x^{s-1}$) as its probes. When we take the Mellin transform of a function, we are taking the inner product of that function against a spectrum of power laws:

\[\mathcal{M}\{f(x)\}(s) = \int_0^\infty f(x) x^{s-1} dx\]

If the function $f(x)$ contains a pure power law or behaves like one over a certain range, the Mellin transform will manifest a pole at that specific exponent. For example, if we have a system whose behavior transitions from one power law to another as it grows, the Mellin transform will reveal distinct poles that correspond exactly to those different exponents. It maps the complex, multi-scale physical behavior into a static map of singular points in the complex plane.

When we multiply independent random variables together (like daily financial returns or sequential growth phases), the Central Limit Theorem in log-space dictates that the distribution becomes log-normal. However, if we introduce certain types of feedback loops, drops, or phase transitions into that multiplicative growth, the distribution shifts from log-normal to a heavy-tailed power law (like a Pareto distribution).

In both cases, the underlying engine is purely multiplicative. Because the Mellin transform natively tracks multiplicative scaling via power-law probes, it serves as the exact tool needed to calculate the moments, expected values, and asymptotic behaviors of these complex, heavy-tailed systems.

In the real world, macroscale power laws are often born from a mixture of microscale gamma or exponential processes. This is a beautiful phenomenon known as Superstatistics.

A microstate is a specific, highly detailed snapshot of a system. In physics, it tells us the exact position and momentum of every single individual atom at a precise millisecond. In our economic narrative, a microstate is the precise financial snapshot of a single company or portfolio. It is the realization of that specific entity’s local random walk: its current capital base, its exact sequence of coin-flip shocks, and its unique lifespan. At this micro-level, the system looks like standard, short-term thermodynamics. If we isolate a single firm, its local dynamics are driven by an ordinary Boltzmann-like exponential factor, $e^{-\beta E}$, where $E$ is a state variable (like a sudden drop in wealth) and $\beta$ is a local intensive parameter. In physics, $\beta$ is inverse temperature ($\frac{1}{k_B T}$); in finance, $\beta$ represents the local rate of volatility friction or capital decay.

A macrostate ignores the microscopic details of individuals and looks only at the large-scale, coarse-grained properties of the entire system. In a gas, the macrostate is defined by overall variables we can measure with a thermometer or pressure gauge, like Temperature ($T$), Pressure ($P$), and Volume ($V$). Millions of different microscopic atomic configurations (microstates) can produce the exact same macrostate. In our economic narrative, the macrostate is the overall wealth distribution function of the entire country or market index. It doesn’t care whether Company $A$ or Company $B$ went bankrupt today.

Superstatistics is the statistics of the intensive parameter $\beta$ itself. Instead of treating the world as one big room with a uniform temperature, superstatistics models the macroscale as a mosaic of many independent “local equilibrium cells.” Inside each individual cell, the system behaves standardly: it settles into a local exponential microstate driven by its own constant $\beta$. But if we step back to look at the macrostate, the parameter $\beta$ varies dynamically from cell to cell because different industries or regions experience different levels of risk, lifecycle constraints, and ambient economic conditions.

Imagine an economy full of thousands of independent businesses or individual portfolios. Inside each individual business, wealth is driven by a localized, exponential or gamma-like process (short-term compounding shocks and lifespans). Each company has its own internal decay rate or “temperature” parameter ($\beta$). However, across the whole population, that internal parameter $\beta$ isn’t uniform. It varies from company to company, and across the macro-economy, the distribution of these parameter values itself follows a gamma distribution. If we mathematically compound (integrate) a localized exponential process over a gamma-distributed field of fluctuating background rates, a striking mathematical transformation takes place:

\[\int (\text{Exponential Process}) \times (\text{Gamma Background Variance}) \, d\beta = \mathbf{\text{Power Law (Pareto)}}\]
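
One way to make this schematic concrete, as a sketch under stated assumptions: take the local factor to be $e^{-\beta E}$ and suppose $\beta$ follows a gamma density $f(\beta)$ with shape $a$ and rate $b$. The symbols $a$, $b$ and $f$ are introduced here only for illustration and are not used elsewhere in these notes. The integral can then be carried out in closed form:

\[f(\beta) = \frac{b^{a}}{\Gamma(a)}\, \beta^{a-1} e^{-b\beta}, \qquad \beta > 0\]

\[\int_0^\infty e^{-\beta E}\, f(\beta)\, d\beta = \frac{b^{a}}{\Gamma(a)} \int_0^\infty \beta^{a-1} e^{-(b+E)\beta}\, d\beta = \frac{b^{a}}{\Gamma(a)} \cdot \frac{\Gamma(a)}{(b+E)^{a}} = \left(1 + \frac{E}{b}\right)^{-a}\]

For $E \gg b$ this behaves like $(E/b)^{-a}$, a power law with exponent $a$. The result has the same form as the survival function of a Lomax (Pareto type II) distribution.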

The exponential decays and the gamma density melt into one another, and what drops out of the bottom of the math is a power-law tail, which plots as a straight line on log-log axes.
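
The claim can be checked numerically. The sketch below assumes the background parameter $\beta$ has a gamma density with shape `a` and rate `b` (illustrative values chosen here, not taken from these notes), for which the mixture integral has the known closed form $(1 + E/b)^{-a}$. It integrates $e^{-\beta E}$ against that density with a plain midpoint rule and then measures the log-log slope of the tail, which should approach $-a$.

```python
import math


def gamma_pdf(beta, a, b):
    """Gamma density with shape a and rate b (assumed parametrization)."""
    return b**a / math.gamma(a) * beta ** (a - 1) * math.exp(-b * beta)


def mixed_factor(E, a, b, steps=20_000):
    """Midpoint-rule estimate of the integral of exp(-beta*E) * gamma_pdf(beta)."""
    # The integrand decays like exp(-(b + E) * beta), so it is negligible
    # beyond roughly 50 / (b + E); the integral is truncated there.
    upper = 50.0 / (b + E)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        beta = (i + 0.5) * h
        total += math.exp(-beta * E) * gamma_pdf(beta, a, b)
    return total * h


def closed_form(E, a, b):
    """Closed-form value of the same integral: (1 + E/b)^(-a)."""
    return (1.0 + E / b) ** (-a)


def loglog_slope(E1, E2, a, b):
    """Slope of log(closed_form) against log(E) between E1 and E2."""
    rise = math.log(closed_form(E2, a, b)) - math.log(closed_form(E1, a, b))
    return rise / (math.log(E2) - math.log(E1))


if __name__ == "__main__":
    a, b = 2.5, 1.0  # hypothetical shape and rate, for illustration only
    for E in (0.0, 1.0, 10.0, 100.0):
        print(E, mixed_factor(E, a, b), closed_form(E, a, b))
    # Far in the tail the slope on log-log axes should be close to -a.
    print(loglog_slope(1e3, 1e4, a, b))
```

No single cell has a power-law tail here; each one is purely exponential. The straight line on log-log axes only appears after averaging over the spread of $\beta$ values.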

(*) The Haar measure $\frac{dx}{x}$: On a standard ruler, the distance between $1$ inch and $2$ inches is exactly the same as the distance between $10$ inches and $11$ inches. That absolute step size is $dx$. If we add $\$1$ to a $\$10$ portfolio, our absolute step is $dx = 1$. If we add $\$1$ to a $\$1,000,000$ portfolio, our absolute step is still $dx = 1$. The flat ruler treats an absolute increment of $1$ unit identically, no matter where we are standing on the line.

$\frac{dx}{x}$ is a ruler that measures proportions instead of absolute increments. It is the absolute step size ($dx$) divided by our current location $x$. Let’s look at the exact same $\$1$ steps from above through the lens of this new metric:

We are standing at $x = 10$. We take a step of $dx = 1$. Our proportional distance is:

\[\frac{dx}{x} = \frac{1}{10} = 10\%\]

If instead we are standing at $x = 1,000,000$ and take a step of $dx = 1$, the proportional distance is:

\[\frac{dx}{x} = \frac{1}{1,000,000} = 0.0001\%\]

On the percentage ruler, the exact same physical step ($dx = 1$) has shrunk drastically because we are standing further up the scale. To make a step at $\$1,000,000$ feel exactly the same size as the step at $\$10$, we can’t just add $\$1$. We have to scale our step size up proportionally. We would need to add $\$100,000$, because $\frac{100,000}{1,000,000} = 10\%$.
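
The two rulers can be compared directly in a few lines of Python. This is a minimal sketch with hypothetical function names. It uses the fact that summing $\frac{dx}{x}$ over a finite interval $[x_0, x_1]$ gives $\ln(x_1/x_0)$, which is close to the simple ratio $\frac{dx}{x}$ only when the step is small: for the step from $10$ to $11$ it gives about $9.5\%$ rather than exactly $10\%$.

```python
import math


def flat_length(x0, x1):
    """Length of [x0, x1] on the ordinary ruler: the integral of dx."""
    return x1 - x0


def proportional_length(x0, x1):
    """Length of [x0, x1] on the proportional ruler: integral of dx/x = ln(x1/x0)."""
    return math.log(x1 / x0)


def proportional_length_numeric(x0, x1, steps=100_000):
    """Midpoint sum of dx/x over [x0, x1], as a check on the closed form."""
    h = (x1 - x0) / steps
    return sum(h / (x0 + (i + 0.5) * h) for i in range(steps))


if __name__ == "__main__":
    # A $1 step looks identical on the flat ruler at both locations...
    print(flat_length(10, 11), flat_length(1_000_000, 1_000_001))
    # ...but very different on the proportional ruler.
    print(proportional_length(10, 11), proportional_length(1_000_000, 1_000_001))
    # A 10% step has the same proportional length wherever it starts.
    print(proportional_length(10, 11), proportional_length(1_000_000, 1_100_000))
    # The numeric sum of dx/x agrees with ln(x1/x0).
    print(proportional_length_numeric(10, 11))
```

The third comparison is the defining property of this measure: rescaling both endpoints by the same factor leaves the proportional length unchanged.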

NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.
