Criteria for thick tails (or heavy tails):
Excess Kurtosis:
Kurtosis measures the “peakedness” and “tailedness” of a distribution. High kurtosis can be due to a sharp peak, thick tails, or a combination of both. A distribution with excess kurtosis greater than 0 is considered leptokurtic (thick-tailed), while a distribution with excess kurtosis less than 0 is considered platykurtic (thin-tailed). The normal distribution has excess kurtosis of 0.
Tail Index:
The tail index measures how quickly the tails of a distribution decay as a power law: the smaller the tail index, the heavier the tail. A tail index of 2 or less implies infinite variance and is a common criterion for thick tails. The tails of the normal distribution decay faster than any power law, so it has no finite tail index.
Comparison to the Normal Distribution:
If the tails of a distribution are significantly heavier than the tails of a normal distribution with the same mean and variance, it is considered thick-tailed. This can be assessed visually by comparing the tails of the two distributions or by using statistical tests such as the Jarque-Bera test.
Heavy-Tailed Properties:
A distribution is considered thick-tailed if it exhibits certain properties, such as: infinite variance, infinite higher moments, or slow convergence of the sample mean to the population mean.
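For instance, the slow convergence of the sample mean can be seen by tracking the running mean of heavy-tailed draws against thin-tailed ones. A minimal R sketch (the Pareto shape \(\alpha = 1.5\) is an arbitrary choice with a finite mean of 3 but infinite variance):
set.seed(42)
n <- 1e4
x_norm <- rnorm(n, mean = 3)          # thin-tailed, mean 3
x_par  <- runif(n)^(-1/1.5)           # Pareto(x_m = 1, alpha = 1.5) via inverse transform, mean = 3
running_mean <- function(v) cumsum(v) / seq_along(v)
matplot(cbind(running_mean(x_norm), running_mean(x_par)), type = "l", lty = 1,
        col = c(1, 2), xlab = "n", ylab = "running mean",
        main = "Law of large numbers: fast vs slow convergence")
legend("topright", c("normal (mean 3)", "Pareto (alpha = 1.5, mean 3)"),
       col = c(1, 2), lty = 1, cex = .7)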
Fat tails imply a power law:
A power-law distribution is one where the probability of an event is proportional to a power of the event magnitude. This means that the probability of extreme events is much higher than in a normal distribution.
Characteristics of power-law distributions:
Scale invariance: The distribution looks the same regardless of the scale at which it is viewed.
Lack of a characteristic scale: There is no typical value or average that can describe the distribution.
Long tails: The tails of the distribution decay slowly, leading to a higher probability of extreme events.
From Wolfram alpha,
The pdf of the power law is
\[f_X(x) =C x^\alpha, x \in [x_0, x_1]\] and the normalization constant can be determined by considering that \(f_X(x)\) has to integrate to 1:
\[\int_{x_0}^{x_1} f_X(x)dx= C \frac{[x^{\alpha+1}]_{x_0}^{x_1}}{\alpha+1}=1\] and hence,
\[C=\frac{\alpha+1}{x_1^{\alpha+1}- x_0^{\alpha+1}}.\]
However, notice that in other sources, the negative value of alpha is explicit in the equations. For instance:
\[p(x)=\frac{\alpha - 1}{x_{\mathrm{min}}}\left(\frac{x}{x_{\mathrm{min}}} \right)^{-\alpha}\]
From page 4 of this article by M. E. J. Newman, the majority of power-law distributions occurring in nature have \(2\leq \alpha \leq 3.\)
The cumulative probability distribution is
\[F_X(x) = \Pr(X <x)=\int_{x_0}^x C\, t^\alpha\, dt =\frac{C}{\alpha+1}\left(x^{\alpha+1}-x_0^{\alpha+1 }\right)\]
If \(Y\sim U[0,1]\), then by the probability integral transform,
\[y\equiv \frac{C}{\alpha+1} \left(x^{\alpha+1}-x_0^{\alpha+1} \right)\]
So \(y=g(x)\) with inverse
\[X = \left( \frac{\alpha+1}{C}\,y + x_0^{\alpha+1} \right)^{1/(\alpha+1)}\]
and replacing \(C\)
\[X = \left[ (x_1^{\alpha+1}-x_0^{\alpha+1})\,y + x_0^{\alpha+1} \right]^{1/(\alpha+1)}\]
and \(X\sim L(x),\) i.e. it is distributed following a power law.
Note that it is different from page 3 of this document.
Plotting a power law distribution in log-log will result in a line:
\[f_X(x)= C\, x^{-\alpha}\]
\[\log(f_X(x))=\log C - \alpha\, \log x\]
with slope \(-\alpha\)
In R
x1 = 5
x0 = 0.1 # x0 can't be zero; otherwise x0^(alpha+1), a negative power, is 1/0.
alpha = -2.5 # It has to be negative.
set.seed(123)
y = runif(1e5)
x = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
summary(x)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1000 0.1211 0.1579 0.2579 0.2501 4.9910
hist(x, xlim = c(0,1), ylim = c(0,12), border=F, col=rgb(0,0,0.8,.5),
breaks=200,
cex.main=.7, cex.lab=.7, prob=T,
main="Power law")
lines(density(x), col=4, lwd=1)
lines(density(x, adjust=2), lty="dotted", col="turquoise", lwd=2)
h = hist(x, prob=T, breaks=40, plot=F)
plot(h$count, log="xy", type='l', lwd=1, lend=2, xlab="", ylab="", main="Density in logarithmic scale")
The number of samples in the bins becomes small towards the right of the plot, leading to statistical fluctuations, and the noisy right tails.
If we hadn’t generated these data synthetically, and were looking at \(x\) as empirical data from an experiment, we could try to fit a power law distribution with the package igraph and the function power.law.fit().
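For instance, a minimal sketch (assuming the igraph package is installed; in recent igraph versions the same routine is exposed as fit_power_law(), and the exact components of the returned fit can vary by version):
library(igraph)
# Maximum-likelihood fit of a continuous power law to the synthetic samples;
# xmin is the lower cutoff above which the power-law behaviour is assumed to hold.
fit <- power.law.fit(x, xmin = x0)
fit$alpha    # estimated exponent
fit$KS.stat  # Kolmogorov-Smirnov distance between the data and the fitted power law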
INTUITION: Power-law distributions are the inverse function \(f(x) = 1/x\) turned into a distribution. This provides a lot of insight into what they are trying to express: The frequency or probability is inversely proportional to the magnitude of the effect.
In terms of the magnitude of the effects it is good to remember that the integral of the inverse function is the logarithmic function \(\int \frac 1 x \, dx= \log x \; + \;c\):
which immediately explains why, if we draw samples from a Pareto distribution and sort them in decreasing order, the cumulative sum of the overall effect will be dominated by the most massive events, following the \(80/20\) Pareto principle. As is the case with the \(\log(x)\) function, there is a rapid ascent followed by a plateau, such that, although the function goes to infinity, it does so at a very slow rate:
library(EnvStats)
library(ggplot2)
library(ggthemes)
# Generate a Pareto distribution with location 1 and shape 1
set.seed(123); n <- 1000; data <- rpareto(n, location = 1, shape = 1)
# Calculate cumulative sum and proportion
cum_effect <- cumsum(sort(data, decreasing = TRUE))
cum_prop <- cum_effect / sum(data)
# Create a data frame for plotting
df <- data.frame(
index = 1:length(data),
percentage = 1:length(data)/ length(data) * 100,
cum_prop = cum_prop
)
# Plot the cumulative proportion
ggplot(df, aes(x = percentage, y = cum_prop)) +
geom_line() +
geom_hline(yintercept = 0.8, linetype = "dashed") +
geom_vline(xintercept = 20, linetype = "dashed") +
theme_fivethirtyeight() +
theme(plot.title = element_text(size = 14),
axis.title.x = element_text(size = 10),
axis.title.y = element_text(size = 10)) +
labs(
x = "Percentage of samples (decreasing order magnitude)",
y = "Overall effect (%)",
title = "Pareto Principle - Extreme events dominate"
)
\[f_X(x) = \frac{\alpha \, x^\alpha_m}{x^{\alpha+1}}\]
with support \(x\in [x_m, +\infty),\) where \(x_m>0\) is the minimum value (scale) parameter and \(\alpha>0\) is the shape parameter. The parameter \(\alpha\) is called the tail index, or the Pareto index when discussing the distribution of wealth.
When plotted on linear axes, the distribution assumes the familiar J-shaped curve which approaches each of the orthogonal axes asymptotically. All segments of the curve are self-similar (subject to appropriate scaling factors). When plotted in a log-log plot, the distribution is represented by a straight line.
Here is a plot of the Pareto distribution with minimum value parameter \(k = 3\) and shape parameter \(\alpha =\{0.5,0.75, 1.5\}\) in Mathematica:
Plot[Table[ PDF[ParetoDistribution[3, \[Alpha]], x], {\[Alpha], {.5, .75, 1.5}}] // Evaluate,
{x, 3, 3 + 4},
Filling -> Axis,
PlotLegends -> {"\[Alpha] = 0.5", "\[Alpha] = 0.75", "\[Alpha] = 1.5"},
PlotRange -> All, AxesLabel -> {"x", "Probability Density"},
AxesStyle -> Arrowheads[0.02], PlotStyle -> {Directive[Dashed, Red],
Directive[Dotted, Blue],
Directive[Solid, Green]},
LabelStyle -> Directive[FontSize -> 14],
ImageSize -> 600]
Compare it to the Lévy distribution, and notice how as the dispersion parameter \(\sigma\) decreases, the pdf adopts the shape of \(1/x\)
(* Define the location and scale parameters *)
mu = 0;
sigma = 1;
(* Create a table of Lévy distributions with different scale parameters *)
levyDistTable = Table[LevyDistribution[mu, sigma*\[Alpha]], {\[Alpha], {0.1, 0.3, 0.5}}];
(* Plot the probability density functions (PDFs) *)
Plot[Table[PDF[levyDist, x], {levyDist, levyDistTable}] // Evaluate,
{x, 0.1, 3},
Filling -> Axis,
PlotLegends -> {"\[Sigma] = 0.1", "\[Sigma] = 0.3", "\[Sigma] = 0.5"},
PlotRange -> All,
AxesLabel -> {"x", "Probability Density"},
AxesStyle -> Arrowheads[0.02],
PlotStyle -> (ColorData["BrightBands"] /@ {0.25, 0.5, 0.75}),
LabelStyle -> Directive[FontSize -> 14],
ImageSize -> 600]
As stated in Wikipedia the tail of the Lévy distribution exhibits power-law characteristics, and hence, “the Lévy distribution is not just heavy-tailed but also fat-tailed”.
Heavy-tailed distributions: These distributions have tails that decay slower than an exponential function. This means that there’s a higher probability of observing extreme values compared to a normal distribution.
Fat-tailed distributions: This is a more specific subset of heavy-tailed distributions. They exhibit a power-law decay in their tails, meaning the probability of extreme events decreases at a slower rate than a simple exponential decay.
Both the Lévy and the Pareto distributions are known for their heavy tails, but the Lévy distribution is famed for exceptionally heavy ones: its tail index is \(1/2\) (the density decays like \(x^{-3/2}\)), so it assigns more probability to extreme values than a Pareto distribution with \(\alpha > 1/2\). The Lévy distribution is also more skewed to the right than the Pareto distribution; if your data are highly skewed, the Lévy distribution may be a better fit.
Applications:
From here:
Zipf theorised that the distribution of word use was due to a tendency to communicate efficiently with the least effort. Hence, the principle of least effort is also known as Zipf’s Law. Based on the principle of least effort, it is human nature to want the greatest outcome for the least amount of work. Zipf showed that useful behaviors were performed frequently, and frequent behaviors became quicker and easier to perform over time.
The Zipf distribution is a discrete distribution also known as the Estoup distribution. From here:
According to Mitzenmacher (2004), “the first known attribution of the power law distribution of word frequencies appears to be due to Estoup in 1916”. This is noted by Rousseau (2002), where he indicated that Zipf made reference to Estoup in his very first publication on the relative frequency of words. The Estoup distribution, unlike the other discrete power law distributions seen thus far, exists only in truncated form since the harmonic series is not convergent (Zörnig and Altmann, 1995).
The Zipf distribution has support on the set of positive integers \(\{1, 2, 3, \dots, n\}\). This finite support is the main difference with the zeta distribution. The Zipf PMF has two parameters: \(s\), a positive real number that determines the decay of the tail, and \(n\), a positive integer. Here is the PMF:
\[P(k; s, n) = \frac{1}{k^{s}H_{n,s}}\]
Notice that this is just \(1/k^s\) normalized by the \((n,s)\)-th generalized harmonic number:
\[H_{n,s}=\sum_{k=1}^n \frac 1{k^s}\] Here is the plot of the Zipf distribution with support ranging to \(n=10\) and varying shape parameter:
DiscretePlot[
Table[PDF[ZipfDistribution[10,\[Rho]], k], {\[Rho], {.1, 1, 2}}] //
Evaluate, {k, 0, 10}, PlotRange -> All, PlotMarkers -> Automatic]
As anticipated, the zeta distribution has support \(k \in \{1,2,\dots\}\), and it is also a discrete probability distribution, in fact referred to as the discrete Pareto distribution.
The PMF depends only on the parameter \(s\):
\[P(X = k; s) = \frac{1}{k^{s}\zeta(s)}\]
where we encounter the Riemann zeta function:
\[\zeta(s)=\sum_{k =1}^\infty \frac 1{k^s}\] The plot looks pretty much like the Zipf distribution:
DiscretePlot[
Table[PDF[ZipfDistribution[\[Rho]], k], {\[Rho], {1, 2, 3}}] //
Evaluate, {k, 0, 20}, PlotRange -> All, PlotMarkers -> Automatic]
and it can also be presented as the \(-\log(\cdot)\) of the PMF, as in:
DiscretePlot[
Table[-Log@PDF[ZipfDistribution[\[Rho]], k], {\[Rho], {1, 2, 3}}] //
Evaluate, {k, 0, 20}, PlotRange -> All, PlotMarkers -> Automatic]
All commonly used heavy-tailed distributions are subexponential. From Wikipedia:
In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.
There are three important subclasses of heavy-tailed distributions: the fat-tailed distributions, the long-tailed distributions, and the subexponential distributions. In practice, all commonly used heavy-tailed distributions belong to the subexponential class, introduced by Jozef Teugels.
One-tailed:
Two-tailed:
In probability theory, a distribution is said to be stable if a linear combination of two independent random variables with this distribution has the same distribution, up to location and scale parameters. A random variable is said to be stable if its distribution is stable. The stable distribution family is also sometimes referred to as the Lévy alpha-stable distribution.
\[f_X(x)=\frac{k}{\lambda}\left(\frac x \lambda\right)^{k-1}\, e^{-(x/\lambda)^k}\] \(\lambda\) is the scale, and \(k\) is the shape parameter.
Like the exponential, it models time between events. However, the Weibull is more flexible:
The exponential distribution models situations where events occur continuously and independently at a constant average rate. It’s memoryless, whereas the Weibull is not memoryless.
The Weibull distribution generalizes the exponential by introducing a shape parameter. If the shape parameter is \(k=1\), it behaves exactly like the exponential:
\[f_X(x)=\frac{1}{\lambda} e^{-x/\lambda}\]
If it’s less than \(1\), it models decreasing failure rates over time, as in the Lindy effect, significant “infant mortality” (defective items failing early, with the failure rate decreasing over time), or negative word of mouth. A shape parameter greater than \(1\) captures increasing failure rates, such as an aging process or positive word of mouth.
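A minimal R sketch of the three regimes (the scale \(\lambda = 1\) and the shape values are arbitrary; the \(k=0.5\) density is clipped since it diverges at the origin):
# Weibull densities for shape k < 1 (decreasing hazard), k = 1 (exponential) and
# k > 1 (increasing hazard); scale lambda = 1 throughout.
curve(dweibull(x, shape = 0.5, scale = 1), from = 0.01, to = 3, ylim = c(0, 1.5),
      col = 2, lwd = 2, ylab = "density", main = "Weibull: effect of the shape parameter")
curve(dweibull(x, shape = 1,   scale = 1), add = TRUE, col = 3, lwd = 2)  # = Exponential(1)
curve(dweibull(x, shape = 2.5, scale = 1), add = TRUE, col = 4, lwd = 2)
legend("topright", c("k = 0.5", "k = 1", "k = 2.5"), col = 2:4, lwd = 2)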
Fisher-Tippett theorem:
Let \(X_1,X_2,\dots, X_n\) be iid rvs and let \(M_n = \max(X_1,\dots,X_n)\). If there exist constants \(a_n > 0\) and \(b_n\in \mathbb R,\) and some non-degenerate distribution \(G,\) such that
\[\frac{M_n - b_n}{a_n}\to G,\]
then \(G\) belongs to one of the three extreme value distribution types:
Fréchet: \[\Phi_\alpha(x)=\begin{cases}0,&& x\leq 0\\e^{-x^{-\alpha}},&& x>0\end{cases}\]
Weibull \[\Psi_\alpha(x)=\begin{cases}e^{-(-x)^\alpha},&& x\leq 0\\1,&& x>0\end{cases}\]
Gumbel \[\Lambda(x)=e^{-e^{-x}}\]
The Fréchet is a heavy-tailed distribution:
Plot[Table[
PDF[FrechetDistribution[\[Alpha], 1.2],
x], {\[Alpha], {1/2, 1, 2}}] // Evaluate, {x, .1, 3},
Filling -> Axis]
However, the Gumbel is not heavy-tailed, and its use in EVT has to do with the “block maxima” approach, which involves dividing the data into blocks and modeling the maximum or minimum value within each block. The Gumbel distribution is often used for this purpose, even if the underlying data distribution is not heavy-tailed.
The standard Cauchy distribution is the same as the Student’s t-distribution with one degree of freedom.
It can be thought of as kicking a ball from a point on the \(x,y\)-plane towards the \(x\)-axis at a random angle, and determining the point in the \(x\)-axis the ball hits:
Mathematically this can be simulated by drawing \(u\) from a uniform \([0,1]\) and transforming the value, such that
\[x = x_0 + \gamma\,\tan\left(\pi \left(u -\tfrac 12\right) \right)\] with \(x_0\) and \(\gamma\) being the location and scale parameters. If we fix \(x_0=0\) and increase the scale parameter \(\gamma\), the tails become heavier:
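A minimal R sketch of this simulation (standard Cauchy, i.e. \(x_0=0,\ \gamma=1\), plus an extra arbitrary \(\gamma=5\) example), checked against the built-in rcauchy():
set.seed(1)
u <- runif(1e5)
x_sim <- tan(pi * (u - 0.5))                   # standard Cauchy via the inverse transform
x_ref <- rcauchy(1e5, location = 0, scale = 1) # built-in reference
# Compare central quantiles; the extreme tails fluctuate wildly by construction
rbind(simulated = quantile(x_sim, c(.05, .25, .5, .75, .95)),
      rcauchy   = quantile(x_ref, c(.05, .25, .5, .75, .95)))
# A larger gamma stretches the samples: x0 + gamma * tan(pi * (u - 1/2))
x_wide <- 0 + 5 * tan(pi * (u - 0.5))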
The ratio of two independent standard normal random variables follows a standard Cauchy distribution.
It is the canonical example of a “pathological” distribution with undefined moments.
Some fun facts are explained in this and this video, including the connection to the witch of Agnesi curve, described in Wikipedia:
Geogebra project by Gillermo Bautista.
The equation for the Witch of Agnesi curve is
\[y = \frac{8a^3 }{ x^2 + 4a^2}\]
while the pdf of the Cauchy distribution is
\[f_X(x)=\frac{1}{\pi\gamma \left(1 + (\frac{x-x_0}{\gamma})^2\right)}=\frac{\gamma}{\pi \left((x - x_0)^2 + \gamma^2 \right) }\]
The witch of Agnesi is a scaled and shifted version of the Cauchy pdf.
The applications of the Cauchy include:
Outlier Detection: The Cauchy distribution is often used to model outliers or extreme values in data sets, as it has a longer tail than the normal distribution.
Robust Regression: Due to its robustness to outliers, the Cauchy distribution can be used in robust regression models to minimize the impact of extreme data points.
Lorentzian Line Shape: In spectroscopy, the Lorentzian line shape, which is directly related to the Cauchy distribution, is used to model the spectral lines of atoms and molecules.
Photon Scattering: The Cauchy distribution can be used to model the scattering of photons in certain materials.
Risk Modeling: The Cauchy distribution can be used to model extreme financial events, such as market crashes or financial crises, due to its heavy-tailed nature.
Value at Risk (VaR): The Cauchy distribution can be used to calculate VaR, a measure of financial risk, as it can capture the potential for extreme losses.
Noise Modeling: The Cauchy distribution can be used to model certain types of noise in signal processing applications, such as impulsive noise.
Reliability Engineering: The Cauchy distribution can be used to model the lifetime of components or systems that are prone to failures caused by extreme events.
Environmental Science: The Cauchy distribution can be used to model extreme weather events or environmental phenomena.
A log-Cauchy distribution is a probability distribution of a random variable whose logarithm is distributed in accordance with a Cauchy distribution. If \(X\) is a random variable with a Cauchy distribution, then \(Y = \exp(X)\) has a log-Cauchy distribution.
The Erlang distribution is not subexponential: like the exponential, its tail is exponentially bounded. From Wikipedia:
The Erlang distribution is the distribution of a sum of \(k\) independent exponential variables with mean \(1/\lambda\) each. Equivalently, it is the distribution of the time until the \(k\)-th event of a Poisson process with a rate of \(\lambda\). The Erlang and Poisson distributions are complementary, in that while the Poisson distribution counts the events that occur in a fixed amount of time, the Erlang distribution counts the amount of time until the occurrence of a fixed number of events. When \(k=1\), the distribution simplifies to the exponential distribution. The Erlang distribution is a special case of the gamma distribution in which the shape of the distribution is discretized.
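A quick R sanity check of the “sum of \(k\) exponentials” characterization (the values \(k=5\) and \(\lambda=2\) are arbitrary):
set.seed(7)
k <- 5; lambda <- 2; n <- 1e5
erlang_sum   <- colSums(matrix(rexp(k * n, rate = lambda), nrow = k))  # sum of k exponentials
erlang_gamma <- rgamma(n, shape = k, rate = lambda)                    # gamma with integer shape
# The two samples should line up on the diagonal of a QQ plot
qqplot(erlang_sum, erlang_gamma, main = "Erlang: sum of k exponentials vs gamma(k, rate)")
abline(0, 1, col = 2)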
The gamma distribution can be thick-tailed or thin-tailed depending on its shape parameter \(\alpha\). With rate parameter \(\beta\), its pdf is
\[f_X (x)=\frac{\beta^\alpha}{\color{orange}\Gamma(\alpha)} \color{red}{x^{\alpha - 1}\,e^{-\beta x}} \]
essentially the integrand of the gamma function:
\[\Gamma(z) = \int_0^\infty \color{red}{t^{z-1} \, e^{-t}}\;dt\]
Thin-tailed: When \(\alpha\) is greater than 1, the gamma distribution has a relatively thin tail. This means that extreme values are less likely to occur.
Thick-tailed: When \(\alpha\) is less than or equal to 1, the gamma distribution has a thicker tail, indicating a higher probability of extreme values.
The gamma distribution is explained here.
Factorials often appear in discrete probability distributions, such as the binomial and Poisson distributions. The gamma function’s connection to factorials means it can be used to derive continuous probability distributions, like the gamma and beta distributions. This ability to transition between discrete and continuous distributions enhances our toolkit in probability and statistics.
INTUITION: The pdf is just the integrand of the gamma function, and in it there is a polynomial factor in the random variable, which tends to give the pdf an initial smooth upswing. Quickly, the exponential part of the pdf kicks in and kills the function, so these distributions tend to have a positive skew. As in the case of the Weibull distribution with \(k=1\), removing the polynomial part turns the distribution into an exponential.
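To see this shape effect, a minimal R sketch with rate \(\beta = 1\) (arbitrary) and shapes below, at, and above 1:
# Gamma densities: alpha <= 1 gives a monotone shape that diverges at the origin (clipped here),
# alpha > 1 gives the polynomial upswing followed by the exponential cut-off.
curve(dgamma(x, shape = 0.5, rate = 1), from = 0.01, to = 8, ylim = c(0, 1.2),
      col = 2, lwd = 2, ylab = "density", main = "Gamma: effect of the shape parameter")
curve(dgamma(x, shape = 1, rate = 1), add = TRUE, col = 3, lwd = 2)  # = Exponential(1)
curve(dgamma(x, shape = 3, rate = 1), add = TRUE, col = 4, lwd = 2)
legend("topright", c("alpha = 0.5", "alpha = 1", "alpha = 3"), col = 2:4, lwd = 2)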
The gamma distribution is used in:
Waiting Times: The gamma distribution is often used to model waiting times between events, especially when the events are not independent or when the waiting time is not exponentially distributed.
Survival Analysis: In survival analysis, the gamma distribution can be used to model the time to an event of interest, such as the time to failure of a component or the time to death.
Insurance: The gamma distribution is used in insurance modeling to represent the distribution of claim amounts.
Engineering: In engineering, the gamma distribution is used to model various processes, such as the time between failures of equipment or the distribution of material properties.
From these CalTech notes…
Exponential distributions are not subexponential!
x1 = 5
x0 = 0.07 # x0 can't be zero; otherwise x0^(alpha+1), a negative power, is 1/0.
alpha = -2.5 # It has to be negative.
set.seed(123)
y = runif(1e5)
x = ((x1^(alpha+1) - x0^(alpha+1))*y + x0^(alpha+1))^(1/(alpha+1))
par(mfrow = c(1,2))
hist(x, xlim = c(0,1), border=F, col=2, breaks=80,
cex.main=.7, cex.lab=.7, ylim= c(0,12000),
main="Power law")
abline(v=.4, lty=2)
y = rexp(1e5, rate = 15)
hist(y, breaks=20, xlim = c(0,1), ylim= c(0,12000),border=F, col=2, cex.main=.7, cex.lab=.7,
main="Exponential")
abline(v=.4, lty=2)
These distributions can be intuitively approximated through these heuristics:
The principle of the single big jump
Many mice and a few elephants
The catastrophe principle (versus the conspiracy principle of light-tailed distributions).
E.g., suppose the sum of the estimated costs of the damages from each hurricane during a given year is unexpectedly large. It is probably not because there were a lot of hurricanes that each caused larger-than-typical amounts of damage. The catastrophe principle for heavy-tailed distributions tells us that the most likely reason the sum is unexpectedly large is that one hurricane caused an extremely large amount of damage. In fact, we should expect that one hurricane caused nearly all of the damage during the year, i.e., there was a catastrophe.
Mathematically,
A distribution \(F\) on the positive half-line is subexponential if
\[\overline{F^{*n}}(x) \sim n\,\bar F(x), x \to \infty\]
The probabilistic interpretation of this is that, for a sum of \(\displaystyle n\) independent random variables \(\displaystyle X_{1},\ldots ,X_{n}\) with common distribution \(\displaystyle F\),
\[\displaystyle \Pr[X_{1}+\cdots +X_{n}>x]\sim \Pr[\max(X_{1},\ldots ,X_{n})>x]\quad {\text{as }}x\to \infty .\]
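This equivalence can be checked by simulation. A rough R sketch with two Pareto(\(x_m = 1\), \(\alpha = 1.5\)) summands generated by inverse transform (the thresholds are arbitrary, and the largest ones are noisy):
set.seed(3)
n_rep <- 1e5
z1 <- runif(n_rep)^(-1/1.5)   # Pareto(x_m = 1, alpha = 1.5) via inverse transform
z2 <- runif(n_rep)^(-1/1.5)
s  <- z1 + z2                 # sum of two heavy-tailed variables
m  <- pmax(z1, z2)            # their maximum
for (t in c(10, 50, 100, 500)) {
  cat("threshold", t, ":  P(sum > t) =", mean(s > t), "  P(max > t) =", mean(m > t), "\n")
}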
Not all highly right-skewed distributions follow the power law. The log-normal distribution can have very similar plots.
From this source
The log-normal arises from multiplying random variables, just as adding random variables results in a normal distribution (central limit theorem).
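A small R illustration of this multiplicative mechanism (the number of factors and the Uniform(0.5, 1.5) factors are arbitrary choices):
set.seed(11)
n_factors <- 50; n_samples <- 1e4
# Each sample is the product of many independent positive factors
prod_samples <- replicate(n_samples, prod(runif(n_factors, 0.5, 1.5)))
# Its logarithm is a sum of iid terms, hence approximately normal by the CLT,
# which makes the product itself approximately log-normal
hist(log(prod_samples), breaks = 50, prob = TRUE, border = FALSE, col = "grey",
     main = "log of a product of random factors")
curve(dnorm(x, mean(log(prod_samples)), sd(log(prod_samples))), add = TRUE, col = 2, lwd = 2)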
The log-normal (or lognormal) distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. It is a subexponential distribution, \(\text{Lognormal}(\mu,\sigma^2)\) with pdf,
\[f_X(x) =\frac{1}{x\sqrt{2\pi \sigma^2}}\,\exp\left[{-\frac{(\ln x-\mu)^2}{2\sigma^2}}\right]\]
Derivation from the normal:
\[f_X(x) =\frac{1}{\sqrt{2\pi \sigma^2}}\,\exp\left[{-\frac{( x-\mu)^2}{2\sigma^2}}\right]\]
If we look at it as a change of variables, the Jacobian will account for \(\frac{1}{x}\) as we apply the chain rule, \(\frac{d}{dx}\ln x= \frac{1}{x}.\)
In base \(10\) the pdf, expressed as a function of \(x=\log_{10}(y)\), is
\[f_Y(10^x) =\frac{\log_{10}(e)}{10^x\sqrt{2\pi \sigma^2}}\,\exp\left[{-\frac{( x-\mu)^2}{2\sigma^2}}\right]\]
To see this, start from the density of \(Y\), whose base-10 logarithm is normal,
\[f_Y(y) =\frac{1}{y\ln(10)\sqrt{2\pi \sigma^2}}\,\exp\left[{-\frac{( \log_{10}(y)-\mu)^2}{2\sigma^2}}\right]\]
because \(\frac{d}{dy} \log_{10}(y) = \frac{1}{y \ln(10)}.\) Now, since \(\ln(10)=\frac{\log_{10}(10)}{\log_{10}(e)}=\frac{1}{\log_{10}(e)},\)
\[f_Y(y) =\frac{\log_{10}(e)}{y\sqrt{2\pi \sigma^2}}\,\exp\left[{-\frac{( \log_{10}(y)-\mu)^2}{2\sigma^2}}\right]\]
and substituting \(y = 10^x,\)
\[f_Y(10^x) =\frac{\log_{10}(e)}{10^x\sqrt{2\pi \sigma^2}}\,\exp\left[{-\frac{( x-\mu)^2}{2\sigma^2}}\right]\]
In the log-log plots, the log-normal will assume a shifted and inverted parabola:
\[\log\left(f_X(x)\right) =\log\left\{\frac{\exp\left[{-\frac{(\ln x-\mu)^2}{2\sigma^2}}\right]}{x\sqrt{2\pi \sigma^2}}\right\}\]
\[\begin{align} \log\left(f_X(x)\right) &=\log\left\{\exp\left[{-\frac{(\ln x-\mu)^2}{2\sigma^2}}\right]\right\}- \log \left\{x\sqrt{2\pi \sigma^2}\right\}\\[2ex] &=-\frac{(\ln x-\mu)^2}{2\sigma^2}- \log \left\{x\sqrt{2\pi \sigma^2}\right\} \end{align}\]
set.seed(1)
x = rlnorm(1e5)
x = x[x < 10]
plot(density(x), log="xy", type="l", xlim=c(.03, 8), col = 4, lwd = 2,
main="Log-normal distribution")
which, in the downsloping part, can mimic a straight line like the power law plotted in log-log.
In fact, the log-normal plotted in log-log form looks nearly linear when the \(\sigma\) value is high:
set.seed(1)
x = rlnorm(1e5, 0, 10)
x = x[x < 10]
plot(density(x), log="xy", type="l", xlim=c(.03, 8), col = 4, lwd = 2,
main="Log-normal distribution")
NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.