NOTES ON STATISTICS, PROBABILITY and MATHEMATICS


Time Series with Stocks (VTI):


The Vanguard Total Stock Market ETF (NYSEARCA: VTI) tracks the performance of the CRSP U.S. Total Market Index. The fund has returned 6.23% since its inception in 2001. The index is market capitalization-weighted and measures the entire investable U.S. equities market, including small-, mid- and large-cap companies. The fund is passively managed and uses an index-sampling strategy.

require(quantmod)

getSymbols("VTI")
## [1] "VTI"
# str(VTI) # We start with an xts
z = as.data.frame(VTI$VTI.Adjusted) #Subsetting VTI.Adjusted

D2D = function (x) {
    days = nrow(x)
    delta = numeric(days)
    for(i in 2:days){
        delta[i] <- (100*((x[i, 1] - x[i - 1, 1]) / (x[i - 1, 1])))
    }
    delta
}

VTI.InterDay = D2D(z)
VTI.InterDay[1] <- mean(VTI.InterDay)#Something to fill in the 0 in row 1. 
VTI = merge(VTI,VTI.InterDay)
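
For each day \(t\), the loop in `D2D` computes the percentage change \(100\,(P_t - P_{t-1})/P_{t-1}\) of the adjusted price. As a minimal sketch, the same quantity can be obtained without a loop using base R's `diff`. The price vector `toy_prices` and the helper name `pct_change` are made up for illustration and are not part of the original notes.

```r
# Hypothetical adjusted closing prices for five consecutive trading days.
toy_prices <- c(100, 102, 99.96, 99.96, 104.958)

# Vectorized day-to-day percentage change. The first day has no
# predecessor, so it is set to 0, as in D2D().
pct_change <- function(p) {
  c(0, 100 * diff(p) / head(p, -1))
}

changes <- pct_change(toy_prices)
print(round(changes, 2))
```

Applied to the single column of `z`, this should give the same vector as `D2D(z)`, though that has not been checked against the real data here.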

Notice the split of the shares reflected as the broad band on the left of the plot:

Vanguard Total Stock Market ETF (VTI) has \(1\) split in our VTI split history database. The split for VTI took place on June \(18\), 2008. This was a \(2\) for \(1\) split, meaning for each share of VTI owned pre-split, the shareholder now owned \(2\) shares.


How much can the VTI shares oscillate from day to day?
head(VTI$VTI.InterDay)  
##            VTI.InterDay
## 2007-01-03   0.04394312
## 2007-01-04   0.20006687
## 2007-01-05  -0.79158808
## 2007-01-08   0.36660795
## 2007-01-09   0.00000000
## 2007-01-10   0.22203750
summary(VTI$VTI.InterDay)
##      Index             VTI.InterDay      
##  Min.   :2007-01-03   Min.   :-11.38086  
##  1st Qu.:2010-11-06   1st Qu.: -0.41840  
##  Median :2014-09-16   Median :  0.08152  
##  Mean   :2014-09-15   Mean   :  0.04395  
##  3rd Qu.:2018-07-23   3rd Qu.:  0.61078  
##  Max.   :2022-05-27   Max.   : 12.82977

When did the max loss and max gain occur?

time(VTI)[VTI$VTI.InterDay == min(VTI$VTI.InterDay)]
## [1] "2020-03-16"
time(VTI)[which.max(VTI$VTI.InterDay)]
## [1] "2008-10-13"

How often does a drop of, say, \(-4\%\) occur?

sum(VTI$VTI.InterDay < -4)
## [1] 36
mean(VTI$VTI.InterDay < -4) * 100 # or 0.9% of the days
## [1] 0.9280742
sum(VTI$VTI.InterDay < -4) / 16   # or between 2 and 3 days every year.
## [1] 2.25

… and what about gains of \(4\%\)?

sum(VTI$VTI.InterDay > 4)
## [1] 27
mean(VTI$VTI.InterDay > 4) * 100 # or 0.7% of the days
## [1] 0.6960557
sum(VTI$VTI.InterDay > 4) / 16   # or between 1 and 2 days every year?
## [1] 1.6875
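
Dividing by 16 treats the sample as sixteen full years. As a rough cross-check, the length of the sample can be computed from the first and last dates reported by `summary()` above (2007-01-03 and 2022-05-27). The counts 36 and 27 are taken from the output above; the variable names are only illustrative.

```r
# First and last observation dates, as reported by summary() above.
first_day <- as.Date("2007-01-03")
last_day  <- as.Date("2022-05-27")

# Length of the sample in calendar years (365.25 days per year).
years <- as.numeric(last_day - first_day) / 365.25

# Big-move counts taken from the output above.
big_drops <- 36
big_gains <- 27

# Average number of big-move days per year of data.
print(round(c(years = years,
              drops_per_year = big_drops / years,
              gains_per_year = big_gains / years), 2))
```

With roughly 15.4 years of data the yearly rates come out slightly higher than with 16, but the picture is the same: a couple of such days per year on average.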

But they can’t be distributed uniformly… Let’s first look at the losing days (to the left), and compare to winning days:

par(mfrow=c(1,2))
plot(time(VTI), VTI$VTI.InterDay < -4, col=2, type='h', xaxt="n",
     xlab="Year", ylab="Days with < -4% change", 
     cex.axis=.5, cex.main=.8, cex.lab =.5,las=2, 
     main  = "Clustering of big drop days")

# Selecting the time information in the names of the rows of VTI:
tt = time(VTI)
# We select points spaced by approximately 1 business year:
# 365 days - 2 days off each weekend - 9 Holidays in the USA
ix = seq(1, nrow(VTI), by = 365 - 2*4*12 - 9)
# Formatting the labels as just simply the year with 4 digits:
fmt = "%Y"
# Generating vector of potential labels:
labs = format(tt, fmt) 
# Plotting the x axis:
axis(side = 1, at = tt[ix], labels = labs[ix], cex.axis = 0.7, las = 2)


plot(time(VTI), VTI$VTI.InterDay > 4, col=3, type='h', xaxt="n",
     xlab="Year", ylab="Days with > 4% gains", 
     cex.axis=.5, cex.main=.8, cex.lab =.5,las=2, 
     main  = "Clustering of big winning days")

# Reusing tt, ix and labs from the first panel to draw the x axis:
axis(side = 1, at = tt[ix], labels = labs[ix], cex.axis = 0.7, las = 2)

Wow! Huge spike around 2008 - 2009. How much clustering is there?

sum(VTI["2008/2009"]$VTI.InterDay < - 4) # Total number of drops below -4%
## [1] 22
# As a fraction of all big losing days...
sum(VTI["2008/2009"]$VTI.InterDay < - 4) / sum(VTI$VTI.InterDay < -4) 
## [1] 0.6111111

What about wins during the same period?

sum(VTI["2008/2009"]$VTI.InterDay >  4) # Total number of winning days > 4%
## [1] 15
# As a fraction of all big winning days...
sum(VTI["2008/2009"]$VTI.InterDay >  4) / sum(VTI$VTI.InterDay > 4)
## [1] 0.5555556

So, during the crisis there were \(22\) “big losing” days and \(15\) “big winning” days.
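
To see how big-move days spread across all years rather than just 2008-2009, they can be tallied per calendar year. The sketch below uses a small made-up data set (`dates` and `change` are hypothetical, not VTI data) and base R only; with the real series, the dates and the numeric values of the inter-day changes would take their place.

```r
# Hypothetical daily percentage changes on made-up dates.
dates  <- as.Date(c("2008-10-07", "2008-10-09", "2008-10-15",
                    "2009-01-20", "2010-05-06", "2011-08-08"))
change <- c(-5.1, -7.2, -9.4, -5.0, -3.2, -6.5)

# Year of each observation, as a 4-digit string.
yr <- format(dates, "%Y")

# Number of days per year with a drop worse than -4%.
drops_per_year <- tapply(change < -4, yr, sum)
print(drops_per_year)

# Share of all big drops that fall in 2008-2009.
in_crisis <- yr %in% c("2008", "2009")
share_crisis <- sum(change[in_crisis] < -4) / sum(change < -4)
print(share_crisis)
```

A table like `drops_per_year` makes the clustering visible as numbers: most years contribute few or no big drops, while a crisis year contributes many.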

plot(VTI$VTI.InterDay["2008-09-01/2009-03-01"], las=2, cex.axis=.6, main="Interday changes during crisis")
abline(h = 0)

Yet, the market tanked. Were the losing days worse, not just more numerous?

The worst day of the whole series came later, on 2020-03-16 (\(-11.38\%\)), but the crisis produced a drop almost as large…

min(VTI["2008/2009"]$VTI.InterDay)
## [1] -9.352953

when?

time(VTI["2008/2009"])[which.min(VTI["2008/2009"]$VTI.InterDay)]
## [1] "2008-10-15"

Here is the news flashback:

Another huge Dow loss Blue-chip indicator drubbed \(733\) points - \(2\)nd biggest point loss ever - as recession fears resurface.

By Alexandra Twin, CNNMoney.com senior writer

Last Updated: October 15, 2008: 6:21 PM ET

NEW YORK (CNNMoney.com) – Recession talk scared Wall Street Wednesday, sending the Dow Jones industrial average to its second biggest one-day point loss ever.

A weak retail sales report and dour forecasts from the Federal Reserve, coupled with sober comments from Fed Chairman Ben Bernanke, sent stocks tumbling.

A whopping \(-9\%\)! Incredibly, the maximum inter-day gain also happened around this period…

max(VTI["2008/2009"]$VTI.InterDay)
## [1] 12.82977

Wow! When did it happen?

time(VTI["2008/2009"])[which.max(VTI["2008/2009"]$VTI.InterDay)]
## [1] "2008-10-13"
VTI["2008-10-13"]
##            VTI.Open VTI.High VTI.Low VTI.Close VTI.Volume VTI.Adjusted
## 2008-10-13    47.05    50.23   46.66     50.04    7777500     38.64702
##            VTI.InterDay
## 2008-10-13     12.82977

and the news reflected it:

Dow jumps \(936\) points and S&P up \(104\), in the biggest point gains ever. The Dow, S&P and Nasdaq all gain over \(11\%\).

By Alexandra Twin, CNNMoney.com senior writer

Last Updated: October 13, 2008: 6:15 PM ET

NEW YORK(CNNMoney.com)

Stocks rallied Monday afternoon, with the Dow rallying \(976\) points during the session, as investors bet that the worst of the credit crisis is over, following a series of global initiatives announced over the last few days.

That bounce was predicated by the day’s news, with investors breathing a sigh of relief that some specifics were finally released regarding the \(\$700\) billion bank bailout.


Let’s zoom in on the crisis period:

crisis = VTI["2008-09-01/2013-03-01"]
plot(time(crisis), crisis$VTI.InterDay < - 4, col=2, type='h',
     xlab="Year", ylab="", 
     cex.axis=.5, cex.main=.8, cex.lab =.5,las=2, 
     main  = "Big losing and winning days during the crisis")
points(time(crisis), crisis$VTI.InterDay > 4, col=3, type='h')


Between the start of the crisis around September 2008, and the nadir, on March 09, 2009,

A year later, we know that March 9 was the bottom of a months-long financial panic that wiped away trillions of dollars in assets. But on what now appears to have been the best buying opportunity of a generation, many only wondered how much lower the markets would tumble. (Forbes)

For Dow, another \(12\)-year low.

S&P also finishes at lowest level in more than a decade as Wall Street resumes its retreat on economic worries.

By Alexandra Twin, CNNMoney.com senior writer Last Updated: March 9, 2009: 6:18 PM ET

there was a loss of

(drop = as.numeric(VTI$VTI.Close["2009-03-09"]) - as.numeric(VTI$VTI.Close["2008-09-10"]))
## [1] -28.22
drop/as.numeric(VTI$VTI.Close["2008-09-10"]) * 100 # In percentage...
## [1] -45.57493

which didn’t recover fully until March 2013, four years later:
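
Part of the reason is arithmetic: after a loss of a given percentage, a larger percentage gain is needed to get back to the starting price. A quick check, using the \(-45.57\%\) figure computed above (the helper name `gain_to_recover` is made up for this sketch):

```r
# Percentage gain needed to return to the starting price after a loss
# of `loss_pct` percent (loss_pct is given as a positive number).
gain_to_recover <- function(loss_pct) {
  100 * (1 / (1 - loss_pct / 100) - 1)
}

# Loss between 2008-09-10 and 2009-03-09, from the output above.
loss <- 45.57493
needed <- gain_to_recover(loss)
print(round(needed, 1))
```

So the closing price had to rise by roughly 84% from the March 2009 low just to return to its September 2008 level.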

What happened on October 10, 2009? Not much, it was Saturday. And on Monday

VTI["2009-10-12"]
##            VTI.Open VTI.High VTI.Low VTI.Close VTI.Volume VTI.Adjusted
## 2009-10-12    54.63    54.84   54.38      54.6     868000      43.1978
##            VTI.InterDay
## 2009-10-12    0.3676827

not much happened, either… October 14, 2009, perhaps?

VTI["2009-10-14"]
##            VTI.Open VTI.High VTI.Low VTI.Close VTI.Volume VTI.Adjusted
## 2009-10-14    55.11    55.48   54.87     55.42    1322200     43.84655
##            VTI.InterDay
## 2009-10-14     1.781469

Finally, let’s take a look at the fat tails of VTI as representative of the market, and as compared to the normal distribution:

hist(VTI$VTI.InterDay, main="INTERDAY VTI % CHANGES: FAT TAILS",
     sub="fitted normal (in grey); pdf estimate (in dark red)",
     xlab = "PERCENTAGE CHANGE FROM DAY TO DAY",
     ylab = "DENSITY", prob = TRUE, col ="coral", ylim=c(0,0.5),
     xlim=c(-4,4), cex.main=.9, cex.sub=.8, cex.lab=.7, border=FALSE,
     breaks = 33)

# Fitted normal with the sample mean and standard deviation:
s = sd(VTI$VTI.InterDay)
m = mean(VTI$VTI.InterDay)
curve(dnorm(x,mean=m,sd=s),col="grey60",lwd=2,add=TRUE,yaxt="n")
# Add density estimate, drawn last so it sits on top of the normal curve:
lines(density(VTI$VTI.InterDay,adjust=7),col="darkred",lwd=2) #Prettier, adjusted pdf estimate

But let’s quantify the lack of normality:

qqnorm(VTI$VTI.InterDay, pch=19, cex=.1, col=rgb(0,0,1,0.5), cex.main=.8, cex.lab=.7, cex.axis=.7)
qqline(VTI$VTI.InterDay, col=rgb(0,0,1,0.8))

shapiro.test(as.vector(VTI$VTI.InterDay))
## 
##  Shapiro-Wilk normality test
## 
## data:  as.vector(VTI$VTI.InterDay)
## W = 0.87681, p-value < 2.2e-16
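
To get a feel for how far the tails depart from normality, one can ask how many days below \(-4\%\) a normal distribution would predict. The sketch below is a back-of-the-envelope calculation, not part of the original analysis. It takes the median and quartiles from the `summary()` output above, assumes a normal distribution whose interquartile range matches the observed one, and uses a sample size of 3879 days, which is inferred from the count (36) and the percentage (0.928%) reported earlier.

```r
# Quartiles and median from the summary() output above.
q1  <- -0.41840
med <-  0.08152
q3  <-  0.61078

# Standard deviation of a normal distribution with the same
# interquartile range (an assumption of this sketch).
sigma_iqr <- (q3 - q1) / (qnorm(0.75) - qnorm(0.25))

# Probability of a daily change below -4% under that normal model.
p_normal <- pnorm(-4, mean = med, sd = sigma_iqr)

# Expected number of such days in the sample, versus the 36 observed.
n_days <- 3879  # assumption: inferred as 36 / 0.009280742
expected_days <- n_days * p_normal
print(c(sigma = sigma_iqr, expected = expected_days, observed = 36))
```

Under these assumptions a normal model predicts essentially no days this bad over the whole sample, while 36 were observed.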


NOTE: These are tentative notes on different topics for personal use - expect mistakes and misunderstandings.