Portfolio Theory: Why Diversification Is Mathematically Provable

Markowitz mean-variance optimization, the efficient frontier, correlation matrices, and the formal proof that diversification reduces risk without sacrificing expected return.

2025-08-23 · AI-Synthesized from Personal Notes

Quantitative Finance, Portfolio Theory, Markowitz, Diversification

Terminology

Term	Definition
Portfolio	A collection of assets (stocks, bonds, etc.) held together, each with a weight $w_i$ representing its fraction of total investment, where $\sum w_i = 1$
Expected Return	The weighted average of individual asset returns: $E[R_p] = \sum_{i=1}^{n} w_i \cdot E[R_i]$
Variance ($\sigma^2$)	A measure of how spread out returns are around the mean. Higher variance means more uncertainty
Standard Deviation ($\sigma$)	The square root of variance, used as the primary measure of risk in portfolio theory because it shares units with returns
Covariance ($\sigma_{ij}$)	A measure of how two assets move together. Positive covariance means they tend to rise and fall in tandem; negative means they move oppositely
Correlation ($\rho_{ij}$)	Normalized covariance: $\rho_{ij} = \sigma_{ij} / (\sigma_i \cdot \sigma_j)$, bounded between $-1$ and $+1$. Correlation of $+1$ means perfect co-movement, $-1$ means perfect opposition
Covariance Matrix ($\Sigma$)	An $n \times n$ symmetric matrix where entry $(i,j)$ is $\sigma_{ij}$. Encodes all pairwise risk relationships between $n$ assets
Efficient Frontier	The set of portfolios that offer the highest expected return for each level of risk. No portfolio below the frontier is rational to hold
Mean-Variance Optimization	The Markowitz framework for finding optimal portfolio weights by minimizing variance for a target return, or maximizing return for a target variance
Sharpe Ratio	Risk-adjusted return: $(E[R_p] - R_f) / \sigma_p$, where $R_f$ is the risk-free rate. Higher is better. The tangency portfolio maximizes this ratio

What & Why

"Don't put all your eggs in one basket" is ancient wisdom. Harry Markowitz proved it mathematically in 1952 and shared the 1990 Nobel Memorial Prize in Economic Sciences for it. The core insight: when you combine assets that do not move in perfect lockstep, the portfolio's risk is strictly less than the weighted average of individual risks. Diversification is not just a heuristic. It is a theorem.

This matters because investors face a fundamental trade-off between risk and return. Higher expected returns generally come with higher volatility. But Markowitz showed that by carefully choosing portfolio weights, you can reduce risk without reducing expected return, or increase expected return without increasing risk. The set of portfolios that achieve this optimal trade-off forms the efficient frontier.

For computer scientists, portfolio optimization is a constrained quadratic programming problem. The objective function is quadratic (portfolio variance involves products of weights), the constraints are linear (weights sum to 1, optional bounds on individual weights), and the solution space is continuous. This makes it a clean, well-studied optimization problem with direct connections to linear algebra, convex optimization, and numerical methods.

The inputs to the problem are:

Expected returns vector $\boldsymbol{\mu}$: an $n$-dimensional vector of predicted returns for each asset
Covariance matrix $\Sigma$: an $n \times n$ matrix encoding how assets co-move
Constraints: weights sum to 1, optional lower/upper bounds, optional sector limits

The output is a weight vector $\mathbf{w}$ that tells you how much to invest in each asset. The elegance of the framework is that all the complexity of diversification collapses into matrix algebra.

How It Works

Portfolio Return and Risk

For a portfolio of $n$ assets with weights $\mathbf{w} = [w_1, w_2, \ldots, w_n]$, the expected return is simply the weighted sum:

$E[R_p] = \mathbf{w}^T \boldsymbol{\mu} = \sum_{i=1}^{n} w_i \cdot E[R_i]$

Portfolio variance is where things get interesting. It is not just the weighted sum of individual variances. It includes every pairwise covariance:

$\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w} = \sum_{i=1}^{n} \sum_{j=1}^{n} w_i \cdot w_j \cdot \sigma_{ij}$

For two assets, this expands to:

$\sigma_p^2 = w_1^2 \sigma_1^2 + w_2^2 \sigma_2^2 + 2 w_1 w_2 \sigma_{12}$

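The cross-term $2 w_1 w_2 \sigma_{12}$ is the key to diversification. Since $\sigma_{12} = \rho_{12} \sigma_1 \sigma_2$, perfect correlation ($\rho_{12} = 1$) gives $\sigma_p^2 = (w_1 \sigma_1 + w_2 \sigma_2)^2$, so portfolio volatility equals the weighted average of the individual volatilities. When $\sigma_{12} < \sigma_1 \cdot \sigma_2$ (correlation less than 1) and both weights are positive, the cross-term is smaller and the portfolio volatility is strictly less than that weighted average.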
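
The two-asset formula is easy to check numerically. The sketch below is a minimal illustration; the function name and the input values are assumptions chosen for the example, not figures from any data set. It compares the portfolio volatility with the weighted average of the two individual volatilities.

```python
import math

def two_asset_variance(w1, sigma1, sigma2, rho):
    """Variance of a two-asset portfolio with weights w1 and 1 - w1."""
    w2 = 1.0 - w1
    cov12 = rho * sigma1 * sigma2  # sigma_12 = rho * sigma_1 * sigma_2
    return w1**2 * sigma1**2 + w2**2 * sigma2**2 + 2 * w1 * w2 * cov12

# Illustrative inputs (assumed for this example)
w1, sigma1, sigma2, rho = 0.6, 0.20, 0.30, 0.3

vol = math.sqrt(two_asset_variance(w1, sigma1, sigma2, rho))
weighted_avg_vol = w1 * sigma1 + (1 - w1) * sigma2

print(f"portfolio volatility:        {vol:.4f}")
print(f"weighted average volatility: {weighted_avg_vol:.4f}")
```

Setting `rho = 1.0` in the same call makes the two numbers coincide, which is the no-diversification boundary case.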

The Diversification Proof (Two-Asset Case)

Consider two assets with equal volatility $\sigma$ and correlation $\rho$. An equal-weight portfolio ($w_1 = w_2 = 0.5$) has variance:

$\sigma_p^2 = 0.25\sigma^2 + 0.25\sigma^2 + 2(0.25)\rho\sigma^2 = 0.5\sigma^2(1 + \rho)$

When $\rho = 1$ (perfect correlation): $\sigma_p^2 = \sigma^2$. No diversification benefit at all.

When $\rho = 0$ (uncorrelated): $\sigma_p^2 = 0.5\sigma^2$, so $\sigma_p = \sigma/\sqrt{2} \approx 0.707\sigma$. Volatility drops by about $29\%$.

When $\rho = -1$ (perfect negative correlation): $\sigma_p^2 = 0$. Risk vanishes entirely.

For any $\rho < 1$, the portfolio risk is strictly less than the individual asset risk. This is the mathematical proof that diversification works whenever assets are not perfectly correlated.
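
The three cases above can be reproduced in a few lines. This is a small sketch under the same assumptions as the derivation (two assets, equal volatility, equal weights); the function name and the value of `sigma` are illustrative.

```python
import math

def equal_weight_vol(sigma, rho):
    """Volatility of a 50/50 portfolio of two assets with common volatility sigma."""
    variance = 0.5 * sigma**2 * (1.0 + rho)
    return math.sqrt(variance)

sigma = 0.20  # assumed common volatility of both assets
for rho in (1.0, 0.5, 0.0, -0.5, -1.0):
    vol = equal_weight_vol(sigma, rho)
    reduction = 1.0 - vol / sigma  # fraction of single-asset volatility removed
    print(f"rho={rho:+.1f}  volatility={vol:.4f}  reduction={reduction:.1%}")
```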

The Covariance Matrix

For $n$ assets, all pairwise relationships are captured in the covariance matrix $\Sigma$:

$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2 \end{bmatrix}$

Key properties of $\Sigma$:

Symmetric: $\sigma_{ij} = \sigma_{ji}$, so $\Sigma = \Sigma^T$
Positive semi-definite: $\mathbf{w}^T \Sigma \mathbf{w} \geq 0$ for all $\mathbf{w}$ (variance cannot be negative)
Diagonal entries are individual variances $\sigma_i^2$
Off-diagonal entries are covariances $\sigma_{ij} = \rho_{ij} \sigma_i \sigma_j$

The matrix has $n(n+1)/2$ unique entries (due to symmetry). For a universe of 500 stocks, that is 125,250 parameters to estimate, which is one of the practical challenges of portfolio optimization.
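
As a small illustration of these properties, the sketch below builds $\Sigma$ from volatilities and a correlation matrix using $\sigma_{ij} = \rho_{ij} \sigma_i \sigma_j$, then counts the unique entries. The function names and the three-asset inputs are assumptions made for the example.

```python
def covariance_from_correlation(vols, corr):
    """Build the covariance matrix entry by entry: sigma_ij = rho_ij * sigma_i * sigma_j."""
    n = len(vols)
    return [[corr[i][j] * vols[i] * vols[j] for j in range(n)] for i in range(n)]

def unique_entries(n):
    """Number of distinct entries in a symmetric n x n matrix."""
    return n * (n + 1) // 2

# Illustrative three-asset inputs (assumed)
vols = [0.15, 0.20, 0.10]
corr = [
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]

cov = covariance_from_correlation(vols, corr)
for row in cov:
    print([round(x, 5) for x in row])

print("unique entries for n = 500:", unique_entries(500))
```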

The Efficient Frontier

The efficient frontier is the upper boundary of the set of all achievable portfolios in risk-return space. Every portfolio on the frontier is optimal: you cannot get higher return without accepting more risk, and you cannot reduce risk without giving up return.

The Markowitz optimization problem finds the frontier by solving, for each target return $\mu_{\text{target}}$:

$\min_{\mathbf{w}} \quad \mathbf{w}^T \Sigma \mathbf{w}$

$\text{subject to:} \quad \mathbf{w}^T \boldsymbol{\mu} = \mu_{\text{target}}, \quad \mathbf{w}^T \mathbf{1} = 1$

This is a quadratic program with linear equality constraints. The Lagrangian approach yields a closed-form solution. Introduce multipliers $\lambda$ and $\gamma$:

$\mathcal{L} = \mathbf{w}^T \Sigma \mathbf{w} - \lambda(\mathbf{w}^T \boldsymbol{\mu} - \mu_{\text{target}}) - \gamma(\mathbf{w}^T \mathbf{1} - 1)$

Taking the derivative and setting it to zero:

$\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 2\Sigma \mathbf{w} - \lambda \boldsymbol{\mu} - \gamma \mathbf{1} = 0$

$\mathbf{w}^* = \frac{1}{2}\Sigma^{-1}(\lambda \boldsymbol{\mu} + \gamma \mathbf{1})$

The multipliers $\lambda$ and $\gamma$ are determined by substituting back into the two constraints. This gives a system of two linear equations in two unknowns, solvable in closed form.
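
The substitution step can be written out explicitly. As a sketch, assuming $\Sigma$ is invertible and $\boldsymbol{\mu}$ is not a multiple of $\mathbf{1}$ (otherwise $D = 0$ below), define four scalars, named as in the pseudocode of the Implementation section:

$A = \mathbf{1}^T \Sigma^{-1} \boldsymbol{\mu}, \quad B = \boldsymbol{\mu}^T \Sigma^{-1} \boldsymbol{\mu}, \quad C = \mathbf{1}^T \Sigma^{-1} \mathbf{1}, \quad D = BC - A^2$

Substituting $\mathbf{w}^*$ into the return and budget constraints gives:

$\tfrac{1}{2}(\lambda B + \gamma A) = \mu_{\text{target}}, \quad \tfrac{1}{2}(\lambda A + \gamma C) = 1$

Solving this $2 \times 2$ system:

$\lambda = \frac{2(C \mu_{\text{target}} - A)}{D}, \quad \gamma = \frac{2(B - A \mu_{\text{target}})}{D}$

so the optimal weights and the resulting frontier variance are:

$\mathbf{w}^* = \frac{(C \mu_{\text{target}} - A)\, \Sigma^{-1} \boldsymbol{\mu} + (B - A \mu_{\text{target}})\, \Sigma^{-1} \mathbf{1}}{D}, \quad \sigma_p^2 = \frac{C \mu_{\text{target}}^2 - 2A \mu_{\text{target}} + B}{D}$

The variance is a quadratic in $\mu_{\text{target}}$, which is why the frontier traces a parabola in mean-variance space.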

The Capital Market Line and Sharpe Ratio

When a risk-free asset (like a Treasury bill) with return $R_f$ is available, investors can combine it with any risky portfolio. The line from $R_f$ through a portfolio $P$ in risk-return space has slope:

$\text{Sharpe Ratio} = \frac{E[R_p] - R_f}{\sigma_p}$

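The tangency portfolio is the risky portfolio that maximizes the Sharpe ratio. The line from $R_f$ through the tangency portfolio is the Capital Market Line (CML). Under the model's assumptions (investors care only about mean and variance and can lend or borrow at $R_f$), every rational investor holds some combination of the risk-free asset and the tangency portfolio. Only the split between the two depends on risk tolerance.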

The tangency portfolio weights are:

$\mathbf{w}_{\text{tangency}} = \frac{\Sigma^{-1}(\boldsymbol{\mu} - R_f \mathbf{1})}{\mathbf{1}^T \Sigma^{-1}(\boldsymbol{\mu} - R_f \mathbf{1})}$
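
A minimal numerical sketch of this formula follows. It solves $\Sigma \mathbf{z} = \boldsymbol{\mu} - R_f \mathbf{1}$ with a small Gaussian-elimination helper instead of forming $\Sigma^{-1}$ explicitly. The helper, the function names, and the two-asset inputs are all assumptions made for illustration. The formula only gives the maximum-Sharpe portfolio when the normalizing denominator is positive, which holds for these inputs.

```python
import math

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def tangency_weights(cov, mu, rf):
    """Sigma^{-1}(mu - rf*1), normalized so the weights sum to 1."""
    excess = [m_i - rf for m_i in mu]
    z = solve(cov, excess)  # z = Sigma^{-1} (mu - rf * 1)
    total = sum(z)          # 1^T Sigma^{-1} (mu - rf * 1)
    return [z_i / total for z_i in z]

def sharpe_ratio(w, cov, mu, rf):
    """(E[R_p] - R_f) / sigma_p for weight vector w."""
    n = len(w)
    ret = sum(w[i] * mu[i] for i in range(n))
    var = sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n))
    return (ret - rf) / math.sqrt(var)

# Illustrative inputs (assumed)
cov = [[0.04, 0.006], [0.006, 0.09]]
mu = [0.08, 0.12]
rf = 0.02

w_tan = tangency_weights(cov, mu, rf)
print("tangency weights:", [round(x, 4) for x in w_tan])
print("Sharpe ratio:", round(sharpe_ratio(w_tan, cov, mu, rf), 4))
```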

Diversification with $n$ Assets

For $n$ assets with equal weight ($w_i = 1/n$), equal variance $\sigma^2$, and equal pairwise correlation $\rho$, the portfolio variance is:

$\sigma_p^2 = \frac{\sigma^2}{n} + \frac{n-1}{n}\rho\sigma^2$

As $n \to \infty$:

$\lim_{n \to \infty} \sigma_p^2 = \rho \sigma^2$

Rearranging the finite-$n$ formula gives $\sigma_p^2 = \rho\sigma^2 + (1 - \rho)\sigma^2/n$. The $1/n$ term is the diversifiable (idiosyncratic) risk that vanishes as you add more assets. The constant term $\rho\sigma^2$ is the systematic (market) risk that cannot be diversified away. This is why even a portfolio of thousands of stocks still has nonzero volatility: the average correlation between stocks is positive.
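
The convergence toward $\rho\sigma^2$ can be tabulated directly from the formula. The sketch below uses assumed values for $\sigma$ and $\rho$ and prints how far the equal-weight portfolio variance sits above that limit as $n$ grows.

```python
def equal_weight_variance(n, sigma, rho):
    """Variance of an equal-weight portfolio of n assets with common sigma and rho."""
    return sigma**2 / n + (n - 1) / n * rho * sigma**2

sigma, rho = 0.30, 0.25   # assumed common volatility and pairwise correlation
floor = rho * sigma**2    # limiting variance as n grows without bound

for n in (1, 2, 10, 100, 1000):
    var = equal_weight_variance(n, sigma, rho)
    print(f"n={n:>5}  variance={var:.5f}  above limit by {var - floor:.5f}")
```

Most of the reduction arrives within the first few dozen assets; after that, each additional asset removes very little.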

Figure: the dashed line at the bottom represents systematic risk ($\rho\sigma^2$), the irreducible floor that diversification cannot eliminate.

Complexity Analysis

Operation	Time	Space	Notes
Portfolio return $\mathbf{w}^T\boldsymbol{\mu}$	$O(n)$	$O(1)$	Dot product of two $n$-vectors
Portfolio variance $\mathbf{w}^T\Sigma\mathbf{w}$	$O(n^2)$	$O(n^2)$	Matrix-vector multiply then dot product
Covariance matrix estimation	$O(n^2 T)$	$O(n^2)$	$T$ return observations for $n$ assets
Matrix inversion $\Sigma^{-1}$	$O(n^3)$	$O(n^2)$	Gaussian elimination or Cholesky decomposition
Efficient frontier (closed form)	$O(n^3)$	$O(n^2)$	Dominated by the single matrix inversion
Constrained optimization (QP solver)	$O(n^3)$ typical	$O(n^2)$	Interior-point methods for inequality constraints

The bottleneck is the $O(n^3)$ matrix inversion. For a universe of $n = 500$ stocks, this is roughly $1.25 \times 10^8$ operations, fast on modern hardware. For $n = 5000$, it becomes $1.25 \times 10^{11}$, requiring careful numerical methods. In practice, factor models reduce the effective dimensionality: instead of inverting a full $n \times n$ matrix, you work with a $k \times k$ factor covariance matrix where $k \ll n$ (typically $k = 5$ to $50$).

Estimation Error: The Curse of Dimensionality

The covariance matrix has $n(n+1)/2$ free parameters. With $T$ time periods of return data, reliable estimation requires $T \gg n$. When $T \leq n$, the sample covariance matrix is singular (not invertible), because subtracting the sample mean leaves it with rank at most $T - 1$. Even when $T > n$, estimation noise can dominate the true signal, leading to unstable and extreme portfolio weights.

This is why practitioners use shrinkage estimators (like the Ledoit-Wolf estimator) that blend the sample covariance matrix toward a structured target:

$\hat{\Sigma}_{\text{shrunk}} = \alpha \cdot \hat{\Sigma}_{\text{sample}} + (1 - \alpha) \cdot \hat{\Sigma}_{\text{target}}$

where $\alpha \in [0, 1]$ is chosen to minimize expected estimation error.
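
The blend itself is a one-line operation once a target is chosen. The sketch below assumes a scaled-identity target (the average sample variance on the diagonal, zeros elsewhere) and takes $\alpha$ as a fixed input. It does not estimate the optimal $\alpha$ the way the Ledoit-Wolf estimator does, and the function name is illustrative.

```python
def shrink_covariance(sample_cov, alpha):
    """Return alpha * sample_cov + (1 - alpha) * target.

    The target is an assumption of this sketch: a scaled identity matrix
    whose diagonal is the average sample variance.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(sample_cov)
    avg_var = sum(sample_cov[i][i] for i in range(n)) / n
    return [
        [
            alpha * sample_cov[i][j] + (1.0 - alpha) * (avg_var if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]

# A singular sample covariance (two perfectly correlated assets), assumed for illustration
sample = [[0.04, 0.04], [0.04, 0.04]]
shrunk = shrink_covariance(sample, alpha=0.5)

det_sample = sample[0][0] * sample[1][1] - sample[0][1] * sample[1][0]
det_shrunk = shrunk[0][0] * shrunk[1][1] - shrunk[0][1] * shrunk[1][0]
print("determinant before shrinkage:", det_sample)
print("determinant after shrinkage: ", det_shrunk)
```

Pulling the off-diagonal entries toward zero makes the singular matrix invertible, which is the practical reason shrinkage stabilizes the weights.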

Implementation

Portfolio Return and Variance

FUNCTION portfolioReturn(weights, expectedReturns):
    INPUT: weights (array of n floats), expectedReturns (array of n floats)
    OUTPUT: expected portfolio return (float)

    result ← 0
    FOR i ← 0 TO LENGTH(weights) - 1 DO
        result ← result + weights[i] * expectedReturns[i]
    END FOR
    RETURN result


FUNCTION portfolioVariance(weights, covMatrix):
    INPUT: weights (array of n floats), covMatrix (n x n matrix)
    OUTPUT: portfolio variance (float)

    n ← LENGTH(weights)
    variance ← 0

    FOR i ← 0 TO n - 1 DO
        FOR j ← 0 TO n - 1 DO
            variance ← variance + weights[i] * weights[j] * covMatrix[i][j]
        END FOR
    END FOR

    RETURN variance

Covariance Matrix from Returns

FUNCTION estimateCovarianceMatrix(returns):
    INPUT: returns (T x n matrix, T observations of n asset returns)
    OUTPUT: n x n covariance matrix

    T ← NUMBER_OF_ROWS(returns)
    n ← NUMBER_OF_COLUMNS(returns)

    // Step 1: compute mean returns
    means ← ARRAY of n zeros
    FOR i ← 0 TO n - 1 DO
        FOR t ← 0 TO T - 1 DO
            means[i] ← means[i] + returns[t][i]
        END FOR
        means[i] ← means[i] / T
    END FOR

    // Step 2: compute covariance entries
    cov ← n x n matrix of zeros
    FOR i ← 0 TO n - 1 DO
        FOR j ← i TO n - 1 DO
            sum ← 0
            FOR t ← 0 TO T - 1 DO
                sum ← sum + (returns[t][i] - means[i]) * (returns[t][j] - means[j])
            END FOR
            cov[i][j] ← sum / (T - 1)
            cov[j][i] ← cov[i][j]    // symmetric
        END FOR
    END FOR

    RETURN cov

Minimum Variance Portfolio (Closed Form)

FUNCTION minimumVarianceWeights(covMatrix):
    INPUT: covMatrix (n x n covariance matrix)
    OUTPUT: weight vector for the minimum variance portfolio

    n ← SIZE(covMatrix)
    ones ← ARRAY of n ones

    covInverse ← INVERT(covMatrix)

    // w* = (Sigma^-1 * 1) / (1^T * Sigma^-1 * 1)
    numerator ← MATRIX_VECTOR_MULTIPLY(covInverse, ones)
    denominator ← DOT_PRODUCT(ones, numerator)

    weights ← ARRAY of n zeros
    FOR i ← 0 TO n - 1 DO
        weights[i] ← numerator[i] / denominator
    END FOR

    RETURN weights
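
A runnable counterpart to the pseudocode above, as a sketch: it solves $\Sigma \mathbf{z} = \mathbf{1}$ by Gaussian elimination instead of calling a generic INVERT, then normalizes. The `solve` helper and the sample inputs are assumptions for illustration. The result is compared with the two-asset closed form $w_1 = (\sigma_2^2 - \sigma_{12}) / (\sigma_1^2 + \sigma_2^2 - 2\sigma_{12})$.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def minimum_variance_weights(cov):
    """w* = Sigma^{-1} 1 / (1^T Sigma^{-1} 1)."""
    ones = [1.0] * len(cov)
    z = solve(cov, ones)  # z = Sigma^{-1} 1
    total = sum(z)        # 1^T Sigma^{-1} 1
    return [z_i / total for z_i in z]

# Illustrative two-asset covariance matrix (assumed)
cov = [[0.04, 0.006], [0.006, 0.09]]
w = minimum_variance_weights(cov)

# Two-asset closed form for comparison
w1_closed = (0.09 - 0.006) / (0.04 + 0.09 - 2 * 0.006)
print("solver weights:  ", [round(x, 6) for x in w])
print("closed-form w1:  ", round(w1_closed, 6))
```

Solving the linear system directly avoids forming the inverse, which is generally the more numerically stable choice.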

Efficient Frontier via Lagrangian

FUNCTION efficientFrontierPoint(covMatrix, expectedReturns, targetReturn):
    INPUT: covMatrix (n x n), expectedReturns (n-vector), targetReturn (float)
    OUTPUT: optimal weight vector for the given target return

    n ← LENGTH(expectedReturns)
    covInverse ← INVERT(covMatrix)
    ones ← ARRAY of n ones

    // Precompute scalars
    A ← DOT_PRODUCT(ones, MATRIX_VECTOR_MULTIPLY(covInverse, expectedReturns))
    B ← DOT_PRODUCT(expectedReturns, MATRIX_VECTOR_MULTIPLY(covInverse, expectedReturns))
    C ← DOT_PRODUCT(ones, MATRIX_VECTOR_MULTIPLY(covInverse, ones))
    D ← B * C - A * A

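    // Lagrange multipliers, scaled so that w* = lambda * Sigma^-1 * mu + gamma * Sigma^-1 * 1
    // (the factor 1/2 in the derivation's w* is absorbed into lambda and gamma)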
    lambda ← (C * targetReturn - A) / D
    gamma ← (B - A * targetReturn) / D

    // Optimal weights
    w1 ← MATRIX_VECTOR_MULTIPLY(covInverse, expectedReturns)
    w2 ← MATRIX_VECTOR_MULTIPLY(covInverse, ones)

    weights ← ARRAY of n zeros
    FOR i ← 0 TO n - 1 DO
        weights[i] ← lambda * w1[i] + gamma * w2[i]
    END FOR

    RETURN weights
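
The same computation as a runnable sketch, with a check that the resulting weights satisfy both constraints. The `solve` helper, the function names, and the three-asset inputs are assumptions for illustration. As in the pseudocode, the multipliers are scaled so that no factor of 1/2 appears.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def portfolio_variance(w, cov):
    n = len(w)
    return sum(w[i] * w[j] * cov[i][j] for i in range(n) for j in range(n))

def frontier_weights(cov, mu, target):
    """Minimum-variance weights with expected return `target` and weights summing to 1."""
    ones = [1.0] * len(mu)
    z_mu = solve(cov, mu)     # Sigma^{-1} mu
    z_one = solve(cov, ones)  # Sigma^{-1} 1
    A, B, C = dot(ones, z_mu), dot(mu, z_mu), dot(ones, z_one)
    D = B * C - A * A
    if abs(D) < 1e-14:
        raise ValueError("expected returns are all equal; no unique frontier point")
    lam = (C * target - A) / D
    gam = (B - A * target) / D
    return [lam * a + gam * b for a, b in zip(z_mu, z_one)]

# Illustrative three-asset inputs (assumed)
cov = [
    [0.04, 0.006, 0.004],
    [0.006, 0.09, 0.012],
    [0.004, 0.012, 0.0625],
]
mu = [0.08, 0.12, 0.10]

w = frontier_weights(cov, mu, target=0.10)
print("weights:        ", [round(x, 4) for x in w])
print("sum of weights: ", round(sum(w), 10))
print("expected return:", round(dot(w, mu), 10))
print("variance:       ", round(portfolio_variance(w, cov), 6))
```

Short positions are allowed here, since the closed form imposes no bounds on individual weights. Adding bounds turns the problem into one for a QP solver.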

Tangency Portfolio (Maximum Sharpe Ratio)

FUNCTION tangencyWeights(covMatrix, expectedReturns, riskFreeRate):
    INPUT: covMatrix (n x n), expectedReturns (n-vector), riskFreeRate (float)
    OUTPUT: weight vector for the tangency portfolio

    n ← LENGTH(expectedReturns)
    ones ← ARRAY of n ones
    covInverse ← INVERT(covMatrix)

    // Excess returns vector
    excessReturns ← ARRAY of n zeros
    FOR i ← 0 TO n - 1 DO
        excessReturns[i] ← expectedReturns[i] - riskFreeRate
    END FOR

    // w = Sigma^-1 * (mu - Rf*1) / (1^T * Sigma^-1 * (mu - Rf*1))
    numerator ← MATRIX_VECTOR_MULTIPLY(covInverse, excessReturns)
    denominator ← DOT_PRODUCT(ones, numerator)

    weights ← ARRAY of n zeros
    FOR i ← 0 TO n - 1 DO
        weights[i] ← numerator[i] / denominator
    END FOR

    RETURN weights

Real-World Applications

Index fund construction: funds tracking the S&P 500 use market-cap weighting, while smart-beta funds use minimum-variance or risk-parity approaches built on the same covariance-based framework
Pension fund management: pension funds allocate across stocks, bonds, real estate, and alternatives using mean-variance optimization with liability constraints to ensure they can meet future obligations
Robo-advisors: automated investment platforms (Wealthfront, Betterment) use portfolio theory to construct diversified portfolios tailored to each user's risk tolerance, rebalancing automatically
Risk management: banks compute Value-at-Risk (VaR) for their trading books using portfolio variance. The covariance matrix is the central input to these calculations
Cryptocurrency portfolios: despite higher volatility and shorter history, the same framework applies. Crypto assets have varying correlations with each other and with traditional assets, making diversification analysis valuable
Machine learning model ensembles: combining multiple ML models is analogous to portfolio construction. Each model is an "asset" with expected accuracy and covariance with other models. Ensemble methods implicitly solve a diversification problem
Cloud infrastructure allocation: distributing workloads across multiple cloud providers or regions reduces the "risk" of outages, following the same mathematical principle as financial diversification

Key Takeaways

Diversification is a mathematical theorem, not just folk wisdom: when asset correlation $\rho < 1$, combining them reduces portfolio risk below the weighted average of individual risks
Portfolio variance $\sigma_p^2 = \mathbf{w}^T \Sigma \mathbf{w}$ depends on the full covariance matrix, not just individual variances. The off-diagonal terms (covariances) are what make diversification work
The efficient frontier is the set of portfolios offering maximum return for each risk level. It is found by solving a quadratic program with linear constraints
Adding more assets eliminates idiosyncratic risk ($\sigma^2/n \to 0$) but not systematic risk ($\rho\sigma^2$), which is the irreducible floor
The computational bottleneck is $O(n^3)$ matrix inversion. Factor models and shrinkage estimators are practical necessities for large asset universes
