16.1.3 Multiple Linear Least Squares Minimization#

In 16.1.1 Linear Least Squares Minimization, we considered the linear functional relation between two measurable variables, \(x\) and \(y\):

\[ y = a_0 + a_1 x \]

where \(a_0\) and \(a_1\) are unknown parameters to be determined.

On this page we will look at the more general case, solving the problem for an arbitrary number of independent variables and parameters.

Two Independent Variables#

Let’s start by solving this problem for three measurable variables: \(y\), \(x_1\) and \(x_2\), in the linear functional relation:

\[ y = a_0 + a_1 x_1 + a_2 x_2 \]

where \(a_0\), \(a_1\) and \(a_2\) are unknown coefficients.

Consider a data set of measured \((x_{1,~i}, x_{2,~i}, y_i)\) triples for \(i = 1, 2, 3, \dots , N\). If we attribute the dispersion of this data from the functional relation to error in the \(y_i\) terms, \(\epsilon_i\), then we can relate the data points with:

\[\begin{align*} y_i + \epsilon_i &= a_0 + a_1 x_{1,~i} + a_2 x_{2,~ i}\\ \therefore \epsilon_i &= a_0 + a_1 x_{1,~ i} + a_2 x_{2,~ i} - y_i\\ \end{align*}\]

The sum of errors squared is given by:

\[\begin{align*} S &= \sum_{i = 1}^N \epsilon_i^2 \\ &= \sum_{i=1}^N (a_0 + a_1 x_{1,~i} + a_2 x_{2,~i} - y_i)^2\\ \end{align*}\]

We want to minimize \(S\) with respect to each of the constants, \(a_0\), \(a_1\) and \(a_2\):

\[ \frac{\partial S}{\partial a_0} = 2 \sum_{i=1}^N (a_0 + a_1 x_{1, ~i} + a_2 x_{2, ~i} - y_i) = 0 \]

,

\[ \frac{\partial S}{\partial a_1} = 2 \sum_{i=1}^N (a_0 + a_1 x_{1,~i} + a_2 x_{2,~i} - y_i)x_{1,~i} = 0 \]

and

\[ \frac{\partial S}{\partial a_2} = 2 \sum_{i=1}^N (a_0 + a_1 x_{1,~i} + a_2 x_{2,~i} - y_i)x_{2,~i} = 0 \]

Re-arranging the above equations and using our statistical notation yields:

\[ a_0 + a_1 \langle x_1 \rangle + a_2 \langle x_2 \rangle = \langle y \rangle \]

,

\[\begin{split} a_0\langle x_1\rangle + a_1\langle x_1^2\rangle + a_2\langle x_1 x_2 \rangle = \langle x_1 y \rangle\\ \end{split}\]

and

\[ a_0\langle x_2\rangle + a_1\langle x_1 x_2\rangle + a_2\langle x_2^2\rangle = \langle x_2 y\rangle \]

This time the algebraic manipulation is a lot more work; instead, we shall use a matrix equation (which will serve us better in the more general case to come). The matrix equation representation is:

\[\begin{split} \begin{pmatrix} 1 &\langle x_1\rangle &\langle{x_2}\rangle\\ \langle{x_1}\rangle &\langle{x_1^2}\rangle &\langle{x_1x_2}\rangle\\ \langle{x_2}\rangle &\langle{x_1x_2}\rangle &\langle{x_2^2}\rangle\\ \end{pmatrix} \begin{pmatrix} a_0\\ a_1\\ a_2\\ \end{pmatrix} = \begin{pmatrix} \langle{y}\rangle\\ \langle{x_1 y}\rangle\\ \langle{x_2 y}\rangle \end{pmatrix} \end{split}\]

This can easily be solved numerically using:

(14)#\[\begin{split} \begin{align} \boldsymbol{X}\boldsymbol{A} &= \boldsymbol{Y}\\ \therefore \boldsymbol{A} &= \boldsymbol{X}^{-1} \boldsymbol{Y}\\ \end{align} \end{split}\]
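For example, a minimal sketch (the matrix entries below are placeholders, not values from any real data set) of solving such a \(3 \times 3\) system with NumPy; np.linalg.solve() solves the system directly and is generally preferred over explicit inversion for numerical stability:

import numpy as np

#Placeholder matrix of expectation values (symmetric) and right-hand side
X = np.array([[1.0, 0.8, 0.5],
              [0.8, 0.9, 0.4],
              [0.5, 0.4, 0.6]])
Y = np.array([1.2, 1.1, 0.7])

#Solve X A = Y (equivalent to A = X^-1 Y)
A = np.linalg.solve(X, Y)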

Worked Example - Cepheid Variables

You now have all you need to find the unknown coefficients for the full functional relation of the magnitude (\(M\)), period (\(P\)) and color (\(B-V\)) of the Cepheid variables:

\[ M = a_0 + a_1 \log P + a_2 (B - V) \]

using the same data file as before. (You should find the values \(a_0 = -2.15\) mag, \(a_1 = -3.12\) mag and \(a_2 = 1.49\).) You may wish to use NumPy matrices, which have the inverse property I (\(\boldsymbol{X}^{-1}\) is achieved using X.I, for the appropriately defined X matrix), and can be multiplied directly.

Try this yourself before reading the solution that follows.

Solution:

As in the 16.1.1 Linear Least Squares Minimization worked example, we shall read in the data using numpy.loadtxt():

import numpy as np
import matplotlib.pyplot as plt

logP, M, color = np.loadtxt('./data/cepheid_data.csv', delimiter = ',', skiprows = 1, unpack = True)

Here we have:

\[\begin{align*} y &= M\\ x_1 &= \log P\\ x_2 &= (B - V)\\ \end{align*}\]

We will now construct the matrices from the data, starting with the simpler \(\boldsymbol{Y}\) matrix:

#Defining Y as a column matrix
Y = np.matrix([
    [np.mean(M)],
    [np.mean(logP * M)],
    [np.mean(color * M)]
])

Note that \(\boldsymbol{X}\) is symmetric about the diagonal, i.e. \(\boldsymbol{X}_{k~l} = \boldsymbol{X}_{l~k}\); we will make use of this to reduce the number of calculations we need to perform.

X = np.matrix(np.ones( (3, 3) )) #Using an array generating function

# X[0, 0] is just 1, so we leave it

#The first row
X[0, 1] = np.mean(logP)
X[0, 2] = np.mean(color)

#The first column (which is the transpose of the first row)
X[1, 0] = X[0, 1]
X[2, 0] = X[0, 2]

#The diagonal
X[1, 1] = np.mean(logP * logP)
X[2, 2] = np.mean(color * color)

#The off-diagonal elements
X[1, 2] = np.mean(logP * color)
X[2, 1] = X[1, 2]

Now we calculate the \(\boldsymbol{A}\) matrix:

A = X.I * Y

#Printing the constants

for i in range(3):
    print(f'a{i} =', '{:.3}'.format(A[i,0]) )
a0 = -2.15
a1 = -3.12
a2 = 1.49

Arbitrarily Many Independent Variables#

Consider a linear functional relation between measurable variables \(x_1\), \(x_2\), \(x_3\), \(\dots\), \(x_m\) and \(y\):

\[\begin{split} \begin{align*} y &= a_0 + a_1 x_1 + a_2 x_2 + \dots + a_m x_m\\ &= a_0 + \sum_{j = 1}^m a_j x_j\\ \end{align*} \end{split}\]

where \(a_0\), \(a_1\), \(\dots\) and \(a_m\) are unknown constants.

Suppose we have a data set of measured \((x_{1,~i}, x_{2,~i}, \dots, x_{m,~i}, y_i)\) values for \(i = 1, 2, 3, \dots, N\). As before, we assume that the dispersion in our data from the functional relation is due to error in the \(y_i\) data points only. Therefore we can write the relation between our data points as:

\[ y_i + \epsilon_i = a_0 + \sum_{j = 1}^m a_j x_{j, ~i} \]

The sum of errors squared can thus be written as:

\[ S = \sum_{i=1}^N \bigg(a_0 + \bigg(\sum_{j=1}^m a_j x_{j,~i} \bigg) - y_i\bigg)^2 \]

We want to find the values of \(a_0\), \(a_1\), \(\dots\) and \(a_m\) which give us the minimum value of \(S\). Minimizing \(S\) with respect to \(a_0\) gives us:

\[ \frac{\partial S}{\partial a_0} = 2 \sum_{i=1}^N \bigg(a_0 + \bigg(\sum_{j=1}^m a_j x_{j,~i} \bigg) - y_i\bigg) = 0 \]

Distributing the sum over \(i\) amongst the terms:

\[ \therefore N a_0 + \bigg(\sum_{j=1}^m a_j \sum_{i=1}^N x_{j,~i} \bigg) - \sum_{i=1}^N y_i = 0 \]

Dividing by \(N\):

\[ \therefore a_0 + \bigg(\sum_{j=1}^m a_j \frac{1}{N}\sum_{i=1}^N x_{j,~i} \bigg) - \frac{1}{N}\sum_{i=1}^N y_i = 0 \]

Using our stats notation:

\[ \therefore a_0 + \sum_{j=1}^m a_j \langle{x_{j}}\rangle = \langle{y}\rangle \]

Now, let’s minimize \(S\) with respect to one of the \(a_k\) for \(k = 1, 2, \dots, m\), following a similar line of algebraic manipulation as above:

\[\begin{split} \begin{align*} \frac{\partial S}{\partial a_k} = \sum_{i=1}^N 2 x_{k,~i} \bigg(a_0 + \bigg(\sum_{j=1}^m a_j x_{j,~i} \bigg) - y_i\bigg) &= 0\\ \therefore a_0 \sum_{i=1}^N x_{k, ~i} + \sum_{j=1}^m a_j \bigg(\sum_{i=1}^N x_{k,~i} x_{j,~i} \bigg) - \sum_{i=1}^N x_{k,~i} y_i &= 0\\ \therefore a_0 \langle{x_{k}}\rangle + \sum_{j=1}^m a_j \langle{x_k x_j}\rangle &= \langle{x_k y}\rangle\\ \end{align*} \end{split}\]

Writing the results for \(a_0\) and \(a_k\) (\(k = 1,\dots,m\)) into a system of equations, expanding the sum over \(j\):

\[\begin{split} \begin{matrix} a_0 &+& a_1 \langle{x_1}\rangle &+& a_2 \langle{x_2}\rangle &+& \dots &+& a_m \langle{x_m}\rangle &=& \langle{y}\rangle\\ a_0 \langle{x_{1}}\rangle &+& a_1 \langle{x_1~^2}\rangle &+& a_2 \langle{x_1 x_2}\rangle &+& \cdots &+& a_m \langle{x_1 x_m}\rangle &=& \langle{x_1 y}\rangle\\ a_0 \langle{x_{2}}\rangle &+& a_1 \langle{x_2 x_1}\rangle &+& a_2 \langle{x_2~^2}\rangle &+& \cdots &+& a_m \langle{x_2 x_m}\rangle &=& \langle{x_2 y}\rangle\\ a_0 \langle{x_{3}}\rangle &+& a_1 \langle{x_3 x_1}\rangle &+& a_2 \langle{x_3 x_2}\rangle &+& \cdots &+& a_m \langle{x_3 x_m}\rangle &=& \langle{x_3 y}\rangle\\ \vdots ~~~~ &+& ~~~~\vdots &+& ~~~~\vdots &+& \ddots &+& ~~~~\vdots &=& ~~~\vdots\\ a_0 \langle{x_{m}}\rangle &+& a_1 \langle{x_m x_1}\rangle &+& a_2 \langle{x_m x_2}\rangle &+& \cdots &+& a_m \langle{x_m~^2}\rangle &=& \langle{x_m y}\rangle\\ \end{matrix} \end{split}\]

To solve these equations numerically, we can reformulate these equations into a matrix equation:

\[\begin{split} \begin{pmatrix} 1 & \langle{x_1}\rangle & \langle{x_2}\rangle & \cdots & \langle{x_m}\rangle\\ \langle{x_1}\rangle & \langle{x_1^{~~2}}\rangle & \langle{x_1 x_2}\rangle & \cdots & \langle{x_1 x_m}\rangle\\ \langle{x_2}\rangle & \langle{x_2 x_1}\rangle & \langle{x_2^{~~2}}\rangle & \cdots & \langle{x_2 x_m}\rangle\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ \langle{x_m}\rangle & \langle{x_m x_1}\rangle & \langle{x_m x_2}\rangle & \cdots & \langle{x_m^{~~2}}\rangle\\ \end{pmatrix} \begin{pmatrix} a_0\\ a_1\\ a_2\\ \vdots\\ a_m\\ \end{pmatrix} = \begin{pmatrix} \langle{y}\rangle\\ \langle{x_1 y}\rangle\\ \langle{x_2 y}\rangle\\ \vdots\\ \langle{x_m y}\rangle\\ \end{pmatrix} \end{split}\]

Notice that the left-most matrix is symmetric about the diagonal; this can come in handy when computing the matrix elements. As before, this equation can be solved for the \(a_j\) by inverting the left-most matrix, i.e.

\[\begin{split} \begin{align*} \boldsymbol{X} \boldsymbol{A} &= \boldsymbol{Y}\\ \therefore \boldsymbol{A} &= \boldsymbol{X}^{-1} \boldsymbol{Y}\\ \end{align*} \end{split}\]

Python Implementation#

Let’s work on a Python implementation of this solution. You may want to try it yourself before reading further. In order to verify our implementation we will use the Cepheid data we’ve used so far, though we shall design it to support any number of \(x_j\) variables.

We will start by designing a function that takes the data in as two arguments:

\[ \texttt{y\_data} = [y_{1}, y_{2}, y_{3}, \dots, y_{N}] \]
\[\begin{split} \begin{matrix} \texttt{x\_data} = [ & [ & x_{1,~1}, & x_{1,~2}, & \cdots, & x_{1,~N} &], \\ & [ & x_{2,~1}, & x_{2,~2}, & \cdots, & x_{2,~N} &],\\ & [ & x_{3,~1}, & x_{3,~2}, & \cdots, & x_{3,~N} &],\\ & [ & \vdots~~, & \vdots~~, & \ddots, & \vdots &],\\ & [ & x_{m,~1}, & x_{m,~2}, & \cdots, & x_{m,~N} &]]\\ \end{matrix} \end{split}\]

as NumPy arrays.

Calculating Expectation Values#

Note that for each of the sums along the data sets (\(\sum_{i = 1}^N\)), we will be summing along the rows. For example, for the quantity:

\[ \langle x_1 \rangle = \frac{1}{N} \sum_{i = 1}^N x_{1,~i} \]

we can use:

np.mean(x_data[0, :])

and for

\[ \langle x_1 x_2 \rangle = \frac{1}{N} \sum_{i = 1}^N x_{1,~ i} x_{2,~ i} \]

we can use

np.mean(x_data[0, :] * x_data[1, :])

where we have made use of NumPy’s vectorized array operations.
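As a quick sanity check (using made-up toy arrays, purely for illustration), the vectorized form agrees with an explicit loop over the data points:

import numpy as np

#Toy data: 2 variables, 4 data points (illustrative only)
x_data = np.array([[1.0, 2.0, 3.0, 4.0],
                   [0.5, 0.1, 0.4, 0.2]])

#Vectorized expectation value <x1 x2>
vectorized = np.mean(x_data[0, :] * x_data[1, :])

#The same quantity via an explicit loop
total = 0
for i in range(x_data.shape[1]):
    total += x_data[0, i] * x_data[1, i]
looped = total / x_data.shape[1]

print(vectorized, looped)  #Both print 0.675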

Constructing the \(\boldsymbol{X}\) Matrix#

Now, let’s break down the structure of the matrix:

\[\begin{split} \boldsymbol{X} = \begin{pmatrix} 1 & \langle{x_1}\rangle & \langle{x_2}\rangle & \langle{x_3}\rangle & \cdots & \langle{x_m}\rangle\\ \langle{x_1}\rangle & \langle{x_1^{~~2}}\rangle & \langle{x_1 x_2}\rangle & \langle{x_1 x_3}\rangle & \cdots & \langle{x_1 x_m}\rangle\\ \langle{x_2}\rangle & \langle{x_2 x_1}\rangle & \langle{x_2^{~~2}}\rangle & \langle{x_2 x_3}\rangle & \cdots & \langle{x_2 x_m}\rangle\\ \langle{x_3}\rangle & \langle{x_3 x_1}\rangle & \langle{x_3 x_2}\rangle & \langle{x_3^{~2}}\rangle & \cdots & \langle{x_3 x_m}\rangle\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\ \langle{x_m}\rangle & \langle{x_m x_1}\rangle & \langle{x_m x_2}\rangle & \langle{x_m x_3}\rangle & \cdots & \langle{x_m^{~~2}}\rangle\\ \end{pmatrix} \end{split}\]

The dimensions of this matrix are \((m+1)\times(m+1)\); we can get the value of \(m\) from the shape of the x_data array:

m = x_data.shape[0]

from this we can create a matrix of ones, which we will populate later:

X = np.matrix(np.ones((m+1, m+1)))

Now, as we have noted before, \(\boldsymbol{X}\) is a symmetric matrix. That is, for row \(k\) and column \(l\), \(\boldsymbol{X}_{k l} = \boldsymbol{X}_{l k}\). We only need to construct one of the triangles of the matrix; the other is obtained for free.

Let’s work with the upper triangle of the matrix. Here there are 3 regions with distinguishable structures:

  1. The first row

  2. The diagonal

  3. The remaining triangle

The first element of the matrix is just one. The remainder of the first row is simply the expectation value of each of the \(x_j\):

\[ \boldsymbol{X}_{0 ~0} = 1 \]

and

\[ \boldsymbol{X}_{0~ l} = \langle{x_l}\rangle ~~~~ \text{where}~ l = 1, 2, \dots, m \]

Note that here we are indexing \(\boldsymbol{X}\) from 0 to better translate it to code (keep in mind that the x_data array also starts with a 0 index, so the \(x_l\) data is in row \(l - 1\)):

for l in range(m):
    X[0, l + 1] = np.mean(x_data[l, :])
    
    # Setting the values for the first column
    # remember that X[k, l] = X[l, k]
    X[l + 1, 0] = X[0, l + 1]

Now, consider the triangle off the diagonal, that is, the region:

\[\begin{split} \begin{pmatrix} - & - & - & - & \cdots & - \\ - & - & \langle{x_1 x_2}\rangle & \langle{x_1 x_3}\rangle & \cdots & \langle{x_1 x_m}\rangle\\ - & - & - & \langle{x_2 x_3}\rangle & \cdots & \langle{x_2 x_m}\rangle\\ - & - & - & - & \cdots & \langle{x_3 x_m}\rangle\\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ - & - & - & - & \cdots & -\\ \end{pmatrix} \end{split}\]

This region exhibits the pattern:

\[ \boldsymbol{X}_{k l} = \langle{x_k x_l}\rangle ~~~~ \text{where}~ 1 \leq k \leq m ~~\text{and}~~ l > k \]

The diagonal has a fairly simple pattern, starting from (row, column) \((1,1)\):

\[ \boldsymbol{X}_{k k} = \langle{x_k~^2}\rangle ~~~~ \text{where}~ 1 \leq k \leq m \]

Note, however, that this is a special case of the rules for constructing region 3. We can therefore combine regions 2 and 3 with the rule:

\[ \boldsymbol{X}_{k l} = \langle{x_k x_l}\rangle ~~~~\text{where}~ 1 \leq k \leq m ~~\text{and}~~ l \geq k \]

In the code this becomes:

# Inner matrix

for k in range(m):
    for l in range(k, m):
        #Calculating value for the upper triangle
        X[k + 1, l + 1] = np.mean( x_data[k, :] * x_data[l, :] )
        
        #Setting the same value for the lower triangle
        X[l + 1, k + 1] = X[k + 1, l + 1]

That covers the \(\boldsymbol{X}\) matrix.

Constructing the \(\boldsymbol{Y}\) Matrix#

Now let’s construct the matrix:

\[\begin{split} \boldsymbol{Y} = \begin{pmatrix} \langle{y}\rangle\\ \langle{x_1 y}\rangle\\ \langle{x_2 y}\rangle\\ \vdots\\ \langle{x_m y}\rangle\\ \end{pmatrix} \end{split}\]

This is fairly straightforward, with

\[ \boldsymbol{Y}_{0~ 0} = \langle{y}\rangle \]

and

\[ \boldsymbol{Y}_{k~ 0} = \langle{x_k y}\rangle ~~~~\text{where}~ k = 1, \dots, m \]
#Creating the Y column matrix:
Y = np.matrix( np.zeros( (m + 1, 1) ) )

#First entry
Y[0, 0] = np.mean(y_data)

#The remainder of the entries
for k in range(m):
    Y[k + 1, 0] = np.mean( x_data[k, :] * y_data )

Finding Matrix \(\boldsymbol{A}\) (Or Solving For the \(a_j\))#

Lastly, to solve for our \(a_j\) values, we consider the matrix:

\[\begin{split} \boldsymbol{A}= \begin{pmatrix} a_0 \\ a_1\\ a_2\\ \vdots\\ a_m\\ \end{pmatrix} \end{split}\]

This fits into the matrix equation

\[ \boldsymbol{X}\boldsymbol{A} = \boldsymbol{Y} \]

where we’ve already constructed \(\boldsymbol{X}\) and \(\boldsymbol{Y}\). All that’s left is to solve the equation by inverting \(\boldsymbol{X}\):

(15)#\[ \boldsymbol{A} = \boldsymbol{X}^{-1} \boldsymbol{Y} \]

To achieve this numerically, we simply take the inverse of X, X.I:

#Finding A:

A = X.I*Y

This A matrix is a column matrix. As a bonus, if we want to turn it into a one-dimensional array, we can use the flatten() method:

np.array(A).flatten()

Putting It All Together#

Putting this all together into a function:

import numpy as np


def least_squares(y_data, x_data):
    '''Least squares minimization to find the constants of a linear fit'''

    m = x_data.shape[0]

    X = np.matrix(np.ones((m+1, m+1)))

    #First row
    for l in range(m):
        X[0, l + 1] = np.mean(x_data[l, :])

        # Setting the values for the first column
        # remember that X[k, l] = X[l, k]
        X[l + 1, 0] = X[0, l + 1]


    # Upper triangle
    for k in range(m):
        for l in range(k, m):
            X[k + 1, l + 1] = np.mean( x_data[k, :] * x_data[l, :] )

            #Setting the value for the lower triangle
            X[l + 1, k + 1] = X[k + 1, l + 1]


    #Creating the Y column matrix:
    Y = np.matrix( np.zeros( (m + 1, 1) ) )

    #First entry
    Y[0, 0] = np.mean(y_data)

    #The remainder of the entries
    for k in range(m):
        Y[k + 1, 0] = np.mean( x_data[k, :] * y_data )
    
    #Finding A:

    A = X.I*Y
    
    return np.array(A).flatten()

Note that the function above could do with some optimisation. For one, some of the loops can be combined (each loop adds to the runtime of the code). Alternatively, all of the loops can be replaced with clever use of NumPy functions and arrays. Both are left as an exercise for the reader.
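As a starting point for that exercise, here is one possible loop-free sketch (our own construction, not part of the original code): stacking a row of ones on top of x_data lets a single matrix product compute every expectation value at once.

def least_squares_vectorized(y_data, x_data):
    '''Sketch of a loop-free least squares fit (same data layout as least_squares() above)'''

    N = y_data.size

    #Augmented data matrix: a row of ones followed by the x_j rows, shape (m+1, N)
    x_aug = np.vstack([np.ones((1, N)), x_data])

    #X[k, l] = <x_k x_l> and Y[k] = <x_k y>, each in a single step
    X = (x_aug @ x_aug.T) / N
    Y = (x_aug @ y_data) / N

    #Solve X A = Y without explicit inversion
    return np.linalg.solve(X, Y)

Applied to the Cepheid data, least_squares_vectorized(M, np.array([logP, color])) should reproduce the coefficients found above.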

Worked Example - Applying the Solution to the Cepheid Variables Data

Now, let’s apply this function to the Cepheid Variables data to find the unknown coefficients for the functional relation of the magnitude (\(M\)), period (\(P\)) and color (\(B-V\)):

\[ M = a_0 + a_1 \log P + a_2 (B - V) \]

We’ll unpack the data as in the 16.1.1 Linear Least Squares Minimization example, and then pack it into the structure that is required by the function.

#Reading the data
logP, M, color = np.loadtxt('./data/cepheid_data.csv', delimiter = ',', skiprows = 1, unpack = True)

a_arr = least_squares(M, np.array([logP, color]))

for i, a in enumerate(a_arr):
    print(f'a_{i} =', '{:.3}'.format(a))
a_0 = -2.15
a_1 = -3.12
a_2 = 1.49

This matches the results from the worked example above.

Determining Uncertainties#

Now, let’s consider how we can determine the uncertainty of the \(y\) variable and the \(a_j\) constants when we have arbitrarily many \(x_j\) variables. As before, the uncertainty for \(y\) can be determined as in 16.1.1 Linear Least Squares Minimization (assuming that each \(y_i\) has uniform uncertainty and is normally distributed):

\[ u(y)^2 = \sigma^2 = \sum_{i=1}^{N} \frac{\left(a_0 + a_1 x_{1,~i} + \cdots + a_m x_{m,~i} - y_i\right)^2}{N - (m + 1)} \]

Now, let’s find the uncertainties of the \(a_j\) constants. Using Equation (15) as the function defining the \(a_j\) constants given the \(x_{j,~i}\) and \(y_i\) data, and the combined uncertainties equation for quantities with uncorrelated uncertainties (28), the uncertainty of the constants can be written as the matrix equation:

\[ \boldsymbol{u(A)}^{\circ 2} = \sum_{i=1}^{N} \left( \frac{\partial}{\partial y_i} \left( \boldsymbol{X}^{-1} \boldsymbol{Y} \right) \right)^{\circ 2} u(y_i)^2 \]

where we note that the square operation \(\circ 2\) is applied element-wise, not via matrix multiplication.

Using the product rule, the derivative can be distributed over the matrix multiplication:

\[\begin{align*} \frac{\partial}{\partial y_i} \left( \boldsymbol{X}^{-1} \boldsymbol{Y} \right) &= \left( \frac{\partial \boldsymbol{X}^{-1}}{\partial y_i} \right) \boldsymbol{Y} + \boldsymbol{X}^{-1} \left( \frac{\partial \boldsymbol{Y}}{\partial y_i} \right)\\ \\ &= \boldsymbol{X}^{-1} \frac{\partial \boldsymbol{Y}}{\partial y_i} \end{align*}\]

noting that the \(\boldsymbol{X}^{-1}\) matrix has no dependence on any of the \(y_i\). Now to resolve \({\partial}\boldsymbol{Y}/{\partial y_i}\):

\[\begin{split} \frac{\partial \boldsymbol{Y}}{\partial y_i} = \frac{\partial}{\partial y_i} \begin{pmatrix} \langle{y}\rangle\\ \langle{x_1 y}\rangle\\ \langle{x_2 y}\rangle\\ \vdots\\ \langle{x_m y}\rangle\\ \end{pmatrix} = \frac{1}{N} \begin{pmatrix} 1\\ x_{1,~i}\\ x_{2,~i}\\ \vdots\\ x_{m,~i}\\ \end{pmatrix} \end{split}\]

for which we will define:

\[\begin{split} \boldsymbol{\tilde{x}}_{i} \equiv \begin{pmatrix} 1\\ x_{1,~i}\\ x_{2,~i}\\ \vdots \\ x_{m,~i} \end{pmatrix} \end{split}\]

Putting this back into our equation for \(\boldsymbol{u(A)}\) and remembering that the uncertainties \(u(y_i) = \sigma\) are uniform:

\[ \boldsymbol{u(A)}^{\circ 2} = \frac{\sigma^2}{N^2} \sum_{i=1}^{N} \left( \boldsymbol{X}^{-1} \boldsymbol{\tilde{x}}_{i}\right)^{\circ 2} \]

Python Implementation#

Now, let’s create a Python implementation to calculate the uncertainties \(u(y)\) and \(u(a_j)\) as defined above. We will build on our previous solutions, which have already constructed the \(\boldsymbol{X}\) matrix and determined \(\boldsymbol{A}\).

Let’s start by calculating \(u(y)\). As the inner sum of \(a_j x_j\) terms has an arbitrary length, we will need to make use of either a loop or np.sum(). Considering that this sum is nested in a sum over data points, it becomes tricky to avoid a loop entirely (though this can be done if you know what you are doing). The easiest way to calculate \(u(y)\) is to use a loop for the sum over data points, computing the inner sum (the model value of \(y\) for a given data point) with np.sum():

#Initializing the variable
u_y = 0

for i in range(y_data.size):
    u_y += (A[0] + np.sum(A[1:] * x_data[:, i]) - y_data[i])**2

u_y = np.sqrt(u_y / (y_data.size - m - 1))

Note that we need A to be a 1D array at this point.
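If A was computed as a column matrix, it can be converted to a 1D array first (using the flatten() approach shown earlier):

A = np.array(A).flatten()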

Now, to determine the \(u(a_j)\). The sum over data points is easiest to achieve using a loop through the data points (as before, you can use np.sum() if you construct an \((m+1) \times N\) matrix from the \(\boldsymbol{\tilde{x}}_i\)). Inside this loop, we can construct the \(\boldsymbol{\tilde{x}}_i\) column matrix for each data point and calculate its matrix product with \(\boldsymbol{X}^{-1}\).

u_A = np.matrix(np.zeros((m+1, 1)))

for i in range(y_data.size):
    #Column matrix of (1, x1, x2, x3, ..., xm)
    x_mat = np.ones((m+1, 1))
    x_mat[1:, 0] = x_data[:, i]

    #Element-wise power
    u_A += np.power(X.I * x_mat, 2)

#Sqrt is calculated element-wise
u_A = u_y * np.sqrt(u_A) / y_data.size

Now, let’s put it all together. We will combine all the loops from the original least_squares() function, as well as the loops for calculating \(u(y)\) and \(\boldsymbol{u(A)}\).

def least_squares_uncertainty(x_data, y_data):
    '''Least squares minimization to find the constants of a linear fit with uncertainties'''

    #Number of independent variables (note that m+1 constants are needed)
    m = x_data.shape[0]

    X = np.matrix(np.ones((m+1, m+1)))

    #Creating the Y column matrix:
    Y = np.matrix( np.zeros( (m + 1, 1) ) )

    #First entry
    Y[0, 0] = np.mean(y_data)

    for row in range(m):
        #First row
        X[0, row + 1] = np.mean(x_data[row, :])

        # Setting the values for the first column
        # remember that X[k, l] = X[l, k]
        X[row + 1, 0] = X[0, row + 1]

        #Triangles
        for col in range(row, m):
            X[row + 1, col + 1] = np.mean( x_data[row, :] * x_data[col, :] )

            #Setting the value for the lower triangle
            X[col + 1, row + 1] = X[row + 1, col + 1]

        #Y matrix values
        Y[row + 1, 0] = np.mean( x_data[row, :] * y_data )
    
    #Finding A:
    A = X.I*Y
    A = np.array(A).flatten()

    #Calculating the uncertainty of y
    u_y = 0

    #Calculating the uncertainty of A
    u_A = np.matrix(np.zeros((m+1, 1)))

    for i in range(y_data.size):
        u_y += (A[0] + np.sum(A[1:] * x_data[:, i]) - y_data[i])**2
        
        #Column matrix of (1, x1, x2, x3, ..., xm)
        x_mat = np.ones((m+1, 1))
        x_mat[1:, 0] = x_data[:, i]

        #Element-wise power
        u_A += np.power(X.I * x_mat, 2)

    u_y = np.sqrt(u_y / (y_data.size - m - 1))

    u_A = u_y * np.sqrt(u_A) / y_data.size

    return A, np.array(u_A).flatten(), u_y

As before, this solution can be further optimised by removing all of the loops, which is made possible by some smart application of NumPy functions. This is left as an exercise for the reader.
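As a hint for that exercise, the loop over data points in the uncertainty calculation can be removed by stacking all of the \(\boldsymbol{\tilde{x}}_i\) columns into a single \((m+1) \times N\) matrix. Below is a sketch, assuming X, A, m, x_data and y_data are defined as in the function above (with A already flattened to a 1D array):

#Stack all x-tilde column vectors into one (m+1, N) matrix
x_tilde = np.vstack([np.ones((1, y_data.size)), x_data])

#Residuals for every data point at once
residuals = A[0] + A[1:] @ x_data - y_data
u_y = np.sqrt(np.sum(residuals**2) / (y_data.size - m - 1))

#Each column of X.I * x_tilde is X^-1 x-tilde_i; square element-wise, then sum over i
T = np.asarray(X.I * x_tilde)
u_A = u_y * np.sqrt(np.sum(T**2, axis=1)) / y_data.size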

Worked Example - Cepheid Variables Model with Uncertainty

Let’s use the code above to determine the constants and their uncertainties for the Cepheid Variables data.

import numpy as np

logP, M, color = np.loadtxt('./data/cepheid_data.csv', delimiter = ',', skiprows = 1, unpack = True)

Before making use of all the data, let’s compare our results for the general solution to the solution for 1 independent variable from 16.1.1 Linear Least Squares Minimization:

\[ M = a_0 + a_1 \log P \]

Note that the least_squares_uncertainty() function expects its x_data argument to be a 2D array with axis 0 for the different variables and axis 1 for the data points. Even with only 1 variable, we must still ensure that the x_data argument has this shape (1 row and \(N\) columns):

\[ \texttt{x\_data} = [~[\log P_0, ~\log P_1, ~\log P_2, ~\cdots, ~\log P_{N-1}]~] \]
x_data = np.array([logP])

y_data = M

A, u_A, u_M = least_squares_uncertainty(x_data, y_data)

print(f'u(M) = {u_M}')

for i, (a, u_a) in enumerate(zip(A, u_A)):
    print(f'a{i} = {a} ({u_a})')
u(M) = 0.283678527744349
a0 = -1.6190332647937085 (0.15139784299976922)
a1 = -2.5473231297084764 (0.12757667951220308)

As we can see, the values we’ve calculated for the \(a_j\) constants, their uncertainties and the uncertainty of \(M\) are all the same as our solution from 16.1.1 Linear Least Squares Minimization.

Now, let’s apply this to the full model:

\[ M = a_0 + a_1 \log P + a_2 (B - V) \]

for which we will construct the array:

\[\begin{split} \begin{matrix} \texttt{x\_data} = [ & [ & \log P_0, & \log P_1, &\cdots, & \log P_{N-1} &], \\ & [ & (B-V)_0, &(B-V)_1, & \cdots, & (B-V)_{N-1} &]]\\ \end{matrix} \end{split}\]
x_data = np.array([logP, color])

y_data = M

A, u_A, u_y = least_squares_uncertainty(x_data, y_data)

print('u(y)', u_y)

for i, (a, u_a) in enumerate(zip(A, u_A)):
    print(f'a{i} = {a} ({u_a})')
u(y) 0.2537054158692781
a0 = -2.1451588503718906 (0.22347671372965403)
a1 = -3.117332841989028 (0.2238733339614743)
a2 = 1.4856664300002658 (0.5020333709282061)

Multiple Linear \(\chi^2\) Minimization#

In a similar way to the multiple linear least squares minimization above, you can implement a general solution for multiple linear \(\chi^2\) minimization. This is left as an exercise for the reader.