Multiple Linear Least Squares Minimization#
In Numerical Methods/Curve Fitting/Linear Regression/Linear Least Squares Minimization, we considered the linear functional relation between two measurable variables, \(x\) and \(y\):
where \(a_0\) and \(a_1\) are unknown parameters to be determined.
On this page we will look at the more generic case, where we solve the problem for an arbitrary number of independent variables and parameters.
Two Independent Variables#
Let’s start by solving this problem for three measurable variables: \(y\), \(x_1\) and \(x_2\), in the linear functional relation:
where \(a_0\), \(a_1\) and \(a_2\) are unknown coefficients.
Consider a data set of measured \((x_{1,~i}, x_{2,~i}, y_i)\) pairs for \(i = 1, 2, 3, \dots , N\). If we attribute the dispersion of this data from the functional relation to error in the \(y_i\) terms, \(\epsilon_i\), then we can relate the data points with:
The sum of errors squared is given by:
We want to minimize \(S\) with respect to each of the constants, \(a_0\), \(a_1\) and \(a_2\):
,
and
Re-arranging the above equations and using our statistical notation yields:
,
and
This time algebraic manipulation is a lot more work, instead we shall use a matrix equation (which will serve us better in the more generic case to come). The matrix equation representation is:
This can easily be solved numerically using:
Worked Example - Cepheid Variables
You now have all you need to find the unknown coefficients for the full functional relation of the magnitude (\(M\)), period (\(P\)) and color (\(B-V\)) of the Cepheid variables:
using the same data file as before. (You should find the values \(a_0 = -2.15\) mag, \(a_1 = -3.12\) mag and \(a_2 = 1.49\)). You may wish to use NumPy matrices, which have the inverse property I
(\(\boldsymbol{X}^{-1}\) is achieved using X.I
, for the appropriately defined X
matrix), and can be multiplied directly
Try this yourself before reading the solution that follows.
Solution:
As in the Numerical Methods/Linear Regression Algorithms/Linear Least Squares Minimization worked example, we shall read in the data using numpy.loadtxt()
:
import numpy as np
import matplotlib.pyplot as plt
logP, M, color = np.loadtxt('./data/cepheid_data.csv', delimiter = ',', skiprows = 1, unpack = True)
Here we have:
We will now proceed to construct the matrices using the data. Starting with simpler \(\boldsymbol{Y}\) matrix:
#Defining Y as a column matrix
Y = np.matrix([
[np.mean(M)],
[np.mean(logP * M)],
[np.mean(color * M)]
])
Note that \(\boldsymbol{X}\) is symmetric about the diagonal, i.e. \(\boldsymbol{X}_{k~l} = \boldsymbol{X}_{l~k}\), we will make use of this to reduce the number of calculations we need to perform.
X = np.matrix(np.ones( (3, 3) )) #Using an array generating function
# X[0, 0] is just 1, so we leave it
#The first row
X[0, 1] = np.mean(logP)
X[0, 2] = np.mean(color)
#The first column (which is the transpose of the first row)
X[1, 0] = X[0, 1]
X[2, 0] = X[0, 2]
#The diagonal
X[1, 1] = np.mean(logP * logP)
X[2, 2] = np.mean(color * color)
#The off-diagonal elements
X[1, 2] = np.mean(logP * color)
X[2, 1] = X[1, 2]
Now we calculate the \(\boldsymbol{A}\) matrix:
A = X.I * Y
#Printing the constants
for i in range(3):
print(f'a{i+1} =', '{:.3}'.format(A[i,0]) )
a1 = -2.15
a2 = -3.12
a3 = 1.49
Arbitrarily Many Independent Variables#
Consider a linear functional relation between measurable variables \(x_1\), \(x_2\), \(x_3\), \(\dots\), \(x_m\) and \(y\):
where \(a_0\), \(a_1\), \(\dots\) and \(a_m\) are unknown constants.
Suppose we have a data set of measured \((x_{1,~i}, x_{2,~i}, \dots, x_{mi}, y_i)\) values for \(i = 1, 2, 3, \dots, N\). As before, we assume that the dispersion in our data from the functional relation is due to error in the \(y_i\) data points only. Therefore we can write the relation between our data points as:
The sum of errors squared can thus be written as:
We want to find the values of \(a_0\), \(a_1\), \(\dots\) and \(a_m\) which gives us the minimum value of \(S\). Minimizing \(S\) with respect to \(a_0\) gives us:
Distributing the sum over \(i\) amongst the terms:
Dividing by \(N\):
Using our stats notation:
Now, let’s minimize \(S\) with respect to one of the \(a_k\) for \(k = 1, 2, \dots, m\), following a similar line of algebraic manipulation as above:
Writing the results for \(a_0\) and \(a_k\) (\(k = 1,\dots,m\)) into a system of equations, expanding the sum over \(j\):
To solve these equations numerically, we can reformulate these equations into a matrix equation:
Notice that the left most matrix is symmetric about the diagonal, this can come in handy when computing the matrix elements. As before, this equation can be solved for the \(a_i\) by inverting the left most matrix, i.e.
Python Implementation#
Let’s work on a Python implementation of this solution. You may want to try it yourself before reading further. In order to verify our implementation we will use the Cepheid data we’ve used so far, though we shall design it to support any number of \(x_j\) variables.
We will start by designing a function that takes the data in as two arguments:
as NumPy arrays.
Calculating Expectation Values#
Note that for each of the sums along the data sets (\(\sum_{i = 1}^N\)), we will be summing along the rows. For example, for the quantity:
we can use:
np.mean(x_data[0, :])
and for
we can use
np.mean(x_data[0, :] * x_data[1, :])
where we have made use of NumPy array’s vectorized operations.
Constructing the \(\boldsymbol{X}\) Matrix#
Now, let’s break down the structure of the matrix:
the dimensions of this matrix is \((m+1)\times(m+1)\), we can get the value for \(m\) from the dimensions of the x_data
array:
m = x_data.shape[0]
from this we can create a matrix of ones, which we will populate later:
X = np.array(np.ones((m+1, m+1))
Now, as we have noted before, \(\boldsymbol{X}\) is a symmetric matrix. That is for for row \(k\) and column \(l\), \(\boldsymbol{X}_{k l} = \boldsymbol{X}_{l k}\). We only need to construct one of the triangles of the matrix, the other is obtained for free.
Let’s work with the upper triangle of the matrix. Here there are 3 regions with distinguishable structures
The first row
The diagonal
The remaining triangle
The first element of the matrix is just one. The remainder of the first row is simply the expectation value of each of the \(x_j\):
and
Note that here we are indexing \(\boldsymbol{X}\) from 0 to better translate it to code (keep in mind that the x_data
array also starts with a 0 index, so the \(x_l\) data is in row \(l - 1\)):
for l in range(m):
X[0, l + 1] = np.mean(x_data[l, :])
# Setting the values for the first column
# remember that X[k, l] = X[l, k]
X[l + 1, 0] = X[0, l + 1]
Now, consider the triangle off of the diagonal. That is the region:
This region exhibits the pattern:
The diagonal has a fairly simple pattern, starting from (row, column) \((1,1)\):
Note, however, that this is a special case of the rules for constructing region 3. We can therefore combine regions 2 and 3 with the rule:
In the code this becomes:
# Inner matrix
for k in range(m):
for l in range(k, m):
X[k + 1, l + 1] = np.mean( x_data[k, :] * x_data[l, :] )
#Setting the value for the lower triangle
X[l + 1, k + 1] = X[k + 1, l + 1]
That covers the \(\boldsymbol{X}\) matrix.
Constructing the \(\boldsymbol{Y}\) Matrix#
Now let’s construct the matrix:
This is fairly straight forward, with
and
#Creating the Y column matrix:
Y = np.matrix( np.zeros( (m + 1, 1) ) )
#First entry
Y[0, 0] = np.mean(y_data)
#The remainder of the entries
for k in range(m):
Y[k + 1, 0] = np.mean( x_data[k, :] * y_data )
Finding Matrix \(\boldsymbol{A}\) (Or Solving For the \(a_j\))#
Lastly, to solve for our \(a_j\) values, we consider the matrix:
This fits into the matrix equation
where we’ve already constructed \(\boldsymbol{X}\) and \(\boldsymbol{Y}\). All that’s left is to solve the equation be inverting \(\boldsymbol{X}\):
To achieve this numerically, we simply take the inverse of X
, X.I
:
#Finding A:
A = X.I*Y
This A
matrix is a column matrix. As a bonus, if we wanted to turn it into a one-dimensional array, we can use the flatten()
method:
np.array(A).flatten()
Putting It All Together#
Putting this all together into a function:
import numpy as np
import matplotlib.pyplot as plt
def least_squares(y_data, x_data):
m = x_data.shape[0]
X = np.matrix(np.ones((m+1, m+1)))
#First row
for l in range(m):
X[0, l + 1] = np.mean(x_data[l, :])
# Setting the values for the first column
# remember that X[k, l] = X[l, k]
X[l + 1, 0] = X[0, l + 1]
# Upper triangle
for k in range(m):
for l in range(k, m):
X[k + 1, l + 1] = np.mean( x_data[k, :] * x_data[l, :] )
#Setting the value for the lower triangle
X[l + 1, k + 1] = X[k + 1, l + 1]
#Creating the Y column matrix:
Y = np.matrix( np.zeros( (m + 1, 1) ) )
#First entry
Y[0, 0] = np.mean(y_data)
#The remainder of the entries
for k in range(m):
Y[k + 1, 0] = np.mean( x_data[k, :] * y_data )
#Finding A:
A = X.I*Y
return np.array(A).flatten()
Worked Example - Applying the Solution to the Cepheid Variables Data
Now, lets apply this function to the Cepheid Variables data to find the unknown coefficients for the functional relation of the magnitude (\(M\)), period (\(P\)) and color (\(B-V\)):
We’ll unpack the data as in the Numerical Methods/Linear Regression Algorithms/Linear Least Squares Minimization example, and then pack it into the structure that is required by the function.
#Reading the data
logP, M, color = np.loadtxt('./data/cepheid_data.csv', delimiter = ',', skiprows = 1, unpack = True)
a_arr = least_squares(M, np.array([logP, color]))
for i, a in enumerate(a_arr):
print(f'a_{i} =', '{:.3}'.format(a))
a_0 = -2.15
a_1 = -3.12
a_2 = 1.49
Which matches the results from the worked example above.
It is left as an exercise to the reader to impliment a solution for multiple linear \(\chi^2\) minimization.