Multivariate Linear Least Squares Minimization

In Numerical Methods/Linear Regression Algorithms/Linear Least Squares Minimization, we considered the linear functional relation between two measurable variables, $x$ and $y$:

$$y = a_0 + a_1 x$$

where $a_0$ and $a_1$ are unknown constants to be determined.

On this page we will look at the more general case, where we solve the problem for an arbitrary number of variables and constants.

Three Variables

Let’s start by solving this problem for three measurable variables: $y$, $x_1$ and $x_2$, in the linear functional relation:

$$y = a_0 + a_1 x_1 + a_2 x_2$$

where $a_0$, $a_1$ and $a_2$ are unknown coefficients.

Consider a data set of measured $(x_{1,i}, x_{2,i}, y_i)$ triplets for $i = 1, 2, 3, \ldots, N$. If we attribute the dispersion of this data from the functional relation to error in the $y_i$ terms, $\epsilon_i$, then we can relate the data points with:

$$y_i + \epsilon_i = a_0 + a_1 x_{1,i} + a_2 x_{2,i} \quad\Rightarrow\quad \epsilon_i = a_0 + a_1 x_{1,i} + a_2 x_{2,i} - y_i$$

The sum of errors squared is given by:

$$S = \sum_{i=1}^{N} \epsilon_i^2 = \sum_{i=1}^{N} \left(a_0 + a_1 x_{1,i} + a_2 x_{2,i} - y_i\right)^2$$
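
Since we will be working with this quantity throughout, it is worth seeing it in code once. A minimal sketch (the function name and arguments here are our own, for illustration only):

import numpy as np

def sum_sq_errors(a0, a1, a2, x1, x2, y):
    """Sum of squared errors S for trial coefficients a0, a1, a2."""
    eps = a0 + a1 * x1 + a2 * x2 - y   # the epsilon_i residuals
    return np.sum(eps ** 2)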

We want to minimize $S$ with respect to each of the constants, $a_0$, $a_1$ and $a_2$:

$$\frac{\partial S}{\partial a_0} = 2\sum_{i=1}^{N} \left(a_0 + a_1 x_{1,i} + a_2 x_{2,i} - y_i\right) = 0,$$

$$\frac{\partial S}{\partial a_1} = 2\sum_{i=1}^{N} \left(a_0 + a_1 x_{1,i} + a_2 x_{2,i} - y_i\right) x_{1,i} = 0$$

and

$$\frac{\partial S}{\partial a_2} = 2\sum_{i=1}^{N} \left(a_0 + a_1 x_{1,i} + a_2 x_{2,i} - y_i\right) x_{2,i} = 0$$

Rearranging the above equations and using our statistical notation (an overline denotes the mean over the data set, e.g. $\overline{x_1} = \frac{1}{N}\sum_{i=1}^{N} x_{1,i}$) yields:

$$a_0 + a_1\overline{x_1} + a_2\overline{x_2} = \overline{y},$$

$$a_0\overline{x_1} + a_1\overline{x_1^2} + a_2\overline{x_1 x_2} = \overline{x_1 y}$$

and

$$a_0\overline{x_2} + a_1\overline{x_1 x_2} + a_2\overline{x_2^2} = \overline{x_2 y}$$

This time the algebraic manipulation is a lot more work; instead, we shall use a matrix equation (which will serve us better in the more general case to come). The matrix equation representation is:

$$\begin{pmatrix} 1 & \overline{x_1} & \overline{x_2} \\ \overline{x_1} & \overline{x_1^2} & \overline{x_1 x_2} \\ \overline{x_2} & \overline{x_1 x_2} & \overline{x_2^2} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \overline{y} \\ \overline{x_1 y} \\ \overline{x_2 y} \end{pmatrix}$$

This can easily be solved numerically using:

$$XA = Y \quad\Rightarrow\quad A = X^{-1}Y$$
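
Before moving on, here is a minimal self-contained sketch of this approach on synthetic data. Note that the synthetic data and the use of np.linalg.solve (which solves XA = Y directly, avoiding an explicit inverse) are our own choices for illustration, not part of the worked example below:

import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 plus noise
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 100)
x2 = rng.uniform(0, 10, 100)
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(0, 0.1, 100)

# Build the normal-equation matrices from the means defined above
X = np.array([
    [1.0,         np.mean(x1),      np.mean(x2)],
    [np.mean(x1), np.mean(x1 * x1), np.mean(x1 * x2)],
    [np.mean(x2), np.mean(x1 * x2), np.mean(x2 * x2)],
])
Y = np.array([np.mean(y), np.mean(x1 * y), np.mean(x2 * y)])

A = np.linalg.solve(X, Y)
print(A)  # should be close to [1, 2, -3]
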
Worked Example - Cepheid Variables

You now have all you need to find the unknown coefficients for the full functional relation of the magnitude ($M$), period ($P$) and color ($B - V$) of the Cepheid variables:

$$M = a_0 + a_1 \log P + a_2 (B - V)$$

using the same data file as before. (You should find the values $a_0 = -2.15$ mag, $a_1 = -3.12$ mag and $a_2 = 1.49$.) You may wish to use NumPy matrices, which have the inverse property I ($X^{-1}$ is achieved using X.I, for the appropriately defined X matrix), and can be multiplied directly.

Try this yourself before reading the solution that follows.

Solution:

As in the Numerical Methods/Linear Regression Algorithms/Linear Least Squares Minimization worked example, we shall read in the data using numpy.loadtxt():

import numpy as np
import matplotlib.pyplot as plt

logP, M, color = np.loadtxt('./data/cepheid_data.csv', delimiter = ',', skiprows = 1, unpack = True)

Here we have:

$$y = M, \qquad x_1 = \log P, \qquad x_2 = (B - V)$$

We will now proceed to construct the matrices using the data, starting with the simpler Y matrix:

#Defining Y as a column matrix
Y = np.matrix([
    [np.mean(M)],
    [np.mean(logP * M)],
    [np.mean(color * M)]
])

Note that X is symmetric about the diagonal, i.e. $X_{kl} = X_{lk}$; we will make use of this to reduce the number of calculations we need to perform.

X = np.matrix(np.ones( (3, 3) )) #Using an array generating function

# X[0, 0] is just 1, so we leave it

#The first row
X[0, 1] = np.mean(logP)
X[0, 2] = np.mean(color)

#The first column (which is the transpose of the first row)
X[1, 0] = X[0, 1]
X[2, 0] = X[0, 2]

#The diagonal
X[1, 1] = np.mean(logP * logP)
X[2, 2] = np.mean(color * color)

#The off-diagonal elements
X[1, 2] = np.mean(logP * color)
X[2, 1] = X[1, 2]

Now we calculate the A matrix:

A = X.I * Y

#Printing the constants

for i in range(3):
    print(f'a{i} =', '{:.3}'.format(A[i,0]))

a0 = -2.15
a1 = -3.12
a2 = 1.49

Arbitrarily Many Variables

Consider a linear functional relation between measurable variables $x_1, x_2, x_3, \ldots, x_m$ and $y$:

$$y = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_m x_m = a_0 + \sum_{j=1}^{m} a_j x_j$$

where $a_0, a_1, \ldots, a_m$ are unknown constants.

Suppose we have a data set of measured $(x_{1,i}, x_{2,i}, \ldots, x_{m,i}, y_i)$ values for $i = 1, 2, 3, \ldots, N$. As before, we assume that the dispersion in our data from the functional relation is due to error in the $y_i$ data points only. Therefore we can write the relation between our data points as:

$$y_i + \epsilon_i = a_0 + \sum_{j=1}^{m} a_j x_{j,i}$$

The sum of errors squared can thus be written as:

$$S = \sum_{i=1}^{N} \left(a_0 + \left(\sum_{j=1}^{m} a_j x_{j,i}\right) - y_i\right)^2$$

We want to find the values of $a_0, a_1, \ldots, a_m$ which give us the minimum value of $S$. Minimizing $S$ with respect to $a_0$ gives us:

$$\frac{\partial S}{\partial a_0} = 2\sum_{i=1}^{N} \left(a_0 + \left(\sum_{j=1}^{m} a_j x_{j,i}\right) - y_i\right) = 0$$

Distributing the sum over $i$ amongst the terms:

$$N a_0 + \left(\sum_{j=1}^{m} a_j \sum_{i=1}^{N} x_{j,i}\right) - \sum_{i=1}^{N} y_i = 0$$

Dividing by $N$:

$$a_0 + \left(\sum_{j=1}^{m} a_j \frac{1}{N}\sum_{i=1}^{N} x_{j,i}\right) - \frac{1}{N}\sum_{i=1}^{N} y_i = 0$$

Using our stats notation:

$$a_0 + \sum_{j=1}^{m} a_j \overline{x_j} = \overline{y}$$

Now, let’s minimize $S$ with respect to one of the $a_k$ for $k = 1, 2, \ldots, m$, following a similar line of algebraic manipulation as above:

$$\frac{\partial S}{\partial a_k} = \sum_{i=1}^{N} 2 x_{k,i} \left(a_0 + \left(\sum_{j=1}^{m} a_j x_{j,i}\right) - y_i\right) = 0$$

$$\Rightarrow\quad a_0 \sum_{i=1}^{N} x_{k,i} + \sum_{j=1}^{m} a_j \left(\sum_{i=1}^{N} x_{k,i}\, x_{j,i}\right) - \sum_{i=1}^{N} x_{k,i}\, y_i = 0$$

$$\Rightarrow\quad a_0 \overline{x_k} + \sum_{j=1}^{m} a_j \overline{x_k x_j} = \overline{x_k y}$$

Writing the results for $a_0$ and $a_k$ ($k = 1, \ldots, m$) into a system of equations, expanding the sum over $j$:

$$\begin{aligned}
a_0 + a_1\overline{x_1} + a_2\overline{x_2} + \cdots + a_m\overline{x_m} &= \overline{y} \\
a_0\overline{x_1} + a_1\overline{x_1^2} + a_2\overline{x_1 x_2} + \cdots + a_m\overline{x_1 x_m} &= \overline{x_1 y} \\
a_0\overline{x_2} + a_1\overline{x_2 x_1} + a_2\overline{x_2^2} + \cdots + a_m\overline{x_2 x_m} &= \overline{x_2 y} \\
a_0\overline{x_3} + a_1\overline{x_3 x_1} + a_2\overline{x_3 x_2} + \cdots + a_m\overline{x_3 x_m} &= \overline{x_3 y} \\
&\;\;\vdots \\
a_0\overline{x_m} + a_1\overline{x_m x_1} + a_2\overline{x_m x_2} + \cdots + a_m\overline{x_m^2} &= \overline{x_m y}
\end{aligned}$$

To solve these equations numerically, we can reformulate them into a matrix equation:

$$\begin{pmatrix}
1 & \overline{x_1} & \overline{x_2} & \cdots & \overline{x_m} \\
\overline{x_1} & \overline{x_1^2} & \overline{x_1 x_2} & \cdots & \overline{x_1 x_m} \\
\overline{x_2} & \overline{x_2 x_1} & \overline{x_2^2} & \cdots & \overline{x_2 x_m} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\overline{x_m} & \overline{x_m x_1} & \overline{x_m x_2} & \cdots & \overline{x_m^2}
\end{pmatrix}
\begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}
=
\begin{pmatrix} \overline{y} \\ \overline{x_1 y} \\ \overline{x_2 y} \\ \vdots \\ \overline{x_m y} \end{pmatrix}$$

Notice that the leftmost matrix is symmetric about the diagonal; this can come in handy when computing the matrix elements. As before, this equation can be solved for the $a_i$ by inverting the leftmost matrix, i.e.

$$XA = Y \quad\Rightarrow\quad A = X^{-1}Y$$

Python Implementation

Let’s work on a Python implementation of this solution. You may want to try it yourself before reading further. In order to verify our implementation we will use the Cepheid data we’ve used so far, though we shall design it to support any number of $x_j$ variables.

We will start by designing a function that takes the data in as two arguments:

$$\texttt{y\_data} = [y_1, y_2, y_3, \ldots, y_N]$$

$$\texttt{x\_data} = [[x_{1,1}, x_{1,2}, \ldots, x_{1,N}],\ [x_{2,1}, x_{2,2}, \ldots, x_{2,N}],\ \ldots,\ [x_{m,1}, x_{m,2}, \ldots, x_{m,N}]]$$

as NumPy arrays.
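
For instance, for the Cepheid data used in the worked examples (so m = 2 predictor variables), the packing would look like this (a sketch, assuming the data has been read in as before):

import numpy as np

# assuming the data has been unpacked as in the earlier example:
logP, M, color = np.loadtxt('./data/cepheid_data.csv', delimiter = ',', skiprows = 1, unpack = True)

# x_data has shape (m, N): one row per predictor variable
x_data = np.array([logP, color])

# y_data has shape (N,): the dependent variable
y_data = M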

Calculating Expectation Values

Note that for each of the sums along the data sets ($\sum_{i=1}^{N}$), we will be summing along the rows. For example, for the quantity:

$$\overline{x_1} = \frac{1}{N}\sum_{i=1}^{N} x_{1,i}$$

we can use:

np.mean(x_data[0, :])

and for

$$\overline{x_1 x_2} = \frac{1}{N}\sum_{i=1}^{N} x_{1,i}\, x_{2,i}$$

we can use

np.mean(x_data[0, :] * x_data[1, :])

where we have made use of NumPy arrays’ vectorized operations.

Constructing the X Matrix

Now, let’s break down the structure of the matrix:

$$X = \begin{pmatrix}
1 & \overline{x_1} & \overline{x_2} & \overline{x_3} & \cdots & \overline{x_m} \\
\overline{x_1} & \overline{x_1^2} & \overline{x_1 x_2} & \overline{x_1 x_3} & \cdots & \overline{x_1 x_m} \\
\overline{x_2} & \overline{x_2 x_1} & \overline{x_2^2} & \overline{x_2 x_3} & \cdots & \overline{x_2 x_m} \\
\overline{x_3} & \overline{x_3 x_1} & \overline{x_3 x_2} & \overline{x_3^2} & \cdots & \overline{x_3 x_m} \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
\overline{x_m} & \overline{x_m x_1} & \overline{x_m x_2} & \overline{x_m x_3} & \cdots & \overline{x_m^2}
\end{pmatrix}$$

The dimensions of this matrix are $(m+1) \times (m+1)$; we can get the value for $m$ from the dimensions of the x_data array:

m = x_data.shape[0]

from this we can create a matrix of ones, which we will populate later:

X = np.matrix(np.ones((m + 1, m + 1)))

Now, as we have noted before, X is a symmetric matrix. That is, for row $k$ and column $l$, $X_{kl} = X_{lk}$. We only need to construct one of the triangles of the matrix; the other is obtained for free.

Let’s work with the upper triangle of the matrix. Here there are 3 regions with distinguishable structures:

  1. The first row

  2. The diagonal

  3. The remaining triangle

The first element of the matrix is just one. The remainder of the first row is simply the expectation value of each of the $x_j$:

$$X_{0\,0} = 1$$

and

$$X_{0\,l} = \overline{x_l} \quad\text{where } l = 1, 2, \ldots, m$$

Note that here we are indexing X from 0 to better translate it to code (keep in mind that the x_data array also starts with a 0 index, so the $x_l$ data is in row $l - 1$):

for l in range(m):
    X[0, l + 1] = np.mean(x_data[l, :])
    
    # Setting the values for the first column
    # remember that X[k, l] = X[l, k]
    X[l + 1, 0] = X[0, l + 1]

Now, consider the triangle off of the diagonal. That is the region:

$$\begin{pmatrix}
\overline{x_1 x_2} & \overline{x_1 x_3} & \cdots & \overline{x_1 x_m} \\
 & \overline{x_2 x_3} & \cdots & \overline{x_2 x_m} \\
 & & \ddots & \vdots \\
 & & & \overline{x_{m-1} x_m}
\end{pmatrix}$$

This region exhibits the pattern:

$$X_{kl} = \overline{x_k x_l} \quad\text{where } 1 \le k \le m \text{ and } l > k$$

The diagonal has a fairly simple pattern, starting from (row, column) $(1, 1)$:

$$X_{kk} = \overline{x_k^2} \quad\text{where } 1 \le k \le m$$

Note, however, that this is a special case of the rules for constructing region 3. We can therefore combine regions 2 and 3 with the rule:

$$X_{kl} = \overline{x_k x_l} \quad\text{where } 1 \le k \le m \text{ and } l \ge k$$

In the code this becomes:

# Inner matrix

for k in range(m):
    for l in range(k, m):
        X[k + 1, l + 1] = np.mean( x_data[k, :] * x_data[l, :] )
        
        #Setting the value for the lower triangle
        X[l + 1, k + 1] = X[k + 1, l + 1]

That covers the X matrix.

Constructing the Y Matrix

Now let’s construct the matrix:

$$Y = \begin{pmatrix} \overline{y} \\ \overline{x_1 y} \\ \overline{x_2 y} \\ \vdots \\ \overline{x_m y} \end{pmatrix}$$

This is fairly straightforward, with

$$Y_{0\,0} = \overline{y}$$

and

$$Y_{k\,0} = \overline{x_k y} \quad\text{where } k = 1, \ldots, m$$

#Creating the Y column matrix:
Y = np.matrix( np.zeros( (m + 1, 1) ) )

#First entry
Y[0, 0] = np.mean(y_data)

#The remainder of the entries
for k in range(m):
    Y[k + 1, 0] = np.mean( x_data[k, :] * y_data )

Finding Matrix A (Or Solving For the $a_j$)

Lastly, to solve for our $a_j$ values, we consider the matrix:

$$A = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_m \end{pmatrix}$$

This fits into the matrix equation

$$XA = Y$$

where we’ve already constructed X and Y. All that’s left is to solve the equation by inverting X:

$$A = X^{-1}Y$$

To achieve this numerically, we simply take the inverse of X using X.I:

#Finding A:

A = X.I*Y

This A matrix is a column matrix. As a bonus, if we wanted to turn it into a one-dimensional array, we can use the flatten() method:

np.array(A).flatten()

Putting It All Together

Putting this all together into a function:

import numpy as np
import matplotlib.pyplot as plt


def least_squares(y_data, x_data):
    m = x_data.shape[0]

    X = np.matrix(np.ones((m+1, m+1)))

    #First row
    for l in range(m):
        X[0, l + 1] = np.mean(x_data[l, :])

        # Setting the values for the first column
        # remember that X[k, l] = X[l, k]
        X[l + 1, 0] = X[0, l + 1]


    # Upper triangle
    for k in range(m):
        for l in range(k, m):
            X[k + 1, l + 1] = np.mean( x_data[k, :] * x_data[l, :] )

            #Setting the value for the lower triangle
            X[l + 1, k + 1] = X[k + 1, l + 1]


    #Creating the Y column matrix:
    Y = np.matrix( np.zeros( (m + 1, 1) ) )

    #First entry
    Y[0, 0] = np.mean(y_data)

    #The remainder of the entries
    for k in range(m):
        Y[k + 1, 0] = np.mean( x_data[k, :] * y_data )
    
    #Finding A:

    A = X.I*Y
    
    return np.array(A).flatten()
Worked Example - Applying the Solution to the Cepheid Variables Data

Now, let’s apply this function to the Cepheid variables data to find the unknown coefficients for the functional relation of the magnitude ($M$), period ($P$) and color ($B - V$):

$$M = a_0 + a_1 \log P + a_2 (B - V)$$

We’ll unpack the data as in the Numerical Methods/Linear Regression Algorithms/Linear Least Squares Minimization example, and then pack it into the structure that is required by the function.

#Reading the data
logP, M, color = np.loadtxt('./data/cepheid_data.csv', delimiter = ',', skiprows = 1, unpack = True)

a_arr = least_squares(M, np.array([logP, color]))

for i, a in enumerate(a_arr):
    print(f'a_{i} =', '{:.3}'.format(a))
a_0 = -2.15
a_1 = -3.12
a_2 = 1.49

This matches the results from the worked example above.
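
As a final sanity check (this cross-check is our own addition, not part of the original worked example), the same coefficients can be recovered with NumPy’s built-in least-squares routine, np.linalg.lstsq, applied to a design matrix whose first column of ones corresponds to $a_0$:

# Design matrix: one row per data point, columns [1, logP, B-V]
D = np.column_stack([np.ones(M.size), logP, color])

# Solve the least-squares problem directly
coeffs, residuals, rank, sv = np.linalg.lstsq(D, M, rcond=None)

print(coeffs)  # should agree with a_arr above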