Linear Regression
Contents
Linear Regression#
In linear regression the relationship between a dependent variable and one or more independent variables is modelled by a linear relation, such as:
where \(y\) is the dependent variable, the \(x_j\) are the independent variables, and the \(a_j\) are scalar parameters. Note that it is the linearity of the \(a_j\) parameters that is important, any non-linear combination of \(x_j\) that do not cause a non-linear combination of \(a_j\) can be dealt with by redefining the \(x_j\) variables. The \(a_j\) parameters are often unkown and need to be determined from paired measurements of the \(y\) and \(x_j\) variables.
Statistical Notation#
Going forward we will use some statistical notation to clean up our equations.
For a data set of values for a variable \(x\), \(x_i\) (\(i = 1, 2, 3, \dots, N\)):
Expected value of \(f(x)\) (which can be considered the average of \(f(x)\)):
\[ \langle f(x) \rangle = \frac{1}{N} \sum_{i=1}^{N} f(x_i) \]Expected value of \(x\) is the mean of the data set:
\[ \langle x \rangle = \frac{1}{N} \sum_{i=1}^{N} x_i \]Standard deviation (which is an indication of the spread of the data):
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \langle x \rangle)^2}{N}} \]