Least Squares


In this section, we will explore the least squares method: what it means, its general formula, how to plot it on a graph, its limitations, and a few tricks for using it.

But before we get into those details, let's peek into Ms. Dolma's class. 

Ms. Dolma said in the class "Hey, students who spend more time on their assignments are getting better grades".

Teacher and students in the classroom

A student wants to estimate his grade for spending 2.3 hours on an assignment.

The least squares method makes it possible to build a predictive model that will help him estimate his grade far more accurately.

The method is simple to apply: it requires nothing more than some data and maybe a calculator.

We are all set to learn everything about least squares.

So, let's start!


What Is Meant by the Least Squares Method?

The least squares method is a statistical method used to find the line of best fit, of the form \(y=mx+b\), for the given data.

For example, we have 4 data points, and using this method we arrive at the following graph.

Graph of the line y=mx+b

The line given by the least squares equation is called the regression line.

Our main objective in this method is to reduce the sum of the squares of errors as much as possible.

This is the reason this method is called the "Least Squares Method".
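
In symbols, as a sketch using notation that is an assumption here (the subscript \(i\) and the names \(e_i\) and \(S\) do not appear elsewhere in this lesson): for \(n\) data points \((x_i, y_i)\), the error of the line at the \(i\)-th point is \(e_i=y_i-(mx_i+b)\), and the method looks for the values of \(m\) and \(b\) that make the following sum as small as possible.

\[\begin{align}S(m,b)&=\sum_{i=1}^{n}e_i^{2}\\&=\sum_{i=1}^{n}\left(y_i-(mx_i+b)\right)^{2}\end{align}\]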


What Is the General Formula For the Least Squares Method?

The least squares method is used to find a straight line of the form \(y=mx+b\).

Here, \(y\) and \(x\) are variables, \(m\) is the slope and \(b\) is the \(y\)-intercept.

Follow the steps mentioned below.

Step 1

Draw a table with 4 columns where the first two columns are for \(X\) and \(Y\) points.

Step 2

In the next two columns, find \(XY\) and \(X^{2}\).

Step 3

Find \(\sum X\), \(\sum Y\), \(\sum (XY)\), and \(\sum (X^{2})\).

Step 4

Find the value of slope \(m\) using the formula,

\(m=\dfrac{n\sum (XY)-\sum Y \sum X}{n\sum (X^{2})-(\sum X)^{2}}\)

Here, \(n\) is the number of data points.

Step 5

Calculate the value of \(b\) using the formula,
\(b=\dfrac{\sum Y-m\sum X}{n}\)

Step 6

Substitute the values of \(m\) and \(b\) in the equation \(y=mx+b\).
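
For readers wondering where the formulas in Step 4 and Step 5 come from, here is a brief sketch rather than a full derivation. Setting the partial derivatives of the sum of squared errors with respect to \(m\) and \(b\) to zero gives two linear equations, often called the normal equations:

\[\begin{align}m\sum (X^{2})+b\sum X&=\sum (XY)\\m\sum X+nb&=\sum Y\end{align}\]

The second equation rearranges directly to the formula for \(b\) in Step 5. Substituting that \(b\) into the first equation and solving for \(m\) gives the formula in Step 4.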

Example

Let's say we have data as shown below.

\(X\) 1 2 3 4 5
\(Y\) 2 5 3 8 7


We will follow the steps to find the line of best fit.
 

\(X\) \(Y\) \(XY\) \(X^{2}\)
1 2 2 1
2 5 10 4
3 3 9 9
4 8 32 16
5 7 35 25
\(\sum X\)=\(15\) \(\sum Y\)=\(25\) \(\sum XY\)=\(88\) \(\sum X^{2}\)=\(55\)


Find the value of \(m\).

\[\begin{align}m&=\dfrac{n\sum (XY)-\sum Y \sum X}{n\sum (X^{2})-(\sum X)^{2}}\\m&=\dfrac{5(88)-(15\times 25)}{5(55)-(15)^{2}}\\m&=\dfrac{13}{10}\end{align}\]

Find the value of \(b\).

\[\begin{align}b&=\dfrac{\sum Y-m\sum X}{n}\\b&=\dfrac{25-(1.3 \times 15)}{5}\\b&=\dfrac{11}{10}\end{align}\] 

So, the required equation of least squares is \(y=\dfrac{13}{10}x+\dfrac{11}{10}\).
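
The six steps can also be carried out in a few lines of code. The following is a minimal sketch in Python, not part of the original lesson; the function name `least_squares_line` is made up for illustration, and it assumes the \(X\) values are not all equal (otherwise the denominator in the slope formula is zero).

```python
def least_squares_line(xs, ys):
    """Return (m, b) for the line y = m*x + b, following Steps 1 to 5."""
    n = len(xs)                                   # number of data points
    sum_x = sum(xs)                               # sum of X
    sum_y = sum(ys)                               # sum of Y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of XY
    sum_x2 = sum(x * x for x in xs)               # sum of X^2
    # Step 4: slope (assumes the X values are not all identical)
    m = (n * sum_xy - sum_y * sum_x) / (n * sum_x2 - sum_x ** 2)
    # Step 5: y-intercept
    b = (sum_y - m * sum_x) / n
    return m, b


# Data from the example above
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]
m, b = least_squares_line(xs, ys)
print(f"y = {m:.1f}x + {b:.1f}")  # y = 1.3x + 1.1
```

The printed line agrees with the hand calculation, \(y=\dfrac{13}{10}x+\dfrac{11}{10}\).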


Graphical Representation of Least Squares Method

Look at the graph shown below.

Graph of the regression line

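The straight line shows the potential relationship between the independent variable and the dependent variable.

The vertical distance between an observed data point and the line is called the residual. It is the difference between the observed response and the response predicted by the regression line. The goal of this method is to make the sum of the squares of these residuals as small as possible.

Smaller residuals mean that the model fits the data better.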
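
To make this concrete, here is a small Python sketch (an illustration, not part of the original lesson) that computes the sum of squared differences between observed and predicted values for the data in the earlier example, \(X=1,2,3,4,5\) and \(Y=2,5,3,8,7\). It compares the regression line \(y=1.3x+1.1\) with the line \(y=x+2\), which was picked arbitrarily for comparison. The function name `sum_squared_residuals` is hypothetical.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 5, 3, 8, 7]


def sum_squared_residuals(m, b):
    """Sum of (observed - predicted)^2 over all data points for y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


sse_best = sum_squared_residuals(1.3, 1.1)   # the regression line from the example
sse_other = sum_squared_residuals(1.0, 2.0)  # an arbitrary line for comparison
print(round(sse_best, 2), round(sse_other, 2))  # 9.1 10.0
```

The regression line gives the smaller total, which is what "best fit" means here.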



What Are the Limitations of the Least Squares Method?

Even though the least squares method is widely used to find the line of best fit, it has a few limitations.

Examples

  • This method exhibits only the relationship between the two variables. All other causes and effects are not taken into consideration.
  • This method is unreliable when data is not evenly distributed.
  • This method is very sensitive to outliers. Even a single outlier can skew the results of the least squares analysis.
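
The sensitivity to outliers is easy to see with a small experiment. The Python sketch below uses made-up data (not from this lesson): five points lying exactly on \(y=2x\), and the same points with the last \(Y\) value replaced by an outlier. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (m, b) of the least squares line y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_y * sum_x) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


xs = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]         # made-up data lying exactly on y = 2x
with_outlier = [2, 4, 6, 8, 30]  # same data, last value replaced by an outlier

print(fit_line(xs, clean))         # (2.0, 0.0)
print(fit_line(xs, with_outlier))  # (6.0, -8.0)
```

Changing one point out of five triples the slope, from 2 to 6, and moves the intercept from 0 to -8.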
 
Important Notes
  1. The least squares method is used to predict the behavior of the dependent variable with respect to the independent variable.

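  2. The sum of the squares of errors is called the residual sum of squares, or the sum of squared errors (SSE). It is related to, but not the same as, the variance.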

  3. The main aim of the least squares method is to minimize the sum of the squared errors.

Solved Examples

Example 1

 

 

Consider the set of points: \((1, 1)\), \((-2,-1)\), and \((3, 2)\).

Plot these points and the least-squares regression line in the same graph.

Solution

There are three points, so the value of \(n\) is 3.

\(X\) \(Y\) \(XY\) \(X^{2}\)
1 1 1 1
-2 -1 2 4
3 2 6 9
\(\sum X\)=\(2\) \(\sum Y\)=\(2\) \(\sum XY\)=\(9\) \(\sum X^{2}\)=\(14\)


Now, find the value of \(m\).

\[\begin{align}m&=\dfrac{n\sum (XY)-\sum Y \sum X}{n\sum (X^{2})-(\sum X)^{2}}\\m&=\dfrac{3(9)-(2\times 2)}{3(14)-(2)^{2}}\\m&=\dfrac{27-4}{42-4}\\m&=\dfrac{23}{38}\end{align}\]

Now, find the value of \(b\).

\[\begin{align}b&=\dfrac{\sum Y-m\sum X}{n}\\b&=\dfrac{2-\left(\dfrac{23}{38}\right)\times 2}{3}\\b&=\dfrac{2-\dfrac{23}{19}}{3}\\b&=\dfrac{15}{3 \times 19}\\b&=\dfrac{5}{19}\end{align}\]

So, the required equation of least squares is \(y=\dfrac{23}{38}x+\dfrac{5}{19}\).

The required graph is shown below.

Graph of the regression line

The equation of the regression line is \(y=\dfrac{23}{38}x+\dfrac{5}{19}\).
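
Because the answer involves fractions, it can be handy to check it with exact arithmetic. The following Python sketch (an illustration, not part of the original solution) uses the standard library's `fractions.Fraction` so that no rounding occurs.

```python
from fractions import Fraction

points = [(1, 1), (-2, -1), (3, 2)]
n = len(points)
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xy = sum(x * y for x, y in points)
sum_x2 = sum(x * x for x, _ in points)

# Same formulas as in the solution, kept as exact fractions
m = Fraction(n * sum_xy - sum_y * sum_x, n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n
print(m, b)  # 23/38 5/19
```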
Example 2

 

 

Consider the set of points: \((-1, 0)\), \((0, 2)\), \((1, 4)\), and \((k, 5)\).

The values of slope and \(y\)-intercept in the equation of least squares are 1.7 and 1.9 respectively.

Can you determine the value of \(k\)? 

Solution

Here, there are four data points, so the value of \(n\) is 4.

The slope of the least squares line, \(m=1.7\)

The value of \(y\)-intercept of the least-squares line, \(b=1.9\)

\(X\) \(Y\)
-1 0
0 2
1 4
k 5
\(\sum X=k\) \(\sum Y=11\)


Now, to evaluate the value of unknown \(k\), substitute \(m=1.7\), \(b=1.9\), \(\sum X=k\), and \(\sum Y=11\) in the formula \(b=\dfrac{\sum Y-m\sum X}{n}\).

\[\begin{align}b&=\dfrac{\sum Y-m\sum X}{n}\\1.9&=\dfrac{11-1.7k}{4}\\1.9\times 4&=11-1.7k\\1.7k&=11-7.6\\k&=\dfrac{3.4}{1.7}\\k&=2\end{align}\]

So, the value of \(k\) is 2.
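
The solution uses only the formula for \(b\), so it is worth confirming that \(k=2\) also reproduces the given slope. The Python sketch below (an illustrative check, not part of the original solution) fits the line to the four points with \(k=2\).

```python
k = 2
xs = [-1, 0, 1, k]
ys = [0, 2, 4, 5]

n = len(xs)
sum_x, sum_y = sum(xs), sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

m = (n * sum_xy - sum_y * sum_x) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - m * sum_x) / n
print(round(m, 2), round(b, 2))  # 1.7 1.9
```

Both values match the slope and \(y\)-intercept stated in the problem.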
Example 3

 

 

The following data shows the sales (in million dollars) of a company.

\(X\) 2015 2016 2017 2018 2019
\(Y\) 12 19 29 37 45


Can you estimate the sales in the year 2020 using the regression line?

Solution

Here, there are 5 data points, so the value of \(n\) is 5.

We will use the substitution \(t=x-2015\) to make the given data more manageable.

Here, \(t\) represents the number of years after 2015, and the \(X\) column of the table below holds these \(t\) values.

\(X\) \(Y\) \(XY\) \(X^{2}\)
0 12 0 0
1 19 19 1
2 29 58 4
3 37 111 9
4 45 180 16
\(\sum X\)=\(10\) \(\sum Y\)=\(142\) \(\sum XY\)=\(368\) \(\sum X^{2}\)=\(30\)


Now, find the value of \(m\).

\[\begin{align}m&=\dfrac{n\sum (XY)-\sum Y \sum X}{n\sum (X^{2})-(\sum X)^{2}}\\m&=\dfrac{5(368)-(142\times 10)}{5(30)-(10)^{2}}\\m&=\dfrac{1840-1420}{150-100}\\m&=\dfrac{42}{5}\\m&=8.4\end{align}\]

Now, find the value of \(b\).

\[\begin{align}b&=\dfrac{\sum Y-m\sum X}{n}\\b&=\dfrac{142-8.4\cdot10}{5}\\b&=\dfrac{142-84}{5}\\b&=11.6\end{align}\]

So, the equation of least squares is \(y(t)=8.4t+11.6\).

Now, for the year 2020, the value of \(t\) is \(2020-2015=5\).

The estimation of the sales in the year 2020 is given by substituting 5 for \(t\) in the equation \(y(t)=8.4t+11.6\).

\[\begin{align}y(t)&=8.4t+11.6\\y(5)&=8.4(5)+11.6\\&=42+11.6\\&=53.6\end{align}\]

The predicted sales in the year 2020 are \(\$53.6\text{ million}\).
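
The substitution \(t=x-2015\) only makes the arithmetic easier; it does not change the prediction. The Python sketch below (an illustration, with the hypothetical helper name `fit_line`) fits the line once with the shifted years and once with the raw years, and compares the two estimates for 2020.

```python
def fit_line(xs, ys):
    """Return (m, b) of the least squares line y = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_y * sum_x) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b


years = [2015, 2016, 2017, 2018, 2019]
sales = [12, 19, 29, 37, 45]  # in million dollars

# Fit with shifted years (t = x - 2015), as in the solution
m_t, b_t = fit_line([x - 2015 for x in years], sales)
pred_shifted = m_t * (2020 - 2015) + b_t

# Fit with the raw years, without the substitution
m_x, b_x = fit_line(years, sales)
pred_raw = m_x * 2020 + b_x

print(round(pred_shifted, 1), round(pred_raw, 1))  # 53.6 53.6
```

The slope is the same in both fits; only the intercept changes, because shifting the years moves where the line crosses the vertical axis.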
 
Think Tank
1. Can you construct the equation of least squares using the information of \(X\) and \(Y\) values given below?
  \(n=7\), \(\sum X=113\), \(\sum Y=182\)
  \(\sum XY=3186\), \(\sum X^{2}=1983\)


Let's Summarize

We hope you enjoyed learning about least squares with the examples and practice questions. Now, you will be able to solve problems using the least squares formula and find the regression line for a given set of data.



Frequently Asked Questions (FAQs)

1. What is ordinary least squares used for?

The ordinary least squares method is used to find the predictive model that best fits our data points.

2.  Is least squares the same as linear regression?

No, linear regression and least squares are not the same.

Linear regression is the statistical technique of modeling a quantitative variable as a linear function of one or more other variables in order to predict its value.

Least squares is one of the methods used in linear regression to fit that model to the data.

3. How do outliers affect the least squares regression line?

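The presence of unusual data points can skew the results of the linear regression. Because the errors are squared, a point far from the rest pulls the line toward itself more strongly than the other points do.

This makes checking the validity of the model critical for obtaining sound answers to the questions that motivated building it.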

  