12th Com Maths Part 2 Chapter 3 (Digest) Maharashtra state board

Chapter 3 Linear regression


Project on Linear regression


Linear regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. The most basic form, simple linear regression, involves one dependent variable and one independent variable. Multiple linear regression involves one dependent variable and multiple independent variables.

Simple Linear Regression

In simple linear regression, the relationship between the dependent variable $y$ and the independent variable $x$ is modeled with the equation:

$$y = \beta_0 + \beta_1 x + \epsilon$$

where:

  • $y$ is the dependent variable.
  • $x$ is the independent variable.
  • $\beta_0$ is the y-intercept (the value of $y$ when $x = 0$).
  • $\beta_1$ is the slope of the line (the change in $y$ for a one-unit change in $x$).
  • $\epsilon$ is the error term (the difference between the observed and predicted values of $y$).
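As a quick illustration, with hypothetical coefficients $\beta_0 = 2$ and $\beta_1 = 0.5$ (chosen purely for this example, not taken from any dataset), the fitted line can be evaluated directly:

```python
# Hypothetical coefficients, chosen only for illustration
beta0, beta1 = 2.0, 0.5

def predict(x):
    # Predicted y for a given x under the fitted line (error term omitted,
    # since epsilon is unobserved at prediction time)
    return beta0 + beta1 * x

print(predict(4.0))  # 2.0 + 0.5 * 4.0 = 4.0
```

For $x = 4$ the line predicts $y = 4$; the residual $\epsilon$ would be the gap between this prediction and an actual observation.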

Multiple Linear Regression

In multiple linear regression, the relationship involves more than one independent variable:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon$$

where:

  • $y$ is the dependent variable.
  • $x_1, x_2, \ldots, x_p$ are the independent variables.
  • $\beta_0$ is the y-intercept.
  • $\beta_1, \beta_2, \ldots, \beta_p$ are the coefficients of the independent variables.
  • $\epsilon$ is the error term.

Objective

The objective of linear regression is to find the best-fitting line (or hyperplane in the case of multiple regression) that minimizes the sum of the squared differences between the observed values and the values predicted by the linear model. This is known as the least squares criterion.

Finding the Coefficients

The coefficients $\beta_0$ and $\beta_1$ (in simple linear regression) or $\beta_0, \beta_1, \beta_2, \ldots, \beta_p$ (in multiple linear regression) are determined using the method of least squares. The formulas for these coefficients minimize the sum of squared residuals (the differences between observed and predicted values).
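In the simple case, the least-squares coefficients have closed-form solutions: $\beta_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$ and $\beta_0 = \bar{y} - \beta_1 \bar{x}$. A minimal NumPy sketch of these formulas (the data points are invented for illustration, and NumPy is assumed to be available):

```python
import numpy as np

# Toy data, invented for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.0])

x_bar, y_bar = x.mean(), y.mean()

# Slope: covariance of x and y divided by variance of x
beta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: forces the fitted line through (x_bar, y_bar)
beta0 = y_bar - beta1 * x_bar

print(beta0, beta1)  # approximately 0.09 and 1.97
```

Note that the fitted line always passes through the point of means $(\bar{x}, \bar{y})$, which is exactly what the intercept formula enforces.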

In matrix notation for multiple linear regression, the equation is written as:

$$\mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon}$$

where:

  • $\mathbf{y}$ is an $n \times 1$ vector of the dependent variable.
  • $\mathbf{X}$ is an $n \times (p+1)$ matrix of the independent variables (including a column of ones for the intercept).
  • $\boldsymbol{\beta}$ is a $(p+1) \times 1$ vector of the coefficients.
  • $\boldsymbol{\epsilon}$ is an $n \times 1$ vector of the error terms.

The least squares solution is given by:

$$\boldsymbol{\beta} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}$$
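A sketch of this matrix solution in NumPy (the toy data is constructed from known coefficients $\beta = (1, 2, 3)$ so the result can be checked; in practice `np.linalg.solve` on the normal equations $(\mathbf{X}^T\mathbf{X})\boldsymbol{\beta} = \mathbf{X}^T\mathbf{y}$ is preferred over forming the inverse explicitly, as it is numerically safer):

```python
import numpy as np

# Two predictors, n = 5 observations (toy data for illustration)
X_raw = np.array([[1.0, 2.0],
                  [2.0, 1.0],
                  [3.0, 4.0],
                  [4.0, 3.0],
                  [5.0, 5.0]])
# Responses generated from y = 1 + 2*x1 + 3*x2 with no noise,
# so the least-squares solution recovers those coefficients exactly
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])

# Prepend a column of ones so the first coefficient is the intercept
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Solve the normal equations (X^T X) beta = X^T y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # approximately [1. 2. 3.]
```

With real, noisy data the recovered coefficients would only approximate the true ones; here the noise-free construction makes the check exact.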

Assumptions of Linear Regression

  1. Linearity: The relationship between the dependent and independent variables is linear.
  2. Independence: Observations are independent of each other.
  3. Homoscedasticity: The variance of the residuals is constant across all levels of the independent variable(s).
  4. Normality: The residuals (errors) are normally distributed.

Evaluating the Model

To evaluate the performance of a linear regression model, several metrics are commonly used:

  • R-squared: Indicates the proportion of the variance in the dependent variable that is predictable from the independent variable(s).
  • Adjusted R-squared: Adjusts the R-squared value based on the number of predictors in the model.
  • Mean Squared Error (MSE): The average of the squares of the residuals.
  • Root Mean Squared Error (RMSE): The square root of the MSE, providing a measure of the average magnitude of the errors.
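The metrics above can be computed directly from the residuals. A short NumPy sketch (the observed and predicted values are made up for illustration):

```python
import numpy as np

# Toy observed and predicted values, invented for illustration
y_obs  = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 6.5, 9.5])

residuals = y_obs - y_pred

mse  = np.mean(residuals ** 2)                 # average squared residual
rmse = np.sqrt(mse)                            # same units as y

# R-squared: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(mse, rmse, r2)  # 0.25, 0.5, 0.95
```

An $R^2$ of 0.95 here means 95% of the variance in the observed values is explained by the predictions; RMSE, unlike MSE, is in the same units as $y$, which makes it easier to interpret.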

Conclusion

Linear regression is a fundamental technique in statistics and machine learning for modeling and analyzing relationships between variables. Its simplicity and interpretability make it a widely used method for predictive modeling and data analysis.