Simple Linear Regression

Suppose you are given a mystery to solve. But instead of a magnifying glass and a trench coat, you are armed with data and a curious mind. Your mission is to uncover the hidden connection between two variables, to reveal the story lurking beneath the surface. Welcome to the world of simple linear regression, where data is your clue and the regression equation is your secret decoder.

Welcome to my new blog on one of the most important and widely used supervised learning techniques: Simple Linear Regression. Supervised learning techniques, a subset of machine learning, are used in real-world applications across many domains. They involve training models on labelled data to make predictions or classifications. Simple Linear Regression is one such technique: a fundamental statistical method used to understand the relationship between two continuous variables. It establishes a linear relationship between a dependent variable and an independent variable, and the goal is to find the best-fitting straight line, also known as the regression line or the least-squares line, that describes the relationship between the two variables. We will dive deep into these concepts in this blog.

What does Simple Linear Regression mean?

Simple Linear Regression is a supervised learning technique used to model the relationship between two continuous variables. It assumes that there is a linear relationship between a dependent variable Y and an independent variable X, and the goal is to find the linear equation that best fits the data. The equation for Simple Linear Regression is:

Y = aX + b

Where:
- Y is the dependent (response) variable,
- X is the independent (predictor) variable,
- a is the slope of the regression line, and
- b is the intercept.

The primary objective of simple linear regression is to estimate the values of a and b that minimize the sum of the squared differences between the observed values of Y and the values predicted by the linear equation. This is often referred to as the method of least squares.

Now let us look at some interesting facts about Simple Linear Regression from Google Trends. It has been observed that, on average, the term is searched nearly 60 times per month worldwide. Looking at the countries where the concept is searched the most, Ethiopia tops the list at around 100 searches per month. These figures show how widely supervised learning concepts like this one are searched around the world.

Concepts of Simple Linear Regression

The steps followed in Simple Linear Regression, which the worked example below walks through, are: import and explore the data, fit a straight line to it using OLS, inspect the fitted coefficients and the goodness of fit, and evaluate the predictions.

OLS method in Simple Linear Regression

OLS stands for Ordinary Least Squares, and it is the method used in simple linear regression to estimate the parameters of the linear relationship between two variables. It finds the line that best fits the data by minimizing the sum of the squared residuals.

Now let us solve one problem using the Simple Linear Regression technique. We will use a waist circumference and adipose tissue dataset, where the task is to predict the adipose tissue (AT) of the body based on the waist circumference (Waist). Let us take a look at the dataset and then go through the coding part step by step.

Step 1. Import the necessary libraries and the dataset. Reading the CSV file gives us the DataFrame we will work with.

Step 2. Check the dimensions of the data and get its statistical description. The data dimension is (109, 2), which means 109 rows and 2 columns are present, and summary statistics such as the mean, standard deviation, minimum and maximum are reported for the dataset.
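As a quick illustration of Steps 1 and 2, here is a minimal sketch in Python. The file name wc_at.csv and the column names Waist and AT are assumptions based on the description in this post, not necessarily the exact names used in the original notebook.

```python
# Minimal sketch of Steps 1 and 2.
# Assumption: the data sits in a CSV named "wc_at.csv" with columns "Waist" and "AT".
import pandas as pd

# Step 1: import the dataset into a DataFrame
wcat = pd.read_csv("wc_at.csv")
print(wcat.head())

# Step 2: check the dimensions and get a statistical description
print(wcat.shape)       # expected: (109, 2), i.e. 109 rows and 2 columns
print(wcat.describe())  # mean, std, min, max and quartiles for each column
```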
After the preprocessing, we move on to model building.

Step 3. Import the required library for Simple Linear Regression and fit the model. The statsmodels package is imported, and its ols model is used to obtain the coefficients of the linear regression equation that relates the dependent and independent variables. In our case Waist is the independent variable and AT is the dependent variable, and the OLS method minimizes the sum of squared errors between the observed and predicted values. Checking the output and the summary of the fitted model, the equation we get is AT = 3.4589 * Waist - 215.9815. The R-squared and adjusted R-squared values measure the goodness of fit: the model explains about 67% of the variability in the data, so it fits reasonably well. (Compact code sketches for Steps 3 to 7 are given at the end of this post.)

Step 4. Check the predicted values and work out the error. The RMSE comes out to about 32.76.

Step 5. Now let us try a transformation of the data. The R-squared value rises to 0.675, an improvement over the earlier model, and the RMSE is also slightly lower than that of the first model. Even so, many observed points still do not lie on the predicted line.

Step 6. Let us apply a polynomial transformation. This model is better than the previous one because its R-squared value is higher. The equation of this model is AT = -7.8241 + 0.2289 * Waist - 0.0010 * Waist².

Step 7. Check the predicted values and calculate the RMSE. Most of the observed values now lie on the line, and the error of 32.24 is lower than for the other models we have seen. Thus, the model with the polynomial transformation gives the least error. The errors for the train and test data are close to each other and small, so this is a right-fit model.

Some Applications of Simple Linear Regression

Simple Linear Regression is used in many areas where a linear relationship between two variables persists, and plenty of such areas remain to be explored.

Conclusion

In this blog we learnt what the Simple Linear Regression model is, why it is widely used in Data Science, its applications in the real world, and a code example with the waist circumference and adipose tissue dataset. In conclusion, simple linear regression offers unique insight into the linear relationship between two variables, making it a valuable tool for understanding and predicting outcomes in various fields. Simple linear regression will continue to be used, alongside more advanced statistical techniques, for many purposes in data analysis and decision-making; its simplicity, interpretability and historical significance ensure that it will remain a valuable and relevant statistical technique in the future.
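For readers who want to reproduce the analysis, the two sketches below cover the modelling steps walked through above. First, Steps 3 and 4: fitting the base model with the statsmodels formula API and computing the RMSE. The file and column names are the same assumptions as in the earlier sketch.

```python
# Sketch of Steps 3 and 4 (file and column names are assumptions).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

wcat = pd.read_csv("wc_at.csv")

# Step 3: fit AT ~ Waist by ordinary least squares and inspect the summary
model1 = smf.ols("AT ~ Waist", data=wcat).fit()
print(model1.summary())  # coefficients, R-squared, adjusted R-squared

# Step 4: predict on the data and compute the root mean squared error
pred1 = model1.predict(wcat)
rmse1 = np.sqrt(np.mean((wcat["AT"] - pred1) ** 2))
print("RMSE of the base model:", rmse1)
```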
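Next, a sketch of Steps 5 to 7. The post does not say which transformation was used in Step 5, so a log transform of the predictor is shown here purely as one common choice; the quadratic model and the train/test comparison follow the description above.

```python
# Sketch of Steps 5 to 7 (file and column names are assumptions;
# the log transform in Step 5 is an assumption, not necessarily the post's choice).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

wcat = pd.read_csv("wc_at.csv")

def rmse(y_true, y_pred):
    # root mean squared error between observed and predicted values
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

# Step 5: a transformed model (here, log of the predictor)
model2 = smf.ols("AT ~ np.log(Waist)", data=wcat).fit()
print("log model  R2 / RMSE:", model2.rsquared, rmse(wcat["AT"], model2.predict(wcat)))

# Step 6: a quadratic (polynomial) model in Waist
model3 = smf.ols("AT ~ Waist + I(Waist ** 2)", data=wcat).fit()
print(model3.params)  # intercept, Waist and Waist**2 coefficients
print("poly model R2 / RMSE:", model3.rsquared, rmse(wcat["AT"], model3.predict(wcat)))

# Step 7: compare train and test RMSE to check for a right fit
train = wcat.sample(frac=0.8, random_state=42)
test = wcat.drop(train.index)
final = smf.ols("AT ~ Waist + I(Waist ** 2)", data=train).fit()
print("train RMSE:", rmse(train["AT"], final.predict(train)))
print("test  RMSE:", rmse(test["AT"], final.predict(test)))
```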