One of the most popular machine learning courses is Andrew Ng's Machine Learning course on Coursera, offered by Stanford University. I tried a few other machine learning courses before, but I think he is the best at breaking concepts into pieces and making them very understandable.
But there is just one problem: all the assignments and instructions are in Matlab. I am a Python user and did not want to learn Matlab, so I learned the concepts from the lectures and developed all the algorithms in Python.
I explained all the algorithms in my own way (as simply as I could) and demonstrated the development of almost all of them in earlier articles. I thought I should summarise them all on one page so that it is easier for anyone who wants to follow along. Sometimes a little help goes a long way.
If you want to take Andrew Ng's Machine Learning course, you can audit the complete course for free as many times as you want.
Let's dive in!
Linear Regression
Linear regression is the most basic machine learning algorithm. It is based on the straight-line formula we all learned in school:
Y = AX + B
Remember? If not, no problem. This is a very simple formula. Here is the complete article that explains how it can be used to make predictions.
Linear Regression Algorithm from Scratch in Python: Step by Step
The most basic machine learning algorithm has to be the linear regression algorithm with a single variable.
Nowadays, so many advanced machine learning algorithms, libraries, and techniques are available that linear regression may seem unimportant. But it is always a good idea to learn the basics; that way you will grasp the concepts very clearly.
In this article, I will explain the linear regression algorithm step by step.
Ideas and Formulas
Linear regression uses the very basic idea of prediction. Here is the formula:
Y = C + BX.
We all learned this formula in school. Just to remind you, this is the equation of a straight line. Here, Y is the dependent variable, B is the slope and C is the intercept. Typically, for linear regression, it is written as:
h = theta0 + theta1 * X
Here, 'h' is the hypothesis, or the predicted dependent variable; X is the input feature; and theta0 and theta1 are the coefficients. The theta values are initialized randomly to start with. Then, using gradient descent, we update the theta values to minimize the cost function. Here is the explanation of the cost function and gradient descent.
Cost Function and Gradient Descent
The cost function measures how far the prediction is from the original dependent variable. Here is the formula for that:
J(theta0, theta1) = (1/(2m)) * sum((h - Y)^2)
The idea of any machine learning algorithm is to minimize the cost function so that the hypothesis is close to the original dependent variable. To do that, we need to optimize the theta values. If we take the partial derivative of the cost function with respect to theta0 and theta1 respectively, we get the gradients. To update the theta values, we subtract these gradient terms, scaled by the learning rate, from the corresponding theta values:
theta0 := theta0 - alpha * dJ/d(theta0)
theta1 := theta1 - alpha * dJ/d(theta1)
After taking the partial derivatives (using the chain rule: the derivative of h with respect to theta0 is 1, and with respect to theta1 it is X), the formulas above turn out to be:
theta0 := theta0 - (alpha/m) * sum(h - Y)
theta1 := theta1 - (alpha/m) * sum((h - Y) * X)
Here, m is the number of training examples and alpha is the learning rate. I am talking about single-variable linear regression; that's why there are only two theta values. If there are more variables, there will be a theta value for each variable.
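In the multi-variable case, the update is usually written in vectorized form. Here is a minimal NumPy sketch of one update step, assuming a feature matrix X with a leading column of ones for the intercept (this sketch is my own illustration, not code from the course):
import numpy as np

def gradient_step(theta, X, y, alpha):
    # One vectorized gradient descent step for multi-variable linear
    # regression. X is assumed to include a leading column of ones,
    # so theta[0] plays the role of the intercept.
    m = len(y)
    h = X @ theta                        # predictions for all m examples
    return theta - (alpha / m) * (X.T @ (h - y))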
Working Example
The dataset I am going to use is from Andrew Ng's machine learning course on Coursera. Here is the process of implementing linear regression step by step in Python.
- Import the packages and the dataset.
import numpy as np
import pandas as pd
df = pd.read_csv('ex1data1.txt', header=None)  # the file has no header row
df.head()
[Output of df.head(): the first five rows of the dataset]
In this dataset, column 0 is the input feature and column 1 is the output, or dependent, variable. We will use column 0 to predict column 1 using the straight-line formula above.
- Plot column 1 against column 0.
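A minimal sketch for this plot, assuming matplotlib (the same library used for the later plots):
import matplotlib.pyplot as plt
plt.scatter(df[0], df[1])           # column 1 against column 0
plt.xlabel('input feature (column 0)')
plt.ylabel('output (column 1)')
plt.show()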
[Figure: scatter plot of column 1 against column 0]
The relation between the input variable and the output variable is linear, and linear regression works best when the relationship is linear.
- Initialize the theta values. I am initializing the theta values as zeros, but any other values should work as well.
theta = [0, 0]
- Define the hypothesis and the cost function as per the formulas discussed before.
def hypothesis(theta, X):
    return theta[0] + theta[1] * X

def cost_calc(theta, X, y):
    return (1 / (2 * m)) * np.sum((hypothesis(theta, X) - y) ** 2)
- Calculate the number of training examples as the length of the DataFrame, and then define the function for gradient descent. In this function, we will update the theta values until the cost function reaches its minimum. It may take any number of iterations; in each iteration it updates the theta values, and with each updated set of theta values we calculate the cost to keep track of it.
m = len(df)
def gradient_descent(theta, X, y, epoch, alpha):
    cost = []
    i = 0
    while i < epoch:
        hx = hypothesis(theta, X)
        # update the theta values using the derived formulas
        theta[0] -= alpha * np.sum(hx - y) / m
        theta[1] -= alpha * np.sum((hx - y) * X) / m
        cost.append(cost_calc(theta, X, y))  # track the cost in each iteration
        i += 1
    return theta, cost
- Finally, define the predict function. It will get the updated theta values from the gradient descent function and predict the hypothesis, or the predicted output variable.
def predict(theta, X, y, epoch, alpha):
    theta, cost = gradient_descent(theta, X, y, epoch, alpha)
    return hypothesis(theta, X), cost, theta
- Using the predict function, find the hypothesis, cost, and updated theta values. I chose the learning rate as 0.01 and will run this algorithm for 2000 epochs, or iterations.
y_predict, cost, theta = predict(theta, df[0], df[1], 2000, 0.01)
The final theta values are -3.79 and 1.18.
- Plot the original y and the hypothesis, the predicted y, on the same graph.
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure()
plt.scatter(df[0], df[1], label='Original y')
plt.scatter(df[0], y_predict, label='Predicted y')
plt.legend(loc="upper left")
plt.xlabel("Input feature")
plt.ylabel("Original and predicted output")
plt.show()
[Figure: original y and predicted y against the input feature]
The hypothesis plot is a straight line, as expected from the formula, and the line passes through the data in an optimal position.
- Remember, we kept track of the cost in each iteration. Let's plot the cost function.
plt.figure()
plt.scatter(range(0, len(cost)), cost)
plt.show()
[Figure: cost per iteration]
As I mentioned before, our purpose was to optimize the theta values to minimize the cost. As you can see from this graph, the cost went down drastically in the beginning and then became stable. That means the theta values were optimized correctly, as we expected.
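With the optimized theta values, predicting for a new input is just the hypothesis formula again. Here is a small sketch; the input value 6.5 is a made-up example, and the np.polyfit comparison is a sanity check of my own, not part of the original walkthrough:
new_x = 6.5                                      # hypothetical new input
print(hypothesis(theta, new_x))                  # theta[0] + theta[1] * new_x

slope, intercept = np.polyfit(df[0], df[1], 1)   # least-squares line fit
print(intercept, slope)                          # should be close to theta[0], theta[1]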
I hope this was helpful. Here is the link to the dataset used in this article: