This is the fourth video tutorial on support vector machines (SVMs) with scikit-learn on the cancer dataset. In the last video, we trained a support vector classifier (SVC) with an RBF kernel and all default parameters on our dataset.
We noticed how it overfits the training data (reaching 100% accuracy on it) and how poorly it performs on the test subset. Several things can cause this kind of drop in performance, among them the scale of the data and the choice of hyper-parameters.
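For anyone who wants to reproduce that gap between training and test accuracy, here is a minimal sketch. It is not the exact code from the video: the split settings (`stratify`, `random_state=42`) are my assumptions, and I set `gamma="auto"` explicitly because that was the default in older scikit-learn releases, while newer releases default to `gamma="scale"`, which already accounts for the variance of the data and behaves differently.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

cancer = load_breast_cancer()

# Assumed split settings; the video may have used different ones.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# gamma="auto" means 1 / n_features. It is set explicitly here (an assumption)
# to mimic the old default; C stays at its default value of 1.0.
svm = SVC(kernel="rbf", gamma="auto")
svm.fit(X_train, y_train)

train_acc = svm.score(X_train, y_train)
test_acc = svm.score(X_test, y_test)

print(f"Accuracy on the training subset: {train_acc:.3f}")
print(f"Accuracy on the test subset:     {test_acc:.3f}")
```

The exact numbers depend on the split and on your scikit-learn version, so treat the output as illustrative rather than as a benchmark.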
In this video we're gonna use matplotlib to visualize our data and to understand what it means for it to be unscaled: the values within a single feature can span a wide range, and the features themselves differ from one another by several orders of magnitude.
Then, in the next tutorial, we're gonna try to remedy this issue by scaling the data. Please watch the video below for the full scoop.
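If you prefer to follow along in code, one possible way to see the difference in magnitudes is to plot the minimum and maximum of every feature on a logarithmic axis. This is only a sketch of the idea and may differ from what the video shows: the figure size, the markers, and the output file name `feature_magnitudes.png` are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()
X = cancer.data

# Smallest and largest value of each of the 30 features.
feature_min = X.min(axis=0)
feature_max = X.max(axis=0)

fig, ax = plt.subplots(figsize=(10, 5))
positions = list(range(X.shape[1]))
ax.plot(positions, feature_min, "o", label="min")
ax.plot(positions, feature_max, "^", label="max")

# A log axis makes the gap between features visible.
# Features whose minimum is exactly 0 cannot be drawn on a log axis.
ax.set_yscale("log")
ax.set_xticks(positions)
ax.set_xticklabels(cancer.feature_names, rotation=90)
ax.set_xlabel("Feature")
ax.set_ylabel("Feature magnitude (log scale)")
ax.legend(loc="upper right")
fig.tight_layout()
fig.savefig("feature_magnitudes.png")  # hypothetical output file name
```

Features such as the area measurements reach into the thousands, while others stay well below 1, and an RBF kernel is sensitive to exactly that kind of mismatch.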
As a reminder:
In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this dataset, which consists of labeled data (569 tumor samples, each labeled malignant or benign), and then use them on new, unlabeled data.
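As a quick sketch of what "pre-loaded" and "labeled" mean in practice, the dataset can be loaded with `load_breast_cancer` and inspected like this. The variable names are just for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Each row is one tumor sample, each column one measured feature.
n_samples, n_features = cancer.data.shape
print(f"samples: {n_samples}, features: {n_features}")

# cancer.target holds the label of every sample as an integer index
# into cancer.target_names.
class_counts = {
    str(name): int(count)
    for name, count in zip(cancer.target_names, np.bincount(cancer.target))
}
print(class_counts)
```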
Previous videos in this series:
- Machine Learning on a Cancer Dataset - Part 20
- Machine Learning on a Cancer Dataset - Part 21
- Machine Learning on a Cancer Dataset - Part 22
- Machine Learning on a Cancer Dataset - Part 23
- Machine Learning on a Cancer Dataset - Part 24
- Machine Learning on a Cancer Dataset - Part 25
- Machine Learning on a Cancer Dataset - Part 26
- Machine Learning on a Cancer Dataset - Part 27
To stay in touch with me, follow @cristi
Cristi Vlad, Self-Experimenter and Author
Very Cool Post!
thanks!
I'm discovering your series just now. Have you ever done any videos about logistic regression?
yes, there are 3 in the current series. for the first one, look here.
Thank you!
Extremely interesting @cristi! I looked into a summer research project modeling brain tumor growth. But I wasn't allowed to due to a lack of ethics training and obtainable data. I'm just wondering what your position on developing a machine-learning model to help predict tumor growth is?
I think it would be difficult, but approachable...
I agree, it is definitely approachable, and I like to think very achievable. But the problem with coming up with a prediction for tumor growth for some patient 'A' is that your algorithm might be structured correctly and work great, yet the data you/I/we used could be biased, and therefore 'A' would receive an inaccurate prediction as to how much their tumor might grow. Consequently, 'A' could be given treatment that is too heavy and unnecessary, or too light and ineffective.
nice project, I think that's very good
Great post. Check out my profile if you have time :)