Today we begin learning about another machine learning algorithm in scikit-learn: the Decision Tree classifier.
I introduce and explain a few Decision Tree concepts. Then we create a classifier for the cancer dataset, train it, and evaluate its performance. As you will see, the tree overfits by default: it reaches 100% accuracy on the training subset of the data, a sign that it has memorized the training samples rather than learned rules that generalize.
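For readers who prefer to follow along in code, here is a minimal sketch of that workflow. The split settings (`stratify`, `random_state=42`) and the tree's `random_state=0` are my assumptions for reproducibility, not necessarily the values used in the video.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load the cancer dataset that ships with scikit-learn.
cancer = load_breast_cancer()

# Hold out part of the data for evaluation (split settings are assumptions).
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Default parameters: the tree keeps splitting until every leaf is pure.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"Training accuracy: {train_acc:.3f}")
print(f"Test accuracy:     {test_acc:.3f}")
```

A perfect training score next to a lower test score is the overfitting pattern described above.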
We want to avoid overfitting and to do that we'll have to modify the default parameters and retrain the algorithm. See the video walkthrough below.
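One common way to rein in a decision tree is to limit how deep it can grow. The sketch below uses `max_depth=4`, which is an illustrative value I picked and not necessarily the one from the video; `min_samples_leaf` and `max_leaf_nodes` are other parameters that could serve the same purpose.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Pre-pruning: stop splitting once the tree is 4 levels deep (assumed value).
pruned_tree = DecisionTreeClassifier(max_depth=4, random_state=0)
pruned_tree.fit(X_train, y_train)

pruned_train_acc = pruned_tree.score(X_train, y_train)
pruned_test_acc = pruned_tree.score(X_test, y_test)
print(f"Tree depth:        {pruned_tree.get_depth()}")
print(f"Training accuracy: {pruned_train_acc:.3f}")
print(f"Test accuracy:     {pruned_test_acc:.3f}")
```

Comparing these two scores with those of the default tree shows whether the restriction narrowed the gap between training and test accuracy.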
As a reminder:
In this series I'm exploring the cancer dataset that comes pre-loaded with scikit-learn. It consists of labeled data: 569 tumor samples, each labeled malignant or benign. The purpose is to train classifiers on this dataset and then use them on new, unlabeled data.
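If you want to check the dataset yourself, a quick sketch like the following loads it and prints its size and class balance. The variable names are just my own choices.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# One row per tumor sample, one column per measured feature.
n_samples, n_features = cancer.data.shape
print(f"Samples: {n_samples}, features: {n_features}")

# Count how many samples fall into each class (0 = malignant, 1 = benign).
class_counts = dict(zip(cancer.target_names, np.bincount(cancer.target)))
print(class_counts)
```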
Previous videos in this series:
- Machine Learning on a Cancer Dataset - Part 1
- Machine Learning on a Cancer Dataset - Part 2
- Machine Learning on a Cancer Dataset - Part 3
- Machine Learning on a Cancer Dataset - Part 4
- Machine Learning on a Cancer Dataset - Part 5
- Machine Learning on a Cancer Dataset - Part 6
- Machine Learning on a Cancer Dataset - Part 7
- Machine Learning on a Cancer Dataset - Part 8
- Machine Learning on a Cancer Dataset - Part 9
- Machine Learning on a Cancer Dataset - Part 10
To stay in touch with me, follow @cristi
#machine-learning #science #python
Cristi Vlad, Self-Experimenter and Author