Machine Learning on a Cancer Dataset - Part 11

in machine-learning •  8 years ago 

Today we begin learning about another machine learning algorithm in scikit-learn: the Decision Tree classifier.

I introduce and explain a few concepts of Decision Trees. Then we create a classifier for the cancer dataset, train it, and evaluate its performance. As you will see, with the default parameters the tree overfits: its accuracy on the training subset of the data is 100%, a sign that it has memorized the training samples rather than learned patterns that generalize to unseen data.
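
Here is a minimal sketch of what that could look like in code. The video may differ in the details: the 75/25 split and the two `random_state` values are my assumptions, chosen only to make the run reproducible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()

# Assumed split: scikit-learn's default 75% / 25%, with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42
)

# Default parameters: the tree keeps splitting until every leaf is pure.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)

train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)
print(f"Accuracy on the training subset: {train_acc:.3f}")
print(f"Accuracy on the test subset:     {test_acc:.3f}")
```

A perfect score on the training subset paired with a lower score on the test subset is the overfitting pattern described above.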

To avoid overfitting, we'll have to modify the default parameters and retrain the classifier. See the video walkthrough below.
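
One common way to do this, sketched below, is to cap the depth of the tree with `max_depth`. Both the choice of this parameter and the value 4 are my assumptions for illustration, and so are the split and the `random_state` values. Other options exist, such as `min_samples_leaf` or `max_leaf_nodes`.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()

# Assumed split: scikit-learn's default 75% / 25%, with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, random_state=42
)

# Assumed setting: stop growing the tree after 4 levels of splits
# instead of letting it grow until every leaf is pure.
pruned_tree = DecisionTreeClassifier(max_depth=4, random_state=0)
pruned_tree.fit(X_train, y_train)

pruned_train_acc = pruned_tree.score(X_train, y_train)
pruned_test_acc = pruned_tree.score(X_test, y_test)
print(f"Tree depth: {pruned_tree.get_depth()}")
print(f"Accuracy on the training subset: {pruned_train_acc:.3f}")
print(f"Accuracy on the test subset:     {pruned_test_acc:.3f}")
```

A shallower tree can no longer memorize every training sample, so the training accuracy usually drops a little. What matters is whether the accuracy on the test subset holds up or improves.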


As a reminder:

In this series I'm going to explore the cancer dataset that comes pre-loaded with scikit-learn. The purpose is to train classifiers on this labeled dataset (569 tumor samples, each labeled malignant or benign) and then use them on new, unlabeled data.
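
For a quick look at the dataset itself, a short sketch like the following loads it and counts the samples in each class. The variable names are my own.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

cancer = load_breast_cancer()

# Each row is one tumor sample, each column one measured feature.
n_samples, n_features = cancer.data.shape

# target holds 0/1 labels; target_names maps them to class names.
counts = {
    str(name): int(count)
    for name, count in zip(cancer.target_names, np.bincount(cancer.target))
}

print(f"Samples: {n_samples}, features per sample: {n_features}")
print(f"Samples per class: {counts}")
```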


Previous videos in this series:

  1. Machine Learning on a Cancer Dataset - Part 1
  2. Machine Learning on a Cancer Dataset - Part 2
  3. Machine Learning on a Cancer Dataset - Part 3
  4. Machine Learning on a Cancer Dataset - Part 4
  5. Machine Learning on a Cancer Dataset - Part 5
  6. Machine Learning on a Cancer Dataset - Part 6
  7. Machine Learning on a Cancer Dataset - Part 7
  8. Machine Learning on a Cancer Dataset - Part 8
  9. Machine Learning on a Cancer Dataset - Part 9
  10. Machine Learning on a Cancer Dataset - Part 10


To stay in touch with me, follow @cristi

#machine-learning #science #python


Cristi Vlad, Self-Experimenter and Author


Great! Thank you :-)

Keep up the good work!