Machine Learning with Scikit-Learn - [Part 37]

in machine-learning •  7 years ago 

In this machine learning tutorial we look at another preprocessing method: scaling.

In many ML projects, features need some kind of preprocessing before they are fed to the learning algorithm. Some algorithms have specific preprocessing requirements, so this step is not optional: you have to do it for the algorithm to train efficiently.

Scikit-learn offers several scalers that can be applied to the data, depending on the context. Some of these are:

  • StandardScaler
  • MinMaxScaler
  • Normalizer
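To make the difference between these three concrete, here is a minimal sketch that runs each of them on a tiny array. The data is made up purely for illustration and is not the dataset from the video. Note that StandardScaler and MinMaxScaler work per feature (column), while Normalizer works per sample (row).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, Normalizer

# Toy data (invented for this sketch): 3 samples, 2 features on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Per column: subtract the mean and divide by the standard deviation
X_std = StandardScaler().fit_transform(X)

# Per column: map the minimum to 0 and the maximum to 1
X_minmax = MinMaxScaler().fit_transform(X)

# Per row: divide each sample by its L2 norm so every row has length 1
X_norm = Normalizer().fit_transform(X)

print("StandardScaler column means:", X_std.mean(axis=0))
print("MinMaxScaler column min/max:", X_minmax.min(axis=0), X_minmax.max(axis=0))
print("Normalizer row norms:", np.linalg.norm(X_norm, axis=1))
```

Which one fits best depends on the algorithm and the data, so treat this as a quick comparison rather than a recommendation.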

In our example we apply the MinMaxScaler to our dataset. By default it rescales each feature to the 0 to 1 range. Please see the video below for the full tutorial.
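For readers who prefer text to video, the following is a rough sketch of how MinMaxScaler is typically used. The arrays `X_train` and `X_test` are hypothetical stand-ins, not the data from the tutorial. The scaler is fitted on the training data only and then reused on the test data, and the last lines check the result against the formula (x - min) / (max - min).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training and test data (invented for this sketch)
X_train = np.array([[10.0, 0.5],
                    [20.0, 1.5],
                    [30.0, 2.5]])
X_test = np.array([[25.0, 3.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)

# Learn each column's min and max from the training data, then scale it
X_train_scaled = scaler.fit_transform(X_train)

# Reuse the training min and max on the test data (no refitting)
X_test_scaled = scaler.transform(X_test)

# The same computation by hand, column by column
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
X_train_manual = (X_train - col_min) / (col_max - col_min)

print(X_train_scaled)
print(X_test_scaled)
print(np.allclose(X_train_scaled, X_train_manual))
```

One thing to keep in mind: test values that lie beyond the training minimum or maximum end up outside the 0 to 1 range. In this sketch the second test feature scales to 1.25.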



To stay in touch with me, follow @cristi


Cristi Vlad Self-Experimenter and Author


Not quite sure how to phrase this question, but here goes. For feature data with important outliers, would it make more sense to log transform the entire dataset before applying this scaler, or is something else recommended?

You would use L2 normalization, which takes your outliers into account. This is covered in #38.

Awesome. Thanks!

Very good.