In this machine learning tutorial, we're looking at another preprocessing method: scaling.
In many ML projects, features need some kind of preprocessing before they are fed to the learning algorithm. For some algorithms this isn't optional: they require it in order to train efficiently.
scikit-learn provides several scalers that can be applied to the data, depending on the context. Some of them are:
- StandardScaler
- MinMaxScaler
- Normalizer
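To make the difference between these three concrete, here is a minimal sketch that runs each one on the same small array. The array `X` is hypothetical toy data invented for illustration, not the dataset used in the tutorial, and all three scalers are used with their default settings.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Hypothetical toy data: 4 samples (rows), 2 features (columns)
# on very different scales.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0],
              [4.0, 500.0]])

# StandardScaler works per column: each feature ends up with
# mean 0 and unit variance.
X_std = StandardScaler().fit_transform(X)

# MinMaxScaler works per column: each feature is mapped linearly
# onto the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Normalizer works per row: each sample is rescaled so that its
# L2 norm equals 1 (the default norm is "l2").
X_norm = Normalizer().fit_transform(X)

print(X_std)
print(X_minmax)
print(X_norm)
```

Note that StandardScaler and MinMaxScaler operate on features (columns), while Normalizer operates on individual samples (rows), so they are not interchangeable.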
In our example, we're applying the MinMaxScaler to our dataset. Its purpose is to bring the data into the 0 to 1 range. Please see the video below for the full tutorial.
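As a rough sketch of what MinMaxScaler does, the snippet below fits it on a small training array and then reuses the fitted scaler on new data. The arrays `X_train` and `X_test` are made-up values for illustration only, not the dataset from the video, and the scaler keeps its default `feature_range=(0, 1)`.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical single-feature data, invented for this example.
X_train = np.array([[10.0], [20.0], [30.0], [50.0]])
X_test = np.array([[30.0], [60.0]])

scaler = MinMaxScaler()  # default feature_range is (0, 1)

# fit_transform learns the per-feature min and max from the
# training data and scales that data in one step.
X_train_scaled = scaler.fit_transform(X_train)

# transform reuses the min and max learned from the training data.
X_test_scaled = scaler.transform(X_test)

# The same result computed by hand: (x - min) / (max - min).
col_min = X_train.min(axis=0)
col_max = X_train.max(axis=0)
manual = (X_train - col_min) / (col_max - col_min)

print(X_train_scaled.ravel())  # 0, 0.25, 0.5, 1
print(X_test_scaled.ravel())   # 0.5, 1.25
```

Because the scaler only knows the minimum and maximum of the data it was fitted on, new values outside that range (60 here) land outside 0 to 1.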
To stay in touch with me, follow @cristi
Cristi Vlad Self-Experimenter and Author
Not quite sure how to phrase this question, but here goes. For feature data with important outliers, would it make more sense to log transform the entire dataset before applying this scaler, or is something else recommended?
You would use L2 Normalization which takes into account your outliers. This is covered in #38.
Awesome. Thanks!
Very good.