Data Science | Data Preprocessing with Orange Tool
In this blog, we will learn how to preprocess data in Orange. We will perform preprocessing tasks such as discretization, randomization, and normalization with the help of Orange's preprocessing functions.
Discretization
Data discretization converts a large number of continuous values into a small number of intervals so that the data becomes easier to evaluate and manage. In other words, it maps the values of a continuous attribute onto a finite set of intervals with minimal loss of information. In this example, I have used brown-selected, one of the datasets bundled with Orange. It is a yeast gene-expression dataset whose attributes are continuous, which makes it a good candidate for discretization. The Discretize function from Orange is used to perform the discretization.
import Orange
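To make the idea concrete, here is a minimal sketch of equal-width binning in plain Python. It does not use Orange and is not how Discretize is implemented; the function name equal_width_bins and the sample values are hypothetical and chosen only for illustration.

```python
def equal_width_bins(values, n_bins):
    """Map each value to the 0-based index of one of n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column cannot be split; put every value in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum would fall just past the last interval, so clamp it into it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Hypothetical measurements (not taken from brown-selected).
measurements = [0.1, 0.4, 0.35, 0.8, 1.0, 0.05]
bins = equal_width_bins(measurements, 3)
print(bins)
```

Each continuous value is replaced by an interval index, so six distinct numbers collapse into at most three categories.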
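Continuization
Continuization is the opposite of discretization: it converts discrete (categorical) attributes into continuous ones, for example by replacing a categorical attribute with a set of 0/1 indicator columns. This is useful for methods that can only work with numeric data. In Orange, the Continuize function is used to perform continuization.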
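A common way to turn a categorical attribute into numeric columns is indicator (one-hot) encoding. The sketch below shows that idea in plain Python; the helper one_hot and the colour values are hypothetical, and this is an illustration of the technique rather than Orange's own code.

```python
def one_hot(values):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1.0 if v == c else 0.0 for c in categories] for v in values]
    return categories, rows


# Hypothetical categorical attribute.
colours = ["red", "green", "red", "blue"]
categories, encoded = one_hot(colours)

for colour, row in zip(colours, encoded):
    print(colour, row)
```

Every row ends up with exactly one 1.0, in the column that corresponds to its original category.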
Normalization
Normalization is used to scale the values of an attribute so that they fall in a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0. It is generally required when we are dealing with attributes on different scales. Otherwise, an equally important attribute on a smaller scale may be diluted by other attributes whose values are on a larger scale. We use the Normalize function to perform normalization.
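The two most common ways of rescaling an attribute can be sketched in a few lines of plain Python. The function names normalize_by_span and normalize_by_sd and the sample data are hypothetical, and the sketch only illustrates the arithmetic; it is not taken from Orange.

```python
from statistics import mean, pstdev


def normalize_by_span(values, lo=0.0, hi=1.0):
    """Min-max scaling: map the smallest value to lo and the largest to hi."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        return [lo] * len(values)  # constant column: nothing to spread out
    return [lo + (v - vmin) / (vmax - vmin) * (hi - lo) for v in values]


def normalize_by_sd(values):
    """Standardization: subtract the mean, divide by the standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical attribute on a large scale.
incomes = [10, 20, 30, 50]
unit_range = normalize_by_span(incomes)             # scaled to 0.0 .. 1.0
symmetric = normalize_by_span(incomes, -1.0, 1.0)   # scaled to -1.0 .. 1.0
standardized = normalize_by_sd(incomes)             # mean 0, std 1
print(unit_range, symmetric, standardized)
```

After either transformation, attributes that started on very different scales become directly comparable.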
Randomization
With randomization, given a data table, the preprocessor returns a new table in which the data is shuffled. The Randomize function from the Orange library is used to perform randomization.
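As a rough illustration of shuffling, the sketch below permutes only the class labels of a tiny table while leaving the attribute rows in place. Which part of the table gets shuffled is an assumption made for this example, and the helper randomize_classes, the seed parameter, and the data are all hypothetical; this is plain Python, not Orange's code.

```python
import random


def randomize_classes(rows, labels, seed=None):
    """Return the rows unchanged together with a shuffled copy of the labels."""
    rng = random.Random(seed)   # a fixed seed makes the shuffle reproducible
    shuffled = list(labels)     # copy, so the original labels stay intact
    rng.shuffle(shuffled)
    return rows, shuffled


# Hypothetical table: two attributes per row and one class label per row.
rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
labels = ["a", "a", "b", "c"]

same_rows, shuffled_labels = randomize_classes(rows, labels, seed=42)
print(shuffled_labels)
```

The shuffled table contains the same labels as before, only in a different order, so any link between attributes and class is broken while the class distribution is preserved.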
Thank you :)