Data Science | Data Preprocessing with Orange Tool

AMI SAVALIYA
2 min read · Oct 26, 2021

In this blog, we will learn how to preprocess our data in Orange. We will perform four data preprocessing tasks, namely Discretization, Continuization, Normalization, and Randomization, with the help of the corresponding Orange functions.

Discretization

Data discretization converts a large number of continuous data values into a smaller number of intervals, so that the data become easier to evaluate and manage. In other words, it maps the values of a continuous attribute onto a finite set of intervals with minimal loss of information. In this example, I have taken brown-selected, a built-in dataset provided by Orange that contains continuous gene-expression measurements for yeast genes grouped by function. For performing discretization, the Discretize function is used.
import Orange

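Continuization

Continuization is the opposite of discretization: it converts discrete (categorical) attributes into continuous ones, for example by replacing each category with a 0/1 indicator column. This is useful for methods that can only work with numeric inputs. We use the Continuize function to perform continuization.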

Normalization

Normalization is used to scale the data of an attribute so that it falls in a smaller range, such as -1.0 to 1.0 or 0.0 to 1.0. Normalization is generally required when we are dealing with attributes on different scales; otherwise, an equally important attribute on a smaller scale may lose its effect because other attributes have values on a larger scale. We use the Normalize function to perform normalization.

Randomization

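With randomization, given a data table, the preprocessor returns a new table in which the data is shuffled. By default the class values are shuffled, and the attributes or meta attributes can be shuffled as well. The Randomize function from the Orange library is used to perform randomization.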

Thank you :)
