Python: Building Data-Driven Classes with Dataclass

August 17, 2021

Dataclass (the dataclasses module) has been part of the Python standard library since Python 3.7. It helps developers build data-driven classes that come with data-related functionality out of the box. As Python is the main programming language for many data professionals, Python classes are often written to handle data rather than behavior. Dataclass is therefore a good way to distinguish data-driven classes from behavior-driven classes: a data class becomes almost like a data container. Today, we want to introduce how to build a data class and some of the key features of the module.

Basic Build:

A basic build of a data class can be based on the InventoryItem example from the Python documentation: the fields are declared as annotated class attributes under the @dataclass decorator.
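A minimal sketch of such a basic build, adapted from the InventoryItem example in the Python documentation. The sample values ("widget", 3.0, 10) are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0  # fields with defaults must come last

    def total_cost(self) -> float:
        # Total value of the stock held for this item.
        return self.unit_price * self.quantity_on_hand


# Illustrative values only.
item = InventoryItem("widget", 3.0, 10)
print(item.total_cost())  # 30.0
```

The type annotations are what tell @dataclass which attributes are fields. A class attribute without an annotation is ignored by the decorator.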

Key Feature: Self-Generated Special Methods

With the help of the decorator @dataclass, there is no need to write many methods by hand, including the __init__() method and the __repr__() method (an __eq__() method is generated as well). In other words, there is no need to write statements like “self.name = name”. And when you print the instance, it will show the data that the instance contains.
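To illustrate the generated methods, here is a small sketch. The Employee class and its values are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class Employee:  # hypothetical class for illustration
    name: str
    department: str
    salary: float = 0.0


# __init__() is generated from the field declarations.
alice = Employee("Alice", "Data", 5000.0)

# __repr__() is generated too, so printing shows the stored data.
print(alice)
# Employee(name='Alice', department='Data', salary=5000.0)

# __eq__() compares instances field by field.
print(alice == Employee("Alice", "Data", 5000.0))  # True
```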

Key Feature: Frozen Instance

If you would like to make the instance attributes read-only, Dataclass lets you freeze the instance by passing frozen=True to the decorator. Any later attempt to assign to a field then raises an exception, which gives you further control over an instance that contains data.
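A minimal sketch of a frozen instance, using a hypothetical Point class. The blocked flag exists only to make the outcome visible.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Point:  # hypothetical class for illustration
    x: float
    y: float


p = Point(1.0, 2.0)

blocked = False
try:
    p.x = 5.0  # assignment to a field of a frozen instance
except FrozenInstanceError:
    blocked = True

print(blocked)  # True
print(p.x)      # 1.0, the original value is unchanged
```

If a modified copy is needed, dataclasses.replace(p, x=5.0) returns a new instance and leaves the original untouched.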

Key Feature: Default Factory Functions

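With the help of Dataclass, we can also safely give a field a mutable default such as a list. In a regular class, writing a mutable object directly as a default argument of __init__() creates bugs, because the same object is shared by every instance that relies on the default. Dataclass refuses a list, dict, or set written directly as a default and raises a ValueError; instead, you pass a factory function through field(default_factory=...), and it is called to build a fresh object for each instance.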
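A short sketch of a default factory in use. The Basket class and its contents are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Basket:  # hypothetical class for illustration
    owner: str
    # list() is called once per instance, so each basket gets its own list.
    items: List[str] = field(default_factory=list)


a = Basket("Ann")
b = Basket("Bob")
a.items.append("apple")

print(a.items)  # ['apple']
print(b.items)  # []
```

Any zero-argument callable works as a factory, for example dict, set, or a lambda that returns a pre-filled structure.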