Many Python tutorials cover generators, but few of them give a practical example of putting generators to work. Today we want to talk about an effective and efficient code pattern that uses generators to process data items one by one. Below is a common code pattern that uses a data structure such as a list to pass data from one function to another. This forces the system to process every element of data_object inside func_1 before any of them can be passed to func_2 and ultimately saved to a database.
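```python
# Common code pattern: pass a fully built list between functions
data_object = ['1', '2', '3', '4', '5']

# Placeholder steps (hypothetical); swap in your real processing and storage logic
def some_process(item):
    return int(item)

def further_process(item):
    return item * 10

def save_to_database(item):
    print(f"saved {item}")

def func_1(data_object):
    data_list = []
    for i in data_object:
        processed_object = some_process(i)
        data_list.append(processed_object)
    return data_list  # returns only after every item has been processed

def func_2(data_list):
    for i in data_list:
        processed_item = further_process(i)
        save_to_database(processed_item)

data_object_list = func_1(data_object)  # fully executed: the whole list is built first
func_2(data_object_list)                # only then is each item processed and saved
```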
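To see the ordering this pattern imposes, here is a minimal sketch that records each step in an `events` list instead of touching a real database. The `events` list and the bodies of `some_process`, `further_process` and `save_to_database` are hypothetical stand-ins; only the shape of `func_1` and `func_2` follows the pattern above.

```python
# Trace which step runs when in the list-based pipeline.
# The helper bodies below are hypothetical stand-ins, not real processing.
events = []

def some_process(item):
    events.append(f"process {item}")
    return int(item)

def further_process(item):
    return item * 10

def save_to_database(item):
    # Hypothetical stand-in: records the save instead of writing to a database
    events.append(f"save {item}")

def func_1(data_object):
    data_list = []
    for i in data_object:
        data_list.append(some_process(i))
    return data_list

def func_2(data_list):
    for i in data_list:
        save_to_database(further_process(i))

func_2(func_1(['1', '2', '3']))
print(events)
# ['process 1', 'process 2', 'process 3', 'save 10', 'save 20', 'save 30']
```

Every item is processed before the first save happens, so anything waiting on the database sees nothing until the whole batch is done.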
However, by using a generator to pass data elements along lazily, we allow the system to process and save the items one by one. The next process, perhaps a frontend rendering process, therefore doesn’t have to wait for all items to be processed before it can render data from the database. In addition, skipping the append() calls means no intermediate list is built, so memory use no longer grows with the number of items.
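```python
data_object = ['1', '2', '3', '4', '5']

# Generator pattern
# some_process, further_process and save_to_database are placeholders for your own steps
def func_1(data_object):
    for i in data_object:
        processed_object = some_process(i)
        yield processed_object  # hand over one item, then pause until the next is requested

def func_2(data_generator):
    for i in data_generator:
        processed_item = further_process(i)
        save_to_database(processed_item)

data_generator = func_1(data_object)  # lazy: no item has been processed yet
func_2(data_generator)                # each item is processed and saved one at a time
```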
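Running the same kind of trace against the generator version shows the difference. This is a sketch assuming the placeholder steps simply record what they do; the `events` list and the helper bodies are hypothetical stand-ins.

```python
# Trace which step runs when in the generator-based pipeline.
# The helper bodies below are hypothetical stand-ins, not real processing.
events = []

def some_process(item):
    events.append(f"process {item}")
    return int(item)

def further_process(item):
    return item * 10

def save_to_database(item):
    # Hypothetical stand-in: records the save instead of writing to a database
    events.append(f"save {item}")

def func_1(data_object):
    for i in data_object:
        yield some_process(i)

def func_2(data_generator):
    for i in data_generator:
        save_to_database(further_process(i))

data_generator = func_1(['1', '2', '3'])
print(events)  # [] because creating the generator runs none of func_1's body
func_2(data_generator)
print(events)
# ['process 1', 'save 10', 'process 2', 'save 20', 'process 3', 'save 30']
```

Processing and saving now alternate, so the first item reaches the database before the second one is even processed.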
Finally, a generator expression can simplify func_1 even further:
```python
data_object = ['1', '2', '3', '4', '5']

# Generator expression pattern
# some_process, further_process and save_to_database are placeholders for your own steps
def func_1(data_object):
    return (some_process(i) for i in data_object)

def func_2(data_generator):
    for i in data_generator:
        processed_item = further_process(i)
        save_to_database(processed_item)

data_generator = func_1(data_object)  # lazy: no item has been processed yet
func_2(data_generator)                # each item is processed and saved one at a time
```
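The memory side of the argument can be checked with `sys.getsizeof`, which reports the size of a container object itself, not of the items it refers to. The sketch below compares a list comprehension with the equivalent generator expression; the byte counts printed are CPython implementation details and vary by version and platform. It also shows a caveat worth knowing with this pattern: a generator can be consumed only once.

```python
import sys

n = 1_000_000
as_list = [i * 2 for i in range(n)]  # every result is computed and stored up front
as_gen = (i * 2 for i in range(n))   # results are computed only when requested

list_bytes = sys.getsizeof(as_list)  # grows with n
gen_bytes = sys.getsizeof(as_gen)    # small and independent of n
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

total_list = sum(as_list)
total_gen = sum(as_gen)   # consumes the generator
leftover = sum(as_gen)    # already exhausted, so this sums nothing and gives 0
print(total_list == total_gen, leftover)  # True 0
```

In the pattern above, this means calling `func_2(data_generator)` a second time would save nothing; call `func_1(data_object)` again to get a fresh generator.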
We hope you liked today’s Data Hack. See you next time!