Many Python tutorials cover generators, but few give a practical example of using them. Today we want to talk about an effective and efficient code pattern that uses generators to process data items one by one. Below is a common code pattern that uses a data structure such as a list to pass data from one function to another. This forces the system to process every element of data_object within func_1 before any of them can be passed into func_2 and ultimately saved into a database.
```python
# Common code pattern with lists.
# some_process, further_process and save_to_database are placeholders
# for your own functions.
data_object = ['1', '2', '3', '4', '5']

def func_1(data_object):
    data_list = []
    for i in data_object:
        processed_object = some_process(i)
        data_list.append(processed_object)
    return data_list

def func_2(data_list):
    for i in data_list:
        processed_item = further_process(i)
        save_to_database(processed_item)

data_object_list = func_1(data_object)  # fully executed
func_2(data_object_list)  # fully executed
```
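To see the ordering concretely, here is a minimal runnable sketch of the list pattern. The bodies of some_process, further_process and save_to_database are hypothetical stand-ins (the post does not define them), and the events list exists only to record the order in which work happens.

```python
# Runnable sketch of the list pattern with hypothetical stand-in functions.
events = []  # records the order in which work happens

def some_process(item):
    # Stand-in: log the call and convert the string to an int.
    events.append(f"process {item}")
    return int(item)

def further_process(item):
    # Stand-in: any second transformation would do.
    return item * 10

def save_to_database(item):
    # Stand-in for a real database write.
    events.append(f"save {item}")

def func_1(data_object):
    data_list = []
    for i in data_object:
        data_list.append(some_process(i))
    return data_list

def func_2(data_list):
    for i in data_list:
        save_to_database(further_process(i))

func_2(func_1(['1', '2', '3']))
print(events)
# ['process 1', 'process 2', 'process 3', 'save 10', 'save 20', 'save 30']
```

Every item is processed before the first one is saved, which is the waiting that the rest of this post tries to avoid.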
However, by using a generator to lazily pass in data elements, we allow the system to process and save the items one by one. The next process, perhaps a frontend rendering process, therefore doesn’t have to wait for all items to be processed before it can render data from the database. In addition, you avoid building the intermediate list with append(), which saves memory and may also save some time.
```python
# Generator pattern.
# some_process, further_process and save_to_database are placeholders
# for your own functions.
data_object = ['1', '2', '3', '4', '5']

def func_1(data_object):
    for i in data_object:
        processed_object = some_process(i)
        yield processed_object

def func_2(data_generator):
    for i in data_generator:
        processed_item = further_process(i)
        save_to_database(processed_item)

data_generator = func_1(data_object)  # lazy execution
func_2(data_generator)  # everything is executed here one by one
```
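The one-by-one behavior can be checked with a small runnable sketch. As before, the function bodies are hypothetical stand-ins and the events list is only there to record ordering; none of this comes from a real database API.

```python
# Runnable sketch of the generator pattern with hypothetical stand-in functions.
events = []  # records the order in which work happens

def some_process(item):
    # Stand-in: log the call and convert the string to an int.
    events.append(f"process {item}")
    return int(item)

def further_process(item):
    # Stand-in: any second transformation would do.
    return item * 10

def save_to_database(item):
    # Stand-in for a real database write.
    events.append(f"save {item}")

def func_1(data_object):
    for i in data_object:
        yield some_process(i)

def func_2(data_generator):
    for i in data_generator:
        save_to_database(further_process(i))

data_generator = func_1(['1', '2', '3'])
events_before = list(events)
print(events_before)  # [] because nothing runs until the generator is iterated

func_2(data_generator)
print(events)
# ['process 1', 'save 10', 'process 2', 'save 20', 'process 3', 'save 30']
```

Each item is saved right after it is processed, so a consumer reading from the database can see the first result before the last item has been touched.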
Finally, you can also utilize a generator expression in this case to simplify the code:
```python
# Generator expression pattern.
# some_process, further_process and save_to_database are placeholders
# for your own functions.
data_object = ['1', '2', '3', '4', '5']

def func_1(data_object):
    return (some_process(i) for i in data_object)

def func_2(data_generator):
    for i in data_generator:
        processed_item = further_process(i)
        save_to_database(processed_item)

data_generator = func_1(data_object)  # lazy execution
func_2(data_generator)  # everything is executed here one by one
```
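One practical consequence worth keeping in mind is that a generator only does the work that is asked of it and can be consumed only once. The sketch below illustrates this with the standard library's itertools.islice; some_process is again a hypothetical stand-in, and the processed list is only there to show which items were actually touched.

```python
from itertools import islice

processed = []  # records which raw items were actually processed

def some_process(item):
    # Hypothetical stand-in: log the item and convert it to an int.
    processed.append(item)
    return int(item)

def func_1(data_object):
    return (some_process(i) for i in data_object)

data_generator = func_1(['1', '2', '3', '4', '5'])

# Take only the first two results; the remaining items are not processed yet.
first_two = list(islice(data_generator, 2))
processed_after_two = list(processed)
print(first_two)            # [1, 2]
print(processed_after_two)  # ['1', '2']

# Consuming the generator again continues where it stopped.
rest = list(data_generator)
print(rest)                 # [3, 4, 5]

# Once exhausted, the generator yields nothing more.
leftover = list(data_generator)
print(leftover)             # []
```

If the results are needed more than once, either call func_1 again or fall back to a list for that part of the pipeline.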
Hope you like today’s Data Hack. We will see you next time.