I want to write code that reads several pandas DataFrames asynchronously, for example from CSV files (or from a database).
I wrote the following code, assuming it would import the two DataFrames faster; however, it actually seems to be slower:
```python
import asyncio
import timeit

import pandas as pd

train_to_save = pd.DataFrame(data={'feature1': [1, 2, 3], 'period': [1, 1, 1]})
test_to_save = pd.DataFrame(data={'feature1': [1, 4, 12], 'period': [2, 2, 2]})
train_to_save.to_csv('train.csv')
test_to_save.to_csv('test.csv')

async def run_async_train():
    return pd.read_csv('train.csv')

async def run_async_test():
    return pd.read_csv('test.csv')

async def run_train_test_async():
    df = await asyncio.gather(run_async_train(), run_async_test())
    return df

# Time the "async" version
start_async = timeit.default_timer()
async_train, async_test = asyncio.run(run_train_test_async())
finish_async = timeit.default_timer()
time_to_run_async = finish_async - start_async

# Time the plain sequential version
start = timeit.default_timer()
train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')
finish = timeit.default_timer()
time_to_run_without_async = finish - start

print(time_to_run_async < time_to_run_without_async)
```

Why does the non-async version read the two DataFrames faster?
Just to make it clear: in production I will actually read the data from BigQuery, so I'm really interested in speeding up both requests (train & test) using the code above.
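For context, here is a sketch of the pattern I had in mind: since `pd.read_csv` is a blocking call, simply wrapping it in a coroutine still runs the two reads one after the other, so I tried pushing each read onto a worker thread with `asyncio.to_thread` (Python 3.9+) so they can overlap. I assume the same shape would apply to blocking BigQuery client calls, but I'm not sure this is the right approach:

```python
import asyncio

import pandas as pd

# Same tiny example data as above, just so this snippet is self-contained.
pd.DataFrame({'feature1': [1, 2, 3], 'period': [1, 1, 1]}).to_csv('train.csv')
pd.DataFrame({'feature1': [1, 4, 12], 'period': [2, 2, 2]}).to_csv('test.csv')

def read_train():
    return pd.read_csv('train.csv')   # blocking call

def read_test():
    return pd.read_csv('test.csv')    # blocking call

async def read_both():
    # asyncio.to_thread submits each blocking function to the default
    # thread pool, so the event loop can run both reads concurrently.
    return await asyncio.gather(
        asyncio.to_thread(read_train),
        asyncio.to_thread(read_test),
    )

train, test = asyncio.run(read_both())
```

With the tiny files above any speedup is lost in overhead; I would only expect threads to help for larger reads, where the time is spent in I/O.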
Thanks in advance!