convert_a_dictionary_of_dataframes_to_a_big_dataframe
Table of Contents
Convert a dictionary of dataframes to a big dataframe
Task
Given a dictionary of strings to dataframes, create an expanded dataframe by putting the keys into their own column. For example, given
{'COST': fiscal_quarter_end reported_date 0 202102 20210304 1 202105 20210527 2 202108 20210923 3 202111 20211209, 'CPRT': fiscal_quarter_end reported_date 0 202104 20210519 1 202107 20210908 2 202110 20211117 3 202201 20220216}
We want
ticker fiscal_quarter_end reported_date 0 COST 202102 20210304 1 COST 202105 20210527 2 COST 202108 20210923 3 COST 202111 20211209 4 CPRT 202104 20210519 5 CPRT 202107 20210908 6 CPRT 202110 20211117 7 CPRT 202201 20220216
Solution
def dict_df_to_df(dict_df, key_col_name): df = ( pd.concat(dict_df, axis=0)\ .reset_index()\ .drop('level_1', axis=1)\ .rename({'level_0':key_col_name}, axis=1) ) return df
Using the example above
$ ipython In [1]: import pandas as pd costco_earnings = pd.DataFrame({ 'fiscal_quarter_end': [202102, 202105, 202108, 202111], 'reported_date': [20210304, 20210527, 20210923, 20211209] }) costco_earnings Out[1]: fiscal_quarter_end reported_date 0 202102 20210304 1 202105 20210527 2 202108 20210923 3 202111 20211209 In [2]: copart_earnings = pd.DataFrame({ 'fiscal_quarter_end': [202104, 202107, 202110, 202201], 'reported_date': [20210519, 20210908, 20211117, 20220216] }) copart_earnings Out[2]: fiscal_quarter_end reported_date 0 202104 20210519 1 202107 20210908 2 202110 20211117 3 202201 20220216 In [3]: dict_df = {'COST': costco_earnings, 'CPRT': copart_earnings} dict_df Out[3]: {'COST': fiscal_quarter_end reported_date 0 202102 20210304 1 202105 20210527 2 202108 20210923 3 202111 20211209, 'CPRT': fiscal_quarter_end reported_date 0 202104 20210519 1 202107 20210908 2 202110 20211117 3 202201 20220216}
In [4]: def dict_df_to_df(dict_df, key_col_name): df = ( pd.concat(dict_df, axis=0)\ .reset_index()\ .drop('level_1', axis=1)\ .rename({'level_0':key_col_name}, axis=1) ) return df
In [5]: expanded_df = dict_df_to_df(dict_df, 'ticker') expanded_df Out[5]: ticker fiscal_quarter_end reported_date 0 COST 202102 20210304 1 COST 202105 20210527 2 COST 202108 20210923 3 COST 202111 20211209 4 CPRT 202104 20210519 5 CPRT 202107 20210908 6 CPRT 202110 20211117 7 CPRT 202201 20220216
To see how it works
In [6]: pd.concat(dict_df, axis=0) Out[6]: fiscal_quarter_end reported_date COST 0 202102 20210304 1 202105 20210527 2 202108 20210923 3 202111 20211209 CPRT 0 202104 20210519 1 202107 20210908 2 202110 20211117 3 202201 20220216 In [7]: pd.concat(dict_df, axis=0)\ .reset_index() Out[7]: level_0 level_1 fiscal_quarter_end reported_date 0 COST 0 202102 20210304 1 COST 1 202105 20210527 2 COST 2 202108 20210923 3 COST 3 202111 20211209 4 CPRT 0 202104 20210519 5 CPRT 1 202107 20210908 6 CPRT 2 202110 20211117 7 CPRT 3 202201 20220216 In [8]: pd.concat(dict_df, axis=0)\ .reset_index()\ .drop('level_1', axis=1) Out[8]: level_0 fiscal_quarter_end reported_date 0 COST 202102 20210304 1 COST 202105 20210527 2 COST 202108 20210923 3 COST 202111 20211209 4 CPRT 202104 20210519 5 CPRT 202107 20210908 6 CPRT 202110 20211117 7 CPRT 202201 20220216 In [9]: pd.concat(dict_df, axis=0)\ .reset_index()\ .drop('level_1', axis=1)\ .rename({'level_0':'ticker'}, axis=1) Out[9]: ticker fiscal_quarter_end reported_date 0 COST 202102 20210304 1 COST 202105 20210527 2 COST 202108 20210923 3 COST 202111 20211209 4 CPRT 202104 20210519 5 CPRT 202107 20210908 6 CPRT 202110 20211117 7 CPRT 202201 20220216
Ref
- https://stackoverflow.com/questions/56638413/how-to-create-a-pandas-dataframe-from-dictionary-of-dataframes - where I first came across this solution. I just put the solution in a function, used a better example and showed some intermediate steps on how the solution works.
convert_a_dictionary_of_dataframes_to_a_big_dataframe.txt · Last modified: 2023/01/01 15:51 by raju