Assigning values to a pandas DataFrame using a Python comprehension and a dictionary



Suppose I am trying to build a DataFrame that prints out as a table of sectors, like this:

   SectorDescription                SectorCode
0  State Energy Data Systems        SEDS
1  Coal Data                        COAL
2  Petroleum Data                   PET
3  Natural Gas Data                 NG
4  Electricity Data                 ELEC
5  Petroleum Imports Data           PET_IMPORTS
6  Short-Term Energy Outlook Data   STEO
7  International Energy Data        INTL
8  Annual Energy Outlook Data       AEO

Currently I have this dictionary:

QuandlEIASector = {"State Energy Data Systems": "SEDS",
                   "Coal Data": "COAL",
                   "Petroleum Data": "PET",
                   "Natural Gas Data": "NG",
                   "Electricity Data": "ELEC",
                   "Petroleum Imports Data": "PET_IMPORTS",
                   "Short-Term Energy Outlook Data": "STEO",
                   "International Energy Data": "INTL",
                   "Annual Energy Outlook Data": "AEO"}

What I did is:

import pandas as pd
QuandlEIASectorList = pd.DataFrame()
QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
QuandlEIASectorList

However, is there a faster one-liner than a Python comprehension for assigning column values to a pandas DataFrame?

Create a Series from the dictionary, then convert it to a DataFrame:

QuandlEIASectorList = (pd.Series(QuandlEIASector)
                         .rename_axis('SectorDescription')
                         .reset_index(name='SectorCode'))

Or, similarly, name the Series up front:

QuandlEIASectorList = (pd.Series(QuandlEIASector, name='SectorCode')
                         .rename_axis('SectorDescription')
                         .reset_index())
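To see what each step of the Series chain does, here is a minimal sketch on a three-entry subset of the dictionary (the names `sectors`, `df_a` and `df_b` are illustrative, not from the original). It builds the result step by step and checks that both Series variants produce the same frame.

```python
import pandas as pd

# Illustrative subset of the question's sector mapping
sectors = {
    "State Energy Data Systems": "SEDS",
    "Coal Data": "COAL",
    "Petroleum Data": "PET",
}

# Step 1: a dict becomes a Series indexed by its keys, holding its values
s = pd.Series(sectors, name="SectorCode")

# Step 2: name the index so reset_index() turns it into a "SectorDescription" column
s = s.rename_axis("SectorDescription")

# Step 3: move the index into a regular column; the Series name labels the values
df_a = s.reset_index()

# The first variant supplies the value-column name to reset_index() instead
df_b = (pd.Series(sectors)
          .rename_axis("SectorDescription")
          .reset_index(name="SectorCode"))

# Raises AssertionError if the two frames differ
pd.testing.assert_frame_equal(df_a, df_b)
print(df_a)
```

Passing `name` to the Series or to `reset_index()` is a matter of taste; both end up labelling the value column.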

Your approach can also be written directly with the DataFrame constructor:

QuandlEIASectorList = pd.DataFrame({'SectorDescription': list(QuandlEIASector.keys()),
                                    'SectorCode': list(QuandlEIASector.values())})
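The dict-of-lists version relies on `keys()` and `values()` iterating in the same order, which Python guarantees as long as the dictionary is not modified between the two calls (and since Python 3.7 that order is insertion order). A minimal sketch wrapping it in a helper; the function name `dict_to_frame` and the sample data are hypothetical:

```python
import pandas as pd


def dict_to_frame(mapping, key_col, value_col):
    """Build a two-column DataFrame from a dict (hypothetical helper)."""
    # keys() and values() of an unmodified dict iterate in the same order,
    # so the two lists line up row by row.
    return pd.DataFrame({key_col: list(mapping.keys()),
                         value_col: list(mapping.values())})


sectors = {"Coal Data": "COAL", "Natural Gas Data": "NG", "Electricity Data": "ELEC"}
df = dict_to_frame(sectors, "SectorDescription", "SectorCode")
print(df)
```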

Or:

QuandlEIASectorList = pd.DataFrame(list(QuandlEIASector.items()),
                                   columns=['SectorDescription', 'SectorCode'])
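In the `items()` variant each `(key, value)` tuple becomes one row, and `columns` names the two fields. A small sketch (sample data is a subset of the question's dictionary; the variable names are illustrative) checking that the row-oriented and column-oriented constructors agree:

```python
import pandas as pd

sectors = {"Coal Data": "COAL", "Petroleum Data": "PET", "Natural Gas Data": "NG"}

# Row-oriented: each (key, value) tuple from items() becomes one row
rows_df = pd.DataFrame(list(sectors.items()),
                       columns=["SectorDescription", "SectorCode"])

# Column-oriented: one list per column
cols_df = pd.DataFrame({"SectorDescription": list(sectors.keys()),
                        "SectorCode": list(sectors.values())})

# Raises AssertionError if the two constructions differ
pd.testing.assert_frame_equal(rows_df, cols_df)
print(rows_df)
```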

Performance with 10k keys:

import numpy as np
import pandas as pd

QuandlEIASector = dict(zip([f'{x} data' for x in np.arange(10000)],
                           [f'{x} keys' for x in np.arange(10000)]))

In [73]: %%timeit
...: QuandlEIASectorList = pd.DataFrame()
...: QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
...: QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
...: 
5.94 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [74]: %%timeit
...: (pd.Series(QuandlEIASector)
...:    .rename_axis('SectorDescription')
...:    .reset_index(name='SectorCode'))
...:                          
5.37 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [75]: %%timeit
...: (pd.Series(QuandlEIASector, name='SectorCode')
...:    .rename_axis('SectorDescription')
...:    .reset_index())
...:                          
5.34 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [76]: %%timeit
...: pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
...:               'SectorCode': list(QuandlEIASector.values())})
...:                                    
2.26 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [77]: %%timeit
...: pd.DataFrame(list(QuandlEIASector.items()), 
...:              columns=['SectorDescription','SectorCode'])
...:                                    
3.15 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
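The timings above come from IPython's `%%timeit`. Outside IPython, a rough equivalent can be sketched with the standard `timeit` module. This sketch assumes the same synthetic 10k-key dictionary, wraps the dict views in `list()` in the assignment variant to be explicit, and uses fewer loops by default, so its numbers will vary by machine and pandas version and are not meant to reproduce the figures above.

```python
import statistics
import timeit

import numpy as np
import pandas as pd

# Same synthetic 10k-key dictionary as the benchmark setup above
N = 10_000
mapping = dict(zip([f"{x} data" for x in np.arange(N)],
                   [f"{x} keys" for x in np.arange(N)]))


def via_assignment():
    # The question's approach; views wrapped in list() here (an assumption)
    df = pd.DataFrame()
    df["SectorDescription"] = list(mapping.keys())
    df["SectorCode"] = list(mapping.values())
    return df


def via_series():
    return (pd.Series(mapping, name="SectorCode")
              .rename_axis("SectorDescription")
              .reset_index())


def via_dict_of_lists():
    return pd.DataFrame({"SectorDescription": list(mapping.keys()),
                         "SectorCode": list(mapping.values())})


def via_items():
    return pd.DataFrame(list(mapping.items()),
                        columns=["SectorDescription", "SectorCode"])


CANDIDATES = {
    "assignment": via_assignment,
    "series": via_series,
    "dict_of_lists": via_dict_of_lists,
    "items": via_items,
}


def benchmark(number=20, repeat=5):
    """Return {name: (mean_ms, std_ms)} per call, similar to %%timeit's summary."""
    results = {}
    for name, fn in CANDIDATES.items():
        runs = timeit.repeat(fn, number=number, repeat=repeat)
        per_call_ms = [t / number * 1e3 for t in runs]
        results[name] = (statistics.mean(per_call_ms),
                         statistics.pstdev(per_call_ms))
    return results


if __name__ == "__main__":
    # Print candidates from fastest to slowest mean time
    for name, (mean_ms, std_ms) in sorted(benchmark().items(),
                                          key=lambda kv: kv[1][0]):
        print(f"{name:15s} {mean_ms:.2f} ms ± {std_ms:.2f} ms per call")
```

Passing the functions themselves to `timeit.repeat` avoids string setup code; raise `number` and `repeat` for steadier results.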
