Suppose I am trying to build a DataFrame that prints out as a table of sectors for inspection:
                SectorDescription   SectorCode
0       State Energy Data Systems         SEDS
1                       Coal Data         COAL
2                  Petroleum Data          PET
3                Natural Gas Data           NG
4                Electricity Data         ELEC
5          Petroleum Imports Data  PET_IMPORTS
6  Short-Term Energy Outlook Data         STEO
7       International Energy Data         INTL
8      Annual Energy Outlook Data          AEO
Right now I have:
QuandlEIASector = {"State Energy Data Systems": "SEDS",
                   "Coal Data": "COAL",
                   "Petroleum Data": "PET",
                   "Natural Gas Data": "NG",
                   "Electricity Data": "ELEC",
                   "Petroleum Imports Data": "PET_IMPORTS",
                   "Short-Term Energy Outlook Data": "STEO",
                   "International Energy Data": "INTL",
                   "Annual Energy Outlook Data": "AEO"}
What I did was:
QuandlEIASectorList = pd.DataFrame()
QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
QuandlEIASectorList
But is there a quicker one-liner than a Python comprehension for assigning the column values to a pandas DataFrame?
Create a Series from the dict, then convert it to a DataFrame:
QuandlEIASectorList = (pd.Series(QuandlEIASector)
                         .rename_axis('SectorDescription')
                         .reset_index(name='SectorCode'))
Similarly, naming the Series up front:
QuandlEIASectorList = (pd.Series(QuandlEIASector, name='SectorCode')
                         .rename_axis('SectorDescription')
                         .reset_index())
Your code can also be rewritten to use the DataFrame constructor:
QuandlEIASectorList = pd.DataFrame({'SectorDescription': list(QuandlEIASector.keys()),
                                    'SectorCode': list(QuandlEIASector.values())})
Or:
QuandlEIASectorList = pd.DataFrame(list(QuandlEIASector.items()),
                                   columns=['SectorDescription', 'SectorCode'])
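As a quick sanity check, the sketch below builds the frame with each of the four approaches and compares the results. It uses a shortened three-entry version of the dict, and the names `via_series`, `via_named_series`, `via_dict`, `via_items` and `all_equal` are illustrative, not from the original post.

```python
import pandas as pd

# Shortened version of the dict from the question (assumption: three entries
# are enough to illustrate the comparison).
QuandlEIASector = {"State Energy Data Systems": "SEDS",
                   "Coal Data": "COAL",
                   "Petroleum Data": "PET"}

# Series, with the value column named in reset_index
via_series = (pd.Series(QuandlEIASector)
                .rename_axis('SectorDescription')
                .reset_index(name='SectorCode'))

# Series, with the value column named in the constructor
via_named_series = (pd.Series(QuandlEIASector, name='SectorCode')
                      .rename_axis('SectorDescription')
                      .reset_index())

# DataFrame constructor with a dict of lists
via_dict = pd.DataFrame({'SectorDescription': list(QuandlEIASector.keys()),
                         'SectorCode': list(QuandlEIASector.values())})

# DataFrame constructor with a list of (key, value) tuples
via_items = pd.DataFrame(list(QuandlEIASector.items()),
                         columns=['SectorDescription', 'SectorCode'])

# DataFrame.equals compares shape, labels, values and dtypes
all_equal = (via_series.equals(via_named_series)
             and via_series.equals(via_dict)
             and via_series.equals(via_items))
print(all_equal)
```

Since the frames should be interchangeable, the choice between them comes down to readability and speed.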
Performance with 10k keys:
import numpy as np
QuandlEIASector = dict(zip([f'{x} data' for x in np.arange(10000)],
                           [f'{x} keys' for x in np.arange(10000)]))
In [73]: %%timeit
...: QuandlEIASectorList = pd.DataFrame()
...: QuandlEIASectorList['SectorDescription'] = QuandlEIASector.keys()
...: QuandlEIASectorList['SectorCode'] = QuandlEIASector.values()
...:
5.94 ms ± 52.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [74]: %%timeit
...: (pd.Series(QuandlEIASector)
...: .rename_axis('SectorDescription')
...: .reset_index(name='SectorCode'))
...:
5.37 ms ± 261 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [75]: %%timeit
...: (pd.Series(QuandlEIASector, name='SectorCode')
...: .rename_axis('SectorDescription')
...: .reset_index())
...:
5.34 ms ± 211 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [76]: %%timeit
...: pd.DataFrame({'SectorDescription':list(QuandlEIASector.keys()),
...: 'SectorCode': list(QuandlEIASector.values())})
...:
2.26 ms ± 20.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [77]: %%timeit
...: pd.DataFrame(list(QuandlEIASector.items()),
...: columns=['SectorDescription','SectorCode'])
...:
3.15 ms ± 38.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
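In the timings above, the DataFrame constructor with a dict of lists was the fastest (2.26 ms), followed by the list of items (3.15 ms), with both Series variants and the original column-by-column assignment above 5 ms. To rerun a comparison like this outside IPython, a rough sketch with the standard `timeit` module could look like the following. The dict name `sectors`, the helper function names, and the repeat counts are assumptions for illustration, and the absolute numbers will vary with the machine and pandas version.

```python
import timeit
import pandas as pd

# Hypothetical stand-in for the 10k-key dict used in the timings above
sectors = {f'{x} data': f'{x} keys' for x in range(10000)}

def from_series():
    return (pd.Series(sectors)
              .rename_axis('SectorDescription')
              .reset_index(name='SectorCode'))

def from_dict_of_lists():
    return pd.DataFrame({'SectorDescription': list(sectors.keys()),
                         'SectorCode': list(sectors.values())})

def from_items():
    return pd.DataFrame(list(sectors.items()),
                        columns=['SectorDescription', 'SectorCode'])

candidates = {'series': from_series,
              'dict_of_lists': from_dict_of_lists,
              'items': from_items}

timings = {}
for name, func in candidates.items():
    # Best of 5 repeats of 20 calls each, converted to milliseconds per call
    best = min(timeit.repeat(func, number=20, repeat=5))
    timings[name] = best / 20 * 1000
    print(f'{name}: {timings[name]:.2f} ms per call')
```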