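I have a dictionary in which every value is a Series. For the sake of discussion, assume all the indexes are identical. I'd like to end up with a DataFrame (or really any table; it doesn't have to be pandas) that has that same index, where each column is one of the Series and the dictionary key is the column header. I could loop over the dictionary and assign into a DataFrame, but I'm curious whether there is a more Pythonic way that doesn't need a loop.
As a second question: if the indexes are not exactly the same (some Series are missing an entry or two, which should then show up as null or zero), how would the solution differ?
Some sample data, for example: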
import pandas as pd
import random

# Build a dict of four Series, each holding five random integers
test = dict.fromkeys(['a','b','c','d'])
for key in test:
    randomlist = []
    for i in range(0,5):
        n = random.randint(1,30)
        randomlist.append(n)
    test[key] = pd.Series(randomlist)
test
Out[16]:
{'a': 0    10
 1    16
 2     9
 3    23
 4    30
 dtype: int64,
 'b': 0     7
 1     9
 2     1
 3    16
 4    29
 dtype: int64,
 'c': 0    30
 1    21
 2    25
 3     1
 4    22
 dtype: int64,
 'd': 0    28
 1    29
 2     7
 3    25
 4    25
 dtype: int64}
I'd like to end up with something like this:
    a   b   c   d
1  16   9  21  29
2   9   1  25   7
3  23  16   1  25
4  30  29  22  25
I'm surely failing to capture something in this example, so here is a small snippet of my actual dictionary:
'C:\Users\name\Google Drive\Simulations\030012-OffMed-VRF - ap\CTZ14S22AMeter.csv': Fans:Electricity [J](Hourly) 49.785923
InteriorEquipment:Electricity [J](Hourly) 72.889315
InteriorLights:Electricity [J](Hourly) 16.140645
Electricity:Facility [J](Hourly) 205.964746
Cooling:Electricity [J](Hourly) 57.205236
Pumps:Electricity [J](Hourly) 0.000000
Heating:Electricity [J](Hourly) 6.073830
WaterSystems:Electricity [J](Hourly) 3.869797
Receptacle:InteriorEquipment:Electricity [J](Hourly) 62.582398
Internal Transport:InteriorEquipment:Electricity [J](Hourly) 10.306918
ComplianceLtg:InteriorLights:Electricity [J](Hourly) 16.140645
dtype: float64,
'C:\Users\name\Google Drive\Simulations\030012-OffMed-VRF - ap\CTZ15S22AMeter.csv': Fans:Electricity [J](Hourly) 46.432982
InteriorEquipment:Electricity [J](Hourly) 71.004371
InteriorLights:Electricity [J](Hourly) 15.900494
Electricity:Facility [J](Hourly) 216.518008
Cooling:Electricity [J](Hourly) 78.686596
Pumps:Electricity [J](Hourly) 0.000000
Heating:Electricity [J](Hourly) 0.687672
WaterSystems:Electricity [J](Hourly) 3.805893
Receptacle:InteriorEquipment:Electricity [J](Hourly) 60.888104
Internal Transport:InteriorEquipment:Electricity [J](Hourly) 10.116267
ComplianceLtg:InteriorLights:Electricity [J](Hourly) 15.900494
dtype: float64,
'C:\Users\name\Google Drive\Simulations\030012-OffMed-VRF - ap\CTZ16S22AMeter.csv': Fans:Electricity [J](Hourly) 52.634381
InteriorEquipment:Electricity [J](Hourly) 66.367556
InteriorLights:Electricity [J](Hourly) 15.400713
Electricity:Facility [J](Hourly) 183.632062
Cooling:Electricity [J](Hourly) 28.867642
Pumps:Electricity [J](Hourly) 0.000000
Heating:Electricity [J](Hourly) 16.713066
WaterSystems:Electricity [J](Hourly) 3.648704
Receptacle:InteriorEquipment:Electricity [J](Hourly) 56.827636
Internal Transport:InteriorEquipment:Electricity [J](Hourly) 9.539920
ComplianceLtg:InteriorLights:Electricity [J](Hourly) 15.400713
dtype: float64}
If I understand your question correctly:
- You can pass your dictionary directly as the data:
  data = {'a': pd.Series([1, 2, 3]),
          'b': pd.Series([4, 5, 6]),
          'c': pd.Series([7, 8, 9])}
  pd.DataFrame(data=data)
     a  b  c
  0  1  4  7
  1  2  5  8
  2  3  6  9
- Nothing different; you only need the Series to carry their own index (here built from dicts), and entries missing from a Series come out as NaN:
  data = {'a': pd.Series({1: 1, 2: 2, 3: 3, 5: 6}),
          'b': pd.Series({1: 4, 2: 5, 4: 6}),
          'c': pd.Series({2: 7, 3: 8, 4: 9, 5: 10})}
  pd.DataFrame(data=data)
       a    b     c
  1  1.0  4.0   NaN
  2  2.0  5.0   7.0
  3  3.0  NaN   8.0
  4  NaN  6.0   9.0
  5  6.0  NaN  10.0
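For the "null or zero" part of the question, here is a minimal sketch that reuses the sample dictionary above. The names `df_zero` and `df_concat` are only illustrative, and the cast back to `int` assumes zero really is an acceptable stand-in for a missing value in your data.

```python
import pandas as pd

# Sample data from the answer above: three Series with partly overlapping indexes.
data = {
    'a': pd.Series({1: 1, 2: 2, 3: 3, 5: 6}),
    'b': pd.Series({1: 4, 2: 5, 4: 6}),
    'c': pd.Series({2: 7, 3: 8, 4: 9, 5: 10}),
}

# The constructor aligns the Series on the union of their indexes;
# positions missing from a Series are filled with NaN.
df = pd.DataFrame(data)

# Replace the gaps with zero, then cast back to integers
# (the NaN values had forced the columns to float).
df_zero = df.fillna(0).astype(int)

# pd.concat along the columns is an alternative; the dict keys become
# the column names. sort_index() makes the row order explicit.
df_concat = pd.concat(data, axis=1).sort_index()

print(df_zero)
```

Leaving the NaN values in place is usually the safer default, since a zero is indistinguishable from a real measurement of zero (such as the `Pumps:Electricity` rows in the question).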