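For example:
d1 has d1.KRK.ANDROID == 600, while d2 has no such index entry. Can a missing index/value be treated as 0 when computing d2 - d1?
Right now the operation returns NaN.
Is the only option to define those index entries with zero values by hand? I need d2 - d1 to give -600 for KRK.ANDROID instead of NaN.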
import pandas as pd

# The method chain stays on the closing line so the statement parses.
d2 = pd.DataFrame({
    'branch': ['EKB', 'KRK', 'NB', 'VN'],
    'worktype': ['PHP', 'PYTHON', 'PYTHON', 'ANDROID'],
    'minutes': [20, 270, 20, 20],
}).set_index(['branch', 'worktype'])

d1 = pd.DataFrame({
    'branch': ['EKB', 'KRK', 'KRK', 'KRK', 'NB', 'VN'],
    'worktype': ['PHP', 'ANDROID', 'PYTHON', 'QA', 'PYTHON', 'ANDROID'],
    'minutes': [20, 600, 680, 45, 120, 15],
}).set_index(['branch', 'worktype'])
In [293]: d2
Out[293]:
                 minutes
branch worktype
EKB    PHP            20
KRK    PYTHON        270
NB     PYTHON         20
VN     ANDROID        20

In [294]: d1
Out[294]:
                 minutes
branch worktype
EKB    PHP            20
KRK    ANDROID       600
       PYTHON        680
       QA             45
NB     PYTHON        120
VN     ANDROID        15

In [295]: d2 - d1
Out[295]:
                 minutes
branch worktype
EKB    PHP           0.0
KRK    ANDROID       NaN
       PYTHON     -410.0
       QA            NaN
NB     PYTHON     -100.0
VN     ANDROID       5.0
You can try reindex:
d2.reindex(d1.index).fillna(0) - d1
Out[342]:
                 minutes
branch worktype
EKB    PHP           0.0
KRK    ANDROID    -600.0
       PYTHON     -410.0
       QA          -45.0
NB     PYTHON     -100.0
VN     ANDROID       5.0
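For your additional requirement:
# Reindex whichever frame has fewer rows onto the other frame's index.
# MultiIndex.labels was removed from pandas (renamed to .codes);
# len(index) gives the same row count.
if len(d2.index) < len(d1.index):
    print(d2.reindex(d1.index).fillna(0) - d1)
else:
    print(d2 - d1.reindex(d2.index).fillna(0))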
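Update 2
# Union of both indexes, so a label missing from either frame is filled with 0.
# Index.union replaces the original set-based version (a set is unordered).
all_idx = d1.index.union(d2.index)
d1.reindex(all_idx).fillna(0) - d2.reindex(all_idx).fillna(0)  # swap the operands for d2 - d1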
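As an alternative sketch to the reindex approaches above: `DataFrame.sub` accepts a `fill_value` argument, which aligns both frames on the union of their indexes and substitutes that value wherever only one side has a row. The snippet rebuilds d1 and d2 from the question; the name `diff` is only illustrative. A label missing from both frames would still produce NaN, which cannot happen with this data.

```python
import pandas as pd

d1 = pd.DataFrame({
    'branch': ['EKB', 'KRK', 'KRK', 'KRK', 'NB', 'VN'],
    'worktype': ['PHP', 'ANDROID', 'PYTHON', 'QA', 'PYTHON', 'ANDROID'],
    'minutes': [20, 600, 680, 45, 120, 15],
}).set_index(['branch', 'worktype'])

d2 = pd.DataFrame({
    'branch': ['EKB', 'KRK', 'NB', 'VN'],
    'worktype': ['PHP', 'PYTHON', 'PYTHON', 'ANDROID'],
    'minutes': [20, 270, 20, 20],
}).set_index(['branch', 'worktype'])

# Align on the union of both indexes; a row present on only one side
# is treated as 0 on the other side before subtracting.
diff = d2.sub(d1, fill_value=0)
print(diff)
```

This avoids choosing which frame to reindex, since the alignment covers rows that exist only in d1 as well as rows that exist only in d2.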
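I found another approach: use df.diff() when you need differences along a date/time index, that is, when a given value changes from day to day. I only had to convert the original grouped DataFrame into a plain 2D DataFrame without a MultiIndex: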
In [912]: gp = gp.unstack(level=1).fillna(0)

In [913]: with pd.option_context('display.max_rows', None, 'display.max_columns', 3):
     ...:     print(gp)
     ...:
department  MOBILE  ...    WEB
period              ...
2016-02-03     0.0  ...   30.0
2016-12-24     0.0  ...  400.0
2016-12-25     0.0  ...   80.0
2016-12-26     0.0  ...   20.0
2016-12-27     0.0  ...  180.0
2016-12-28   600.0  ...   15.0
2017-01-01     0.0  ...  190.0
2017-01-03     0.0  ...   80.0
2017-01-04    20.0  ...    0.0
2017-02-01   120.0  ...    0.0
2017-02-02    45.0  ...    0.0

In [914]: with pd.option_context('display.max_rows', None, 'display.max_columns', 3):
     ...:     print(gp.diff())
     ...:
     ...:
department  MOBILE  ...     WEB
period              ...
2016-02-03     NaN  ...     NaN
2016-12-24     0.0  ...   370.0
2016-12-25     0.0  ...  -320.0
2016-12-26     0.0  ...   -60.0
2016-12-27     0.0  ...   160.0
2016-12-28   600.0  ...  -165.0
2017-01-01  -600.0  ...   175.0
2017-01-03     0.0  ...  -110.0
2017-01-04    20.0  ...   -80.0
2017-02-01   100.0  ...     0.0
2017-02-02   -75.0  ...     0.0

[11 rows x 4 columns]
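The unstack-then-diff steps above can be reproduced end to end with a minimal sketch. The original `gp` is not shown before In [912], so the frame `records` below is hypothetical long-format data, and the names `wide` and `daily_change` are only illustrative; the sketch also assumes `gp` was a grouped sum indexed by period and department.

```python
import pandas as pd

# Hypothetical long-format data: minutes logged per day and department.
records = pd.DataFrame({
    'period': pd.to_datetime(['2016-12-27', '2016-12-28',
                              '2016-12-28', '2017-01-01']),
    'department': ['WEB', 'MOBILE', 'WEB', 'WEB'],
    'minutes': [180, 600, 15, 190],
})

# Grouped result with a (period, department) MultiIndex.
gp = records.groupby(['period', 'department'])['minutes'].sum()

# Move 'department' into the columns; days without an entry become 0.
wide = gp.unstack(level=1).fillna(0)

# Row-to-row change per department; the first row has no predecessor, so it is NaN.
daily_change = wide.diff()
print(daily_change)
```

Filling with 0 before calling `diff()` matters here: without it, a department with no entry on a given day would yield NaN for that day and for the following one.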