我有两个Pandas数组,如:
A =
Date Value
0 2022-03-01 50
1 2022-03-01 50
2 2022-03-03 50
3 2022-03-04 50
4 2022-03-04 50
5 2022-03-04 50
6 2022-03-04 50
7 2022-03-04 50
8 2022-03-15 50
9 2022-03-19 50
10 2022-03-23 50
B =
Value Date
Date
2022-03-01 0.0 2022-03-01
2022-03-02 0.0 2022-03-02
2022-03-03 0.0 2022-03-03
2022-03-04 0.0 2022-03-04
2022-03-05 0.0 2022-03-05
2022-03-06 0.0 2022-03-06
2022-03-07 0.0 2022-03-07
2022-03-08 0.0 2022-03-08
另外,A和B列表的长度不同。
我需要形成一个列表'C=',其中日期将按顺序排列(如列表'B='),并且每个日期的'值'等于列表' a ='中此日期的'值'的总和…列表'A='中未包含在列表'B='中日期的所有'值'都不应出现在列表'C='中。
我是python的新手我已经尝试过很多选择了……但一切都不工作:
for i in range(len(A)):
for n in range(len(B)):
if A['Date'][i] == B['Date'][i]:
C['Date'] == B['Date'][i]
C['Value'] == A['Value'][i]
else:
C['Date'] == B['Date'][i]
C['Value'] == B['Value'][i]
结果数组应该如下所示:
B =
Value Date
1 100.0 2022-03-01
2 0.0 2022-03-02
3 50.0 2022-03-03
4 250.0 2022-03-04
5 0.0 2022-03-05
6 0.0 2022-03-06
7 0.0 2022-03-07
8 0.0 2022-03-08
/////////真正的数组是:
Date: Value:
0 2022-05-06 0.000000e+00
1 2022-05-04 -4.042888e+06
2 2022-04-15 -2.742782e+06
3 2022-04-15 -9.152107e+05
4 2022-04-01 -3.608648e+07
5 2022-04-01 -1.072268e+07
6 2022-04-01 -1.885982e+07
7 2022-04-01 -1.087363e+07
8 2022-03-04 0.000000e+00
9 2022-03-04 0.000000e+00
10 2022-03-04 0.000000e+00
11 2022-03-04 0.000000e+00
12 2022-03-04 0.000000e+00
13 2022-03-04 0.000000e+00
14 2022-03-04 0.000000e+00
15 2022-03-04 0.000000e+00
16 2022-02-03 -4.369548e+06
17 2022-02-01 0.000000e+00
18 2022-02-01 0.000000e+00
19 2022-02-01 0.000000e+00
20 2022-02-01 -3.441539e+05
21 2022-02-01 0.000000e+00
22 2022-02-01 0.000000e+00
23 2022-02-01 0.000000e+00
24 2022-02-01 0.000000e+00
25 2021-11-12 -1.350750e+06
26 2021-11-09 0.000000e+00
27 2021-11-09 0.000000e+00
28 2021-11-01 -3.716895e+06
29 2021-10-19 -2.464551e+07
30 2021-10-15 -5.662351e+06
31 2021-10-15 -1.243596e+06
32 2021-10-01 -4.992055e+07
33 2021-10-01 -2.695814e+07
34 2021-10-01 -2.689851e+07
35 2021-10-01 -2.689851e+07
36 2021-09-26 0.000000e+00
37 2021-09-26 0.000000e+00
38 2021-09-26 0.000000e+00
39 2021-09-26 0.000000e+00
40 2021-09-26 0.000000e+00
41 2021-09-26 0.000000e+00
42 2021-08-24 -7.515941e+08
43 2021-08-20 0.000000e+00
44 2021-08-05 -4.674707e+06
45 2021-08-02 -2.480048e+06
46 2021-05-03 -1.607934e+07
47 2021-05-03 -2.257631e+06
48 2021-04-15 -5.224200e+06
Value: Date:
Date
2021-11-01 0.0 2021-11-01
2021-11-02 0.0 2021-11-02
2021-11-03 0.0 2021-11-03
2021-11-04 0.0 2021-11-04
2021-11-05 0.0 2021-11-05
... ... ...
2022-05-10 0.0 2022-05-10
2022-05-11 0.0 2022-05-11
2022-05-12 0.0 2022-05-12
2022-05-13 0.0 2022-05-13
2022-05-16 0.0 2022-05-16
[136 rows x 2 columns]
应用Nick描述的方法后,'C='中的所有'Values'都等于0.0
Date: Value:
Date
2021-11-01 2021-11-01 0.0
2021-11-02 2021-11-02 0.0
2021-11-03 2021-11-03 0.0
2021-11-04 2021-11-04 0.0
2021-11-05 2021-11-05 0.0
... ... ...
2022-05-10 2022-05-10 0.0
2022-05-11 2022-05-11 0.0
2022-05-12 2022-05-12 0.0
2022-05-13 2022-05-13 0.0
2022-05-16 2022-05-16 0.0
[136 rows x 2 columns]
您应该能够将join
B
到A
按Date
分组并求和:
C = B.join(A.groupby('Date').sum(), how='inner', lsuffix='_l').drop('Value_l', axis=1)
输出(样本数据):
Date Value
Date
2022-03-01 2022-03-01 100
2022-03-03 2022-03-03 50
2022-03-04 2022-03-04 250
如果您想保留B
的日期而不包含总和,则使用左连接(默认)代替fillna
,并使用0:
C = B.join(A.groupby('Date').sum(), lsuffix='_l').drop('Value_l', axis=1).fillna(0.0)
输出:
Date Value
Date
2022-03-01 2022-03-01 100.0
2022-03-02 2022-03-02 0.0
2022-03-03 2022-03-03 50.0
2022-03-04 2022-03-04 250.0
2022-03-05 2022-03-05 0.0
2022-03-06 2022-03-06 0.0
2022-03-07 2022-03-07 0.0
2022-03-08 2022-03-08 0.0
最后,如果你想删除Date
作为索引,使用reset_index
:
C = C.reset_index(drop=True)
输出:
Date Value
0 2022-03-01 100.0
1 2022-03-02 0.0
2 2022-03-03 50.0
3 2022-03-04 250.0
4 2022-03-05 0.0
5 2022-03-06 0.0
6 2022-03-07 0.0
7 2022-03-08 0.0