How to loop over a pandas DataFrame by grouping values by date, then extracting each filtered group to save as a new DataFrame



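I have been trying to figure out how to efficiently group the rows of a DataFrame using a for loop based on the "Date (UTC)" column. For each distinct date, I need to extract all rows with that date and save them to a new DataFrame. Below is a sample of my data.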

import pandas as pd
data = pd.read_csv('MasterData.csv')
print(data)
Date (UTC)    Magnitude    Vector Station
0   2020-01-05    26.474679 -0.730455   A
1   2020-01-05    30.746291  0.020503   B
2   2020-01-05    37.829401  0.252316   C
3   2020-01-05  1904.611372  0.977388   D
4   2020-01-19    38.441813 -0.044736   B
5   2020-01-19    31.067455  0.419826   C
6   2020-01-19    15.972198 -0.592661   A
7   2020-01-19  1261.038155  0.977394   D
8   2020-02-02    36.628855  0.518991   B
9   2020-02-02    29.724500 -0.818267   A
10  2020-02-02   673.138440  0.977481   D
11  2020-02-02    13.119385  0.327124   C
12  2020-02-16     7.864772  0.311480   C
13  2020-02-16   312.438088  0.977436   D
14  2020-02-16     7.422883 -0.735439   A
15  2020-02-16    30.693853  0.079296   B
16  2020-02-16     4.722992 -0.434787   E
17  2020-03-01    27.501789  0.393106   C
18  2020-03-01   479.118903  0.977505   D
19  2020-03-01    10.022614 -0.887412   A
20  2020-03-01     4.171703 -0.338138   E
21  2020-03-01    26.057769  0.307710   B
22  2020-03-15   580.257861  0.977406   D
23  2020-03-15    17.528845 -0.927358   A
24  2020-03-15     2.963810  0.222188   E
25  2020-03-15    17.535629 -0.052694   B
26  2020-03-15    38.560337  0.283103   C
27  2020-03-29   943.100243  0.977399   D
28  2020-03-29    36.560799 -0.105109   B
29  2020-03-29    17.055374 -0.677127   A
30  2020-03-29     4.614302 -0.315060   E
31  2020-04-12  1095.538437  0.977369   D
32  2020-04-12     6.650060  0.660921   E
33  2020-04-12    55.495947 -0.316548   B
34  2020-04-12    16.269334 -0.614226   A
print(data.dtypes)
Date (UTC)     object
Magnitude      float64
Vector         float64
Station         object
dtype: object

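As for the expected output: for each new DataFrame saved from the filtered data, I need to plot the magnitude and vector of all stations on that date using matplotlib quiver.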

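You can create an empty dictionary, group the rows by date, and fill the dictionary in a loop. (In the code here the DataFrame is called df and the date column is named Date_UTC; judging by the output, it has been parsed as datetime.)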

df_dict = {}
for name, group in df.groupby('Date_UTC'):
    df_dict[name] = group
df_dict.values()
dict_values([    Date_UTC    Magnitude    Vector Station
0 2020-01-05    26.474679 -0.730455       A
1 2020-01-05    30.746291  0.020503       B
2 2020-01-05    37.829401  0.252316       C
3 2020-01-05  1904.611372  0.977388       D,     Date_UTC    Magnitude    Vector Station
4 2020-01-19    38.441813 -0.044736       B
5 2020-01-19    31.067455  0.419826       C
6 2020-01-19    15.972198 -0.592661       A
7 2020-01-19  1261.038155  0.977394       D,     Date_UTC    Magnitude    Vector Station
...])
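
For reference, the dictionary approach above can be reproduced end to end. This is a minimal sketch, not the original poster's exact code: it inlines a few rows of the question's data instead of reading MasterData.csv, and it assumes the column was renamed from "Date (UTC)" to Date_UTC and parsed with pd.to_datetime, which is what the answer's output suggests.

```python
import io

import pandas as pd

# A few rows of the question's data, inlined so the sketch runs on its own.
csv_text = """Date (UTC),Magnitude,Vector,Station
2020-01-05,26.474679,-0.730455,A
2020-01-05,30.746291,0.020503,B
2020-01-19,38.441813,-0.044736,B
2020-01-19,31.067455,0.419826,C
"""
df = pd.read_csv(io.StringIO(csv_text))

# Assumed preprocessing: rename the column and parse the strings as datetimes.
df = df.rename(columns={'Date (UTC)': 'Date_UTC'})
df['Date_UTC'] = pd.to_datetime(df['Date_UTC'])

# One entry per date; each value is the sub-DataFrame holding that date's rows.
df_dict = {name: group for name, group in df.groupby('Date_UTC')}

for date, frame in df_dict.items():
    print(date.date(), len(frame))
```

Keying the dictionary by date means a given day's rows can be looked up directly, for example df_dict[pd.Timestamp('2020-01-05')].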

You can also simply iterate over the groups directly:

for name, group in df.groupby('Date_UTC'):
    print('split_date:', name)
    print(group)
split_date: 2020-01-05 00:00:00
    Date_UTC    Magnitude    Vector Station
0 2020-01-05    26.474679 -0.730455       A
1 2020-01-05    30.746291  0.020503       B
2 2020-01-05    37.829401  0.252316       C
3 2020-01-05  1904.611372  0.977388       D
split_date: 2020-01-19 00:00:00
    Date_UTC    Magnitude    Vector Station
4 2020-01-19    38.441813 -0.044736       B
5 2020-01-19    31.067455  0.419826       C
6 2020-01-19    15.972198 -0.592661       A
7 2020-01-19  1261.038155  0.977394       D
....
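
If only one date is needed, looping over every group may be unnecessary. The sketch below uses GroupBy.get_group to pull out a single date's rows; the small DataFrame is an illustrative excerpt of the question's data, and the Date_UTC column is assumed to hold datetimes.

```python
import pandas as pd

# Illustrative excerpt of the question's data.
df = pd.DataFrame({
    'Date_UTC': pd.to_datetime(['2020-01-05', '2020-01-05', '2020-01-19']),
    'Magnitude': [26.474679, 30.746291, 38.441813],
    'Vector': [-0.730455, 0.020503, -0.044736],
    'Station': ['A', 'B', 'B'],
})

grouped = df.groupby('Date_UTC')

# Pull out a single date's rows without iterating over all groups.
first_day = grouped.get_group(pd.Timestamp('2020-01-05'))

# reset_index gives the extracted frame its own 0-based index.
first_day = first_day.reset_index(drop=True)
print(first_day)
```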

Creating the DataFrames

# Collect one DataFrame per date; groupby yields the groups in sorted date order
df_list = [group for _, group in df.groupby('Date_UTC')]
df_0 = df_list[0]
df_0
    Date_UTC    Magnitude    Vector Station
0 2020-01-05    26.474679 -0.730455       A
1 2020-01-05    30.746291  0.020503       B
2 2020-01-05    37.829401  0.252316       C
3 2020-01-05  1904.611372  0.977388       D
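
Since the question's goal is a quiver plot per date, here is a rough sketch of how the grouped frames might feed matplotlib. Two things are assumptions that the source does not confirm: that the Vector column is a direction angle in radians, and that stations can simply be spaced evenly along the x axis. The helper name plot_date is hypothetical, and the DataFrame is a small excerpt of the question's data.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove this line to view figures interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative excerpt of the question's data.
df = pd.DataFrame({
    'Date_UTC': pd.to_datetime(['2020-01-05', '2020-01-05',
                                '2020-01-19', '2020-01-19']),
    'Magnitude': [26.474679, 30.746291, 38.441813, 31.067455],
    'Vector': [-0.730455, 0.020503, -0.044736, 0.419826],
    'Station': ['A', 'B', 'B', 'C'],
})


def plot_date(date, frame):
    """Draw one arrow per station for a single date (hypothetical helper)."""
    magnitude = frame['Magnitude'].to_numpy()
    angle = frame['Vector'].to_numpy()  # ASSUMPTION: direction angle in radians
    u = magnitude * np.cos(angle)       # x component of each arrow
    v = magnitude * np.sin(angle)       # y component of each arrow
    x = np.arange(len(frame))           # ASSUMPTION: stations evenly spaced on x
    y = np.zeros(len(frame))

    fig, ax = plt.subplots()
    ax.quiver(x, y, u, v)               # default autoscaling of arrow lengths
    ax.set_xticks(x)
    ax.set_xticklabels(frame['Station'])
    ax.set_ylim(-1, 1)
    ax.set_title(f'{date:%Y-%m-%d}')
    return fig, ax


# One figure per date, keyed by the date itself.
figures = {date: plot_date(date, group) for date, group in df.groupby('Date_UTC')}
```

If Vector actually encodes something else (for example a cosine or a normalized component), only the two lines computing u and v would need to change.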
