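I have been trying to work out how to efficiently group the rows of a dataframe with a for loop based on the "Date (UTC)" column. For each distinct date, I need to extract all rows with that date and save them to a new dataframe. Below is a sample of my data.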
import pandas as pd
data = pd.read_csv('MasterData.csv')
print(data)
Date (UTC) Magnitude Vector Station
0 2020-01-05 26.474679 -0.730455 A
1 2020-01-05 30.746291 0.020503 B
2 2020-01-05 37.829401 0.252316 C
3 2020-01-05 1904.611372 0.977388 D
4 2020-01-19 38.441813 -0.044736 B
5 2020-01-19 31.067455 0.419826 C
6 2020-01-19 15.972198 -0.592661 A
7 2020-01-19 1261.038155 0.977394 D
8 2020-02-02 36.628855 0.518991 B
9 2020-02-02 29.724500 -0.818267 A
10 2020-02-02 673.138440 0.977481 D
11 2020-02-02 13.119385 0.327124 C
12 2020-02-16 7.864772 0.311480 C
13 2020-02-16 312.438088 0.977436 D
14 2020-02-16 7.422883 -0.735439 A
15 2020-02-16 30.693853 0.079296 B
16 2020-02-16 4.722992 -0.434787 E
17 2020-03-01 27.501789 0.393106 C
18 2020-03-01 479.118903 0.977505 D
19 2020-03-01 10.022614 -0.887412 A
20 2020-03-01 4.171703 -0.338138 E
21 2020-03-01 26.057769 0.307710 B
22 2020-03-15 580.257861 0.977406 D
23 2020-03-15 17.528845 -0.927358 A
24 2020-03-15 2.963810 0.222188 E
25 2020-03-15 17.535629 -0.052694 B
26 2020-03-15 38.560337 0.283103 C
27 2020-03-29 943.100243 0.977399 D
28 2020-03-29 36.560799 -0.105109 B
29 2020-03-29 17.055374 -0.677127 A
30 2020-03-29 4.614302 -0.315060 E
31 2020-04-12 1095.538437 0.977369 D
32 2020-04-12 6.650060 0.660921 E
33 2020-04-12 55.495947 -0.316548 B
34 2020-04-12 16.269334 -0.614226 A
print(data.dtypes)
Date (UTC) object
Magnitude float64
Vector float64
Station object
dtype: object
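The expected result is that, for each new dataframe saved from the filtered data, I can plot the magnitude and vector of all stations for that date using matplotlib quiver.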
You can create an empty dictionary, group by date, and fill the dictionary in a loop.
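# df is the question's data with the date column named 'Date_UTC'
df_dict = {}
# Each iteration yields a date and the sub-dataframe of rows sharing that date
for name, group in df.groupby('Date_UTC'):
    df_dict[name] = group
df_dict.values()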
dict_values([ Date_UTC Magnitude Vector Station
0 2020-01-05 26.474679 -0.730455 A
1 2020-01-05 30.746291 0.020503 B
2 2020-01-05 37.829401 0.252316 C
3 2020-01-05 1904.611372 0.977388 D, Date_UTC Magnitude Vector Station
4 2020-01-19 38.441813 -0.044736 B
5 2020-01-19 31.067455 0.419826 C
6 2020-01-19 15.972198 -0.592661 A
7 2020-01-19 1261.038155 0.977394 D, Date_UTC Magnitude Vector Station
...])
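For a self-contained version of the dictionary approach, here is a minimal sketch. It assumes the column is renamed from 'Date (UTC)' to 'Date_UTC' and parsed with pd.to_datetime (the question's dtypes show it as object); the inline rows are a few copied from the question and stand in for MasterData.csv.

```python
import pandas as pd

# A few rows copied from the question, standing in for MasterData.csv.
data = pd.DataFrame({
    "Date (UTC)": ["2020-01-05", "2020-01-05", "2020-01-19", "2020-01-19"],
    "Magnitude": [26.474679, 30.746291, 38.441813, 31.067455],
    "Vector": [-0.730455, 0.020503, -0.044736, 0.419826],
    "Station": ["A", "B", "B", "C"],
})

# Assumption: rename the column and parse it as datetime before grouping.
df = data.rename(columns={"Date (UTC)": "Date_UTC"})
df["Date_UTC"] = pd.to_datetime(df["Date_UTC"])

# One dataframe per distinct date, keyed by the date (a pandas Timestamp).
df_dict = {date: group for date, group in df.groupby("Date_UTC")}

for date, group in df_dict.items():
    print(date.date(), len(group), "rows")
```

A single date can then be looked up with df_dict[pd.Timestamp("2020-01-05")].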
You can also simply loop over the groups directly:
for name, group in df.groupby('Date_UTC'):
    print('split_date:', name)
    print(group)
split_date: 2020-01-05 00:00:00
Date_UTC Magnitude Vector Station
0 2020-01-05 26.474679 -0.730455 A
1 2020-01-05 30.746291 0.020503 B
2 2020-01-05 37.829401 0.252316 C
3 2020-01-05 1904.611372 0.977388 D
split_date: 2020-01-19 00:00:00
Date_UTC Magnitude Vector Station
4 2020-01-19 38.441813 -0.044736 B
5 2020-01-19 31.067455 0.419826 C
6 2020-01-19 15.972198 -0.592661 A
7 2020-01-19 1261.038155 0.977394 D
....
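If only one date is needed, the loop can be skipped entirely. The sketch below uses get_group on the grouped object; the small inline dataframe is illustrative and assumes 'Date_UTC' has already been parsed as datetime.

```python
import pandas as pd

# Illustrative rows taken from the question; 'Date_UTC' is already datetime here.
df = pd.DataFrame({
    "Date_UTC": pd.to_datetime(["2020-01-05", "2020-01-05", "2020-01-19"]),
    "Magnitude": [26.474679, 30.746291, 38.441813],
    "Vector": [-0.730455, 0.020503, -0.044736],
    "Station": ["A", "B", "B"],
})

grouped = df.groupby("Date_UTC")

# Pull out the rows for a single date without looping over every group.
one_day = grouped.get_group(pd.Timestamp("2020-01-05"))
print(one_day)
```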
To create separate dataframes:
# Collect one dataframe per date; names like df_0 would not exist as variables,
# so index into the list instead
df_list = []
for name, group in df.groupby('Date_UTC'):
    df_list.append(group)
df_list[0]
Date_UTC Magnitude Vector Station
0 2020-01-05 26.474679 -0.730455 A
1 2020-01-05 30.746291 0.020503 B
2 2020-01-05 37.829401 0.252316 C
3 2020-01-05 1904.611372 0.977388 D
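To tie the grouping back to the quiver plot mentioned in the question, here is a rough sketch of one figure per date. The question does not say what 'Vector' represents, so this assumes it is an angle in radians and places the stations evenly along the x-axis; both are assumptions, and quiver_components is a hypothetical helper, not part of pandas or matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative rows taken from the question.
df = pd.DataFrame({
    "Date_UTC": pd.to_datetime(["2020-01-05", "2020-01-05", "2020-01-19", "2020-01-19"]),
    "Magnitude": [26.474679, 30.746291, 38.441813, 31.067455],
    "Vector": [-0.730455, 0.020503, -0.044736, 0.419826],
    "Station": ["A", "B", "B", "C"],
})

def quiver_components(group):
    """Hypothetical helper. Assumption: 'Vector' is an angle in radians."""
    magnitude = group["Magnitude"].to_numpy()
    angle = group["Vector"].to_numpy()
    return magnitude * np.cos(angle), magnitude * np.sin(angle)

figures = {}
for date, group in df.groupby("Date_UTC"):
    u, v = quiver_components(group)
    x = np.arange(len(group))   # assumption: stations spaced evenly along x
    y = np.zeros(len(group))
    fig, ax = plt.subplots()
    ax.quiver(x, y, u, v)
    ax.set_xticks(x)
    ax.set_xticklabels(group["Station"].tolist())
    ax.set_title(str(date.date()))
    figures[date] = fig         # keep the figure so it can be saved or shown later
```

Each figure can then be written out with figures[date].savefig(...) or displayed with plt.show() under an interactive backend. If 'Vector' is something other than an angle, only quiver_components needs to change.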