STUDY Teacher UPDATE_DATE COMPARE_DATE INTERVAL_DAYS
0 entertainment C 2022-03-02 2022-01-01 0.61
1 entertainment B, C 2022-03-02 2022-04-10 0.39
2 math A 2022-02-25 2022-01-01 0.56
3 math A, C 2022-02-25 2022-04-10 0.44
4 science D 2022-01-03 2022-01-01 0.02
5 science A 2022-01-03 2022-02-20 0.48
6 science A, B, E 2022-02-20 2022-04-10 0.49
7 technology E 2021-09-01 2022-09-01 0.00
。当有2个以上的教师时,我想通过基于","拆分来创建一行。并将INTERVAL_DAYS字段的值除以要添加的教师的数量。我认为它与您帮助的compare_date函数具有类似的结构,并尝试应用它,但没有得到所需的输出,因此我不得不提出一个额外的问题。
STUDY Teacher UPDATE_DATE COMPARE_DATE INTERVAL_DAYS
0 entertainment C 2022-03-02 2022-01-01 0.61
1 entertainment B 2022-03-02 2022-04-10 0.2
2 entertainment C 2022-03-02 2022-04-10 0.2
3 math A 2022-02-25 2022-01-01 0.56
4 math A 2022-02-25 2022-04-10 0.22
5 math C 2022-02-25 2022-04-10 0.22
6 science D 2022-01-03 2022-01-01 0.02
7 science A 2022-01-03 2022-02-20 0.48
8 science A 2022-02-20 2022-04-10 0.16
9 science B 2022-02-20 2022-04-10 0.16
10 science E 2022-02-20 2022-04-10 0.1611技术E 2021-09-01 2022-09-01 1.00
最后要得到的是每个Teacher的间隔天数之和。
Teacher WORKING_COUNTS
0 A 2
1 B 0.36
2 C 1.03
3 D 0.02
4 E 1.16
首先用,
拆分值,然后用DataFrame.explode
,用GroupBy.transform
除以行数,最后聚合sum
:
df = df.assign(Teacher = df['Teacher'].str.split(', ')).explode('Teacher')
df['INTERVAL_DAYS'] /= df.groupby(level=0)['INTERVAL_DAYS'].transform('size')
print (df)
STUDY Teacher UPDATE_DATE COMPARE_DATE INTERVAL_DAYS
0 entertainment C 2022-03-02 2022-01-01 0.610000
1 entertainment B 2022-03-02 2022-04-10 0.195000
1 entertainment C 2022-03-02 2022-04-10 0.195000
2 math A 2022-02-25 2022-01-01 0.560000
3 math A 2022-02-25 2022-04-10 0.220000
3 math C 2022-02-25 2022-04-10 0.220000
4 science D 2022-01-03 2022-01-01 0.020000
5 science A 2022-01-03 2022-02-20 0.480000
6 science A 2022-02-20 2022-04-10 0.163333
6 science B 2022-02-20 2022-04-10 0.163333
6 science E 2022-02-20 2022-04-10 0.163333
7 technology E 2021-09-01 2022-09-01 0.000000
out = df.groupby('Teacher')['INTERVAL_DAYS'].sum().reset_index(name='WORKING_COUNTS')
print (out)
Teacher WORKING_COUNTS
0 A 1.423333
1 B 0.358333
2 C 1.025000
3 D 0.020000
4 E 0.163333