如何拆分Python数据框架中字段的值,将其添加到新行,并使用现有值将其添加到新字段中


STUDY              Teacher       UPDATE_DATE  COMPARE_DATE  INTERVAL_DAYS
0   entertainment           C        2022-03-02    2022-01-01      0.61
1   entertainment        B, C        2022-03-02    2022-04-10      0.39
2   math                    A        2022-02-25    2022-01-01      0.56
3   math                 A, C        2022-02-25    2022-04-10      0.44
4   science                 D        2022-01-03    2022-01-01      0.02
5   science                 A        2022-01-03    2022-02-20      0.48
6   science           A, B, E        2022-02-20    2022-04-10      0.49
7   technology              E        2021-09-01    2022-09-01      0.00

。当有2个以上的教师时,我想通过基于","拆分来创建一行。并将INTERVAL_DAYS字段的值除以要添加的教师的数量。我认为它与您帮助的compare_date函数具有类似的结构,并尝试应用它,但没有得到所需的输出,因此我不得不提出一个额外的问题。

STUDY              Teacher       UPDATE_DATE  COMPARE_DATE  INTERVAL_DAYS
0   entertainment           C        2022-03-02    2022-01-01      0.61
1   entertainment           B        2022-03-02    2022-04-10      0.2
2   entertainment           C        2022-03-02    2022-04-10      0.2
3   math                    A        2022-02-25    2022-01-01      0.56
4   math                    A        2022-02-25    2022-04-10      0.22
5   math                    C        2022-02-25    2022-04-10      0.22
6   science                 D        2022-01-03    2022-01-01      0.02
7   science                 A        2022-01-03    2022-02-20      0.48
8   science                 A        2022-02-20    2022-04-10      0.16
9   science                 B        2022-02-20    2022-04-10      0.16

10 science E 2022-02-20 2022-04-10 0.1611技术E 2021-09-01 2022-09-01 1.00

最后要得到的是每个Teacher的间隔天数之和。

Teacher     WORKING_COUNTS
0        A             2
1        B          0.36
2        C          1.03
3        D          0.02
4        E          1.16

首先用,拆分值,然后用DataFrame.explode,用GroupBy.transform除以行数,最后聚合sum:

df = df.assign(Teacher = df['Teacher'].str.split(', ')).explode('Teacher')
df['INTERVAL_DAYS'] /= df.groupby(level=0)['INTERVAL_DAYS'].transform('size')
print (df)
STUDY Teacher UPDATE_DATE COMPARE_DATE  INTERVAL_DAYS
0  entertainment       C  2022-03-02   2022-01-01       0.610000
1  entertainment       B  2022-03-02   2022-04-10       0.195000
1  entertainment       C  2022-03-02   2022-04-10       0.195000
2           math       A  2022-02-25   2022-01-01       0.560000
3           math       A  2022-02-25   2022-04-10       0.220000
3           math       C  2022-02-25   2022-04-10       0.220000
4        science       D  2022-01-03   2022-01-01       0.020000
5        science       A  2022-01-03   2022-02-20       0.480000
6        science       A  2022-02-20   2022-04-10       0.163333
6        science       B  2022-02-20   2022-04-10       0.163333
6        science       E  2022-02-20   2022-04-10       0.163333
7     technology       E  2021-09-01   2022-09-01       0.000000
out = df.groupby('Teacher')['INTERVAL_DAYS'].sum().reset_index(name='WORKING_COUNTS')
print (out)
Teacher  WORKING_COUNTS
0       A        1.423333
1       B        0.358333
2       C        1.025000
3       D        0.020000
4       E        0.163333

最新更新