My current pandas DataFrame:
Subject  Test1  Test2  Test3
Python   45     Temp   25
         50     Temp   Temp
         Temp   30     Temp
         Temp   35     Temp
         Temp   38     Temp
OS       Temp   Temp   45
         33     24     32
         43     34     44
         Temp   43     31
Java     33     32     34
         43     33     36
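To make the question reproducible, the input frame can be built as follows. This is a sketch that assumes the blank Subject cells are empty strings and that every test value, numbers included, is stored as a string. The real frame may use NaN or numeric dtypes instead.

```python
import pandas as pd

# Sample input; blank Subject cells are assumed to be empty strings
# and all test values are assumed to be strings (e.g. read from a file).
df = pd.DataFrame({
    'Subject': ['Python', '', '', '', '', 'OS', '', '', '', 'Java', ''],
    'Test1': ['45', '50', 'Temp', 'Temp', 'Temp', 'Temp', '33', '43', 'Temp', '33', '43'],
    'Test2': ['Temp', 'Temp', '30', '35', '38', 'Temp', '24', '34', '43', '32', '33'],
    'Test3': ['25', 'Temp', 'Temp', 'Temp', 'Temp', '45', '32', '44', '31', '34', '36'],
})
print(df)
```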
Expected output:
Subject  Test1  Test2  Test3
Python   45     30     25
         50     35
                38
OS       33     24     45
         43     34     32
                43     44
                       31
Java     33     32     34
         43     33     36
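I need to remove the cells that contain "Temp" and shift the remaining values of each column up within the same subject. Dropping whole rows doesn't work here, so I'm stuck. Please help.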
As a first step, if the blank values in Subject are empty strings, replace them with missing values and forward fill them:
import numpy as np
import pandas as pd

df['Subject'] = df['Subject'].replace('', np.nan).ffill()
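On the Subject column alone, the replace-then-ffill step gives every blank cell the subject of the nearest non-blank row above it. The short Series below is a hypothetical excerpt used only to show the effect.

```python
import numpy as np
import pandas as pd

# Hypothetical excerpt of the Subject column, blank cells as empty strings
subject = pd.Series(['Python', '', '', 'OS', '', 'Java', ''])

# '' -> NaN, then propagate the last valid subject downwards
filled = subject.replace('', np.nan).ffill()
print(filled.tolist())
# ['Python', 'Python', 'Python', 'OS', 'OS', 'Java', 'Java']
```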
Then remove the Temp values in each group with a custom function passed to GroupBy.apply:
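def f(x):
    # For each column of the group, keep only the non-'Temp' values renumbered
    # from 0; shorter columns are padded with NaN when the columns are aligned
    return x.apply(lambda col: pd.Series(col[col.ne('Temp')].tolist(), name=col.name))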
df = (df.set_index('Subject')
.groupby('Subject', sort=False)
.apply(f)
.reset_index(level=1, drop=True)
.reset_index())
print (df)
Subject Test1 Test2 Test3
0 Python 45 30 25
1 Python 50 35 NaN
2 Python NaN 38 NaN
3 OS 33 24 45
4 OS 43 34 32
5 OS NaN 43 44
6 OS NaN NaN 31
7 Java 33 32 34
8 Java 43 33 36
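The core of f is a per-column compaction: keep the non-Temp values of one column and renumber them from 0, so that shorter columns end up padded with NaN when the columns are put back together. A minimal sketch on the OS rows of the sample data illustrates this; the helper name compact is hypothetical, and the values are assumed to be strings.

```python
import pandas as pd

def compact(col: pd.Series) -> pd.Series:
    # Drop the 'Temp' cells and renumber the survivors 0..n-1
    return col[col.ne('Temp')].reset_index(drop=True)

# The OS rows of the sample data (values as strings, an assumption)
os_rows = pd.DataFrame({
    'Test1': ['Temp', '33', '43', 'Temp'],
    'Test2': ['Temp', '24', '34', '43'],
    'Test3': ['45', '32', '44', '31'],
})

# Each column is compacted independently; DataFrame.apply aligns the
# results on their new 0-based index and fills the gaps with NaN
result = os_rows.apply(compact)
print(result)
```

The printed frame matches the four OS rows of the output above.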
Alternatively, after processing Subject, unpivot with DataFrame.melt, remove the Temp rows, add a helper counter column with GroupBy.cumcount, and pivot back with DataFrame.pivot:
df['Subject'] = df['Subject'].replace('',np.nan).ffill()
df = df.melt('Subject')
df = df[df['value'].ne('Temp')].copy()
df['g'] = df.groupby(['Subject','variable']).cumcount()
df = (df.pivot(index=['Subject','g'], columns='variable', values='value')
.reset_index(level=1, drop=True)
.reset_index()
.rename_axis(None, axis=1))
print (df)
Subject Test1 Test2 Test3
0 Java 33 32 34
1 Java 43 33 36
2 OS 33 24 45
3 OS 43 34 32
4 OS NaN 43 44
5 OS NaN NaN 31
6 Python 45 30 25
7 Python 50 35 NaN
8 Python NaN 38 NaN
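Because pivot sorts the index, this second approach lists the subjects alphabetically (Java, OS, Python), whereas the first keeps the original order thanks to sort=False. If the original order matters, one option (not part of the answer above) is to turn Subject into an ordered categorical whose categories follow first appearance before pivoting. The sketch below applies this to a hypothetical long-format excerpt like the one melt produces after the Temp rows are removed, and also shows how the cumcount counter lines the values up.

```python
import pandas as pd

# Hypothetical long-format excerpt (after melt and Temp removal)
long_df = pd.DataFrame({
    'Subject': ['Python', 'Python', 'Python', 'Java', 'Java'],
    'variable': ['Test1', 'Test1', 'Test2', 'Test1', 'Test2'],
    'value': ['45', '50', '30', '33', '32'],
})

# Counter within each (Subject, variable) pair: 0, 1, 2, ...
long_df['g'] = long_df.groupby(['Subject', 'variable']).cumcount()

# Keep subjects in order of first appearance instead of alphabetical order
order = long_df['Subject'].unique()
long_df['Subject'] = pd.Categorical(long_df['Subject'], categories=order, ordered=True)

wide = (long_df.pivot(index=['Subject', 'g'], columns='variable', values='value')
               .reset_index(level=1, drop=True)
               .reset_index()
               .rename_axis(None, axis=1))
print(wide)
```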