Moving specific strings between columns in pandas



I have a pandas DataFrame of properties.

**Address**      | **Added on**        |
15 Smith Close   |  Added on 17/11/22  |
1 Apple Drive    |  Reduced on 19/11/22|
27 Pride place   |  Added on 18/1//22  |

I want to move every 'Reduced on ...' entry out of the 'Added on' column and into another column of the DataFrame named 'Reduced on'. How would I do this?

Many thanks.

You can use pd.Series.where (df['Added on'] is a Series):

df['Reduced on'] = df['Added on'].where(df['Added on'].str.contains('Reduced on'))
df['Added on'] = df['Added on'].where(~ df['Added on'].str.contains('Reduced on'))
df
          Address           Added on           Reduced on
0  15 Smith Close  Added on 17/11/22                  NaN
1   1 Apple Drive                NaN  Reduced on 19/11/22
2  27 Pride place  Added on 18/1//22                  NaN
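
One edge case worth noting: if 'Added on' can contain missing values, str.contains returns NaN for those rows and negating the mask with ~ can fail. A minimal sketch, assuming a hypothetical fourth row ('3 Example Road') with a missing value, passes na=False so the mask is strictly boolean:

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row with a missing value
df = pd.DataFrame({
    "Address": ["15 Smith Close", "1 Apple Drive", "27 Pride place", "3 Example Road"],
    "Added on": ["Added on 17/11/22", "Reduced on 19/11/22", "Added on 18/1//22", None],
})

# na=False treats missing entries as "no match", so the mask contains only True/False
is_reduced = df["Added on"].str.contains("Reduced on", na=False)

# where() keeps values where the mask is True and fills the rest with NaN
df["Reduced on"] = df["Added on"].where(is_reduced)
# Negate the mask to blank out the rows that were moved
df["Added on"] = df["Added on"].where(~is_reduced)

print(df)
```

The 'Reduced on' column has to be assigned first, because the second statement overwrites the values it reads from.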

Or use pd.Series.str.extract and pd.concat:

pd.concat([df['Address'], df['Added on'].str.extract('(?P<Added_on>Add.*)|(?P<Reduced_on>Reduced.*)')], axis=1)
          Address           Added_on           Reduced_on
0  15 Smith Close  Added on 17/11/22                  NaN
1   1 Apple Drive                NaN  Reduced on 19/11/22
2  27 Pride place  Added on 18/1//22                  NaN
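
Regex group names cannot contain spaces, which is why the columns above come out as Added_on and Reduced_on. A self-contained sketch of the same idea, with an added rename step (my assumption about the desired headers) to restore the names used in the question:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Address": ["15 Smith Close", "1 Apple Drive", "27 Pride place"],
    "Added on": ["Added on 17/11/22", "Reduced on 19/11/22", "Added on 18/1//22"],
})

# Each named group becomes its own column; only the group that matched is filled,
# the other one is NaN for that row
pattern = r"(?P<Added_on>Added on.*)|(?P<Reduced_on>Reduced on.*)"
parts = df["Added on"].str.extract(pattern)

# Map the group names back to the column headers with spaces
parts = parts.rename(columns={"Added_on": "Added on", "Reduced_on": "Reduced on"})

# Put the untouched Address column next to the two extracted columns
result = pd.concat([df["Address"], parts], axis=1)
print(result)
```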

Suggested code:

import pandas as pd
import numpy as np
# Build the DataFrame to work on
df = pd.DataFrame({"**Address**": ['15 Smith Close', '1 Apple Drive', '27 Pride place'],
                   "**Added on**": ['Added on 17/11/22', 'Reduced on 19/11/22', 'Added on 18/1//22']})
# Define the mask m: True for rows that contain 'Reduced'
m = df['**Added on**'].str.contains('Reduced')
# 1- Copy 'Reduced' rows into the new column **Reduced on**
df['**Reduced on**'] = df['**Added on**'].where(m, np.nan)
# 2- Erase 'Reduced' rows from **Added on**
df['**Added on**'] = df['**Added on**'].where(~m, np.nan)
print(df)

Result:

      **Address**       **Added on**       **Reduced on**
0  15 Smith Close  Added on 17/11/22                  NaN
1   1 Apple Drive                NaN  Reduced on 19/11/22
2  27 Pride place  Added on 18/1//22                  NaN

This should also work:

(df[['Address']].join(df[['Added on']]
                      .set_index(df['Added on']
                                 .str.rsplit(n=1)
                                 .str[0]
                                 .rename(None), append=True)['Added on']
                      .unstack()))

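A shorter variant builds only the two split columns by multiplying 0/1 indicator columns with the original strings, so non-matching cells should come out as empty strings rather than NaN:

df['Added on'].str.rsplit(' ',n=1).str[0].str.get_dummies().mul(df['Added on'],axis=0)

Output of the join/unstack expression: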

          Address           Added on           Reduced on
0  15 Smith Close  Added on 17/11/22                  NaN
1   1 Apple Drive                NaN  Reduced on 19/11/22
2  27 Pride place  Added on 18/1//22                  NaN
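
The chained join/unstack expression can be easier to follow when split into named steps. This is a sketch of the same logic; the intermediate names label and wide are mine, not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Address": ["15 Smith Close", "1 Apple Drive", "27 Pride place"],
    "Added on": ["Added on 17/11/22", "Reduced on 19/11/22", "Added on 18/1//22"],
})

# Text before the last whitespace: "Added on" or "Reduced on".
# rename(None) drops the Series name so the new index level is unnamed.
label = df["Added on"].str.rsplit(n=1).str[0].rename(None)

# Append the label as a second index level, then pivot that level into columns.
# Rows that have no value for a given label get NaN in that column.
wide = df[["Added on"]].set_index(label, append=True)["Added on"].unstack()

# Re-attach the Address column by joining on the original row index
result = df[["Address"]].join(wide)
print(result)
```

Because the column names are derived from the data, a new prefix in the data would produce another column without any change to the code.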
