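I have a pandas DataFrame of properties.
| **Address** | **Added on** |
| 15 Smith Close | Added on 17/11/22 |
| 1 Apple Drive | Reduced on 19/11/22 |
| 27 Pride place | Added on 18/1//22 |
I want to move every 'Reduced on ...' entry out of the 'Added on' column and into a separate column of the DataFrame named 'Reduced on'. How would I go about doing this?
Many thanks.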
You can use pd.Series.where (df['Added on'] is a Series, so this is the Series counterpart of pd.DataFrame.where):
df['Reduced on'] = df['Added on'].where(df['Added on'].str.contains('Reduced on'))
df['Added on'] = df['Added on'].where(~ df['Added on'].str.contains('Reduced on'))
df
Address Added on Reduced on
0 15 Smith Close Added on 17/11/22 NaN
1 1 Apple Drive NaN Reduced on 19/11/22
2 27 Pride place Added on 18/1//22 NaN
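As a self-contained sketch of the same `where` idea, the version below computes the mask once and passes `na=False` to `str.contains`. The fourth row with a missing value is a hypothetical addition, not part of the question's data; it is there only to show why a purely boolean mask can matter, since a mask containing missing values may not work with `~`.

```python
import pandas as pd

# Sample data from the question; the last row (missing value) is a
# hypothetical addition used only to illustrate na=False.
df = pd.DataFrame({
    "Address": ["15 Smith Close", "1 Apple Drive", "27 Pride place", "3 Oak Road"],
    "Added on": ["Added on 17/11/22", "Reduced on 19/11/22", "Added on 18/1//22", None],
})

# na=False treats missing entries as "not Reduced", so the mask is strictly boolean.
is_reduced = df["Added on"].str.contains("Reduced on", na=False)

# Keep the value where the mask is True, NaN elsewhere.
df["Reduced on"] = df["Added on"].where(is_reduced)
# Keep the value where the mask is False, NaN elsewhere.
df["Added on"] = df["Added on"].where(~is_reduced)

print(df)
```

Computing the mask before either assignment also avoids evaluating `str.contains` twice on a column that is being modified in between.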
Or use pd.Series.str.extract and pd.concat:
pd.concat([df['Address'], df['Added on'].str.extract('(?P<Added_on>Add.*)|(?P<Reduced_on>Reduced.*)')], axis=1)
Address Added_on Reduced_on
0 15 Smith Close Added on 17/11/22 NaN
1 1 Apple Drive NaN Reduced on 19/11/22
2 27 Pride place Added on 18/1//22 NaN
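The extracted columns come out as `Added_on` and `Reduced_on` because regex group names cannot contain spaces. If the headers from the question are wanted, one option is to rename afterwards. The following is a minimal runnable sketch under that assumption; the variable names `parts` and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Address": ["15 Smith Close", "1 Apple Drive", "27 Pride place"],
    "Added on": ["Added on 17/11/22", "Reduced on 19/11/22", "Added on 18/1//22"],
})

# Each named group becomes a column; the alternative that does not match yields NaN.
parts = df["Added on"].str.extract(r"(?P<Added_on>Added.*)|(?P<Reduced_on>Reduced.*)")

# Group names must be identifier-like, so restore the spaced headers afterwards.
parts = parts.rename(columns={"Added_on": "Added on", "Reduced_on": "Reduced on"})

# Put the untouched Address column back next to the two extracted columns.
result = pd.concat([df["Address"], parts], axis=1)

print(result)
```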
Proposed code:
import pandas as pd
import numpy as np
# Build the DataFrame to work on
df = pd.DataFrame({"**Address**": ['15 Smith Close', '1 Apple Drive', '27 Pride place'],
                   "**Added on**": ['Added on 17/11/22', 'Reduced on 19/11/22', 'Added on 18/1//22']})
# Boolean mask m: True where the entry is a 'Reduced on' record
m = df['**Added on**'].str.contains('Reduced')
# 1- Copy the 'Reduced' rows into the new **Reduced on** column (NaN elsewhere)
df['**Reduced on**'] = df['**Added on**'].where(m, np.nan)
# 2- Erase the 'Reduced' rows from **Added on**
df['**Added on**'] = df['**Added on**'].where(~m, np.nan)
print(df)
Result:
**Address** **Added on** **Reduced on**
0 15 Smith Close Added on 17/11/22 NaN
1 1 Apple Drive NaN Reduced on 19/11/22
2 27 Pride place Added on 18/1//22 NaN
This should also work:
(df[['Address']].join(df[['Added on']]
.set_index(df['Added on']
.str.rsplit(n=1)
.str[0]
.rename(None),append=True)['Added on']
.unstack()))
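Or (this variant returns only the two split columns, without Address, and with object-dtype strings the multiplication by the 0/1 dummies should leave empty strings rather than NaN where there is no match):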
df['Added on'].str.rsplit(' ',n=1).str[0].str.get_dummies().mul(df['Added on'],axis=0)
Output:
Address Added on Reduced on
0 15 Smith Close Added on 17/11/22 NaN
1 1 Apple Drive NaN Reduced on 19/11/22
2 27 Pride place Added on 18/1//22 NaN
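The chained `set_index` / `unstack` expression is compact but dense. As a rough step-by-step sketch of the same reshaping idea, the version below derives a label per row and spreads the values with `DataFrame.pivot`, which is a swapped-in technique rather than the exact calls used in the answer. The names `label`, `wide`, and `result` are illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Address": ["15 Smith Close", "1 Apple Drive", "27 Pride place"],
    "Added on": ["Added on 17/11/22", "Reduced on 19/11/22", "Added on 18/1//22"],
})

# Label for each row: everything before the last whitespace-separated token,
# i.e. "Added on" or "Reduced on".
label = df["Added on"].str.rsplit(n=1).str[0]

# One column per label, keeping the original row index; cells with no value are NaN.
wide = df.assign(label=label).pivot(columns="label", values="Added on")
wide.columns.name = None  # drop the leftover "label" header

# Re-attach the Address column by index.
result = df[["Address"]].join(wide)

print(result)
```

Because the column names come from the data itself, this approach would also pick up any other prefix that appears in the column, which may or may not be desirable.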