How to create 2 new columns: one that looks up the previous match and one that shows the next match

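I have a dataset similar to the one below and want to create two new columns. The first should return the previous Year for the same Name, or NaN if Count is 0. The second should return the next Year for the same Name; if there is none, it should return Year + 4.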

Data table:


setup

import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Foo", "Foo", "Foo", "Bar", "Bar"],
        "Year": [2012, 2017, 2022, 2015, 2024],
        "Count": [0, 1, 2, 0, 1],
    }
)

Solution
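def make_data(df_sub):
    df_sub = df_sub.copy()  # work on a copy instead of the group slice
    # Sorted unique years for this Name
    years = pd.Series(df_sub["Year"].sort_values().unique())
    # Previous year via a year -> previous-year map; NaN where Count is 0
    df_sub["Prior"] = df_sub["Year"].map(dict(zip(years, years.shift()))).mask(df_sub["Count"] == 0)
    # Next year via a year -> next-year map; Year + 4 when there is no later year
    df_sub["Next"] = df_sub["Year"].map(dict(zip(years, years.shift(-1)))).fillna(df_sub["Year"] + 4)
    return df_sub

# group_keys=False keeps the original row index instead of adding Name as an index level
df.groupby("Name", group_keys=False).apply(make_data)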
which gives:

  Name  Year  Count   Prior    Next
0  Foo  2012      0     NaN  2017.0
1  Foo  2017      1  2012.0  2022.0
2  Foo  2022      2  2017.0  2026.0
3  Bar  2015      0     NaN  2024.0
4  Bar  2024      1  2015.0  2028.0
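The solution splits the DataFrame by Name. For each sub-DataFrame it sorts the unique years and builds lookup maps from each year to the previous year and to the next year. For the Prior column, it masks (sets to NaN) any value where Count is 0. For the Next column, it fills missing values with Year + 4.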
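To see what those lookup maps contain, here is a small sketch for the Foo group alone (years 2012, 2017 and 2022 from the setup data). The names prior_map and next_map are just illustrative; they mirror the dict(zip(...)) expressions inside make_data:

```python
import pandas as pd

# Sorted unique years of the "Foo" group from the setup DataFrame
years = pd.Series([2012, 2017, 2022])

# shift() moves values down one row, so each year is paired with the one before it
prior_map = dict(zip(years, years.shift()))
# shift(-1) moves values up one row, so each year is paired with the one after it
next_map = dict(zip(years, years.shift(-1)))

print(prior_map)
print(next_map)
```

The first year has no predecessor and the last year has no successor, so those entries are NaN; make_data then masks or fills them.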

Thanks to @Riley for the setup df code.

Alternatively, we can use numpy.where:

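import numpy as np

# First row of each Name (no previous row in its group) -> NaN, otherwise the previous row's Year.
# Assumes each Name's rows are contiguous and sorted by Year, since df["Year"].shift() ignores groups.
df["Prior"] = np.where(df.groupby("Name")["Count"].shift(1).isnull(), np.nan, df["Year"].shift(1))
# Last row of each Name -> Year + 4, otherwise the next row's Year
df["Next"] = np.where(df.groupby("Name")["Count"].shift(-1).isnull(), df["Year"] + 4, df["Year"].shift(-1))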
> df
Name    Year    Count   Prior   Next
0   Foo     2012    0       NaN     2017.0
1   Foo     2017    1       2012.0  2022.0
2   Foo     2022    2       2017.0  2026.0
3   Bar     2015    0       NaN     2024.0
4   Bar     2024    1       2015.0  2028.0
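The numpy.where version only works if each Name's rows are contiguous and already sorted by Year, because df["Year"].shift(1) shifts across the whole frame. A minimal sketch that drops that assumption (a variation I'm suggesting, not part of either answer) sorts first and shifts Year within each group; it also applies the question's "NaN if Count is 0" rule directly:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["Foo", "Foo", "Foo", "Bar", "Bar"],
        "Year": [2012, 2017, 2022, 2015, 2024],
        "Count": [0, 1, 2, 0, 1],
    }
)

# Sort so that shift() walks each Name's years in chronological order
out = df.sort_values(["Name", "Year"])
years_by_name = out.groupby("Name")["Year"]

# Previous year within the same Name; NaN where Count is 0
out["Prior"] = years_by_name.shift(1).mask(out["Count"] == 0)
# Next year within the same Name; Year + 4 when there is no later year
out["Next"] = years_by_name.shift(-1).fillna(out["Year"] + 4)

# Restore the original row order
out = out.sort_index()
print(out)
```

Because groupby(...).shift() never crosses a group boundary, no np.where condition is needed to blank out the first or last row of each Name.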

In the numpy.where version, because np.nan is a float, the Prior and Next columns are float as well (2017.0 rather than 2017).
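If integer years are preferred, one option (my suggestion, not from the answer) is pandas' nullable "Int64" dtype, which stores missing values as <NA> instead of forcing the column to float. A minimal sketch using the Prior values shown above:

```python
import numpy as np
import pandas as pd

# Float column like the one np.where produces (NaN forces float64)
prior = pd.Series([np.nan, 2012.0, 2017.0, np.nan, 2015.0])

# Convert to the nullable integer dtype; NaN becomes <NA>
prior_int = prior.astype("Int64")
print(prior_int)
```

The conversion only succeeds when every non-missing value is a whole number, which holds for years.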
