以下是板球比分的csv文件示例:
>>> df
venue ball run extra wide noball
0 a 0.1 0 1 NaN NaN
1 a 0.2 4 0 NaN NaN
2 a 0.3 1 5 5.0 NaN
3 a 0.4 1 0 NaN NaN
4 a 0.5 1 1 NaN 1.0
5 a 0.6 2 1 NaN NaN
6 a 0.7 6 2 1.0 1.0
7 a 0.8 0 0 NaN NaN
8 a 0.9 1 1 NaN NaN
9 a 1.1 2 2 NaN NaN
10 a 1.2 1 0 NaN NaN
11 a 1.3 6 1 NaN NaN
12 a 1.4 0 2 NaN 2.0
13 a 1.5 1 0 NaN NaN
14 a 1.6 2 0 NaN NaN
15 a 1.7 0 1 NaN NaN
16 a 0.1 0 5 NaN NaN
17 a 0.2 4 0 NaN NaN
18 a 0.3 1 1 NaN NaN
19 a 0.4 3 0 NaN NaN
20 a 0.5 0 0 NaN NaN
21 a 0.6 0 2 2.0 NaN
22 a 0.7 6 1 NaN NaN
23 a 1.1 4 0 NaN NaN
从这个数据帧中,我想更新ball值,生成2个新列并删除4个整列。条件为
- 当";宽";或";noball";为空,crun=crun+跑动+额外直到球=0.1(递归(
- 当";宽";或";noball";不为null,并发球值将不会增加,并将在crun计算后删除。crun=crun+run+extra。并且它将继续直到ball=0.1(递归地(。例如,让我分解:从行索引0到8:|0.1〃;宽";或";noball";为空并且crun=1|0.2〃;宽";或";noball";为空并且crun=1+4=5|0.3〃;宽";或";noball";不为空(移除(|0.4〃;宽";或";noball";为零(变为0.3(并且crun=5+1+5+1=12|0.5〃;宽";或";noball";不为空(去除(|0.6〃;宽";或";noball";为零(变为0.4(并且crun=12+1+1+2+1=17|0.7〃;宽";或";noball";不为空(去除(|0.8〃;宽";或";noball";为零(变为0.5(并且crun=17+6+2=25|0.9";宽";或";noball";为空(变为0.6(,crun=25+1+1=27|
- 最后"合计";将创建列,该列返回crun的最大值,直到ball=0.1为止(递归(。然后";运行""额外"宽"noball";列
我想要的输出:
venue ball crun total
0 a 0.1 1 45
1 a 0.2 5 45
2 a 0.3 12 45
3 a 0.4 17 45
4 a 0.5 25 45
5 a 0.6 27 45
6 a 1.1 31 45
7 a 1.2 32 45
8 a 1.3 39 45
9 a 1.4 42 45
10 a 1.5 44 45
11 a 1.6 45 45
12 a 0.1 5 27
13 a 0.2 9 27
14 a 0.3 11 27
15 a 0.4 14 27
16 a 0.5 14 27
17 a 0.6 23 27
18 a 1.1 27 27
我觉得太复杂了,请帮忙。我尝试过的代码:
df = pd.read_csv("data.csv")
gr = df.groupby(df.ball.eq(0.1).cumsum())
df["crun"] = gr.runs.cumsum()
df["total"] = gr.current_run.transform("max")
df = df.drop(['run', 'extra', 'wide', 'noball'], axis=1)
Alrighty。这很有趣。
(为了清楚起见,我试图添加评论。(
注意:";球,"run"额外的,"宽,";以及";noball";都是数字字段。
注释
注意:这一切都假设您的初始DataFrame位于一个名为df
的变量下。
# Create target groupings by ball value.
df["target_groups"] = df.loc[df["ball"] == 0.1].groupby(level=-1).ngroup()
df["target_groups"].fillna(method="ffill", inplace=True)
# --- Create subgroups --- #
df["target_subgroups"] = df["ball"].astype(int)
# Add field fro sum of run and extra
df["run_extra"] = df[["run", "extra"]].sum(axis=1)
# Apply groupby() and cumsum() as follows to get the cumulative sum
# of each ball group for run and extra.
df["crun"] = df.groupby(["target_groups"])["run_extra"].cumsum()
# Create dataframe for max crun value of each group
group_max_df = df.groupby(["target_groups"])["crun"].max().to_frame().reset_index()
# Merge both of the DataFrames with the given suffixes. The first one
# Just prevents crun from having a suffix added, which is an additional
# step to remove.
# You could probably use .join() in a similar manner.
df = pd.merge(df, group_max_df,
on=["target_groups"],
suffixes=("", "_total"),
sort=False
)
# Rename your new total field.
df.rename(columns={"crun_total": "total"}, inplace = True)
# Apply your wide and noball condition here.
df = df[(df["wide"].isna()) & (df["noball"].isna())].copy()
# -- Reset `ball` column -- #
# Add temp column with static value
df["tmp_ball"] = 0.1
# Generate cumulative sum by subgroup.
# Set `ball` to modulo 0.6
df.loc[:, "ball"] = df.groupby(["target_subgroups"])["tmp_ball"].cumsum() % 0.6
# Find rows where ball == 0.0 and set those to 0.6
df.loc[df["ball"] == 0.0, "ball"] = 0.6
# Add ball and target_subgroups columns to get final ball value.
df["ball"] = df["ball"] + df["target_subgroups"]
# Reset your main index, if desired
df.reset_index(drop=True, inplace=True)
# Select only desired field for output.
df = df.loc[:, ["venue","ball","crun","total"]].copy()
df
:输出
venue ball crun total
0 a 0.1 1 45
1 a 0.2 5 45
2 a 0.4 12 45
3 a 0.6 17 45
4 a 0.8 25 45
5 a 0.9 27 45
6 a 1.1 31 45
7 a 1.2 32 45
8 a 1.3 39 45
9 a 1.5 42 45
10 a 1.6 44 45
11 a 1.7 45 45
12 a 0.1 5 27
13 a 0.2 9 27
14 a 0.3 11 27
15 a 0.4 14 27
16 a 0.5 14 27
17 a 0.7 23 27
18 a 1.1 27 27
编辑: