Pandas根据列值删除和更新行和列



以下是板球比分的csv文件示例:

>>> df
venue  ball  run extra wide  noball
0   a   0.1     0   1   NaN     NaN
1   a   0.2     4   0   NaN     NaN
2   a   0.3     1   5   5.0     NaN
3   a   0.4     1   0   NaN     NaN
4   a   0.5     1   1   NaN     1.0
5   a   0.6     2   1   NaN     NaN
6   a   0.7     6   2   1.0     1.0
7   a   0.8     0   0   NaN     NaN
8   a   0.9     1   1   NaN     NaN
9   a   1.1     2   2   NaN     NaN
10  a   1.2     1   0   NaN     NaN
11  a   1.3     6   1   NaN     NaN
12  a   1.4     0   2   NaN     2.0
13  a   1.5     1   0   NaN     NaN
14  a   1.6     2   0   NaN     NaN
15  a   1.7     0   1   NaN     NaN
16  a   0.1     0   5   NaN     NaN
17  a   0.2     4   0   NaN     NaN
18  a   0.3     1   1   NaN     NaN
19  a   0.4     3   0   NaN     NaN
20  a   0.5     0   0   NaN     NaN
21  a   0.6     0   2   2.0     NaN
22  a   0.7     6   1   NaN     NaN
23  a   1.1     4   0   NaN     NaN

从这个数据帧中,我想更新ball值,生成2个新列并删除4个整列。条件为

  1. 当";宽";或";noball";为空,crun=crun+跑动+额外直到球=0.1(递归(
  2. 当";宽";或";noball";不为null,并发球值将不会增加,并将在crun计算后删除。crun=crun+run+extra。并且它将继续直到ball=0.1(递归地(。例如,让我分解:从行索引0到8:|0.1〃;宽";或";noball";为空并且crun=1|0.2〃;宽";或";noball";为空并且crun=1+4=5|0.3〃;宽";或";noball";不为空(移除(|0.4〃;宽";或";noball";为零(变为0.3(并且crun=5+1+5+1=12|0.5〃;宽";或";noball";不为空(去除(|0.6〃;宽";或";noball";为零(变为0.4(并且crun=12+1+1+2+1=17|0.7〃;宽";或";noball";不为空(去除(|0.8〃;宽";或";noball";为零(变为0.5(并且crun=17+6+2=25|0.9";宽";或";noball";为空(变为0.6(,crun=25+1+1=27|
  3. 最后"合计";将创建列,该列返回crun的最大值,直到ball=0.1为止(递归(。然后";运行""额外"宽"noball";列

我想要的输出:

venue ball  crun total
0   a   0.1     1   45
1   a   0.2     5   45
2   a   0.3     12  45
3   a   0.4     17  45
4   a   0.5     25  45
5   a   0.6     27  45
6   a   1.1     31  45
7   a   1.2     32  45
8   a   1.3     39  45
9   a   1.4     42  45
10  a   1.5     44  45
11  a   1.6     45  45
12  a   0.1     5   27
13  a   0.2     9   27
14  a   0.3     11  27
15  a   0.4     14  27
16  a   0.5     14  27
17  a   0.6     23  27
18  a   1.1     27  27

我觉得太复杂了,请帮忙。我尝试过的代码:

df = pd.read_csv("data.csv")
gr = df.groupby(df.ball.eq(0.1).cumsum())
df["crun"] = gr.runs.cumsum()
df["total"] = gr.current_run.transform("max")
df = df.drop(['run', 'extra', 'wide', 'noball'], axis=1)

Alrighty。这很有趣。

(为了清楚起见,我试图添加评论。(

注意:";球,"run"额外的,"宽,";以及";noball";都是数字字段。

注释 注意:这一切都假设您的初始DataFrame位于一个名为df的变量下。

# Create target groupings by ball value.
df["target_groups"] = df.loc[df["ball"] == 0.1].groupby(level=-1).ngroup()
df["target_groups"].fillna(method="ffill", inplace=True)

# --- Create subgroups --- #
df["target_subgroups"] = df["ball"].astype(int)

# Add field fro sum of run and extra
df["run_extra"] = df[["run", "extra"]].sum(axis=1)

# Apply groupby() and cumsum() as follows to get the cumulative sum
# of each ball group for run and extra.
df["crun"] = df.groupby(["target_groups"])["run_extra"].cumsum()

# Create dataframe for max crun value of each group
group_max_df = df.groupby(["target_groups"])["crun"].max().to_frame().reset_index()

# Merge both of the DataFrames with the given suffixes.  The first one
# Just prevents crun from having a suffix added, which is an additional
# step to remove.
# You could probably use .join() in a similar manner.
df = pd.merge(df, group_max_df,
on=["target_groups"],
suffixes=("", "_total"),
sort=False
)
# Rename your new total field.
df.rename(columns={"crun_total": "total"}, inplace = True)

# Apply your wide and noball condition here.
df = df[(df["wide"].isna()) & (df["noball"].isna())].copy()

# -- Reset `ball` column -- #
# Add temp column with static value
df["tmp_ball"] = 0.1
# Generate cumulative sum by subgroup.
# Set `ball` to modulo 0.6
df.loc[:, "ball"] = df.groupby(["target_subgroups"])["tmp_ball"].cumsum() % 0.6
# Find rows where ball == 0.0 and set those to 0.6
df.loc[df["ball"] == 0.0, "ball"] = 0.6
# Add ball and target_subgroups columns to get final ball value.
df["ball"]  = df["ball"] + df["target_subgroups"]

# Reset your main index, if desired
df.reset_index(drop=True, inplace=True)

# Select only desired field for output.
df = df.loc[:, ["venue","ball","crun","total"]].copy()

df:输出

venue  ball  crun  total
0      a   0.1     1     45
1      a   0.2     5     45
2      a   0.4    12     45
3      a   0.6    17     45
4      a   0.8    25     45
5      a   0.9    27     45
6      a   1.1    31     45
7      a   1.2    32     45
8      a   1.3    39     45
9      a   1.5    42     45
10     a   1.6    44     45
11     a   1.7    45     45
12     a   0.1     5     27
13     a   0.2     9     27
14     a   0.3    11     27
15     a   0.4    14     27
16     a   0.5    14     27
17     a   0.7    23     27
18     a   1.1    27     27

编辑:

相关内容

  • 没有找到相关文章

最新更新