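How to create DataFrame columns that list the column headers meeting a condition, apply a cap, and then exclude the headers that do not qualify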



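I am struggling with this problem and would really appreciate your help.

Note: the columns shown in bold in the text are the ones I need to create.

I have a dataset in which I count, for each row, the values that are not NaN; the result is in the [count] column. In the [incl_count] column, I would like a list identifying the headers of the columns that contributed to the count. Next, I want a limit column [lim] that can never hold more than 3 counts; the cap is 3. This means that the columns that arrived last, once the cap was reached, cannot be considered and are therefore excluded; those exclusions are stored in the [excl] column.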

[index]     [A]   [B]   [C]    [D]    [E]    [F]  [count] [incl_count]    [lim]  [excl]
...
...
...
2020-01-01  nan    nan   nan   nan    nan    nan     0      []             0       []
2020-01-02 -0.01   nan   nan   nan    nan    nan     1      [A]            1       []
2020-01-03  0.02   nan   nan   nan    nan    nan     1      [A]            1       []
2020-01-04 -0.01   0.01  nan   nan    nan    nan     2      [A,B]          2       []
2020-01-05 -0.02  -0.04  0.02  nan    nan    nan     3      [A,B,C]        3       []
2020-01-06  nan    0.02  0.03  0.02   0.01   nan     4      [B,C,D,E]      3       [E]
2020-01-07  nan   -0.02  0.01  -0.01  0.03   0.01    5      [B,C,D,E,F]    3       [E,F]
2020-01-08  nan    nan  -0.02  0.05   -0.05  0.02    4      [C,D,E,F]      2       [E,F]
2020-01-09  nan    nan   nan   0.02   0.02   0.05    3      [D,E,F]        1       [E,F]
2020-01-10  nan    nan   nan    nan   nan    0.01    1      [F]            0       [F]
...
...
...

This should work:

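import pandas as pd

# Columns that hold the index or results rather than values to be counted.
non_value_columns = ["index", "incl_count", "excl", "lim", "count"]
max_lim = 3  # at most this many columns may count on any row

df = pd.read_excel('your.xlsx')

# Every column not listed above is treated as a value column.
entries = [col for col in df.columns if col not in non_value_columns]

cur_excludes = []  # once a column is excluded, it stays excluded on later rows
for i in range(len(df)):
    c = 0
    incl = []
    excl = []
    for entry in entries:
        if not pd.isna(df[entry].iloc[i]):
            incl.append(entry)
            c += 1
            # Over the cap, or already excluded on an earlier row: do not count it.
            if max_lim < c or entry in cur_excludes:
                c -= 1
                excl.append(entry)
                if entry not in cur_excludes:
                    cur_excludes.append(entry)
    # df.loc[i, ...] relies on the default 0..n-1 index that read_excel produces.
    df.loc[i, 'lim'] = str(c)
    df.loc[i, 'incl_count'] = str(incl)
    df.loc[i, 'excl'] = str(excl)
df.to_excel('output.xlsx')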

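Edit: I changed the code so that it loops over all the different columns. I added a list where you declare the non-value columns; if you add a column that should not be checked, make sure to extend that list with the column's name. I also added a variable where you can set your limit. I hope this works; if something goes wrong, let me know!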
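
For anyone who wants to try the logic without an Excel file, here is a minimal sketch that applies the same rule to the sample rows from the question, built in memory. The function name `add_limit_columns` and its parameters are hypothetical, made up for this illustration. It also assumes that a column stays excluded on every later row once it has been excluded, which is what the expected output in the question suggests. Unlike the code above, it stores real lists and integers instead of strings and also fills in [count].

```python
import numpy as np
import pandas as pd


def add_limit_columns(df, value_columns, max_lim=3):
    """Return a copy of df with count, incl_count, lim and excl added.

    Hypothetical helper for illustration; not part of pandas.
    """
    counts, incl_lists, lims, excl_lists = [], [], [], []
    excluded = set()  # assumption: an excluded column stays excluded later on

    for _, row in df[value_columns].iterrows():
        incl, excl, kept = [], [], 0
        for col in value_columns:
            if pd.isna(row[col]):
                continue
            incl.append(col)
            # Exclude the column if the cap is already reached on this row
            # or if it was excluded on an earlier row.
            if col in excluded or kept >= max_lim:
                excl.append(col)
                excluded.add(col)
            else:
                kept += 1
        counts.append(len(incl))
        incl_lists.append(incl)
        lims.append(kept)
        excl_lists.append(excl)

    out = df.copy()
    out["count"] = counts
    out["incl_count"] = incl_lists
    out["lim"] = lims
    out["excl"] = excl_lists
    return out


nan = np.nan
# The ten sample rows from the question (2020-01-01 to 2020-01-10).
sample = pd.DataFrame(
    {
        "A": [nan, -0.01, 0.02, -0.01, -0.02, nan, nan, nan, nan, nan],
        "B": [nan, nan, nan, 0.01, -0.04, 0.02, -0.02, nan, nan, nan],
        "C": [nan, nan, nan, nan, 0.02, 0.03, 0.01, -0.02, nan, nan],
        "D": [nan, nan, nan, nan, nan, 0.02, -0.01, 0.05, 0.02, nan],
        "E": [nan, nan, nan, nan, nan, 0.01, 0.03, -0.05, 0.02, nan],
        "F": [nan, nan, nan, nan, nan, nan, 0.01, 0.02, 0.05, 0.01],
    },
    index=pd.date_range("2020-01-01", periods=10),
)

result = add_limit_columns(sample, list("ABCDEF"))
print(result[["count", "incl_count", "lim", "excl"]])
```

Collecting the results in plain lists and assigning each column once avoids writing cell by cell with `df.loc`, and it does not depend on the index being 0..n-1.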
