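I am struggling with this problem and would really appreciate your help.
Note: bold in the text marks the columns I need to create.
I have a dataset in which I count, per row, the values that are not nan; the result is shown in the column [count]. In the column [incl_count] I want a list identifying the headers of the columns that contribute to the count. Next, I want a limit column [lim] that can never hold a count higher than 3; the cap is 3. This means that the columns that reach the count last cannot be considered and are therefore excluded; these exclusions are stored in the column [excl].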
[index]      [A]    [B]    [C]    [D]    [E]    [F]    [count]  [incl_count]  [lim]  [excl]
...
...
...
2020-01-01   nan    nan    nan    nan    nan    nan    0        []            0      []
2020-01-02   -0.01  nan    nan    nan    nan    nan    1        [A]           1      []
2020-01-03   0.02   nan    nan    nan    nan    nan    1        [A]           1      []
2020-01-04   -0.01  0.01   nan    nan    nan    nan    2        [A,B]         2      []
2020-01-05   -0.02  -0.04  0.02   nan    nan    nan    3        [A,B,C]       3      []
2020-01-06   nan    0.02   0.03   0.02   0.01   nan    4        [B,C,D,E]     3      [E]
2020-01-07   nan    -0.02  0.01   -0.01  0.03   0.01   5        [B,C,D,E,F]   3      [E,F]
2020-01-08   nan    nan    -0.02  0.05   -0.05  0.02   4        [C,D,E,F]     2      [E,F]
2020-01-09   nan    nan    nan    0.02   0.02   0.05   3        [D,E,F]       1      [E,F]
2020-01-10   nan    nan    nan    nan    nan    0.01   1        [F]           0      [F]
...
...
...
This should work:
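import pandas as pd
import numpy as np

# Columns that hold bookkeeping data rather than values to be counted.
non_value_columns = ["index", "incl_count", "excl", "lim", "count"]
max_lim = 3  # upper bound for the 'lim' column

df = pd.read_excel('your.xlsx')

# Collect every column that is not listed as a non-value column.
entries = []
for entry in df:
    if entry not in non_value_columns:
        print(entry)
        entries.append(entry)

indexes = df['index'].tolist()
i = 0
cur_excludes = []  # columns excluded on an earlier row stay excluded
for index in indexes:
    c = 0
    incl = []
    excl = []
    for entry in entries:
        if not np.isnan(df[entry].tolist()[i]):
            incl.append(entry)
            c += 1
            # Exclude the column if it pushes the count past the limit
            # or if it was already excluded on a previous row.
            if max_lim < c or entry in cur_excludes:
                c -= 1
                excl.append(entry)
                cur_excludes.append(entry)
    # Assumes df has the default 0..n-1 index, so that label i is row i.
    # Note that the results are written as strings.
    df.loc[i, 'lim'] = str(c)
    df.loc[i, 'incl_count'] = str(incl)
    df.loc[i, 'excl'] = str(excl)
    i += 1

df.to_excel('output.xlsx')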
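Edit: Changed the code so that it loops over all the different columns. I made an array in which you can declare the non-value columns; if you add a column that should not be checked, make sure to extend the array by adding that column's name. I also made a variable in which you can state your limit. Hope this works; let me know if something goes wrong!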
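For anyone who wants to try the logic without an Excel file, here is a minimal sketch of the same rule applied to an in-memory DataFrame built from the sample rows in the question. The names `add_limit_columns`, `value_columns` and `sample` are made up for this illustration, and it stores real lists and integers instead of strings, which is an assumption about what is wanted. It also treats a column as permanently excluded once it has been excluded on any earlier row, which is how the expected table appears to behave.

```python
import numpy as np
import pandas as pd


def add_limit_columns(df, value_columns, max_lim=3):
    """Return a copy of df with count, incl_count, lim and excl added.

    Hypothetical helper: a column that gets excluded on one row
    stays excluded on every later row.
    """
    out = df.copy()
    present = out[value_columns].notna()
    sticky_excluded = set()
    counts, incl_lists, lims, excl_lists = [], [], [], []
    for row in present.itertuples(index=False):
        # Columns holding a non-nan value on this row, in column order.
        incl = [col for col, ok in zip(value_columns, row) if ok]
        kept, excl = 0, []
        for col in incl:
            if col in sticky_excluded or kept >= max_lim:
                excl.append(col)
                sticky_excluded.add(col)
            else:
                kept += 1
        counts.append(len(incl))
        incl_lists.append(incl)
        lims.append(kept)
        excl_lists.append(excl)
    out["count"] = counts
    out["incl_count"] = pd.Series(incl_lists, index=out.index)
    out["lim"] = lims
    out["excl"] = pd.Series(excl_lists, index=out.index)
    return out


nan = np.nan
# Sample rows copied from the table in the question.
sample = pd.DataFrame(
    {
        "A": [nan, -0.01, 0.02, -0.01, -0.02, nan, nan, nan, nan, nan],
        "B": [nan, nan, nan, 0.01, -0.04, 0.02, -0.02, nan, nan, nan],
        "C": [nan, nan, nan, nan, 0.02, 0.03, 0.01, -0.02, nan, nan],
        "D": [nan, nan, nan, nan, nan, 0.02, -0.01, 0.05, 0.02, nan],
        "E": [nan, nan, nan, nan, nan, 0.01, 0.03, -0.05, 0.02, nan],
        "F": [nan, nan, nan, nan, nan, nan, 0.01, 0.02, 0.05, 0.01],
    },
    index=pd.date_range("2020-01-01", periods=10),
)

result = add_limit_columns(sample, list("ABCDEF"))
print(result[["count", "incl_count", "lim", "excl"]])
```

Because the helper works on row positions rather than on `df.loc[i, ...]`, it does not depend on the frame having a default integer index, so a date index like the one in the question can be used directly.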