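What I want to do is loop over every row. If the category is HR contacts, I want to keep the count as is when it is at most 500; otherwise I want to keep only 500 and put the remainder in a second column. My code is: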
```python
cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan
for index, row in cntByUserNm.iterrows():
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            row['keep #'] = row['total number']
            row['rest #'] = 0
        else:
            row['keep #'] = 500
            row['rest #'] = row['total number'] - 500
```
But this doesn't seem to work: all of the `keep #` and `rest #` values are still NaN. How can I fix this?
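A minimal reproduction of the symptom, as a sketch on made-up data (the owner names and counts below are hypothetical; only the column names come from the question). The loop writes to `row`, and the DataFrame's `keep #` column stays all NaN afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the question's column names
df = pd.DataFrame({
    'Owner Name': ['Ann', 'Bob', 'Cy'],
    'source': ['HR', 'HR', 'Sales'],
    'total number': [120, 800, 50],
})
df['keep #'] = np.nan
df['rest #'] = np.nan

for index, row in df.iterrows():
    if row['source'] == 'HR':
        # `row` is a Series built for this iteration; with mixed column
        # types it holds copied values, so this write does not reach `df`
        row['keep #'] = min(row['total number'], 500)

print(df['keep #'].isna().all())  # True
```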
I also tried indexing with `iloc`:

```python
for i in range(0, len(cntByUserNm)):
    print(cntByUserNm.iloc[i]['Owner Name'], cntByUserNm.iloc[i]['blizday source'])
    if cntByUserNm.iloc[i]['blizday source'] == mainCat:
        if cntByUserNm.iloc[i][befCnt] <= destiNum:
            cntByUserNm.iloc[i]['keep #'] = cntByUserNm.iloc[i][befCnt]
            cntByUserNm.iloc[i]['rest #'] = 0
        else:
            cntByUserNm.iloc[i]['keep #'] = destiNum
            cntByUserNm.iloc[i]['rest #'] = cntByUserNm.iloc[i][befCnt] - destiNum
```
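You are updating a copy of each row, not the DataFrame itself. `iterrows()` builds a new Series for every row, and with mixed column types like these it holds copied values, so assigning to `row[...]` does not reach `cntByUserNm`. The chained form in your second attempt, `cntByUserNm.iloc[i]['keep #'] = ...`, has the same problem: `cntByUserNm.iloc[i]` returns an intermediate Series, and the assignment modifies that Series rather than the DataFrame. Instead, write directly to the DataFrame with `.loc`, passing the index label that `iterrows()` yields. This works for any index whose labels are unique; the index does not need to be a contiguous range from 0 to `len(dataframe)`: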
```python
for index, row in cntByUserNm.iterrows():
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            cntByUserNm.loc[index, 'keep #'] = row['total number']
            cntByUserNm.loc[index, 'rest #'] = 0
        else:
            cntByUserNm.loc[index, 'keep #'] = 500
            cntByUserNm.loc[index, 'rest #'] = row['total number'] - 500
```
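As a quick check that the `.loc` loop does not depend on a 0..n-1 index, here is a sketch on hypothetical data whose index labels are 10, 20 and 30. Because `iterrows()` yields labels and `.loc` is label-based, the writes land on the right rows:

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-contiguous index (10, 20, 30)
cntByUserNm = pd.DataFrame(
    {
        'Owner Name': ['Ann', 'Bob', 'Cy'],
        'source': ['HR', 'HR', 'Sales'],
        'total number': [120, 800, 50],
    },
    index=[10, 20, 30],
)
cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan

for index, row in cntByUserNm.iterrows():
    if row['source'] == 'HR':
        # `index` is the row label (10, 20, 30), which is what .loc expects
        if row['total number'] <= 500:
            cntByUserNm.loc[index, 'keep #'] = row['total number']
            cntByUserNm.loc[index, 'rest #'] = 0
        else:
            cntByUserNm.loc[index, 'keep #'] = 500
            cntByUserNm.loc[index, 'rest #'] = row['total number'] - 500

print(cntByUserNm[['keep #', 'rest #']])
```

The non-HR row (label 30) keeps NaN in both columns, matching the loop's intent.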
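If you prefer positional indexing with `.iloc` (for example because the index labels are not unique), get the integer positions of the `keep #` and `rest #` columns with `get_loc`. Note that `.iloc` also needs the row *position*, which is not the label that `iterrows()` yields, so track it with `enumerate`:

```python
keep_idx = cntByUserNm.columns.get_loc('keep #')
rest_idx = cntByUserNm.columns.get_loc('rest #')
# enumerate supplies the row position that .iloc needs;
# the label from iterrows() is not used here
for pos, (_, row) in enumerate(cntByUserNm.iterrows()):
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            cntByUserNm.iloc[pos, keep_idx] = row['total number']
            cntByUserNm.iloc[pos, rest_idx] = 0
        else:
            cntByUserNm.iloc[pos, keep_idx] = 500
            cntByUserNm.iloc[pos, rest_idx] = row['total number'] - 500
```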
Vectorized operations are much faster in pandas than row-by-row loops, so I would suggest:
```python
import numpy as np

cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan
# Build the two HR cases separately: ~mask would also catch non-HR rows
is_hr = cntByUserNm['source'] == 'HR'
small = is_hr & (cntByUserNm['total number'] <= 500)
large = is_hr & (cntByUserNm['total number'] > 500)
cntByUserNm.loc[small, 'keep #'] = cntByUserNm.loc[small, 'total number']
cntByUserNm.loc[small, 'rest #'] = 0
cntByUserNm.loc[large, 'keep #'] = 500
cntByUserNm.loc[large, 'rest #'] = cntByUserNm.loc[large, 'total number'] - 500
```
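The same split can also be written with `Series.clip` and `np.where`, which avoids separate masks for the two cases. Below is a sketch that wraps it in a hypothetical helper, `split_quota`, parameterized the way the question's second attempt is (`mainCat`, `befCnt`, `destiNum`); the helper name and sample data are illustrative assumptions, not part of pandas:

```python
import numpy as np
import pandas as pd


def split_quota(df, cat_col, category, count_col, limit):
    """Hypothetical helper: for rows where df[cat_col] == category, keep at
    most `limit` of df[count_col] in 'keep #' and put the excess in 'rest #'.
    Other rows get NaN in both columns. Returns a new DataFrame."""
    out = df.copy()
    in_cat = out[cat_col] == category
    keep = out[count_col].clip(upper=limit)  # min(count, limit) per row
    out['keep #'] = np.where(in_cat, keep, np.nan)
    out['rest #'] = np.where(in_cat, out[count_col] - keep, np.nan)
    return out


data = pd.DataFrame({
    'Owner Name': ['Ann', 'Bob', 'Cy'],
    'source': ['HR', 'HR', 'Sales'],
    'total number': [120, 800, 50],
})
# Mirrors the question's mainCat / befCnt / destiNum parameters
mainCat, befCnt, destiNum = 'HR', 'total number', 500
result = split_quota(data, 'source', mainCat, befCnt, destiNum)
print(result[['keep #', 'rest #']])
```

Returning a copy keeps the input frame unchanged; assign the result back (`cntByUserNm = split_quota(...)`) if you want to update in place.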