Python: looping over rows and computing values doesn't work



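What I want to do is loop over every row. If the row's source is "HR" (HR contacts) and its total number is at most 500, keep all of it; otherwise keep only 500. My code is: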

```python
cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan
for index, row in cntByUserNm.iterrows():
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            row['keep #'] = row['total number']
            row['rest #'] = 0
        else:
            row['keep #'] = 500
            row['rest #'] = row['total number'] - 500
```

But this does not seem to work: every value in `keep #` and `rest #` is still NaN. How can I fix this?
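A minimal way to reproduce the problem, using a small made-up DataFrame with the question's column names (the values are hypothetical). With mixed column dtypes, as here, `iterrows()` builds each `row` as a new Series, so writing to it leaves the original frame untouched. The pandas documentation warns that whether you get a copy or a view is not guaranteed, so it is safest never to write through `row`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with the column names used in the question
cntByUserNm = pd.DataFrame({
    'Owner Name': ['a', 'b', 'c'],
    'source': ['HR', 'HR', 'Sales'],
    'total number': [300, 800, 100],
})
cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan

for index, row in cntByUserNm.iterrows():
    # `row` is a Series built for this iteration; assigning to it
    # changes only that Series, not cntByUserNm
    row['keep #'] = 123

print(cntByUserNm['keep #'].isna().all())  # True
```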

I also tried this version using `.iloc`:

```python
for i in range(0, len(cntByUserNm)):
    print(cntByUserNm.iloc[i]['Owner Name'], cntByUserNm.iloc[i]['blizday source'])
    if cntByUserNm.iloc[i]['blizday source'] == mainCat:
        if cntByUserNm.iloc[i][befCnt] <= destiNum:
            cntByUserNm.iloc[i]['keep #'] = cntByUserNm.iloc[i][befCnt]
            cntByUserNm.iloc[i]['rest #'] = 0
        else:
            cntByUserNm.iloc[i]['keep #'] = destiNum
            cntByUserNm.iloc[i]['rest #'] = cntByUserNm.iloc[i][befCnt] - destiNum
```
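The `.iloc` attempt has a related problem: `cntByUserNm.iloc[i]['keep #'] = ...` is chained indexing. The first `.iloc[i]` produces an intermediate row Series, and the assignment lands on that object rather than on the DataFrame. A minimal sketch with hypothetical data (`df` and `keep_pos` are illustrative names), contrasting it with a single `.iloc[row, col]` call, which does write into the frame:

```python
import warnings
import numpy as np
import pandas as pd

# Hypothetical data with mixed dtypes, like the question's frame
df = pd.DataFrame({'source': ['HR', 'Sales'], 'total number': [300, 100]})
df['keep #'] = np.nan
keep_pos = df.columns.get_loc('keep #')

with warnings.catch_warnings():
    # pandas may warn about chained assignment here; silenced for the demo
    warnings.simplefilter('ignore')
    df.iloc[0]['keep #'] = 300      # chained: writes to an intermediate row
print(df['keep #'].isna().all())    # True, the frame is unchanged

df.iloc[0, keep_pos] = 300          # one indexing call: writes to df itself
print(df.loc[0, 'keep #'])          # 300.0
```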

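You are updating a copy of each row, not the DataFrame itself: `iterrows()` yields every row as a separate Series, so assigning to `row['keep #']` never reaches `cntByUserNm`. Write back through the DataFrame with `.loc` instead. `.loc` takes the row label that `iterrows()` yields as `index`, so this works for any index with unique labels, contiguous or not: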

```python
for index, row in cntByUserNm.iterrows():
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            cntByUserNm.loc[index, 'keep #'] = row['total number']
            cntByUserNm.loc[index, 'rest #'] = 0
        else:
            cntByUserNm.loc[index, 'keep #'] = 500
            cntByUserNm.loc[index, 'rest #'] = row['total number'] - 500
```
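A small self-contained check of the `.loc` approach, assuming hypothetical data whose index is deliberately not 0 to n - 1. The loop body is condensed with `min()`, which gives the same `keep #` / `rest #` values as the if/else above:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the index labels are 10, 20, 30 on purpose
cntByUserNm = pd.DataFrame(
    {'source': ['HR', 'HR', 'Sales'], 'total number': [300, 800, 100]},
    index=[10, 20, 30],
)
cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan

for index, row in cntByUserNm.iterrows():
    if row['source'] == 'HR':
        # `index` is a label, which is exactly what .loc expects
        keep = min(row['total number'], 500)
        cntByUserNm.loc[index, 'keep #'] = keep
        cntByUserNm.loc[index, 'rest #'] = row['total number'] - keep

print(cntByUserNm)
```

Row 10 ends up with 300 / 0, row 20 with 500 / 300, and the non-HR row 30 stays NaN.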

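If you prefer `.iloc`, look up the integer positions of the `keep #` and `rest #` columns. Note that `.iloc` also needs an integer row *position*, which differs from the index label whenever the index is not 0 to len(dataframe) - 1, so take the position from `enumerate`: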

```python
keep_idx = cntByUserNm.columns.get_loc('keep #')
rest_idx = cntByUserNm.columns.get_loc('rest #')
# enumerate supplies the row position, which is what .iloc needs
for pos, (index, row) in enumerate(cntByUserNm.iterrows()):
    print(row['Owner Name'], row['source'])
    if row['source'] == 'HR':
        if row['total number'] <= 500:
            cntByUserNm.iloc[pos, keep_idx] = row['total number']
            cntByUserNm.iloc[pos, rest_idx] = 0
        else:
            cntByUserNm.iloc[pos, keep_idx] = 500
            cntByUserNm.iloc[pos, rest_idx] = row['total number'] - 500
```

Vectorized operations are much faster in pandas than looping row by row, so I would suggest:

```python
import numpy as np

cntByUserNm['keep #'] = np.nan
cntByUserNm['rest #'] = np.nan
is_hr = cntByUserNm['source'] == 'HR'
small = cntByUserNm['total number'] <= 500
# Both masks include is_hr, so non-HR rows stay NaN, as in the loop
le = is_hr & small    # HR rows at or under the cap
gt = is_hr & ~small   # HR rows over the cap
cntByUserNm.loc[le, 'keep #'] = cntByUserNm.loc[le, 'total number']
cntByUserNm.loc[le, 'rest #'] = 0
cntByUserNm.loc[gt, 'keep #'] = 500
cntByUserNm.loc[gt, 'rest #'] = cntByUserNm.loc[gt, 'total number'] - 500
```
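A more compact vectorized variant, sketched with `Series.clip` and `Series.where` (a swapped-in technique, not part of the answer above): `keep #` is the total capped at 500, `rest #` is whatever exceeds the cap, and rows whose source is not HR are left as NaN, matching the loop. The sample data and the `LIMIT` name are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with the question's column names
cntByUserNm = pd.DataFrame({
    'source': ['HR', 'HR', 'Sales'],
    'total number': [300, 800, 100],
})

LIMIT = 500  # hypothetical name for the cap from the question
is_hr = cntByUserNm['source'] == 'HR'
total = cntByUserNm['total number']

# keep = min(total, LIMIT); rest = the part above LIMIT (0 if none)
keep = total.clip(upper=LIMIT)
cntByUserNm['keep #'] = keep.where(is_hr)            # NaN for non-HR rows
cntByUserNm['rest #'] = (total - keep).where(is_hr)

print(cntByUserNm)
```

Computing `rest #` as `total - keep` guarantees the two columns always add up to the original total for HR rows.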
