How to find a value in a row based on another column

I have a dataset like this:

user-id     time    location   msg  path
  1           1         1       1    0
  2           1         1       2    1000
  3           1         2       3    1
  4           1         2       0    0
  5           1         3       0    0
  1           2         2       2    0
  2           2         1       1    1
  3           2         1       1    1
  4           2         0       0    0
  5           2         0       0    0
  1           3         1       3    0
  2           3         3       1    0

Among records that share the same time and location, I want to find the path of the record with the largest number of messages.

time_locs = pd.unique(df['time_loc'])
for time_loc in time_locs:
    dc_group = df[df['time_loc'] == time_loc]
    if len(dc_group) > 1:
        max_num_msg = max(dc_group['msgs'])

So I combined time and location into time_loc and found the maximum number of messages. Now, how do I find the path for that row?

For example, in this case my first dc_group consists of these two rows:

user-id     time    location   msg  path
  1           1         1       1    0
  2           1         1       2    1000

I want to get 1000.

I tried this code, but it doesn't work:

user_group = df.loc[max(dc_group['msgs']), 'path']

That is because it searches the whole DataFrame. And .loc doesn't seem to work on dc_group either; this code raises an error:

user_group = dc_group.loc[max(dc_group['msgs']), 'path']

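You almost certainly want a non-loop approach here. Use .idxmax to get the index label of the maximum rather than the value itself. (Older pandas versions let you use .argmax for this, but in current versions .argmax returns the integer position, which .loc would misread as a label.) Like this:

In [15]: df
Out[15]:
    user-id  time  location  msg  path
0         1     1         1    1     0
1         2     1         1    1     0
2         3     1         2    0     0
3         4     1         2    0     0
4         5     1         3    0     0
5         1     2         2    2     0
6         2     2         1    1     0
7         3     2         1    1     0
8         4     2         0    0     0
9         5     2         0    0     0
10        1     3         1    3     0
11        2     3         3    1     0
In [16]: df.loc[df.time == df.location, 'msg'].idxmax()
Out[16]: 5
In [17]: max_idx = df.loc[df.time == df.location, 'msg'].idxmax()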
In [18]: df.loc[max_idx]
Out[18]:
user-id     1
time        2
location    2
msg         2
path        0
Name: 5, dtype: int64
In [19]: df.loc[max_idx, 'path']
Out[19]: 0
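
One caveat worth spelling out: in recent pandas versions, Series.argmax returns an integer position, while Series.idxmax returns the index label, and .loc expects labels. The following is a minimal sketch of that difference, assuming pandas 1.0 or later. The frame is rebuilt by hand from the session above, and the names subset, label, position, and row_path are illustrative only.

```python
import pandas as pd

# Frame rebuilt by hand from the session above (an assumption of this sketch).
df = pd.DataFrame({
    "user-id":  [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2],
    "time":     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3],
    "location": [1, 1, 2, 2, 3, 2, 1, 1, 0, 0, 1, 3],
    "msg":      [1, 1, 0, 0, 0, 2, 1, 1, 0, 0, 3, 1],
    "path":     [0] * 12,
})

# Rows where time equals location keep their original labels: 0, 1, 5, 11.
subset = df.loc[df["time"] == df["location"], "msg"]

label = subset.idxmax()     # index label of the maximum, usable with .loc
position = subset.argmax()  # integer position inside `subset`, not a label

# Look up the path with the label, not the position.
row_path = df.loc[label, "path"]
```

If only the position is available, subset.index[position] converts it back to a label before passing it to .loc.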

If you want all of the matching rows, just use boolean indexing:

In [25]: df.loc[df.time == df.location]
Out[25]:
    user-id  time  location  msg  path
0         1     1         1    1     0
1         2     1         1    1     0
5         1     2         2    2     0
11        2     3         3    1     0

Or use .query, if you prefer:

In [26]: df.query('time == location')
Out[26]:
    user-id  time  location  msg  path
0         1     1         1    1     0
1         2     1         1    1     0
5         1     2         2    2     0
11        2     3         3    1     0
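
If the goal is instead the one described in the question (for every set of records sharing the same time and location, take the path of the record with the most messages), a groupby with idxmax is one possible loop-free approach. This is only a sketch under stated assumptions: the frame is typed in from the question's table, the column is assumed to be named msg as in that table, and multi, best, and result are hypothetical names.

```python
import pandas as pd

# Data typed in from the question's table (an assumption of this sketch).
df = pd.DataFrame({
    "user-id":  [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2],
    "time":     [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3],
    "location": [1, 1, 2, 2, 3, 2, 1, 1, 0, 0, 1, 3],
    "msg":      [1, 2, 3, 0, 0, 2, 1, 1, 0, 0, 3, 1],
    "path":     [0, 1000, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0],
})

keys = ["time", "location"]

# Keep only groups with more than one record, as the question's loop does.
multi = df[df.groupby(keys)["msg"].transform("size") > 1]

# Index label of the max-msg row in each group (ties go to the first row).
best = multi.groupby(keys)["msg"].idxmax()

# Pull the matching rows; for time=1, location=1 this is the row with path 1000.
result = df.loc[best.to_numpy(), keys + ["msg", "path"]]
```

Grouping on the two columns directly avoids building a combined time_loc column. Note that when several rows tie for the maximum, only the first one is returned.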
