Assign random values to categories in pandas



I have a df:

Name        Week
Google      1
Google      1
Amazon      1
Tesla       1
Tesla       1
Google      2
Google      2
Tesla       2
Tesla       2
Uber        3
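To reproduce the examples, the frame above can be rebuilt like this. This is a minimal sketch that assumes `Name` holds strings and `Week` holds integers.

```python
import pandas as pd

# Rebuild the example frame from the question (Name as str, Week as int)
df = pd.DataFrame({
    'Name': ['Google', 'Google', 'Amazon', 'Tesla', 'Tesla',
             'Google', 'Google', 'Tesla', 'Tesla', 'Uber', 'Uber'],
    'Week': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})
print(df)
```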
Uber        3

I'm trying to create a new column `Value` holding a random integer between x and y for each combination of Name and Week, like this:

Name        Week        Value
Google      1           100
Google      1           100
Amazon      1           150
Tesla       1           170
Tesla       1           170
Google      2           250
Google      2           250
Tesla       2           157
Tesla       2           157
Uber        3           500
Uber        3           500

Rows that share the same Name and Week combination get the same value.

I tried:

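import numpy as np

# itertools.combinations(iterable, r) needs an integer r, so it cannot pair
# Week with Name; take the unique (Week, Name) pairs that actually occur instead
pairs = df[['Week', 'Name']].drop_duplicates().itertuples(index=False, name=None)
rand_values_dict_by_combination = {pair: np.random.randint(100, 200) for pair in pairs}

def random_group_int(df_):
    # df_ is one row when called through df.apply(..., axis=1);
    # return the value drawn for that row's (Week, Name) combination
    return rand_values_dict_by_combination[(df_.Week, df_.Name)]

df['Value'] = df.apply(random_group_int, axis=1)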

I don't think this is the best approach. I also tried:

import numpy as np

# size() rather than count(): count() has no columns left to count
# when the frame only contains the grouping keys
df_rand = df.groupby(['Name', 'Week']).size().reset_index(name='count')
df_rand['Value'] = df_rand['count'].apply(lambda x: np.random.randint(100, 200))
df = df.merge(df_rand[['Value', 'Name', 'Week']], on=['Name', 'Week'], how='left')

This does work, but I'm not sure it's the approach I should be using.

You can use GroupBy.transform and generate a random value inside the transform:

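import random

x, y = 100, 200
# the lambda runs once per (Name, Week) group; transform broadcasts the
# scalar it returns to every row of that group
df['Value'] = (df.groupby(['Name', 'Week'])['Name']  # the column doesn't matter
                 .transform(lambda _: random.randint(x, y))
              )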
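Note that `random.randint(x, y)` includes `y`, while `np.random.randint(100, 200)` in the question excludes 200. A minimal sketch of the same transform idea using NumPy's `Generator` instead, assuming you want reproducible draws (the seed 42 is arbitrary) and an inclusive upper bound via `endpoint=True`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Google', 'Google', 'Amazon', 'Tesla', 'Tesla',
             'Google', 'Google', 'Tesla', 'Tesla', 'Uber', 'Uber'],
    'Week': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})

x, y = 100, 200
rng = np.random.default_rng(42)  # seeded so reruns give the same values

# One draw per (Name, Week) group, broadcast to all rows of that group;
# endpoint=True makes y inclusive, matching random.randint(x, y)
df['Value'] = (
    df.groupby(['Name', 'Week'])['Name']
      .transform(lambda _: rng.integers(x, y, endpoint=True))
)
print(df)
```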

Example output:

      Name  Week  Value
0   Google     1    153
1   Google     1    153
2   Amazon     1    196
3    Tesla     1    198
4    Tesla     1    198
5   Google     2    122
6   Google     2    122
7    Tesla     2    180
8    Tesla     2    180
9     Uber     3    106
10    Uber     3    106

This should do what you need:

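import numpy as np
import pandas as pd

# one row per (Name, Week) pair; .copy() avoids SettingWithCopyWarning
s = df[['Name', 'Week']].drop_duplicates().copy()
s['random_int'] = np.random.randint(0, 100, size=len(s))
df_merge = pd.merge(df, s, on=['Name', 'Week'], how='left')
df_merge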
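A fully vectorized variant avoids both the per-group Python call and the merge: label each group with `GroupBy.ngroup()` and use those labels to index one array of random draws. This is a sketch, not taken from the answers above; the bounds `x, y` and the use of `np.random.default_rng` are assumptions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Google', 'Google', 'Amazon', 'Tesla', 'Tesla',
             'Google', 'Google', 'Tesla', 'Tesla', 'Uber', 'Uber'],
    'Week': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})

x, y = 100, 200
rng = np.random.default_rng()

g = df.groupby(['Name', 'Week'])
# ngroup() gives every row its group's label in 0..ngroups-1
codes = g.ngroup().to_numpy()
# one random integer per group (y inclusive), then look each row's value up by label
draws = rng.integers(x, y, size=g.ngroups, endpoint=True)
df['Value'] = draws[codes]
print(df)
```

Drawing exactly `ngroups` values keeps the cost proportional to the number of groups rather than the number of rows.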
