I have a df:
Name Week
Google 1
Google 1
Amazon 1
Tesla 1
Tesla 1
Google 2
Google 2
Tesla 2
Tesla 2
Uber 3
Uber 3
I am trying to create a new column Value holding a random integer between x and y, with one value per combination of Name and Week, as shown below:
Name Week Value
Google 1 100
Google 1 100
Amazon 1 150
Tesla 1 170
Tesla 1 170
Google 2 250
Google 2 250
Tesla 2 157
Tesla 2 157
Uber 3 500
Uber 3 500
Every row with the same Name and Week combination gets the same value.
I tried:
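To experiment with the answers below, the sample frame can be rebuilt as follows. The helper one_value_per_group is a hypothetical name introduced here; it only checks the stated requirement that every (Name, Week) pair carries exactly one Value.

```python
import pandas as pd

# the sample frame from the question
df = pd.DataFrame({
    'Name': ['Google', 'Google', 'Amazon', 'Tesla', 'Tesla',
             'Google', 'Google', 'Tesla', 'Tesla', 'Uber', 'Uber'],
    'Week': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})


def one_value_per_group(frame, value_col='Value'):
    """Hypothetical helper: True if every (Name, Week) pair has a single distinct value."""
    return bool((frame.groupby(['Name', 'Week'])[value_col].nunique() == 1).all())


# the desired output from the question satisfies the requirement
expected = df.assign(Value=[100, 100, 150, 170, 170, 250, 250, 157, 157, 500, 500])
print(one_value_per_group(expected))  # True
```

A column with a different value on every row (for example range(11)) fails the check, because the two-row groups then have two distinct values.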
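import itertools
import numpy as np

def random_group_int(df_):
    # every (Name, Week) pair; itertools.product pairs two iterables, combinations does not
    combinations = itertools.product(df_.Name.unique(), df_.Week.unique())
    rand_values_dict_by_combination = {combination: np.random.randint(100, 200) for combination in combinations}
    # return the value for each row's (Name, Week) combination
    return [rand_values_dict_by_combination[pair] for pair in zip(df_.Name, df_.Week)]

df['Value'] = random_group_int(df)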
I don't think this is the best approach. I also tried:
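import numpy as np
# one row per (Name, Week) combination; size() works even when df has no other columns
df_rand = df.groupby(['Name', 'Week']).size().reset_index(name='count')
# one random integer per combination
df_rand['Value'] = np.random.randint(100, 200, size=len(df_rand))
# copy each combination's value onto every matching row of df
df = df.merge(df_rand[['Name', 'Week', 'Value']], on=['Name', 'Week'], how='left')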
This does work, but I'm not sure it is the approach I should be using.
You can use GroupBy.transform and generate a random value inside the transform:
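One detail worth checking in the attempts above: np.random.randint(100, 200) never returns 200, because its upper bound is exclusive, while the question asks for a value between x and y. If y should be reachable, NumPy's Generator.integers accepts endpoint=True. A small sketch, assuming inclusive bounds are what is wanted; draw_inclusive is a hypothetical helper name.

```python
import numpy as np


def draw_inclusive(n, x, y, seed=None):
    """Hypothetical helper: n random integers in the closed range [x, y]."""
    rng = np.random.default_rng(seed)
    # endpoint=True makes the upper bound inclusive
    return rng.integers(x, y, size=n, endpoint=True)


vals = draw_inclusive(10_000, 0, 1, seed=0)
print(sorted(set(vals.tolist())))  # [0, 1]

# np.random.randint with the same bounds can only ever produce 0
legacy = np.random.randint(0, 1, size=10_000)
print(sorted(set(legacy.tolist())))  # [0]
```

Python's random.randint(x, y), by contrast, is already inclusive at both ends.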
import random

x, y = 100, 200
df['Value'] = (df.groupby(['Name', 'Week'])['Name']  # the column doesn't matter
                 .transform(lambda _: random.randint(x, y))
              )
Example output:
Name Week Value
0 Google 1 153
1 Google 1 153
2 Amazon 1 196
3 Tesla 1 198
4 Tesla 1 198
5 Google 2 122
6 Google 2 122
7 Tesla 2 180
8 Tesla 2 180
9 Uber 3 106
10 Uber 3 106
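The transform above calls the lambda once per group. A different technique that draws all the random numbers in one NumPy call is to number the groups with GroupBy.ngroup and index into an array of draws. This is a sketch, not the answer's method, assuming a seeded numpy.random.Generator for reproducibility and inclusive bounds x and y.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Google', 'Google', 'Amazon', 'Tesla', 'Tesla',
             'Google', 'Google', 'Tesla', 'Tesla', 'Uber', 'Uber'],
    'Week': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})

x, y = 100, 200
rng = np.random.default_rng(42)  # fixed seed so the result is reproducible

# ngroup() labels each row with the integer id (0 .. n_groups-1) of its (Name, Week) group
group_id = df.groupby(['Name', 'Week']).ngroup()
# one draw per group, inclusive of both x and y
draws = rng.integers(x, y, size=group_id.max() + 1, endpoint=True)
# fancy indexing hands every row the draw belonging to its group
df['Value'] = draws[group_id.to_numpy()]
print(df)
```

The per-group work here is a single vectorized draw rather than one Python call per group, which may help when there are many groups.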
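This should do what you need:
import numpy as np
import pandas as pd
# draw one random integer per unique (Name, Week) pair
s = df[['Name', 'Week']].drop_duplicates().copy()
s['random_int'] = np.random.randint(0, 100, size=len(s))
# a left merge copies each pair's value onto every matching row
df_merge = pd.merge(df, s, on=['Name', 'Week'], how='left')
df_merge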
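If the lookup table ever contained a repeated (Name, Week) pair, the left merge would silently duplicate rows of df. pd.merge accepts validate='many_to_one' to raise an error in that case. A sketch wrapping the merge idea; add_group_random is a hypothetical name, and the key columns and draw range are assumed parameters.

```python
import numpy as np
import pandas as pd


def add_group_random(df, keys=('Name', 'Week'), low=0, high=100, seed=None):
    """Hypothetical helper: add a 'random_int' column shared by rows with equal keys."""
    keys = list(keys)
    rng = np.random.default_rng(seed)
    # one row per unique key combination
    s = df[keys].drop_duplicates().copy()
    # high is exclusive here, matching np.random.randint(0, 100)
    s['random_int'] = rng.integers(low, high, size=len(s))
    # validate='many_to_one' raises pandas.errors.MergeError if s had duplicate keys
    return df.merge(s, on=keys, how='left', validate='many_to_one')


df = pd.DataFrame({
    'Name': ['Google', 'Google', 'Amazon', 'Tesla', 'Tesla',
             'Google', 'Google', 'Tesla', 'Tesla', 'Uber', 'Uber'],
    'Week': [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
})
df_merge = add_group_random(df, seed=0)
print(df_merge)
```

A left merge keeps the row order of df, so the result lines up with the original frame.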