import pandas as pd
dict1 = {'id_game': [112, 113, 114], 'game_name' : ['x','z','y'],'id_category':[1,2,3], 'id_players':[[588,589,590],[589],[588,589]]}
dict2 = {'id_player': [588, 589, 590],'player_name' : ['fff','aaa','ccc'] ,'indication':['mmm x ggg sdg y', 'uuu x fdb y kfnkjq z', 'fffre x']}
game_df = pd.DataFrame(dict1)
player_df = pd.DataFrame(dict2)
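This is a sample of my data. I am looking for a way to add a column containing categories_id to the second dataframe, player_df, based on the relationship between game_df['id_players'] and player_df['id_player'], or between game_df['game_name'] and player_df['indication'].
In the following script, I used the game_name and indication values: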
new_list = []
for i in range(len(game_df)):
    for j in range(len(player_df)):
        if game_df['game_name'][i] in player_df['indication'][j]:
            new_list.append(game_df['id_category'][i])
            print(new_list)
player_df['categories_id'] = new_list
Error:
--> 747 raise ValueError(
748 "Length of values "
749 f"({len(data)}) "
ValueError: Length of values (6) does not match length of index (3)
Your code can be fixed by adding a break after print(new_list), at the same indentation level.
...
        if game_df['game_name'][i] in player_df['indication'][j]:
            new_list.append(game_df['id_category'][i])
            print(new_list)
            break
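For reference, here is a self-contained sketch of the loop with the break in place. Without the break, the inner loop appends once for every (game, player) pair that matches, which is where the 6 values in the error come from. With it, the loop stops at the first matching player for each game, so at most one value per game is collected. Note that the result has one entry per game, in game order; it only fits into player_df here because both frames happen to have three rows and every game finds a match.

```python
import pandas as pd

game_df = pd.DataFrame({
    "id_game": [112, 113, 114],
    "game_name": ["x", "z", "y"],
    "id_category": [1, 2, 3],
    "id_players": [[588, 589, 590], [589], [588, 589]],
})
player_df = pd.DataFrame({
    "id_player": [588, 589, 590],
    "player_name": ["fff", "aaa", "ccc"],
    "indication": ["mmm x ggg sdg y", "uuu x fdb y kfnkjq z", "fffre x"],
})

new_list = []
for i in range(len(game_df)):
    for j in range(len(player_df)):
        # Plain substring test: is the game name contained in the indication text?
        if game_df["game_name"][i] in player_df["indication"][j]:
            new_list.append(game_df["id_category"][i])
            break  # stop at the first player whose indication mentions this game

# One value per game; the lengths match only because both frames have 3 rows
player_df["categories_id"] = new_list
```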
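That said, iterating over dataframes is strongly discouraged, because it is slow and quickly becomes unwieldy. The canonical way to solve this kind of problem is to merge the dataframes on id_player(s). That means first exploding the ids in id_players into separate rows,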
>>> game_df = game_df.explode("id_players").rename(columns={"id_players": "id_player"})
>>> game_df
   id_game game_name  id_category id_player
0      112         x            1       588
0      112         x            1       589
0      112         x            1       590
1      113         z            2       589
2      114         y            3       588
2      114         y            3       589
so that you can then .merge game_df with player_df:
>>> df = game_df.merge(player_df, on="id_player")
>>> df
   id_game game_name  id_category id_player player_name            indication
0      112         x            1       588         fff       mmm x ggg sdg y
1      114         y            3       588         fff       mmm x ggg sdg y
2      112         x            1       589         aaa  uuu x fdb y kfnkjq z
3      113         z            2       589         aaa  uuu x fdb y kfnkjq z
4      114         y            3       589         aaa  uuu x fdb y kfnkjq z
5      112         x            1       590         ccc               fffre x
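This makes the analysis fairly straightforward. For example, checking whether game_name appears in indication becomes
df.apply(lambda row: row.game_name in row.indication, axis=1)
although it returns True for every row, so I am not sure that is really what you want.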
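If the goal is a categories_id column on player_df, one possible follow-up is to collect the category ids per player from the merged frame and map them back. This is only a sketch under an assumption about the desired output, since the question does not show it: each player gets a list of the categories of the games they are linked to. The names exploded, merged, and cats_per_player are illustrative, and the frames are rebuilt so the snippet runs on its own.

```python
import pandas as pd

game_df = pd.DataFrame({
    "id_game": [112, 113, 114],
    "game_name": ["x", "z", "y"],
    "id_category": [1, 2, 3],
    "id_players": [[588, 589, 590], [589], [588, 589]],
})
player_df = pd.DataFrame({
    "id_player": [588, 589, 590],
    "player_name": ["fff", "aaa", "ccc"],
    "indication": ["mmm x ggg sdg y", "uuu x fdb y kfnkjq z", "fffre x"],
})

# One row per (game, player) pair. explode leaves an object column, so cast it
# back to int64 (assumes every id_players list is non-empty and holds ints).
exploded = (
    game_df.explode("id_players")
    .rename(columns={"id_players": "id_player"})
    .astype({"id_player": "int64"})
)
merged = exploded.merge(player_df, on="id_player")

# Optional filter: keep rows where the game name occurs in the indication text
mask = merged.apply(lambda row: row.game_name in row.indication, axis=1)

# Gather the category ids for each player into a list
cats_per_player = merged[mask].groupby("id_player")["id_category"].agg(list)

# Look each player's list up by id and attach it as a new column
player_df["categories_id"] = player_df["id_player"].map(cats_per_player)
print(player_df[["id_player", "categories_id"]])
```

Because the mask is True for every row in this sample, the filter changes nothing here; with real data it would drop pairs whose indication does not mention the game.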