Searching for and assigning multiple strings with numpy.where or numpy.select in Python



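I am trying to do a conditional string assignment: if a cell contains locations, assign the matching geographic region names to the cell next to it. I tried np.where and np.select, but they lend themselves to single-value assignment rather than multi-value assignment. Any suggestions on how I can do this with NumPy, or is there a simpler way?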

import numpy as np
import pandas as pd

Europe = ['London', 'Paris', 'Berlin']
North_America = ['New York', 'Toroto', 'Boston']
Asia = ['Hong Kong', 'Tokyo', 'Singapore']
data = {'location':["London, Paris", "Hong Kong", "London, New York", "Singapore, Toroto", "Boston"]}
df = pd.DataFrame(data)
            location
0      London, Paris
1          Hong Kong
2   London, New York
3  Singapore, Toroto
4             Boston
# np.where approach
df['geo'] = np.where(df['location'].isin(Europe) | df['location'].isin(North_America), 'Europe', 'North America')
# np.select approach
conditions = [
    df['location'].isin(Europe),
    df['location'].isin(North_America),
]
choices = ['Europe', 'North America']
df['geo'] = np.select(conditions, choices, default=0)

Expected output:

            location                    geo
0      London, Paris         Europe, Europe
1          Hong Kong                   Asia
2   London, New York  Europe, North America
3  Singapore, Toroto    Asia, North America
4             Boston          North America

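Build a mapping from each city to its region (country -> area in the code below), apply it with explode and map, and finally rebuild the joined string for each row with groupby and apply: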

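geo = {'Europe': Europe, 'North_America': North_America, 'Asia': Asia}
# Invert the region -> cities dictionary into a city -> region lookup
mapping = {country: area for area, countries in geo.items() for country in countries}
# Split each cell into cities, map every city to its region, then re-join per original row
df['geo'] = (df['location'].str.split(', ').explode().map(mapping)
             .groupby(level=0).apply(', '.join))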

Output:

>>> df
            location                    geo
0      London, Paris         Europe, Europe
1          Hong Kong                   Asia
2   London, New York  Europe, North_America
3  Singapore, Toroto    Asia, North_America
4             Boston          North_America
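
To see what each step of that chain contributes, the pipeline can be broken into intermediate results. This is a minimal sketch on the sample data from the question; the names `cities`, `regions`, and `rebuilt` are illustrative and do not appear in the answer above.

```python
import pandas as pd

Europe = ['London', 'Paris', 'Berlin']
North_America = ['New York', 'Toroto', 'Boston']
Asia = ['Hong Kong', 'Tokyo', 'Singapore']
df = pd.DataFrame({'location': ["London, Paris", "Hong Kong", "London, New York",
                                "Singapore, Toroto", "Boston"]})

geo = {'Europe': Europe, 'North_America': North_America, 'Asia': Asia}
mapping = {country: area for area, countries in geo.items() for country in countries}

# Step 1: one row per city; the original row label is repeated in the index
cities = df['location'].str.split(', ').explode()
# Step 2: translate each city to its region (a city missing from mapping becomes NaN)
regions = cities.map(mapping)
# Step 3: collect the regions that share a row label back into one string
rebuilt = regions.groupby(level=0).apply(', '.join)

print(cities.index.tolist())  # [0, 0, 1, 2, 2, 3, 3, 4]
print(regions.tolist())
print(rebuilt.tolist())
```

A city that is absent from `mapping` would become NaN in step 2, and `', '.join` would then raise a TypeError, so real data may need a `fillna` with a placeholder string before the final step.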

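We can also get the result by using NumPy together with Python for loops. First, we concatenate the per-region city lists into a single list, then create another list called continents with the same length as the combined city list: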

import numpy as np
import pandas as pd
continents = ["Europe"] * len(Europe) + ["North_America"] * len(North_America) + ["Asia"] * len(Asia)
countries = Europe + North_America + Asia
locations = data['location']

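Then, for each city, including each city inside a combination, we find its index in the combined countries list. We also record the number of commas in each combination, which is used later to build the comma-separated output: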

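corsp = []       # index of every city in the combined countries list
comma_nums = []  # number of commas in each location string
for i in locations:
    for j, k in enumerate(i.split(', ')):
        corsp.append(np.where(np.array(countries) == k)[0][0])
    # after the inner loop, j is the index of the last city, i.e. the comma count
    comma_nums.append(j)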
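
As a check on the description above, this sketch runs the lookup on the question's sample data and prints what the two lists hold. It assumes that `comma_nums.append(j)` sits at the level of the outer loop, so that one comma count is recorded per location, and that every city appears in `countries`. The `int(...)` conversion is added here only to keep the printed output plain.

```python
import numpy as np

Europe = ['London', 'Paris', 'Berlin']
North_America = ['New York', 'Toroto', 'Boston']
Asia = ['Hong Kong', 'Tokyo', 'Singapore']
countries = Europe + North_America + Asia
locations = ["London, Paris", "Hong Kong", "London, New York", "Singapore, Toroto", "Boston"]

corsp = []       # index of every city in the combined list, in reading order
comma_nums = []  # number of commas in each location string
for i in locations:
    for j, k in enumerate(i.split(', ')):
        corsp.append(int(np.where(np.array(countries) == k)[0][0]))
    comma_nums.append(j)

print(corsp)       # [0, 1, 6, 0, 3, 8, 4, 5]
print(comma_nums)  # [1, 0, 1, 1, 0]
```

Note that `np.where(...)[0][0]` raises an IndexError for a city that is not in `countries`, so this lookup presumes clean input.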

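The continents list is then reordered according to the index list we just built. Its entries are grouped into one sub-list per location, using the comma counts to decide how many entries belong to each location, and each sub-list is then converted into the string required for the output: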

reordered_continents = [continents[i] for i in corsp]
mod_continents = []
start = 0  # position of the first continent belonging to the current location
for n_commas in comma_nums:
    stop = start + n_commas + 1  # a location with n commas names n + 1 cities
    mod_continents.append(reordered_continents[start:stop])
    start = stop
# Join each group into the comma-separated string used in the output
mod_continents = [', '.join(group) for group in mod_continents]
df['geo'] = mod_continents
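
The regrouping step can also be written with NumPy itself. The following is a sketch of an alternative, not the answer's own code: it hard-codes the `reordered_continents` and `comma_nums` values that the sample data produces so that it runs on its own, and the names `group_sizes`, `boundaries`, and `geo_column` are illustrative. It uses `np.cumsum` to find the group boundaries and `np.split` to cut the flat list at them.

```python
import numpy as np

# Values the earlier steps produce for the sample data
reordered_continents = ['Europe', 'Europe', 'Asia', 'Europe', 'North_America',
                        'Asia', 'North_America', 'North_America']
comma_nums = [1, 0, 1, 1, 0]

# A location with n commas names n + 1 cities
group_sizes = np.array(comma_nums) + 1
# Cumulative sizes mark where each group ends; the last one is the end of the list
boundaries = np.cumsum(group_sizes)[:-1]
groups = np.split(np.array(reordered_continents), boundaries)
geo_column = [', '.join(group) for group in groups]

print(geo_column)
```

The resulting list has one string per location and could be assigned to `df['geo']` in the same way as `mod_continents`.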
