Check data against a master list and compare it with a column in a DataFrame



Existing DataFrame:

Id      status            countries
01      pass        ['xyx','Indonesia','brazil']
02      fail        ['PQ','XT','sri lanka']
03      pass        ['spain', 'india','xtx']

Expected DataFrame:

Id      status            countries                      filtered_countries_name
01      pass        ['xyx','Indonesia','brazil']           'Indonesia','brazil'
02      fail        ['PQ','XT','sri lanka']                    'sri lanka'
03      pass        ['spain', 'india','xtx']                'spain', 'india'

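I have a master list of specific countries (the ones I want to check for), and I want to compare the list in each row of the countries column against it.

My approach: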

countries_list = ['china', 'india', 'united states', 'indonesia', 'brazil', 'pakistan', 'nigeria', 'bangladesh', 'russia', 'japan', 'mexico', 'philippines', 'vietnam', 'ethiopia', 'egypt', 'germany', 'iran', 'turkey', 'democratic republic of the congo', 'thailand', 'france', 'united kingdom', 'italy', 'burma', 'south africa', 'south korea', 'colombia', 'spain', 'ukraine', 'tanzania', 'kenya', 'argentina', 'algeria', 'poland', 'sudan', 'uganda','Indonesia','brazil','spain','sri lanka']
import re
countries_re = '|'.join(str(v) for v in countries_list )
df['filtered_countries_name'] = df['countries'].str.extractall(countries_re)

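But this does not give the expected result; instead I get the following error:

TypeError: incompatible index of inserted column with frame index

Any leads?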

If the column holds lists, use a list comprehension with a set for efficient membership tests:

S = set(countries_list)
df['filtered_countries_name'] = [[c for c in l if c.lower() in S]
                                 for l in df['countries']]

Output:

Id status                 countries filtered_countries_name
0   1   pass  [xyx, Indonesia, brazil]     [Indonesia, brazil]
1   2   fail       [PQ, XT, sri lanka]             [sri lanka]
2   3   pass       [spain, india, xtx]          [spain, india]
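
For reference, the list comprehension above can be run end to end as in the following minimal sketch. It assumes each cell of countries is a real Python list (not a string that merely looks like one), and it uses a shortened master list for brevity. The names master and filtered_countries_str are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample frame mirroring the question; each "countries" cell is a Python list.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "status": ["pass", "fail", "pass"],
    "countries": [
        ["xyx", "Indonesia", "brazil"],
        ["PQ", "XT", "sri lanka"],
        ["spain", "india", "xtx"],
    ],
})

# Shortened master list (assumption: the full list from the question behaves the same).
countries_list = ["china", "india", "indonesia", "brazil", "spain", "sri lanka"]

# Lowercase the master list once so the membership test is case-insensitive.
master = {name.lower() for name in countries_list}

# Keep the original spelling and order of every matching entry.
df["filtered_countries_name"] = [
    [c for c in names if c.lower() in master] for names in df["countries"]
]

# Optional: join each list into one string, closer to the expected output.
df["filtered_countries_str"] = df["filtered_countries_name"].str.join(", ")

print(df)
```

Building the set once keeps each lookup at roughly constant time, which matters more as the master list grows.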

Using set intersection:

import pandas as pd

df = pd.DataFrame({'Id': {0: 1, 1: 2, 2: 3},
                   'status': {0: 'pass', 1: 'fail', 2: 'pass'},
                   'countries': {0: ['xyx', 'Indonesia', 'brazil'],
                                 1: ['PQ', 'XT', 'sri lanka'],
                                 2: ['spain', 'india', 'xtx']}})
countries_list = ['china', 'india', 'united states', 'indonesia', 'brazil', 'pakistan', 'nigeria', 'bangladesh', 'russia', 'japan', 'mexico', 'philippines', 'vietnam', 'ethiopia', 'egypt', 'germany', 'iran', 'turkey', 'democratic republic of the congo', 'thailand', 'france', 'united kingdom', 'italy', 'burma', 'south africa', 'south korea', 'colombia', 'spain', 'ukraine', 'tanzania', 'kenya', 'argentina', 'algeria', 'poland', 'sudan', 'uganda','Indonesia','brazil','spain','sri lanka']
df["filtered_names"] = df["countries"].apply(lambda x: list(set(x) & set(countries_list)))
df
# Id  status    countries                   filtered_names
# 0 1   pass    [xyx, Indonesia, brazil]    [Indonesia, brazil]
# 1 2   fail    [PQ, XT, sri lanka]         [sri lanka]
# 2 3   pass    [spain, india, xtx]         [india, spain]
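
Two properties of the set intersection are worth keeping in mind: it matches names exactly (case-sensitive), and the order of the resulting list is not guaranteed, which is why the last row can come back as [india, spain]. The sketch below is one possible way to keep the original order and, optionally, ignore case. The helper intersect_in_order and the lowercase-only master_set are hypothetical and only for illustration.

```python
# Master set deliberately contains only lowercase names (assumption for illustration).
master_set = {"india", "indonesia", "brazil", "spain", "sri lanka"}

def intersect_in_order(names, allowed=master_set):
    """Return the entries of names that are in allowed, in their original order."""
    common = set(names) & allowed  # exact, case-sensitive match
    return [n for n in names if n in common]

row = ["xyx", "Indonesia", "brazil"]

# 'Indonesia' is dropped here because it differs in case from 'indonesia'.
exact = intersect_in_order(row)

# Case-insensitive variant: compare casefolded names, keep the original spelling.
folded = [n for n in row if n.casefold() in master_set]

print(exact, folded)
```

This is presumably why the master list in the question contains both 'indonesia' and 'Indonesia': with a plain intersection, each spelling has to be listed explicitly.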
