I have a string:
original = '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC'
I need to find US and move it to the end, like this:
lookup= 'US'
result = original.replace(lookup,"") + " " + str(lookup)
Output: 3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611  NC US
However, suppose the lookup values are given as a list like this:
lookup = ['US','CA','INDIA','CHINA']
and there are multiple inputs, also given as a list:
input = ['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC','200 LINE AVE STE 360 GBORO 77611 CA NC','60 Indiranagar INDIA Bangalore']
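For each string in the input list, I need to find the country and move it to the end of that string.
I have tried many approaches, but none of them worked. Any help would be greatly appreciated.
Thanks.
You can start by splitting the string into tokens: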
>>> address = '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC'
>>> tokens = address.split()
>>> tokens
['3200', 'NORTHLINE', 'AVE', 'STE', '360', 'GREENSBORO', '27408-7611', 'US', 'NC']
If you want to put the country code at the end, you can sort the token list
using a key function. For example:
>>> sorted_tokens = sorted(tokens, key=lambda token: token == "US")
>>> sorted_tokens
['3200', 'NORTHLINE', 'AVE', 'STE', '360', 'GREENSBORO', '27408-7611', 'NC', 'US']
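The reason this works is that the key is a boolean: False sorts before True, and Python's sort is stable, so tokens with equal keys keep their original order. A minimal sketch that makes the keys visible (the variable name keys is only for illustration):

```python
tokens = '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC'.split()

# The key computed for each token: True only for the country code.
keys = [token == "US" for token in tokens]
print(keys)
# [False, False, False, False, False, False, False, True, False]

# False sorts before True, and ties keep their original order,
# so 'US' is the only token that moves, and it moves to the end.
sorted_tokens = sorted(tokens, key=lambda token: token == "US")
print(sorted_tokens)
```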
You can then use join to turn the tokens back into a string:
>>> sorted_address = ' '.join(sorted_tokens)
>>> sorted_address
'3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US'
To make this work for several country codes, you only need to change the key slightly:
lookup = ['US','CA','INDIA','CHINA']
sorted_tokens = sorted(tokens, key=lambda token: token in lookup)
Putting it all together:
lookup = ['US','CA','INDIA','CHINA']
addresses = [
    '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC',
    '200 LINE AVE STE 360 GBORO 77611 CA NC',
    '60 Indiranagar INDIA Bangalore',
]

def sort_address(address):
    tokens = address.split()
    # Tokens found in lookup get key True and therefore sort last.
    sorted_tokens = sorted(tokens, key=lambda token: token in lookup)
    return ' '.join(sorted_tokens)

for address in addresses:
    print(sort_address(address))
3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US
200 LINE AVE STE 360 GBORO 77611 NC CA
60 Indiranagar Bangalore INDIA
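If sorting feels indirect, the same result can be produced by partitioning the tokens explicitly. This is only a sketch of an alternative, not the sort-based approach shown above; the function name move_country_last and the choice of a set for lookup are illustrative assumptions.

```python
# A set instead of a list: membership tests do not scan every entry.
lookup = {'US', 'CA', 'INDIA', 'CHINA'}

def move_country_last(address):
    tokens = address.split()
    # Non-country tokens stay in their original order...
    rest = [t for t in tokens if t not in lookup]
    # ...and any country tokens are appended after them.
    countries = [t for t in tokens if t in lookup]
    return ' '.join(rest + countries)

addresses = [
    '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC',
    '200 LINE AVE STE 360 GBORO 77611 CA NC',
    '60 Indiranagar INDIA Bangalore',
]

results = [move_country_last(a) for a in addresses]
for line in results:
    print(line)
```

Like the sort-based version, this compares whole tokens, so a country name made of several words would need different handling.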
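We can use a regex replacement like the following (this version assumes the addresses are in a pandas DataFrame column named address):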
lookup = ['US', 'CA', 'INDIA', 'CHINA']
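# Group 1 is the country, group 2 is everything after it.
regex = r'\s+(' + r'|'.join(lookup) + r')(.*)'
# Swap the groups so the country ends up last.
df["address"] = df["address"].str.replace(regex, r'\2 \1', regex=True)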
If you are working with a regular Python list rather than a DataFrame, we can try something similar:
import re

lookup = ['US', 'CA', 'INDIA', 'CHINA']
# Group 1 is the country, group 2 is everything after it.
regex = r'\s+(' + r'|'.join(lookup) + r')(.*)'
inp = ['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC','200 LINE AVE STE 360 GBORO 77611 CA NC','60 Indiranagar INDIA Bangalore']
output = [re.sub(regex, r'\2 \1', x) for x in inp]
print(output)
This prints:
['3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 NC US',
'200 LINE AVE STE 360 GBORO 77611 NC CA',
'60 Indiranagar Bangalore INDIA']
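One caveat with a pattern built this way: nothing stops a code such as CA from matching the start of a longer word like CANTON. A hedged variant, assuming that whole-word matching is what you want, adds a word boundary after the country and escapes each lookup entry with re.escape. The address containing CANTON is made up for illustration.

```python
import re

lookup = ['US', 'CA', 'INDIA', 'CHINA']

# re.escape guards against entries that contain regex metacharacters;
# \b after the group requires the country to end at a word boundary.
pattern = r'\s+(' + '|'.join(re.escape(c) for c in lookup) + r')\b(.*)'

inp = [
    '3200 NORTHLINE AVE STE 360 GREENSBORO 27408-7611 US NC',
    '12 MAIN ST CANTON US OH',  # hypothetical address: 'CA' must not match here
]

# Swap the two groups: the text after the country first, the country last.
output = [re.sub(pattern, r'\2 \1', x) for x in inp]
print(output)
```

Without the boundary, the second address would be split inside CANTON; with it, only the standalone US is moved.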