I need to replace some misspelled country names with the correct ones. Here is my DataFrame:
names country
0 1 Austria
1 2 Autrisa
2 3 Egnald
3 4 Sweden
4 5 Swweden
5 6 India
I need to replace the country names above with the correct spellings. This is the output I want:
names country
0 1 Austria
1 2 Austria
2 3 England
3 4 Sweden
4 5 Sweden
5 6 India
from difflib import SequenceMatcher

correct_names = {'Austria', 'England', 'Sweden'}

def get_most_similar(word, wordlist):
    # Return the candidate in wordlist with the highest similarity ratio to word.
    top_similarity = 0.0
    most_similar_word = word
    for candidate in wordlist:
        similarity = SequenceMatcher(None, word, candidate).ratio()
        if similarity > top_similarity:
            top_similarity = similarity
            most_similar_word = candidate
    # print(most_similar_word)
    return most_similar_word

data['country'].apply(lambda x: get_most_similar(x, correct_names))
This is the output I get:
0 Austria
1 Austria
2 England
3 Sweden
4 Sweden
5 England -- this should be India but it got converted to England
I need help fixing this.
You assigned
correct_names = {'Austria', 'England', 'Sweden'}
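but that does not fit this use case: India is also a correct name, yet it does not appear in the set. Every value is replaced by its closest match in the set, so 'India' can only be mapped to one of the three names that are present.
You want to assign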
correct_names = {'Austria', 'England', 'India', 'Sweden'}
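
To illustrate, here is a minimal sketch that runs the question's function on a plain list instead of the DataFrame column (the list `countries` and the helper `correct_with_cutoff` are names made up for this example). It uses a list rather than a set for `correct_names` so that ties are broken in a fixed order. The second helper is an alternative, not part of the original code: it uses `difflib.get_close_matches` with a similarity cutoff, so a value with no sufficiently close match is left unchanged. The cutoff of 0.6 is simply the library default and may need tuning for other data.

```python
from difflib import SequenceMatcher, get_close_matches

# Sample values taken from the question's 'country' column.
countries = ['Austria', 'Autrisa', 'Egnald', 'Sweden', 'Swweden', 'India']

# A list (not a set) keeps the iteration order, and thus tie-breaking, fixed.
correct_names = ['Austria', 'England', 'India', 'Sweden']


def get_most_similar(word, wordlist):
    """Return the candidate with the highest SequenceMatcher ratio to word."""
    top_similarity = 0.0
    most_similar_word = word
    for candidate in wordlist:
        similarity = SequenceMatcher(None, word, candidate).ratio()
        if similarity > top_similarity:
            top_similarity = similarity
            most_similar_word = candidate
    return most_similar_word


def correct_with_cutoff(word, wordlist, cutoff=0.6):
    """Hypothetical alternative: keep word as-is when nothing scores >= cutoff."""
    matches = get_close_matches(word, wordlist, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# The fix from the answer: 'India' is now in the list of correct names.
fixed = [get_most_similar(c, correct_names) for c in countries]

# The alternative: the original three names only, but with a cutoff.
guarded = [correct_with_cutoff(c, ['Austria', 'England', 'Sweden']) for c in countries]

print(fixed)
print(guarded)
```

With the DataFrame from the question, either function can be plugged into the same call, for example `data['country'] = data['country'].apply(lambda x: get_most_similar(x, correct_names))`. Note that `apply` returns a new Series, so the result has to be assigned back to the column.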