TypeError: expected string or bytes-like object on Pandas using fuzzy matching



Background

I have a df

import pandas as pd
import nltk
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
df= pd.DataFrame({'ID': [1,2,3], 
'Text':['This num dogs and cats is (111)888-8780 and other',
'dont block cow 23 here',
'cat two num: dog  and cows here']    
})

I also have a list

word_list = ['dog', 'cat', 'cow']

and a function that fuzzy-matches word_list against the Text column of df

def fuzzy(row, word_list):

    tweet = row[0]
    fuzzy_match = []
    for word in word_list:

        token_words = nltk.word_tokenize(tweet)

        for token in range(0, len(token_words) - 1):

            fuzzy_fx = process.extract(word_list[word], token_words[token], limit=100, scorer = fuzz.ratio)
            fuzzy_match.append(fuzzy_fx[0])
    return pd.Series([fuzzy_match], index = ['Fuzzy_Match'])

Then I join

df_fuzz = df.join(df.apply(lambda x: fuzzy(x, word_list), axis = 1))

But I get an error

TypeError: expected string or bytes-like object

Desired output

My desired output is a new column, Fuzzy_Match, containing the output of the fuzzy function:

   ID  Text                                                Fuzzy_Match
0  1   This num dogs and cats is (111)888-8780 and other   output of fuzzy 1
1  2   dont block cow 23 here                              output of fuzzy 2
2  3   cat two num: dog and cows here                      output of fuzzy 3

Question

What do I need to do to get my desired output?

This should work:

def fuzzy(row, word_list):
    # Read the text by column label (row[0] is the integer ID).
    tweet = row['Text']
    fuzzy_match = []
    # Tokenize once per row rather than once per word.
    token_words = nltk.word_tokenize(tweet)
    for word in word_list:
        # Score this word against every token; results are sorted best-first.
        fuzzy_fx = process.extract(word, token_words, limit=100, scorer=fuzz.ratio)
        # Keep only the top (token, score) pair for this word.
        fuzzy_match.append(fuzzy_fx[0])

    return pd.Series([fuzzy_match], index=['Fuzzy_Match'])

df_fuzz = df.join(df.apply(lambda x: fuzzy(x, word_list), axis=1))
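
As for why the original version raised the TypeError: with axis=1, row[0] is the ID value (an integer), while the tokenizer expects text. The sketch below reproduces the same message using only the standard re module, on the assumption that nltk's tokenizers apply regular expressions to their input internally. row_values is a hypothetical stand-in for one DataFrame row.

```python
import re

# Hypothetical stand-in for one row of df: [ID, Text].
row_values = [1, "This num dogs and cats is (111)888-8780 and other"]

try:
    # Passing the integer ID to a regex-based tokenizer fails.
    re.findall(r"\w+", row_values[0])
    message = ""
except TypeError as exc:
    message = str(exc)

# Passing the text column works as expected.
tokens = re.findall(r"\w+", row_values[1])

print(message)
print(tokens[:3])
```

The exact wording of the message can vary slightly between Python versions, but it begins with "expected string or bytes-like object".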

process.extract() expects a list of choices as its second argument, not a single string. You can read more about it here: Python fuzzywuzzy's process.extract(): how does it work?
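
To illustrate that call shape (one query, a list of choices) without needing fuzzywuzzy installed, here is a rough sketch that uses difflib.SequenceMatcher from the standard library as a stand-in for fuzz.ratio. ratio and extract_best are hypothetical helpers, not part of fuzzywuzzy, and the scores may differ from what fuzz.ratio reports. Tokenization here is a plain str.split rather than nltk.word_tokenize.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity score from 0 to 100 (stand-in for fuzz.ratio)."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def extract_best(query, choices):
    """Hypothetical helper: score one query against every choice, return the best pair."""
    scored = [(choice, ratio(query, choice)) for choice in choices]
    return max(scored, key=lambda pair: pair[1])


# Whitespace split used as a simple substitute for nltk.word_tokenize.
tokens = "This num dogs and cats is (111)888-8780 and other".split()
word_list = ["dog", "cat"]

# One (token, score) pair per word, mirroring fuzzy_fx[0] in the answer.
matches = [extract_best(word, tokens) for word in word_list]
print(matches)
```

The point is the argument order: the single word is the query, and the whole token list is passed as the choices.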
