https://pastecord.com/yfabaficup.sql
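I'm getting this type of error while preprocessing my data. I've searched all over Google and Stack Overflow and can't find an answer, so please help. I'm using pandas and gensim for this task. Thanks a lot!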
See if this example helps:
import pandas as pd
from gensim.utils import simple_preprocess

data = {"Text1": ["The DNA complexation and condensation properties", "Three proteins namely protective antigen PA edition", "Lecithin retinol acyltransferase LRAT catalyze"], "ID": range(3)}
df = pd.DataFrame(data)
# Tokenize each text into a list of lowercase tokens; str(x) guards against non-string values. The same call can be applied to other columns if needed.
df["Split_Text"] = df["Text1"].apply(lambda x: simple_preprocess(str(x)))