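I am trying to get a list of synonyms for each word in my column names. However, when I run wordnet.synsets(), it only handles column names that consist of a single word. How can I run it on multi-word names and produce output like the desired output below? Also, is there a way to show only the first 4 results for better readability?
Code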
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import pandas as pd

df = ['Unnamed 0',
      'business id',
      'name',
      'postal code',
      ]
syns = {w: [] for w in df}
for k, v in syns.items():
    for synset in wordnet.synsets(k):
        for lemma in synset.lemmas():
            if lemma.name() not in syns:
                v.append(lemma.name())
pd.DataFrame([syns], columns=syns.keys())
Current output:
Unnamed 0 business id name postal code
[] [] [gens, figure, public_figure, epithet, call, i... []
Desired output:
Unnamed 0 business id name postal code
Unnamed[definitions], business[definitions], [gens, figure, public_figure] postal[definitions],
0[definitions] id[definitions] code[definitions]
A simpler, more usable approach:
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import nltk
import numpy as np
import pandas as pd

df = ['Unnamed 0',
      'business id',
      'name',
      'postal code',
      ]
# one column per (column name, word) pair; rows hold the unique single-word lemmas
df = pd.DataFrame(
    {tuple([k, t]): pd.Series(np.unique([l.name()
                                          for s in wordnet.synsets(t)
                                          for l in s.lemmas() if "_" not in l.name()])).to_dict()
     for k in df
     for t in nltk.word_tokenize(k)
     }).fillna("")
df.columns.set_names(["sentence", "word"], inplace=True)
df.loc[:4]  # just first 5 matches...
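To cap the number of synonyms per word, as the question asks, the per-word lookup can be separated from the truncation. The following is only a minimal sketch: `first_synonyms`, `toy_lookup` and `TOY_THESAURUS` are hypothetical names that are not part of NLTK, and the toy thesaurus stands in for WordNet so the snippet runs without the corpus installed.

```python
from itertools import islice


def first_synonyms(column_names, lookup, limit=4):
    """Map each word of each column name to at most `limit` unique synonyms."""
    result = {}
    for name in column_names:
        per_word = {}
        for word in name.split():
            # dict.fromkeys drops duplicates while keeping first-seen order
            unique = dict.fromkeys(lookup(word))
            per_word[word] = list(islice(unique, limit))
        result[name] = per_word
    return result


# Hypothetical stand-in for a WordNet lookup (assumption, not real WordNet data).
TOY_THESAURUS = {
    "postal": ["postal"],
    "code": ["code", "codification", "code", "cipher", "cypher", "encode"],
}


def toy_lookup(word):
    return TOY_THESAURUS.get(word, [])


synonyms = first_synonyms(["postal code", "name"], toy_lookup)
print(synonyms)
```

With NLTK available, `lookup` could instead be a small function that collects `lemma.name()` over `wordnet.synsets(word)`, as in the code above.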
Just change the list/dict comprehension so it meets the pandas format {"colA": [1, 2], "colB": [3, 4]}
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet
import nltk
import numpy as np
import pandas as pd

df = ['Unnamed 0',
      'business id',
      'name',
      'postal code',
      ]
# largest number of space-delimited words in any column name
mr = max([len(k.split(" ")) for k in df])
pd.DataFrame(
    # one column for each space-delimited column name
    # use an f-string to format each cell as "word:[synonyms]"
    {k: [f"{v}:{np.unique([l.name() for s in wordnet.synsets(v) for l in s.lemmas()]).tolist()}"
         # pad names that have fewer tokens so every column has the same length, as pandas requires
         for v in f"{k}{(mr - len(k.split(' '))) * ' '}".split(" ")]
     for k in df}).replace({":[]": ""})
Output
Unnamed 0 business id name postal code
0 Unnamed:['nameless', 'unidentified', 'unknown'... business:['business', 'business_concern', 'bus... name:['advert', 'appoint', 'bring_up', 'call',... postal:['postal']
1 0:['0', 'cipher', 'cypher', 'nought', 'zero'] id:['Gem_State', 'I.D.', 'ID', 'Idaho', 'id'] code:['cipher', 'code', 'codification', 'compu...
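The padding in the code above is needed because the dict-of-lists form of the pandas constructor expects every list to have the same length. As a rough sketch of that step in isolation (the helper name `pad_columns` is hypothetical, and plain Python is used so it runs without pandas or NLTK):

```python
def pad_columns(columns, fill=""):
    """Pad every list in `columns` to the length of the longest one."""
    longest = max((len(values) for values in columns.values()), default=0)
    return {
        key: values + [fill] * (longest - len(values))
        for key, values in columns.items()
    }


names = ["Unnamed 0", "business id", "name", "postal code"]
# split each column name into its words, then pad the shorter lists
tokens = {name: name.split() for name in names}
padded = pad_columns(tokens)
print(padded)
```

A dict shaped like `padded` can then be passed to `pd.DataFrame`, which is why the one-word column `name` ends up with an empty second row in the output above.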