I have a DataFrame consisting of user names, say df:
import pandas as pd
data = [['Harry Potter'],['Ron weasley'],['Hermione Granger'],['Rubeus Hagrid'],['Dobby'],['Draco Malfoy']]
df = pd.DataFrame(data, columns = ['names'])
df
names
0 Harry Potter
1 Ron weasley
2 Hermione Granger
3 Rubeus Hagrid
4 Dobby
5 Draco Malfoy
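What I want: 1) split the words, 2) arrange the letters in alphabetical order, 3) make all possible words from those letters, listed alphabetically (by the first letter of each word).
So the result should look something like this: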
names alphabets words
0 Harry Potter aehoprrrtty Ate, Hat, Heart, Party, Pot, Prey, Toy
1 Ron weasley aeelnorswy Lean, New, Rose, Worse, Won
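If possible, please help me get the desired result in as few lines as possible.
Thanks!!!
One possible solution, although it is really not easy to find all possible English words with good performance. One idea is to use sets, but be aware that a set ignores how many times a letter occurs, so a matched word may use the same letter more often than the name contains it: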
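import nltk
# nltk.download('words')  # needed once if the words corpus is not installed yet
# map each lowercase word (3+ letters) to the set of its letters
english_vocab = {w.lower(): set(w.lower())
                 for w in nltk.corpus.words.words() if len(w) > 2}
# sorted lowercase letters of the name, spaces removed
f1 = lambda x: ''.join(sorted(y.lower() for y in x if y != ' '))
df['alphabets'] = df['names'].apply(f1)
# keep the words whose letter set is a subset of the name's letter set
f2 = lambda x: sorted([k for k, v in english_vocab.items() if v <= x])
df['new'] = df['alphabets'].apply(set).apply(f2)
print (df)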
names alphabets
0 Harry Potter aehoprrrtty
1 Ron weasley aeelnorswy
2 Hermione Granger aeeegghimnnorrr
3 Rubeus Hagrid abdeghirrsuu
4 Dobby bbdoy
5 Draco Malfoy aacdflmoory
new
0 [aer, aerate, aerator, aero, aeropathy, aerope...
1 [aal, aaron, aeaean, aenean, aeon, aer, aero, ...
2 [aam, aani, aaron, aeaean, aegean, aegerian, a...
3 [aaru, aba, ababdeh, ababua, abaiser, abaissed...
4 [bob, bobby, bobo, bod, bodo, body, boo, boob,...
5 [aal, aam, acalycal, acamar, acara, acarol, ac...
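If the letter counts should be respected (so that, for example, bobby is not returned for Dobby, which has only two b's), one option is to compare multisets with collections.Counter instead of sets. The following is only a minimal sketch of that idea, not a tuned solution: the short vocab list is a hypothetical stand-in for nltk.corpus.words.words(), and the helper names letters and words_from are made up for illustration.

```python
from collections import Counter

# Hypothetical tiny vocabulary standing in for nltk.corpus.words.words()
vocab = ['heart', 'party', 'pot', 'toy', 'aal', 'lean', 'new', 'rose',
         'bob', 'bobby', 'body']

# Count the letters of every word once, up front
vocab_counts = {w: Counter(w) for w in vocab}


def letters(name):
    """Return the sorted lowercase letters of a name, spaces removed."""
    return ''.join(sorted(c.lower() for c in name if c != ' '))


def words_from(alphabets):
    """Return the sorted words that need no letter more often than available."""
    available = Counter(alphabets)
    return sorted(
        w for w, need in vocab_counts.items()
        if all(available[ch] >= n for ch, n in need.items())
    )


for name in ['Harry Potter', 'Ron weasley', 'Dobby']:
    print(name, letters(name), words_from(letters(name)))
```

With the DataFrame from the question, the same helpers could be applied as df['alphabets'] = df['names'].apply(letters) followed by df['words'] = df['alphabets'].apply(words_from). In this sketch Dobby (bbdoy) yields bob and body but not bobby, because bobby needs three b's. The check still scans the whole vocabulary for every row, so it is no faster than the set version.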