Split user names, sort the letters alphabetically, and build all possible words from them in pandas



I have a DataFrame of user names, for example df:

import pandas as pd
data = [['Harry Potter'],['Ron weasley'],['Hermione Granger'],['Rubeus Hagrid'],['Dobby'],['Draco Malfoy']]
df = pd.DataFrame(data, columns = ['names'])
df
              names
0      Harry Potter
1       Ron weasley
2  Hermione Granger
3     Rubeus Hagrid
4             Dobby
5      Draco Malfoy

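What I want: 1) split each name into its letters, 2) sort the letters alphabetically, 3) make all possible words from those letters, with the words themselves listed alphabetically (by first letter).

So the result should look something like this: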

   names         alphabets    words
0  Harry Potter  aehoprrrtty  Ate, Hat, Heart, Party, Pot, Prey, Toy
1  Ron weasley   aeelnorswy   Lean, New, Rose, Won, Worse

If possible, please help me get the desired result in as few lines as possible.

Thanks!!!

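Here is one possible solution, although it is really not easy to find all possible English words with good performance. One idea is to use sets, but note that sets ignore letter counts, so a matched word may use a letter more often than it appears in the name: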

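import nltk
# Requires the NLTK "words" corpus; run nltk.download('words') once beforehand.
# Map each lowercase word of 3+ letters to the set of its letters.
english_vocab = {w.lower(): set(w.lower())
                 for w in nltk.corpus.words.words() if len(w) > 2}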

f1 = lambda x: ''.join(sorted(y.lower() for y in x if y != ' '))
df['alphabets'] = df['names'].apply(f1)

# Keep every word whose letter set is a subset of the name's letter set.
f2 = lambda x: sorted(k for k, v in english_vocab.items() if v <= x)
df['new'] = df['alphabets'].apply(set).apply(f2)
print(df)
names        alphabets  
0      Harry Potter      aehoprrrtty   
1       Ron weasley       aeelnorswy   
2  Hermione Granger  aeeegghimnnorrr   
3     Rubeus Hagrid     abdeghirrsuu   
4             Dobby            bbdoy   
5      Draco Malfoy      aacdflmoory   
new  
0  [aer, aerate, aerator, aero, aeropathy, aerope...  
1  [aal, aaron, aeaean, aenean, aeon, aer, aero, ...  
2  [aam, aani, aaron, aeaean, aegean, aegerian, a...  
3  [aaru, aba, ababdeh, ababua, abaiser, abaissed...  
4  [bob, bobby, bobo, bod, bodo, body, boo, boob,...  
5  [aal, aam, acalycal, acamar, acara, acarol, ac...  
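
The set-based check above ignores how often each letter occurs, which is why "bobby" shows up for "Dobby" even though the name has only two b's. If letter counts should be respected, one option is to compare collections.Counter objects instead of sets. The following is only a minimal sketch under stated assumptions: the names sorted_letters and buildable_words and the tiny vocab list are hypothetical and not part of the original answer, and vocab stands in for the NLTK word list so the example runs without any download.

```python
from collections import Counter

# Hypothetical stand-in vocabulary so the sketch runs without NLTK;
# in practice this could be the lowercase words from nltk.corpus.words.words().
vocab = ["ate", "bobby", "body", "error", "hat", "heart",
         "party", "pot", "prey", "tattoo", "toy"]


def sorted_letters(name):
    """Return the letters of name, lowercased and sorted alphabetically."""
    return "".join(sorted(ch.lower() for ch in name if ch.isalpha()))


def buildable_words(letters, vocabulary):
    """Return the sorted words that can be spelled without reusing a letter."""
    available = Counter(letters)
    # Counter subtraction keeps only positive counts, so an empty result
    # means every letter of the word is available often enough.
    return sorted(w for w in vocabulary if not (Counter(w) - available))


for name in ["Harry Potter", "Dobby"]:
    letters = sorted_letters(name)
    print(name, letters, buildable_words(letters, vocab))
```

On the DataFrame from the question, the same helpers could be plugged in with df['names'].apply(sorted_letters) and df['alphabets'].apply(lambda s: buildable_words(s, vocab)). In this sketch "tattoo" is rejected for "Harry Potter" (only two t's) and "bobby" is rejected for "Dobby" (only two b's), whereas the set comparison would accept both. Building a Counter per word is slower than a set comparison, so for the full NLTK vocabulary it may be worth precomputing the Counters once.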
