Why does converting words from plural to singular in a for loop take so long (Python 3)?



This is my code for reading text from a CSV file and converting all the words in one of its columns from plural to singular form:

import pandas as pd
from textblob import TextBlob as tb
data = pd.read_csv(r'pathtodata.csv')
for i in range(len(data)):
    blob = tb(data['word'][i])
    singular = blob.words.singularize()  # This makes singular a list
    data['word'][i] = ''.join(singular)  # Converting the list back to a string

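But this code has been running for several minutes now (and might keep running for hours if I don't stop it?!). Why? When I check a few words individually, the conversion happens instantly; it takes no time at all. The file has only 1060 rows (words to convert).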

Edit: It finished running in about 10-12 minutes.

Here is some sample data:

Input:

word
development
investment
funds
slow
company
commit
pay
claim
finances
customers
claimed
insurance
comment
rapid
bureaucratic
affairs
reports
policyholders
detailed

Output:

word
development
investment
fund
slow
company
commit
pay
claim
finance
customer
claimed
insurance
comment
rapid
bureaucratic
affair
report
policyholder
detailed

How about this?

In [1]: import pandas as pd
In [2]: from textblob import Word
In [3]: s = pd.read_csv('text', squeeze=True, memory_map=True)
In [4]: type(s)
Out[4]: pandas.core.series.Series
In [5]: s = s.apply(lambda w: Word(w).singularize())
In [6]: s
Out[6]:
0      development
1       investment
2             fund
3             slow
4          company
5           commit
6              pay
7            claim
8          finance
9         customer
10         claimed
11       insurance
12         comment
13           rapid
14    bureaucratic
15          affair
16          report
17    policyholder
18        detailed
Name: word, dtype: object

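I use squeeze=True here so that read_csv returns a Series instead of a DataFrame, because the word file has only one column. In addition, memory_map=True can be used if the word file is large.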

Can you test the performance with your data?
