
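I wrote a compression program and tested it on a 10 KB text file, which took at least 3 minutes. When I then ran it on a 1 MB file, which is the assessment my teacher assigned, it took more than half an hour. Compared with my classmates' programs, mine is unusually slow. It could be my computer or my code, but I can't tell which. Does anyone know any tips or shortcuts to speed the code up? My compression code is below; if there is a faster way to do the loops, etc., please post an answer (: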


import re #used to enable functions(loops, etc.) to find patterns in text file
import os #used for anything referring to directories(files)
from collections import Counter #used to keep track on how many times values are added
size1 = os.path.getsize('file.txt') #find the size(in bytes) of your file,    INCLUDING SPACES
print('The size of your file is ', size1,)
words = re.findall(r'\w+', open('file.txt').read()) #splits the text into a list of words
wordcounts = Counter(words) #turns all words into array, even capitals 
common100 = [x for x, it in Counter(words).most_common(100)] #identifies the 100 most common words
keyword = []
kcount = []
z = dict(wordcounts)
for key, value in z.items():
    keyword.append(key) #adds each keyword to the array called keywords
characters =['$','#','@','!','%','^','&','*','(',')','~','-','/','{','[', ']', '+','=','}','|', '?','cb',
         'TSS','VTT','WVV','XWW','YXX','ZYY','BCB','CDC','DFD','FGF','GHG','JKJ','KMK','MNM','NPN','PQP',] #characters which I can use
symbols_words = []
char = 0
for i in common100:
    symbols_words.append(characters[char]) #pairs each common word with a symbol (characters needs at least 100 entries)
    char = char + 1
print("Compression has now started")
f = 0
g = 0
no = 0
while no < 100:
    for i in common100:
        for w in words:
            if i == w and len(i)>1: #if the values in common100 are ACTUALLY in words
                place = words.index(i)#find exactly where the most common words are in the text
                symbols = symbols_words[common100.index(i)] #assigns one character with one common word
                words[place] = symbols # replaces the word with the symbol
                g = g + 1
    no = no + 1

string = words
stringMade = ' '.join(map(str, string))#makes the list into a string so you can put it into a text file
with open("compression.txt", "w") as file: #closing the file flushes it to disk before its size is measured
    file.write(stringMade) #writes everything in the variable 'words' into the new file
size2 = os.path.getsize('compression.txt')
no1 = int(size1)
no2 = int(size2)
print('Compression has finished.')
print('Your original file size has been compressed by', 100 - ((100/no1) * no2), 'percent.',
      'The size of your file now is', size2)


word_substitutes = dict(zip(common100, characters))



# Iterate over all the words
# Use enumerate because we're going to modify the word in-place in the words list
for word_idx, word in enumerate(words):
    # If the current word is in the `word_substitutes` dict, then we know its in the
    # 'common' words, and can be replaced by the symbol
    if word in word_substitutes:
        # Replaces the word in-place
        replacement_symbol = word_substitutes[word]
        words[word_idx] = replacement_symbol
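
For reference, the dictionary approach above can be wrapped into a self-contained function. This is only a sketch: the name `compress_words`, its parameters, and the sample sentence are made up for illustration, and it differs slightly from the question's code in that one-letter words are filtered out before counting rather than skipped during replacement.

```python
import re
from collections import Counter

def compress_words(text, symbols, top_n=100):
    """Replace the most common multi-letter words in `text` with symbols."""
    words = re.findall(r'\w+', text)
    # Only words longer than one character are worth replacing.
    counts = Counter(w for w in words if len(w) > 1)
    common = [w for w, _ in counts.most_common(top_n)]
    # zip stops at the shorter input, so we never run out of symbols.
    substitutes = dict(zip(common, symbols))
    # Single pass over the words; each dict lookup is O(1) on average.
    compressed = [substitutes.get(w, w) for w in words]
    return ' '.join(compressed), substitutes

# Hypothetical sample input, just to show the call.
sample = "the cat and the dog and the bird"
compressed, table = compress_words(sample, ['$', '#', '@'], top_n=2)
print(compressed)
print(table)
```

The whole replacement step is one pass over `words`, so the running time grows linearly with the size of the file instead of being multiplied by the number of common words and by the outer `while` loop.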



for i in common100:
    for w in words:
        if i == w and len(i)>1:


common_words = set(common100)
for w in words:
    if w in common_words:
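
To see the difference between the two loop shapes above, a rough benchmark can help. The sketch below uses made-up data (the contents of `words` and `common100` here are placeholders, not the real file) and only counts matches instead of replacing anything; the absolute timings will vary from machine to machine.

```python
import timeit

# Made-up data: 10,000 words and 100 "common" words, two of which occur in the text.
words = ["alpha", "beta", "gamma", "delta"] * 2500
common100 = ["w%d" % i for i in range(98)] + ["alpha", "gamma"]

def count_nested():
    # Compares every common word with every word:
    # len(common100) * len(words) comparisons.
    hits = 0
    for i in common100:
        for w in words:
            if i == w and len(i) > 1:
                hits += 1
    return hits

def count_with_set():
    # Builds the set once, then does one O(1) membership test per word.
    common_words = set(w for w in common100 if len(w) > 1)
    hits = 0
    for w in words:
        if w in common_words:
            hits += 1
    return hits

# Both versions must find the same matches.
assert count_nested() == count_with_set()
print("nested loops:", timeit.timeit(count_nested, number=3))
print("set lookup:  ", timeit.timeit(count_with_set, number=3))
```

The set version does about `len(words)` membership tests, while the nested version does `len(common100) * len(words)` comparisons, so the gap widens as the file grows.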


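  • Measure how much time each "part" of your program takes. You can use a profiler (for example, the one in the standard library), or simply add a few times.append(time.time()) calls to your code and compute the differences. Then you know which part of the code is slow.
  • See whether you can improve the algorithm in the slow part. Another answer here shows one way to speed things up. The while no < 100 loop looks suspicious and can probably be improved. This step requires understanding the algorithm you are using. Take care to choose the best data structure for your use case.
  • If you can't use a better algorithm (because you are already computing things the best way), you need to speed up the computation itself. numpy helps with numerical work, with cython you can essentially compile Python code to C, and numba compiles using LLVM.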
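
The first point can be sketched as follows, using both manual timestamps and the standard library's `cProfile`. The functions `slow_part` and `fast_part` are hypothetical stand-ins for sections of your own program, and `time.perf_counter()` is used here instead of `time.time()` because it is intended for measuring short intervals.

```python
import cProfile
import io
import pstats
import time

def slow_part(n=200_000):
    # Hypothetical stand-in for an expensive step of the program.
    return sum(i * i for i in range(n))

def fast_part():
    # Hypothetical stand-in for a cheap step.
    return sum(range(1000))

# Option 1: record a timestamp between the parts and compute the differences.
times = [time.perf_counter()]
slow_part()
times.append(time.perf_counter())
fast_part()
times.append(time.perf_counter())
durations = [b - a for a, b in zip(times, times[1:])]
print("slow_part: %.4f s, fast_part: %.4f s" % tuple(durations))

# Option 2: let cProfile collect per-function statistics.
profiler = cProfile.Profile()
profiler.enable()
slow_part()
fast_part()
profiler.disable()
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

For a whole script, running `python -m cProfile -s cumulative your_script.py` (with your own file name) gives a similar report without changing the code.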
