Radix sort for strings in Python



Compared with Python's built-in sort, my radix sort function returns a list that is ordered, but incorrectly:

My radix sort: ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
Python's sort: ['a', 'aa', 'ab', 'abid', 'abs', 'asd', 'avc', 'axy']

* My radix sort does not do any padding
* It works least-significant-first (LSB)
* I need to make use of the length of each word

Here is my code.

def count_sort_letters(array, size, col, base):
    output   = [0] * size
    count    = [0] * base
    min_base = ord('a')
    for item in array:
        correct_index = min(len(item) - 1, col)
        letter = ord(item[-(correct_index + 1)]) - min_base
        count[letter] += 1
    for i in range(base - 1):
        count[i + 1] += count[i]
    for i in range(size - 1, -1, -1):
        item = array[i]
        correct_index = min(len(item) - 1, col)
        letter = ord(item[-(correct_index + 1)]) - min_base
        output[count[letter] - 1] = item
        count[letter] -= 1
    return output

def radix_sort_letters(array):
    size = len(array)
    max_col = len(max(array, key = len))
    for col in range(max_col):
        array = count_sort_letters(array, size, col, 26)
    return array

Can anyone find a way to fix this?

As I mentioned in the comments:

In your code, the lines:

correct_index = min(len(item) - 1, col)
letter = ord(item[-(correct_index + 1)]) - min_base

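always use the first letter of the word once col reaches or exceeds the word length. Shorter words are therefore re-sorted on their first letter on every such pass. For example, ['aa', 'a'] stays unchanged: on every pass of the for col loop we compare an 'a' from each word, so the order never changes.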
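To see this concretely, the short sketch below reproduces only the key-selection logic from the question and prints the letter each word contributes on each pass. The helper name original_key is made up for this illustration; it is not part of the question's code.

```python
def original_key(item, col):
    """Letter the question's code picks for `item` on pass `col` (hypothetical helper)."""
    # Same two steps as in the question: clamp the column, then index from the end.
    correct_index = min(len(item) - 1, col)
    return item[-(correct_index + 1)]

words = ['aa', 'a']
for col in range(2):
    # Both passes yield identical keys, so a stable sort never reorders the words.
    print(col, [original_key(w, col) for w in words])
```

Because the keys are equal on every pass, 'aa' stays ahead of 'a', which matches the incorrect output shown in the question.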

Corrected code

Note: I tried to keep changes to the original code to a minimum.

def count_sort_letters(array, size, col, base, max_len):
    """ Helper routine for performing a count sort based upon column col """
    output   = [0] * size
    count    = [0] * (base + 1) # One additional cell to account for the dummy letter
    min_base = ord('a') - 1 # Subtract one to allow for the dummy character
    for item in array: # Generate counts
        # Get column letter if within string, else use dummy position of 0
        letter = ord(item[col]) - min_base if col < len(item) else 0
        count[letter] += 1
    for i in range(len(count)-1):   # Accumulate counts
        count[i + 1] += count[i]
    for item in reversed(array):
        # Get index of current letter of item at index col in count array
        letter = ord(item[col]) - min_base if col < len(item) else 0
        output[count[letter] - 1] = item
        count[letter] -= 1
    return output

def radix_sort_letters(array, max_col = None):
    """ Main sorting routine """
    if not max_col:
        max_col = len(max(array, key = len)) # Length of the longest word
    for col in range(max_col-1, -1, -1): # max_col-1, max_col-2, ..., 0
        array = count_sort_letters(array, len(array), col, 26, max_col)
    return array

lst = ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
print(radix_sort_letters(lst))

Test

lst = ['aa', 'a', 'ab', 'abs', 'asd', 'avc', 'axy', 'abid']
print(radix_sort_letters(lst))
# Compare to Python sort
print(radix_sort_letters(lst)==sorted(lst))

Output

['a', 'aa', 'ab', 'abid', 'abs', 'asd', 'avc', 'axy']
True

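Explanation

Counting sort is a stable sort, meaning that items with equal keys keep the relative order they had in the input.

Let's walk through how the function works.

Let's sort: ['ac', 'xb', 'ab']

We step through the characters of each word in reverse order.

Iteration 0: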

Key is the last character of each word (i.e. index -1):
keys are ['c', 'b', 'b'] (last characters of 'ac', 'xb', and 'ab')
Performing a counting sort on these keys we get ['b', 'b', 'c']
This causes the corresponding words for these keys to be placed in
the order: ['xb', 'ab', 'ac']
Entries 'xb' and 'ab' have equal keys (value 'b') so they maintain their
order of 'xb' followed by 'ab' of the original list
(since counting sort is a stable sort)

Iteration 1:

Key is the next-to-last character (i.e. index -2):
Keys are ['x', 'a', 'a'] (corresponding to list ['xb', 'ab', 'ac'])
Counting sort produces the order ['a', 'a', 'x']
which causes the corresponding words to be placed in the order
['ab', 'ac', 'xb'] and we are done.
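The two iterations above can be reproduced with a small sketch. It swaps in Python's built-in sorted(), which is guaranteed stable, as a stand-in for the counting sort, and it assumes all words have the same length. The function name lsd_passes is made up for this illustration.

```python
def lsd_passes(words):
    """Return the list after each right-to-left pass (equal-length words only)."""
    width = len(words[0])
    snapshots = []
    for col in range(width - 1, -1, -1):
        # sorted() is stable, so ties keep their order from the previous pass,
        # which is the property the walkthrough relies on.
        words = sorted(words, key=lambda w: w[col])
        snapshots.append(words)
    return snapshots

for step, state in enumerate(lsd_passes(['ac', 'xb', 'ab'])):
    print(step, state)
# 0 ['xb', 'ab', 'ac']
# 1 ['ab', 'ac', 'xb']
```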

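Bug in the original code: it indexed each word from that word's own end (item[-(col + 1)]), so words of different lengths were compared on different character positions. We need to walk the column positions from right to left, counted from the start of each word, because the last pass must sort on the first character, the next-to-last pass on the second character, and so on.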

Strings of different lengths

The previous example was simplified by assuming that all strings have the same length. Now let's try strings of unequal length, such as:

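['ac', 'a', 'ab']

This immediately presents a problem: because the words have unequal lengths, we can't pick a letter from every word at every position.

We can fix this by padding each word with a dummy character (such as '*'):

['ac', 'a*', 'ab']

Iteration 0: the keys are the last character of each word, so: ['c', '*', 'b']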

The understanding is that the dummy character is less than all other
characters, so the sort order will be:
['*', 'b', 'c'] causing the related words to be sorted in the order
['a*', 'ab', 'ac']

Iteration 1: the keys are the next-to-last character of each word, so: ['a', 'a', 'a']

Since the keys are all equal, counting sort won't change the order so we keep
['a*', 'ab', 'ac']
Removing the dummy character from each string (if any) we end up with:
['a', 'ab', 'ac']
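The padding approach can be written out literally, as in the sketch below. It is for illustration only and is not the corrected code from this answer: the function name radix_sort_padded is made up, sorted() stands in for the counting sort, and it assumes the words are lowercase and never contain the pad character '*', which sorts before 'a'.

```python
def radix_sort_padded(words, pad='*'):
    """Sort lowercase words by explicitly padding them (illustrative sketch)."""
    if not words:
        return []
    width = max(len(w) for w in words)
    # Pad every word on the right so that all words have the same length.
    padded = [w.ljust(width, pad) for w in words]
    for col in range(width - 1, -1, -1):
        # One stable pass per column, from the last column to the first.
        padded = sorted(padded, key=lambda w: w[col])
    # Remove the dummy characters again.
    return [w.rstrip(pad) for w in padded]

print(radix_sort_padded(['ac', 'a', 'ab']))  # ['a', 'ab', 'ac']
```

Building the padded copies costs extra time and memory, which is why the corrected code treats a missing character as the dummy key instead.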

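The idea behind get_index (in the code above, the conditional expression that computes letter) is to mimic the behavior of padded strings without actually padding them, since padding is extra work. Given a column index, it checks whether the index falls in the padded or the unpadded part of the string and returns the appropriate index into the count array.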
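The text names get_index but does not show it. As a rough guess at its shape, the conditional expression in count_sort_letters could be pulled out into a helper like the one below. The signature and the min_base default are assumptions made for this sketch.

```python
def get_index(item, col, min_base=ord('a') - 1):
    """Count-array slot for item[col]; slot 0 stands in for the padding character."""
    if col < len(item):
        # Real character: 'a' -> 1, 'b' -> 2, ..., 'z' -> 26
        return ord(item[col]) - min_base
    # Past the end of the word: behave as if the word had been padded.
    return 0

print([get_index(w, 1) for w in ['ac', 'a', 'ab']])  # [3, 0, 2]
```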
