我正在用python编写选举软件,使用MySQL作为"学习编码"项目。我使用了5种不同的选举类型,能够使用模块化类进行扩展。我最近被Schulze和IRV卡住了,因为它们都要求我比较候选人在同一张选票上的得分,但为了做到这一点,我需要创建一个生成器对象,以避免内存问题,从而为每一场比赛生成投票ID列表。我使用MySQLdb.connect(cursorclass=cursors.SSCursor)完成了这项工作
(prime key)
+---------+-------+---------+-----------+---------+---------+
| cand_id | score | race_id | ballot_id | user_id | vote_id |
+---------+-------+---------+-----------+---------+---------+
| 1 | 4 | 1 | 1 | 1 | 1 |
| 2 | 2 | 1 | 1 | 1 | 2 |
| 3 | 3 | 1 | 1 | 1 | 3 |
| 4 | 1 | 1 | 1 | 1 | 4 |
+---------+-------+---------+-----------+---------+---------+
这是我的问题:尽管我的算法有效,IRV似乎足够快,但舒尔泽是另一回事。为了让Schulze工作,我必须为每个组合生成一个获胜配对结果的字典,用于每个球。这意味着,在对每个唯一的候选人成对组合进行迭代的for循环中,我也在对每个选票进行迭代和检查,以查看一个候选人在该唯一选票上的评分是否高于另一个候选人,并将其添加到适当的字典键中(以组合元组排列)。对舒尔茨来说,分数越高,这里的评分就越高。请注意,循环实际上并没有遍历反向组合,我只是适当地填充字典。在这种情况下,字典看起来是这样的:
results = { (1,2):1, (2,1):0, (1,3):1, (3,1):0, (1,4):1. (4,1):0, (2,3):0, (3,2):1, (2,4):1, (4,2):0, (3,4):1, (4,3):0 }
这并没有那么糟糕,直到我有数百、数千或数百万的用户和选票需要为每个组合对进行迭代。我相当肯定这种做事方式是愚蠢的。我的问题是:是否有一些聪明的方法来查询MySQL,使用嵌套选择或其他我不理解的东西,这使得MySQL承担了计算有多少人(即唯一的ballot_id比较)更喜欢cand_id x而不是cand_id y的重任?我觉得这可能是可能的,但我越是尝试,我就越开始认为MySQL实际上做不到这一点。不管怎样,如果有人能帮我找到一种既不慢又不愚蠢的方法,那真的会让我很开心。
附言:作为一个新手,我真的不知道我在这里要求什么,所以如果有人有一个更好的标题,我可以使用,对其他用户更有帮助,我当然会很感激。谢谢。
这是实际代码:
def genPairResults_Schulze(self):
results = {}
# Generates a tuple of each unique pairwise combination of candidates
#
for comb in combinations(self.candidates, 2):
# Init the dictionary table of comparison results for the above combination tuple and its reverse
#
revcomb = comb[::-1]
results[comb] = 0
results[revcomb] = 0
# Formats the candidates into a string as "(x,y)" for use in SELECT command
#
formattedcands = self.formatCands([x for x in comb])
# Queries MySQL to create generator object of every ballot_id in specified race with the above candidates
#
genballot_cmd = "SELECT DISTINCT ballot_id FROM votes WHERE (cand_id in %s and race_id = '%i')" % (formattedcands, self.race_id)
for ballot_id in self.genBallotID(genballot_cmd): # Iterates through the generator of ballot_ids
# Grabs the score of each candidate in the combination from this specific ballot
#
cmd1 = "SELECT score FROM votes WHERE cand_id = '%i' and ballot_id = '%i'" % (
comb[0], ballot_id)
cmd2 = "SELECT score FROM votes WHERE cand_id = '%i' and ballot_id = '%i'" % (
comb[1], ballot_id)
# Pulling inside a list of tuples to get the raw int value
#
score1 = self.queryDB(cmd1)[0][0]
score2 = self.queryDB(cmd2)[0][0]
# Compares scores and adds winner to dictionary hash count
#
if score1 > score2:
results[comb] += 1
elif score1 < score2:
results[revcomb] += 1
elif score1 == score2: # Ties
results[comb] += 1
results[revcomb] += 1
return results
def tally_Schulze(self):
# Note: high score here represents higher preference, i.e. with a score of 4 vs 3, 4 wins
#
pair_results = {}
path_results = {}
winner = {}
# This is the dictionary generated in the function above
#
pair_results = self.genPairResults_Schulze()
candidates = self.candidates
# I totally stole this algorithm from wikipedia. BECAUSE I DONT UNDERSTAND SCHULZE
#
for i in candidates:
for j in candidates:
if i != j:
if pair_results[(i,j)] > pair_results[(j,i)]:
path_results[(i,j)] = pair_results[(i,j)]
else:
path_results[(i,j)] = 0
for i in candidates:
for j in candidates:
if i != j:
for k in candidates:
if (i != k) and (j != k):
path_results[(j,k)] = max(path_results[(j,k)], min(path_results[(j,i)], path_results[(i,k)]))
for i in candidates:
winner[i] = True
for i in candidates:
for j in candidates:
if i != j:
if path_results[(j,i)] > path_results[(i,j)]:
winner[i] = False
return winner
如果genPairResults_Schulze
为每个选票(eek!)执行SQL查询,它可以/应该将所有工作推送到一个数据库查询:
select b1.cand_id c1, b2.cand_id c2, count(*) pref1
from ballot b1
join ballot b2
on b1.race_id = b2.race_id
and b1.user_id = b2.user_id
and b1.cand_id != b2.cand_id
and b1.score > b2.score
where b1.race_id = ?
group by b1.cand_id, b2.cand_id