MySQL/Python Shulze Elections:为共享不同ballot_id的单独字段快速生成比较计数哈希表

我正在用python编写选举软件，使用MySQL作为"学习编码"项目。我使用了5种不同的选举类型，能够使用模块化类进行扩展。我最近被Schulze和IRV卡住了，因为它们都要求我比较候选人在同一张选票上的得分，但为了做到这一点，我需要创建一个生成器对象，以避免内存问题，从而为每一场比赛生成投票ID列表。我使用MySQLdb.connect（cursorclass=cursors.SSCursor）完成了这项工作

                                                   (prime key)
+---------+-------+---------+-----------+---------+---------+
| cand_id | score | race_id | ballot_id | user_id | vote_id |
+---------+-------+---------+-----------+---------+---------+
|       1 |     4 |       1 |         1 |       1 |       1 |
|       2 |     2 |       1 |         1 |       1 |       2 |
|       3 |     3 |       1 |         1 |       1 |       3 |
|       4 |     1 |       1 |         1 |       1 |       4 |
+---------+-------+---------+-----------+---------+---------+

这是我的问题：尽管我的算法有效，IRV似乎足够快，但舒尔泽是另一回事。为了让Schulze工作，我必须为每个组合生成一个获胜配对结果的字典，用于每个球。这意味着，在对每个唯一的候选人成对组合进行迭代的for循环中，我也在对每个选票进行迭代和检查，以查看一个候选人在该唯一选票上的评分是否高于另一个候选人，并将其添加到适当的字典键中（以组合元组排列）。对舒尔茨来说，分数越高，这里的评分就越高。请注意，循环实际上并没有遍历反向组合，我只是适当地填充字典。在这种情况下，字典看起来是这样的：

results = { (1,2):1, (2,1):0, (1,3):1, (3,1):0, (1,4):1. (4,1):0, (2,3):0, (3,2):1, (2,4):1, (4,2):0, (3,4):1, (4,3):0 }

这并没有那么糟糕，直到我有数百、数千或数百万的用户和选票需要为每个组合对进行迭代。我相当肯定这种做事方式是愚蠢的。我的问题是：是否有一些聪明的方法来查询MySQL，使用嵌套选择或其他我不理解的东西，这使得MySQL承担了计算有多少人（即唯一的ballot_id比较）更喜欢cand_id x而不是cand_id y的重任？我觉得这可能是可能的，但我越是尝试，我就越开始认为MySQL实际上做不到这一点。不管怎样，如果有人能帮我找到一种既不慢又不愚蠢的方法，那真的会让我很开心。

附言：作为一个新手，我真的不知道我在这里要求什么，所以如果有人有一个更好的标题，我可以使用，对其他用户更有帮助，我当然会很感激。谢谢。

这是实际代码：

def genPairResults_Schulze(self):
    results = {}
    # Generates a tuple of each unique pairwise combination of candidates
    #
    for comb in combinations(self.candidates, 2):
        # Init the dictionary table of comparison results for the above combination tuple and its reverse
        #
        revcomb = comb[::-1]
        results[comb] = 0
        results[revcomb] = 0
        # Formats the candidates into a string as "(x,y)" for use in SELECT command
        #
        formattedcands = self.formatCands([x for x in comb])
        # Queries MySQL to create generator object of every ballot_id in specified race with the above candidates
        #
        genballot_cmd = "SELECT DISTINCT ballot_id FROM votes WHERE (cand_id in %s and race_id = '%i')" % (formattedcands, self.race_id)
        for ballot_id in self.genBallotID(genballot_cmd): # Iterates through the generator of ballot_ids
        # Grabs the score of each candidate in the combination from this specific ballot
        #
            cmd1 = "SELECT score FROM votes WHERE cand_id = '%i' and ballot_id = '%i'" % (
                comb[0], ballot_id)
            cmd2 = "SELECT score FROM votes WHERE cand_id = '%i' and ballot_id = '%i'" % (
                comb[1], ballot_id)
            # Pulling inside a list of tuples to get the raw int value
            #
            score1 = self.queryDB(cmd1)[0][0]
            score2 = self.queryDB(cmd2)[0][0]
            # Compares scores and adds winner to dictionary hash count
            #
            if score1 > score2:
                results[comb] += 1
            elif score1 < score2:
                results[revcomb] += 1
            elif score1 == score2: # Ties
                results[comb] += 1
                results[revcomb] += 1
    return results

def tally_Schulze(self):
    # Note: high score here represents higher preference, i.e. with a score of 4 vs 3, 4 wins
    #
    pair_results = {}
    path_results = {}
    winner = {}
    # This is the dictionary generated in the function above
    #
    pair_results = self.genPairResults_Schulze()
    candidates = self.candidates
    # I totally stole this algorithm from wikipedia.  BECAUSE I DONT UNDERSTAND SCHULZE
    #
    for i in candidates:
        for j in candidates:
            if i != j:
                if pair_results[(i,j)] > pair_results[(j,i)]:
                    path_results[(i,j)] = pair_results[(i,j)]
                else:
                    path_results[(i,j)] = 0
    for i in candidates:
        for j in candidates:
            if i != j:
                for k in candidates:
                    if (i != k) and (j != k):
                        path_results[(j,k)] = max(path_results[(j,k)], min(path_results[(j,i)], path_results[(i,k)]))
    for i in candidates:
        winner[i] = True
    for i in candidates:
        for j in candidates:
            if i != j:
                if path_results[(j,i)] > path_results[(i,j)]:
                    winner[i] = False
    return winner

如果genPairResults_Schulze为每个选票（eek！）执行SQL查询，它可以/应该将所有工作推送到一个数据库查询：

select b1.cand_id c1, b2.cand_id c2, count(*) pref1
from ballot b1
join ballot b2
  on b1.race_id = b2.race_id
 and b1.user_id = b2.user_id
 and b1.cand_id != b2.cand_id
 and b1.score > b2.score
where b1.race_id = ?
group by b1.cand_id, b2.cand_id

相关内容

最新更新

热门标签：