Rearranging the structure of string output in Python

I am cleaning up several PDF files. I combined two dictionaries to get three outputs: the key (file name), the word index, and the word count.

for key, value in countDict.items():
    for word, count in value.items():
        for token, index in vocabDict.items():
            if word == token:
                print(key, index, count)

The three outputs are printed as strings:

PP3188 2498 1
PP3188 1834 10
PP3188 2063 1
PP3278 447 1
PP3278 1458 1
PP3160 2433 5
PP3160 1889 2

Is there a way to group this output so that it looks like this:

PP3188, 2498 : 1, 1834 : 10, 2063 : 1
PP3278, 447 : 1, 1458 : 1
PP3160, 2433 : 5, 1889 : 2

Any idea how to achieve this structure, or a similar output? Thanks.

Sure. The structure you want is probably a defaultdict of dictionaries. Let me show you:

{
    'PP3188': {
        2498: 1,
        1834: 10,
        2063: 1
    },
    'PP3278': {
        447: 1,
        1458: 1
    },
    'PP3160': {
        2433: 5,
        1889: 2
    }
}

Here is some sample code:

from collections import defaultdict
# ... some code ...
data = defaultdict(dict)  # missing keys start out as an empty dict
for key, value in countDict.items():
    for word, count in value.items():
        for token, index in vocabDict.items():
            if word == token:
                data[key][index] = count

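The difference between my answer and @Epion's is that his gives you a dictionary whose keys are the PPxxxx names and whose values are lists of tuples, while mine gives you a dictionary whose values are themselves dictionaries.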
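
To get from the nested dictionary to the exact lines asked for in the question, something like the following might work. This is only a sketch: the words in `countDict` and `vocabDict` are made-up placeholders chosen so that the indices and counts match the question's output, and `format_row` is a hypothetical helper name, not something from the original code.

```python
from collections import defaultdict

# Hypothetical sample data: the words are placeholders, chosen only so the
# indices and counts match the output shown in the question.
countDict = {
    "PP3188": {"alpha": 1, "beta": 10, "gamma": 1},
    "PP3278": {"delta": 1, "epsilon": 1},
    "PP3160": {"zeta": 5, "eta": 2},
}
vocabDict = {
    "alpha": 2498, "beta": 1834, "gamma": 2063,
    "delta": 447, "epsilon": 1458, "zeta": 2433, "eta": 1889,
}

# Same grouping loop as in the answer above.
data = defaultdict(dict)
for key, value in countDict.items():
    for word, count in value.items():
        for token, index in vocabDict.items():
            if word == token:
                data[key][index] = count


def format_row(key, index_counts):
    """Join one file's {index: count} mapping into 'KEY, i : c, i : c'."""
    pairs = [f"{index} : {count}" for index, count in index_counts.items()]
    return ", ".join([key] + pairs)


lines = [format_row(key, index_counts) for key, index_counts in data.items()]
for line in lines:
    print(line)
```

Since dictionaries keep insertion order in Python 3.7 and later, the rows and the pairs within each row come out in the same order in which the loop encountered them.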

Well, you could use a defaultdict(list) structure that has key as its key and a list of (index, count) tuples as its value.

from collections import defaultdict
our_dict = defaultdict(list)

Then, instead of printing, you append:

for key, value in countDict.items():
    for word, count in value.items():
        for token, index in vocabDict.items():
            if word == token:
                our_dict[key].append((index, count))

With a structure like this, you can print everything afterwards:

for key, values_list in our_dict.items():
    for (index, count) in values_list:
        print(key, index, count)
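
The print loop above still emits one line per (index, count) pair. To get one line per file, as the question asks, the tuples can be joined first. A rough sketch, assuming made-up placeholder words in the sample dictionaries (only the indices and counts come from the question); `grouped_lines` is a name introduced here for illustration:

```python
from collections import defaultdict

# Hypothetical sample data (placeholder words, two files only).
countDict = {
    "PP3278": {"delta": 1, "epsilon": 1},
    "PP3160": {"zeta": 5, "eta": 2},
}
vocabDict = {"delta": 447, "epsilon": 1458, "zeta": 2433, "eta": 1889}

# Collect (index, count) tuples per file, as in the answer above.
our_dict = defaultdict(list)
for key, value in countDict.items():
    for word, count in value.items():
        for token, index in vocabDict.items():
            if word == token:
                our_dict[key].append((index, count))

# Join each file's tuples into a single 'KEY, index : count, ...' line.
grouped_lines = []
for key, values_list in our_dict.items():
    pairs = ", ".join(f"{index} : {count}" for index, count in values_list)
    grouped_lines.append(f"{key}, {pairs}")

print("\n".join(grouped_lines))
```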

With minimal changes to your code, it can be done like this:

for key, value in countDict.items():
    entries = [key]  # each output line starts with the file name
    for word, count in value.items():
        for token, index in vocabDict.items():
            if word == token:
                entries.append(str(index) + " : " + str(count))
                # print(key, index, count)  # original per-match print, no longer needed
    print(", ".join(entries))
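
As a side note, the innermost loop only checks whether word is a key of vocabDict, so a direct dictionary lookup should give the same result without scanning the whole vocabulary for every word. A possible sketch, assuming vocabDict maps each word to its index as in the question; the function name `grouped_lines` and the sample data are hypothetical:

```python
def grouped_lines(count_dict, vocab_dict):
    """Build one 'KEY, index : count, ...' line per file via direct lookups."""
    lines = []
    for key, word_counts in count_dict.items():
        entries = [key]
        for word, count in word_counts.items():
            index = vocab_dict.get(word)  # None if the word has no index
            if index is not None:
                entries.append(f"{index} : {count}")
        lines.append(", ".join(entries))
    return lines


# Hypothetical sample data; "unknown" is deliberately missing from the vocabulary.
sample_counts = {"PP3160": {"zeta": 5, "eta": 2, "unknown": 3}}
sample_vocab = {"zeta": 2433, "eta": 1889}
result = grouped_lines(sample_counts, sample_vocab)
print(result[0])
```

Words that are not in the vocabulary are skipped, which matches what the nested loops do when no token equals the word.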
