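I am trying to use pyrouge to compute the similarity between an automated summary and several gold-standard summaries. ROUGE works fine while it processes the summaries, but when it writes the results it complains "tuple index out of range". Does anyone know what causes this and how I can fix it?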
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Set ROUGE home directory to D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Writing summaries.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing summaries. Saving system files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system and model files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing files in D:\ComputerScience\Research\summary\Grendel\automated.
2017-09-13 23:54:57,524 [MainThread ] [INFO ] Processing automated.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\system.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing files in D:\ComputerScience\Research\summary\Grendel\manual.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing BookRags.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing GradeSaver.txt.
2017-09-13 23:54:57,539 [MainThread ] [INFO ] Processing GradeSummary.txt.
2017-09-13 23:54:57,557 [MainThread ] [INFO ] Processing Wikipedia.txt.
2017-09-13 23:54:57,562 [MainThread ] [INFO ] Saved processed files to C:\Users\zhuan\AppData\Local\Temp\tmppm193twp\model.
Traceback (most recent call last):
  File "<ipython-input-8-bc227b272111>", line 1, in <module>
    runfile('D:/ComputerScience/Research/automate_summary.py', wdir='D:/ComputerScience/Research')
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 707, in runfile
    execfile(filename, namespace)
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/ComputerScience/Research/automate_summary.py", line 53, in <module>
    output = r.convert_and_evaluate()
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 361, in convert_and_evaluate
    rouge_output = self.evaluate(system_id, rouge_args)
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 331, in evaluate
    self.write_config(system_id=system_id)
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 315, in write_config
    self._config_file, system_id)
  File "C:\Users\zhuan\Anaconda3\lib\site-packages\pyrouge\Rouge155.py", line 264, in write_config_static
    system_filename_pattern = re.compile(system_filename_pattern)
  File "C:\Users\zhuan\Anaconda3\lib\re.py", line 233, in compile
    return _compile(pattern, flags)
  File "C:\Users\zhuan\Anaconda3\lib\re.py", line 301, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\zhuan\Anaconda3\lib\sre_compile.py", line 562, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 855, in parse
    p = _parse_sub(source, pattern, flags & SRE_FLAG_VERBOSE, 0)
  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 416, in _parse_sub
    not nested and not items))
  File "C:\Users\zhuan\Anaconda3\lib\sre_parse.py", line 616, in _parse
    source.tell() - here + len(this))
error: nothing to repeat
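The gold standards are BookRags.txt, GradeSaver.txt, GradeSummary.txt, and Wikipedia.txt. The summary that needs to be compared against them is automated.txt.
Shouldn't *.txt or [a-z0-9A-Z]+.txt work as the filename pattern? The former gives me a "nothing to repeat" error, and the latter a "tuple index out of range" error.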
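from pyrouge import Rouge155

# Raw strings keep the Windows backslashes from being read as escape sequences.
r = Rouge155(r"D:\ComputerScience\Research\ROUGE-1.5.5\ROUGE-1.5.5")
r.system_dir = r'D:\ComputerScience\Research\summary\Grendel\automated'
r.model_dir = r'D:\ComputerScience\Research\summary\Grendel\manual'
# Filename patterns as tried in the question (no capturing group, no #ID#).
r.system_filename_pattern = '[a-z0-9A-Z]+.txt'
r.model_filename_pattern = '[a-z0-9A-Z]+.txt'
output = r.convert_and_evaluate()
print(output)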
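I am setting the two directories manually. The ROUGE package seems to be able to process the .txt files in them.
I ran into the same problem with the pyrouge package. It happens because the source code tries to match the file names against the pattern we supply, and when that fails to yield an ID it ends up with an empty tuple. If you want to know more, have a look at the Rouge155.py file, and in particular at functions such as __get_model_filenames_for_id().
I solved it by following the exact file-naming instructions given on the official page: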
r.system_filename_pattern = 'some_name.(\d+).txt'
r.model_filename_pattern = 'some_name.[A-Z].#ID#.txt'
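So, my suggestions are:
- Create two separate directories, one for the system summaries (system generated) and one for the model summaries (human generated / gold standard)
- Provide the exact file paths that lead to these directories
- If you want to compare one system summary (e.g. SystemSummary.1.txt) against a set of model summaries (e.g. ModelSummary.A.1.txt, ModelSummary.B.1.txt, ModelSummary.C.1.txt), then provide the following patterns: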
r.system_filename_pattern = 'SystemSummary.(\d+).txt'
r.model_filename_pattern = 'ModelSummary.[A-Z].#ID#.txt'
You can extend this scheme to however many summaries you want to evaluate.
Hope this helps! Good luck!
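To make the naming scheme concrete, here is a minimal sketch of how a captured ID can tie one system summary to its model summaries. It uses only Python's re module and is an approximation for illustration, not pyrouge's actual code: the helper pair_summaries and the file lists are hypothetical, and the dots are escaped here so that they match literally.

```python
import re

# Patterns modelled on the answer above, with the dots escaped.
SYSTEM_PATTERN = re.compile(r"SystemSummary\.(\d+)\.txt")
MODEL_PATTERN_TEMPLATE = r"ModelSummary\.[A-Z]\.#ID#\.txt"

# Hypothetical directory listings, for illustration only.
system_files = ["SystemSummary.1.txt"]
model_files = [
    "ModelSummary.A.1.txt",
    "ModelSummary.B.1.txt",
    "ModelSummary.C.1.txt",
    "ModelSummary.A.2.txt",  # belongs to a different ID
]


def pair_summaries(system_files, model_files):
    """Map each system summary to the model summaries sharing its ID."""
    pairs = {}
    for name in system_files:
        match = SYSTEM_PATTERN.fullmatch(name)
        if match is None:
            continue  # file name does not follow the scheme
        summary_id = match.group(1)  # the part captured by (\d+)
        # Substitute the captured ID for the #ID# placeholder.
        model_re = re.compile(
            MODEL_PATTERN_TEMPLATE.replace("#ID#", re.escape(summary_id))
        )
        pairs[name] = [m for m in model_files if model_re.fullmatch(m)]
    return pairs


pairs = pair_summaries(system_files, model_files)
print(pairs)
```

The point of the sketch is that the system pattern needs a capturing group to produce an ID, and the model pattern needs the #ID# placeholder to receive it. Files named like BookRags.txt or automated.txt carry no such ID, so they would have to be renamed to fit.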
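The problem is that the rouge library never accounts for the case where the regex yields no matching subgroup. The line id = match.groups(0)[0] in the rouge source code is the problematic one. If you look this up in the documentation, it says that the groups function will "Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern...". Because no subgroups were found (the pattern in the question contains no capturing group), an empty tuple is returned, and the code then tries to grab the first item from that empty tuple, which causes the error.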
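Both error messages from the question can be reproduced with Python's re module alone, without pyrouge. The following is a small sketch under that assumption; the variable names are mine, and the file name is the one from the question.

```python
import re

filename = "automated.txt"

# Pattern from the question: it matches the file name but has no capturing group.
match = re.match(r"[a-z0-9A-Z]+.txt", filename)
no_group = match.groups(0)  # empty tuple, since there are no subgroups

index_error = None
try:
    no_group[0]  # same access as match.groups(0)[0]
except IndexError as exc:
    index_error = str(exc)

# Adding a capturing group gives groups() something to return.
match = re.match(r"([a-z0-9A-Z]+).txt", filename)
with_group = match.groups(0)

# "*.txt" is a shell glob, not a regular expression: "*" has nothing to repeat.
glob_error = None
try:
    re.compile("*.txt")
except re.error as exc:
    glob_error = str(exc)

print(no_group, index_error)
print(with_group)
print(glob_error)
```

Adding a group only avoids the IndexError in this isolated snippet. For pyrouge the captured value also has to be a usable ID that the model pattern picks up through #ID#.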