What is the most Pythonic way to loop over a list in which some items act as headers for the items that follow?



As a result of capturing the system output of a CLI tool command...

import subprocess
list_operations_cmd = 'cloud_cli_tool --list'
all_operations_sysout = subprocess.check_output(list_operations_cmd, shell=True)

I get a string that looks like this:

Project: foo foo foo (Environment: PRODUCTION)
  ipsum dolor sit
  Excepteur sint occaecat
  POST aliquip ex ea
Project: foo foo foo (Environment: DEVELOPMENT)
  ipsum dolor sit
  Excepteur sint occaecat
  POST aliquip ex ea
Project: bar (Environment: PRODUCTION)
  velit esse cillum
  occaecat cupidatat

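I want to turn it into a data structure that looks like this, so that I can easily sanitize the arguments passed to the command interface of the Python script I am writing to expose this to system users.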

[
    {
        project: "foo foo foo",
        environment: "PRODUCTION",
        operation: "ipsum dolor sit"
    },
    {
        project: "foo foo foo",
        environment: "PRODUCTION",
        operation: "Excepteur sint occaecat"
    },
    {
        project: "foo foo foo",
        environment: "PRODUCTION",
        operation: "POST aliquip ex ea"
    },
    {
        project: "foo foo foo",
        environment: "DEVELOPMENT",
        operation: "ipsum dolor sit"
    },
    {
        project: "foo foo foo",
        environment: "DEVELOPMENT",
        operation: "Excepteur sint occaecat"
    },
    {
        project: "foo foo foo",
        environment: "DEVELOPMENT",
        operation: "POST aliquip ex ea"
    },
    {
        project: "bar",
        environment: "PRODUCTION",
        operation: "velit esse cillum"
    },
    {
        project: "bar",
        environment: "PRODUCTION",
        operation: "occaecat cupidatat"
    }
]
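
To make the sanitization goal concrete, here is a minimal sketch of how a list shaped like the one above might be used to check user-supplied arguments. The function name is_known_operation and the sample data are hypothetical, not part of the CLI tool or of my script; the sketch assumes the list holds dicts with the three keys shown.

```python
def is_known_operation(all_operations, project, environment, operation):
    """Return True only if the exact combination appears in the parsed list."""
    allowed = set(
        (entry['project'], entry['environment'], entry['operation'])
        for entry in all_operations
    )
    return (project, environment, operation) in allowed


# Hypothetical data in the shape shown above.
known = [
    {'project': 'bar', 'environment': 'PRODUCTION', 'operation': 'velit esse cillum'},
    {'project': 'bar', 'environment': 'PRODUCTION', 'operation': 'occaecat cupidatat'},
]
ok = is_known_operation(known, 'bar', 'PRODUCTION', 'velit esse cillum')
rejected = is_known_operation(known, 'bar', 'DEVELOPMENT', 'velit esse cillum')
```

Comparing whole (project, environment, operation) tuples means an operation that exists in one environment is not accepted for another.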

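As you can see, a change of project/environment can be detected by a newline followed by no indentation; a change of record can be detected by a newline followed by two spaces. The project/environment line acts as a "header" for the lines below it, until the next project/environment line is detected.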

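I know I could .splitlines() the whole thing and start looping one line at a time, caching things and checking if they... but it feels like there might be a more Pythonic way than this: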

import re

# The literal parentheses around "Environment: ..." must be escaped.
proj_env_pattern = r'^Project: (.*) \(Environment: (.*)\)$'
all_operations_list = all_operations_sysout.splitlines()
for line in all_operations_list[:20]:
    # Header lines are not indented; operation lines start with two spaces.
    is_header = not line.startswith('  ')
    if is_header:
        project = re.search(proj_env_pattern, line, re.IGNORECASE).group(1)
        print(project)
    # TODO: code...

Python 2.7.5 on Linux; I cannot install any new modules.


Update:

Thanks to @fsimonjetz for the tip. If you want to make it an answer, I can accept it and close this.

The code is now much neater than what I had:

import re
import subprocess


def get_all_operations():
    list_operations_cmd = 'cloud_cli_tool --list'
    # Escape the literal parentheses so only the project name and the
    # environment are captured as groups 1 and 2.
    proj_env_pattern = r'^(.*) \(Environment: (.*)\)$'
    all_operations_list = []
    # Every chunk after the first begins with the text that followed 'Project: '.
    all_projects = subprocess.check_output(list_operations_cmd, shell=True).split('Project: ')
    for project in all_projects[1:]:
        project_lines = project.splitlines()
        project_search = re.search(proj_env_pattern, project_lines[0])
        for line in project_lines[1:]:
            all_operations_list.append({
                'project': project_search.group(1),
                'environment': project_search.group(2),
                'operation': line.lstrip(),
            })
    return all_operations_list


print(get_all_operations())

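Thanks to @fsimonjetz for the tip. If you want to make it an answer, I can accept it, but I am posting what I implemented based on your suggestion in the comments so that people stop seeing this question as unanswered.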

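The solution was to take advantage of the fortunate fact that the "header" lines start with Project:, and to break the whole sysout string into one substring per header with .split('Project: '). For each substring, the further parsing is the usual regex, .splitlines(), and [1:] list slicing.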

import re
import subprocess


def get_all_operations():
    list_operations_cmd = 'cloud_cli_tool --list'
    # Escape the literal parentheses so only the project name and the
    # environment are captured as groups 1 and 2.
    proj_env_pattern = r'^(.*) \(Environment: (.*)\)$'
    all_operations_list = []
    # Every chunk after the first begins with the text that followed 'Project: '.
    all_projects = subprocess.check_output(list_operations_cmd, shell=True).split('Project: ')
    for project in all_projects[1:]:
        project_lines = project.splitlines()
        project_search = re.search(proj_env_pattern, project_lines[0])
        for line in project_lines[1:]:
            all_operations_list.append({
                'project': project_search.group(1),
                'environment': project_search.group(2),
                'operation': line.lstrip(),
            })
    return all_operations_list


print(get_all_operations())
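
For comparison, the line-by-line idea from the question can also be written fairly compactly by remembering the most recent header while looping. This is only a sketch under assumptions: parse_operations and HEADER_PATTERN are names made up for illustration, the function takes the captured output as a string instead of calling cloud_cli_tool, and it assumes every header line has exactly the "Project: ... (Environment: ...)" shape shown in the sample. It uses only the standard library.

```python
import re

# Hypothetical helper names; the pattern assumes the header format from the sample.
HEADER_PATTERN = re.compile(r'^Project: (.*) \(Environment: (.*)\)$')


def parse_operations(sysout):
    """Turn the CLI listing into a list of project/environment/operation dicts."""
    operations = []
    project = environment = None  # most recent header seen so far
    for line in sysout.splitlines():
        header = HEADER_PATTERN.match(line)
        if header:
            # A header line updates the context for the lines that follow.
            project, environment = header.groups()
        elif line.strip() and project is not None:
            # Any other non-blank line is an operation under the current header.
            operations.append({
                'project': project,
                'environment': environment,
                'operation': line.strip(),
            })
    return operations


sample = (
    "Project: foo foo foo (Environment: PRODUCTION)\n"
    "  ipsum dolor sit\n"
    "  POST aliquip ex ea\n"
    "Project: bar (Environment: PRODUCTION)\n"
    "  velit esse cillum\n"
)
parsed = parse_operations(sample)
```

Unlike the .split('Project: ') version, this one does not break if the text "Project: " happens to appear inside an operation line, at the cost of a little more bookkeeping.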
