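IFC is a variant of the STEP file format used for construction projects. An IFC file contains information about the building being constructed. The file is text based and easy to read. I am trying to parse this information into a Python dictionary. The general format of each line looks like the following: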
#2334=IFCMATERIALLAYERSETUSAGE(#2333,.AXIS2.,.POSITIVE.,-180.);
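Ideally this should be parsed into #2334, IFCMATERIALLAYERSETUSAGE, #2333, .AXIS2., .POSITIVE., -180. I found a solution, "Regex includes two matches in first match" (https://regex101.com/r/RHIu0r/10), that covers part of the problem. However, in some cases the data contains arrays rather than single values, as in the following example: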
#2335=IFCRELASSOCIATESMATERIAL('2ON6$yXXD1GAAH8whbdZmc',#5,$,$,(#40,#221,#268,#281),#2334);
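This case needs to be parsed as #2335, IFCRELASSOCIATESMATERIAL, '2ON6$yXXD1GAAH8whbdZmc', #5, $, $, [#40,#221,#268,#281], #2334, where [#40,#221,#268,#281] is stored in a single variable as an array. The array can appear in the middle or as the last variable.
Could you help me create a regular expression that gives the desired result? I have set up https://regex101.com/r/mqrGka/1 with test cases.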
Here is a solution that continues from the point you reached with the regular expression in your test cases:
file = """
#1=IFCOWNERHISTORY(#89024,#44585,$,.NOCHANGE.,$,$,$,1190720890);
#2=IFCSPACE(';;);',#1,$);some text);
#2=IFCSPACE(';;);',#1,$);
#2885=IFCRELAGGREGATES('1gtpBVmrDD_xsEb7NuFKc8',#5,$,$,#2813,(#2840,#2846,#2852,#2858,#2879));
#2334=IFCMATERIALLAYERSETUSAGE(#2333,.AXIS2.,.POSITIVE.,-180.);
#2335=IFCRELASSOCIATESMATERIAL('2ON6$yXXD1GAAH8whbdZmc',#5,$,$,(#40,#221,#268,#281),#2334);
""".splitlines()
import re
d = dict()
for line in file:
    # group 1: entity id, group 2: entity type name, group 3: raw attribute list
    m = re.match(r"^#(\d+)\s*=\s*([a-zA-Z0-9]+)\s*\(((?:'[^']*'|[^;'])+)\);", line, re.I|re.M)
    if not m:
        continue  # skip blank lines and lines that are not entity definitions
    attr = m.group(3)  # attribute list string
    values = [m.group(2)]  # first value is the entity type name
    while attr:
        start = 1
        if attr[0] == "'": start += attr.find("'", 1)  # don't split at comma within string
        if attr[0] == "(": start += attr.find(")", 1)  # don't split item within parentheses
        end = attr.find(",", start)  # search for a comma / end of item
        if end < 0: end = len(attr)
        value = attr[1:end-1].split(",") if attr[0] == "(" else attr[:end]
        if value[0] == "'": value = value[1:-1]  # remove quotes
        values.append(value)
        attr = attr[end+1:]  # remove current attribute item
    d[m.group(1)] = values  # store into dictionary
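The loop above treats a parenthesised item as a flat, comma-separated list, so it would presumably misbehave on nested lists or on a quoted string inside a list that itself contains `,` or `)`. If your files contain such cases, one possible variation is a small recursive scanner in place of the `find`-based splitting. This is only a sketch: the names `LINE_RE`, `split_attributes` and `parse_ifc_line` are hypothetical helpers, not part of any library, and it makes no attempt to handle escape sequences inside strings.

```python
import re

# Same header pattern as in the loop above:
# group 1 = entity id, group 2 = type name, group 3 = raw attribute text.
LINE_RE = re.compile(r"^#(\d+)\s*=\s*([A-Za-z0-9]+)\s*\(((?:'[^']*'|[^;'])+)\);")


def split_attributes(text, pos=0):
    """Split text on top-level commas, starting at pos.

    Parenthesised groups become nested lists and quoted strings lose
    their quotes. Returns (items, position after the consumed text).
    """
    items = []
    current = None  # None means no item has been started yet
    while pos < len(text):
        ch = text[pos]
        if ch == "'":  # quoted string: copy everything up to the closing quote
            end = text.find("'", pos + 1)
            if end < 0:
                end = len(text)
            current = (current or "") + text[pos + 1:end]
            pos = end + 1
        elif ch == "(":  # nested list: recurse, then resume after its ")"
            current, pos = split_attributes(text, pos + 1)
        elif ch == ")":  # end of the list currently being read
            pos += 1
            break
        elif ch == ",":  # item separator at this nesting level
            items.append("" if current is None else current)
            current = None
            pos += 1
        elif ch.isspace():  # ignore whitespace outside strings
            pos += 1
        else:  # ordinary character of an unquoted token
            current = (current or "") + ch
            pos += 1
    if current is not None:
        items.append(current)
    return items, pos


def parse_ifc_line(line):
    """Return (entity_id, [type_name, attr1, ...]) or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    items, _ = split_attributes(m.group(3))
    return m.group(1), [m.group(2)] + items
```

With this in place, the dictionary can be filled by calling `parse_ifc_line(line)` for each line and storing the result whenever it is not `None`. Arrays then come back as Python lists in whatever position they occupy, and deeper nesting such as `((1.,2.),(3.,4.))` becomes lists of lists.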