Regex Python / group quantifiers



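I want to match a list of variables that look like directory paths, for example:

Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123
Same/Same2/Battery/Name=SomeString
Same/Same2/Home/Land/Some/More/Stuff=0.34

The number of "subdirectories" varies but has an upper bound (9 in the example above). I want to capture each subdirectory in its own group, except the first one, which I named "Same" above.

The best I could come up with is: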

^(?:([^/]+)/){4,8}([^/]+)=(.*)

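It already looks for 4-8 subdirectories, but only the last one is captured. Why? Is there a better solution using group quantifiers?

Edit: Solved. I will use split() instead.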

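import re

# Match each run of characters that follows the start of the string or a '/'
# and precedes a '/' or the end of the string.
regx = re.compile(r'(?:(?<=\A)|(?<=/)).+?(?=/|\Z)')

for ss in ('Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
           'Same/Same2/Battery/Name=SomeString',
           'Same/Same2/Home/Land/Some/More/Stuff=0.34'):
    print(ss)
    print(regx.findall(ss))
    print()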
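
As for why the pattern in the question captures only one subdirectory: in Python's re module a capturing group that sits inside a repeated part of the pattern keeps only its last match. The following is a minimal sketch that reproduces this with the question's own pattern and first sample line, then shows the split-based alternative; the variable names are illustrative only.

```python
import re

line = 'Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123'

# The pattern from the question: the inner group ([^/]+) is matched 8 times,
# but group 1 is a single slot that is overwritten on every repetition.
m = re.match(r'^(?:([^/]+)/){4,8}([^/]+)=(.*)', line)
groups = m.groups()
print(groups)

# Splitting keeps every component; [1:] drops the leading 'Same'.
key, _, value = line.partition('=')
components = key.split('/')[1:]
print(components, value)
```

Only 'Temperature' survives in group 1, which is why a quantified group cannot return a variable number of captures here.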

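Edit 1

Now that you have given more information about what you want to obtain (_"Same/Same2/Battery/Name=SomeString becomes SAME2_BATTERY_NAME=SomeString"_), a better solution can be proposed: either a regex or split(), combined with replace().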

import re
from os import sep

# This example assumes Windows, where os.sep is a backslash.
sep2 = r'\\' if sep == '\\' else '/'
# Skip everything up to the first separator, capture the rest.
pat = '^(?:.+?%s)(.+$)' % sep2
print('pat==%s\n' % pat)
ragx = re.compile(pat)
for ss in (r'Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123',
           r'Same\Same2\Battery\Name=SomeString',
           r'Same\Same2\Home\Land\Some\More\Stuff=0.34'):
    print(ss)
    print(ragx.match(ss).group(1).replace(sep, '_'))
    print(ss.split(sep, 1)[1].replace(sep, '_'))
    print()

Result:

pat==^(?:.+?\\)(.+$)

Same\Same2\Foot\Ankle\Joint\Actuator\Sensor\Temperature\Value=4.123
Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123
Same2_Foot_Ankle_Joint_Actuator_Sensor_Temperature_Value=4.123

Same\Same2\Battery\Name=SomeString
Same2_Battery_Name=SomeString
Same2_Battery_Name=SomeString

Same\Same2\Home\Land\Some\More\Stuff=0.34
Same2_Home_Land_Some_More_Stuff=0.34
Same2_Home_Land_Some_More_Stuff=0.34

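Edit 2

Rereading your comment, I realized I had not taken into account that you want the transformation applied only to the part of the string before the '=' sign, not to the part after it.

So this new code shows 3 methods that meet this requirement. Pick whichever you prefer: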

import re
from os import sep

# This example assumes Windows, where os.sep is a backslash.
sep2 = r'\\' if sep == '\\' else '/'

# pot: capture the key (after the first separator) and the value after the last '='
pot = '^(?:.+?%s)(.+?)=([^=]*$)' % sep2
print('pot==%s\n' % pot)
rogx = re.compile(pot)
# pet: capture only the key, using a lookahead to stop before the last '='
pet = '^(?:.+?%s)(.+?(?==[^=]*$))' % sep2
print('pet==%s\n' % pet)
regx = re.compile(pet)

for ss in (r'Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123',
           r'Same\Same2\Battery\Name=SomeString',
           r'Same\Same2\Ocean\Atlantic\North=',
           r'Same\Same2\Maths\Addition\2+2=4\Simple=ohoh'):
    print(ss + '\n' + len(ss) * '-')
    print('rogx groups  '.rjust(32), rogx.match(ss).groups())
    a, b = ss.split(sep, 1)[1].rsplit('=', 1)
    print('split split  '.rjust(32), (a, b))
    print('split split join upper replace   %s=%s' % (a.replace(sep, '_').upper(), b))
    print('regx split group  '.rjust(32), regx.match(ss.split(sep, 1)[1]).group())
    print('regx split sub  '.rjust(32),
          regx.sub(lambda x: x.group(1).replace(sep, '_').upper(), ss))
    print()

Result, on a Windows platform:

pot==^(?:.+?\\)(.+?)=([^=]*$)
pet==^(?:.+?\\)(.+?(?==[^=]*$))
Same\Same2\Foot\Ankle\Joint\Sensor\Value=4.123
----------------------------------------------
                   rogx groups   ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
                   split split   ('Same2\\Foot\\Ankle\\Joint\\Sensor\\Value', '4.123')
split split join upper replace   SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
              regx split group   Same2\Foot\Ankle\Joint\Sensor\Value
                regx split sub   SAME2_FOOT_ANKLE_JOINT_SENSOR_VALUE=4.123
Same\Same2\Battery\Name=SomeString
----------------------------------
                   rogx groups   ('Same2\\Battery\\Name', 'SomeString')
                   split split   ('Same2\\Battery\\Name', 'SomeString')
split split join upper replace   SAME2_BATTERY_NAME=SomeString
              regx split group   Same2\Battery\Name
                regx split sub   SAME2_BATTERY_NAME=SomeString
Same\Same2\Ocean\Atlantic\North=
--------------------------------
                   rogx groups   ('Same2\\Ocean\\Atlantic\\North', '')
                   split split   ('Same2\\Ocean\\Atlantic\\North', '')
split split join upper replace   SAME2_OCEAN_ATLANTIC_NORTH=
              regx split group   Same2\Ocean\Atlantic\North
                regx split sub   SAME2_OCEAN_ATLANTIC_NORTH=
Same\Same2\Maths\Addition\2+2=4\Simple=ohoh
-------------------------------------------
                   rogx groups   ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
                   split split   ('Same2\\Maths\\Addition\\2+2=4\\Simple', 'ohoh')
split split join upper replace   SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
              regx split group   Same2\Maths\Addition\2+2=4\Simple
                regx split sub   SAME2_MATHS_ADDITION_2+2=4_SIMPLE=ohoh
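
The code above depends on os.sep, so it only matches the sample strings on a platform whose separator is a backslash. As a rough sketch of the same idea with the separator passed in explicitly, something like the following could be used. The function name rename and its sep parameter are assumptions made for illustration, not part of the answer.

```python
import re

def rename(entry, sep='/'):
    """Drop the first component, upper-case the key, join it with '_'.

    The key/value split happens at the last '=' in the entry.
    Returns None when the entry does not have the expected shape.
    """
    s = re.escape(sep)
    # first component + separator, then the key, the last '=', then the value
    pattern = re.compile(r'^[^%s]+%s(.+)=([^=]*)$' % (s, s))
    m = pattern.match(entry)
    if m is None:
        return None
    key, value = m.groups()
    return key.replace(sep, '_').upper() + '=' + value

print(rename('Same/Same2/Battery/Name=SomeString'))
print(rename(r'Same\Same2\Maths\Addition\2+2=4\Simple=ohoh', sep='\\'))
```

Passing the separator as an argument keeps the behavior identical on every platform, at the cost of compiling the pattern on each call.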

I may have misunderstood what you are trying to do, but here is a way to do it without regex:

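# list_of_vars is your list of 'path=value' strings
for entry in list_of_vars:
    # split on the last '=' so that an entry with more than one '=' does not raise
    key, value = entry.rsplit('=', 1)
    key_components = key.split('/')
    if 4 <= len(key_components) <= 8:
        # here the actual work is done
        print("%s=%s" % ('_'.join(key_components[1:]).upper(), value))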

Use split?

>>> p='Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123'
>>> p.split('/')
['Same', 'Same2', 'Foot', 'Ankle', 'Joint', 'Actuator', 'Sensor', 'Temperature', 'Value=4.123']

Similarly, if you want key/value pairs, you can do this...

>>> s = p.split('/')
>>> s[-1].split('=')
['Value', '4.123']
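
The two interactive steps above can be folded into one small helper. This is only a sketch: the name parse_entry is made up for illustration, and it assumes the value is whatever follows the last '=' in the line.

```python
def parse_entry(entry, sep='/'):
    """Return (list of path components, value) for a 'a/b/c=value' string."""
    # rpartition splits at the last '='; eq is '' when there is no '=' at all
    key, eq, value = entry.rpartition('=')
    if not eq:
        raise ValueError('no "=" in %r' % entry)
    return key.split(sep), value

components, value = parse_entry('Same/Same2/Battery/Name=SomeString')
print(components, value)
```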

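A few variations on the theme. First, I have always found regexes cryptic to the point of being unmaintainable, so I wrote the pyparsing module. Looking at your input, I think: "Oh, this is a list of '/'-separated strings, an '=' sign, and then some kind of rvalue." That translates very directly into pyparsing parser definition code. By adding a name here and there in the parser ("key" and "value", similar to named groups in a regex), the output becomes very easy to process.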

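# the backslash after the opening quotes avoids an empty first line
data = """\
Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123
Same/Same2/Battery/Name=SomeString
Same/Same2/Home/Land/Some/More/Stuff=0.34""".splitlines()

from pyparsing import Word, alphas, alphanums, nums, QuotedString, delimitedList

# a word starts with a letter, followed by letters or digits
wd = Word(alphas, alphanums)
# a signed decimal number, converted to float at parse time
number = Word(nums+'+-', nums+'.').setParseAction(lambda t: float(t[0]))
rvalue = wd | number | QuotedString('"')
# '/'-separated names (named "key"), an '=', then the value (named "value")
defn = delimitedList(wd, '/')('key') + '=' + rvalue('value')
for d in data:
    result = defn.parseString(d)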
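
Second, I question your approach of defining all these variable names. Creating variable names dynamically from data is a well-recognized code smell (not necessarily bad, but you may really want to rethink the approach). I used a recursive defaultdict to create a navigable structure, so that you can easily do things like "find all entries that are children of 'Same2'" (in this case "Foot", "Battery" and "Home"). That kind of work is harder when you have to sift through a collection of variable names found in locals(); it seems to me you would end up re-parsing those names to rebuild the key hierarchy.
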
from collections import defaultdict

class recursivedefaultdict(defaultdict):
    def __init__(self, attrFactory=int):
        # missing keys create another recursivedefaultdict of the same type
        self.default_factory = lambda: type(self)(attrFactory)
        self._attrFactory = attrFactory
    def __getattr__(self, attr):
        # missing attributes are created on first access
        newval = self._attrFactory()
        setattr(self, attr, newval)
        return newval

table = recursivedefaultdict()
# parse each entry, and accumulate into hierarchical dict
for d in data:
    # use pyparsing parser, gives us key (list of names) and value
    result = defn.parseString(d)
    t = table
    for k in result.key[:-1]:
        t = t[k]
    t[result.key[-1]] = result.value

# recursive function to iterate over hierarchical dict
def showTable(t, indent=''):
    for k, v in t.items():
        if isinstance(v, dict):
            print(indent + k)
            showTable(v, indent + '  ')
        else:
            print(indent + k, v)

showTable(table)

Prints:

Same
  Same2
    Foot
      Ankle
        Joint
          Actuator
            Sensor
              Temperature
                Value 4.123
    Battery
      Name SomeString
    Home
      Land
        Some
          More
            Stuff 0.34
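
For comparison, a similar hierarchy can be sketched with plain dicts and str methods only, with no pyparsing and no defaultdict subclass. This is an illustration under stated assumptions rather than the answer's method: build_tree is a hypothetical name, every line is assumed to contain an '=', and values stay as strings instead of being converted to float.

```python
def build_tree(lines, sep='/'):
    """Accumulate 'a/b/c=value' lines into nested dicts."""
    tree = {}
    for line in lines:
        # split at the last '=' (assumes each line contains one)
        key, _, value = line.rpartition('=')
        *parents, leaf = key.split(sep)
        node = tree
        for name in parents:
            # descend, creating intermediate dicts on first use
            node = node.setdefault(name, {})
        node[leaf] = value
    return tree

tree = build_tree([
    'Same/Same2/Foot/Ankle/Joint/Actuator/Sensor/Temperature/Value=4.123',
    'Same/Same2/Battery/Name=SomeString',
    'Same/Same2/Home/Land/Some/More/Stuff=0.34',
])
print(list(tree['Same']['Same2']))
```

Listing the children of "Same2" is then a plain dict lookup, which is the kind of query the paragraph above argues is awkward with dynamically created variable names.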

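If you are really set on defining these variable names, then adding a few helpful parse actions in pyparsing will reformat the parsed data at parse time, so that it can be processed directly afterward: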

wd = Word(alphas, alphanums)
number = Word(nums+'+-', nums+'.').setParseAction(lambda t: float(t[0]))
# a bare word used as a value gets wrapped in double quotes
rvaluewd = wd.copy().setParseAction(lambda t: '"%s"' % t[0])
rvalue = rvaluewd | number | QuotedString('"')
defn = delimitedList(wd, '/')('key') + '=' + rvalue('value')

def joinNamesWithAllCaps(tokens):
    # replace the list of names with a single upper-case, '_'-joined name
    tokens["key"] = '_'.join(map(str.upper, tokens.key))
defn.setParseAction(joinNamesWithAllCaps)

for d in data:
    result = defn.parseString(d)
    print(result.key, '=', result.value)

Prints:

SAME_SAME2_FOOT_ANKLE_JOINT_ACTUATOR_SENSOR_TEMPERATURE_VALUE = 4.123
SAME_SAME2_BATTERY_NAME = "SomeString"
SAME_SAME2_HOME_LAND_SOME_MORE_STUFF = 0.34

(Note that this also encloses the SomeString value in quotes, so that the resulting assignment statement is valid Python.)
