Transforming a list with regular expressions



I have a list with elements of the following form. The strings may change, but the format stays the same:

["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]

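I want to convert it to the list below. As you can see, duplicates of the same string (such as Eth) are removed so that each appears only once in the new list, and the numbers are replaced with X and Y to make the names more generic: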

["RadioX","TetherX","SerialX/Y","EthX/Y","vlanX","modemX"]

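I have been fiddling with different regexes and my approach is rather messy, so I would be interested in any elegant solutions you can come up with.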

Here is some code that could be improved. Also, set does not preserve order, so that should be improved too:

import re

a = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth0/2","Eth1/0","vlanX","modem0","modem1","modem2","modem3","modem6"]
c = []
for i in a:
    b = re.split("[0-9]", i)
    if "/" in i:
        c.append(b[0] + "X/Y")
    elif len(b) > 1:
        c.append(b[0] + "X")
    else:
        # no digit found: keep the name unchanged
        c.append(i)
print(set(c))
# {'modemX', 'TetherX', 'RadioX', 'vlanX', 'SerialX/Y', 'EthX/Y'} (set order is arbitrary)

A possible improvement over set:

unique = []
for item in c:
    # keep only the first occurrence, so the original order is preserved
    if item not in unique:
        unique.append(item)
print(unique)
# ['RadioX', 'TetherX', 'SerialX/Y', 'EthX/Y', 'vlanX', 'modemX']
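As a shorter alternative to the loop above, the order-preserving de-duplication can be sketched with dict.fromkeys. This assumes Python 3.7 or later, where dictionaries are guaranteed to keep insertion order. The list c below is a hypothetical stand-in for the normalised names built earlier.

```python
# Hypothetical sample: names that were already normalised, with duplicates.
c = ["RadioX", "TetherX", "SerialX/Y", "EthX/Y", "EthX/Y",
     "vlanX", "modemX", "modemX"]

# Dictionary keys are unique and (Python 3.7+) keep insertion order,
# so this drops duplicates while keeping the first occurrence of each name.
unique = list(dict.fromkeys(c))
print(unique)
```

Unlike the `item not in unique` test, which rescans the list for every element, this does a hash lookup per element, which matters only for long lists.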

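The code below should be generic enough to allow up to 3 numbers per string (each number being a single digit, since the pattern matches one digit at a time), but you can simply change the repl variable to allow more.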

import re
elements = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
repl = "XYZ"
# each pass replaces the first remaining digit of every element with the next letter
for i in range(len(repl)):
    elements = [re.sub("[0-9]", repl[i], element, count=1) for element in elements]
result = set(elements)
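One thing to watch with the pattern above: "[0-9]" matches a single digit, so a multi-digit number uses up more than one letter. The sketch below contrasts it with r"\d+", which matches a whole run of digits. The name "Eth10/1" is a made-up example and does not appear in the question.

```python
import re

name = "Eth10/1"  # hypothetical interface name with a two-digit number

# One digit at a time: only the "1" of "10" is replaced.
per_digit = re.sub(r"[0-9]", "X", name, count=1)

# A whole run of digits at a time: "10" is replaced as one unit.
per_number = re.sub(r"\d+", "X", name, count=1)

print(per_digit)   # EthX0/1
print(per_number)  # EthX/1
```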

import re

def particular_case(string):
    # handle "number/number" first, then any remaining number
    return re.sub(r"\d+", "X", re.sub(r"\d+/\d+", "X/Y", string))

def generic_case(string, letters=('X', 'Y', 'Z')):
    len_letters = len(letters)
    list_matches = list(re.finditer(r'\d+', string))
    result, last_index = "", 0
    if len(list_matches) == 0:
        return string
    for index, match in enumerate(list_matches):
        # copy the text before the number, then the letter standing in for it
        result += string[last_index:
                         match.start(0)] + letters[index % len_letters]
        last_index = match.end(0)
    # keep whatever follows the last number
    return result + string[last_index:]

if __name__ == "__main__":
    words = ["Radio0", "Tether0", "Serial0/0", "Eth0/0", "Eth0/1", "Eth1/0",
             "Eth1/1", "vlanX", "modem0", "modem1", "modem2", "modem3", "modem6"]
    result = []
    result2 = []
    for w in words:
        new_value = particular_case(w)
        if new_value not in result:
            result.append(new_value)
        new_value = generic_case(w)
        if new_value not in result2:
            result2.append(new_value)
    print(result)
    print(result2)

I'd use re.finditer to find and replace all the numbers:

import re

def repl(string):
    #use regex to find all numbers
    numbers= re.finditer(r'\d+', string)
    #pair the numbers with letters. zip will stop when the sequence of
    #numbers OR letters runs out.
    pairs= list(zip(numbers, 'XYZ')) #add more characters if necessary
    #replace from right to left so the offsets of earlier matches stay valid
    for match, char in reversed(pairs):
        string= string[:match.start()] + char + string[match.end():]
    return string

l= ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
s= set() #set to keep track of duplicates; the result list maintains order
result= []
for string in l:
    string= repl(string)
    if string in s: #ignore if duplicate
        continue
    #otherwise add to result list
    s.add(string)
    result.append(string)
print(result)

This replaces up to 3 numbers with X, Y and Z, and can easily be modified to support more.
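If the number of numeric fields is not known in advance, one possible variation is to pass a function as the replacement argument of re.sub and hand out letters one match at a time. This is only a sketch: the helper name generalise and the fallback letter "N", used once X, Y and Z are exhausted, are assumptions made for illustration and are not part of the answer above.

```python
import re
from itertools import chain, repeat

def generalise(name, letters="XYZ"):
    """Replace the 1st, 2nd, ... run of digits with successive letters."""
    # Assumed policy: once the letters run out, keep using "N".
    pool = chain(letters, repeat("N"))
    # re.sub calls the function once per match, from left to right.
    return re.sub(r"\d+", lambda match: next(pool), name)

names = ["Radio0", "Serial0/0", "Eth10/2", "vlanX", "a1b2c3d4"]
print([generalise(n) for n in names])
# ['RadioX', 'SerialX/Y', 'EthX/Y', 'vlanX', 'aXbYcZdN']
```

Because re.sub builds the new string itself, no manual slicing by match offsets is needed.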

You could go for:

import re
rx = r'\d+'
incoming = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
outgoing = []
for item in incoming:
    t = re.sub(rx, 'X', item)
    if t not in outgoing:
        outgoing.append(t)
print(outgoing)
# ['RadioX', 'TetherX', 'SerialX/X', 'EthX/X', 'vlanX', 'modemX']

Or (just another syntax example, with the help of Python's mighty list comprehensions):

import re
rx = re.compile(r'\d+')
incoming = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
def cleanitem(item):
    return rx.sub('X', item)
outgoing = []
[outgoing.append(item) 
    for item in (cleanitem(x) for x in incoming) 
    if item not in outgoing]
print(outgoing)


import re
import functools
lst = ["Radio0","Tether0","Serial0/0","Eth0/0","Eth0/1","Eth1/0","Eth1/1","vlanX","modem0","modem1","modem2","modem3","modem6"]
def process_str(s, letters='XY'):
    # replace the first remaining number with each letter in turn
    return functools.reduce(lambda txt, letter: re.sub(r'\d+', letter, txt, count=1), letters, s)
r = set(map(process_str, lst))
print(r)
