Regex substitution inside a list within a DataFrame in Python



Hello, I have a dataframe such as:

COL1 COL2  COL3 
G1   1     [[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]]
G1   2     [[(JU3_+__BO,UJ3_-__GET)]]

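How can I apply re.sub(r'.*__', '') to the items of the lists in COL3

and get a new column in which everything up to and including the '__' has been removed: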

COL1 COL2  COL3 COL4 
G1   1     [[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]] [[(HELLO,BY),(HOLLA,BY)]]
G1   2     [[(JU3_+__BO,UJ3_-__GET)]] [[(BO,GET)]]

Here is the data:

data= {'COL1': {0: 'G1', 1: 'G1'}, 'COL2': {0: 1, 1: 2}, 'COL3 ': {0: "[[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]]", 1: "[[(JU3_+__BO,UJ3_-__GET)]]"}}
df = pd.DataFrame.from_dict(data)

Solution for the updated data

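import pandas as pd

data = {'COL1': {0: 'G1', 1: 'G1'}, 'COL2': {0: 1, 1: 2}, 'COL3 ': {0: "[[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]]", 1: "[[(JU3_+__BO,UJ3_-__GET)]]"}}
df = pd.DataFrame.from_dict(data)
# Group 1 captures the "," or "(" before an item; the rest of the match
# (everything up to "__") is dropped. regex=True is required in pandas >= 2.0.
df['COL4'] = df['COL3 '].str.replace(r"([,(])[^(),]*__", r"\1", regex=True)
df['COL4']
# => 0    [[(HELLO,BY),(HOLLA,BY)]]
#    1                 [[(BO,GET)]]
#    Name: COL4, dtype: object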

See the regex demo.
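To check the pattern outside pandas, here is a minimal sketch that runs it with plain re.sub on the two sample strings. The helper name strip_prefixes is hypothetical and not part of the original answer; the pattern is the same one passed to str.replace above.

```python
import re

# Group 1 keeps the delimiter ("," or "(") that precedes an item;
# [^(),]*__ then consumes the rest of that item up to its last "__".
PATTERN = re.compile(r"([,(])[^(),]*__")

def strip_prefixes(text):
    """Hypothetical helper: drop the 'XXX__' prefix from every item in text."""
    return PATTERN.sub(r"\1", text)

samples = [
    "[[(OK2_+__HELLO,OJ_+__BY),(LO_-__HOLLA,KUOJ_+__BY)]]",
    "[[(JU3_+__BO,UJ3_-__GET)]]",
]
for s in samples:
    print(strip_prefixes(s))
# [[(HELLO,BY),(HOLLA,BY)]]
# [[(BO,GET)]]
```

Because the character class excludes "(", ")" and ",", a match can never run across an item boundary, which is what the bare '.*__' from the question would do on the whole string.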

Solution for the old data

You can use ast.literal_eval to convert the strings in the COL3 column into lists of lists, then iterate over them and modify the tuple items:

import ast
import re
import pandas as pd

data = {'COL1': {0: 'G1', 1: 'G1'}, 'COL2': {0: 1, 1: 2}, 'COL3 ': {0: "[[('OK2_+__HELLO','OJ_+__BY'),('LO_-__HOLLA','KUOJ_+__BY')]]", 1: "[[('JU3_+__BO','UJ3_-__GET')]]"}}
df = pd.DataFrame.from_dict(data)

def repl(m):
    result = []
    # Parse the cell string into a real list of lists of tuples
    for l in ast.literal_eval(m):
        ll = []
        for x, y in l:
            # Strip everything up to and including the last "__" in each item
            ll.append((re.sub(r'.*__', '', x), re.sub(r'.*__', '', y)))
        result.append(ll)
    return str(result)

df['COL4'] = df['COL3 '].apply(repl)
df['COL4']
# => 0    [[('HELLO', 'BY'), ('HOLLA', 'BY')]]
#    1                       [[('BO', 'GET')]]

If the result can stay as a list of lists, you do not need str(result); return result directly.
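A minimal sketch of that variant, assuming the quoted ("old") data format: the function returns the nested list itself rather than its string form. The name repl_as_list is hypothetical, and unlike repl it accepts tuples of any length, not only pairs.

```python
import ast
import re

def repl_as_list(cell):
    """Hypothetical variant of repl: returns a list of lists of tuples."""
    return [
        # Strip everything up to and including the last "__" in each item
        [tuple(re.sub(r'.*__', '', item) for item in tup) for tup in inner]
        for inner in ast.literal_eval(cell)
    ]

cell = "[[('OK2_+__HELLO','OJ_+__BY'),('LO_-__HOLLA','KUOJ_+__BY')]]"
print(repl_as_list(cell))
# [[('HELLO', 'BY'), ('HOLLA', 'BY')]]
```

It can be applied the same way, with df['COL4'] = df['COL3 '].apply(repl_as_list), in which case each cell of COL4 holds a Python list instead of a string.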
