Python Regex: how do I skirt around target text while keeping the target text intact?



Here is an example of the target text:

{'{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee6eb7f86b42582b4667\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}", "overstock": "{\\"_id\\": \\"6175eef7b7f86b42582b4678\\", \\"rawColor\\": \\"Brown/Red\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee72b7f86b42582b466c\\", \\"rawColor\\": \\"Brown/Red\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}", "overstock": "{\\"_id\\": \\"6175eef7b7f86b42582b4679\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee72b7f86b42582b466c\\", \\"rawColor\\": \\"Brown/Red\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}", "overstock": "{\\"_id\\": \\"6175eef7b7f86b42582b4678\\", \\"rawColor\\": \\"Brown/Red\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee6eb7f86b42582b4667\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}", "overstock": "{\\"_id\\": \\"6175eef7b7f86b42582b4679\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}'}

Unfortunately, I need to get this accepted by json.loads, and it fails with JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Here is what I have tried so far:

import re 
import json
problem = """{'{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee6eb7f86b42582b4667\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}", "overstock": "{\\"_id\\": \\"6175eef7b7f86b42582b4678\\", \\"rawColor\\": \\"Brown/Red\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee72b7f86b42582b466c\\", \\"rawColor\\": \\"Brown/Red\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}", "overstock": "{\\"_id\\": \\"6175eef7b7f86b42582b4679\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee72b7f86b42582b466c\\", \\"rawColor\\": \\"Brown/Red\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}", "overstock": "{\\"_id\\": \\"6175eef7b7f86b42582b4678\\", \\"rawColor\\": \\"Brown/Red\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee6eb7f86b42582b4667\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}", "overstock": "{\\"_id\\": \\"6175eef7b7f86b42582b4679\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}'}"""
b = problem
b = re.sub(r's\\"', ' "', b) # replaces s\" with a space and a quote
b = re.sub(r'\\"_id\\', '"_id', b) # cleans up area around _id
b = re.sub(r'\\":', '":', b) # cleans up after a property name, before the colon
b = re.sub(r'\\",', '",', b) # cleans up after a value, before the comma
b = re.sub(r'\\"}"}', '}}', b) # cleans up the end of each record
b = re.sub(r'\\\\\\"', '\\\"', b) # reduces \\\" (inch marks) to \"
b = re.sub(r'\\"', '\"', b) # unescapes the remaining \" sequences
b = re.sub(r'"",', '",', b) # removes doubled quotation marks
b = re.sub(r'"{"', '{"', b) # removes the quote in front of a nested object
finally_b = b[1:-1] # removes the extra { and } from the ends
print('b...')
print(b)
print()
print('finally_b...')
print(finally_b)
json.loads(finally_b) # raises JSONDecodeError

Output:

b...
{'{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7'10"x10'2", "ebay": {"_id": "6175ee6eb7f86b42582b4667", "rawColor": "Gray", "rawSize": "7'10"x10'2""}", "overstock": {"_id": "6175eef7b7f86b42582b4678", "rawColor": "Brown/Red", "rawSize": "7'10"x10'2"}}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7'10"x10'2", "ebay": {"_id": "6175ee72b7f86b42582b466c", "rawColor": "Brown/Red", "rawSize": "7'10"x10'2""}", "overstock": {"_id": "6175eef7b7f86b42582b4679", "rawColor": "Gray", "rawSize": "7'10"x10'2"}}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7'10"x10'2", "ebay": {"_id": "6175ee72b7f86b42582b466c", "rawColor": "Brown/Red", "rawSize": "7'10"x10'2""}", "overstock": {"_id": "6175eef7b7f86b42582b4678", "rawColor": "Brown/Red", "rawSize": "7'10"x10'2"}}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7'10"x10'2", "ebay": {"_id": "6175ee6eb7f86b42582b4667", "rawColor": "Gray", "rawSize": "7'10"x10'2""}", "overstock": {"_id": "6175eef7b7f86b42582b4679", "rawColor": "Gray", "rawSize": "7'10"x10'2"}}'}
finally_b...
'{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7'10"x10'2", "ebay": {"_id": "6175ee6eb7f86b42582b4667", "rawColor": "Gray", "rawSize": "7'10"x10'2""}", "overstock": {"_id": "6175eef7b7f86b42582b4678", "rawColor": "Brown/Red", "rawSize": "7'10"x10'2"}}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7'10"x10'2", "ebay": {"_id": "6175ee72b7f86b42582b466c", "rawColor": "Brown/Red", "rawSize": "7'10"x10'2""}", "overstock": {"_id": "6175eef7b7f86b42582b4679", "rawColor": "Gray", "rawSize": "7'10"x10'2"}}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7'10"x10'2", "ebay": {"_id": "6175ee72b7f86b42582b466c", "rawColor": "Brown/Red", "rawSize": "7'10"x10'2""}", "overstock": {"_id": "6175eef7b7f86b42582b4678", "rawColor": "Brown/Red", "rawSize": "7'10"x10'2"}}', '{"feature1": "color", "feature2": "size", "name_color": "Gray", "name_size": "7'10"x10'2", "ebay": {"_id": "6175ee6eb7f86b42582b4667", "rawColor": "Gray", "rawSize": "7'10"x10'2""}", "overstock": {"_id": "6175eef7b7f86b42582b4679", "rawColor": "Gray", "rawSize": "7'10"x10'2"}}'
---------------------------------------------------------------------------
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
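
The position in this message (char 0) suggests the parser gave up on the very first character. finally_b begins with a single quote, which cannot start a JSON value. A minimal sketch reproducing the same message on a short, made-up string (the variable names are illustrative only):

```python
import json

# Hypothetical stand-in for finally_b: it starts with a single quote,
# which is not a legal first character of a JSON document.
text = "'{\"a\": 1}', '{\"b\": 2}'"

try:
    json.loads(text)
    message = None
except json.JSONDecodeError as exc:
    message = str(exc)

print(message)
```

Under that reading, the regex clean-up never gets a chance to matter, because decoding stops before it reaches any of the cleaned text.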

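Is there a better way to handle things like \\"rawSize\\" and turn them into "rawSize"? That is what I mean by skirting around the word rawSize: cleaning up only what surrounds the word and leaving the word itself alone.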
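
For the narrow task of stripping the backslashes around a word while leaving the word untouched, a capturing group is one option. The sketch below runs on a shortened, hypothetical fragment; the pattern is an illustration rather than a full repair, and it would not handle values that themselves contain escaped quotes, such as the inch marks in rawSize.

```python
import json
import re

# Hypothetical fragment in which every quote carries a stray backslash.
escaped = r'{\"_id\": \"6175ee6eb7f86b42582b4667\", \"rawColor\": \"Gray\"}'

# \\+        one or more literal backslashes before the opening quote
# ([^"\\]*)  the target text, captured unchanged (no quotes or backslashes)
# \\+"       one or more backslashes before the closing quote
pattern = re.compile(r'\\+"([^"\\]*)\\+"')

# \1 puts the captured target text back between plain quotes.
cleaned = pattern.sub(r'"\1"', escaped)

print(cleaned)
print(json.loads(cleaned)["rawColor"])
```

The group `\1` is what keeps the target text intact: only the characters outside the parentheses are replaced.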

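I think the data looks corrupted. Look at this part: "name_size": "7'10"x10'2"". Neither the " nor the ' characters around the 7 are preceded by a backslash. That becomes a problem when the string is interpreted.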
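
To illustrate why unescaped inner quotes matter, here is a small sketch with two made-up one-field documents: one with the inch mark left bare, as in the printed output, and one with it escaped as \".

```python
import json

# Hypothetical samples; only the escaping of the inner " differs.
broken = '{"name_size": "7\'10"x10\'2""}'     # inner " not escaped
valid = '{"name_size": "7\'10\\"x10\'2\\""}'  # inner " escaped as \"

try:
    json.loads(broken)
    broken_ok = True
except json.JSONDecodeError as exc:
    broken_ok = False
    print("broken:", exc.msg)

print("valid:", json.loads(valid)["name_size"])
```

The bare quote ends the string early, so the decoder rejects what follows, while the escaped version decodes to the original measurement.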

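Personally, I would suggest cleaning up the string. You could convert it to a raw string, perhaps by encoding it with test_string.encode('unicode_escape'), then make sure every " and ' is preceded by a backslash, and then load it with json?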
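
If the backslashes are still present in the source text, one possible alternative to regex clean-up is to treat the text as what it appears to be: the repr of a Python set whose elements are JSON documents. The sketch below assumes exactly that, uses a shortened one-element sample kept as a raw string, and introduces a hypothetical helper named parse_records. It is not a confirmed description of how the data was produced.

```python
import ast
import json

# Hypothetical sample: one shortened record, kept as RAW text (r"""...""")
# so every backslash survives exactly as it appears in the data.
raw_text = r"""{'{"feature1": "color", "name_size": "7\'10\\"x10\'2\\"", "ebay": "{\\"_id\\": \\"6175ee6eb7f86b42582b4667\\", \\"rawColor\\": \\"Gray\\", \\"rawSize\\": \\"7\'10\\\\\\"x10\'2\\\\\\"\\"}"}'}"""


def parse_records(text):
    """Parse a Python-set repr whose elements are JSON documents."""
    records = []
    for item in ast.literal_eval(text):   # Python resolves its own escapes
        record = json.loads(item)         # outer JSON object
        for key, value in record.items():
            # Heuristic: nested objects were stored as JSON strings.
            if isinstance(value, str) and value.startswith("{"):
                record[key] = json.loads(value)
        records.append(record)
    return records


records = parse_records(raw_text)
print(records[0]["name_size"])
print(records[0]["ebay"]["rawSize"])
```

Each layer of escaping is undone by the parser that created it, so no target text has to be skirted by hand. This only works while the escapes are intact; if the text has already been unescaped, as in the printed output, ast.literal_eval would reject it.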
