I need to convert a base string into a target string. I have working code, but if tvg-name contains a "," character, the code breaks and no longer works. How can I fix this error?
Base working string: {tvg-id: , tvg-name: A beautiful Day - 2016, tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg, group-title: 2017-16-15 Germany Cinema}
Base problem string: {tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, group-title: 2017-16-15 Germany Cinema}
Target string: {"tvg-id": "None", "tvg-name": "Antonio, ihm schmeckt's nicht! (2016)", "tvg-logo": "https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg", "group-title": "2017-16-15 Germany Cinema"}
My Convert Function
def convert(example):
    #split the string into a list
    example= example.replace("{", "").replace("}", "").split(",")
    #create a dictionary
    final = {}
    #loop through the list
    for i in example:
        #split the string into a list
        i = i.split(":")
        #if http or https is in the list merge with next item
        if "http" in i[1] or "https" in i[1]:
            i[1] = i[1] + ":" + i[2]
            i.pop(2)
        #remove first char whitespace
        if i[0][0] == " ":
            i[0]=i[0][1:]
        #remove first char whitespace
        if i[1][0] == " ":
            i[1]=i[1][1:]
        final[i[0]] = i[1]
    #return the dictionary
    return final
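To see where this function breaks, it can help to look at what the initial split(",") produces for the problem string. The snippet below is only an illustration: the sample string is a shortened copy of the problem string (the tvg-logo entry is left out), and the variable names are made up for this example.

```python
# Shortened copy of the problem string (tvg-logo omitted for readability).
problem = "{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), group-title: 2017-16-15 Germany Cinema}"

# Same first step as in convert(): drop the braces, then split on every comma.
parts = problem.replace("{", "").replace("}", "").split(",")

for part in parts:
    # Show each fragment and what split(":") makes of it.
    print(repr(part), "->", part.split(":"))
```

The third fragment is the tail of the title and contains no colon, so split(":") returns a single-element list and the later i[1] lookup raises an IndexError.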
We can use a regular expression instead of a plain .split(',') to help us with the splitting.
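import re

def convert(example):
    # Split only on ", " when it is followed by something that looks like a key
    kv_pairs = re.split(r', (?=\w+-?\w+:)', example[1:-1])
    result = {}
    for kv_pair in kv_pairs:
        # maxsplit=1: only the first ': ' separates key and value
        key, value = kv_pair.split(': ', 1)
        result[key] = value
    return result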
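In re.split(r', (?=\w+-?\w+:)', example[1:-1]), we only split on commas that are followed by the pattern in the lookahead (?=\w+-?\w+:), for example tvg-logo:.
In key, value = kv_pair.split(': ', 1), we specify maxsplit=1, so we don't need to worry about colons inside values (such as URLs).
Hope this helps.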
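As a small illustration of the two steps described above, the following sketch prints the intermediate pieces for the problem string from the question. The variable names are only for this example.

```python
import re

example = ("{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), "
           "tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, "
           "group-title: 2017-16-15 Germany Cinema}")

# Step 1: split on ", " only where a key-like token ending in ":" follows.
# The comma after "Antonio" is followed by "ihm schmeckt's", not by "word:",
# so it stays inside the value.
kv_pairs = re.split(r', (?=\w+-?\w+:)', example[1:-1])
for kv_pair in kv_pairs:
    print(repr(kv_pair))

# Step 2: maxsplit=1 splits at the first ': ' only, so "https://" is untouched.
key, value = kv_pairs[2].split(': ', 1)
print(key, "=>", value)
```

Note that the lookahead \w+-?\w+: only recognizes keys made of word characters with at most one hyphen, so keys of a different shape would need an adjusted pattern.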
You really can't do this without some heuristics.
Here is a piece of code that works -
from typing import Dict, Optional

def convert(input: str) -> Dict[str, Optional[str]]:
    input = input.strip()[1:-1]  # Remove the curly braces {...}
    result: Dict[str, Optional[str]] = {}
    carryover = ''
    for pair in input.split(','):
        # Prepend anything held back from earlier fragments, then split once on ':'
        kv = (carryover + pair).strip().split(':', 1)
        if len(kv) == 1:
            # No ':' seen yet: hold this fragment back and try again with the next one
            carryover += pair + ','
            continue
        # An empty value is stored as None
        result[kv[0]] = kv[1] if kv[1] else None
        carryover = ''
    return result
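This holds back output as long as there is no ':' up to the current fragment.
Note that if you have a string like '{ab,cd:ef,gh}', this will break, because it doesn't know how to handle 'gh'. That input is genuinely somewhat ambiguous.
The only way to handle all cases correctly is to change the input source so that it quotes the strings, if that is possible. If it is not possible, or if this is a one-off, you can try to extend the heuristics to cover all cases.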
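One possible way to extend the heuristic is sketched below. It is an assumption-laden variant, not the function above: it assumes keys never contain spaces or commas, and instead of carrying a colon-less fragment forward into the next pair, it appends the fragment to the previous value. The name convert_attach_to_previous is hypothetical.

```python
from typing import Dict, Optional

def convert_attach_to_previous(text: str) -> Dict[str, Optional[str]]:
    """Hypothetical variant: fragments that do not look like 'key: value'
    are treated as a continuation of the previous value."""
    body = text.strip()[1:-1]  # Remove the curly braces {...}
    result: Dict[str, Optional[str]] = {}
    last_key: Optional[str] = None
    for fragment in body.split(','):
        key, sep, value = fragment.partition(':')
        key = key.strip()
        # Assumption: a real key is non-empty and contains no spaces.
        looks_like_pair = bool(sep) and bool(key) and ' ' not in key
        if looks_like_pair:
            result[key] = value.strip() or None  # empty value becomes None
            last_key = key
        elif last_key is not None:
            # Re-attach the fragment together with the comma split() removed.
            previous = result[last_key] or ''
            result[last_key] = previous + ',' + fragment
        # A fragment that appears before any key stays ambiguous and is skipped.
    return result

print(convert_attach_to_previous(
    "{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), "
    "group-title: 2017-16-15 Germany Cinema}"
))
```

This is still a heuristic: a value fragment such as "https://..." that follows a comma would be mistaken for a new key, so quoting the strings at the source remains the more robust option.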
Regex does a good job here:
import re

def convert(s):
    s = s[1:-1]  # Remove {}
    # Split on commas followed by a space then group of characters that end in ':'
    s = re.split(r', (?=\S+:)', s)
    # Split each of these groups on the first ': '. Now it's basically a dict.
    return dict(i.split(': ', 1) for i in s)

>>> x = '{tvg-id: , tvg-name: A beautiful Day - 2016, tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg, group-title: 2017-16-15 Germany Cinema}'
>>> print(convert(x))
# Output:
{'tvg-id': '', 'tvg-name': 'A beautiful Day - 2016', 'tvg-logo': 'https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg', 'group-title': '2017-16-15 Germany Cinema'}
>>> x = "{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, group-title: 2017-16-15 Germany Cinema}"
>>> print(convert(x))
# Output:
{'tvg-id': '', 'tvg-name': "Antonio, ihm schmeckt's nicht! (2016)", 'tvg-logo': 'https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg', 'group-title': '2017-16-15 Germany Cinema'}
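You can check that the string starts with { and ends with }, and then match the key-value pairs.
The pattern that matches the keys and the values:
([^\s:,{}]+):\s*([^,{}]*)
  ([^\s:,{}]+)  Capture group 1, match 1+ characters other than a whitespace character, : , { or }
  :\s*  Match a colon followed by optional whitespace characters
  ([^,{}]*)  Capture group 2, match optional characters other than , { or }
See a regex demo and a Python demo.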
import re
strings = [
"{tvg-id: , tvg-name: A beautiful Day - 2016, tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg, group-title: 2017-16-15 Germany Cinema}",
"{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, group-title: 2017-16-15 Germany Cinema}"
]
def convert(example):
    pattern = r"([^\s:,{}]+):\s*([^,{}]*)"
    dct = {}
    # Only parse strings that are wrapped in curly braces
    if example.startswith("{") and example.endswith("}"):
        for t in re.findall(pattern, example):
            if t[1].strip():
                dct[t[0]] = t[1]
            else:
                # Empty or whitespace-only value
                dct[t[0]] = None
    return dct

for s in strings:
    print(convert(s))
Output
{'tvg-id': None, 'tvg-name': 'A beautiful Day - 2016', 'tvg-logo': 'https://image.tmdb.org/t/p/w600_and_h900_bestv2/hZgsmIYUAtdUOUFKROq6rNyWXVa.jpg', 'group-title': '2017-16-15 Germany Cinema'}
{'tvg-id': None, 'tvg-name': 'Antonio', 'tvg-logo': 'https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg', 'group-title': '2017-16-15 Germany Cinema'}
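As the second output line shows, the tvg-name value is cut off at the comma, because capture group 2 excludes commas. A possible adjustment, sketched here as an assumption rather than as part of the answer above, is to match the value lazily and stop at a lookahead for the next ", key:" or the closing brace. The names PATTERN and convert_keep_commas are hypothetical.

```python
import re

# Assumed variant of the pattern: group 2 is lazy and ends where either
# ", <key>:" follows or the closing brace at the end of the string follows.
PATTERN = re.compile(r"([^\s:,{}]+):\s*(.*?)\s*(?=,\s*[^\s:,{}]+:|\}$)")

def convert_keep_commas(example):
    dct = {}
    # Only parse strings that are wrapped in curly braces
    if example.startswith("{") and example.endswith("}"):
        for key, value in PATTERN.findall(example):
            dct[key] = value if value else None  # empty value becomes None
    return dct

print(convert_keep_commas(
    "{tvg-id: , tvg-name: Antonio, ihm schmeckt's nicht! (2016), "
    "tvg-logo: https://image.tmdb.org/t/p/w600_and_h900_bestv2/dyLfGb1mF2PUd0Rz5kqKiYtQl3r.jpg, "
    "group-title: 2017-16-15 Germany Cinema}"
))
```

With this variant a comma only ends a value when a key-like token followed by a colon comes next, so a value that itself contains ", word:" would still be split.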