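I've been trying for a while now to parse a bookmaker's website and retrieve its markets/odds.
I've gotten as far as reading the .text attribute of a Selenium WebElement, so I have something like this:
(Edited to show more examples.)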
BPZ vs Griffin - League of Legends - Challenger Korea
Sat 2/25 1511 BPZ 1.645
10:30PM 1512 Griffin 2.250
Team Battle Comics vs RisingStar Gaming - League of Legends - Challenger Korea
Sat 2/25 1513 Team Battle Comics 5.800
11:59PM 1514 RisingStar Gaming 1.133
Going In vs Hala Ares - Dota 2 - Prodota Cup
Sat 2/25 1529 Going In 1.667
1:30PM 1530 Hala Ares 2.200
Unicorns of Love vs G2 Esports - League of Legends - Intel Extreme Masters
Sat 2/25 1545 Unicorns of Love 2.750
11:15AM 1546 G2 Esports 1.444
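After literally hours of googling regex and reading up on the syntax, what I still can't manage is to extract the parts of this string that I need. If I could use regex to turn the string above into a dictionary that looks like this: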
{'event':'BPZ vs Griffin - League of Legends',
'outcome1':'BPZ',
'outcome2':'Griffin',
'outcome1odds':1.645,
'outcome2odds':2.25,
'date':'Sat 2/25',
'time':'10:30PM'}
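then I would be very happy. I'm fairly sure this is possible, but I'm having too much trouble wrapping my head around regex to get there. Any help and/or resources would be much appreciated.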
This pattern should do the trick:
(?P<event>(?P<outcome1>[^-]+?) vs (?P<outcome2>[^-]+) -.*?) -[^\b]*?(?P<date>(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d+/\d+(?:/\d+)?)[^.]*(?P<outcome1odds>\d+\.\d+)\s+(?P<time>\d+:\d+[AP]M)[^.]*(?P<outcome2odds>\d+\.\d+)
It's long, but in exchange you can use the .groupdict() method to get the desired result directly:
print(re.match(pattern, text).groupdict())
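For anyone who wants to run this against the whole scraped text rather than a single event, here is a minimal sketch. The names `parse_events` and `text` are my own, and the float conversion is an assumption based on the dictionary requested in the question (the regex itself only returns strings).

```python
import re

# Pattern from the answer above. Inside a character class, \b means the
# backspace character, so [^\b]*? lazily skips any text, newlines included.
pattern = re.compile(
    r"(?P<event>(?P<outcome1>[^-]+?) vs (?P<outcome2>[^-]+) -.*?) -[^\b]*?"
    r"(?P<date>(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun) \d+/\d+(?:/\d+)?)[^.]*"
    r"(?P<outcome1odds>\d+\.\d+)\s+(?P<time>\d+:\d+[AP]M)[^.]*"
    r"(?P<outcome2odds>\d+\.\d+)"
)

text = """BPZ vs Griffin - League of Legends - Challenger Korea
Sat 2/25 1511 BPZ 1.645
10:30PM 1512 Griffin 2.250
Team Battle Comics vs RisingStar Gaming - League of Legends - Challenger Korea
Sat 2/25 1513 Team Battle Comics 5.800
11:59PM 1514 RisingStar Gaming 1.133"""


def parse_events(raw):
    """Return one dict per event found in raw (hypothetical helper)."""
    events = []
    for m in pattern.finditer(raw):
        # Later matches start at the newline left over from the previous
        # event, so strip every captured value.
        row = {key: value.strip() for key, value in m.groupdict().items()}
        row["outcome1odds"] = float(row["outcome1odds"])
        row["outcome2odds"] = float(row["outcome2odds"])
        events.append(row)
    return events


for row in parse_events(text):
    print(row)
```

One thing to watch: because `[^.]*` is greedy, it appears to leave only a single digit in front of the decimal point, so odds of 10.0 or higher would come out truncated. Making those two skips lazy (`[^.]*?`) is one possible way around that.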
Breakdown:
(?P<event>                # in a named capture group, match...
    (?P<outcome1>         # outcome1, which is...
        [^-]+?            # all text up to...
    )
     vs                   # a literal " vs "
    (?P<outcome2>         # outcome2, which is...
        [^-]+             # all text up to...
    )
     -                    # the next literal " -"
    .*?                   # still inside the "event" group, match until...
)
 -                        # a literal " -"
[^\b]*?                   # skip forward to...
(?P<date>                 # the date, which is...
    (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)   # a weekday
     \d+/\d+(?:/\d+)?     # followed by digits separated with /
)
[^.]*                     # skip forward to...
(?P<outcome1odds>
    \d+\.\d+              # a floating point number
)
\s+
(?P<time>                 # match the time, which is...
    \d+:\d+               # digits separated with :
    [AP]M                 # followed by AM or PM
)
[^.]*                     # skip to...
(?P<outcome2odds>
    \d+\.\d+              # another floating point number
)
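The breakdown above can also be kept in the source as a commented pattern by compiling it with `re.VERBOSE`. This is only a sketch of that idea: in verbose mode unescaped whitespace is ignored, so I've written the literal spaces as `[ ]`, and the name `verbose_pattern` is my own.

```python
import re

verbose_pattern = re.compile(r"""
    (?P<event>
        (?P<outcome1> [^-]+? )     # first team: text up to...
        [ ]vs[ ]                   # ...a literal space, vs, space
        (?P<outcome2> [^-]+ )      # second team: text up to...
        [ ]-                       # ...the next space and hyphen
        .*?                        # game title, still inside the event group
    )
    [ ]-                           # the space and hyphen before the league
    [^\b]*?                        # skip forward to the date
    (?P<date>
        (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)
        [ ]\d+/\d+(?:/\d+)?        # digits separated with /
    )
    [^.]*                          # skip forward to the first odds
    (?P<outcome1odds> \d+\.\d+ )
    \s+
    (?P<time> \d+:\d+[AP]M )
    [^.]*                          # skip forward to the second odds
    (?P<outcome2odds> \d+\.\d+ )
""", re.VERBOSE)

sample = (
    "BPZ vs Griffin - League of Legends - Challenger Korea\n"
    "Sat 2/25 1511 BPZ 1.645\n"
    "10:30PM 1512 Griffin 2.250"
)
print(verbose_pattern.match(sample).groupdict())
```

It should behave the same as the one-line version; the only difference is that the comments live inside the pattern.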
A solution using re.match() together with the match.groupdict() method (which returns all named subgroups of the match):
import re

s = '''
BPZ vs Griffin - League of Legends - Challenger Korea
Sat 2/25 1511 BPZ 1.645
10:30PM 1512 Griffin 2.250
'''
# Adjacent raw strings inside the parentheses are joined into one pattern.
p = (r'^(?P<event>[\w ]+-[\w ]+)\s[\w\s-]+'
     r'(?P<date>[A-Z]\w+ \d+/\d{2})\s+\d+\s(?P<outcome1>[\w ]+)'
     r'(?P<outcome1odds>\d+\.\d+)\s+(?P<time>\d+:\d+(AM|PM))\s+\d+\s'
     r'(?P<outcome2>[\w ]+)(?P<outcome2odds>\d+\.\d+)')
matches = re.match(p, s.strip(), re.M)
# The team-name groups capture a trailing space, so strip every value.
result = {k: v.strip() for k, v in matches.groupdict().items()}
print(result)
The output:
{'time': '10:30PM', 'event': 'BPZ vs Griffin - League of Legends', 'outcome2odds': '2.250', 'date': 'Sat 2/25', 'outcome2': 'Griffin', 'outcome1': 'BPZ', 'outcome1odds': '1.645'}
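Scraped text won't always have this exact shape, and re.match() returns None when nothing matches, in which case calling .groupdict() raises an AttributeError. A small guard, sketched below, avoids that. `parse_block` is a hypothetical helper name and the non-matching string is made up for illustration.

```python
import re

# Same pattern as in the answer above, compiled once.
PATTERN = re.compile(
    r'^(?P<event>[\w ]+-[\w ]+)\s[\w\s-]+'
    r'(?P<date>[A-Z]\w+ \d+/\d{2})\s+\d+\s(?P<outcome1>[\w ]+)'
    r'(?P<outcome1odds>\d+\.\d+)\s+(?P<time>\d+:\d+(AM|PM))\s+\d+\s'
    r'(?P<outcome2>[\w ]+)(?P<outcome2odds>\d+\.\d+)',
    re.M,
)


def parse_block(block):
    """Return the parsed dict, or None when the block does not match."""
    match = PATTERN.match(block.strip())
    if match is None:
        return None
    return {k: v.strip() for k, v in match.groupdict().items()}


good = '''
BPZ vs Griffin - League of Legends - Challenger Korea
Sat 2/25 1511 BPZ 1.645
10:30PM 1512 Griffin 2.250
'''
print(parse_block(good))
print(parse_block("Odds temporarily unavailable"))  # made-up text, prints None
```

Note that `\d{2}` in the date group expects a two-digit day, so a date such as `Sat 3/4` would also fall into the None branch unless that part is loosened to `\d+`.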
With this long regex you can capture the data in 8 groups:
(.*-.*)\s-\s.*\n(\w{3})\s*(\d+/\d+)\s*\d+\s*(\w+)\s*(\d+\.?\d*)\s*\n(\d+:\d+\w\w)\s*\d+\s*(\w+)\s*(\d+\.?\d*)
Full match 0-104 `BPZ vs Griffin - League of Legends - Challenger Korea Sat 2/25 1511 BPZ 1.645
10:30PM 1512 Griffin 2.250`
Group 1. n/a `BPZ vs Griffin - League of Legends`
Group 2. n/a `Sat`
Group 3. n/a `2/25`
Group 4. n/a `BPZ`
Group 5. n/a `1.645`
Group 6. n/a `10:30PM`
Group 7. n/a `Griffin`
Group 8. n/a `2.250`
{'event':'$1',
'outcome1':'$4',
'outcome2':'$7',
'outcome1odds':$5,
'outcome2odds':$8,
'date':'$2 $3',
'time':'$6'}
https://regex101.com/r/2idfnb/2
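The `$1` ... `$8` placeholders above are regex101 substitution syntax. In Python the same mapping could be written with `match.group(n)`, roughly as in this sketch. The variable names are mine, and converting the odds to float is an assumption taken from the dictionary asked for in the question.

```python
import re

pattern = re.compile(
    r"(.*-.*)\s-\s.*\n(\w{3})\s*(\d+/\d+)\s*\d+\s*(\w+)\s*(\d+\.?\d*)\s*\n"
    r"(\d+:\d+\w\w)\s*\d+\s*(\w+)\s*(\d+\.?\d*)"
)

text = (
    "BPZ vs Griffin - League of Legends - Challenger Korea\n"
    "Sat 2/25 1511 BPZ 1.645\n"
    "10:30PM 1512 Griffin 2.250"
)

m = pattern.search(text)
# Map the numbered groups onto the keys from the question.
result = {
    "event": m.group(1),
    "outcome1": m.group(4),
    "outcome2": m.group(7),
    "outcome1odds": float(m.group(5)),
    "outcome2odds": float(m.group(8)),
    "date": f"{m.group(2)} {m.group(3)}",
    "time": m.group(6),
}
print(result)
```

As written, `(\w+)` only matches a single word, so multi-word team names such as Team Battle Comics would probably need a broader class along the lines of `[\w ]+?`.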