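I'm having trouble getting my regex to match the other forms a date range can take. The end goal is to extract each group so that I can build ISO 8601 dates.
Test cases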
May 8th – 14th, 2019
November 25th – December 2nd
November 5th, 2018 – January 13th, 2019
September 17th – 23rd
Regex
(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}\s–\s(\w{3,9})\s([1-9]|[12]\d|3[01])(?:st|nd|rd|th),\s(19|20)\d{2}
I want to be able to capture every group, whether or not it is present.
For example, May 8th – 14th, 2019
Group 1 May
Group 2 8th
Group 3
Group 4
Group 5 14th
Group 6 2019
and November 5th, 2018 – January 13th, 2019
Group 1 November
Group 2 5th
Group 3 2018
Group 4 January
Group 5 13th
Group 6 2019
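To capture an empty string when a group has nothing to match, the general idea is to give the group an empty alternative: (<characters to match>|)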
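A minimal sketch of that idea, assuming Python's re module (the language and the sample input "14" are illustrative choices, not part of the original answer). A group with an empty alternative always takes part in the match and yields an empty string, while an optional group that is skipped is reported as None:

```python
import re

# Group with an empty alternative: it always participates in the match.
with_empty_alt = re.compile(r"(\d+)(th|)")
# Optional group: it may be skipped entirely.
optional_group = re.compile(r"(\d+)(th)?")

print(with_empty_alt.match("14").groups())  # ('14', '')
print(optional_group.match("14").groups())  # ('14', None)
```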
Try this:
([A-Za-z]{3,9})\s((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)\s–\s([A-Za-z]{3,9}|)\s?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, (?=19|20))?(\d{4}|)
https://regex101.com/r/4UY0WE/1/
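When capturing the month (the first group), make sure to use [A-Za-z]{3,9} rather than \w{3,9}; otherwise you may match, for example, 23rd instead of the month name. Prefer [A-Za-z] over the shorthand [A-z], because in ASCII the range A-z also covers the punctuation characters [ \ ] ^ _ ` that sit between Z and a.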
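The difference between the character classes can be checked quickly. The snippet below is an illustrative sketch in Python (the sample strings "23rd" and "May_" are assumptions chosen for the demonstration):

```python
import re

day = "23rd"
# \w also matches digits, so a day such as "23rd" passes as a "month".
print(re.fullmatch(r"\w{3,9}", day) is not None)        # True
# A letters-only class rejects it.
print(re.fullmatch(r"[A-Za-z]{3,9}", day) is not None)  # False

odd = "May_"
# The ASCII range A-z includes the underscore, so [A-z] accepts "May_".
print(re.fullmatch(r"[A-z]{3,9}", odd) is not None)     # True
print(re.fullmatch(r"[A-Za-z]{3,9}", odd) is not None)  # False
```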
Broken down:
([A-Za-z]{3,9}) # Month ("January")
\s
((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th)) # Day of month, including suffix ("23rd")
(?:, (?=19|20))? # Comma and space, if followed by year
(\d{4}|) # Year
\s–\s # En dash surrounded by whitespace
([A-Za-z]{3,9}|) # same as first line
\s?
# same as third to fifth lines:
((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
(?:, (?=19|20))?
(\d{4}|)
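The breakdown above maps naturally onto a commented pattern. The following is a sketch in Python using re.VERBOSE, not the answerer's own code: the names DATE_RANGE and TEST_CASES are hypothetical, the month class is written as [A-Za-z], and the literal space after the comma is escaped as "\ " because verbose mode ignores unescaped whitespace in the pattern.

```python
import re

# Assumption: transcribed from the breakdown above for Python's re module.
DATE_RANGE = re.compile(
    r"""
    ([A-Za-z]{3,9})                           # 1: start month
    \s
    ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))   # 2: start day, with suffix
    (?:,\ (?=19|20))?                         # comma and space, if a year follows
    (\d{4}|)                                  # 3: start year, or ""
    \s–\s                                     # en dash surrounded by whitespace
    ([A-Za-z]{3,9}|)                          # 4: end month, or ""
    \s?
    ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))   # 5: end day, with suffix
    (?:,\ (?=19|20))?
    (\d{4}|)                                  # 6: end year, or ""
    """,
    re.VERBOSE,
)

TEST_CASES = [
    "May 8th – 14th, 2019",
    "November 25th – December 2nd",
    "November 5th, 2018 – January 13th, 2019",
    "September 17th – 23rd",
]

for case in TEST_CASES:
    print(DATE_RANGE.fullmatch(case).groups())
# ('May', '8th', '', '', '14th', '2019')
# ('November', '25th', '', 'December', '2nd', '')
# ('November', '5th', '2018', 'January', '13th', '2019')
# ('September', '17th', '', '', '23rd', '')
```

Every group is always a string here, so downstream code can test for emptiness without having to handle None.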
This one saves some space by merging some of the groups.
Full regex:
([A-Za-z]{3,9}) ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))? [–-] ([A-Za-z]{3,9}\s)?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))?
Separated by group (spaces replaced with \s for readability):
1. ([A-Za-z]{3,9})
\s
2. ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
3. (?:,\s((?:19|20)\d{2}))?
\s[–-]\s
4. ([A-Za-z]{3,9}\s)?
5. ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))
6. (?:,\s((?:19|20)\d{2}))?
This approach uses no lookarounds, so it should be safe in practically any regex engine.
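For the stated end goal of building ISO 8601 dates, the captured groups could be post-processed roughly as follows. This is a sketch and not part of either answer: Python, the helper name to_iso_range, its default_year parameter, the MONTHS lookup, and the fallback rules for missing parts are all assumptions. In particular, a range with no year at all cannot be resolved from the text alone, so the caller has to supply one.

```python
import re
from datetime import date

# Pattern from the answer above; note that group 4 keeps its trailing space.
PATTERN = re.compile(
    r"([A-Za-z]{3,9}) ((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))?"
    r" [–-] "
    r"([A-Za-z]{3,9}\s)?((?:[1-9]|[12]\d|3[01])(?:st|nd|rd|th))(?:, ((?:19|20)\d{2}))?"
)

# Hypothetical lookup keyed on the first three letters of an English month name.
MONTHS = {
    abbr: number
    for number, abbr in enumerate(
        "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1
    )
}


def to_iso_range(text, default_year):
    """Hypothetical helper: return (start, end) as ISO 8601 date strings."""
    m = PATTERN.fullmatch(text)
    if m is None:
        raise ValueError(f"not a date range: {text!r}")
    month1, day1, year1, month2, day2, year2 = m.groups()
    # Assumed fallbacks: a missing end month repeats the start month; a
    # missing year is taken from the other side, then from default_year.
    month2 = (month2 or month1).strip()
    start_year = int(year1 or year2 or default_year)
    end_year = int(year2 or year1 or default_year)
    # day[:-2] drops the two-letter ordinal suffix ("st", "nd", "rd", "th").
    start = date(start_year, MONTHS[month1[:3].lower()], int(day1[:-2]))
    end = date(end_year, MONTHS[month2[:3].lower()], int(day2[:-2]))
    return start.isoformat(), end.isoformat()


print(to_iso_range("May 8th – 14th, 2019", 2019))
# ('2019-05-08', '2019-05-14')
print(to_iso_range("November 5th, 2018 – January 13th, 2019", 2019))
# ('2018-11-05', '2019-01-13')
```

Unmatched optional groups come back as None with this pattern, which is why the fallbacks use "or". A range that crosses a year boundary without stating either year would need extra logic beyond this sketch.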