Regex - Every occurrence in text

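I am trying to parse weather alerts from NOAA/NWS, and I must admit that my regex isn't great. I am struggling with one concept.

Basically, the alerts force a fixed number of characters per line, so there are a lot of line breaks that I want to ignore so that I can pull the "concepts" out of the text.

What I want to do is pull out all of the * items as separate items that I can format later as bullets.

It appears the bullets are meant to break them up. So I tried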

/\* (?:\X*)(?: \n){2}/

This seems to select everything up to the line before "* Associated". What I want is for each grouping to be selected separately.

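Below is the example I am working with.

I am sure this is simple for someone who is better at regex, so I am hoping someone can give me some guidance. Thanks.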

...FLASH FLOOD WATCH REMAINS IN EFFECT UNTIL 7 AM AKDT THIS 
MORNING... 
The Flash Flood Watch continues for: 

* Including the following area, Cape Decision to Salisbury Sound 
Coastal Area. 

* Until 7 AM AKDT this morning. 

* The low tracked inland over the northeast gulf coast early 
Saturday morning. This storm produced 48 hour rain amounts ranging 
from 2 to 4 inches with possible localized higher amounts. Locally 
heavy showers will spread onshore through Saturday morning. 
Antecedent soil moisture values are similar to last October which 
produced landslides in the Sitka area. 

* Associated with the moderate to heavy rainfall, high winds will 
increase the risk of isolated landslides in prone areas late 
Friday afternoon into Friday night. There will be sharp rises on 
small streams through Friday with potential flooding by Saturday 
morning.

You can use the regular expression

(?<=^\* )(?:[^\n]|\n(?! *\n))*

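with the multiline option invoked, so that ^ and $ match the beginning and end of each line rather than (the default) the beginning and end of the string.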
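As a rough illustration, here is how that expression might be applied in Python. The language choice, the abbreviated sample text, and the helper name `extract_bullets` are my own assumptions and not part of the original answer; Python's `re.MULTILINE` flag plays the role of the multiline option.

```python
import re

# Abbreviated sample alert; the trailing spaces before newlines mimic the NWS text.
ALERT = (
    "The Flash Flood Watch continues for: \n"
    "\n"
    "* Including the following area, Cape Decision to Salisbury Sound \n"
    "Coastal Area. \n"
    "\n"
    "* Until 7 AM AKDT this morning. \n"
    "\n"
    "* There will be sharp rises on \n"
    "small streams through Friday with potential flooding by Saturday \n"
    "morning."
)

# The regex from the answer, with re.MULTILINE so ^ matches at each line start.
BULLET = re.compile(r"(?<=^\* )(?:[^\n]|\n(?! *\n))*", re.MULTILINE)


def extract_bullets(text):
    """Return each '* ' item as one string, with internal line breaks collapsed."""
    # split() + join() folds every run of spaces/newlines into a single space
    # and drops the trailing space that precedes each blank line.
    return [" ".join(match.split()) for match in BULLET.findall(text)]


for item in extract_bullets(ALERT):
    print("-", item)
```

Collapsing the whitespace afterwards is optional; it is only there because the stated goal was to reformat the items as bullets later.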


Consider the string

* Including the following area, Cape Decision to Salisbury Sound 
Coastal Area.

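To make all the spaces and newlines in the string visible, I have converted spaces to underscores and newlines to octothorpes, also known as hash or pound signs ('#'):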

*_Including_the_following_area,_Cape_Decision_to_Salisbury_Sound_#Coastal_Area._##

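Note the space and newline after "Sound", and the space before the two consecutive newlines at the end. Evidently, we must account for these in the regular expression.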
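The same visualization can be reproduced with a couple of string replacements. This is only a small sketch; the helper name `show_whitespace` is hypothetical.

```python
def show_whitespace(text):
    """Make spaces and newlines visible: space -> '_', newline -> '#'."""
    return text.replace(" ", "_").replace("\n", "#")


sample = (
    "* Including the following area, Cape Decision to Salisbury Sound \n"
    "Coastal Area. \n"
    "\n"
)
print(show_whitespace(sample))
```

Running something like this on the raw alert text is a quick way to check whether the "blank" lines are truly empty or contain stray spaces.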

We can write the regular expression in free-spacing mode to make it self-documenting. I will use generic syntax¹.

/
(?<=        # begin a positive lookbehind
  ^         # match the beginning of a line
  \*        # match '*'
  \         # match a space (escaped space character)
)           # end positive lookbehind
(?:         # begin non-capture group
  [^\n]     # match any character other than a newline
  |         # or
  \n        # match a newline
  (?!       # begin negative lookahead
    \ *     # match zero or more (*) spaces
    \n      # match a newline
  )         # end negative lookahead
)           # end non-capture group
*           # execute non-capture group zero or more times
/xm         # invoke free-spacing and multiline modes
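For comparison, a free-spacing version might look like this in Python, where `re.VERBOSE` corresponds to free-spacing mode and `re.MULTILINE` to multiline mode. This is a sketch under the assumption that Python is the target language; the spaces are protected with a character class (`[ ]`) rather than a backslash, and the name `BULLET_VERBOSE` is made up for the example.

```python
import re

BULLET_VERBOSE = re.compile(
    r"""
    (?<=        # begin a positive lookbehind
      ^         # match the beginning of a line
      \*        # match a literal asterisk
      [ ]       # match a space (protected by a character class)
    )           # end positive lookbehind
    (?:         # begin non-capture group
      [^\n]     # match any character other than a newline
      |         # or
      \n        # match a newline
      (?!       # begin negative lookahead
        [ ]*    # match zero or more spaces
        \n      # match a newline
      )         # end negative lookahead
    )*          # repeat the non-capture group zero or more times
    """,
    re.VERBOSE | re.MULTILINE,
)

# Two items separated by a blank line; the first wraps onto a second line.
print(BULLET_VERBOSE.findall("* a \nb \n\n* c"))
```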

If each '* ' is to be part of the matched string, replace (?<=^\* ) with ^\* (including the trailing space).
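A minimal Python sketch of that variant, where the marker is consumed by the match instead of being asserted by a lookbehind (the sample string and the name `BULLET_WITH_MARKER` are invented for illustration):

```python
import re

# Same body as the lookbehind version, but "* " is now part of the match.
BULLET_WITH_MARKER = re.compile(r"^\* (?:[^\n]|\n(?! *\n))*", re.MULTILINE)

text = "Header: \n\n* First item \ncontinued. \n\n* Second item."
items = BULLET_WITH_MARKER.findall(text)
print(items)
```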

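¹ Note that I have escaped the two space characters. That is because, when a regular expression is parsed in free-spacing mode, all unprotected whitespace (and comments) is stripped before the expression is evaluated. Escaping a space protects it, though when the expression is written in free-spacing mode it may not be obvious to the reader that the space is there. Depending on the language, I prefer to protect spaces in a more visible way. In Ruby, for example, there are several options, one of which is to put the space in a character class ([ ]). Incidentally, if one wishes to match spaces, one should match against a space character rather than a whitespace character (\s). When only spaces are to be matched, the fact that a newline is a whitespace character can be especially troublesome.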
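Both points in the footnote can be checked quickly. The snippet below uses Python's `re.VERBOSE` as a stand-in for free-spacing mode, which is my assumption (the footnote itself mentions Ruby); the variable names are illustrative only.

```python
import re

# In verbose mode an unprotected space is stripped, so this pattern is just "ab".
stripped = re.compile(r"a b", re.VERBOSE)
# A character class protects the space, and makes it easy to see.
protected = re.compile(r"a[ ]b", re.VERBOSE)

print(bool(stripped.fullmatch("a b")))   # the space was never part of the pattern
print(bool(protected.fullmatch("a b")))  # the space is matched literally

# \s also matches newlines, so it can bridge line breaks unintentionally.
text = "one  two\nthree"
by_space = re.split(r" +", text)        # splits on spaces only
by_whitespace = re.split(r"\s+", text)  # also splits at the newline
print(by_space, by_whitespace)
```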

I may have figured it out, but I would still appreciate some comments on whether this is the right approach.

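What I did was look for the doubled line break as the terminator, but also allow for the case where the * item might be the last one, hence the "$". I also made the regex ungreedy with the /gU flags.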

\* (?:\X*)(?:(?: \n){2}|.$)

Does that make sense?
