Azure Kusto: how to use parse to get a URL from a string


let T = datatable(Id:int, Text:string)
[
1, "SomeTextSome TextSomeText: https://someurl.com/fileId/edit/12649844",
2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
];
T | parse Text with * "someurl.com" myurl ">" * | project Id, myurl 
Output
=========
Id  myurl
1   
2   /fileId/newedit/71244

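I need a way to parse the Text field and extract the URL from it. The content of the Text field is an HTML body. Using parse works if the URL id is followed by more characters, such as ">", "&quot;", or whitespace, but it does not work if the Text field ends with the URL id. The URL id is not a fixed length. If not with parse, is there another way to extract everything from someurl.com up to and including the id, regardless of whether the URL is in the middle or at the end of the string?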

If you know anything about the URL format, you can try encoding it in a regular expression and using the extract() function.

For example:

datatable(Id:int, Text:string)
[
1, "SomeTextSome TextSomeText:&nbsp;https://someurl.com/fileId/edit/12649844",
2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
]
| extend Url = extract(@"someurl.com(/\w+/\w+/\d+)", 1, Text)
Id | Text | Url
1 | SomeTextSome TextSomeText:&nbsp;https://someurl.com/fileId/edit/12649844 | /fileId/edit/12649844
2 | SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText | /fileId/newedit/71244
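
If the number of path segments is not known in advance, one possible variation is to capture everything after the host up to the first character that cannot belong to the URL. The sketch below assumes the URL is terminated by whitespace, an angle bracket, a double quote, or the end of the string; that delimiter set is an assumption about the HTML body, not something stated in the question, and may need adjusting for real data.

```kusto
// Sample rows copied from the example above.
datatable(Id:int, Text:string)
[
    1, "SomeTextSome TextSomeText:&nbsp;https://someurl.com/fileId/edit/12649844",
    2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
]
// Capture the path after the host. The character class stops the match at
// whitespace, "<", ">", or a double quote; the match also ends at end of string.
| extend Url = extract(@'someurl\.com(/[^\s<>"]+)', 1, Text)
| project Id, Url
```

For the two sample rows this yields /fileId/edit/12649844 and /fileId/newedit/71244. Note that a URL followed directly by an HTML entity such as &nbsp; would have the entity captured as well, since "&" is not in the excluded set.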

let T = datatable(Id:int, Text:string)
[
1, "SomeTextSome TextSomeText:&nbsp;https://someurl.com/fileId/edit/12649844",
2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
];
T | extend URL=extract('&nbsp;.?(https://[a-zA-Z0-9/.]+)', 1, Text)

+1 to Yoni L.'s answer. The query above also includes the https:// prefix in the result; not sure whether that is what you want.

Output

Id  URL
1   https://someurl.com/fileId/edit/12649844
2   https://someurl.com/fileId/newedit/71244
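
The pattern above only matches when "&nbsp;" appears shortly before the URL, and the first row in the question's original table has no "&nbsp;". A sketch that drops that dependency and anchors on the host instead might look like this; it assumes every URL of interest starts with https://someurl.com/ and that the path contains only letters, digits, "/" and ".".

```kusto
// Row 1 is taken from the question (no &nbsp; before the URL).
datatable(Id:int, Text:string)
[
    1, "SomeTextSome TextSomeText: https://someurl.com/fileId/edit/12649844",
    2, "SomeText SomeText&nbsp;<https://someurl.com/fileId/newedit/71244>SomeTextSomeTextSomeTextSomeText",
]
// Match the full URL, scheme included, starting at the known host.
| extend URL = extract(@"(https://someurl\.com/[a-zA-Z0-9/.]+)", 1, Text)
| project Id, URL
```

Because "." is in the allowed character set, a sentence-ending period right after the URL would be captured too; remove it from the class if the ids never contain dots.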
