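The goal is to use jq to assign parts of hrefFull to hrefSimple and hrefSubsite in a JSON file. There may be a better way to do this, but I have been approaching it by looking for a solution that removes everything before the string articles in a key value while keeping the string itself. The JSON file contains multiple objects like the example objects below, wrapped in [ at the start and ] at the end.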
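Desired results:
- hrefFull is unchanged. The strings extracted from hrefFull are used for hrefSimple and hrefSubsite.
- hrefSimple is articles and everything after it. If articles is not in the string, hrefSimple is the string after the last /. See example object 7.
- hrefSubsite is the string between https://docs.mysite.com/ and /articles...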
Example result - object 1:
{
"hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html",
"hrefSimple": "articles/page-a.html",
"hrefSubsite": "product-a"
}
Example result - object 2:
{
"hrefFull": "https://docs.mysite.com/product-b/articles/guide-b/page-b.html",
"hrefSimple": "articles/guide-b/page-b.html",
"hrefSubsite": "product-b"
}
Example result - object 3:
{
"hrefFull": "https://docs.mysite.com/product-c/articles/guide-c/section-c/page-c.html",
"hrefSimple": "articles/guide-c/section-c/page-c.html",
"hrefSubsite": "product-c"
}
Example result - object 4:
{
"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html",
"hrefSimple": "articles/page-d.html",
"hrefSubsite": "product-d/sub-product-d"
}
Example result - object 5:
{
"hrefFull": "https://docs.mysite.com/product-e/sub-product-e/articles/guide-e/page-e.html",
"hrefSimple": "articles/guide-e/page-e.html",
"hrefSubsite": "product-e/sub-product-e"
}
Example result - object 6:
{
"hrefFull": "https://docs.mysite.com/product-f/sub-product-f/articles/guide-f/section-f/page-f.html",
"hrefSimple": "articles/guide-f/section-f/page-f.html",
"hrefSubsite": "product-f/sub-product-f"
}
Example result - object 7:
{
"hrefFull": "https://docs.mysite.com/product-g/index.html",
"hrefSimple": "index.html",
"hrefSubsite": "product-g"
}
Failed attempt (in a Bash script):
siteUrl="docs.mysite.com"
jq '
(.hrefSimple = .hrefFull)
| .hrefSimple |= (gsub("https://($siteUrl)/.*?/"; ""))
| (.hrefSubsite = .hrefFull)
| .hrefSubsite |= (gsub("https://($siteUrl)/"; ""))
' file-1.json > file-2.json
The script produces both accurate and inaccurate results.
Accurate results:
- Object 1
- Object 2
- Object 3
- Object 7
Inaccurate results:
- Object 4:
  hrefSimple is incorrectly sub-product-d/articles/page-d.html instead of articles/page-d.html
  hrefSubsite is incorrectly sub-product-d instead of product-d/sub-product-d
- Object 5:
  hrefSimple is incorrectly sub-product-e/articles/guide-e/page-e.html instead of articles/guide-e/page-e.html
  hrefSubsite is incorrectly sub-product-e instead of product-e/sub-product-e
- Object 6:
  hrefSimple is incorrectly sub-product-f/articles/guide-f/section-f/page-f.html instead of articles/guide-f/section-f/page-f.html
  hrefSubsite is incorrectly sub-product-f instead of product-f/sub-product-f
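Other unsuccessful attempts (I can provide the exact results if that helps):
- Various iterations of articles in the form of .hrefSimple |= (gsub("https://($siteUrl)/.*?/"; "")) and .hrefSubsite |= (gsub("https://($siteUrl)/"; ""))
- Various iterations of .hrefSimple |= split("articles")[0] (also within .hrefSubsite)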
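For context, in case it matters, hrefFull comes from page views of a documentation website exported from Azure App Insights. The exported data will be used for analytics reports. I am creating hrefSimple to join two tables, and I want to filter on hrefSubsite. The paths in hrefFull result from generating the site with the DocFx static site generator and deploying it to Azure Blob.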
I would use capture with a regular expression:
. + (.hrefFull | capture(
"^https://docs.mysite.com/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
))
{
"hrefFull": "https://docs.mysite.com/product-a/articles/page-a.html",
"hrefSubsite": "product-a",
"hrefSimple": "articles/page-a.html"
}
{
"hrefFull": "https://docs.mysite.com/product-b/articles/guide-b/page-b.html",
"hrefSubsite": "product-b",
"hrefSimple": "articles/guide-b/page-b.html"
}
{
"hrefFull": "https://docs.mysite.com/product-c/articles/guide-c/section-c/page-c.html",
"hrefSubsite": "product-c",
"hrefSimple": "articles/guide-c/section-c/page-c.html"
}
{
"hrefFull": "https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html",
"hrefSubsite": "product-d/sub-product-d",
"hrefSimple": "articles/page-d.html"
}
{
"hrefFull": "https://docs.mysite.com/product-e/sub-product-e/articles/guide-e/page-e.html",
"hrefSubsite": "product-e/sub-product-e",
"hrefSimple": "articles/guide-e/page-e.html"
}
{
"hrefFull": "https://docs.mysite.com/product-f/sub-product-f/articles/guide-f/section-f/page-f.html",
"hrefSubsite": "product-f/sub-product-f",
"hrefSimple": "articles/guide-f/section-f/page-f.html"
}
{
"hrefFull": "https://docs.mysite.com/product-g/index.html",
"hrefSubsite": "product-g",
"hrefSimple": "index.html"
}
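To see what capture contributes on its own, here is a minimal sketch that runs the same pattern against a single URL string instead of a whole object. The helper name split_href is hypothetical, and the -n and -c flags are only there to keep the demo compact. The lazy .*? in hrefSubsite grows one path segment at a time until the rest of the string can be matched either as articles... or as a final segment without any slash.

```shell
# split_href is a hypothetical helper name for illustration.
# -n: do not read input; -c: print compact JSON.
split_href() {
  jq -nc --arg url "$1" '$url | capture(
    "^https://docs.mysite.com/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
  )'
}

# Nested sub-site (like object 4) and a page without articles (like object 7).
split_href 'https://docs.mysite.com/product-d/sub-product-d/articles/page-d.html'
split_href 'https://docs.mysite.com/product-g/index.html'
```

Each call prints an object holding only the two named groups, which is why the filter above merges it into the original object with . + (...).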
If your input objects are in an array, wrap this filter in map(…).
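Since the question describes the file as an array and keeps the host name in a shell variable, the pieces could be combined roughly as follows. This is a sketch under assumptions: the function name add_href_fields is hypothetical, the host is passed in with --arg and spliced into the pattern with jq string interpolation (so any dots in it act as regex wildcards), and the // {} fallback is my addition so that objects whose hrefFull does not match are kept unchanged instead of being dropped from the array.

```shell
# add_href_fields is a hypothetical helper name.
# Usage: add_href_fields SITE < input.json > output.json
add_href_fields() {
  jq --arg siteUrl "$1" '
    # capture yields nothing when the pattern does not match;
    # "// {}" then merges an empty object, leaving the entry as it was.
    map(. + (.hrefFull | capture(
      "^https://\($siteUrl)/(?<hrefSubsite>.*?)/(?<hrefSimple>articles.*|[^/]*)$"
    ) // {}))
  '
}

# With the file names from the question:
# add_href_fields docs.mysite.com < file-1.json > file-2.json
```

Without the fallback, a non-matching hrefFull would make the whole expression produce no output for that element, so map would silently leave it out of the result.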