Making an array of the links contained in a big string

I have a big string (the HTML code of a web page).

The problem is how to parse out the links to images.

I want to build an array of all the image links on that page.

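I know how to do this in Java, but I don't know how to parse strings and manipulate them in the shell. I know there are lots of tricks, and I suspect this is easy to do.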

In the end I'd like to get something like this:

    #!/bin/bash
    # Store the whole page in one variable (read would keep only the first line)
    BIG_STRING=$(curl -s "some_web_page_with_links_to_images.com")

    # Parse BIG_STRING and fill LINKS with the image links (.jpg and .png only).
    # I need this parsing step; after it, LINKS should look like this:
    LINKS=("www.asd.com/asd1.jpg" "www.asd.com/asd.jpg" "www.asd.com/asd2123.jpg")

    # Get the length of the array
    tLen=${#LINKS[@]}

    for (( i=0; i<tLen; i++ )); do
      echo "${LINKS[$i]}"
    done
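One possible way to do the missing parsing step is to let `grep -o` pull the matching attributes out of the string and then strip the attribute name with `sed`. This is only a minimal sketch: it assumes every image URL sits in a double-quoted `href="..."` or `src="..."` attribute and ends in lowercase `.jpg` or `.png`, and `extract_image_links` is a hypothetical helper name. `mapfile` needs bash 4 or newer.

```shell
#!/usr/bin/env bash
# Sketch only. Assumption: image URLs appear as href="..." or src="..."
# (double quotes) and end in lowercase .jpg or .png.

# Hypothetical helper: print one image URL per line from the HTML in $1.
extract_image_links() {
  grep -oE '(href|src)="[^"]+\.(jpg|png)"' <<< "$1" |
    sed -E 's/^(href|src)="//; s/"$//'
}

# In the real script this would be: BIG_STRING=$(curl -s "$URL")
BIG_STRING='<a href="www.asd.com/asd1.jpg">1</a> <img src="www.asd.com/asd.png"> <a href="/about.html">x</a>'

# mapfile (bash 4+) stores each output line as one array element.
mapfile -t LINKS < <(extract_image_links "$BIG_STRING")

for link in "${LINKS[@]}"; do
  echo "$link"
done
```

Looping with `"${LINKS[@]}"` avoids managing the index by hand and keeps URLs intact even if they contain spaces.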

Thanks for your replies, you've saved me days of frustration.

Why not start with the right tool? Parsing HTML is hard, especially with sed. If you have the mojo tool from the Mojolicious project, you can do this:

    mojo get http://example.com a attr href

Then just check whether each line ends in jpg, png, or whatever extension you need.
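The "check each line" step can be sketched with `grep -E` anchored at the end of the line, reading the result into a bash array. This is an illustration under assumptions: `filter_images` is a hypothetical name, matching is case-insensitive (`-i`) by choice, and the sample URLs stand in for real `mojo` output so the example runs without Mojolicious installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only stdin lines ending in .jpg or .png
# (case-insensitive, so .PNG also matches).
filter_images() {
  grep -iE '\.(jpg|png)$'
}

# Real use, assuming mojo is installed:
#   mapfile -t LINKS < <(mojo get http://example.com a attr href | filter_images)
# Sample input standing in for mojo's one-link-per-line output:
mapfile -t LINKS < <(printf '%s\n' \
  'http://example.com/a.jpg' \
  'http://example.com/page.html' \
  'http://example.com/b.PNG' | filter_images)

printf '%s\n' "${LINKS[@]}"
```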

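It's hard to offer more than an approximation. Assume that all the links of interest are in href="" attributes, that each line contains at most one href attribute, and that each link fits on a single line (I'm actually not sure whether line breaks are allowed in URLs).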

Suppose your source file is called test.html.

Under these assumptions, the following should print all the links:

    sed -n 's/.*href="\([^"]*\)".*/\1/p' test.html

To understand how it works, you should know what regular expressions are and have read a tutorial on sed (in particular, how the s (substitute) command works).
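To tie this back to the question, the sed output can be narrowed to .jpg/.png links and loaded into the LINKS array. The sketch below keeps the same assumptions (at most one href="..." per line); it writes a small hypothetical sample page to a temporary file so it runs on its own, whereas in practice the file would be the saved page (for example via `curl -s "$URL" > test.html`). `mapfile` requires bash 4 or newer.

```shell
#!/usr/bin/env bash
# Sample page (hypothetical content) in a temp file, standing in for test.html.
html=$(mktemp)
cat > "$html" <<'EOF'
<a href="www.asd.com/asd1.jpg">first</a>
<p><a href="www.asd.com/index.html">home</a></p>
<a href="www.asd.com/asd2123.png">second</a>
EOF

# sed: .*href=" skips to the attribute, \([^"]*\) captures the URL up to the
# closing quote, \1 replaces the whole line with it, -n/p prints only lines
# where a substitution happened. grep then keeps only .jpg/.png URLs.
mapfile -t LINKS < <(sed -n 's/.*href="\([^"]*\)".*/\1/p' "$html" | grep -E '\.(jpg|png)$')
rm -f "$html"

echo "found ${#LINKS[@]} image links"
printf '%s\n' "${LINKS[@]}"
```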
