Shell script to retrieve data from HTML markup

I want to use a shell script to get the value that sits between the </em> and </a> tags (4,519 in the line below). Can anyone help?

id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>

Use a grep that supports the -P flag (Perl-compatible regular expressions), such as GNU grep.

grep -Po '(?<=</em>).*(?=</a>)' file

Or

echo "id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>" | grep -Po '(?<=</em>).*(?=</a>)'
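One caveat with the pattern above: .* is greedy, so if a line contains more than one </a>, the match runs to the last one. A minimal sketch of a lazy variant follows. It assumes GNU grep built with PCRE support, and the function name value_after_em is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print only the text between </em> and the nearest </a>.
# Assumes GNU grep with PCRE support (-P).
value_after_em() {
    # \K discards what has been matched so far, so </em> is not printed.
    # .*? is lazy, so the match stops at the first </a> instead of the last.
    grep -Po '</em>\K.*?(?=</a>)'
}

# The sample line has a second </a> further along; only 4,519 is printed.
printf '%s\n' "</em>4,519</a> Mb <a>more</a>" | value_after_em
```

With the greedy pattern, the same input would print everything up to the last </a>.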

As suggested in the comments, you should not parse HTML/XML with tools like this. Use a tool or utility that is designed to parse such files.

Just use grep with the -o switch, which prints only the matching part of the line:

grep -o "</em>.*</a>" test.txt

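.* matches any number of any characters. Note that the output still includes the surrounding tags, i.e. </em>4,519</a>.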
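If only the value is wanted, the tags can be stripped in a second step. This is a minimal sketch that needs only grep -o and sed, without -P. The function name extract_between_tags is hypothetical, and [^<]* is swapped in for .* so the match cannot run past the first </a>.

```shell
#!/bin/sh
# Hypothetical helper: print the text between </em> and </a> for each match on stdin.
extract_between_tags() {
    # grep -o prints only the matched part, tags included;
    # the two sed expressions then remove the leading </em> and the trailing </a>.
    grep -o '</em>[^<]*</a>' | sed -e 's@^</em>@@' -e 's@</a>$@@'
}

line="id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>"
printf '%s\n' "$line" | extract_between_tags
```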

If your HTML string contains only one such substring, you can use a regular expression with sed:

echo "id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>" | sed -rn 's@^.*</em>(.*)</a>.*$@\1@p'

Output:

4,519
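The value comes back with a thousands separator, which gets in the way if the script needs to do arithmetic on it. As a rough sketch, and assuming the line contains at least one </em> followed by </a>, the same extraction can be done with plain shell parameter expansion and the comma dropped with tr. The function name extract_used_mb is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: extract the number between </em> and </a> from a string
# passed as the first argument, and print it without the thousands separator.
extract_used_mb() {
    s=$1
    s=${s#*"</em>"}     # drop everything up to and including the first </em>
    s=${s%%"</a>"*}     # drop the first </a> and everything after it
    printf '%s\n' "$s" | tr -d ','
}

line="id='idusedMemory' alt='graph'/></em>4,519</a> Mb / 64,309 Mb&nbsp;&nbsp;&nbsp;</td><td>"
used=$(extract_used_mb "$line")
echo "$used"              # 4519
echo $((used + 1))        # usable in shell arithmetic now
```

If the input has no </em>, the first expansion leaves the string unchanged, so a real script would want to check for that case.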

If you have something more complex, you may want to look into XML parsing in bash. For example, see here.

Hope that helps.
