I'm working on an AppleScript and I'm stuck here. Take this snippet as an example of the HTML code:
<body><div>Apple don't behave accordingly <a href = "http://apple.com">apple</a></div></body>
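What I need now is to get back the words without the HTML tags, either by deleting everything inside the angle brackets or by some other way of reformatting the HTML as plain text.
The result should be:
Apple don't behave accordingly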
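I thought I should add an extra answer because I ran into a problem. If you don't want to lose UTF-8 characters, you need:
set plain_text to do shell script "echo " & quoted form of ("<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">" & html_string) & space & "| textutil -convert txt -stdin -stdout"
You basically need to add the <meta charset="UTF-8"> meta tag so that textutil treats the input as a UTF-8 document. Note that the double quotes inside the AppleScript string have to be escaped with backslashes.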
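The one-liner above can be wrapped in a reusable handler. The following is only a minimal sketch: the handler name htmlToPlainText and its parameter are made up for illustration, and it swaps echo for printf '%s' on the assumption that the input may contain backslashes, which echo can interpret in the shell that do shell script uses.

```applescript
-- Hypothetical handler (name and parameter are illustrative, not from the answer above).
on htmlToPlainText(htmlString)
	-- Fake document header plus the charset meta tag so textutil reads the input as UTF-8.
	set header to "<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">"
	-- printf '%s' writes the text unchanged; echo may expand backslash sequences.
	set shellCommand to "printf '%s' " & quoted form of (header & htmlString) & " | /usr/bin/textutil -convert txt -stdin -stdout"
	return do shell script shellCommand
end htmlToPlainText

-- Example call with non-ASCII characters
htmlToPlainText("<body><div>Größe: 5 €</div></body>")
```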
How about using textutil?
on run -- example (don't forget to escape quotes)
	removeMarkup from "<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>"
end run

to removeMarkup from someText -- strip HTML using textutil
	set someText to quoted form of ("<!DOCTYPE HTML PUBLIC>" & someText) -- fake an HTML document header
	return (do shell script "echo " & someText & " | /usr/bin/textutil -stdin -convert txt -stdout") -- strip HTML
end removeMarkup
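The question also mentions deleting everything inside the brackets. A rough sketch of that idea in plain AppleScript, without calling a shell tool, could look like the following. The handler name stripTags is hypothetical, and the approach is deliberately naive: it does not decode entities such as &amp; and it drops any text that follows an unclosed "<".

```applescript
-- Hypothetical helper (not from the posts above): remove everything between "<" and ">".
on stripTags(someText)
	set savedDelimiters to AppleScript's text item delimiters
	-- Split at every "<"; each chunk after the first starts with a tag body.
	set AppleScript's text item delimiters to "<"
	set chunks to text items of someText
	-- From here on, splitting and joining use ">".
	set AppleScript's text item delimiters to ">"
	set plainText to ""
	repeat with i from 1 to (count chunks)
		set chunk to item i of chunks
		if i is 1 then
			-- Text before the first "<" is kept unchanged.
			set plainText to plainText & chunk
		else if (count text items of chunk) > 1 then
			-- Everything up to the first ">" is the tag; keep what follows it.
			set plainText to plainText & ((text items 2 thru -1 of chunk) as text)
		end if
	end repeat
	set AppleScript's text item delimiters to savedDelimiters
	return plainText
end stripTags

-- Example call on the snippet from the question
stripTags("<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>")
```

For real-world HTML, textutil is likely the more robust choice, since it parses the markup rather than matching brackets.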
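on findStrings(these_strings, search_string) -- return the items of these_strings that occur in search_string (case-sensitive)
	set the foundList to {}
	repeat with this_string in these_strings
		considering case
			-- this_string is a reference to the list item, so store its contents rather than the reference
			if the search_string contains this_string then set the end of the foundList to the contents of this_string
		end considering
	end repeat
	return the foundList
end findStrings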
findStrings({"List","Of","Strings","To","find..."}, "...in String to search")