AppleScript text item delimiters: "Can’t get text items"



I'm trying to get AppleScript to find some data on a website and copy it as text.

However, sometimes I get this error: error "Can’t get text items 2 thru -1 of \"correct@data.com\"." number -1728 from text items 2 thru -1 of "correct@data.com"

Any ideas?

to getInputByClass(theClass, num) -- defines a function with two inputs, theClass and num
    tell application "Safari" --tells AS that we are going to use Safari
        set input to do JavaScript "
document.getElementsByClassName('" & theClass & "')[" & num & "].innerHTML;" in document 1 -- uses JavaScript to set the variable input to the information we want
    end tell
    return input --tells the function to return the value of the variable input
end getInputByClass

-- start here
getInputByClass("spacebefore", 0)
set theText to getInputByClass("spacebefore", 0)
-- clear text
set DATA1ID to extractText(theText, "\">", "</td>")
to extractText(searchText, startText2, endText)
    set tid to AppleScript's text item delimiters
    set startText1 to "x"
    set searchText to ("x" & searchText)
    set AppleScript's text item delimiters to startText1
    set endItems to text item -1 of searchText
    set AppleScript's text item delimiters to endText
    set beginningToEnd to text item 1 of endItems
    set AppleScript's text item delimiters to startText2
    set finalText to (text items 2 thru -1 of beginningToEnd)
    set AppleScript's text item delimiters to tid
    return finalText
end extractText
-- DATA1ID Found
to getInputByClass2(theClass, num) -- defines a function with two inputs, theClass and num
    tell application "Safari" --tells AS that we are going to use Safari
        set input to do JavaScript "
document.getElementsByClassName('" & theClass & "')[" & num & "].innerHTML;" in document 1 -- uses JavaScript to set the variable input to the information we want
    end tell
    return input --tells the function to return the value of the variable input
end getInputByClass2

-- start here
getInputByClass2("inspectDSInInspector", 0)
set theText to getInputByClass2("inspectDSInInspector", 0)
-- clear text
set DATA2DSID to extractText2(theText, "</a>", "</td>")
to extractText2(searchText, startText2, endText)
    set tid to AppleScript's text item delimiters
    set startText1 to "x"
    set searchText to ("x" & searchText)
    set AppleScript's text item delimiters to startText1
    set endItems to text item -1 of searchText
    set AppleScript's text item delimiters to endText
    set beginningToEnd to text item 1 of endItems
    set AppleScript's text item delimiters to startText2
    set finalText to (text items 2 thru -1 of beginningToEnd)
    set AppleScript's text item delimiters to tid
    return finalText
end extractText2

set finalResult to "DATA2DSID: " & DATA2DSID & "
DATA1ID: " & DATA1ID
set the clipboard to finalResult

tell application "System Events" to keystroke "v" using command down

Update:

<td class="inspectDATAInInspector"><a href="/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0"></a>48784745</td>

"href="/WebObjects/DATA.DATA/DT/DDDDTTAAAADDTA/0.1.0">48784745" is not fixed data and can vary; what I need is the random number at the end, in this case 48784745.

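The script I made works, but occasionally I get this error message. I think it may be because I have to convert the data from HTML to plain text, or something like that.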

A common solution, which also checks whether the source text contains both tags:

set sourceText to "<td class=\"inspectDATAInInspector\"><a href=\"/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0\"></a>48784745</td>"
set startTextAfterTag to "</a>"
set endTextBeforeTag to "</td>"
set startOffset to offset of startTextAfterTag in sourceText
set endOffset to offset of endTextBeforeTag in sourceText
if startOffset = 0 or endOffset = 0 or endOffset < startOffset then
    display dialog "The source text does not contain the specified tags."
    return
end if
set extractedText to extractTextBetweenTags(sourceText, startTextAfterTag, endTextBeforeTag)
on extractTextBetweenTags(theText, startTag, endTag)
    set saveTID to text item delimiters
    set text item delimiters to startTag
    set secondPart to text item 2 of theText
    set text item delimiters to endTag
    set firstPart to text item 1 of secondPart
    set text item delimiters to saveTID
    return firstPart
end extractTextBetweenTags
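
For context on the error in the question: "text items 2 thru -1" only exists when the delimiter occurs at least once in the text. In the failing case the text is just "correct@data.com", so splitting it on the start delimiter yields a single text item and the range cannot be resolved (error -1728). Below is a minimal sketch of a guard that counts the text items before indexing into them. The handler name safeExtract is hypothetical and not part of the original script.

```applescript
-- safeExtract is a hypothetical helper, not from the original post.
-- It returns missing value instead of raising error -1728 when a tag is absent.
on safeExtract(theText, startTag, endTag)
    set saveTID to AppleScript's text item delimiters
    try
        set AppleScript's text item delimiters to startTag
        set parts to text items of theText
        -- Only one item means startTag never occurred in theText.
        if (count of parts) < 2 then error "start tag not found"
        -- Rejoin everything after the first startTag (the delimiter is still startTag).
        set afterStart to (items 2 thru -1 of parts) as text
        set AppleScript's text item delimiters to endTag
        set pieces to text items of afterStart
        if (count of pieces) < 2 then error "end tag not found"
        set extracted to item 1 of pieces
    on error
        set extracted to missing value
    end try
    -- Always restore the delimiters, even after a failure.
    set AppleScript's text item delimiters to saveTID
    return extracted
end safeExtract

-- No tags are present here, so there is nothing to extract.
safeExtract("correct@data.com", "</a>", "</td>")
```

The caller can then test the result against missing value and skip the paste step when the page did not return the expected markup.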

Edit:

Suggestion #2: it captures everything between the second-to-last > and </td

set sourceText to "<td class=\"inspectDATAInInspector\"><a href=\"/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0\"></a>48784745</td>"
set startTextAfterTag to ">"
set endTextBeforeTag to "</td"
set extractedText to extractTextBetweenTags(sourceText, startTextAfterTag, endTextBeforeTag)
on extractTextBetweenTags(theText, startTag, endTag)
    set saveTID to text item delimiters
    set text item delimiters to startTag
    set secondPart to text item -2 of theText
    set text item delimiters to endTag
    set firstPart to text item 1 of secondPart
    set text item delimiters to saveTID
    return firstPart
end extractTextBetweenTags
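
To see why text item -2 is the piece that holds the number, here is a small trace using the same sample text. It is only an illustration; the variable names parts and candidate are made up for this sketch.

```applescript
set sourceText to "<td class=\"inspectDATAInInspector\"><a href=\"/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0\"></a>48784745</td>"
set saveTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to ">"
set parts to text items of sourceText
-- The sample contains four ">" characters, so parts has five items.
-- The text ends with ">", so the last item is the empty string
-- and item -2 is "48784745</td".
set candidate to item -2 of parts
set AppleScript's text item delimiters to saveTID
candidate
```

This relies on the source text ending with the closing tag, as in the sample. If the text had a different shape, item -2 would be a different piece.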

Suggestion #3: if SatImage.OSAX is installed, you can use a regular expression

set sourceText to "<td class=\"inspectDATAInInspector\"><a href=\"/WebObjects/DATA.DATA/DT/DDDDTTTAAAADDTA/0.1.0\"></a>48784745</td>"
try
    set foundText to find text ">(\\d+)</td>$" in sourceText using 1 with regexp
    set extractedText to foundText's matchResult
on error
    display dialog "The source text does not match the regex."
end try
