python 3.x - Locating a dynamic string in a text file

Question:

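Hi, I have been working hard on programming recently. I managed to get the output below from Google Speech-to-Text, but I don't know how to extract the data from this block.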

Excerpt 1:

[VoiceMain]: Initialized successfully

{"result":[]} {"result":[{"alternative":[{"transcript":"hello","confidence":0.46152416},{"transcript":"how-lo"}]}]}

[VoiceMain]: Initialized successfully

{"result":[]} {"result":[{"alternative":[{"transcript":"hello"},{"transcript":"how long"}]}]}

Goal:

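My goal is to extract the string "hello" (without quotes) from the first transcript of each block and assign it to a variable. The problem is that I don't know in advance what the phrase will be: instead of "hello" it could be a string of any length. Even when it is a different string, I still want it assigned to the same variable that "hello" would have been assigned to.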

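I would also like to extract the number that follows the word "confidence", which in this case is 0.46152416. The data type of the confidence variable doesn't matter. The confidence value seems harder to extract because it may or may not be present. If it is absent, it should be ignored; if it is present, it must be detected and stored in a variable.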

Also note that this block of text is stored in a file named "CurlOutput.txt".

Many thanks for any help or suggestions on this problem.

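Answer:

You could do this with a regex, but I assume you will later want to use the result as a dict in your code. So here is a Pythonic way to load this output as a dictionary.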

import json

with open('CurlOutput.txt') as f:
    lines = f.read().splitlines()
    flag = '{"result":[]} '
    for line in lines:  # Loop through each line in the file
        if flag in line:  # Check whether this line carries data
            results = json.loads(line.replace(flag, ''))['result']  # Load the data as a dict and take the "result" list
            # If you just want to change the first alternative's transcript:
            # results[0]['alternative'][0]['transcript'] = 'myNewString'
            # If you want to check every alternative for transcript and confidence:
            for result in results[0]['alternative']:  # Loop over each alternative
                transcript = result['transcript']
                confidence = result.get('confidence')  # None when this alternative has no confidence
                # Now do whatever you want with transcript and confidence.
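The loop above relies on the exact string '{"result":[]} ', including its trailing space. If that spacing is not guaranteed, one hedged alternative (a swapped-in technique, not the one used in this answer) is json.JSONDecoder.raw_decode. It parses one JSON value from the start of a string and also returns how many characters it consumed, so the concatenated objects can be read one after another. The helper names iter_json_objects and first_alternatives, and the sample text (a cleaned-up copy of the excerpt), are illustrative assumptions. The sketch keeps only the first alternative of each non-empty result, which is what the question asks for, and sets confidence to None when it is missing.

```python
import json

_decoder = json.JSONDecoder()


def iter_json_objects(text):
    """Yield every JSON object found in text, skipping non-JSON content such as log lines."""
    pos = 0
    while True:
        start = text.find('{', pos)
        if start == -1:
            return
        try:
            # raw_decode parses one JSON value and returns (value, characters consumed).
            obj, length = _decoder.raw_decode(text[start:])
        except json.JSONDecodeError:
            pos = start + 1  # Not valid JSON at this brace; try the next one.
            continue
        yield obj
        pos = start + length


def first_alternatives(text):
    """Yield (transcript, confidence) for the first alternative of each non-empty result."""
    for obj in iter_json_objects(text):
        results = obj.get('result') if isinstance(obj, dict) else None
        if results:  # Skips the empty {"result":[]} objects
            first = results[0]['alternative'][0]
            yield first['transcript'], first.get('confidence')


sample = '''[VoiceMain]: Initialized successfully
{"result":[]} {"result":[{"alternative":[{"transcript":"hello","confidence":0.46152416},{"transcript":"how-lo"}]}]}
[VoiceMain]: Initialized successfully
{"result":[]} {"result":[{"alternative":[{"transcript":"hello"},{"transcript":"how long"}]}]}
'''

for transcript, confidence in first_alternatives(sample):
    print(transcript, confidence)
# hello 0.46152416
# hello None
```

To run it on the real data, read the file first (with open('CurlOutput.txt') as f: text = f.read()) and pass text instead of sample.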

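For comparison, the regex route mentioned at the start of the answer could be sketched as follows. It assumes compact JSON with no spaces around colons or commas, transcripts without escaped quotes, and that "confidence", when present, comes directly after "transcript" as in the excerpt. Those assumptions are why the JSON-based approach is the safer choice. The function name first_transcript_regex is hypothetical.

```python
import re

# Captures the first "transcript" string and, only if it follows immediately,
# the "confidence" number. Assumes compact JSON with no escaped quotes.
PATTERN = re.compile(r'"transcript":"([^"]*)"(?:,"confidence":([0-9.eE+-]+))?')


def first_transcript_regex(line):
    """Return (transcript, confidence) for the first match in line, or (None, None)."""
    match = PATTERN.search(line)
    if match is None:
        return None, None
    transcript = match.group(1)
    # group(2) is None when the optional confidence part did not match.
    confidence = float(match.group(2)) if match.group(2) is not None else None
    return transcript, confidence


line_1 = '{"result":[]} {"result":[{"alternative":[{"transcript":"hello","confidence":0.46152416},{"transcript":"how-lo"}]}]}'
line_2 = '{"result":[]} {"result":[{"alternative":[{"transcript":"hello"},{"transcript":"how long"}]}]}'

print(first_transcript_regex(line_1))  # ('hello', 0.46152416)
print(first_transcript_regex(line_2))  # ('hello', None)
```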