How can I classify tweets with the Google Cloud Natural Language API, if that is possible at all?



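I'm trying to use the Google Cloud Natural Language API to classify tweets so that I can filter out tweets that are not relevant to my audience (which is weather-related). I understand that classifying such a small amount of text must be tricky for an AI solution, but I would expect it to at least make a guess on text like this: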

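Wind chills of zero to -5 degrees are expected in Northwestern Arkansas into North-Central Arkansas extending into portions of northern Oklahoma during the 6-9am window #arwx #okwx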

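I have tested several tweets, but only very few of them get a classification. The rest return no result (or "No categories found. Try a longer text input." if I try it through the GUI).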

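Is it pointless to hope this approach will work? Or is it possible to lower the threshold for classification? An "educated guess" from the NLP solution would be better than no filter at all. Is there an alternative solution (other than training my own NLP model)?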

Edit: To clarify:

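In the end, I am using the Google Cloud Platform Natural Language API to classify tweets. To test it, I am using the GUI (linked above). I can see that quite a few of the tweets I test (in the GUI) do not get a classification from GCP NLP, i.e. the categories are empty.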

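The state I want is for GCP NLP to provide a category guess for the tweet text instead of an empty result. I assume the NLP model drops any result with a confidence below X%. It would be interesting to know whether that threshold can be configured.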

I assume classifying tweets must have been done before. Is there another way to solve this?

Edit 2: The classifyTweet code:

async function classifyTweet(tweetText) {
    const language = require('@google-cloud/language');
    const client = new language.LanguageServiceClient({projectId, keyFilename});
    //const tweetText = "Some light snow dusted the ground this morning, adding to the intense snow fall of yesterday. Here at my Warwick station the numbers are in, New Snow 19.5cm and total depth 26.6cm. A very good snow event. Photos to be posted. #ONStorm #CANWarnON4464 #CoCoRaHSON525"
    const document = {
        content: tweetText,
        type: 'PLAIN_TEXT',
    };
    const [classification] = await client.classifyText({document});

    console.log('Categories:');
    classification.categories.forEach(category => {
        console.log(`Name: ${category.name}, Confidence: ${category.confidence}`);
    });

    return classification.categories;
}

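I dug into the current state of Cloud Natural Language, and my answer to your main question is that classifying text like yours is not possible with Natural Language as it stands. A workaround, however, is to base your categories on the output you get from analyzing the text of your inputs.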

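Given that we are not using a custom model for this and are only using the options Cloud Natural Language provides, one way to approach the problem is as follows: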

First, I adapted the code from the official samples to our needs, to explain this further:

from google.cloud import language_v1
# Note: the enums module only exists in google-cloud-language releases before 2.0.
from google.cloud.language_v1 import enums


def sample_cloud_natural_language_text(text_content):
    """
    Args:
      text_content The text content to analyze. Must include at least 20 words.
    """
    client = language_v1.LanguageServiceClient()
    type_ = enums.Document.Type.PLAIN_TEXT
    language = "en"
    document = {"content": text_content, "type": type_, "language": language}

    # Ask the API for content categories (returns nothing for many short texts).
    print("=====CLASSIFY TEXT=====")
    response = client.classify_text(document)
    for category in response.categories:
        print(u"Category name: {}".format(category.name))
        print(u"Confidence: {}".format(category.confidence))

    # Extract entities with their type, salience, metadata and mentions.
    print("=====ANALYZE TEXT=====")
    response = client.analyze_entities(document)
    for entity in response.entities:
        print(f">>>>> ENTITY {entity.name}")
        print(u"Entity type: {}".format(enums.Entity.Type(entity.type).name))
        print(u"Salience score: {}".format(entity.salience))
        for metadata_name, metadata_value in entity.metadata.items():
            print(u"{}: {}".format(metadata_name, metadata_value))
        for mention in entity.mentions:
            print(u"Mention text: {}".format(mention.text.content))
            print(u"Mention type: {}".format(enums.EntityMention.Type(mention.type).name))


if __name__ == "__main__":
    #text_content = "That actor on TV makes movies in Hollywood and also stars in a variety of popular new TV shows."
    text_content = "Wind chills of zero to -5 degrees are expected in Northwestern Arkansas into North-Central Arkansas extending into portions of northern Oklahoma during the 6-9am window"

    sample_cloud_natural_language_text(text_content)

Output

=====CLASSIFY TEXT=====
=====ANALYZE TEXT=====
>>>>> ENTITY Wind chills
Entity type: OTHER
Salience score: 0.46825599670410156
Mention text: Wind chills
Mention type: COMMON
>>>>> ENTITY degrees
Entity type: OTHER
Salience score: 0.16041776537895203
Mention text: degrees
Mention type: COMMON
>>>>> ENTITY Northwestern Arkansas
Entity type: ORGANIZATION
Salience score: 0.07702474296092987
mid: /m/02vvkn4
wikipedia_url: https://en.wikipedia.org/wiki/Northwest_Arkansas
Mention text: Northwestern Arkansas
Mention type: PROPER
>>>>> ENTITY North
Entity type: LOCATION
Salience score: 0.07702474296092987
Mention text: North
Mention type: PROPER
>>>>> ENTITY Arkansas
Entity type: LOCATION
Salience score: 0.07088913768529892
mid: /m/0vbk
wikipedia_url: https://en.wikipedia.org/wiki/Arkansas
Mention text: Arkansas
Mention type: PROPER
>>>>> ENTITY window
Entity type: OTHER
Salience score: 0.06348973512649536
Mention text: window
Mention type: COMMON
>>>>> ENTITY Oklahoma
Entity type: LOCATION
Salience score: 0.04747137427330017
wikipedia_url: https://en.wikipedia.org/wiki/Oklahoma
mid: /m/05mph
Mention text: Oklahoma
Mention type: PROPER
>>>>> ENTITY portions
Entity type: OTHER
Salience score: 0.03542650490999222
Mention text: portions
Mention type: COMMON
>>>>> ENTITY 6
Entity type: NUMBER
Salience score: 0.0
value: 6
Mention text: 6
Mention type: TYPE_UNKNOWN
>>>>> ENTITY 9
Entity type: NUMBER
Salience score: 0.0
value: 9
Mention text: 9
Mention type: TYPE_UNKNOWN
>>>>> ENTITY -5
Entity type: NUMBER
Salience score: 0.0
value: -5
Mention text: -5
Mention type: TYPE_UNKNOWN
>>>>> ENTITY zero
Entity type: NUMBER
Salience score: 0.0
value: 0
Mention text: zero
Mention type: TYPE_UNKNOWN

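As you can see, classify text was not much help (the result is empty). Once we move on to analyze text, we do get some values, and we can use them to build our own categories. The trick (and the hard work) will be building a pool of keywords that fits each category (the categories we define ourselves), which we can then use to label the data we are analyzing. As for the categories themselves, we can look at the list of currently available categories that Google provides to get an idea of what a category should look like.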
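
The keyword-pool idea could be sketched roughly as follows. This is only a minimal illustration under my own assumptions: the pools in KEYWORD_POOLS, the categorize helper, and the min_score cutoff are hypothetical names and values, not part of the Natural Language API. It sums the salience of every entity whose name appears in a category's pool, using a few entities copied (with rounded salience) from the output above.

```python
from collections import defaultdict

# Hypothetical keyword pools; category names and keywords are illustrative only.
KEYWORD_POOLS = {
    "weather": {"wind chills", "degrees", "snow", "storm", "rain", "temperature"},
    "sports": {"game", "score", "team", "season"},
}


def categorize(entities, pools=KEYWORD_POOLS, min_score=0.1):
    """Score each category by summing the salience of matching entities.

    entities: iterable of (name, salience) pairs, e.g. taken from the
              analyze_entities response.
    Returns (category, score) pairs sorted by descending score, keeping
    only categories whose score reaches min_score (an assumed cutoff).
    """
    scores = defaultdict(float)
    for name, salience in entities:
        key = name.strip().lower()
        for category, keywords in pools.items():
            if key in keywords:
                scores[category] += salience
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(category, score) for category, score in ranked if score >= min_score]


# Entity names and (rounded) salience values copied from the output above.
entities = [
    ("Wind chills", 0.468),
    ("degrees", 0.160),
    ("Northwestern Arkansas", 0.077),
    ("window", 0.063),
]
print(categorize(entities))
```

With these pools, only the "weather" category gets a score for the sample tweet, because "Wind chills" and "degrees" are the only entity names found in a pool. Exact matching on entity names is the simplest possible choice here; in practice the pools would probably need stemming or partial matching, and hashtags such as #arwx could be added as keywords as well.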

As for lowering the bar, I don't think the current version implements anything for that yet, but it could be requested from Google as a feature.
