Problem creating a key-value store in R



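I am trying to create a key-value store where the key is an entity and the value is the average sentiment score of that entity across news articles.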

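I have a data frame, news_us, containing news articles, and a list called organization1 containing the entities that a classifier identified in those articles. The first element of organization1 holds the entities identified in the article in the first row of news_us, and so on. I am trying to iterate over the organization list and create a key-value store where each key is an entity from organization1 and each value is the sentiment score of the news description that mentions that entity. My code does not change the scores in the sentiment list, and I don't know why. My first guess was that I had to use the $ operator on the sentiment list to add the values, but that did not change anything either. Here is my code so far: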

library(syuzhet)
sentiment <- list()
organization1 <- list(NULL, "US", "Bath", "Animal Crossing", "World Health Organization", 
NULL, c("Microsoft", "Facebook"))
news_us <- structure(list(title = c("Stocks making the biggest moves after hours: Bed Bath & Beyond, JC Penney, United Airlines and more - CNBC", 
"Los Angeles mayor says 'very difficult to see' large gatherings like concerts and sporting events until 2021 - CNN", 
"Bed Bath & Beyond shares rise as earnings top estimates, retailer plans to maintain some key investments - CNBC", 
"6 weeks with Animal Crossing: New Horizons reveals many frustrations - VentureBeat", 
"Timeline: How Trump And WHO Reacted At Key Moments During The Coronavirus Crisis : Goats and Soda - NPR", 
"Michigan protesters turn out against Whitmer’s strict stay-at-home order - POLITICO"
), description = c("Check out the companies making headlines after the bell.", 
"Los Angeles Mayor Eric Garcetti said Wednesday large gatherings like sporting events or concerts may not resume in the city before 2021 as the US grapples with mitigating the novel coronavirus pandemic.", 
"Bed Bath & Beyond said that its results in 2020 \"will be unfavorably impacted\" by the crisis, and so it will not be offering a first-quarter nor full-year outlook.", 
"Six weeks with Animal Crossing: New Horizons has helped to illuminate some of the game's shortcomings that weren't obvious in our first review.", 
"How did the president respond to key moments during the pandemic? And how did representatives of the World Health Organization respond during the same period?", 
"Many demonstrators, some waving Trump campaign flags, ignored organizers‘ pleas to stay in their cars and flooded the streets of Lansing, the state capital."
), name = c("CNBC", "CNN", "CNBC", "Venturebeat.com", "Npr.org", 
"Politico")), na.action = structure(c(`35` = 35L, `95` = 95L, 
`137` = 137L, `154` = 154L, `213` = 213L, `214` = 214L, `232` = 232L, 
`276` = 276L, `321` = 321L), class = "omit"), row.names = c(NA, 
6L), class = "data.frame")
i <- 0L
for (index in organization1) {
  i <- i + 1
  if (is.character(index)) { # if entity is not NULL/NA
    val <- get_sentiment(news_us$description[i], method = "afinn")
    # print(val)
    print(sentiment[[index[1]]])
    sentiment[[index[1]]] <- sentiment[[index[1]]] + val
  }
}

Here is the sentiment list after running the code block above:

$US
integer(0)
$Bath
integer(0)
$`Animal Crossing`
integer(0)
$`World Health Organization`
integer(0)
$`Apple TV`
integer(0)
$`Pittsburgh Steelers`
integer(0)

Whereas I would like it to look like:

$US
1.3
$Bath
0.3
$`Animal Crossing`
2.4
$`World Health Organization`
1.2
$`Apple TV`
-0.7
$`Pittsburgh Steelers`
0.3

The value column can have multiple values when multiple entities are identified in an article.

I'm not sure how organization1 and news_us$description are related, but maybe you meant to use it like this?

library(syuzhet)
setNames(lapply(news_us$description, get_sentiment), unlist(organization1))
#$US
#[1] 0
#$Bath
#[1] -0.4
#$`Animal Crossing`
#[1] -0.1
#$`World Health Organization`
#[1] 1.1
#$Microsoft
#[1] -0.6
#$Facebook
#[1] -1.9
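
On why the loop in the question leaves every entry empty: indexing a list with a key that does not exist yet returns NULL, and NULL plus a number is a zero-length numeric vector, so nothing is ever accumulated. Below is a minimal sketch of one way around that, collecting the scores per entity with c() and averaging at the end. The helper collect_scores, the stand-in scorer toy_score, and the sample data are all made up for illustration (toy_score only exists so the sketch runs without syuzhet), and the sketch assumes the entity list and the descriptions have the same length.

```r
# Why the loop in the question never stores a score:
sentiment <- list()
missing_entry <- sentiment[["US"]]   # NULL, because the key does not exist yet
broken_sum <- missing_entry + 1.5    # NULL + number gives numeric(0), not 1.5

# Stand-in scorer (an assumption, not a real lexicon):
# counts each "good" as +1 and each "bad" as -1.
toy_score <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  as.numeric(sum(words == "good") - sum(words == "bad"))
}

# Hypothetical helper: one score per (entity, article) pair, keyed by entity.
collect_scores <- function(entities, texts, score_fn) {
  stopifnot(length(entities) == length(texts))
  store <- list()
  for (i in seq_along(entities)) {
    ents <- entities[[i]]
    if (!is.character(ents)) next        # skip articles with no entity (NULL)
    val <- score_fn(texts[i])
    for (e in ents) {
      store[[e]] <- c(store[[e]], val)   # c(NULL, val) is just val
    }
  }
  store
}

# Made-up data, shaped like organization1 and news_us$description.
entities <- list(NULL, "US", c("Microsoft", "US"))
texts <- c("nothing here", "good good news", "bad day")

scores <- collect_scores(entities, texts, toy_score)
avg_sentiment <- vapply(scores, mean, numeric(1))  # average score per entity
print(avg_sentiment)
```

With syuzhet loaded, the scorer could be swapped for function(x) get_sentiment(x, method = "afinn"). Note that in the sample posted in the question organization1 has 7 elements while news_us has 6 rows, so the two would need to be aligned before a helper like this could be applied.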
