How to extract information from the New York Times API



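I want to extract information from the New York Times API using library(rtimes). The API call returns a list of 3, and to an R novice like me it seems to hold the information I need in a form I cannot get at.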

install.packages("rtimes")
library(rtimes)
# Here I use the key provided by the New York Times
api <- "[redacted]"
# I create an empty vector to append the required information to
mylist <- c()
# The default article API call for "Crisis"
NY_terror <- as_search(q = "Crisis",
                       begin_date = '20110101',
                       end_date = '20110201',
                       fl = c("pub_date", "headline", "keywords", "abstract", "_id"),
                       facet_field = c("section_name"),
                       key = api)
# Here I extract the data, or at least I believe so
mylist <- append(mylist, unlist(NY_terror$data))

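But I end up with only one of the required columns, "pub_date", along with the frequency counts of the corresponding keywords. How can I produce a data frame whose columns are the ones defined in fl and facet_field?

So the desired output should look like: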

id  section_name         pub_date      headline  keywords  abstract
...      Politics       2011-01-01      MAMBA      ...     ...   

I think this should get you started; you can keep adding more fields in the same way:

docs <- NY_terror$data$docs
# b: first author's last name (if given); can be expanded for multiple authors
b <- character(length(docs))
for (i in seq_along(docs)) {
  # the 5th element of the flattened byline is the last name; NA if absent
  b[i] <- as.character(unlist(docs[[i]]$byline$person))[5]
}
b
# c: publication dates (note that this name shadows base::c as a variable)
c <- character(length(docs))
for (i in seq_along(docs)) {
  c[i] <- as.character(unlist(docs[[i]]$pub_date))[1]
}
c
# d: first keyword of each article
d <- character(length(docs))
for (i in seq_along(docs)) {
  d[i] <- as.character(unlist(docs[[i]]$keywords[[1]]$value))[1]
}
d
# combine the three vectors column-wise into a character matrix
res <- cbind(b, c, d)
# a byline value of "reported" is not a name, so it is replaced by the string "NA"
res[,1] <- gsub("reported", "NA", res[,1])
res
      b           c                      d
[1,] "BOSMAN"    "2011-01-30T20:14:04Z" "Financial Crisis Inquiry Commission"    
[2,] "CHAN"      "2011-01-29T09:00:03Z" "Regulation and Deregulation of Industry"
[3,] NA          "2011-01-25T17:20:36Z" "Financial Crisis Inquiry Commission"    
[4,] "CRAIG"     "2011-01-27T14:17:32Z" "Financial Crisis Inquiry Commission"    
[5,] "MORGENSON" "2011-01-30T00:00:00Z" "Banking and Financial Institutions"     
[6,] "BOSMAN"    "2011-01-31T00:00:00Z" "FINANCIAL CRISIS INQUIRY COMMISSION"    
[7,] "CHAN"      "2011-01-25T00:00:00Z" "Subprime Mortgage Crisis"               
[8,] "NA"        "2011-01-28T09:30:54Z" "Securities and Commodities Violations"  
[9,] NA          "2011-01-25T02:15:29Z" "Justice Department"                     
[10,] "NOCERA"    "2011-01-29T00:00:00Z" "Banking and Financial Institutions"     
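
To get closer to the data frame asked for in the question, the same idea can be wrapped in a small helper that turns each document into one row. This is only a sketch: `docs` below is a hand-made stand-in for `NY_terror$data$docs`, and its field layout (`_id`, `headline$main`, `keywords[[k]]$value`, `section_name`) is an assumption modelled on the code above, not a captured API response. The helper names `or_na` and `doc_to_row` are made up for illustration.

```r
# Hypothetical stand-in for NY_terror$data$docs (layout is an assumption).
docs <- list(
  list(`_id` = "a1", pub_date = "2011-01-30T20:14:04Z",
       headline = list(main = "First headline"),
       keywords = list(list(value = "Financial Crisis Inquiry Commission"),
                       list(value = "Banking and Financial Institutions")),
       abstract = "First abstract", section_name = "Business"),
  list(`_id` = "a2", pub_date = "2011-01-29T09:00:03Z",
       headline = list(main = "Second headline"),
       keywords = list(),
       abstract = NULL, section_name = NULL)
)

# Return the first value of x as a string, or NA when the field is missing.
or_na <- function(x) {
  if (is.null(x) || length(x) == 0) NA_character_ else as.character(x)[1]
}

# Turn one document into a one-row data frame.
doc_to_row <- function(doc) {
  # collect every keyword value; an empty keyword list gives character(0)
  kw <- vapply(doc$keywords, function(k) or_na(k$value), character(1))
  data.frame(
    id           = or_na(doc[["_id"]]),
    section_name = or_na(doc$section_name),
    pub_date     = as.Date(substr(or_na(doc$pub_date), 1, 10)),
    headline     = or_na(doc$headline$main),
    keywords     = if (length(kw) > 0) paste(kw, collapse = "; ") else NA_character_,
    abstract     = or_na(doc$abstract),
    stringsAsFactors = FALSE
  )
}

# Stack the one-row data frames into a single data frame.
articles <- do.call(rbind, lapply(docs, doc_to_row))
articles
```

With real data you would replace the mock `docs` with `NY_terror$data$docs` and adjust the field names to whatever `str(NY_terror$data$docs[[1]])` actually shows. Missing fields come out as NA rather than raising an error, and multiple keywords are joined into one string so each article stays on a single row.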
