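I want to extract information from the New York Times API using library(rtimes). The API call returns a list of 3 which, to an R newbie, seems to hold the information I need in a form I cannot access.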
install.packages("rtimes")
require(rtimes)
# Here I use the key provided by the New York Times
api <- "[redacted]"
# I create an empty vector to append the required information to
mylist <- c()
# The default article API call for "Crisis"
NY_terror <- as_search(q = "Crisis",
                       begin_date = '20110101',
                       end_date = '20110201',
                       fl = c("pub_date", "headline", "keywords", "abstract", "_id"),
                       facet_field = c("section_name"),
                       key = api)
# Here I extract the data. At least I believe so
mylist <- append(mylist, unlist(NY_terror$data))
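But I end up with only one of the required columns, "pub_date", along with the frequency counts of the corresponding keywords. How can I produce a data frame whose columns are the ones defined in fl and facet_field?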
So the desired output should look like:
id    section_name    pub_date      headline    keywords    abstract
...   Politics        2011-01-01    MAMBA       ...         ...
I think this should get you started; you can keep adding more fields in the same way:
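# Authors: flatten each byline and take its 5th element,
# which held the first author's last name in this response
b <- list()
for (i in seq_along(NY_terror$data$docs)) {
  a <- as.data.frame(as.character(unlist(NY_terror$data$docs[[i]]$byline$person)))[5, 1]
  b <- rbind(b, as.character(a))
}
b <- unlist(b)
b # first author's last name (if given), can be expanded for multiple authors

# Publication dates, one per article
c <- list()
for (i in seq_along(NY_terror$data$docs)) {
  a <- as.data.frame(as.character(unlist(NY_terror$data$docs[[i]]$pub_date)))[[1]]
  c <- rbind(c, as.character(a))
}
c <- unlist(c)
c # dates

# First keyword of each article
d <- list()
for (i in seq_along(NY_terror$data$docs)) {
  a <- as.character(unlist(NY_terror$data$docs[[i]]$keywords[[1]]$value))
  d <- rbind(d, a)
}
d <- unlist(d)
d # keywords

# Combine the three vectors column-wise into a character matrix
res <- cbind(b, c, d)
# Replace the stray value "reported" in the author column with the string "NA"
res[, 1] <- gsub("reported", "NA", res[, 1])
res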
      b           c                      d
 [1,] "BOSMAN"    "2011-01-30T20:14:04Z" "Financial Crisis Inquiry Commission"
 [2,] "CHAN"      "2011-01-29T09:00:03Z" "Regulation and Deregulation of Industry"
 [3,] NA          "2011-01-25T17:20:36Z" "Financial Crisis Inquiry Commission"
 [4,] "CRAIG"     "2011-01-27T14:17:32Z" "Financial Crisis Inquiry Commission"
 [5,] "MORGENSON" "2011-01-30T00:00:00Z" "Banking and Financial Institutions"
 [6,] "BOSMAN"    "2011-01-31T00:00:00Z" "FINANCIAL CRISIS INQUIRY COMMISSION"
 [7,] "CHAN"      "2011-01-25T00:00:00Z" "Subprime Mortgage Crisis"
 [8,] "NA"        "2011-01-28T09:30:54Z" "Securities and Commodities Violations"
 [9,] NA          "2011-01-25T02:15:29Z" "Justice Department"
[10,] "NOCERA"    "2011-01-29T00:00:00Z" "Banking and Financial Institutions"
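If you want a data frame with the columns from the question rather than a character matrix, the same idea can be written with vapply() and a small helper that returns NA for missing fields. This is only a sketch: `docs` below is a made-up stand-in for NY_terror$data$docs so the code runs without an API key, the helpers `pluck_chr` and `collapse_keywords` are my own names, and I am assuming each document carries `_id`, `section_name`, `pub_date`, `headline$main`, `keywords[[k]]$value` and `abstract` (for `section_name` that probably means requesting it through fl as well, since facet_field appears to give counts rather than a per-article value).

# Hypothetical stand-in for NY_terror$data$docs (invented values, assumed shape)
docs <- list(
  list(`_id` = "doc-1",
       section_name = "Business",
       pub_date = "2011-01-30T20:14:04Z",
       headline = list(main = "Example headline one"),
       keywords = list(list(value = "Financial Crisis Inquiry Commission"),
                       list(value = "Subprime Mortgage Crisis")),
       abstract = "Example abstract."),
  # Second document deliberately lacks section_name, keywords and abstract
  list(`_id` = "doc-2",
       pub_date = "2011-01-25T02:15:29Z",
       headline = list(main = "Example headline two"),
       keywords = list())
)

# Return the first element of a field as a string, or NA if it is missing or empty
pluck_chr <- function(x) {
  if (is.null(x) || length(x) == 0) NA_character_ else as.character(x[[1]])
}

# Collapse all keyword values of one document into a single "; "-separated string
collapse_keywords <- function(kw) {
  if (length(kw) == 0) return(NA_character_)
  paste(vapply(kw, function(k) pluck_chr(k$value), character(1)), collapse = "; ")
}

# Apply one extractor function to every document, always yielding a character vector
field <- function(f) vapply(docs, f, character(1))

articles <- data.frame(
  id           = field(function(d) pluck_chr(d[["_id"]])),
  section_name = field(function(d) pluck_chr(d$section_name)),
  pub_date     = as.Date(substr(field(function(d) pluck_chr(d$pub_date)), 1, 10)),
  headline     = field(function(d) pluck_chr(d$headline$main)),
  keywords     = field(function(d) collapse_keywords(d$keywords)),
  abstract     = field(function(d) pluck_chr(d$abstract)),
  stringsAsFactors = FALSE
)
articles

With real data you would replace the mock with docs <- NY_terror$data$docs, assuming your version of rtimes returns the same nested list that the loops above rely on. Because every extractor goes through pluck_chr, articles with a missing field get a real NA instead of shifting the other columns.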