Reading filtered data from Socrata in R

Does anyone know how to filter the data by date_of_evention in a Socrata dataset at the very first import step in R, so that the read is faster?

This is what I have so far:

library(RSocrata)
library(dplyr)

token <- "n15hFiXqJU6DBItiSjA4jWD2U"
PoliceIncidents <- read.socrata("https://www.dallasopendata.com/resource/qv6i-rri7.csv", app_token = token)

# Filter police incident data from 2019 to present

PoliceIncidents2019to2020 <- PoliceIncidents %>% filter(servyr > 2018)

Here is the source data: https://www.dallasopendata.com/Public-Safety/Police-Incidents/qv6i-rri7/data

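You can put the filter in the original query so that only incidents from 2019 onward are pulled. This speeds up the read, mostly because the server response does not have to transfer as much data. You need to use the "API field name" of the column to construct the query.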

In this case:

PoliceIncidents <- read.socrata("https://www.dallasopendata.com/resource/qv6i-rri7.csv?$where=servyr > 2018")
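As a rough sketch of how such a filtered URL could be put together programmatically, the helper below appends a SoQL `$where` clause to the resource endpoint and percent-encodes it with base R's `URLencode`. The name `build_soql_url` is made up for illustration and is not part of RSocrata. It also assumes `servyr` is stored as a number; if the portal stores it as text, the value would need single quotes (`servyr > '2018'`).

```r
# Hypothetical helper (not part of RSocrata): build a SoQL-filtered URL.
build_soql_url <- function(endpoint, where) {
  # reserved = TRUE also percent-encodes spaces and characters such as ">"
  paste0(endpoint, "?$where=", utils::URLencode(where, reserved = TRUE))
}

url <- build_soql_url(
  "https://www.dallasopendata.com/resource/qv6i-rri7.csv",
  "servyr > 2018"
)
print(url)

# The resulting URL could then be handed to read.socrata(), e.g.
# PoliceIncidents <- RSocrata::read.socrata(url, app_token = token)
```

Building the clause in one place makes it easy to swap in a different API field name or cutoff year without hand-editing the URL.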

For large CSVs, I like the tidyverse package vroom. It is much faster than read_csv. With vroom, it is usually easier to ingest the whole file and filter afterwards.

library(vroom)
library(tidyverse)

# Read the full CSV export, then keep incidents from 2019 onward
df_raw <- vroom('Police_Incidents.csv')
occurence_2019 <- df_raw %>%
  filter(`Year1 of Occurrence` >= 2019)

This took only about 10 seconds.
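To try the read-everything-then-filter approach without downloading the full export, here is a small self-contained sketch. It writes a tiny made-up CSV to a temporary file (the `Incident Number` column and all row values are invented for illustration; only the `Year1 of Occurrence` header is taken from the snippet in this answer) and then applies the same vroom plus dplyr steps.

```r
library(vroom)
library(dplyr)

# Made-up sample rows standing in for the downloaded Police_Incidents.csv
sample_csv <- tempfile(fileext = ".csv")
writeLines(c(
  "Incident Number,Year1 of Occurrence",
  "A-001,2017",
  "A-002,2019",
  "A-003,2020"
), sample_csv)

# Read the whole file, then filter in memory
df_raw <- vroom(sample_csv, delim = ",")
incidents_2019_on <- df_raw %>%
  filter(`Year1 of Occurrence` >= 2019)

print(incidents_2019_on)
```

Column names containing spaces have to be wrapped in backticks inside `filter()`, as in the original snippet.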
