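My question is pretty basic, but I'm new to R and have spent several days trying to solve this without success :(
What I'm working with
This is the CoronaNet_clean data frame.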
country date_start date_end
South Africa 2020-03-22 NA
South Africa 2020-04-12 2020-06-02
Australia 2021-02-11 2020-04-12
Australia 2020-06-10 NA
United States 2020-01-01 NA
United States 2020-12-08 NA
This is the tweetgovuser data frame.
country screen_name created_at text
South Africa HealthZA 2020-12-08 The number of health care workers....
...
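What I want
I want to create a column called lockdown_dummy in tweetgovuser. This indicator/dummy variable should be built from three conditions:
- If created_at (tweetgovuser) matches date_start or date_end (CoronaNet_clean), then let lockdown_dummy = 1
- If created_at falls between date_start and date_end, then let lockdown_dummy = 1
- If neither of the above conditions holds, then let lockdown_dummy = 0
The final product should look like this: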
country screen_name created_at text lockdown_dummy
South Africa HealthZA 2020-12-08 The number.... 1
...
What I've tried
I've tried several different code chunks, but most recently I wrote this very rough, poorly written code to do it:
lockdown_dummy <- case_when(
created_at == date_start ~ 1,
created_at == date_end ~ 1,
"date_start" %<% created_at %<% "date_end" ~ 1
TRUE ~ 0
)
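Good question. Next time, try to share more data along with some import code so that it is easier for us to experiment. You can use a function such as dput() for that.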
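As a rough illustration of the dput() suggestion, the sketch below builds a small stand-in for tweetgovuser and prints the code that recreates it. The sample rows, the "ExampleGov" screen name, and the names sample_tweets and rebuilt are made up for this example; only the column names come from the question.

```r
# Hypothetical two-row sample that mirrors the tweetgovuser columns
sample_tweets <- data.frame(
  country     = c("South Africa", "Australia"),
  screen_name = c("HealthZA", "ExampleGov"),   # "ExampleGov" is invented
  created_at  = as.Date(c("2020-12-08", "2020-06-15")),
  text        = c("The number of health care workers", "Hypothetical tweet")
)

# dput() prints R code that rebuilds the object; paste that output into a question
dput(sample_tweets)

# Round trip: turn the object into text, then parse and evaluate that text again
rebuilt <- eval(parse(text = paste(deparse(sample_tweets), collapse = "\n")))

# Compare the rebuilt copy with the original
all.equal(rebuilt, sample_tweets)
```

For a large data frame, something like dput(head(tweetgovuser, 20)) is usually enough to let others reproduce the problem.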
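In your case, the first thing to do is join the two tables. Here I used left_join().
Then you only need to call ifelse() to check whether the tweet date is on or after the start date and on or before the end date.
Here is the code: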
library(tidyverse)
library(lubridate)
CoronaNet_cleaned <- read.table(header = TRUE, text = "
country date_start date_end
'South Africa' 2020-03-22 NA
'South Africa' 2020-03-22 2027-03-22
'South Africa' 2020-04-12 2020-06-02
Australia 2021-02-11 2020-04-12
Australia 2020-06-10 NA
'United States' 2020-01-01 NA
'United States' 2020-12-08 NA")
tweetgovuser <- read.table(header = TRUE, text = "
country screen_name created_at text
'South Africa' HealthZA 2020-12-08 'The number of health care workers'
")
CoronaNet_cleaned %>%
  left_join(tweetgovuser, by="country") %>%
  mutate(
    date_start = ymd(date_start), #probably not needed with real data
    date_end = ymd(date_end), #probably not needed with real data
    created_at = ymd(created_at), #probably not needed with real data
    # dummy is NA for rows with no matching tweet, and can be NA when date_end is missing
    dummy = ifelse(created_at>=date_start & created_at<=date_end, 1, 0)
  )
#> # A tibble: 7 x 7
#> country date_start date_end screen_name created_at text dummy
#> <chr> <date> <date> <chr> <date> <chr> <dbl>
#> 1 South Afr~ 2020-03-22 NA HealthZA 2020-12-08 The number of h~ NA
#> 2 South Afr~ 2020-03-22 2027-03-22 HealthZA 2020-12-08 The number of h~ 1
#> 3 South Afr~ 2020-04-12 2020-06-02 HealthZA 2020-12-08 The number of h~ 0
#> 4 Australia 2021-02-11 2020-04-12 <NA> NA <NA> NA
#> 5 Australia 2020-06-10 NA <NA> NA <NA> NA
#> 6 United St~ 2020-01-01 NA <NA> NA <NA> NA
#> 7 United St~ 2020-12-08 NA <NA> NA <NA> NA
Created on 2021-05-20 by the reprex package (v1.0.0)
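The output above has one row per policy and gives NA where date_end is missing, while the question asks for one lockdown_dummy value per tweet. A possible variation is sketched below. It rests on two assumptions that are not in the original post: a missing date_end is treated as "still in force", and a tweet counts as 1 if it falls inside any policy window for its country. The tweet_id and in_lockdown columns, the tweets_with_dummy name, and the "New Zealand" row are hypothetical and exist only for this example.

```r
library(dplyr)

# Policy table copied from the question.
# ASSUMPTION: NA in date_end means the policy has not ended yet.
CoronaNet_clean <- data.frame(
  country    = c("South Africa", "South Africa", "Australia", "Australia",
                 "United States", "United States"),
  date_start = as.Date(c("2020-03-22", "2020-04-12", "2021-02-11",
                         "2020-06-10", "2020-01-01", "2020-12-08")),
  date_end   = as.Date(c(NA, "2020-06-02", "2020-04-12", NA, NA, NA))
)

# The tweet from the question plus one invented tweet whose country has no policy rows
tweetgovuser <- data.frame(
  country     = c("South Africa", "New Zealand"),
  screen_name = c("HealthZA", "ExampleGov"),
  created_at  = as.Date(c("2020-12-08", "2020-05-01")),
  text        = c("The number of health care workers", "Hypothetical tweet")
)

tweets_with_dummy <- tweetgovuser %>%
  mutate(tweet_id = row_number()) %>%             # hypothetical key, one per tweet
  left_join(CoronaNet_clean, by = "country") %>%  # one row per tweet/policy pair
  mutate(
    in_lockdown = case_when(
      is.na(date_start) ~ FALSE,                    # country has no policy rows
      is.na(date_end)   ~ created_at >= date_start, # open-ended policy (assumption)
      TRUE              ~ created_at >= date_start & created_at <= date_end
    )
  ) %>%
  group_by(tweet_id, country, screen_name, created_at, text) %>%
  # 1 if the tweet falls inside at least one policy window, otherwise 0
  summarise(lockdown_dummy = as.integer(any(in_lockdown)), .groups = "drop") %>%
  select(-tweet_id)

tweets_with_dummy
```

The inclusive comparisons cover the "matches date_start or date_end" condition as well as the "between" condition, so a single test per policy row is enough. With many tweets per country, recent dplyr versions may warn about a many-to-many join; that is expected here because every tweet is compared against every policy row for its country.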