R: Data format when identifiers and variables are in the same column



I have run into the following (stylized) data-cleaning problem:

df <- data.frame(first_column  = c("country1", "variable1", "variable2","country2", "variable1", "variable2"),
second_column = c(NA, "15", "16", NA, "62", "63")
)
df     
#>   first_column second_column
#> 1     country1          <NA>
#> 2    variable1            15
#> 3    variable2            16
#> 4     country2          <NA>
#> 5    variable1            62
#> 6    variable2            63

Created on 2020-11-02 by the reprex package (v0.3.0)

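I am trying to convert this into a "tidy" format (i.e. long or wide), but I cannot get it to work with pivot_longer_spec or pivot_wider_spec. There seems to be very little documentation for these functions, and I am struggling to work out how to specify their arguments correctly.

Can anyone show me how to handle this, either with these functions or with others?

Many thanks.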

An alternative solution using the zoo package:

library(zoo)
library(dplyr)
df <- data.frame(first_column  = c("country1", "variable1", "variable2","country2", "variable1", "variable2"),
second_column = c(NA, "15", "16", NA, "62", "63"))
df %>% 
dplyr::mutate(COUNTRY = ifelse(is.na(second_column), first_column, NA)) %>% 
dplyr::mutate(COUNTRY = zoo::na.locf(COUNTRY)) %>% 
dplyr::filter(!is.na(second_column)) %>% 
tidyr::pivot_wider(names_from = first_column, values_from = second_column)
#> # A tibble: 2 x 3
#>   COUNTRY  variable1 variable2
#>   <chr>    <chr>     <chr>    
#> 1 country1 15        16       
#> 2 country2 62        63
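If you would rather not depend on zoo, the forward-fill step can also be done in base R. This is a minimal sketch, not part of the answer above, and it assumes (as in the example data) that the first row is always a country row. It swaps zoo::na.locf() for a cumsum() lookup: cumsum() over the "is this a header row" indicator numbers each country block 1, 1, 1, 2, 2, 2, and that number indexes the vector of country names. stats::reshape() then does the long-to-wide step.

```r
df <- data.frame(
  first_column  = c("country1", "variable1", "variable2", "country2", "variable1", "variable2"),
  second_column = c(NA, "15", "16", NA, "62", "63"),
  stringsAsFactors = FALSE  # keep character columns on R < 4.0 as well
)

# A row is a country header when second_column is NA
is_header <- is.na(df$second_column)

# cumsum() gives the block number of every row (1,1,1,2,2,2);
# use it to look up the matching country name.
# Assumes the data starts with a header row (otherwise cumsum() would yield 0).
df$country <- df$first_column[is_header][cumsum(is_header)]

# Keep only the variable rows
long <- df[!is_header, ]

# Long -> wide: one row per country, one column per variable
wide <- reshape(long,
                idvar     = "country",
                timevar   = "first_column",
                v.names   = "second_column",
                direction = "wide")

# reshape() names the new columns "second_column.variable1" etc.; tidy them up
names(wide) <- sub("^second_column\\.", "", names(wide))
rownames(wide) <- NULL
wide
```

The values stay character, as in the tidyverse answers; wrap them in as.numeric() if you need numbers.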

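This can be done as follows:

  1. The "tricky" part is getting the country identifier into a separate column. I use ifelse() conditioned on the NA values in the second column (as in @DPH's approach), then fill() on the country column, and finally filter() to remove the country rows.
  2. After that, we can simply use pivot_wider().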
library(tidyr)
library(dplyr)
df <- data.frame(first_column  = c("country1", "variable1", "variable2","country2", "variable1", "variable2"),
second_column = c(NA, "15", "16", NA, "62", "63")
)
df %>% 
mutate(country = ifelse(is.na(second_column), first_column, NA)) %>%
tidyr::fill(country) %>% 
filter(first_column != country) %>% 
tidyr::pivot_wider(names_from = "first_column", values_from = "second_column")
#> # A tibble: 2 x 3
#>   country  variable1 variable2
#>   <chr>    <chr>     <chr>    
#> 1 country1 15        16       
#> 2 country2 62        63
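Since the question asks specifically about the *_spec functions, here is a hedged sketch of how the final step could be written with tidyr's build_wider_spec() and pivot_wider_spec() instead of pivot_wider(). It assumes tidyr >= 1.0.0, where both functions are exported. The spec is an ordinary data frame with a .name column (the names of the new columns), a .value column (the column the cell values come from), and one column per names_from variable. Editing .name before pivoting is one way to rename the output columns.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  first_column  = c("country1", "variable1", "variable2", "country2", "variable1", "variable2"),
  second_column = c(NA, "15", "16", NA, "62", "63"),
  stringsAsFactors = FALSE
)

# Same preparation as in the answer: country column, fill down, drop country rows
long <- df %>%
  mutate(country = ifelse(is.na(second_column), first_column, NA)) %>%
  fill(country) %>%
  filter(!is.na(second_column))

# Build the spec: one row per new column
# (.name = new column name, .value = "second_column", first_column = its key)
spec <- build_wider_spec(long, names_from = first_column, values_from = second_column)
spec

# Columns not mentioned in the spec (here: country) become the id columns
wide <- pivot_wider_spec(long, spec)
wide
```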
