I have run into the following (stylized) data-cleaning problem:
df <- data.frame(first_column = c("country1", "variable1", "variable2","country2", "variable1", "variable2"),
second_column = c(NA, "15", "16", NA, "62", "63")
)
df
#> first_column second_column
#> 1 country1 <NA>
#> 2 variable1 15
#> 3 variable2 16
#> 4 country2 <NA>
#> 5 variable1 62
#> 6 variable2 63
Created on 2020-11-02 by the reprex package (v0.3.0)
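I am trying to turn this into a "tidy" format (i.e. long or wide), but I cannot get it to work with pivot_longer_spec and pivot_wider_spec. Documentation on these functions seems sparse, and I am struggling to work out how to specify the arguments correctly.
Can someone show me how to use these functions, or others, to handle this?
Many thanks.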
An alternative solution using the zoo package:
library(zoo)
library(dplyr)
df <- data.frame(first_column = c("country1", "variable1", "variable2","country2", "variable1", "variable2"),
second_column = c(NA, "15", "16", NA, "62", "63"))
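df %>%
# On the country rows (NA in second_column), copy the country name into a new column
dplyr::mutate(COUNTRY = ifelse(is.na(second_column), first_column, NA)) %>%
# Carry the last seen country name forward onto the variable rows below it
dplyr::mutate(COUNTRY = zoo::na.locf(COUNTRY)) %>%
# Drop the country header rows, which carry no values
dplyr::filter(!is.na(second_column)) %>%
# One row per country, one column per variable
tidyr::pivot_wider(names_from = first_column, values_from = second_column)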
#> # A tibble: 2 x 3
#> COUNTRY variable1 variable2
#> <chr> <chr> <chr>
#> 1 country1 15 16
#> 2 country2 62 63
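This could be achieved like so:
- The "tricky" part is getting the country identifiers into a separate column. For that I use an ifelse conditional on the NA values in the second column (as in @DPH's approach), fill on the country column, and filter to get rid of the "country" rows. After that we can simply use pivot_wider.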
library(tidyr)
library(dplyr)
df <- data.frame(first_column = c("country1", "variable1", "variable2","country2", "variable1", "variable2"),
second_column = c(NA, "15", "16", NA, "62", "63")
)
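df %>%
# Country rows are the ones with NA in second_column
mutate(country = ifelse(is.na(second_column), first_column, NA)) %>%
# Fill the country name downwards onto the variable rows
tidyr::fill(country) %>%
# Remove the country rows themselves
filter(first_column != country) %>%
# Spread the variables into columns
tidyr::pivot_wider(names_from = "first_column", values_from = "second_column")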
#> # A tibble: 2 x 3
#> country variable1 variable2
#> <chr> <chr> <chr>
#> 1 country1 15 16
#> 2 country2 62 63
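If the long layout is wanted instead of the wide one, the same steps can be sketched without the final pivot. This is only a sketch under my own assumptions: the column names variable and value are made up for illustration, and I am assuming the values are meant to be numeric, so they are converted with as.numeric.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  first_column = c("country1", "variable1", "variable2",
                   "country2", "variable1", "variable2"),
  second_column = c(NA, "15", "16", NA, "62", "63")
)

long <- df %>%
  # Country rows are the ones with NA in second_column
  mutate(country = ifelse(is.na(second_column), first_column, NA)) %>%
  # Carry each country name down onto its variable rows
  tidyr::fill(country) %>%
  # Drop the country header rows
  filter(!is.na(second_column)) %>%
  # Assumption: the values are numeric, so convert them from character
  mutate(value = as.numeric(second_column)) %>%
  # "variable" and "value" are illustrative names, not from the question
  select(country, variable = first_column, value)

long
```

This gives one row per country and variable pair. Piping long into tidyr::pivot_wider(names_from = variable, values_from = value) should give the wide table again, this time with numeric columns.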