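I have a mixed data frame of characters, integers and factors, and I want to convert the text in it to upper case. This is a common question (e.g. here), but I can't change both characters AND factors to upper case without converting the data types. Working example below: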
library(dplyr)    # mutate_all(), mutate_if(), glimpse()
library(stringr)  # str_to_upper()

# create a three-column data frame with characters, integers and factors:
df <- data.frame(v1=letters[1:5],v2=1:5,v3=as.factor(letters[10:14]),stringsAsFactors=FALSE)
df
#   v1 v2 v3
# 1  a  1  j
# 2  b  2  k
# 3  c  3  l
# 4  d  4  m
# 5  e  5  n
glimpse(df)
# v1 <chr> "a", "b", "c", "d", "e"
# v2 <int> 1, 2, 3, 4, 5
# v3 <fct> j, k, l, m, n
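`mutate_all()` with `toupper` converts to upper case, but it also turns the integer and factor columns into character:
df_all <- mutate_all(df, toupper)  # new name, so `df` stays intact for the next attempts
glimpse(df_all)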
# v1 <chr> "A", "B", "C", "D", "E"
# v2 <chr> "1", "2", "3", "4", "5"
# v3 <chr> "J", "K", "L", "M", "N"
`mutate_if()` with `str_to_upper` works for `is.character`, but leaves the factors untouched:
df_chr <- df %>% mutate_if(is.character, str_to_upper)
glimpse(df_chr)
# v1 <chr> "A", "B", "C", "D", "E"
# v2 <int> 1, 2, 3, 4, 5
# v3 <fct> j, k, l, m, n
Adding `mutate_if()` with `str_to_upper` for `is.factor` uppercases the factor, but converts it to character:
df_fct <- df %>%
  mutate_if(is.character, str_to_upper) %>%
  mutate_if(is.factor, str_to_upper)
glimpse(df_fct)
# v1 <chr> "A", "B", "C", "D", "E"
# v2 <int> 1, 2, 3, 4, 5
# v3 <chr> "J", "K", "L", "M", "N"
Ideally, I'd like a blanket solution that keeps the data types intact and can be applied to any data frame.
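One possible shape for such a blanket solution, sketched in base R under the assumption that "preserve types" means: character columns get uppercased values, factor columns stay factors with uppercased level labels (order and unused levels kept), and every other column is returned unchanged. The name `upper_keep_types` is hypothetical, not from any package.

```r
# Hypothetical helper: uppercase text columns without changing column types.
upper_keep_types <- function(data) {
  data[] <- lapply(data, function(col) {
    if (is.factor(col)) {
      # Relabel the levels in place: codes, level order and unused levels are kept
      levels(col) <- toupper(levels(col))
      col
    } else if (is.character(col)) {
      toupper(col)
    } else {
      col  # integers, doubles, logicals, dates, ... are returned unchanged
    }
  })
  data
}

df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = as.factor(letters[10:14]),
                 stringsAsFactors = FALSE)
out <- upper_keep_types(df)
str(out)
```

Assigning with `data[] <- lapply(...)` keeps the object's class, so the sketch should work the same on a plain data frame or a tibble.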
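Building on 静心's answer and addressing Thomas Moore's follow-up question, you can also convert the column names to upper case by adding `rename_with()`:
df %>%
  mutate(across(where(is.character), str_to_upper),
         across(where(is.factor), ~ factor(str_to_upper(.x)))) %>%
  rename_with(str_to_upper)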
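For comparison, a base-R sketch of just the renaming step, applied to the question's example `df`; it touches only the names, so column contents and types stay as they were.

```r
df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = as.factor(letters[10:14]),
                 stringsAsFactors = FALSE)
# Uppercase only the column names; the columns themselves are not modified
names(df) <- toupper(names(df))
names(df)
```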
df %>%
  mutate(across(where(is.character), str_to_upper),
         across(where(is.factor), ~ factor(str_to_upper(.x))))
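`factor(str_to_upper(.x))` builds a new factor from the uppercased values, so its levels are re-sorted and any unused levels are dropped. If the original level order matters, one alternative (swapped in here, not part of the answer above) is `forcats::fct_relabel()`, which applies a function to the level labels only. A sketch assuming dplyr 1.0.0 or later, stringr and forcats are installed; the reversed level order is made up to make the difference visible.

```r
library(dplyr)
library(stringr)
library(forcats)

df <- data.frame(v1 = letters[1:5], v2 = 1:5,
                 v3 = factor(letters[10:14], levels = rev(letters[10:14])),
                 stringsAsFactors = FALSE)

out <- df %>%
  mutate(across(where(is.character), str_to_upper),
         # Relabel levels only: v3 stays a factor and keeps its reversed level order
         across(where(is.factor), ~ fct_relabel(.x, str_to_upper)))

levels(out$v3)
```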
`toupper` or `str_to_upper` changes the class to character. You have two options:
- Convert back to `factor`:
df <- df %>% mutate_if(is.character, str_to_upper)
df <- df %>% mutate_if(is.factor, ~factor(str_to_upper(.)))
str(df)
#'data.frame': 5 obs. of 3 variables:
# $ v1: chr "A" "B" "C" "D" ...
# $ v2: int 1 2 3 4 5
# $ v3: Factor w/ 5 levels "J","K","L","M",..: 1 2 3 4 5
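- Change the `levels` of the factor variables instead. This combines steps 1 and 2 above into a single call:
# factors: relabel the levels in place; characters: uppercase the values
df <- df %>% mutate_if(~is.character(.) || is.factor(.),
                       ~if (is.factor(.)) {levels(.) <- toupper(levels(.)); .} else toupper(.))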
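The practical difference between the two options shows up when a factor has a custom level order or unused levels. A small base-R illustration with a made-up factor `f`:

```r
f <- factor(c("b", "a"), levels = c("b", "a", "z"))

f1 <- factor(toupper(f))           # option 1: rebuild the factor from the values
f2 <- f
levels(f2) <- toupper(levels(f2))  # option 2: relabel the existing levels

levels(f1)  # "A" "B"      (re-sorted, unused "z" dropped)
levels(f2)  # "B" "A" "Z"  (original order and unused level kept)
```

Both give the same values; only the level set differs, which matters for things like plot ordering or model contrasts.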
Note that the `_if`, `_at` and `_all` variants have been superseded by `across()` since dplyr 1.0.0.
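As a sketch of what the second option might look like with `across()` instead of `mutate_if()` (assuming dplyr 1.0.0 or later); `relabel_upper` is a hypothetical helper name introduced only to keep the `mutate()` call readable.

```r
library(dplyr)

# Hypothetical helper: relabel factor levels in place, or uppercase character values
relabel_upper <- function(x) {
  if (is.factor(x)) {
    levels(x) <- toupper(levels(x))
    x
  } else {
    toupper(x)
  }
}

df <- data.frame(v1 = letters[1:5], v2 = 1:5, v3 = as.factor(letters[10:14]),
                 stringsAsFactors = FALSE)

df <- df %>%
  mutate(across(where(~ is.character(.x) || is.factor(.x)), relabel_upper))
str(df)
```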