R - Conditionally split a factor level into two different levels



I have a data frame, for example:

df <- data.frame(
  type = c("BND", "INV", "BND", "DEL", "TRA"),
  chrom1 = c(1, 1, 1, 1, 1),
  chrom2 = c(1, 1, 2, 1, 3),
  stringsAsFactors = TRUE  # needed in R >= 4.0 so that type is a factor
)

I want to reassign every df[df$type=='BND',] row to either INV or TRA, depending on the values in chrom1 and chrom2.

I am trying to use fct_recode from the forcats package, as follows:

library(forcats)
df$type <- ifelse(df$type=="BND", 
ifelse(df$chrom1 == df$chrom2,
fct_recode(df$type, BND="INV"),
fct_recode(df$type, BND="TRA")),
df$type)

However, this recodes my factor as numbers:

type chrom1 chrom2
1    1      1      1
2    3      1      1
3    1      1      2
4    2      1      1
5    4      1      3

This is my expected result:

type chrom1 chrom2
1    INV      1      1 # BND -> INV as chrom1==chrom2
2    INV      1      1
3    TRA      1      2 # BND -> TRA as chrom1!=chrom2
4    DEL      1      1
5    TRA      1      3

How can I split a factor level into two levels in this way?
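
A likely reason for the numeric output is that ifelse() drops attributes such as factor levels, so a factor passed as the yes or no argument comes back as its underlying integer codes. The sketch below assumes type is a factor and uses standalone vectors that mirror the columns of df; converting to character before calling ifelse() and rebuilding the factor afterwards is one way around the problem, not necessarily the only one.

```r
# Assumption: type is stored as a factor, as in the question
type   <- factor(c("BND", "INV", "BND", "DEL", "TRA"))
chrom1 <- c(1, 1, 1, 1, 1)
chrom2 <- c(1, 1, 2, 1, 3)

# Passing a factor to ifelse() yields its integer codes, not its labels
codes <- ifelse(type == "BND", NA, type)
print(codes)

# Working on character values keeps the labels; rebuild the factor at the end
fixed <- factor(ifelse(type == "BND",
                       ifelse(chrom1 == chrom2, "INV", "TRA"),
                       as.character(type)))
print(fixed)
```

Rebuilding the factor with factor() also drops the now unused BND level.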

You can also use case_when():

library(tidyverse)
df %>% 
mutate(type = as.factor(case_when(
type == 'BND' & chrom1 == chrom2 ~ 'INV', 
type == 'BND' & chrom1 != chrom2 ~ 'TRA',
TRUE  ~ as.character(type))))

Data:

df <- data.frame(
type = c("BND", "INV", "BND", "DEL", "TRA"),
chrom1 = c(1, 1, 1, 1, 1),
chrom2 = c(1, 1, 2, 1, 3)
)

My approach is as follows: (1) index the rows to change, (2) apply an ifelse statement. I hope this helps:

df <- data.frame(
type = c("BND", "INV", "BND", "DEL", "TRA"),
chrom1 = c(1, 1, 1, 1, 1),
chrom2 = c(1, 1, 2, 1, 3)
)
# (1) index the rows to change
indexBND <- which(df$type == "BND")
# (2) choose the new value for those rows only
df$type[indexBND] <- ifelse(df$chrom1[indexBND] == df$chrom2[indexBND], "INV", "TRA")
df
#   type chrom1 chrom2
# 1  INV      1      1
# 2  INV      1      1
# 3  TRA      1      2
# 4  DEL      1      1
# 5  TRA      1      3

Cheers!

For completeness, here is also a concise data.table solution:

library(data.table)
setDT(df)[type == "BND" & chrom1 == chrom2, type := "INV"][type == "BND", type := "TRA"][]
type chrom1 chrom2
1:  INV      1      1
2:  INV      1      1
3:  TRA      1      2
4:  DEL      1      1
5:  TRA      1      3

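The advantage is that type is updated by reference, i.e., without copying the whole object, and only for those rows where the condition applies.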
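
To illustrate what "by reference" means here, the following minimal sketch builds a small data.table with the same shape as df (the names dt and alias are made up for the illustration) and shows that a change made with := is visible through a second name bound to the same object, which suggests no copy was made.

```r
library(data.table)

# Hypothetical example data with the same shape as df
dt <- data.table(
  type   = c("BND", "INV", "BND", "DEL", "TRA"),
  chrom1 = c(1, 1, 1, 1, 1),
  chrom2 = c(1, 1, 2, 1, 3)
)

# Plain assignment does not copy a data.table; both names refer to one object
alias <- dt

# Update only the matching rows, in place
dt[type == "BND" & chrom1 == chrom2, type := "INV"]
dt[type == "BND", type := "TRA"]

# The change shows up through the alias as well
print(alias$type)
```

If an independent copy is needed before modifying, data.table provides copy() for that purpose.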

Or simply:

df$type[df$type == "BND"] <- with(df, 
ifelse(df[type == "BND", ]$chrom1 == 
df[type == "BND", ]$chrom2,
"INV", "TRA"))
> df
type chrom1 chrom2
1  INV      1      1
2  INV      1      1
3  TRA      1      2
4  DEL      1      1
5  TRA      1      3
