String comparison between 2 adjacent rows of a data frame in R



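I have a data frame with 627 observations and 16 variables. I am working with a column named "ZoneDivision", which contains the following factor levels: Eastern, North Eastern and South Eastern. I want to compare each row's value with the adjacent (previous) row and create a new column that is 1 if the two adjacent rows have the same zone and 0 if they differ.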

I looked at the following questions for a way out: "Match two columns in R" and "Compare row values over multiple rows (R)".

library(dplyr)
a <- c(rep("Eastern", 3), rep("North Eastern", 6), rep("South Eastern", 3))
a <- data.frame(a)
colnames(a) <- "ZoneDivision"
# comparing the zones
library(plyr)
ddply(a, .(ZoneDivision), summarize, ZoneMatching = Position(isTRUE, ZoneDivision))

Expected result:
    ZoneDivision ZoneMatching
1        Eastern           NA
2        Eastern            1
3        Eastern            1
4  North Eastern            0
5  North Eastern            1
6  North Eastern            1
7  North Eastern            1
8  North Eastern            1
9  North Eastern            1
10 South Eastern            0
11 South Eastern            1
12 South Eastern            1

Actual result:
   ZoneDivision ZoneMatching
1       Eastern           NA
2 North Eastern           NA
3 South Eastern           NA

How should I do this? Please help!
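The attempt returns NA for every zone for two reasons. ddply() with .(ZoneDivision) splits the data by zone, so summarize() produces one row per distinct zone (3 rows rather than 12). And Position(isTRUE, x) looks for the first element of x for which isTRUE() is TRUE, which never happens for a character string, so it falls back to its default nomatch value, NA. A quick base-R check of both points, using a plain character vector as a stand-in for the column:

```r
zones <- c(rep("Eastern", 3), rep("North Eastern", 6), rep("South Eastern", 3))

# isTRUE() is FALSE for any character string, so Position() finds no match
# and returns its default nomatch value, NA
pos <- Position(isTRUE, zones)
print(pos)

# Grouping by the zone itself gives one group per distinct value,
# which is why the summarized result has 3 rows instead of 12
print(length(unique(zones)))
```

In other words, the grouping removes the row order the comparison depends on; comparing each row with the previous one has to happen on the ungrouped column.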

Using base R, we can do:

as.numeric(c(NA, a$ZoneDivision[-1] == a$ZoneDivision[-nrow(a)]))
#[1] NA  1  1  0  1  1  1  1  1  0  1  1
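The same indexing comparison can be wrapped in a small reusable function. The name same_as_previous is hypothetical (it is not part of base R). It returns an integer vector and guards the zero-length case, where c(NA, ...) would otherwise return one element too many. It works the same way whether ZoneDivision is stored as character or as a factor.

```r
# same_as_previous() is a hypothetical helper, not a base R function.
# Returns 1 where an element equals the one before it, 0 where it differs,
# and NA for the first element (it has no predecessor).
same_as_previous <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  # x[-1] holds elements 2..n and x[-n] holds elements 1..(n - 1)
  c(NA_integer_, as.integer(x[-1] == x[-n]))
}

a <- data.frame(ZoneDivision = c(rep("Eastern", 3), rep("North Eastern", 6),
                                 rep("South Eastern", 3)))
a$ZoneMatching <- same_as_previous(a$ZoneDivision)
print(a)
```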

The data.table way:

library(data.table)
a <- c(rep("Eastern", 3), rep("North Eastern", 6), rep("South Eastern", 3))
dt <- as.data.table(a)
# shift() lags column a by one row (NA in the first row); compare and convert to 0/1
dt[, ZoneMatching := as.numeric(a == shift(a, 1))]

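This adds a new ZoneMatching column holding the numeric value of the logical comparison between column a and its lagged values, as generated by the shift() function.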

You can use lag() from dplyr to get it:

library(dplyr)
# lag() shifts ZoneDivision down one row, so each row is compared with the previous one
a %>%
  mutate(ZoneMatching = as.numeric(ZoneDivision == lag(ZoneDivision, 1)))
    ZoneDivision ZoneMatching
1        Eastern           NA
2        Eastern            1
3        Eastern            1
4  North Eastern            0
5  North Eastern            1
6  North Eastern            1
7  North Eastern            1
8  North Eastern            1
9  North Eastern            1
10 South Eastern            0
11 South Eastern            1
12 South Eastern            1

We can also use base R:

with(a, c(NA, +(head(ZoneDivision, -1) == tail(ZoneDivision, -1))))
#[1] NA  1  1  0  1  1  1  1  1  0  1  1
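A different technique, run-length encoding with base R's rle(), reaches the same result by working run by run instead of row by row: within each run of identical zones, the first row differs from the row before it (0) and the remaining rows match (1). Note that rle() only accepts plain atomic vectors, so a factor column has to be converted with as.character() first. This sketch assumes a non-empty input.

```r
zones <- c(rep("Eastern", 3), rep("North Eastern", 6), rep("South Eastern", 3))

# rle() does not accept factors, so convert first (a no-op for character input)
runs <- rle(as.character(zones))
# For each run: 0 for its first row (differs from the previous row), 1 for the rest
zone_matching <- unlist(lapply(runs$lengths, function(len) c(0L, rep(1L, len - 1L))))
# The very first row has no predecessor
zone_matching[1] <- NA
print(zone_matching)
```

Because rle() guarantees that adjacent runs hold different values, every run after the first correctly starts with 0.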
