考虑以下数据框架:
set.seed(42)
ID <- c(1:6)
OB <- c(rep("A",4),rep("B",2))
lat_start <- rnorm(6,42,2)
lon_start <- rnorm(6,12,2)
lat_stopp <- rnorm(6,42,2)
lon_stopp <- rnorm(6,12,2)
df <- data.frame(ID,OB,lat_start,lon_start,lat_stopp,lon_stopp)
我想将df
重新格式化为长格式,每个ID
在开始和停止坐标方面都有一行。一个简单的gather()
解决方案,例如df_wrong <- gather(df,coords,val,lat_start:lon_stopp)
显然将不起作用,因为我需要LAT/LON列保持分组。我希望长数据框架看起来像这样:
ID OB SS lat lon
1 1 A start 44.74192 15.023040
2 1 A stop 39.22228 7.119066
3 2 A start 40.87060 11.810680
4 2 A stop 41.44242 14.640227
5 3 A start 42.72626 16.036850
6 3 A stop 41.73336 11.386723
7 4 A start 43.26573 11.874570
8 4 A stop 43.27190 8.437383
9 5 B start 42.80854 14.609740
10 5 B stop 41.43149 11.656165
11 6 B start 41.78775 16.573290
12 6 B stop 36.68709 14.429349
SS
列当然可以稍后添加。任何建议都将不胜感激!
一个tidyverse
可能性可能是:
df %>%
gather(var, val, -c(ID, OB)) %>%
separate(var, c("var1", "SS")) %>%
spread(var1, val)
ID OB SS lat lon
1 1 A start 44.74192 15.023044
2 1 A stopp 39.22228 7.119066
3 2 A start 40.87060 11.810682
4 2 A stopp 41.44242 14.640227
5 3 A start 42.72626 16.036847
6 3 A stopp 41.73336 11.386723
7 4 A start 43.26573 11.874572
8 4 A stopp 43.27190 8.437383
9 5 B start 42.80854 14.609739
10 5 B stopp 41.43149 11.656165
11 6 B start 41.78775 16.573291
12 6 B stopp 36.68709 14.429349
您可以unlist
并创建重新排列的矩阵。
d <- matrix(unlist(t(df[3:6])), ncol=2, byrow=TRUE)
setNames(data.frame(do.call(cbind, lapply(list(df$ID, as.character(df$OB)),
function(x) rep(x, each=2))),
c("start", "stop"),
d), c(names(df)[1:3], "lat", "lon"))
# ID OB lat_start lat lon
# 1 1 A start 44.74192 15.023044
# 2 1 A stop 39.22228 7.119066
# 3 2 A start 40.87060 11.810682
# 4 2 A stop 41.44242 14.640227
# 5 3 A start 42.72626 16.036847
# 6 3 A stop 41.73336 11.386723
# 7 4 A start 43.26573 11.874572
# 8 4 A stop 43.27190 8.437383
# 9 5 B start 42.80854 14.609739
# 10 5 B stop 41.43149 11.656165
# 11 6 B start 41.78775 16.573291
# 12 6 B stop 36.68709 14.429349