r - How to tidy a dataset whose columns each hold several pieces of information (sample data)



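Please help me tidy my data. Thank you. There are 394 observations in total and 26 columns. The data was exported from MS Excel. A sample is shown below. In this sample there should really be only three observations/rows. In the vectors d1..d2..no and Farmer.Name, the entries on rows where v1 is NA should be removed and attached to the preceding row. d1..d2..no holds three observations per record (two dates and one unique ID number), and the same goes for the Farmer.Name vector. The sample is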

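# v1 is not defined in the sample as posted; this line is inferred from the
# description (NA on the continuation rows) and from the expected result
v1<-c(1,NA,NA,2,NA,NA,3,NA,NA)
d1..d2..no<-c("27/01/2020", "43832", "KE004421", "43832", "43832", 
"KE003443", "31/12/2019", "43832", "KE0001512")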
Farmer.Name<-c("S Jacob Gender:male","farmer type :marginal","farmer category :general", 
"J Isac Gender :Female","farmer type: large","farmer category :general",
"P Kumar Gender :Male","farmer type:small","farmer category :general")
adress<-c("k11",NA,NA,"k12",NA,NA,"k13",NA,NA)
amount<-c(25,NA,NA,25,NA,NA,32,NA,NA)
mydata<-data.frame(v1=v1, d1..d2..no=d1..d2..no, Farmer.Name=Farmer.Name, 
adress=adress, amount=amount)

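In the vectors d1..d2..no and Farmer.Name, the entries on rows where v1 is NA should be removed and attached to the preceding row. d1..d2..no holds three observations per record (two dates and one unique ID number), and the same goes for Farmer.Name. In other words, the result I expect is like the one produced by this code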

v1<-c(1,2,3)
d1<-c("27/01/2020","43832","31/12/2019")
d2<-c("43832","43832","43832")
no<-c("KE004421","KE003443","KE0001512")
Farmer.Name1<-c("S Jacob","J Isac","P Kumar")
Gender<-c("male","female","male")
farmer_type <-c("marginal","large","small")
farmer_category <-c("general", "general", "general")
adress<-c("k11","k12","k13")
amount<-c(25,25,32)
myfinaldata<-data.frame(v1=v1,d1=d1,d2=d2,no=no,
Farmer.Name1=Farmer.Name1,
farmer_type=farmer_type,
farmer_category=farmer_category,
adress=adress,amount=amount)

The result should be

  v1         d1    d2        no Farmer.Name1 farmer_type farmer_category adress amount
1  1 27/01/2020 43832  KE004421      S Jacob    marginal         general    k11     25
2  2      43832 43832  KE003443       J Isac       large         general    k12     25
3  3 31/12/2019 43832 KE0001512      P Kumar       small         general    k13     32  

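I am new to programming and to R, and I am learning from online resources. This is also my first post on this platform, so please forgive any mistakes.

I have been tinkering with the tidyverse, but I am still stuck on how to proceed.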

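Inconsistent data can be a challenge. Here is a tidyverse approach.

First, a column is added that holds the proposed column names for d1, d2 and no. This assumes the rows of each record always appear in that order.

Farmer.Name is split into two columns at the colon with separate.

The name itself is then split off in front of the word Gender.

fill carries the values shared by one individual (e.g. v1, adress, amount, Name) down onto the continuation rows.

pivot_wider widens the data, first by d1, d2 and no, and then by the other columns, including Gender, farmer_type and farmer_category.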
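Because the first step depends on every record spanning exactly three rows, it may be worth checking that before running the pipeline on the full export. The following is a minimal base R sketch, not part of the original answer; it assumes, as the question describes, that v1 is non-NA only on the first row of each record. The names record_id and rows_per_record are made up for illustration.

```r
# v1 as described in the question: a value on the first row of each record,
# NA on the continuation rows (sample values assumed for illustration)
v1 <- c(1, NA, NA, 2, NA, NA, 3, NA, NA)

# every non-NA v1 starts a new record, so a running count gives a record id
record_id <- cumsum(!is.na(v1))

# number of rows in each record; the pipeline assumes this is always 3
rows_per_record <- table(record_id)
rows_per_record

# TRUE if the three-rows-per-record assumption holds
all(rows_per_record == 3)

# ids of records that break the assumption (empty when everything is fine)
names(rows_per_record)[rows_per_record != 3]
```

If the last expression returns any ids, those records need attention before the d1/d2/no labels can be assigned by position.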

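library(tidyverse)

df1 <- mydata %>%
  # label the rows of each record as d1, d2, no (3 = number of records in the sample)
  mutate(d_var = rep(c("d1", "d2", "no"), times = 3)) %>%
  # split "key : value" at the colon
  separate(Farmer.Name, into = c("Var", "Val"), sep = ":") %>%
  # split the name off in front of "Gender"; rows without a name get NA in Name
  separate(Var, into = c("Name", "Var"), sep = "(?=Gender)", fill = "left") %>%
  mutate_at(c("Name", "Var"), trimws) %>%
  # carry the per-record values down onto the continuation rows
  fill(v1, adress, amount, Name, .direction = "down") %>%
  mutate(Var = gsub(" ", "_", Var))

# widen once by d_var and once by Var, then join the two results
df1 %>%
  pivot_wider(id_cols = c(v1, Name, adress, amount), names_from = d_var, values_from = d1..d2..no) %>%
  left_join(pivot_wider(df1, id_cols = c(v1, Name, adress, amount), names_from = Var, values_from = Val))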

Output

# A tibble: 3 x 10
     v1 Name    adress amount d1         d2    no        Gender farmer_type farmer_category
  <dbl> <chr>   <chr>   <dbl> <chr>      <chr> <chr>     <chr>  <chr>       <chr>          
1     1 S Jacob k11        25 27/01/2020 43832 KE004421  male   "marginal"  general        
2     2 J Isac  k12        25 43832      43832 KE003443  Female " large"    general        
3     3 P Kumar k13        32 31/12/2019 43832 KE0001512 Male   "small"     general
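The output above still carries a leading space in " large" and mixed capitalisation in Gender (male, Female, Male). A minimal base R sketch for cleaning those up follows; it is not part of the original answer, and the data frame res is a hypothetical stand-in for the pivoted result.

```r
# hypothetical stand-in for the two affected columns of the pivoted result
res <- data.frame(
  Gender      = c("male", "Female", "Male"),
  farmer_type = c("marginal", " large", "small"),
  stringsAsFactors = FALSE
)

# strip leading and trailing whitespace left over from the split at ":"
res$farmer_type <- trimws(res$farmer_type)

# normalise Gender to lower case so the categories compare equal
res$Gender <- tolower(trimws(res$Gender))

res
```

Lower case is an arbitrary choice here; any consistent spelling would do.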

The dates in the dataset are not stored as dates. Consider converting them after this step.
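One possible way to do that conversion is sketched below in base R. It rests on two assumptions that the question does not confirm: that the text dates are day/month/year, and that all-digit values such as "43832" are Excel serial day numbers from the 1900 date system (origin 1899-12-30). The helper parse_mixed_date is hypothetical, not an existing function.

```r
# Hypothetical helper: parse a character vector that mixes "dd/mm/yyyy" text
# with Excel serial day numbers (ASSUMED to use the 1900 date system)
parse_mixed_date <- function(x) {
  out <- rep(as.Date(NA), length(x))

  # values made only of digits are treated as Excel serial numbers
  is_serial <- grepl("^[0-9]+$", x)
  out[is_serial] <- as.Date(as.numeric(x[is_serial]), origin = "1899-12-30")

  # everything else is read as day/month/year text; unparseable values become NA
  out[!is_serial] <- as.Date(x[!is_serial], format = "%d/%m/%Y")
  out
}

d1 <- c("27/01/2020", "43832", "31/12/2019")
parse_mixed_date(d1)
# [1] "2020-01-27" "2020-01-02" "2019-12-31"
```

If the workbook used the 1904 date system instead, the origin would differ, so it is worth confirming one known date against the original spreadsheet.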

library(reshape)
# take rows 1, 4, 7 whole, then append columns 2:3 of rows 2, 5, 8 and of rows 3, 6, 9
df.new <- cbind(mydata[seq(1, nrow(mydata), 3), ], mydata[seq(2, nrow(mydata), 3), ][2:3], mydata[seq(3, nrow(mydata), 3), ][2:3])
colnames(df.new) <- c("v1", "d1", "Farmer.Name1", "adress", "amount", "d2", "farmer_type", "no", "farmer_category")
# reorder the columns to match the expected result
df.new <- df.new[c(1,2,6, 8,3, 7,9, 4,5)]

library(stringr)
# keep the text in front of " Gender" as the name
df.new$Farmer.Name1 <- word(df.new$Farmer.Name1,1,sep = " Gender")
# keep the text after the colon as the value
df.new$farmer_type <- word(df.new$farmer_type,2,sep = ":")
df.new$farmer_category <- word(df.new$farmer_category,2,sep = ":")

Final table:

> df.new
  v1         d1    d2        no Farmer.Name1 farmer_type farmer_category adress amount
1  1 27/01/2020 43832  KE004421      S Jacob    marginal         general    k11     25
4  2      43832 43832  KE003443       J Isac       large         general    k12     25
7  3 31/12/2019 43832 KE0001512      P Kumar       small         general    k13     32

P.S. I did not renumber the rows.
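If sequential row numbers are wanted, resetting the row names is enough. A minimal sketch on a stand-in data frame (df is hypothetical, built only to reproduce the 1, 4, 7 numbering):

```r
# stand-in data frame: keeping every third row leaves row names 1, 4, 7
df <- data.frame(x = 1:9)[seq(1, 9, 3), , drop = FALSE]
rownames(df)
# [1] "1" "4" "7"

# assigning NULL resets the row names to 1, 2, 3, ...
rownames(df) <- NULL
rownames(df)
# [1] "1" "2" "3"
```

The same assignment applied to df.new would renumber its rows.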
