R - Visualizing the uniqueness of fields in identifying other fields in a data set

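I have a data visualization problem. My data looks like this: {int x, int y, string a, string b, ...}

I want to visualize how well {x, y} uniquely identifies {a, b}. That is, if x and y are known, there is usually exactly one possible combination of a and b, and occasionally a few. I know this holds for my data, but I would like to show it visually. With roughly 5000 records, what is the best way to do this?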

Here are a few lines of this data (columns x, y, a, b):
2320,1190,T,a
3051,1680,i,a
3099,1495,N,v
3395,1475,C,v
3395,1475,C,c
3400,1480,C,a
3405,1615,A,a
3430,1630,1f,a
3440,1480,C1,d
3440,1640,C1,e
3450,1640,u,lk
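If you cannot paste from the clipboard (reading file = "clipboard" is Windows-specific), a minimal sketch for loading the sample rows above reproducibly is to pass them to read.table() as a string. The variable name sample_text is only illustrative; with header = FALSE the columns get the default names V1 to V4, matching x, y, a, b.

```r
# Sample rows from the question, read from a string instead of the clipboard
sample_text <- "2320,1190,T,a
3051,1680,i,a
3099,1495,N,v
3395,1475,C,v
3395,1475,C,c
3400,1480,C,a
3405,1615,A,a
3430,1630,1f,a
3440,1480,C1,d
3440,1640,C1,e
3450,1640,u,lk"

# header = FALSE gives default column names V1..V4 (x, y, a, b)
df <- read.table(text = sample_text, sep = ",", header = FALSE,
                 stringsAsFactors = FALSE)
str(df)
```

The resulting df has the same shape as the one the answer builds from the clipboard, so the rest of the answer's code should work on it unchanged.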

Maybe something like this is what you are looking for. From there you can facet the non-unique entries.

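library(ggplot2)
# Read the pasted data. file = "clipboard" works on Windows; on macOS use file = pipe("pbpaste").
# skip = 1 drops the first copied line (e.g. a caption line); use skip = 0 if only data rows are copied.
df <- read.table(file = "clipboard", sep = ",",
                 header = FALSE, skip = 1, stringsAsFactors = FALSE)
df$key <- paste(df$V1, df$V2, sep = "_")                    # Key from {x,y}; the separator keeps e.g. 3395|1475 and 339|51475 apart
Counts <- as.data.frame(xtabs(~ key, data = df))            # Number of rows for each {x,y} pair
df_merge <- merge(df, Counts, by = "key", all.x = TRUE)     # Attach each pair's count to its rows
df_merge$Unique <- ifelse(df_merge$Freq == 1, "Yes", "No")  # Does {x,y} map to a single row?
ggplot(df_merge, aes(x = V1, y = V2, color = Unique)) +     # Scatter plot of {x,y}, coloured by uniqueness
  geom_point()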
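The Freq column above counts rows per {x, y}; if the data can contain exact duplicate records, that overstates the number of distinct {a, b} combinations. A complementary sketch, assuming the columns are renamed to x, y, a, b, counts distinct {a, b} combinations per {x, y} and plots their distribution as a bar chart, which directly shows "usually 1, sometimes a few" and stays readable at around 5000 records because it has one bar per count value rather than one point per record. It uses base R (aggregate(), table(), barplot()) to avoid extra dependencies; the names combos, per_key, n_ab and dist are illustrative.

```r
# Sample rows from the question (replace with your full data set)
sample_text <- "2320,1190,T,a
3051,1680,i,a
3099,1495,N,v
3395,1475,C,v
3395,1475,C,c
3400,1480,C,a
3405,1615,A,a
3430,1630,1f,a
3440,1480,C1,d
3440,1640,C1,e
3450,1640,u,lk"
df <- read.table(text = sample_text, sep = ",", header = FALSE,
                 stringsAsFactors = FALSE)
names(df) <- c("x", "y", "a", "b")

# Drop exact duplicate {x, y, a, b} records so each combination counts once
combos <- unique(df[, c("x", "y", "a", "b")])

# One row per {x, y}: number of distinct {a, b} combinations
per_key <- aggregate(a ~ x + y, data = combos, FUN = length)
names(per_key)[3] <- "n_ab"

# Distribution: how many {x, y} keys map to 1, 2, 3, ... combinations
dist <- table(per_key$n_ab)
print(dist)
cat(sprintf("Share of {x, y} keys with a single {a, b}: %.1f%%\n",
            100 * mean(per_key$n_ab == 1)))

barplot(dist,
        xlab = "Distinct {a, b} combinations per {x, y}",
        ylab = "Number of {x, y} keys",
        main = "How well {x, y} identifies {a, b}")
```

On the eleven sample rows, only {3395, 1475} maps to two combinations, so the chart has one tall bar at 1 and one short bar at 2. If you prefer ggplot2, per_key can be plotted with geom_bar() in the same way.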
