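I have to perform association rule mining in R, and I found an example here: http://www.salemmarafi.com/code/market-basket-analysis-with-r/
In this example they work with data(Groceries), but they also provide the original dataset, groceries.csv: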
structure(list(chocolate = structure(c(9L, 13L, 1L, 8L, 16L,
2L, 14L, 11L, 7L, 15L, 17L, 5L, 10L, 4L, 3L, 6L, 2L, 18L, 12L
), .Label = c("bottled water", "canned beer", "chicken,citrus fruit,tropical fruit,root vegetables,whole milk,frozen fish,rollsbuns",
"chicken,pip fruit,other vegetables,whole milk,dessert,yogurt,whippedsour cream,rollsbuns,pasta,soda,waffles",
"citrus fruit,pip fruit,root vegetables,other vegetables,whole milk,cream cheese ,domestic eggs,brown bread,margarine,baking powder,waffles",
"frankfurter,citrus fruit,onions,other vegetables,whole milk,rollsbuns,sugar,soda",
"frankfurter,rollsbuns,bottled water,fruitvegetable juice,hygiene articles",
"frankfurter,sausage,butter,whippedsour cream,rollsbuns,margarine,spices",
"fruitvegetable juice", "hamburger meat,other vegetables,whole milk,curd,yogurt,rollsbuns,pastry,semi-finished bread,margarine,bottled water,fruitvegetable juice",
"meat,citrus fruit,berries,root vegetables,whole milk,soda",
"packaged fruitvegetables,whole milk,curd,yogurt,domestic eggs,brown bread,mustard,pickled vegetables,bottled water,misc. beverages",
"pickled vegetables,coffee", "root vegetables", "tropical fruit,margarine,rum",
"tropical fruit,pip fruit,onions,other vegetables,whole milk,domestic eggs,sugar,soups,tea,soda,hygiene articles,napkins",
"tropical fruit,root vegetables,herbs,whole milk,butter milk,whippedsour cream,flour,hygiene articles",
"turkey,pip fruit,salad dressing,pastry"), class = "factor")), .Names = "chocolate", class = "data.frame", row.names = c(NA,
-19L))
I load this data:
g=read.csv("g.csv",sep=";")
Then I have to convert it into transactions, which is what arules needs:
#'@importClassesFrom arules transactions
trans = as(g, "transactions")
Let's inspect data(Groceries):
> str(Groceries)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 13 60 69 78 14 29 98 24 15 29 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 3 variables:
.. ..$ labels: chr [1:169] "frankfurter" "sausage" "liver loaf" "ham" ...
.. ..$ level2: Factor w/ 55 levels "baby food","bags",..: 44 44 44 44 44 44 44 42 42 41 ...
.. ..$ level1: Factor w/ 10 levels "canned food",..: 6 6 6 6 6 6 6 6 6 6 ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
>
And here is the data I converted from the original CSV:
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
.. .. ..@ p : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
.. .. ..@ Dim : int [1:2] 7011 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 7011 obs. of 3 variables:
.. ..$ labels : chr [1:7011] "tr=abrasive cleaner" "tr=abrasive cleaner,napkins" "tr=artif. sweetener" "tr=artif. sweetener,coffee" ...
.. ..$ variables: Factor w/ 1 level "tr": 1 1 1 1 1 1 1 1 1 1 ...
.. ..$ levels : Factor w/ 7011 levels "abrasive cleaner",..: 1 2 3 4 5 6 7 8 9 10 ...
..@ itemsetInfo:'data.frame': 9835 obs. of 1 variable:
.. ..$ transactionID: chr [1:9835] "1" "2" "3" "4" ...
>
In data(Groceries) we see:
transactions in sparse format with
9835 transactions (rows) and
169 items (columns)
In my trans data:
9835 transactions (rows) and
7011 items (columns)
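That is, I got 7011 columns from groceries.csv, while the built-in example has 169 columns.
Why is that? How do I convert this file correctly? I need to understand this, because otherwise I cannot work with my own file.
I tried to find similar topics, but these two posts did not help me: "How to prep transaction data into basket for arules" and "R (arules): convert dataframe into transactions and remove NA".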
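This is because the data is comma-delimited as downloaded, whereas in g=read.csv("g.csv",sep=";") you are splitting the data on semicolons. If you remove sep = ";" from the definition of g, you should get the desired output.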
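To make the effect of the separator concrete, here is a minimal base R sketch. It uses four baskets copied from the sample in the question (not the full file), and the variable names are only illustrative. Splitting on ";" finds no separator, so every distinct line becomes one "item"; splitting on "," recovers the individual products. On the full file the same effect would explain 7011 distinct lines versus 169 distinct products.

```r
# Four baskets copied from the question's sample, one per line as in the CSV.
# "canned beer" appears twice on purpose, to show that repeated lines collapse.
basket_lines <- c(
  "canned beer",
  "canned beer",
  "pickled vegetables,coffee",
  "tropical fruit,margarine,rum"
)

# Splitting on ";" finds nothing to split, so each whole line is one item.
items_semicolon <- unique(unlist(strsplit(basket_lines, ";", fixed = TRUE)))

# Splitting on "," yields the individual products.
items_comma <- unique(unlist(strsplit(basket_lines, ",", fixed = TRUE)))

length(items_semicolon)  # number of distinct lines
length(items_comma)      # number of distinct products
print(items_comma)
```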
See the following, with sep defined as ;:
> trans <- read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ';')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:9835] 1265 6162 6377 4043 3585 6475 4431 3535 4401 6490 ...
.. .. ..@ p : int [1:9836] 0 1 2 3 4 5 6 7 8 9 ...
.. .. ..@ Dim : int [1:2] 7011 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 7011 obs. of 1 variable:
.. ..$ labels: chr [1:7011] "abrasive cleaner" "abrasive cleaner,napkins" "artif. sweetener" "artif. sweetener,coffee" ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
And this, with sep defined as ,:
> trans <- read.transactions("~/Downloads/groceries.csv", format = 'basket', sep = ',')
> str(trans)
Formal class 'transactions' [package "arules"] with 3 slots
..@ data :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
.. .. ..@ i : int [1:43367] 29 88 118 132 33 157 167 166 38 91 ...
.. .. ..@ p : int [1:9836] 0 4 7 8 12 16 21 22 27 28 ...
.. .. ..@ Dim : int [1:2] 169 9835
.. .. ..@ Dimnames:List of 2
.. .. .. ..$ : NULL
.. .. .. ..$ : NULL
.. .. ..@ factors : list()
..@ itemInfo :'data.frame': 169 obs. of 1 variable:
.. ..$ labels: chr [1:169] "abrasive cleaner" "artif. sweetener" "baby cosmetics" "baby food" ...
..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
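If the data is already sitting in a one-column data frame like g, one possible alternative to re-reading the file is to split each row on commas and coerce the resulting list to transactions. This is only a sketch under assumptions: the small data frame below is a hypothetical stand-in for g built from three baskets in the question, and arules is assumed to be installed. Note also that read.csv defaults to header = TRUE, which appears to be why the column in the question is named chocolate: the first line of the file was taken as the header rather than as a basket.

```r
library(arules)  # assumed to be installed

# Hypothetical stand-in for the one-column data frame `g` from the question.
g <- data.frame(
  chocolate = c("tropical fruit,margarine,rum",
                "pickled vegetables,coffee",
                "canned beer"),
  stringsAsFactors = TRUE
)

# Split each row on commas: one character vector of items per basket.
baskets <- strsplit(as.character(g$chocolate), ",", fixed = TRUE)

# Trim stray whitespace around item names (e.g. "cream cheese ").
baskets <- lapply(baskets, trimws)

# Coerce the list of baskets to an arules transactions object.
trans <- as(baskets, "transactions")

dim(trans)      # transactions (rows) by items (columns)
inspect(trans)  # print each basket
```

With the real file, read.transactions with sep = "," as shown above is likely the more direct route, since it handles the ragged rows and does not lose the first line to a header.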