R - How to convert a data frame into a transactions object for arules



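I am trying to mine association rules on a dataset using the arules library in R. The dataset has one transaction column and 5 item columns. I want to convert the data into a list and then use arules, but with multiple item columns I am not sure how to handle it.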

My dataset looks like this:

Transaction   Item1         Item2      Item3
12/09/2001    lipstick      Bronzer    Mascara
2/09/2001     Eyeshadow     lipstick
13/09/2002    Powder        Remover
14/09/2003    Nail varnish  Lip gloss  Eyeliner

The code I normally use for one transaction column and one item column is below.

library(arules)
Transactions <- split(data$item, data$transaction)
basketanalysis <- as(Transactions, "transactions")

Any help would be greatly appreciated.

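Here is what I tried. I think you need to reshape the data and build a list. First, I created a transaction ID, just in case. Then I converted the data to a long-format data frame, at which point all products sit in a single column. I removed every row containing NA and converted the product column to a factor. For each group (transaction ID), I created a list containing all of its products. The result, x, has a list column named whatever; this is the list to use when creating the transactions object.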

library(tidyverse)
library(arules)
# Add a transaction ID, reshape to long format, drop empty item slots,
# and collect each transaction's products into a list column
mutate(mydata, transaction_id = 1:n()) %>% 
  pivot_longer(cols = contains("Item"), names_to = "item", values_to = "product") %>% 
  filter(complete.cases(product)) %>% 
  mutate(product = factor(product)) %>% 
  group_by(transaction_id) %>% 
  summarize(whatever = list(product)) -> x
# Assign transaction ID as name to whatever
names(x$whatever) <- x$transaction_id
# Print the named list
x$whatever
$`1`
[1] lipstick Bronzer  Mascara 
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover
$`2`
[1] Eyeshadow lipstick 
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover
$`3`
[1] Powder  Remover
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover
$`4`
[1] Nail varnish Lip gloss    Eyeliner    
Levels: Bronzer Eyeliner Eyeshadow Lip gloss lipstick Mascara Nail varnish Powder Remover
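For readers who prefer not to load tidyverse, the same kind of named list can probably be built in base R. The following is only a minimal sketch: it assumes the item columns all have names starting with "Item" and that empty slots are NA, and the names item_cols, baskets, and mybasket_base are made up for illustration.

```r
library(arules)

# Sample data mirroring the question (NA marks an empty item slot)
mydata <- data.frame(
  Transaction = c("12/09/2001", "2/09/2001", "13/09/2002", "14/09/2003"),
  Item1 = c("lipstick", "Eyeshadow", "Powder", "Nail varnish"),
  Item2 = c("Bronzer", "lipstick", "Remover", "Lip gloss"),
  Item3 = c("Mascara", NA, NA, "Eyeliner"),
  stringsAsFactors = FALSE
)

# Assumption: every item column is named Item1, Item2, ...
item_cols <- grep("^Item", names(mydata), value = TRUE)

# Build one character vector per row, dropping the NA slots
baskets <- lapply(seq_len(nrow(mydata)), function(i) {
  items <- unlist(mydata[i, item_cols], use.names = FALSE)
  items[!is.na(items)]
})

# Name each basket by its row index so it becomes the transaction ID
names(baskets) <- seq_along(baskets)

# Coerce the named list to a transactions object
mybasket_base <- as(baskets, "transactions")
```

Because the baskets stay as plain character vectors, no factor conversion is needed before the coercion.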

Finally, I created a transactions-class object.

mybasket <- as(x$whatever, "transactions")
> summary(mybasket)
transactions as itemMatrix in sparse format with
4 rows (elements/itemsets/transactions) and
9 columns (items) and a density of 0.2777778 
most frequent items:
lipstick   Bronzer  Eyeliner Eyeshadow Lip gloss   (Other) 
2         1         1         1         1         4 
element (itemset/transaction) length distribution:
sizes
2 3 
2 2 
Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
2.0     2.0     2.5     2.5     3.0     3.0 
includes extended item information - examples:
labels
1   Bronzer
2  Eyeliner
3 Eyeshadow
includes extended transaction information - examples:
transactionID
1             1
2             2
3             3
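Once a transactions object exists, rules can be mined from it with apriori(). The sketch below rebuilds the four sample baskets inline so that it runs on its own. The support and confidence thresholds are arbitrary assumptions chosen only so that this tiny sample produces some rules; they are not recommendations for real data.

```r
library(arules)

# The four sample baskets from the question, written out as a named list
baskets <- list(
  `1` = c("lipstick", "Bronzer", "Mascara"),
  `2` = c("Eyeshadow", "lipstick"),
  `3` = c("Powder", "Remover"),
  `4` = c("Nail varnish", "Lip gloss", "Eyeliner")
)
trans <- as(baskets, "transactions")

# Assumed thresholds: with 4 transactions, support = 0.25 means an
# itemset only has to appear once; minlen = 2 skips empty-LHS rules
rules <- apriori(
  trans,
  parameter = list(support = 0.25, confidence = 0.5, minlen = 2),
  control = list(verbose = FALSE)
)

# Show the rules with their support, confidence, and lift
inspect(rules)
```

With real data the thresholds would normally be much stricter, since a support this low admits every co-occurrence in the sample.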

Data

mydata <- structure(list(Transaction = c("12/09/2001", "2/09/2001", "13/09/2002", 
"14/09/2003"), Item1 = c("lipstick", "Eyeshadow", "Powder", "Nail varnish"
), Item2 = c("Bronzer", "lipstick", "Remover", "Lip gloss"), 
Item3 = c("Mascara", NA, NA, "Eyeliner")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))
