根据长度筛选子集?



尝试使用过滤器提取长度为 k 的子集。不知道如何处理它?该列表有100 个元素

subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) = [zs | ys <- subsets xs, zs <- [ys, (x:ys)]]

如果我使用过滤器,这就是我认为的:

filter (length(3)) subsets [1,2,3,4,5]

但我可能错了。如果有不同的方法而不是过滤器?我是哈斯克尔的新手,所以不太确定。

当我在过滤中遇到一些困惑时,我会更上一层楼并使用foldr

在这种情况下就像:
filterLength3 = foldr (x rs -> if (length x) == 3 then  x : rs else rs) [] 
filterLength3 (subsets [1,2,3,4,5])

输出

=> [[1,2,3],[1,2,4],[1,3,4],[2,3,4],[1,2,5],[1,3,5],[2,3,5],[1,4,5],[2,4,5],[3,4,5]]

filter应该是:

filter ((==3) . length) (subsets [1,2,3,4,5])
=> [[1,2,3],[1,2,4],[1,3,4],[2,3,4],[1,2,5],[1,3,5],[2,3,5],[1,4,5],[2,4,5],[3,4,5]]

编辑

经过思考,在气的帮助下,问了这个问题,我才能够解决它:

import Data.List
subsetsOfThree ws = [ [x,y,z] | (x:xs) <- tails ws, (y:ys) <- tails xs, z <- ys ]

一些例子:

subsetsOfThree [1..3]
=> [[1,2,3]]
subsetsOfThree [1..4]
=> [[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
subsetsOfThree [1..5]
=> [[1,2,3],[1,2,4],[1,2,5],[1,3,4],[1,3,5],[1,4,5],[2,3,4],[2,3,5],[2,4,5],[3,4,5]]
subsetsOfThree [1..10]
=> [[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,2,7],[1,2,8],[1,2,9],[1,2,10],[1,3,4],[1,3,5],[1,3,6],[1,3,7],[1,3,8],[1,3,9],[1,3,10],[1,4,5],[1,4,6],[1,4,7],[1,4,8],[1,4,9],[1,4,10],[1,5,6],[1,5,7],[1,5,8],[1,5,9],[1,5,10],[1,6,7],[1,6,8],[1,6,9],[1,6,10],[1,7,8],[1,7,9],[1,7,10],[1,8,9],[1,8,10],[1,9,10],[2,3,4],[2,3,5],[2,3,6],[2,3,7],[2,3,8],[2,3,9],[2,3,10],[2,4,5],[2,4,6],[2,4,7],[2,4,8],[2,4,9],[2,4,10],[2,5,6],[2,5,7],[2,5,8],[2,5,9],[2,5,10],[2,6,7],[2,6,8],[2,6,9],[2,6,10],[2,7,8],[2,7,9],[2,7,10],[2,8,9],[2,8,10],[2,9,10],[3,4,5],[3,4,6],[3,4,7],[3,4,8],[3,4,9],[3,4,10],[3,5,6],[3,5,7],[3,5,8],[3,5,9],[3,5,10],[3,6,7],[3,6,8],[3,6,9],[3,6,10],[3,7,8],[3,7,9],[3,7,10],[3,8,9],[3,8,10],[3,9,10],[4,5,6],[4,5,7],[4,5,8],[4,5,9],[4,5,10],[4,6,7],[4,6,8],[4,6,9],[4,6,10],[4,7,8],[4,7,9],[4,7,10],[4,8,9],[4,8,10],[4,9,10],[5,6,7],[5,6,8],[5,6,9],[5,6,10],[5,7,8],[5,7,9],[5,7,10],[5,8,9],[5,8,10],[5,9,10],[6,7,8],[6,7,9],[6,7,10],[6,8,9],[6,8,10],[6,9,10],[7,8,9],[7,8,10],[7,9,10],[8,9,10]]

现在你可以让你的怪物变成一个小傀儡:

length $ subsetsOfThree [1..10]
=> 120
length $ subsetsOfThree [1..20]
=> 1140
length $ subsetsOfThree [1..50]
=> 19600
length $ subsetsOfThree [1..100]
=> 161700
length $ subsetsOfThree [1..500]
=> 20708500

100 个元素列表的子集数量约为 2100≃ 1.26*1030,这是一个非常庞大的数字。因此,filter方法似乎并不切实际。这个问题应该通过操作只包含 1 到 100 之间的几个数字的列表来解决。

因此,我们的目标是编写一个命名为kSubsets的函数,该函数返回基数 k 的所有子集的列表:

kSubsets :: Int -> [a] -> [[a]]

其中 k 是第一个参数。

基于递归列表处理的解决方案:

构建kSubsets功能的一种可能方法是使用辅助kIndexSubsets函数,该函数计算元素的从零开始的索引,而不是元素本身。kIndexSubsets函数可以用递归方式编写。

在这种情况下,kSubsets函数本质上是一个包装器,它将元素索引映射到实际的列表元素。这将给出以下代码:

import qualified  Data.Map    as  M
import qualified  Data.Maybe  as  Mb
import qualified  Data.List   as  L
kIndexSubsets :: Int -> Int -> [[Int]]
kIndexSubsets 0 _  = [[]]
kIndexSubsets k nn =
-- first element chosen must leave room for (k-1) elements after itself
let lastChoice = if (k > nn)
then error "k above nn in kIndexSubsets"
else (nn -k)
choices = [0 .. lastChoice]
-- for each possible first element, recursively compute
-- all the possible tails:
fn hd   = let tails1 = kIndexSubsets (k-1) (nn - (hd+1))
-- rebase subsequent indexes:
tails2 = map (map (x -> (x+hd+1))) tails1
in  -- add new leftmost element:
map  (ls -> hd:ls)  tails2
in
concatMap fn choices

-- return the list of all subsets of ls having k elements:
kSubsets :: Int -> [a] -> [[a]]
kSubsets 0 _  = [[]]
kSubsets k ls = 
let  nn = length ls
-- need a map for fast access to elements of ls:
ma = M.fromList $ zip [0..] ls
extractor ix = Mb.fromJust(M.lookup ix ma)
indexSubSets = kIndexSubsets k nn
in
map  (map extractor)  indexSubSets

我们现在可以测试我们的kSubsets函数。这涉及检查结果输出列表的长度是否符合经典的组合公式,即 n!/(k! * (n-k(!(,其中 n 是输入列表的长度。

*Main> let ls = "ABCDEFGH"
*Main> kSubsets 0 ls
[""]
*Main> kSubsets 1 ls
["A","B","C","D","E","F","G","H"]
*Main> kSubsets 2 ls
["AB","AC","AD","AE","AF","AG","AH","BC","BD","BE","BF","BG","BH","CD","CE","CF","CG","CH","DE","DF","DG","DH","EF","EG","EH","FG","FH","GH"]
*Main> kSubsets 3 ls
["ABC","ABD","ABE","ABF","ABG","ABH","ACD","ACE","ACF","ACG","ACH","ADE","ADF","ADG","ADH","AEF","AEG","AEH","AFG","AFH","AGH","BCD","BCE","BCF","BCG","BCH","BDE","BDF","BDG","BDH","BEF","BEG","BEH","BFG","BFH","BGH","CDE","CDF","CDG","CDH","CEF","CEG","CEH","CFG","CFH","CGH","DEF","DEG","DEH","DFG","DFH","DGH","EFG","EFH","EGH","FGH"]
*Main> 
*Main> kSubsets 7 ls
["ABCDEFG","ABCDEFH","ABCDEGH","ABCDFGH","ABCEFGH","ABDEFGH","ACDEFGH","BCDEFGH"]
*Main> 
*Main> kSubsets 8 ls
["ABCDEFGH"]
*Main> 
*Main> 
*Main> div ((100*99*98)::Integer)  ((2*3)::Integer)
161700
*Main> 
*Main> length $ kSubsets 3 [ 1 .. 100 ]
161700
*Main> 
*Main> div ((100*99*98*97*96)::Integer)  ((2*3*4*5)::Integer)
75287520
*Main> length $ kSubsets 5 [ 1 .. 100 ]
75287520
*Main>

在普通的 x86-64 Linux 机器上,kSubsets 3 [ 1 .. 100 ]的评估需要不到 50 毫秒的时间。

基于状态机的替代解决方案:

所选索引的(反向(列表被视为自动机的状态,我们逐步推进状态,直到这不再可能,此时子列表列表完成。

基本上,如果有空间前进最右边的索引,很好,否则我们递归以前进列表的其余部分,然后将最右边的索引尽可能向左移动。

该方法给出了kIndexSubsets的替代源代码,其中关键部分是ksAdvance步进函数:

import qualified  Data.Map    as  M
import qualified  Data.Maybe  as  Mb
import qualified  Data.List   as  L

-- works on the *reversed* list of chosen indexes:
ksAdvance :: Int -> Int -> Maybe [Int] -> Maybe [Int]
ksAdvance k nn Nothing        = Nothing
ksAdvance k nn (Just [])      = Nothing
ksAdvance k nn (Just (h:rls)) =
if (h == (nn-1))
then -- cannot advance rightmost index, so must recurse
let mbols2 = ksAdvance (k-1) (nn-1) (Just rls)
in
case mbols2 of
Nothing   -> Nothing
Just ols2 -> let  y = ((head ols2)+1)  in  Just (y:ols2)
else -- just advance rightmost index:
Just ((h+1):rls)

kIndexSubsets :: Int -> Int -> [[Int]]
kIndexSubsets 0 _  = [[]]
kIndexSubsets k nn =
let startList = reverse  $  [ 0 .. (k-1) ]
cutList = takeWhile  Mb.isJust
mbls    = cutList $ iterate  (ksAdvance k nn)  (Just startList)
in
map  (reverse . Mb.fromJust)  mbls

这个算法似乎比第一个算法更不耗费内存,速度更快。

使用此主程序进行快速性能测试,将 100 个元素中的 5 个元素的子集生成75287520子集:

kSubsets :: Int -> [a] -> [[a]]
kSubsets 0 _  = [[]]
kSubsets k ls = 
let  nn = length ls
-- need a map for fast access to elements of ls:
ma = M.fromList $ zip [0..] ls
eltFromIndex = ix -> Mb.fromJust (M.lookup ix ma)
indexSubSets = kIndexSubsets k nn
in
map  (map eltFromIndex)  indexSubSets

main = do
let nn  = 100
let  k  = 5
let ls  = [ 1 .. nn ]::[Int]
let str = "count of " ++ (show k) ++ " out of " ++ (show nn) ++
" elements subsets = " ++ (show $ length (kSubsets k ls))
putStrLn $ str

内存性能得到改善:

$ /usr/bin/time ./kSubsets03.x +RTS -s
count of 5 out of 100 elements subsets = 75287520
4,529,861,272 bytes allocated in the heap
623,240 bytes copied during GC
44,504 bytes maximum residency (2 sample(s))
29,224 bytes maximum slop
2 MB total memory in use (0 MB lost due to fragmentation)
...
Productivity  98.4% of total user, 98.5% of total elapsed
0.70user 0.00system 0:00.72elapsed 99%CPU (0avgtext+0avgdata 4724maxresident)k
0inputs+0outputs (0major+436minor)pagefaults 0swaps
$ 

还不如Fortran好,但越来越接近:-(

这是不使用过滤器的 length-n 子集的通用解决方案。

我们的初始列表是x:xs,请注意,我们可以将这些子集划分为包含x的子集和不包含x的子集。这向我们展示了一个很好的递归结构;第一个分区x附加到xs的每个长度(n-1(子集前面,第二个分区只是xs的长度n子集。

subsetsOfLength n (x:xs) = map (x:) (subsetsOfLength (n-1) xs) ++ subsetsOfLength n xs

我们所需要的只是基本情况。有一个 length-0 子集,并且没有子集大于原始子集:

subsets 0 _  = [[]]
subsets _ [] = []

将这些基础放在递归步骤上方,并在其上抛出适当的类型签名,我们就完成了。

λ> subsetsOfLength 3 [1..5]
[[1,2,3],[1,2,4],[1,2,5],[1,3,4],[1,3,5],[1,4,5],[2,3,4],[2,3,5],[2,4,5],[3,4,5]]
λ> length $ subsetsOfLength 5 [1..100]
252

好。

小心。(++)很慢;如果你在编译时知道你将使用的长度,Damián Rafael Lattenero的tails方法可能会更有性能。不过,对此并不完全确定。此外,根据值的不同,您可能最好交换(++)的操作数。我还没有做数学。

最新更新