Coq: strictly positive and ill-formed regular expressions



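We are a small group learning Coq. We are trying to define an inductive predicate for the denotation of a regular expression, which represents a set of sequences. This seems to run into the strict positivity restriction, because we allow `not` as an operator (here expressed through `nor`). `not` is not usually included in regular expressions, but it is included in Brzozowski's regular expressions, which are the ones we are looking at. When we instead try to define the denotation with a Fixpoint, we run into an "ill-formed" error on the zero-or-more (`star`) operator. We can work around both problems by defining the denotation as a mix of inductive predicates and a Fixpoint, but that feels wrong.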
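The positivity error can be reproduced in isolation. In the sketch below, `bad` and `good` are hypothetical names that do not appear in the question. The rejected type has the same shape as the `ind_nor` case, because `nor_seqs` requires `~ (s in P)`, which puts `ind_regex P` under a negation:

```coq
(* Rejected by Coq: [bad] occurs to the left of an arrow in the argument
   of its own constructor, which is not a strictly positive position.
   [Fail] succeeds precisely because the definition is rejected. *)
Fail Inductive bad : Prop :=
| mk_bad : (bad -> False) -> bad.

(* Why it must be rejected: if [bad] were accepted, one could write
     Definition f (b : bad) : False := match b with mk_bad g => g b end.
   and then [f (mk_bad f)] would be a closed proof of False. *)

(* Accepted: [good] occurs only to the right of arrows. *)
Inductive good : Prop :=
| mk_good : (nat -> good) -> good.
```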

Is there another way to define our regular expressions purely as an inductive predicate?

Is there anything wrong with how we mix fixpoints and inductive predicates, or are we just being too purist?

Below is the example code, with explanations and the expected errors in the comments:

Require Import List.
Import ListNotations.
(* We are defining our input alphabet for regular expressions as only two possible symbols *)
Inductive alphabet := a1 | a0.
Inductive regex :=
(* emptyset matches absolutely no strings *)
| emptyset : regex
(* lambda matches only the empty string *)
| lambda : regex
(* symbol matches only strings of length 1 containing the exact alphabet symbol *)
| symbol : alphabet -> regex
(* concat is used to build regular expressions that can match longer strings *)
| concat : regex -> regex -> regex
(* zero or more, as you are familiar with from regular expressions *)
| star : regex -> regex
(* `nor` is a boolean operator, here is the truth table
P | Q | P `nor` Q
-----------------
T | T | F
T | F | F
F | T | F
F | F | T
*)
| nor : regex -> regex -> regex
.
(* We chose to include `nor`, since it can represent any possible boolean expression,
which is one of the selling points of Brzozowski's derivatives for regular expressions.
*)
Definition complement (r: regex) : regex :=
nor r r.
Definition and (r s: regex) : regex :=
nor (nor r r) (nor s s).
Definition or (r s: regex) : regex :=
nor (nor r s) (nor r s).
Definition xor (r s: regex) : regex :=
or (and r (complement s)) (and (complement r) s).
(* I matches all strings *)
Definition I: regex :=
complement (emptyset).
(*  A regular expression denotes a set of sequences. *)
Definition seq := (list alphabet).
Definition seqs := seq -> Prop.
Definition in_set_of_sequences (ss: seqs) (s: seq): Prop := ss s. 
Notation "p in P" := (in_set_of_sequences P p) (at level 80).
(* *Concatenation*. $(P.Q) = \{ s \mid s = p.q;\ p \in P,\ q \in Q \}$. *)
Inductive concat_seqs (P Q: seqs): seqs :=
| mk_concat: forall (s: seq),
(exists p q, p ++ q = s /\
(p in P) /\
(q in Q)
) ->
concat_seqs P Q s
.
(*
*Star*. $P^{*} = \bigcup_{n=0}^{\infty} P^n$, where $P^2 = P.P$, etc.
and $P^0 = \lambda$, the set consisting of the sequence of zero length.
*)
Inductive star_seqs (R: seqs): seqs :=
| mk_star_zero : forall (s: seq),
s = [] -> star_seqs R s
| mk_star_more : forall (s: seq),
s in (concat_seqs R (star_seqs R)) ->
star_seqs R s
.
(*
*Boolean function*. We shall denote any Boolean function of $P$ and $Q$ by $f(P, Q)$. 
Of course, all the laws of Boolean algebra apply.
`nor` is used to emulate `f`, since nor can be used to emulate all boolean functions.
*)
Inductive nor_seqs (P Q: seqs): seqs :=
| mk_nor : forall s,
~(s in P) /\ ~(s in Q) ->
nor_seqs P Q s
.
(* Here we use a mix of Fixpoint and Inductive predicates to define the denotation of regular expressions.
This works, but it would be nicer to define it purely as an Inductive predicate.
*)
Fixpoint denote_regex (r: regex): seqs :=
match r with
| emptyset => fun _ => False
| lambda => fun xs => xs = []
| symbol y => fun xs => xs = [y]
| concat r1 r2 => concat_seqs (denote_regex r1) (denote_regex r2)
| star r1 => star_seqs (denote_regex r1)
| nor r1 r2 => nor_seqs (denote_regex r1) (denote_regex r2)
end.
(* Here we try to rewrite the denotation of a regex using a pure inductive predicate, but we get an error:
Non strictly positive occurrence of "ind_regex" in
"forall (s : seq) (P Q : regex), 
s in nor_seqs (ind_regex P) (ind_regex Q) -> ind_regex (nor P Q) s".
*)
Inductive ind_regex: regex -> seqs :=
| ind_emptyset (s: seq):
False ->
ind_regex emptyset s
| ind_lambda (s: seq):
s = [] ->
ind_regex lambda s
| ind_symbol (s: seq) (a: alphabet):
s = [a] ->
ind_regex (symbol a) s
| ind_concat (s: seq) (P Q: regex):
s in (concat_seqs (ind_regex P) (ind_regex Q)) ->
ind_regex (concat P Q) s
| ind_star (s: seq) (R: regex):
s in (star_seqs (ind_regex R)) ->
ind_regex (star R) s
| ind_nor (s: seq) (P Q: regex):
s in (nor_seqs (ind_regex P) (ind_regex Q)) ->
ind_regex (nor P Q) s
.

(*
Here we try to define the denotation of a regex purely as a fixpoint, but we get an error:
Recursive definition of fix_regex is ill-formed.
In environment
fix_regex : regex -> seqs
r : regex
s : regex
xs : seq
x : alphabet
xs' : list alphabet
ys : list alphabet
zs : list alphabet
Recursive call to fix_regex has principal argument equal to "star s" instead of "s".
Recursive definition is:
"fun r : regex =>
match r with
| emptyset => fun _ : seq => False
| lambda => fun xs : seq => xs = []
| symbol y => fun xs : seq => xs = [y]
| concat s t => fun xs : seq => exists ys zs : list alphabet, xs = ys ++ zs /\ fix_regex s ys /\ fix_regex t zs
| star s =>
fun xs : seq =>
match xs with
| [] => True
| x :: xs' => exists ys zs : list alphabet, xs' = ys ++ zs /\ fix_regex s (x :: ys) /\ fix_regex (star s) zs
end
| nor _ _ => fun _ : seq => True
end".
*)
Fixpoint fix_regex (r: regex): seqs :=
match r with
| emptyset => fun _ => False
| lambda => fun xs => xs = []
| symbol y => fun xs => xs = [y]
| concat s t => fun xs => exists ys zs, xs = ys ++ zs /\ fix_regex s ys /\ fix_regex t zs
| star s => fun xs =>
match xs with
| [] => True
| (x::xs') => exists ys zs, xs' = ys ++ zs /\ fix_regex s (x::ys) /\ fix_regex (star s) zs
end
| _ => fun _ => True
end.

Is there anything wrong with how we mix fixpoints and inductive predicates?

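In my opinion, mixing inductive and fixpoint definitions is reasonable. Your `fix_regex` depends on the `/\` operator, which is notation for `and` (whose constructor is `conj`), and `and` is itself defined as an inductive type in the standard library. The same goes for `exists _, _`, which is notation for `ex`. I think defining and using `star_seqs` is just as fair as using `and`.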
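Here is a minimal sketch of that point. It assumes a hypothetical cut-down regex type `re` over `nat` and a hypothetical predicate `star_of`; neither comes from the question. The Kleene star is written once as an inductive predicate parameterised by an arbitrary predicate `P`, much as `and` is parameterised by two propositions. The Fixpoint only supplies `P`:

```coq
Require Import List.
Import ListNotations.

(* Hypothetical cut-down regex type, just enough to show the pattern. *)
Inductive re := Sym (n : nat) | Star (r : re).

(* Kleene star of an arbitrary predicate on lists, defined once as an
   inductive predicate parameterised by P. *)
Inductive star_of (P : list nat -> Prop) : list nat -> Prop :=
| star_nil : star_of P []
| star_app : forall s1 s2, P s1 -> star_of P s2 -> star_of P (s1 ++ s2).

(* The Fixpoint only recurses on strict subterms of [r]; the unbounded
   repetition lives inside [star_of]. *)
Fixpoint denote (r : re) : list nat -> Prop :=
  match r with
  | Sym n => fun s => s = [n]
  | Star r' => star_of (denote r')
  end.

(* [1; 1] is two copies of [1]. *)
Example two_ones : denote (Star (Sym 1)) [1; 1].
Proof.
  simpl.
  apply (star_app _ [1] [1]).
  - simpl. reflexivity.
  - apply (star_app _ [1] []).
    + simpl. reflexivity.
    + apply star_nil.
Qed.
```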

Is there another way to define our regular expressions purely as an inductive predicate?

Here are some alternatives.

Mutually inductive types

You can define several inductive types that depend on each other.

Here is an (incomplete) example.

Inductive match_regex : regex -> seq -> Prop  :=
| match_lambda : match_regex lambda []
| match_symbol : forall a, match_regex (symbol a) [a]
| match_nor : forall r1 r2 s,
unmatch_regex r1 s -> unmatch_regex r2 s -> match_regex (nor r1 r2) s
with unmatch_regex : regex -> seq -> Prop :=
| unmatch_lambda : forall x xs, unmatch_regex lambda (x :: xs)
| unmatch_symbol : forall a b, a <> b -> unmatch_regex (symbol a) [b]
| unmatch_nor_l : forall r1 r2 s,
match_regex r1 s -> unmatch_regex (nor r1 r2) s
| unmatch_nor_r : forall r1 r2 s,
match_regex r2 s -> unmatch_regex (nor r1 r2) s
.
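A self-contained sketch follows. It repeats the question's `alphabet` and `regex` types and the answer's mutual definition, and then shows one practical consequence of that definition. The automatically generated `match_regex_ind` gives no induction hypothesis for `unmatch_regex` premises, so proofs that involve both predicates usually need a mutual principle generated with `Scheme`. The names `match_regex_mut`, `unmatch_regex_mut`, and `nor_example` are our own:

```coq
Require Import List.
Import ListNotations.

Inductive alphabet := a1 | a0.
Definition seq := list alphabet.
Inductive regex :=
| emptyset | lambda | symbol (a : alphabet)
| concat (r1 r2 : regex) | star (r : regex) | nor (r1 r2 : regex).

(* The answer's (incomplete) mutual definition. *)
Inductive match_regex : regex -> seq -> Prop :=
| match_lambda : match_regex lambda []
| match_symbol : forall a, match_regex (symbol a) [a]
| match_nor : forall r1 r2 s,
    unmatch_regex r1 s -> unmatch_regex r2 s -> match_regex (nor r1 r2) s
with unmatch_regex : regex -> seq -> Prop :=
| unmatch_lambda : forall x xs, unmatch_regex lambda (x :: xs)
| unmatch_symbol : forall a b, a <> b -> unmatch_regex (symbol a) [b]
| unmatch_nor_l : forall r1 r2 s,
    match_regex r1 s -> unmatch_regex (nor r1 r2) s
| unmatch_nor_r : forall r1 r2 s,
    match_regex r2 s -> unmatch_regex (nor r1 r2) s.

(* Mutual induction principles covering both predicates at once. *)
Scheme match_regex_mut := Induction for match_regex Sort Prop
  with unmatch_regex_mut := Induction for unmatch_regex Sort Prop.

(* [a1] matches nor (symbol a0) lambda: it is neither [a0] nor []. *)
Example nor_example : match_regex (nor (symbol a0) lambda) [a1].
Proof.
  apply match_nor.
  - apply unmatch_symbol. discriminate.
  - apply unmatch_lambda.
Qed.
```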

A relation between regex, seq, and bool

With mutually inductive types, writing the complementary cases (for example `match_lambda` and `unmatch_lambda` in the example above) can get complicated.

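You can mitigate this by defining the proposition as a relation between `regex`, `seq`, and `bool`. Then a single constructor per case records whether the sequence matches.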

Definition alpha_eq_dec : forall (x y : alphabet), {x = y} + {x <> y}.
decide equality.
Defined.
Definition seq_eq_dec : forall (xs ys : seq), {xs = ys} + {xs <> ys} := list_eq_dec alpha_eq_dec.
Definition seq_eqb (xs ys : seq) : bool :=
if seq_eq_dec xs ys then true else false.
Inductive bool_regex : regex -> seq -> bool -> Prop :=
| bool_lambda : forall xs, bool_regex lambda xs (seq_eqb xs [])
| bool_symbol : forall a xs, bool_regex (symbol a) xs (seq_eqb xs [a])
| bool_nor : forall r1 r2 s b1 b2,
bool_regex r1 s b1 -> bool_regex r2 s b2 -> bool_regex (nor r1 r2) s (negb (b1 || b2)).
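Here is a sketch of how the bool-valued relation pays off. It assumes the answer's incomplete `bool_regex` above, repeated so the block compiles on its own. The lemma name `bool_complement` is hypothetical. Because `complement r` is `nor r r`, complementing a match result amounts to applying `negb`, and no separate "unmatch" constructors are needed:

```coq
Require Import List.
Import ListNotations.

Inductive alphabet := a1 | a0.
Definition seq := list alphabet.
Inductive regex :=
| emptyset | lambda | symbol (a : alphabet)
| concat (r1 r2 : regex) | star (r : regex) | nor (r1 r2 : regex).

Definition alpha_eq_dec : forall (x y : alphabet), {x = y} + {x <> y}.
Proof. decide equality. Defined.
Definition seq_eq_dec : forall (xs ys : seq), {xs = ys} + {xs <> ys} :=
  list_eq_dec alpha_eq_dec.
Definition seq_eqb (xs ys : seq) : bool :=
  if seq_eq_dec xs ys then true else false.

(* The answer's (incomplete) relation. *)
Inductive bool_regex : regex -> seq -> bool -> Prop :=
| bool_lambda : forall xs, bool_regex lambda xs (seq_eqb xs [])
| bool_symbol : forall a xs, bool_regex (symbol a) xs (seq_eqb xs [a])
| bool_nor : forall r1 r2 s b1 b2,
    bool_regex r1 s b1 -> bool_regex r2 s b2 ->
    bool_regex (nor r1 r2) s (negb (b1 || b2)).

(* complement r = nor r r, so its result is the negated result of r. *)
Lemma bool_complement : forall r s b,
  bool_regex r s b -> bool_regex (nor r r) s (negb b).
Proof.
  intros r s b H.
  (* b || b = b for both values of b, so negb (b || b) = negb b *)
  replace (negb b) with (negb (b || b)) by (destruct b; reflexivity).
  apply bool_nor; assumption.
Qed.
```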

Axiomatizing the predicate

Defining the predicate as a function can be tricky, if not impossible.

Instead, state the requirements the predicate must satisfy, as below.

Definition matchp_axiom (matchp : regex -> seq -> Prop) : Prop :=
forall r s,
matchp r s <->
match r with
| emptyset => False
| lambda =>  s = []
| symbol a => s = [a]
(* and so on *)
end.

Then parameterize your statements over it.

Section Facts.
Variable matchp : regex -> seq -> Prop.
Hypothesis matchp_spec : matchp_axiom matchp.
Lemma star_repeat : forall a n, matchp (star (symbol a)) (repeat a n).
...
Qed.
End Facts.

You can't use `simpl` to reduce the predicate, but `rewrite matchp_spec` gives a similar effect.

You can combine this with the other approaches by proving `matchp_axiom match_regex` or `matchp_axiom (fun r s => bool_regex r s true)`.
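Here is a compilable sketch of this pattern. To keep the `match` exhaustive without guessing the elided cases, it assumes a hypothetical cut-down `regex` with only `emptyset`, `lambda`, and `symbol`. The names `lambda_nil`, `emptyset_never`, `matchf`, and `matchf_spec` are illustrative and do not come from the answer. The sketch goes through the spec with `proj1`/`proj2` rather than `rewrite`, and then instantiates the section's lemmas with a concrete predicate:

```coq
Require Import List.
Import ListNotations.

Inductive alphabet := a1 | a0.
Definition seq := list alphabet.

(* Hypothetical cut-down regex type, so that the match below is exhaustive. *)
Inductive regex := emptyset | lambda | symbol (a : alphabet).

Definition matchp_axiom (matchp : regex -> seq -> Prop) : Prop :=
  forall r s,
    matchp r s <->
    match r with
    | emptyset => False
    | lambda => s = []
    | symbol a => s = [a]
    end.

Section Facts.
  Variable matchp : regex -> seq -> Prop.
  (* Becomes an extra argument of each lemma that uses it when the
     section is closed. *)
  Hypothesis matchp_spec : matchp_axiom matchp.

  Lemma lambda_nil : matchp lambda [].
  Proof.
    (* simpl cannot look inside the abstract matchp; use the spec. *)
    apply (proj2 (matchp_spec lambda [])).
    simpl. reflexivity.
  Qed.

  Lemma emptyset_never : forall s, ~ matchp emptyset s.
  Proof.
    intros s H.
    exact (proj1 (matchp_spec emptyset s) H).
  Qed.
End Facts.

(* Hypothetical concrete predicate standing in for match_regex or
   (fun r s => bool_regex r s true). *)
Definition matchf (r : regex) (s : seq) : Prop :=
  match r with
  | emptyset => False
  | lambda => s = []
  | symbol a => s = [a]
  end.

Lemma matchf_spec : matchp_axiom matchf.
Proof. intros r s. apply iff_refl. Qed.

(* The generic facts now apply to matchf. *)
Example matchf_lambda : matchf lambda [] := lambda_nil matchf matchf_spec.
```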

Actually, it is possible to define matching with a Fixpoint:

Fixpoint match_regex (re : regex) (s : list alphabet) : Prop :=
match re with
| emptyset       => False
| lambda         => s = []
| symbol x       => s = [x]
| concat re1 re2 =>
exists s1 s2, s = s1 ++ s2 /\ match_regex re1 s1 /\ match_regex re2 s2
| star re' =>
exists ss, s = List.concat ss /\ Forall (match_regex re') ss
| nor re1 re2 => ~ (match_regex re1 s \/ match_regex re2 s)
end.
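The following is a quick check that this Fixpoint behaves as intended. It assumes the question's `alphabet` and `regex`, repeated so the block is self-contained, and the example names are hypothetical. The guard checker accepts `star` because the recursive call `match_regex re'` is on a strict subterm. The unbounded repetition is moved into the list of chunks `ss`, which the standard `Forall` handles:

```coq
Require Import List.
Import ListNotations.

Inductive alphabet := a1 | a0.
Inductive regex :=
| emptyset | lambda | symbol (a : alphabet)
| concat (r1 r2 : regex) | star (r : regex) | nor (r1 r2 : regex).

(* The Fixpoint from the answer. *)
Fixpoint match_regex (re : regex) (s : list alphabet) : Prop :=
  match re with
  | emptyset       => False
  | lambda         => s = []
  | symbol x       => s = [x]
  | concat re1 re2 =>
      exists s1 s2, s = s1 ++ s2 /\ match_regex re1 s1 /\ match_regex re2 s2
  | star re' =>
      exists ss, s = List.concat ss /\ Forall (match_regex re') ss
  | nor re1 re2 => ~ (match_regex re1 s \/ match_regex re2 s)
  end.

(* [a1; a1] splits into the chunks [a1] and [a1]. *)
Example star_two : match_regex (star (symbol a1)) [a1; a1].
Proof.
  simpl.
  exists [[a1]; [a1]].
  split.
  - reflexivity.
  - repeat constructor.
Qed.

(* [a1] is not the empty string, so it matches nor lambda lambda. *)
Example nor_lambda : match_regex (nor lambda lambda) [a1].
Proof.
  simpl. intros [H | H]; discriminate H.
Qed.
```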
