I want to sort people into groups based on a file. The file looks like this:
group1 = john dave jim collin;
group2 = abc def ghi jkl mno
pqr stu vxz;
group3 = marc;
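So I have to match the people between the equals sign and the semicolon (a newline may occur in between, see group2) and assign them to a group.
I tried the following, without success: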
my $person2ascr = "sarah";
open (grp_file, "<$group_file");
# the line below only matches if the group list is on a single line
while(<grp_file>) {my $grp = $1 if (/(.*)\s*=\s*.*\n*.*$person2ascr.*\n*.*;/i)};
# the following line won't match anything. Of course I close and reopen the file first
while(<grp_file>) {my $grp = $1 if /(\w+)\s*=\s*(\w+)*\s*$person2ascr(\s+\w+)*\s*;/i};
But after reading the manual, I concluded that I was doing it right :-/ Any help?
How about:
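use feature 'say';   # say() needs Perl 5.10 or later
$/ = ";";            # read one semicolon-terminated record at a time
my @grps = <DATA>;
s/\n+/ /g for @grps; # turn line breaks inside a record into spaces
my $person2ascr = "ghi";
for (@grps) {
    say "group: $1" if /^\s*([^=\s]+)\s*=.*\b\Q$person2ascr\E\b/;
}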
__DATA__
group1 = john dave jim collin;
group2 = abc def ghi jkl mno
pqr stu vxz;
group3 = marc;
Output:
group: group2
When a file has a well-defined end-of-record marker, there is a very simple way to read it one record at a time.
#Enclosing braces to ensure local $/ stays very local
{
    #Use 3-arg open (safer)
    open my $fh, '<', $group_file or die "Can't open $group_file: $!";
    #Set "newline" separator to the end-of-record token
    local $/ = ";\n";
    while(my $record = <$fh>) {
        #$record will contain "groupN = some name or other;\n"
        chomp $record;
        #$record now contains "groupN = some name or other" without the trailing ";\n"
        my ($group, $data) = split / = /, $record, 2;
        #$group contains "groupN"; $data contains "some name or other"
        $grp = $group if $data =~ /$person2ascr/; #Add i modifier if you want case insensitive matching
    }
    #It's paranoid, but close _can_ fail
    close $fh or warn "Closing $group_file failed: $!";
}
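For something you can run without a group file on disk, here is a minimal sketch of the same record-separator idea wrapped in a sub. The name groups_for_person is made up for this example, and it makes two assumptions the snippet above does not: records end at a bare ";" (so stray whitespace after the semicolon does not matter), and a person only matches as a whole, case-sensitive name.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): returns the names of
# all groups whose member list contains exactly $person.
sub groups_for_person {
    my ($fh, $person) = @_;
    local $/ = ';';                      # one record per "group = names;" entry
    my @found;
    while (my $record = <$fh>) {
        chomp $record;                   # strips the trailing ";"
        my ($group, $names) = split /\s*=\s*/, $record, 2;
        next unless defined $names;      # e.g. the newline after the last ";"
        $group =~ s/^\s+//;              # drop the line break left before the group name
        # split ' ' splits on any whitespace, including embedded newlines
        push @found, $group if grep { $_ eq $person } split ' ', $names;
    }
    return @found;
}

# The sample data from the question, read through an in-memory filehandle
my $data = "group1 = john dave jim collin;\n"
         . "group2 = abc def ghi jkl mno\npqr stu vxz;\n"
         . "group3 = marc;\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for groups_for_person($fh, 'pqr');
close $fh;
```

Comparing whole names with eq rather than a regex means that looking up "john" will not accidentally match a "johnny". If you want case-insensitive lookups, compare lc $_ with lc $person instead.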
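This solution may be overkill. It parses the group file and builds a complete data structure. That may still be appropriate if you query the group information repeatedly. If you only need to grep the group file for a few names, you probably don't want this solution, because it goes overboard for that purpose.
I wrote a general parser for the groups file that returns two maps: one from names to groups and one from groups to names.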
sub parse_name_groups
{
    my $file = shift;               # file name of group file
    my %group_to_names;             # Hash mapping groups to lists of names
    my %name_to_groups;             # Hash mapping names to a list of groups
    my $group = "<UNKNOWN>";        # If we see a name outside of a group, assign it to <UNKNOWN>
    my $last_line_in_group = 0;     # Flag: If we see a semicolon, this is the last line in a group.
    open my $fh, "<", $file
        or die "Cannot open group file '$file': $!\n";
    foreach my $line (<$fh>)
    {
        chomp $line;
        # Trim white space from front and back
        $line =~ s/^\s*//g;
        $line =~ s/\s*$//g;
        # Does line begin with a group specifier (ie. "group = ")?
        # If so, grab it and make it our current group.
        if ($line =~ s/^\s*(\S+)\s*=\s*//)
        {
            $group = $1;
        }
        # Does line have a semicolon? Ignore it and everything
        # after. Also, reset $group to <UNKNOWN> after this line.
        if ($line =~ s/;.*$//)
        {
            $last_line_in_group = 1;
        }
        # Split the rest of the line into a list of names
        # and make the name-to-group and group-to-name
        # association.
        foreach my $name (split /\s+/, $line)
        {
            push @{ $group_to_names{ $group } }, $name;
            push @{ $name_to_groups{ $name } }, $group;
        }
        if ($last_line_in_group)
        {
            $group = "<UNKNOWN>";
        }
        $last_line_in_group = 0;
    }
    close $fh;
    # Return references so the caller receives two separate hashes
    return ( \%group_to_names, \%name_to_groups );
}
Here is a sample program that looks up a name in the group file and tells you which group(s) the name belongs to, if any:
# Example program that looks up the group(s) associated with a name.
# Usage:
#
# ./lookup_name group_file name
if ($#ARGV != 1)
{
    die "Usage: lookup_name group_file name\n";
}
my ( $file, $name ) = @ARGV;
my ($group_to_names, $name_to_groups) = parse_name_groups( $file );
my $groups = $name_to_groups->{ $name };
if (!defined $groups)
{
    print "$name does not belong to any groups\n";
} else
{
    print join("\n", @$groups), "\n";
}
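Since the group file format was not fully specified, I made some judgment calls in the parser. Specifically, if it sees something that looks like a name before it has seen a "group =" specifier, it assigns those names to the group <UNKNOWN>. Likewise, once it has seen a semicolon, any names it sees from the following line onward, but before the next "group =", are assigned to the group <UNKNOWN>.
The code also treats the semicolon as an "end of line" indicator. Anything after the semicolon on the same line is ignored.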
There should be enough comments in the code above for you to change these behaviors to suit your application.
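To see those judgment calls in action, here is a condensed sketch that applies the same rules to a string instead of a file. The sub name name_to_groups_from_string and the sample input are invented for this illustration; it only builds the name-to-groups map and is not a replacement for the parser above.

```perl
use strict;
use warnings;

# Hypothetical helper: same rules as the parser above, applied to a string.
sub name_to_groups_from_string {
    my ($text) = @_;
    my %name_to_groups;
    my $group = '<UNKNOWN>';
    for my $line (split /\n/, $text) {
        # A leading "name =" starts a new group
        $group = $1 if $line =~ s/^\s*(\S+)\s*=\s*//;
        # A semicolon ends the group; the rest of the line is dropped
        my $ends_group = ($line =~ s/;.*$//) ? 1 : 0;
        # split ' ' ignores leading whitespace and splits on runs of whitespace
        push @{ $name_to_groups{$_} }, $group for split ' ', $line;
        $group = '<UNKNOWN>' if $ends_group;
    }
    return \%name_to_groups;
}

# Made-up input that exercises each edge case
my $map = name_to_groups_from_string(<<'END');
stray1
group1 = john dave; ignored
stray2
group2 = abc
def;
END

for my $name (qw(stray1 john ignored stray2 def)) {
    my $groups = $map->{$name};
    printf "%-8s %s\n", $name, $groups ? join(',', @$groups) : '(none)';
}
```

With this input, stray1 and stray2 end up in <UNKNOWN>, john is in group1, def is in group2 because the group continues onto the next line, and "ignored" belongs to no group because it follows a semicolon on the same line.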