How to count words of the same length in a C array of strings



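I'm opening and reading a dictionary file and counting how many words it contains. I then store each word separately in an array of strings. After that, I use qsort() to sort the words by length and then alphabetically. Now I'm trying to walk through the array and count how many words have the same length, but I'm having a hard time deciding how to proceed. The code I've written so far is: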

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_STR 100
/* Sorting the words by length and alphabetical order,
   with length being priority number one */
int compare(const void *a, const void *b){
    const char **str_a = (const char **)a;
    const char **str_b = (const char **)b;
    int len1 = strlen(*str_a);
    int len2 = strlen(*str_b);
    if (len1 < len2) return -1;
    if (len1 > len2) return  1;
    return strcmp(*str_a, *str_b);
}
int main (int argc, char *argv[]){
    FILE *fp = NULL;
    int i = 0, n_total_palavras = 0;
    char str[MAX_STR];
    int  count = 0;
    char **Words;
    fp = fopen("words.dict", "r");
    if (fp == NULL){
        exit (0);
    }
    /* First pass: count the words */
    while (fscanf(fp,"%s",str) == 1){
        n_total_palavras++;
    }
    Words = (char **)malloc(n_total_palavras * sizeof (char *));
    if (Words == NULL){
        exit(0);
    }
    for (i = 0; i < n_total_palavras; i++){
        Words[i] = NULL;
    }
    rewind (fp);
    /* Second pass: store a copy of each word */
    while (fscanf(fp,"%s",str) == 1){
        Words[count] = (char*)malloc((strlen(str)+1) * sizeof(char));
        strcpy(Words[count], str);
        count++;
    }
    qsort(Words, n_total_palavras, sizeof(Words[0]), compare);
    /* for(i = 0; i < n_total_palavras; i++){
        printf("%s\n", Words[i]);
    }
    */

    fclose(fp);
    return 0;
}

I'm trying to get something like this:

4 letters words: 2018
5 letters words: 170
6 letters words: 10
(...)

Any idea how I should approach this?

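Here is my code implementing what I suggested. It reads the file once, growing the word list as needed. Each time more space is needed, it allocates roughly twice as much as before.

I simplified the comparison function slightly, though the optimizer would probably have produced something close to what I wrote anyway.

The code is currently configured not to print the sorted word list.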

/* SO 7400-7509 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAX_STR 100
/*
 * Sorting the words first by length and then in alphabetical order
 */
static int compare(const void *a, const void *b)
{
    const char *str_a = *(const char **)a;
    const char *str_b = *(const char **)b;
    size_t len1 = strlen(str_a);
    size_t len2 = strlen(str_b);
    if (len1 < len2)
        return -1;
    if (len1 > len2)
        return 1;
    return strcmp(str_a, str_b);
}
int main(int argc, char *argv[])
{
    char str[MAX_STR];
    char **words = 0;
    size_t max_words = 0;
    size_t num_words = 0;
    const char *filename = "words.dict";
    if (argc == 2)
        filename = argv[1];
    else if (argc > 2)
    {
        fprintf(stderr, "Usage: %s [filename]\n", argv[0]);
        exit(EXIT_FAILURE);
    }
    FILE *fp = fopen(filename, "r");
    if (fp == NULL)
    {
        fprintf(stderr, "%s: failed to open file '%s' for reading\n",
                argv[0], filename);
        exit(EXIT_FAILURE);
    }
    while (fscanf(fp, "%99s", str) == 1)
    {
        if (num_words >= max_words)
        {
            /* Grow the pointer array to roughly twice its previous size */
            size_t new_size = (max_words + 2) * 2;
            void *new_space = realloc(words, sizeof(words[0]) * new_size);
            if (new_space == NULL)
            {
                fprintf(stderr, "%s: failed to allocate %zu pointers\n",
                        argv[0], new_size);
                exit(EXIT_FAILURE);
            }
            words = new_space;
            max_words = new_size;
        }
        if ((words[num_words] = strdup(str)) == NULL)
        {
            fprintf(stderr, "%s: failed to duplicate '%s'\n", argv[0], str);
            exit(EXIT_FAILURE);
        }
        num_words++;
    }
    fclose(fp);
    /* An empty file leaves words null, so words[0] must not be touched */
    if (num_words == 0)
    {
        fprintf(stderr, "%s: no words read from '%s'\n", argv[0], filename);
        exit(EXIT_FAILURE);
    }
    qsort(words, num_words, sizeof(words[0]), compare);
    /*
    for (size_t i = 0; i < num_words; i++)
    {
        printf("%zu: %s\n", i+1, words[i]);
    }
    */
    /* The list is sorted by length, so equal lengths form consecutive runs */
    size_t count = 0;
    size_t currlen = strlen(words[0]);
    for (size_t i = 0; i < num_words; i++)
    {
        size_t length = strlen(words[i]);
        if (length == currlen)
            count++;
        else
        {
            printf("%zu-letter words: %zu\n", currlen, count);
            currlen = length;
            count = 1;
        }
    }
    printf("%zu-letter words: %zu\n", currlen, count);
    return 0;
}
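
To make "roughly twice as much" concrete: the growth rule in the reading loop, new_size = (max_words + 2) * 2, gives capacities of 4, 12, 28, 60 and so on. The sketch below only replays that rule. The names next_capacity and reallocs_needed are made up for illustration and are not part of the program above.

```c
#include <stddef.h>

/* Same growth rule as the reading loop: (max_words + 2) * 2. */
size_t next_capacity(size_t current)
{
    return (current + 2) * 2;
}

/* Hypothetical helper: how many realloc() calls the loop would make
 * while reading num_words words, starting from an empty list. */
size_t reallocs_needed(size_t num_words)
{
    size_t capacity = 0;
    size_t calls = 0;
    while (capacity < num_words)
    {
        capacity = next_capacity(capacity);
        calls++;
    }
    return calls;
}
```

Under this rule, reading 11 words takes 2 reallocations and reading 416105 words takes 17, so the number of realloc() calls grows only logarithmically with the size of the file.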

Given the data file words.dict:

alpha
beta
gamma
delta
epsilon
Hawaii
California
Colorado
Alaska
Alabama
Arizona

it produces the output:

4-letter words: 1
5-letter words: 3
6-letter words: 2
7-letter words: 3
8-letter words: 1
10-letter words: 1

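Given a variant of a Linux dictionary (it has been mangled so that the words are all monocase, all punctuation has been removed, and duplicates have been removed), the output is: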

$ timecmd -m -- cw31 ~/src/spelling-bee/sb-wordlist
2022-10-09 13:56:34.935 [PID 94595] cw31 /Users/jonathanleffler/src/spelling-bee/sb-wordlist
1-letter words: 26
2-letter words: 566
3-letter words: 4343
4-letter words: 10359
5-letter words: 21884
6-letter words: 38179
7-letter words: 50447
8-letter words: 58182
9-letter words: 57289
10-letter words: 48591
11-letter words: 39357
12-letter words: 30260
13-letter words: 21642
14-letter words: 14585
15-letter words: 9078
16-letter words: 5325
17-letter words: 3046
18-letter words: 1505
19-letter words: 774
20-letter words: 363
21-letter words: 170
22-letter words: 74
23-letter words: 31
24-letter words: 12
25-letter words: 8
27-letter words: 3
28-letter words: 2
29-letter words: 2
31-letter words: 1
45-letter words: 1
2022-10-09 13:56:35.130 [PID 94595; status 0x0000]  -  0.195s
$ wc -l ~/src/spelling-bee/sb-wordlist
416105 /Users/jonathanleffler/src/spelling-bee/sb-wordlist
$

For the curious, the 45-letter word is "pneumonoultramicroscopicsilicovolcanoconiosis".

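To me, it makes no sense to sort the word list by length and then alphabetically just to count the words of the same length. Is this an attempt to make something simple complicated?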

#include <stdio.h>
#include <string.h>

int main() {
    FILE *fp = fopen( "foobar.txt", "r" );
    if( fp == NULL ) {
        perror( "foobar.txt" );
        return 1;
    }
    char buf[ 128 ];
    int wc = 0, lengths[ 50 ] = { 0 };
    while( fgets( buf, sizeof buf, fp ) != NULL ) {
        /* length without the newline; also correct if the last line has none */
        size_t len = strcspn( buf, "\n" );
        /* skip blank lines and words too long for the table */
        if( len == 0 || len >= sizeof lengths/sizeof lengths[0] )
            continue;
        lengths[ len ]++;
        wc++; // word counter
    }
    fclose( fp );
    printf( "word count: %d\n", wc );
    for( size_t i = 1; i < sizeof lengths/sizeof lengths[0]; i++ )
        if( lengths[ i ] )
            printf( "words of %zu letters: %d\n", i, lengths[ i ] );
    return 0;
}

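It takes only a handful of lines of code to count the words of each length. If some other functionality is needed (for example, reporting the 4-6 character words in sorted order), only a little has to be added to these lines to get that result.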
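
As a rough illustration of that kind of extension, the sketch below picks out the words whose length falls within a given range and sorts only those. It assumes the words are already in an in-memory array rather than being read from the file, and the names select_by_length and cmp_words are invented here for the example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() comparator for an array of char pointers. */
static int cmp_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;
    return strcmp(*wa, *wb);
}

/* Hypothetical helper: copy into out[] the pointers to the words whose
 * length is in [min_len, max_len], sort them with strcmp(), and return
 * how many were kept. out[] must have room for n pointers. */
size_t select_by_length(char **words, size_t n,
                        size_t min_len, size_t max_len,
                        char **out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
    {
        size_t len = strlen(words[i]);
        if (len >= min_len && len <= max_len)
            out[kept++] = words[i];
    }
    qsort(out, kept, sizeof(out[0]), cmp_words);
    return kept;
}
```

With the sample words.dict shown earlier and a range of 4 to 6, this would keep Alaska, Hawaii, alpha, beta, delta and gamma, in that order on an ASCII system, because strcmp() places uppercase letters before lowercase ones.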

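UNIX, the successful operating system we know and love, was built on the principle of simplicity. In the early days, utilities such as cat, tr and sort each did one thing and did it well. These days, cat tries to be cat + more + less + tail + number + paginate + ... The "man page" for cat used to be a single sentence: "Copies/concatenates the files named as arguments ('-' == stdin) to stdout". Now the man page for cat runs to several pages.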

KISS, and move on. Adding the kitchen sink is not good programming practice.
