GMP将最后一位设置为零



我正在寻找最快的方法将声明为mpz_t的正数l的最后一位设置为零。我还没有发现这个函数可以。例如,应将6531489321483更改为6531489321480

更新

对于mpz_t类型,减法和模运算似乎是将最后一位清零的优越方法。正如@MarkDickinson和@MarcGlisse所指出的,渐近行为极大地倾向于使用mpz_tdiv_r_ui(或mpz_fdiv_r_ui(,而不是使用mpz_div_uimpz-mul_ui。我最初的基准是相对较小的数字(25位数(。我重新测试了一个175位的数字,sub_mod方法快了近40%。

Test value: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789
Result with div_mul: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
Result with sub_mod: 1234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
time with division followed by multiplication: 6.145656
time with subtraction and modulo: 4.413998

使用350位数字,我们可以看到sub_mod的速度大约快85%:

Test value: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789
Result with div_mul: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
Result with sub_mod: 12345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456789123456789876543212345678912345678987654321234567891234567898765432123456780
time with division followed by multiplication: 10.256122
time with subtraction and modulo: 5.522990

需要注意的是,无论我们使用mpz_tdiv_r_ui还是mpz_fdiv_r_ui,结果都几乎相同。

由于sub_mod方法在较小的数字下只稍微慢一点,因此似乎只在所有情况下使用该方法是合理的。

如果能在不同的编译器上进行测试,那就太好了。我目前正在使用clang 5.0.1

原件

我的机器上的基准测试表明,除法后乘法比通过模运算和减法求余数更快。

#include <stdio.h>
#include <time.h>
#include <gmp.h>
void div_mul(mpz_t x) {
mpz_tdiv_q_ui(x, x, 10u);
mpz_mul_ui(x, x, 10u);
}
// Maybe this could be simpler?
void sub_mod(mpz_t x, mpz_t y) {
// N.B. mpz_mod_ui is equivalent to mpz_fdiv_r_ui. Changed to 
// mpz_tdiv_r_ui for consistency with div_mul.
mpz_tdiv_r_ui(y, x, 10u);
mpz_sub(x, x, y);
}

Main:

int main() {
mpz_t testVal;
mpz_init(testVal);
mpz_set_str(testVal, "1234567898765432123456789", 10);
gmp_printf("Test value: %Zdn", testVal);

mpz_t x;
mpz_t y;

mpz_init(x);
mpz_init(y);

mpz_set(x, testVal);
div_mul(x);
gmp_printf("Result with div_mul: %Zdn", x);

mpz_set(x, testVal);
sub_mod(x, y);
gmp_printf("Result with sub_mod: %Zdn", x);

const int limit = 100000000;
const double checkPoint0 = (double) clock() / CLOCKS_PER_SEC;

for (int i = 0; i < limit; ++i) {
mpz_set(x, testVal);
div_mul(x);
}

const double checkPoint1 = (double) clock() / CLOCKS_PER_SEC;
const double time_div_mul = checkPoint1 - checkPoint0;
printf("time with division followed by multiplication: %fn", time_div_mul);
const double checkPoint2 = (double) clock() / CLOCKS_PER_SEC;

for (int i = 0; i < limit; ++i) {
mpz_set(x, testVal);
sub_mod(x, y);
}

const double checkPoint3 = (double) clock() / CLOCKS_PER_SEC;
const double time_sub_mod = checkPoint3 - checkPoint2;
printf("time with subtraction and modulo: %fn", time_sub_mod);
mpz_clear(testVal);
mpz_clear(x);
mpz_clear(y);
return 0;
}

输出:

Test value: 1234567898765432123456789
Result with div_mul: 1234567898765432123456780
Result with sub_mod: 1234567898765432123456780
time with division followed by multiplication: 2.941251
time with subtraction and modulo: 3.171949

我怀疑后一种方法稍微慢一点的原因之一是需要2个变量,因为CAPI中无法访问同一行上的复杂多操作。如果我们可以使用gmpxx,我们就可以编写x - x % 10

关于为什么第一种方法更快的另一个想法是,div_mul涉及两个带有无符号整数的运算,而sub_mod方法涉及一个带有无标记整数的运算和一个带有mpz_t的运算。

我试图在ideone.com上复制这个,但无法加载gmp.h。我选择使用类型long long int来实现类似的基准测试,只是为了好玩。您将注意到volatile的存在,并且限制是十亿,而不是如上所述的一亿。volatile需要防止for循环被优化。

将数字转换为字符串并更改最后一个字符不是最快的方法吗?

最新更新