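I need to represent floating-point numbers of very small magnitude in C (for example, $0.6745 \times 2^{-3000}$). The support must be platform-independent, working on both the CPU and the GPU (CUDA). A long significand is not required.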
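I can't use arbitrary-precision libraries (GMP, MPFR, etc.) because they don't run on the GPU. On the other hand, CUDA doesn't support the long double type. Is there any workaround? Is it possible to implement a custom floating-point type somehow?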
You can work in log space, that is, represent each number as $e^x$, where $x$ is your standard floating-point type:
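- Addition/subtraction (and, more generally, summation) can be performed with the log-sum-exp trick, i.e.
  - $e^x + e^y = e^x\left(1 + e^{y-x}\right) = e^{x + \log\left(1 + \exp(y-x)\right)}$
- Multiplication/division become addition/subtraction:
  - $e^x \times e^y = e^{x+y}$
- Raising to a power is fairly simple:
  - $\left(e^x\right)^{e^y} = e^{x \exp(y)}$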
I wrote a simple solution (based on this work):
#include <math.h>
#include <stdint.h>

#define DOUBLE_PRECISION 53

/* DOUBLE PRECISION FLOATING-POINT TYPE WITH EXTENDED EXPONENT */
/* value = sig * 2^exp with 1 <= |sig| < 2; zero is stored as sig = 0, exp = 0 */
typedef struct Real {
    double sig; //significand
    long exp;   //binary exponent
} real;

/* UNION FOR DIVIDING A DOUBLE BY 2^POW */
union DubleIntUnion
{
    double dvalue;
    uint64_t ivalue;
};

/* PLACE SIGNIFICAND OF REAL NUMBER IN RANGE [1, 2) */
/* One step is enough after mul/div, where |sig| is already in (0.5, 4) */
/* static inline: a plain C99 inline function without an external definition may fail to link */
static inline real adjust(real x){
    real y = x;
    if(y.sig == 0){
        y.exp = 0;
    } else if (fabs(y.sig) >= 2.0){
        y.exp = y.exp + 1;
        y.sig = y.sig / 2;
    } else if(fabs(y.sig) < 1){
        y.exp = y.exp - 1;
        y.sig = y.sig * 2;
    }
    return y;
}

/* PLACE SIGNIFICAND OF REAL NUMBER IN RANGE [1, 2) FOR TINY NUMBER */
/* FOR EXAMPLE, AFTER SUBTRACTION OR WHEN SETTING A REAL FROM A DOUBLE */
static inline real adjusttiny(real x){
    real y = x;
    if(!isfinite(x.sig))
        return x;               // inf/NaN would never reach a fixed point
    while(1){
        x = y;
        y = adjust(x);
        if(x.exp == y.exp && x.sig == y.sig)
            break;
    }
    return y;
}

/* C has no function overloading, so only set(double) is kept; */
/* a real is copied by plain struct assignment: real b = a; */
static real set(double x){
    real y;
    y.sig = x;
    y.exp = 0;
    return adjusttiny(y);
}

/* ARITHMETIC OPERATIONS */
//divide x by 2^pow by subtracting pow from the exponent bits.
//Assumes x is a normal double and its exponent minus pow stays above e_min.
static inline double div2pow(const double x, const int pow)
{
    union DubleIntUnion diu;
    if(x == 0.0)
        return x;               // zero has no exponent to decrement
    diu.dvalue = x;
    diu.ivalue -= (uint64_t)pow << 52; // subtract pow from exponent
    return diu.dvalue;
}

//summation
static inline real sum(real x, real y){
    real s;
    long dexp = (x.exp > y.exp) ? x.exp - y.exp : y.exp - x.exp;
    // zero is stored with exp = 0, so it must not win the exponent comparison
    // (otherwise 0 + 2^-3000 would return 0)
    if(x.sig == 0) return y;
    if(y.sig == 0) return x;
    if (x.exp > y.exp){
        s.exp = x.exp;
        if(dexp <= DOUBLE_PRECISION)
            s.sig = x.sig + div2pow(y.sig, (int)dexp); // y / 2^(x.exp - y.exp)
        else
            s.sig = x.sig;
    } else if (y.exp > x.exp){
        s.exp = y.exp;
        if(dexp <= DOUBLE_PRECISION)
            s.sig = y.sig + div2pow(x.sig, (int)dexp); // x / 2^(y.exp - x.exp)
        else
            s.sig = y.sig;
    } else {
        s.exp = x.exp;
        s.sig = x.sig + y.sig;
    }
    // operands of opposite sign can cancel, so one adjust() step is not enough
    return adjusttiny(s);
}

//subtraction: x - y = x + (-y)
static inline real sub(real x, real y){
    y.sig = -y.sig;
    return sum(x, y);
}

//multiplication
static inline real mul(real x, real y){
    real product;
    product.exp = x.exp + y.exp;
    product.sig = x.sig * y.sig;
    return adjust(product);
}

//division (named divide to avoid clashing with div() from <stdlib.h>)
static inline real divide(real x, real y){
    real quotient;
    quotient.exp = x.exp - y.exp;
    quotient.sig = x.sig / y.sig;
    return adjust(quotient);
}
At first glance it works correctly. Maybe I have missed something, or can the implementation be made faster?
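One likely speed-up, sketched under assumptions: adjusttiny() renormalizes one bit per loop iteration, so a subnormal input such as 1e-310 needs over a thousand iterations. The standard frexp() function returns the whole binary exponent in a single call, and ldexp() can replace the union trick in div2pow() while also handling zero. Both are in C99 <math.h> and in the CUDA math API. The names normalize and scale_down are hypothetical; the struct mirrors the real type from the question.

```c
#include <assert.h>
#include <math.h>

/* Same layout as the real type above: value = sig * 2^exp, 1 <= |sig| < 2. */
typedef struct Real {
    double sig;
    long exp;
} real;

/* Hypothetical helper: normalize in one step instead of looping. */
static inline real normalize(real x)
{
    int e;
    if (x.sig == 0.0) {              /* zero: canonical exponent 0, as in adjust() */
        x.exp = 0;
        return x;
    }
    if (!isfinite(x.sig))            /* leave inf/NaN untouched */
        return x;
    x.sig = 2.0 * frexp(x.sig, &e);  /* frexp: sig = m * 2^e with 0.5 <= |m| < 1 */
    x.exp += e - 1;                  /* 2m is in [1, 2), so subtract one from e */
    return x;
}

/* Hypothetical alternative to div2pow(): x / 2^n, exact unless the result underflows. */
static inline double scale_down(double x, int n)
{
    assert(n >= 0);
    return ldexp(x, -n);
}
```

With these, set() could call normalize() directly, and the cancellation path in the arithmetic could use it in place of adjusttiny().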
How can the functions floor and ceil be implemented for such numbers?
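If you need really large exponents, symmetric level-index (SLI) arithmetic might suit your needs. However, its precision is hard to predict, so you may need higher precision to compensate for the LI (level-index) values. A common way to increase precision is double-double arithmetic, which is also widely used on CUDA.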
There are also a number of multiple-precision libraries for CUDA, such as CUMP.
More information:
- https://devtalk.nvidia.com/default/topic/512234/gpump-multiple-precision-arithmetic-on-the-gpu-/
- http://individual.utoronto.ca/haojunliu/courses/ECE1724_Report.pdf
- http://gcl.cis.udel.edu/publications/meetings/090709_ARLvisit/090709_Library4GPU.pdf
- http://www.cra.org/Activities/craw_archive/dmp/awards/2009/Padron/proposal_omar_dreu_09.pdf