
Programming Q&A · Published: 2022-06-02 · Site: 大佬教程 code.js-code.com

How do you determine the rounding direction when converting a double to a float?

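I am looking for an algorithm that determines the rounding direction when an arbitrary 64-bit double is cast to a 32-bit float. My specific use case for this check is casting a 64-bit double to a 32-bit float while rounding toward infinity.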

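At first, I applied the following criterion, which checks the bits in the truncated part of the mantissa. If the truncated part of the mantissa is nonzero, the cast must have rounded down!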

const F64_MASK: u64 = (1 << 29) - 1;
fn is_rounded_up_when_casted(v: f64) -> bool {
    v.to_bits() & F64_MASK > 0
}

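However, this criterion does not identify every case: the last three bits of the exponent are truncated as well. I tried modifying the mask to check those exponent bits too: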

const F64_MASK: u64 = (1u64 << 55) - (1 << 52) + (1 << 29) - 1;

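Unfortunately, this check does not work. For example, the number 1.401298464324817e-45 has an exponent whose three truncated bits are 010, yet it is still exactly representable as a float/f32.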
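To make that counterexample concrete, the sketch below splits an f64 into its raw fields. The helper name `f64_fields` is hypothetical, and the sketch assumes the value in question is 2^-149, the smallest positive f32 subnormal, which is the only f32 of that magnitude.

```rust
/// Splits an f64 into its raw (sign, biased exponent, mantissa) fields.
/// `f64_fields` is an illustrative helper, not part of the original post.
fn f64_fields(v: f64) -> (u64, u64, u64) {
    let bits = v.to_bits();
    let sign = bits >> 63;
    let exponent = (bits >> 52) & 0x7ff; // 11 exponent bits, still biased by 1023
    let mantissa = bits & ((1u64 << 52) - 1); // 52 explicit mantissa bits
    (sign, exponent, mantissa)
}

/// The smallest positive f32 subnormal (2^-149), widened to f64 without loss.
fn smallest_f32_subnormal_as_f64() -> f64 {
    f32::from_bits(1) as f64
}
```

For this value the biased exponent is 1023 - 149 = 874, whose low three bits are 010, and the mantissa is zero. The value still survives a round trip through f32, so a mask over the low exponent bits flags it incorrectly.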

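Edit: I don't think it can be said that a nonzero mantissa implies positive rounding. I think I need a different approach. I believe the exponent only extends the range of the number, so it can be handled with a few separate checks. Might the rounding direction simply be a function of the leading bit of the truncated part of the mantissa?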

Solution

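The edge case you found comes from the fact that subnormal f32 values can actually represent exponents below the typical minimum. I have written a function that I believe covers all the edge cases: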

const F64_MANTISSA_SIZE: u64 = 52;
const F64_MANTISSA_MASK: u64 = (1 << F64_MANTISSA_SIZE) - 1;
const F64_EXPONENT_SIZE: u64 = 64 - F64_MANTISSA_SIZE - 1;
const F64_EXPONENT_MASK: u64 = (1 << F64_EXPONENT_SIZE) - 1; // shift away the mantissa first
const F32_MANTISSA_SIZE: u64 = 23;
const F64_TRUNCATED_MANTISSA_SIZE: u64 = F64_MANTISSA_SIZE - F32_MANTISSA_SIZE;
const F64_TRUNCATED_MANTISSA_MASK: u64 = (1 << F64_TRUNCATED_MANTISSA_SIZE) - 1;

fn is_exactly_representable_as_f32(v: f64) -> bool {
    let bits = v.to_bits();
    let mantissa = bits & F64_MANTISSA_MASK;
    let exponent = (bits >> F64_MANTISSA_SIZE) & F64_EXPONENT_MASK;
    let _sign = bits >> (F64_MANTISSA_SIZE + F64_EXPONENT_SIZE) != 0;
    if exponent == 0 {
        // if mantissa == 0, the float is 0 or -0, which is representable
        // if mantissa != 0, it's an f64 subnormal, which is never representable
        return mantissa == 0;
    }
    if exponent == F64_EXPONENT_MASK {
        // either infinity or NaN, all of which are representable
        return true;
    }
    // remember to subtract the bias
    let computed_exponent = exponent as i64 - 1023;
    // -126 and 127 are the min and max value for a standard f32 exponent
    if (-126..=127).contains(&computed_exponent) {
        // at this point, it's only exactly representable if the truncated mantissa is all zero
        return mantissa & F64_TRUNCATED_MANTISSA_MASK == 0;
    }
    // exponents less than 2**(-126) may be representable by f32 subnormals
    if computed_exponent < -126 {
        // this is the number of leading zeroes that need to be in the f32 mantissa
        let diff = -127 - computed_exponent;
        // this is the number of bits in the mantissa that must be preserved (essentially the mantissa with trailing zeroes trimmed off, plus the implicit leading one)
        let mantissa_bits = F64_MANTISSA_SIZE - (mantissa.trailing_zeros() as u64).min(F64_MANTISSA_SIZE) + 1;
        // the leading zeroes + essential mantissa bits must be able to fit in the smaller mantissa size
        return diff as u64 + mantissa_bits <= F32_MANTISSA_SIZE;
    }
    // the exponent is >127 so f32s can't go that high
    return false;
}
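One way to sanity-check a bit-level predicate like the one above is to compare it with a plain round-trip cast. This is only a minimal sketch: `round_trips_through_f32` is a hypothetical helper name, and it treats NaN as representable so that it matches the convention used above.

```rust
/// Returns true when casting to f32 and back yields the same value.
/// NaN never compares equal to itself, so it is accepted explicitly;
/// this mirrors the "NaN is representable" convention used above.
fn round_trips_through_f32(v: f64) -> bool {
    v.is_nan() || (v as f32) as f64 == v
}
```

Running both predicates over a range of inputs and asserting that they agree would expose any case the bit-level version misses.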

No bit manipulation is required:

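use std::cmp::Ordering;

#[derive(PartialEq, Debug)]
enum Direction { Equal, Up, Down }
fn get_rounding_direction(v: f64) -> Direction {
    // compare v with the value it becomes after a round trip through f32
    match v.partial_cmp(&(v as f32 as f64)) {
        Some(Ordering::Greater) => Direction::Down,
        Some(Ordering::Less) => Direction::Up,
        _ => Direction::Equal, // equal, or NaN (partial_cmp returns None)
    }
}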

And some tests to check correctness:

#[cfg(test)]
#[test]
fn test_get_rounding_direction() {
    // check that the f64 one step below 2 casts to exactly 2
    assert_eq!(get_rounding_direction(1.9999999999999998), Direction::Up);

    // check edge cases
    assert_eq!(get_rounding_direction(f64::NAN), Direction::Equal);
    assert_eq!(get_rounding_direction(f64::NEG_INFINITY), Direction::Equal);
    assert_eq!(get_rounding_direction(f64::MIN), Direction::Down);
    assert_eq!(get_rounding_direction(-f64::MIN_POSITIVE), Direction::Up);
    assert_eq!(get_rounding_direction(-0.), Direction::Equal);
    assert_eq!(get_rounding_direction(0.), Direction::Equal);
    assert_eq!(get_rounding_direction(f64::MIN_POSITIVE), Direction::Down);
    assert_eq!(get_rounding_direction(f64::MAX), Direction::Up);
    assert_eq!(get_rounding_direction(f64::INFINITY), Direction::Equal);

    // for all other f32
    for u32_bits in 1..f32::INFINITY.to_bits() - 1 {
        let f64_value = f32::from_bits(u32_bits) as f64;
        let u64_bits = f64_value.to_bits();

        if u32_bits % 100_000_000 == 0 {
            println!("checkpoint every 600 million tests: {}", f64_value);
        }
        // check that the f64 equivalent to the current f32 casts to a value that is equivalent
        assert_eq!(get_rounding_direction(f64_value), Direction::Equal, "at {}, {}", u32_bits, f64_value);
        // check that the f64 one step below the f64 equivalent to the current f32 casts to a value that is one step greater
        assert_eq!(get_rounding_direction(f64::from_bits(u64_bits - 1)), Direction::Up, "at {}, {}", u32_bits, f64_value);
        // check that the f64 one step above the f64 equivalent to the current f32 casts to a value that is one step less
        assert_eq!(get_rounding_direction(f64::from_bits(u64_bits + 1)), Direction::Down, "at {}, {}", u32_bits, f64_value);

        // same checks for negative numbers: stepping the bits changes the magnitude, so the directions swap
        let u64_bits = (-f64_value).to_bits();
        assert_eq!(get_rounding_direction(-f64_value), Direction::Equal, "at {}, {}", u32_bits, f64_value);
        assert_eq!(get_rounding_direction(f64::from_bits(u64_bits - 1)), Direction::Down, "at {}, {}", u32_bits, f64_value);
        assert_eq!(get_rounding_direction(f64::from_bits(u64_bits + 1)), Direction::Up, "at {}, {}", u32_bits, f64_value);
    }
}

And specifically for rounding toward infinity:

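fn cast_toward_inf(vf64: f64) -> f32 {
    let vf32 = vf64 as f32;
    // only adjust when the cast rounded down; a negative f32 moves toward +infinity by decrementing its bits
    let step = |bits: u32| if vf32.is_sign_negative() { bits - 1 } else { bits + 1 };
    if vf64 > vf32 as f64 { f32::from_bits(step(vf32.to_bits())) } else { vf32 }
}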

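The rounding can mostly be determined from bit 28 (the first bit of the truncated part of the mantissa), but handling the edge cases adds significant complexity.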
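As a rough illustration of that bit-level approach, the sketch below predicts the direction only for values whose exponent lies in the normal f32 range and returns `None` for everything else, since those are the edge cases. It assumes the cast rounds to nearest with ties to even, which is how Rust's `as` conversion from f64 to f32 behaves. The name `direction_from_bits` is hypothetical, and `Direction` is redefined here only to keep the block self-contained.

```rust
#[derive(Debug, PartialEq)]
enum Direction { Equal, Up, Down }

/// Predicts the direction of `v as f32` from the f64 bits alone.
/// Returns None when the unbiased exponent is outside the normal f32 range
/// (zero, subnormal results, overflow, infinity, NaN).
fn direction_from_bits(v: f64) -> Option<Direction> {
    let bits = v.to_bits();
    let exponent = ((bits >> 52) & 0x7ff) as i64 - 1023;
    if !(-126..=127).contains(&exponent) {
        return None;
    }
    let truncated = bits & ((1u64 << 29) - 1); // the 29 mantissa bits f32 drops
    let half = 1u64 << 28; // bit 28 alone: exactly half an f32 ulp
    let kept_lsb_is_odd = (bits >> 29) & 1 == 1;
    // The magnitude grows when more than half is dropped, or on a tie
    // when the kept mantissa is odd (ties go to the even neighbour).
    let away_from_zero = truncated > half || (truncated == half && kept_lsb_is_odd);
    let negative = bits >> 63 == 1;
    Some(if truncated == 0 {
        Direction::Equal
    } else if away_from_zero != negative {
        // positive and growing, or negative and shrinking: the value increases
        Direction::Up
    } else {
        Direction::Down
    })
}
```

Bit 28 decides the outcome except on an exact tie, where the lowest kept bit breaks it, and the sign flips the meaning of a growing magnitude. Even within the normal range, then, the leading truncated bit is not the whole story.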
