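Unrolling loops in special-case functions

I am trying to optimize some code. I have a function with a variable-size loop. For efficiency, however, I would like to make fully unrolled special cases for loops of size 1, 2 and 3. My approach so far is to declare the loop size as a const parameter, then define wrapper functions that call the main function and pass it a literal for the const value. I have included a code snippet that illustrates what I have in mind.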

inline void someFunction (const int a)
{
    for (int i=0; i<a; i++)
    {
        // do something with i.
    }
}
void specialCase()
{
    someFunction (3);
}
void generalCase(int a)
{
    someFunction (a);
}

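So my question is: is it reasonable to expect my compiler (GCC) to unroll the for loop inside specialCase? Obviously I could copy-paste the contents of someFunction into specialCase and replace a with 3, but I would rather maintain only one definition of someFunction in my code.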


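> For efficiency, I would like to make fully unrolled special cases for loops of size 1, 2 and 3.

Have you measured that this is actually faster? I doubt that it will be (or that the compiler won't unroll the loop automatically).

> My approach so far is to declare the loop size as a const parameter, then define wrapper functions that call the main function and pass it a literal for the const value.

const doesn't mean anything here. It doesn't affect the compiler's ability to unroll the loop. It only means that a cannot be mutated inside the function body; a is still a runtime argument.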
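As a rough sketch of what such a measurement could look like, the helper below averages the wall-clock time of repeated calls. The name average_ns_per_call is hypothetical, reps is assumed to be positive, and a naive timing loop like this is easily distorted by the optimizer removing work whose result is unused, so treat it as a starting point rather than a rigorous benchmark.

```cpp
#include <chrono>

// Hypothetical helper: average wall-clock nanoseconds per call of f over reps calls.
// Assumes reps > 0.
template <typename F>
double average_ns_per_call(F&& f, int reps)
{
    using clock = std::chrono::steady_clock;   // monotonic, suitable for intervals
    const auto start = clock::now();
    for (int r = 0; r < reps; ++r)
        f();
    const auto stop = clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / reps;
}
```

Timing the special-case and general-case variants with the same inputs, at the optimization level used for release builds, shows whether the hand-written special cases are worth keeping.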
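To make the difference concrete, here is a minimal sketch that contrasts a runtime count with a compile-time one. All names are hypothetical, and the loop body simply sums the indices so there is something to observe. With a template parameter the trip count is a constant in every instantiation. With a plain (even const) parameter the compiler only learns the count if it inlines the call and propagates the constant. Whether the loop is actually unrolled still depends on the compiler and optimization level in both cases.

```cpp
#include <cstddef>

// Compile-time count: N is a constant in every instantiation.
template <std::size_t N>
inline int sum_indices_fixed()
{
    int total = 0;
    for (std::size_t i = 0; i < N; ++i)
        total += static_cast<int>(i);
    return total;
}

// Runtime count: const only forbids modifying n inside the body.
inline int sum_indices_runtime(const int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += i;
    return total;
}

int special_case() { return sum_indices_fixed<3>(); }        // visits 0, 1, 2
int general_case(int n) { return sum_indices_runtime(n); }
```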

If you want to guarantee unrolling, force it. It is easy with C++17.

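#include <cstddef>      // std::size_t
#include <type_traits>  // std::integral_constant
#include <utility>      // std::index_sequence, std::forward
template <typename F, std::size_t... Is>
void repeat_unrolled_impl(F&& f, std::index_sequence<Is...>)
{
    // Fold over the comma operator: one call to f per index, in order.
    (f(std::integral_constant<std::size_t, Is>{}), ...);
}
template <std::size_t Iterations, typename F>
void repeat_unrolled(F&& f)
{
    repeat_unrolled_impl(std::forward<F>(f),
                         std::make_index_sequence<Iterations>{});
}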

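Live example on Godbolt

If you don't like templates and don't trust your compiler, there is always this approach, inspired by the outdated manual loop-unrolling technique known as "Duff's device":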
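For illustration, here is one way repeat_unrolled might be called. The sum_tuple caller is a hypothetical example, not part of the answer. It relies on the fact that each call receives a std::integral_constant, so the index is available as a compile-time value and can be used as a template argument, which an ordinary loop variable cannot.

```cpp
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

template <typename F, std::size_t... Is>
void repeat_unrolled_impl(F&& f, std::index_sequence<Is...>)
{
    (f(std::integral_constant<std::size_t, Is>{}), ...);
}

template <std::size_t Iterations, typename F>
void repeat_unrolled(F&& f)
{
    repeat_unrolled_impl(std::forward<F>(f),
                         std::make_index_sequence<Iterations>{});
}

// Hypothetical caller: sums the elements of a heterogeneous tuple.
inline double sum_tuple(const std::tuple<int, long, double>& t)
{
    double total = 0.0;
    repeat_unrolled<3>([&](auto i) {
        // decltype(i)::value is a constant expression, so it can index std::get.
        total += static_cast<double>(std::get<decltype(i)::value>(t));
    });
    return total;
}
```

The expansion happens during template instantiation, so the three calls exist as separate statements regardless of the optimizer's unrolling heuristics.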

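void do_something(int i);
void do_something_n_times(int n)
{
    // Without this guard, n <= 0 would jump to default and fall
    // through cases 3, 2 and 1, calling do_something three times.
    if (n <= 0)
        return;
    int i = 0;
    switch(n)
    {
        default:
            // Handle everything beyond the last three iterations in a loop.
            while(n > 3) {
                do_something(i++);
                --n;
            }
            // intentional fall-through
        case 3: do_something(i++);  // fall through
        case 2: do_something(i++);  // fall through
        case 1: do_something(i++);
    }
}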

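That said, I think it is worth mentioning that if you don't trust your compiler to do something as simple as unrolling a loop for you, it may be time to consider a new compiler.

Note that Duff's device was originally invented as a micro-optimization strategy for programs built with compilers that did not apply loop-unrolling optimizations automatically.

It was invented by Tom Duff in 1983.

https://en.wikipedia.org/wiki/Duff%27s_device

Its usefulness with modern compilers is questionable.

If you are willing to use the (non-standard) force-inline feature that all popular compilers provide, I would rather do it like this.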

__attribute__((always_inline))
inline void bodyOfLoop(int i) {  // inline avoids GCC's "might not be inlinable" warning
  // put code here
}
void specialCase() {
    bodyOfLoop(0);
    bodyOfLoop(1);
    bodyOfLoop(2);
}
void generalCase(int a) {
    for (int i=0; i<a; i++) {
        bodyOfLoop(i);
    }
}

Note: this is a GCC/Clang solution. Use __forceinline for MSVC.

How about this C++20 unrolling helper:
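A small sketch of how the two spellings could be hidden behind one macro follows. FORCE_INLINE is a hypothetical name, and bodyOfLoop is given an extra accumulator parameter here (not in the answer's version) so that its effect can be observed. The fallback branch is only a plain inline hint for compilers that offer neither extension.

```cpp
// Hypothetical portability macro; pick the spelling the compiler understands.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#else
#  define FORCE_INLINE inline   // no forcing available, just a hint
#endif

// Loop body shared by both variants; acc is an illustrative addition.
FORCE_INLINE void bodyOfLoop(int i, int& acc) { acc += i; }

int specialCase()
{
    int acc = 0;
    bodyOfLoop(0, acc);   // manually unrolled: three calls, each inlined
    bodyOfLoop(1, acc);
    bodyOfLoop(2, acc);
    return acc;
}

int generalCase(int a)
{
    int acc = 0;
    for (int i = 0; i < a; i++)
        bodyOfLoop(i, acc);
    return acc;
}
```

Both variants share the single definition of bodyOfLoop, which is what the question asked for.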

#pragma once
#include <cstddef>   // size_t
#include <utility>
#include <concepts>
#include <iterator>
template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn, size_t i ) { { fn( i ) } -> std::same_as<void>; }
inline
void unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> )
    {
        (fn( Indices ), ...);
    };
    unroll_n( std::make_index_sequence<N>() );
}
template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn ) { { fn() } -> std::same_as<void>; }
inline
void unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> )
    {
        return ((Indices, fn()), ...);
    };
    unroll_n( std::make_index_sequence<N>() );
}
template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn, size_t i ) { { fn( i ) } -> std::convertible_to<bool>; }
inline
bool unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> ) -> bool
    {
        return (fn( Indices ) && ...);
    };
    return unroll_n( std::make_index_sequence<N>() );
}
template<size_t N, typename Fn>
    requires (N >= 1) && requires( Fn fn ) { { fn() } -> std::convertible_to<bool>; }
inline
bool unroll( Fn fn )
{
    auto unroll_n = [&]<size_t ... Indices>( std::index_sequence<Indices ...> ) -> bool
    {
        return ((Indices, fn()) && ...);
    };
    return unroll_n( std::make_index_sequence<N>() );
}
template<std::size_t N, typename RandomIt, typename UnaryFunction>
    requires std::random_access_iterator<RandomIt>
    && requires( UnaryFunction fn, typename std::iterator_traits<RandomIt>::value_type elem ) { { fn( elem ) }; }
inline
RandomIt unroll_for_each( RandomIt begin, RandomIt end, UnaryFunction fn )
{
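    RandomIt &it = begin;  // alias: advancing it also advances begin
    if constexpr( N > 1 )
        // compare the remaining distance instead of forming it + N,
        // which could point past end
        for( ; end - it >= static_cast<std::iter_difference_t<RandomIt>>( N ); it += N )
            unroll<N>( [&]( size_t i ) { fn( it[i] ); } );
    for( ; it < end; ++it )
        fn( *it );
    return it;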
}

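Note, however, that the unrolling factor is crucial here. Unrolling is not always beneficial, and unrolling beyond the optimal, CPU-specific factor can drop performance back to that of the loop without unrolling.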
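Since the best factor has to be found by measurement, it helps to keep it as a template parameter that can be varied without touching the loop body. The following is a minimal C++17 sketch under that assumption. sum_chunk and sum_unrolled are hypothetical names and are independent of the unroll helper above.

```cpp
#include <cstddef>
#include <utility>

// Sums Factor consecutive elements with a fold expression (no loop).
template <std::size_t... Is>
inline long sum_chunk(const int* p, std::index_sequence<Is...>)
{
    return (0L + ... + static_cast<long>(p[Is]));
}

// Sums n elements in unrolled chunks of Factor, then handles the remainder.
template <std::size_t Factor>
long sum_unrolled(const int* data, std::size_t n)
{
    static_assert(Factor >= 1, "unroll factor must be at least 1");
    long total = 0;
    std::size_t i = 0;
    // i never exceeds n, so n - i cannot wrap around.
    for (; n - i >= Factor; i += Factor)
        total += sum_chunk(data + i, std::make_index_sequence<Factor>{});
    for (; i < n; ++i)          // tail: fewer than Factor elements left
        total += data[i];
    return total;
}
```

Instantiating sum_unrolled<1>, sum_unrolled<4>, sum_unrolled<16> and so on and timing each on the target machine is one way to locate the factor beyond which unrolling stops paying off.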
