LLVM Notes (20) - AntiDepBreaker
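I broke my hand at the start of the year and have not written anything for a long while, so much for the promise to keep writing (=.=). Today someone happened to ask me about AntiDepBreaker, so I spent some time reading the code (and squeezed a post out of it).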
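Dataflow analysis generally distinguishes three kinds of data dependence: true dependence (read after write, RAW), anti-dependence (write after read, WAR), and output dependence (write after write, WAW).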
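Anti-dependences and output dependences require, respectively, a use followed by a redefinition and multiple definitions of the same name, so they can only appear after PHI elimination, once the program is no longer kept in SSA form. Most targets run instruction scheduling again in the postRA stage, and building the dependence graph at that point produces anti-dependences and output dependences. Note also that some architectures have instructions that implicitly define or use special registers. Because such instructions are tied to fixed physical registers, anti-dependences and output dependences on those specific physical registers can already appear in the preRA stage (which is why it is better not to give instructions implicit registers, and instead to constrain register usage by adding register classes).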
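Dependences tighten the constraints between nodes, which narrows the window available to the scheduler and degrades the schedule. Removing dependences widens the scheduling window, shortens the critical path, and yields better performance. True dependences, and output/anti-dependences on fixed registers, cannot be removed. Only output/anti-dependences on non-fixed registers can be: the reads and writes involved actually refer to two different values, so renaming the variable or register resolves them. AntiDepBreaker is the optimization that removes such dependences before scheduling in the postRA stage. This kind of optimization mainly targets architectures without a hardware scheduler; on architectures that support out-of-order execution and register renaming its benefit is comparatively small.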
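To make the three dependence kinds and the effect of renaming concrete, here is a minimal sketch that is independent of LLVM. The `Inst` struct and the helpers `classify` and `renameDef` are hypothetical names introduced only for this illustration; registers are plain numbers and aliases are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy instruction: the register numbers it writes and reads.
struct Inst {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

enum class DepKind { None, True, Anti, Output };

static bool contains(const std::vector<unsigned> &V, unsigned R) {
  return std::find(V.begin(), V.end(), R) != V.end();
}

// Classify the register dependence of Later on Earlier (program order).
// If several kinds apply, the true dependence is reported first.
DepKind classify(const Inst &Earlier, const Inst &Later) {
  for (unsigned R : Earlier.Defs)
    if (contains(Later.Uses, R))
      return DepKind::True; // read after write
  for (unsigned R : Earlier.Uses)
    if (contains(Later.Defs, R))
      return DepKind::Anti; // write after read
  for (unsigned R : Earlier.Defs)
    if (contains(Later.Defs, R))
      return DepKind::Output; // write after write
  return DepKind::None;
}

// Rename the definition of From in I to To. A real renamer must also
// rewrite every later use that belongs to the same live range.
void renameDef(Inst &I, unsigned From, unsigned To) {
  assert(From != To && "renaming a register to itself is pointless");
  std::replace(I.Defs.begin(), I.Defs.end(), From, To);
}
```

For example, with `r1 = op r2` followed by `r2 = op r3`, `classify` reports an anti-dependence on `r2`; after renaming the second definition to an unused register the two instructions become independent and may be scheduled in either order.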
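The algorithm scans instructions bottom-up, tracking register allocation state while checking whether each instruction has an anti-dependence. If it does, a suitable register (one whose live range does not conflict) is picked from the list of free registers as the replacement. The main difficulty of doing this kind of dataflow analysis in postRA is that SSA form is no longer available, so the analysis has to be careful about several details.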
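The bottom-up state tracking can be sketched as follows. This is a simplified illustration and not the LLVM implementation: `ToyInst`, `ToyRegState`, and `scanDownTo` are hypothetical names, register aliases and partial definitions are ignored, and the kill/def index encoding is borrowed from the pass (a register is live when it has a kill index and no def index).

```cpp
#include <cassert>
#include <vector>

const unsigned NoIndex = ~0u; // marks "no such index"

// Hypothetical toy instruction: the register numbers it writes and reads.
struct ToyInst {
  std::vector<unsigned> Defs;
  std::vector<unsigned> Uses;
};

struct ToyRegState {
  unsigned BlockSize;
  std::vector<unsigned> KillIdx; // most recent kill seen bottom-up
  std::vector<unsigned> DefIdx;  // most recent def seen bottom-up

  ToyRegState(unsigned NumRegs, unsigned Size)
      : BlockSize(Size), KillIdx(NumRegs, NoIndex), DefIdx(NumRegs, Size) {}

  bool isLive(unsigned R) const {
    return KillIdx[R] != NoIndex && DefIdx[R] == NoIndex;
  }

  // A live-out register is treated as killed just past the block end.
  void markLiveOut(unsigned R) {
    KillIdx[R] = BlockSize;
    DefIdx[R] = NoIndex;
  }

  // Update the state for the instruction at position Idx. Defs are handled
  // first so that "r1 = op r1" leaves r1 live above the instruction.
  void scan(const ToyInst &I, unsigned Idx) {
    for (unsigned R : I.Defs) {
      DefIdx[R] = Idx;
      KillIdx[R] = NoIndex;
    }
    for (unsigned R : I.Uses)
      if (!isLive(R)) {
        KillIdx[R] = Idx;
        DefIdx[R] = NoIndex;
      }
  }
};

// Scan from the last instruction up to and including position StopAt.
void scanDownTo(const std::vector<ToyInst> &Block, ToyRegState &S,
                unsigned StopAt) {
  for (unsigned Idx = static_cast<unsigned>(Block.size()); Idx-- > StopAt;)
    S.scan(Block[Idx], Idx);
}
```

A register that is not live at the current position, and whose next definition below lies after the range being renamed, is a candidate replacement.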
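AntiDepBreaker is not a standalone pass. It is attached to the postRA scheduling pass, because postRA scheduling works region by region and builds a dependence graph for each region, so AntiDepBreaker has to be invoked before each region is scheduled. The optimization can be turned off with the option -mllvm -break-anti-dependencies=none (see lib/CodeGen/PostRASchedulerList.cpp:61).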
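When the optimization is enabled there are two modes, critical and aggressive. The former only breaks anti-dependences on the critical path, while the latter tries to break all anti-dependences. The mode is selected with the same option, which accepts none, critical, or all (all selects the aggressive breaker).
We use the aggressive mode as the example for how AntiDepBreaker is implemented (source in lib/CodeGen/AggressiveAntiDepBreaker.cpp).
Start with the data structures. AggressiveAntiDepBreaker defines two classes (see lib/CodeGen/AggressiveAntiDepBreaker.h): AggressiveAntiDepState and AggressiveAntiDepBreaker.
AggressiveAntiDepState maintains register state while the instructions are being traversed. Its members are as follows.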
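class LLVM_LIBRARY_VISIBILITY AggressiveAntiDepState {
  // Number of (non-virtual) target registers, i.e. TRI->getNumRegs().
  const unsigned NumTargetRegs;
  // Union-find forest: each node links to its parent; a root links to itself.
  std::vector<unsigned> GroupNodes;
  // For each register, the index of its node in GroupNodes.
  std::vector<unsigned> GroupNodeIndices;
  // For each register, all references to it within the current live range.
  std::multimap<unsigned, RegisterReference> RegRefs;
  // Index of the most recent kill (bottom-up), or ~0u if the register is not live.
  std::vector<unsigned> KillIndices;
  // Index of the most recent complete def (bottom-up), or ~0u if the register is live.
  std::vector<unsigned> DefIndices;
};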
AggressiveAntiDepState::AggressiveAntiDepState(const unsigned TargetRegs,
                                               MachineBasicBlock *BB)
    : NumTargetRegs(TargetRegs), GroupNodes(TargetRegs, 0),
      GroupNodeIndices(TargetRegs, 0), KillIndices(TargetRegs, 0),
      DefIndices(TargetRegs, 0) {
  const unsigned BBSize = BB->size();
  for (unsigned i = 0; i < NumTargetRegs; ++i) {
    // Each register starts out mapped to the group node with the same index.
    GroupNodeIndices[i] = i;
    // Initialize the indices to indicate that no register is live.
    KillIndices[i] = ~0u;
    DefIndices[i] = BBSize;
  }
}
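NumTargetRegs is the number of target registers (TRI->getNumRegs()) and is also the initial element count of the other container members. Note: for the possible states of a register operand, see the definitions in include/llvm/CodeGen/MachineOperand.h.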
Next, the interfaces that AggressiveAntiDepState exposes.
unsigned AggressiveAntiDepState::GetGroup(unsigned Reg) {
  // Follow parent links until reaching a root (a node that links to itself).
  unsigned Node = GroupNodeIndices[Reg];
  while (GroupNodes[Node] != Node)
    Node = GroupNodes[Node];
  return Node;
}

void AggressiveAntiDepState::GetGroupRegs(
    unsigned Group,
    std::vector<unsigned> &Regs,
    std::multimap<unsigned, AggressiveAntiDepState::RegisterReference> *RegRefs)
{
  // Collect every register in Group that has at least one recorded reference.
  for (unsigned Reg = 0; Reg != NumTargetRegs; ++Reg) {
    if ((GetGroup(Reg) == Group) && (RegRefs->count(Reg) > 0))
      Regs.push_back(Reg);
  }
}

unsigned AggressiveAntiDepState::UnionGroups(unsigned Reg1, unsigned Reg2) {
  unsigned Group1 = GetGroup(Reg1);
  unsigned Group2 = GetGroup(Reg2);
  // If either group is 0, then group 0 must become the parent.
  unsigned Parent = (Group1 == 0) ? Group1 : Group2;
  unsigned Other = (Parent == Group1) ? Group2 : Group1;
  GroupNodes.at(Other) = Parent;
  return Parent;
}

unsigned AggressiveAntiDepState::LeaveGroup(unsigned Reg) {
  // Create a new group node for Reg; the old node is left untouched because
  // other nodes may still link to it.
  unsigned idx = GroupNodes.size();
  GroupNodes.push_back(idx);
  GroupNodeIndices[Reg] = idx;
  return idx;
}

bool AggressiveAntiDepState::IsLive(unsigned Reg) {
  // Live means: a kill index is recorded and no def index is recorded.
  return((KillIndices[Reg] != ~0u) && (DefIndices[Reg] == ~0u));
}
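AggressiveAntiDepState mainly provides the interface of a union-find (disjoint-set) structure: GetGroup() returns the group a given register belongs to, UnionGroups() merges two groups, and LeaveGroup() removes a given register from its group. Note that LeaveGroup() does not modify the existing group when it removes a register. It creates a new group instead, because the current node may be referenced by other nodes (that is, it may sit in the middle of a chain).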
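The behavior of these three operations can be tried out in isolation with a small standalone sketch. `ToyGroups` is a hypothetical class written for this note. It copies the logic of the excerpts above but is not the LLVM class, and it assumes at least one register so that node 0 exists as the root of group 0.

```cpp
#include <cassert>
#include <vector>

class ToyGroups {
  std::vector<unsigned> GroupNodes;       // parent link per node
  std::vector<unsigned> GroupNodeIndices; // register -> its node

public:
  // All parent links start at 0, so every register begins in group 0.
  explicit ToyGroups(unsigned NumRegs)
      : GroupNodes(NumRegs, 0), GroupNodeIndices(NumRegs, 0) {
    assert(NumRegs > 0 && "node 0 must exist as the root of group 0");
    for (unsigned i = 0; i < NumRegs; ++i)
      GroupNodeIndices[i] = i;
  }

  // Walk parent links until a root (self-linked node) is found.
  unsigned getGroup(unsigned Reg) const {
    unsigned Node = GroupNodeIndices[Reg];
    while (GroupNodes[Node] != Node)
      Node = GroupNodes[Node];
    return Node;
  }

  // Merge the groups of Reg1 and Reg2; group 0 always wins as parent.
  unsigned unionGroups(unsigned Reg1, unsigned Reg2) {
    unsigned Group1 = getGroup(Reg1);
    unsigned Group2 = getGroup(Reg2);
    unsigned Parent = (Group1 == 0) ? Group1 : Group2;
    unsigned Other = (Parent == Group1) ? Group2 : Group1;
    GroupNodes.at(Other) = Parent;
    return Parent;
  }

  // Give Reg a fresh singleton group; the old node stays in place because
  // other nodes may still link through it.
  unsigned leaveGroup(unsigned Reg) {
    unsigned Idx = static_cast<unsigned>(GroupNodes.size());
    GroupNodes.push_back(Idx);
    GroupNodeIndices[Reg] = Idx;
    return Idx;
  }
};
```

Because `leaveGroup` only appends a node, a register that was merged into the leaving register's old node keeps resolving to the same root afterwards, which is the property the paragraph above describes.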
Now the other class, AggressiveAntiDepBreaker. Its main members are a pointer to an AggressiveAntiDepState (State) and a BitVector (CriticalPathSet). Its base class AntiDepBreaker wraps the externally visible interface.
class LLVM_LIBRARY_VISIBILITY AggressiveAntiDepBreaker
    : public AntiDepBreaker {
  MachineFunction &MF;
  MachineRegisterInfo &MRI;
  const TargetInstrInfo *TII;
  const TargetRegisterInfo *TRI;
  const RegisterClassInfo &RegClassInfo;
  BitVector CriticalPathSet;
  AggressiveAntiDepState *State = nullptr;
public:
  void StartBlock(MachineBasicBlock *BB) override;
  unsigned BreakAntiDependencies(const std::vector<SUnit> &SUnits,
                                 MachineBasicBlock::iterator Begin,
                                 MachineBasicBlock::iterator End,
                                 unsigned InsertPosIndex,
                                 DbgValueVector &DbgValues) override;
  void Observe(MachineInstr &MI, unsigned Count,
               unsigned InsertPosIndex) override;
  void FinishBlock() override;
};

class LLVM_LIBRARY_VISIBILITY AntiDepBreaker {
public:
  virtual void StartBlock(MachineBasicBlock *BB) = 0;
  virtual unsigned BreakAntiDependencies(const std::vector<SUnit> &SUnits,
                                         MachineBasicBlock::iterator Begin,
                                         MachineBasicBlock::iterator End,
                                         unsigned InsertPosIndex,
                                         DbgValueVector &DbgValues) = 0;
  virtual void Observe(MachineInstr &MI, unsigned Count,
                       unsigned InsertPosIndex) = 0;
  virtual void FinishBlock() = 0;
};
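AntiDepBreaker exposes four interfaces, which serve as hooks for four stages of the scheduler code (startBlock/schedule/Observe/finishBlock). They are analyzed one by one below.
Before each block is scheduled, StartBlock() runs first to prepare. Its main job is to set up the initial register state.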
void AggressiveAntiDepBreaker::StartBlock(MachineBasicBlock *BB) {
  State = new AggressiveAntiDepState(TRI->getNumRegs(), BB);
  bool IsReturnBlock = BB->isReturnBlock();
  std::vector<unsigned> &KillIndices = State->GetKillIndices();
  std::vector<unsigned> &DefIndices = State->GetDefIndices();
  // Examine the live-in registers of all successors.
  for (MachineBasicBlock::succ_iterator SI = BB->succ_begin(),
         SE = BB->succ_end(); SI != SE; ++SI)
    for (const auto &LI : (*SI)->liveins()) {
      for (MCRegAliasIterator AI(LI.PhysReg, TRI, true); AI.isValid(); ++AI) {
        unsigned Reg = *AI;
        State->UnionGroups(Reg, 0);
        KillIndices[Reg] = BB->size();
        DefIndices[Reg] = ~0u;
      }
    }
  // Mark live-out callee-saved registers. In a return block this is all
  // callee-saved registers; otherwise only those not saved in the prolog.
  const MachineFrameInfo &MFI = MF.getFrameInfo();
  BitVector Pristine = MFI.getPristineRegs(MF);
  for (const MCPhysReg *I = MF.getRegInfo().getCalleeSavedRegs(); *I;
       ++I) {
    unsigned Reg = *I;
    if (!IsReturnBlock && !Pristine.test(Reg))
      continue;
    for (MCRegAliasIterator AI(Reg, TRI, true); AI.isValid(); ++AI) {
      unsigned AliasReg = *AI;
      State->UnionGroups(AliasReg, 0);
      KillIndices[AliasReg] = BB->size();
      DefIndices[AliasReg] = ~0u;
    }
  }
}
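The function first constructs and initializes an AggressiveAntiDepState. Because the scan runs bottom-up, the initialization has to account for the live-outs of this block. Normally the live-out set of a block is the union of the live-ins of its successors, but here the live-out callee-saved registers have to be added as well. The reason is that KillIndices and DefIndices are recorded in order to decide whether a register is available, and a live-out callee-saved register has no live range in the block (no def and no use) yet is not available either. These registers fall into two cases: for a return block, all callee-saved registers; for a non-return block, the callee-saved registers that are not saved in the prolog.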
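The rule for which registers start out unavailable at the end of a block can be written down as a small set computation. This is only a sketch under simplifying assumptions: `ToyBlock` and `unavailableAtBlockEnd` are hypothetical names, registers are plain numbers, and the alias expansion done by MCRegAliasIterator in the real code is left out.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy block: whether it returns, and the live-in sets of its
// successors.
struct ToyBlock {
  bool IsReturn = false;
  std::vector<std::set<unsigned>> SuccLiveIns;
};

// Registers that must be treated as live at the end of BB.
// Pristine holds the callee-saved registers not saved in the prolog.
std::set<unsigned> unavailableAtBlockEnd(const ToyBlock &BB,
                                         const std::set<unsigned> &CalleeSaved,
                                         const std::set<unsigned> &Pristine) {
  std::set<unsigned> Result;
  // Live-outs: the union of the successors' live-ins.
  for (const std::set<unsigned> &LiveIn : BB.SuccLiveIns)
    Result.insert(LiveIn.begin(), LiveIn.end());
  // Callee-saved registers: all of them in a return block, only the
  // pristine ones elsewhere.
  for (unsigned Reg : CalleeSaved)
    if (BB.IsReturn || Pristine.count(Reg) > 0)
      Result.insert(Reg);
  return Result;
}
```

In the real pass each such register is additionally merged into group 0 via UnionGroups(Reg, 0), which marks it as not renamable.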
BreakAntiDependencies() is the core of the anti-dependence breaking logic. The source is long, so only the important steps are listed here.
unsigned AggressiveAntiDepBreaker::BreakAntiDependencies(
    const std::vector<SUnit> &SUnits,
    MachineBasicBlock::iterator Begin,
    MachineBasicBlock::iterator End,
    unsigned InsertPosIndex,
    DbgValueVector &DbgValues) {
  ......
  // Walk the region bottom-up.
  for (MachineBasicBlock::iterator I = End, E = Begin;
       I != E; --Count) {
    MachineInstr &MI = *--I;
    std::set<unsigned> PassthruRegs;
    GetPassthruRegs(MI, PassthruRegs);
    // Process the defs of MI and update the register state.
    PrescanInstruction(MI, Count, PassthruRegs);
    std::vector<const SDep *> Edges;
    const SUnit *PathSU = MISUnitMap[&MI];
    AntiDepEdges(PathSU, Edges);
    for (unsigned i = 0, e = Edges.size(); i != e; ++i) {
      const SDep *Edge = Edges[i];
      SUnit *NextSU = Edge->getSUnit();
      // Only anti- and output dependences are candidates for breaking.
      if ((Edge->getKind() != SDep::Anti) &&
          (Edge->getKind() != SDep::Output)) continue;
      ......
      std::map<unsigned, unsigned> RenameMap;
      if (FindSuitableFreeRegisters(GroupIndex, RenameOrder, RenameMap)) {
        for (std::map<unsigned, unsigned>::iterator
               S = RenameMap.begin(), E = RenameMap.end(); S != E; ++S) {
          unsigned CurrReg = S->first;
          unsigned NewReg = S->second;
          // NewReg takes over the live range of CurrReg.
          State->UnionGroups(NewReg, 0);
          RegRefs.erase(NewReg);
          DefIndices[NewReg] = DefIndices[CurrReg];
          KillIndices[NewReg] = KillIndices[CurrReg];
          // CurrReg becomes free from this point upward.
          State->UnionGroups(CurrReg, 0);
          RegRefs.erase(CurrReg);
          DefIndices[CurrReg] = KillIndices[CurrReg];
          KillIndices[CurrReg] = ~0u;
        }
      }
    }
    // Process the uses of MI and update the register state.
    ScanInstruction(MI, Count);
  }
}
bool AggressiveAntiDepBreaker::FindSuitableFreeRegisters(
    unsigned AntiDepGroupIndex,
    RenameOrderType &RenameOrder,
    std::map<unsigned, unsigned> &RenameMap) {
  // Collect the registers of the group and find the super-register among them.
  std::vector<unsigned> Regs;
  State->GetGroupRegs(AntiDepGroupIndex, Regs, &RegRefs);
  unsigned SuperReg = 0;
  for (unsigned i = 0, e = Regs.size(); i != e; ++i) {
    unsigned Reg = Regs[i];
    if ((SuperReg == 0) || TRI->isSuperRegister(SuperReg, Reg))
      SuperReg = Reg;
    if (RegRefs.count(Reg) > 0) {
      BitVector &BV = RenameRegisterMap[Reg];
      BV = GetRenameRegisters(Reg);
    }
  }
  const TargetRegisterClass *SuperRC =
      TRI->getMinimalPhysRegClass(SuperReg, MVT::Other);
  ArrayRef<MCPhysReg> Order = RegClassInfo.getOrder(SuperRC);
  RenameOrder.insert(RenameOrderType::value_type(SuperRC, Order.size()));
  unsigned OrigR = RenameOrder[SuperRC];
  unsigned EndR = ((OrigR == Order.size()) ? 0 : OrigR);
  unsigned R = OrigR;
  do {
    ......
    for (unsigned i = 0, e = Regs.size(); i != e; ++i) {
      ......
      // Reject NewReg if it is live, or if its most recent def comes before
      // the kill of Reg.
      if (State->IsLive(NewReg) || (KillIndices[Reg] > DefIndices[NewReg])) {
        goto next_super_reg;
      } else {
        // Apply the same check to every alias of NewReg.
        bool found = false;
        for (MCRegAliasIterator AI(NewReg, TRI, false); AI.isValid(); ++AI) {
          unsigned AliasReg = *AI;
          if (State->IsLive(AliasReg) ||
              (KillIndices[Reg] > DefIndices[AliasReg])) {
            LLVM_DEBUG(dbgs()
                       << "(alias " << printReg(AliasReg, TRI) << " live)");
            found = true;
            break;
          }
        }
        if (found)
          goto next_super_reg;
      }
      ......
    }
    ......
  next_super_reg:
    ......
  } while (R != EndR);
  return false;
}
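The logic walks the instructions in reverse order starting from the given iterator. For each instruction it analyzes the registers the instruction defines and updates the register state (AggressiveAntiDepState). It then checks whether the instruction has a dependence and whether that dependence can be broken. If the dependence should be broken, it calls FindSuitableFreeRegisters() to obtain the candidate replacement registers and attempts the replacement. Finally it calls ScanInstruction() to update the register state, which also reflects any renaming that took place.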
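The availability test applied to each candidate inside FindSuitableFreeRegisters() can be isolated into a few lines. The sketch below is a simplification with hypothetical names (`ToyState`, `canRenameTo`, `pickFreeRegister`): it applies the same two conditions as the excerpt, but skips the alias loop, the register-class filtering, and the super-register handling.

```cpp
#include <cassert>
#include <vector>

const unsigned NoIndex = ~0u; // marks "no such index" / "no register found"

// Hypothetical snapshot of the per-register indices kept by the pass.
struct ToyState {
  std::vector<unsigned> KillIdx;
  std::vector<unsigned> DefIdx;
  bool isLive(unsigned R) const {
    return KillIdx[R] != NoIndex && DefIdx[R] == NoIndex;
  }
};

// Reg may be renamed to NewReg only if NewReg is dead here and NewReg's
// most recent def (seen bottom-up) is not before Reg's kill.
bool canRenameTo(const ToyState &S, unsigned Reg, unsigned NewReg) {
  if (S.isLive(NewReg))
    return false;
  if (S.KillIdx[Reg] > S.DefIdx[NewReg])
    return false;
  return true;
}

// Return the first acceptable candidate in Order, or NoIndex if none fits.
unsigned pickFreeRegister(const ToyState &S, unsigned Reg,
                          const std::vector<unsigned> &Order) {
  for (unsigned Candidate : Order)
    if (Candidate != Reg && canRenameTo(S, Reg, Candidate))
      return Candidate;
  return NoIndex;
}
```

In a block of ten instructions, take a register whose kill index is 6 and three candidates: one that is live, one that is dead but redefined at index 5, and one that is untouched (def index equal to the block size). Only the last passes both conditions.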
The remaining hooks, Observe() and FinishBlock(), are comparatively simple and are not covered here.
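Compared with preRA, LLVM has fewer optimizations based on dataflow analysis in postRA, partly because postRA no longer maintains SSA form and partly because of the complexity of register alias analysis. AntiDepBreaker is a good demonstration of the key points to consider when transforming code in postRA. The code also shows that its analysis is fairly conservative (see the constraints on candidate registers) while still having to handle many cases, so it is not surprising that the result is sometimes unsatisfactory.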