Incentives
Aurelius transforms alignment from a passive concern into an active market. It rewards agents not for saying the right thing, but for surfacing, verifying, and organizing evidence of when models fail.
By aligning emissions with epistemic value, the protocol creates an economy around adversarial discovery, reproducible verification, and continuous rubric governance.
Miner Incentives
Miners are rewarded for surfacing examples of misalignment: prompts that cause models to produce harmful, deceptive, biased, or otherwise unsafe behavior. Rewards scale with four dimensions (a scoring sketch follows the list):
- Severity – How significant is the failure?
- Novelty – Has this failure mode been seen before?
- Signal Richness – Does the submission include helpful tags, reasoning traces, or interpretability metadata?
- Validator Agreement – Did validators confirm the output as a meaningful misalignment?
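A minimal sketch of how these four dimensions might combine into a single submission score. The weights and the 0–1 normalization are illustrative assumptions; the real values would live in the Tribunate's rubric:

```python
from dataclasses import dataclass

@dataclass
class SubmissionScores:
    severity: float             # 0-1: how significant is the failure?
    novelty: float              # 0-1: distance from previously seen failure modes
    signal_richness: float      # 0-1: quality of tags, traces, and metadata
    validator_agreement: float  # 0-1: fraction of validators confirming the failure

# Hypothetical weights; in Aurelius these are set and updated by the Tribunate.
WEIGHTS = {
    "severity": 0.35,
    "novelty": 0.30,
    "signal_richness": 0.15,
    "validator_agreement": 0.20,
}

def miner_score(s: SubmissionScores) -> float:
    """Weighted sum of the four reward dimensions."""
    return (WEIGHTS["severity"] * s.severity
            + WEIGHTS["novelty"] * s.novelty
            + WEIGHTS["signal_richness"] * s.signal_richness
            + WEIGHTS["validator_agreement"] * s.validator_agreement)
```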
Reward Mechanics
Each miner submission is evaluated by multiple validators. If the output is:
- Reproducible
- Accurately scored
- Confirmed as a failure
…then the miner receives:
- Protocol emissions (e.g. dTAO)
- Reputation gains
- Priority access to future model evaluations
Low-quality, repetitive, or unverifiable submissions receive no rewards and may reduce miner reputation over time.
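An illustrative sketch of this gating logic, assuming per-submission settlement (the field names, reputation deltas, and floor are placeholders, not protocol constants):

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    reproducible: bool       # independent validators reproduced the failure
    accurately_scored: bool  # miner's claimed severity matched validator scores
    confirmed_failure: bool  # consensus confirmed genuine misalignment

def settle_submission(verdict: Verdict, score: float, reputation: float) -> tuple[float, float]:
    """Return (emission_share, new_reputation) for one submission.

    Confirmed submissions pay out in proportion to their score; rejected,
    repetitive, or unverifiable ones earn nothing and erode reputation.
    """
    if verdict.reproducible and verdict.accurately_scored and verdict.confirmed_failure:
        return score, reputation + 0.01        # emissions plus a reputation gain
    return 0.0, max(0.0, reputation - 0.02)    # no reward; gradual reputation loss
```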
Validator Incentives
Validators verify alignment failures. They earn rewards for faithfully auditing miner submissions using the rubric defined by the Tribunate.
Key incentive drivers (combined in the sketch after this list):
- Scoring accuracy – Agreement with validator consensus
- Tag quality – Correct classification of failure types
- Evaluation depth – Use of supporting comments or rationale
- Participation rate – Consistent, timely auditing of miner submissions
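An illustrative way these drivers might combine into a single validator score, assuming all inputs are normalized to 0–1 and the weights are placeholders:

```python
def validator_score(my_scores: list[float],
                    consensus_scores: list[float],
                    tag_match_rate: float,
                    rationale_rate: float,
                    participation_rate: float) -> float:
    """Combine the four incentive drivers into one 0-1 score.

    - accuracy: 1 minus the mean absolute deviation from consensus scores
    - tag_match_rate: fraction of failure-type tags matching consensus tags
    - rationale_rate: fraction of audits that include supporting comments
    - participation_rate: fraction of assigned audits completed on time
    """
    deviations = [abs(m - c) for m, c in zip(my_scores, consensus_scores)]
    accuracy = 1.0 - (sum(deviations) / len(deviations) if deviations else 0.0)
    # Hypothetical weights; the real ones would be part of the Tribunate rubric.
    return (0.45 * accuracy
            + 0.20 * tag_match_rate
            + 0.15 * rationale_rate
            + 0.20 * participation_rate)
```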
Reputation and Slashing
Validators build reputation over time, which affects:
- Their influence in consensus aggregation
- Their payout multiplier
- Their eligibility for rubric review roles
Validators who fail spot checks, miss clear failures, or deviate from consensus may face slashing or temporary exclusion.
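One plausible shape for these mechanics, with an assumed convex weighting curve, an assumed slash fraction, and an assumed exclusion threshold:

```python
def consensus_weight(reputation: float) -> float:
    """Higher-reputation validators carry more weight in consensus aggregation.

    The convex curve is an assumption; the protocol's actual curve is unspecified.
    """
    return reputation ** 2

def apply_spot_check(reputation: float, passed: bool,
                     slash_fraction: float = 0.10,
                     exclusion_floor: float = 0.20) -> tuple[float, bool]:
    """Slash reputation on a failed spot check; exclude if it falls too low."""
    if passed:
        return reputation, True
    reputation *= (1.0 - slash_fraction)
    return reputation, reputation >= exclusion_floor
```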
Tribunate Incentives
The Tribunate defines the reward logic that governs the entire system. Initially curated by the core team, it will transition into a semi-decentralized body responsible for:
- Maintaining scoring rubrics and dimension weights (a configuration sketch follows this list)
- Defining validator scoring functions
- Updating audit protocols and evaluation standards
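To make this concrete, a hypothetical versioned rubric object of the kind the Tribunate might maintain; every field name and value below is an assumption for illustration:

```python
# Hypothetical rubric configuration; all names and values are illustrative.
RUBRIC_V3 = {
    "version": 3,
    "dimension_weights": {            # weights used in miner scoring
        "severity": 0.35,
        "novelty": 0.30,
        "signal_richness": 0.15,
        "validator_agreement": 0.20,
    },
    "validator_scoring": "mean_absolute_deviation",  # named scoring function
    "audit": {
        "validators_per_submission": 5,
        "spot_check_rate": 0.05,      # fraction of audits independently re-checked
    },
}
```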
Tribunate contributors may receive:
- A share of protocol emissions
- Recognition in governance dashboards and academic outputs
- Long-term authority over how alignment is measured
Their primary incentive is epistemic integrity — ensuring that the reward function tracks truth, not trend.
Emission Structure
Aurelius follows a dual-track emissions system:
- dTAO emissions – Paid to miners and validators for high-value contributions
- Delegated staking – Token holders may support top performers, earning yield while guiding emissions
This model inherits from Bittensor but adapts it to alignment-specific work, with emissions linked to epistemic contribution, not just activity.
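A simplified sketch of how one epoch's emissions might be split between the two tracks; the direct/staking split and the proportional payout rules are assumptions, not published protocol constants:

```python
def split_emissions(total_emission: float,
                    contributor_scores: dict[str, float],
                    stake_by_delegate: dict[str, float],
                    direct_fraction: float = 0.8) -> tuple[dict, dict]:
    """Split one epoch's emissions between contributors and delegated stakers.

    Contributors (miners and validators) share the direct pool pro rata by
    score; delegates share the remainder pro rata by stake.
    """
    direct_pool = total_emission * direct_fraction
    staking_pool = total_emission - direct_pool

    total_score = sum(contributor_scores.values()) or 1.0
    direct = {k: direct_pool * v / total_score for k, v in contributor_scores.items()}

    total_stake = sum(stake_by_delegate.values()) or 1.0
    staking = {k: staking_pool * v / total_stake for k, v in stake_by_delegate.items()}
    return direct, staking
```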
Integrity and Anti-Gaming
Aurelius includes safeguards to protect against abuse:
- Miner identity is hidden from validators
- Prompt sampling prevents cherry-picking
- Validator–miner collusion is flagged via statistical outlier analysis (sketched after this list)
- The Tribunate regularly rotates rubric logic and audit conditions
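One simple form such an outlier check could take is a z-score test over validator–miner agreement rates; the data shape and threshold here are illustrative:

```python
import statistics

def flag_collusion(pair_agreement: dict[tuple[str, str], float],
                   z_threshold: float = 3.0) -> list[tuple[str, str]]:
    """Flag validator-miner pairs whose agreement rate is an extreme outlier.

    pair_agreement maps (validator, miner) to the fraction of that validator's
    audits of that miner scored as valid failures. Pairs far above the
    population mean are collusion candidates for deeper review.
    """
    rates = list(pair_agreement.values())
    if len(rates) < 2:
        return []
    mu, sigma = statistics.mean(rates), statistics.pstdev(rates)
    if sigma == 0:
        return []
    return [pair for pair, r in pair_agreement.items() if (r - mu) / sigma > z_threshold]
```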
High-reputation agents are trusted more — but no one is immune to challenge.
Incentives as Alignment Mechanism
The protocol doesn’t reward output alone — it rewards impactful insight.
The system is designed to:
- Direct effort toward high-risk failure domains
- Encourage discovery of subtle or evolving failure modes
- Sustain a robust validator class capable of critical judgment
- Grow an open dataset of validated alignment failures
Summary
- Miners earn rewards for exposing model failures that are validated and reproducible
- Validators earn rewards for confirming signal and maintaining evaluation quality
- Tribunate members shape the incentive logic itself and are rewarded for maintaining alignment integrity
- Reputation and slashing systems ensure long-term accountability
- All rewards are linked to verifiable alignment signal — not popularity, not politics, not compliance
Aurelius turns alignment into an adversarial, decentralized market for truth — where incentives sharpen safety at scale.