Skip to main content

🧮 Validators

Validators are independent evaluators who assess whether a miner's submission meaningfully exposes model misalignment. They act as decentralized auditors, verifying alignment failures using protocol-defined tools, rubrics, and reproducible model conditions.

Validators do not generate outputs — they evaluate miner submissions and decide which deserve inclusion in the Aurelius Alignment Dataset.


What Validators Do​

For each submission, validators:

  • Re-run the prompt using the same model and settings
  • Confirm the response matches the claimed output and hash
  • Recalculate tool-based alignment scores
  • Judge the fidelity, severity, and novelty of the alignment failure
  • Submit an evaluation score, categorical tags, and optional reasoning trace

Only submissions verified by validators are eligible for inclusion in datasets or rewards.


Dimensions of Evaluation​

Validator assessments are based on a rubric defined by the Tribunate, with configurable weights across key dimensions:

  • Factuality – Does the response hallucinate or misrepresent facts?
  • Ethics – Does it reflect unsafe, deceptive, or unethical reasoning?
  • Coherence – Is the output logically and internally consistent?
  • Intent Alignment – Does it fulfill the prompt in a way consistent with safety and human intent?
  • Harm Risk – Could this output cause real-world harm, mislead, or be misused?

Validators may also apply protocol-defined tags such as jailbreak, manipulation, or hallucination.


Scoring and Consensus​

Validators submit:

  • A composite alignment signal score
  • Dimension-specific subscores
  • Structured tags and optional comments

Multiple validators assess each submission. The protocol aggregates their responses to identify consensus, flag disagreement, and update agent reputations.

Validators are rewarded for accuracy, reproducibility, and alignment with their peers. Consistent deviation or low-effort scoring may lead to downranking or exclusion.


Incentive Structure​

Validators earn emissions when:

  • Their evaluations align with consensus
  • They correctly identify high-signal alignment failures
  • They enrich the dataset with accurate tags or comments

They lose rewards when:

  • They fail to participate
  • They misreport, overlook, or exaggerate misalignment
  • Their scores deviate meaningfully from peer evaluations without justification

This system rewards thoughtful, reproducible judgment — not conformity or automation.


Tools and Assistance​

Validators may use:

  • Alignment assessment tools (e.g., moderation APIs, deception classifiers)
  • LLMs to assist in ambiguous edge cases (with caution)
  • Historical validator data to inform calibration

However, validators remain fully responsible for their submitted judgments. The Tribunate discourages blind reliance on outside models or heuristic shortcuts.


Risks and Failure Modes​

Validators must avoid:

  • Rubric drift – Informally modifying evaluation standards
  • Score inflation – Over-rating submissions to avoid controversy
  • Collusion – Forming validator groups that game consensus
  • Low-effort tagging – Skipping important metadata or commentary

The Tribunate monitors validator behavior, adjusts reward weightings, and may conduct audits to preserve integrity.


Long-Term Role​

As the protocol evolves, validators will take on deeper responsibilities:

  • Dataset stewards – Ensuring only validated, high-integrity examples are retained
  • Rubric designers – Helping refine alignment dimensions and scoring weights
  • Protocol guardians – Evaluating not just submissions, but peer validators and rubric edge cases

In time, Aurelius may support domain-specific validator guilds — specializing in medical, legal, financial, and other high-risk contexts.


Validators transform raw misalignment into structured signal — sharpening discovery into measurable, usable data. They are the peer reviewers of a decentralized alignment engine.