
Protocol Architecture

Aurelius is a decentralized protocol for identifying and verifying alignment failures in large language models. It operates on the Bittensor network and is composed of three roles — miners, validators, and the Tribunate — each contributing to a continuous pipeline of adversarial testing, evaluation, and data refinement.

Rather than assuming a centralized authority can define alignment, Aurelius treats it as an evolving process. Misaligned behavior is surfaced, independently verified, and transformed into open datasets for model training, auditing, and interpretability research.


Core Agents and Workflow

🧠 Miners

Miners create prompts designed to elicit unsafe, biased, deceptive, or otherwise misaligned outputs from a target LLM. They run the prompt locally, collect the response, and apply automated scoring tools (e.g., toxicity or hallucination classifiers) to quantify alignment risk. Each miner submission includes:

  • Prompt and model response
  • Tool-based alignment scores
  • Optional reasoning or interpretability traces
  • A cryptographic hash that anchors the prompt, response, and configuration so validators can check reproducibility
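
As a rough sketch, a submission can be pictured as a small record whose hash covers exactly the fields a validator must reproduce. The field names, hashing scheme, and `MinerSubmission` class below are illustrative assumptions, not the protocol's actual schema or wire format.

```python
import hashlib
import json
from dataclasses import dataclass, field

@dataclass
class MinerSubmission:
    # Hypothetical field names; the real schema is defined by the protocol.
    prompt: str                    # adversarial prompt sent to the target LLM
    response: str                  # the model's raw output
    model_id: str                  # target model and sampling configuration
    tool_scores: dict = field(default_factory=dict)   # e.g. {"toxicity": 0.91}
    reasoning_trace: str | None = None                 # optional interpretability notes

    def digest(self) -> str:
        """Hash the fields a validator must be able to reproduce exactly."""
        payload = json.dumps(
            {"prompt": self.prompt, "response": self.response, "model_id": self.model_id},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

submission = MinerSubmission(
    prompt="<adversarial prompt>",
    response="<model output>",
    model_id="target-llm-v1",
    tool_scores={"toxicity": 0.91, "hallucination": 0.15},
)
print(submission.digest())  # hash included alongside the submission
```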

🧮 Validators

Validators act as independent auditors. They re-run the prompt using the same model and configuration, verify the miner’s claimed scores, and evaluate the signal quality. Rather than scoring outputs directly, validators assess:

  • Whether the miner used the tools correctly
  • Whether the alignment violation is reproducible and meaningful
  • The overall value of the sample for inclusion in a dataset
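
One way to picture that audit is a re-run followed by a comparison: regenerate the response under the miner's stated configuration, check it against the hash-anchored fields, and confirm the claimed tool scores within a tolerance. The sketch below assumes hypothetical `generate` and `run_tools` callables and a tolerance value; none of these are protocol-defined interfaces.

```python
def verify_submission(sub, generate, run_tools, score_tolerance=0.05):
    """Schematic validator check: reproducibility first, then score agreement.

    generate(model_id, prompt) -> str    re-runs the target model with the same config
    run_tools(prompt, response) -> dict  applies the approved alignment tools
    """
    # 1. Re-run the prompt under the same model and configuration.
    reproduced = generate(sub.model_id, sub.prompt)
    if reproduced != sub.response:
        return {"valid": False, "reason": "response not reproducible"}

    # 2. Recompute tool scores and compare against the miner's claims.
    recomputed = run_tools(sub.prompt, reproduced)
    for tool, claimed in sub.tool_scores.items():
        if abs(recomputed.get(tool, 0.0) - claimed) > score_tolerance:
            return {"valid": False, "reason": f"score mismatch on {tool}"}

    # 3. Rough signal-quality estimate: how strong is the strongest violation?
    quality = max(recomputed.values(), default=0.0)
    return {"valid": True, "quality": quality, "scores": recomputed}
```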

Validators whose assessments align with the eventual consensus are rewarded, both for catching false positives and for confirming valid submissions.

⚖️ The Tribunate

The Tribunate serves as the logic layer of the protocol. It defines the scoring rubric used by validators, selects approved alignment tools, and periodically updates evaluation rules. Over time, it may incorporate feedback from human experts and machine assistance — but remains a human-guided, non-recursive policy engine.

It also oversees emissions, adjusts reward weights, and curates the resulting alignment datasets.
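
In code terms, the Tribunate can be thought of as publishing configuration and turning validator agreement into reward weights. The rubric fields and the agreement-weighted emission split below are toy assumptions about what such a policy layer might look like, not the subnet's actual rules or reward function.

```python
from dataclasses import dataclass

@dataclass
class ScoringRubric:
    # Hypothetical rubric published by the Tribunate for validators to apply.
    approved_tools: list[str]            # e.g. ["toxicity", "hallucination"]
    min_violation_score: float = 0.5     # threshold for a "meaningful" violation
    version: int = 1                     # bumped whenever evaluation rules change

def emission_weights(validator_votes: dict[str, float]) -> dict[str, float]:
    """Toy aggregation: weight each validator by how close its quality vote
    sits to the consensus (mean) vote, then normalize."""
    if not validator_votes:
        return {}
    consensus = sum(validator_votes.values()) / len(validator_votes)
    closeness = {v: 1.0 - abs(score - consensus) for v, score in validator_votes.items()}
    total = sum(closeness.values()) or 1.0
    return {v: c / total for v, c in closeness.items()}

weights = emission_weights({"validator-a": 0.82, "validator-b": 0.78, "validator-c": 0.31})
```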


High-Level Process Flow

  1. A miner discovers a prompt that provokes misaligned behavior
  2. The model’s response is scored using moderation tools and submitted
  3. Validators rerun the prompt, verify the data, and assess signal quality
  4. The Tribunate aggregates validator input and distributes emissions
  5. Validated samples are added to the Aurelius Alignment Dataset
  6. The process repeats — with increasingly sophisticated agents on both sides
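
Tying the steps together, one round of this loop might read like the sketch below, which reuses the hypothetical pieces from the earlier snippets (a miner that crafts and scores a submission, validators that verify it, and a Tribunate that aggregates votes). It is a narrative illustration of the six steps, not an implementation of the subnet.

```python
def run_round(miner, validators, tribunate, dataset):
    # 1-2. The miner crafts a prompt, scores the model's response, and submits.
    submission = miner.craft_and_score()            # hypothetical: returns a MinerSubmission

    # 3. Each validator re-runs the prompt and verifies the submission.
    verdicts = {v.name: v.verify(submission) for v in validators}

    # 4. The Tribunate aggregates validator quality votes into emission weights.
    votes = {name: r["quality"] for name, r in verdicts.items() if r["valid"]}
    weights = tribunate.emission_weights(votes)

    # 5. Samples confirmed by validators join the Aurelius Alignment Dataset.
    if votes:
        dataset.append(submission)

    # 6. The next round begins with a new prompt.
    return weights
```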

Design Philosophy

Aurelius is designed to be:

  • Open: Anyone can participate if they meet performance standards
  • Modular: Tools, scoring logic, and agent behaviors can evolve independently
  • Verifiable: All outputs are reproducible and anchored by a cryptographic hash
  • Contestable: Disagreement is expected and used to sharpen alignment signal
  • Non-recursive: The protocol does not rely on a central model to enforce its values

Why It Matters

Most alignment systems rely on static prompts, centralized scoring, and one-size-fits-all reward models. Aurelius takes a different approach — one built on adversarial pressure, decentralized verification, and structured contestation. Instead of suppressing misalignment, it reveals it — and turns that into measurable, actionable signal for researchers and model builders.