Data Access
Aurelius produces high-integrity alignment data by capturing adversarial prompts, model completions, and validator evaluations — all anchored by cryptographic hashes and enriched with structured metadata.
The resulting datasets are designed to support open research, reproducible evaluation, and practical alignment diagnostics.
What the Data Includes
Each validated record contains:
- Prompt – The input used to elicit a model response
- Response – The model’s unaltered completion
- Validator Scores – Alignment evaluations across key dimensions
- Tags – Categorical labels (e.g.,
toxicity
,bias
,jailbreak
) - Reasoning Traces – Optional justifications from miners and validators
- Mechanistic Metadata – When available, attention patterns, activation traces, or tool outputs
- Hash Commitments – SHA-256 checksums ensuring full reproducibility
These artifacts form the foundation of the Aurelius Alignment Dataset — a living resource for alignment research and model auditing.
Access Methods
The protocol will support multiple modes of data access, tailored for different levels of technical and analytical use:
🔍 Public Dashboards
- High-level summaries of alignment failures
- Validator agreement trends
- Dataset growth and category frequency over time
📦 Dataset Releases
- Versioned exports of validated prompt–response pairs
- Available in formats suitable for ML workflows (e.g., JSONL, CSV, Parquet)
- Includes rubric metadata and schema documentation
⚙️ Programmatic Interfaces (API)
- Query access for prompt/result pairs
- Filtered access by tag, dimension, or rubric version
- Rubric history and validator consensus lookups
Attribution and Licensing
All public data will include:
- Versioning identifiers for traceability
- Attribution guidelines for academic or commercial use
- Clearly marked rubric versions and scoring standards at time of collection
Where appropriate, the protocol may adopt open data licenses that preserve integrity and ensure attribution without restricting research use.
Privacy and Model Confidentiality
For models under private or restricted evaluation:
- Prompts and outputs may be encrypted or obfuscated
- Model names, endpoints, and weights will not be exposed
- Validator access will be restricted to essential scoring information
- Audits will be conducted in isolated or secured compute environments
These safeguards protect model confidentiality while still producing alignment-relevant insights.
Summary
Aurelius is building a high-signal, high-integrity alignment dataset — not only for research, but for long-term transparency and safety across the AI ecosystem.
- All data is reproducible, cryptographically verified, and schema-consistent
- Access methods are tailored for both human and programmatic use
- Privacy protections are in place for sensitive or private model evaluations
- The dataset evolves as adversarial discovery and rubric logic mature
Alignment is not just a score — it’s a record. And that record belongs to the world.