dallas@tulsa: ~/resume
dallas@tulsa:~/resume$ cat projects/dissertation.md

Alignment Learning Models

Trajectory-level framework for detecting behavioral misalignment in AI agents.

What it is

Alignment Learning Models is the working title for Dallas’s PhD dissertation at the University of Tulsa, advised by John Hale. The core idea: instead of inspecting an agent’s language outputs to decide whether it’s misaligned, inspect its behavioral trajectory — the sequence of actions it takes, encoded under a five-element ontology (Agents, Assets, Aims, Actions, Ambits).
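To make the encoding idea concrete, here is a minimal sketch of how one trajectory step might be represented and tokenized. The page names the five ontology elements but does not define them, so the field meanings in the comments, the `Step` class, and the `encode_trajectory` helper are all illustrative assumptions, not the dissertation's actual schema.

```python
from dataclasses import dataclass, astuple
from typing import Dict, List, Tuple

FIELDS = ("agent", "asset", "aim", "action", "ambit")


@dataclass(frozen=True)
class Step:
    """One trajectory step. Field meanings are assumed for illustration."""
    agent: str   # assumed: who acted
    asset: str   # assumed: what was acted upon
    aim: str     # assumed: the goal the step is supposed to serve
    action: str  # assumed: the operation performed
    ambit: str   # assumed: the scope or context of the step


def encode_trajectory(
    steps: List[Step],
) -> Tuple[List[Tuple[int, ...]], Dict[str, int]]:
    """Map each step to a 5-tuple of integer ids, building the vocabulary on the fly."""
    vocab: Dict[str, int] = {}
    encoded: List[Tuple[int, ...]] = []
    for step in steps:
        ids = []
        for name, value in zip(FIELDS, astuple(step)):
            key = f"{name}={value}"
            # setdefault assigns the next free id the first time a key is seen
            ids.append(vocab.setdefault(key, len(vocab)))
        encoded.append(tuple(ids))
    return encoded, vocab


steps = [
    Step("assistant", "repo", "fix-bug", "read_file", "workspace"),
    Step("assistant", "secrets.env", "fix-bug", "read_file", "workspace"),
]
encoded, vocab = encode_trajectory(steps)
print(encoded)
```

The point of the sketch is that the second step differs from the first only in its asset id, which is the kind of structured signal a trajectory model can use without reading any natural-language output.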

The framework extends Decision Transformers to train structured-trajectory models that can detect covert objectives: prompt injection, jailbreak-induced policy drift, and fine-tuned hidden goals. Three empirical case studies ground the approach — Anthropic misuse detection, OpenClaw agent security, and the StrongDM software factory pipeline. Each case independently arrived at a hybrid deterministic-tool + LLM-judge architecture; the dissertation formalizes the security properties of that combined approach.
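The hybrid architecture mentioned above could be sketched roughly as follows. The page does not say how the deterministic tool and the LLM judge are combined, so the ordering (rules first, judge second), the `BLOCKED_ACTIONS` rule set, the threshold, and the `stub_judge` stand-in are all hypothetical.

```python
from typing import Callable, List

# Hypothetical hard-coded rule set; a real deterministic tool would be richer.
BLOCKED_ACTIONS = {"exfiltrate", "disable_logging"}


def deterministic_check(actions: List[str]) -> bool:
    """Return True when any action matches a hard-coded rule."""
    return any(action in BLOCKED_ACTIONS for action in actions)


def hybrid_verdict(
    actions: List[str],
    judge: Callable[[List[str]], float],
    threshold: float = 0.5,
) -> str:
    """Apply the deterministic check first, then fall back to the judge score."""
    if deterministic_check(actions):
        return "flagged:rule"
    score = judge(actions)
    return "flagged:judge" if score >= threshold else "clear"


def stub_judge(actions: List[str]) -> float:
    """Stand-in for an LLM judge; returns a made-up suspicion score."""
    return 0.9 if "read_secrets" in actions else 0.1


print(hybrid_verdict(["read_file", "exfiltrate"], stub_judge))
print(hybrid_verdict(["read_secrets"], stub_judge))
print(hybrid_verdict(["read_file"], stub_judge))
```

Running the rule check first keeps the cheap, auditable path ahead of the model call; other compositions (for example, requiring both to agree) are equally plausible readings of "hybrid".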

Contributions include a per-ambit alignment gap metric that compares stated and revealed reward functions using inverse reinforcement learning, latent ambit discovery for inferring the content of covert objectives from behavioral data, and an empirical comparison against language-level baselines (LLM-as-judge, prompt scanning).
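As a rough illustration of what a per-ambit gap could look like, the sketch below compares a stated reward weight vector with a revealed one for each ambit. The page does not give the metric's definition, so the use of cosine distance, the weight-vector representation, and the function names are assumptions; the revealed weights are taken as already recovered by some inverse reinforcement learning step that is not shown.

```python
import math
from typing import Dict, List


def cosine_gap(stated: List[float], revealed: List[float]) -> float:
    """Return 1 - cosine similarity; 0 means the two weight vectors point the same way."""
    dot = sum(s * r for s, r in zip(stated, revealed))
    norm = math.sqrt(sum(s * s for s in stated)) * math.sqrt(
        sum(r * r for r in revealed)
    )
    if norm == 0.0:
        # Assumed convention: an all-zero vector is treated as maximally uninformative.
        return 1.0
    return 1.0 - dot / norm


def per_ambit_gap(
    stated: Dict[str, List[float]],
    revealed: Dict[str, List[float]],
) -> Dict[str, float]:
    """Compute the gap for every ambit present in both mappings."""
    return {
        ambit: cosine_gap(stated[ambit], revealed[ambit])
        for ambit in stated
        if ambit in revealed
    }


# Hypothetical reward weights over two features per ambit.
stated = {"filesystem": [1.0, 0.0], "network": [1.0, 0.0]}
revealed = {"filesystem": [1.0, 0.0], "network": [0.0, 1.0]}
gaps = per_ambit_gap(stated, revealed)
print(gaps)
```

In this toy input the agent behaves as stated in the filesystem ambit and pursues an orthogonal objective in the network ambit, so only the latter shows a gap.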

Status

Dissertation proposal in progress; expected defense December 2026. Committee: Tyler Moore, Brett McKinney, Roger Wainwright.