Game Dev · 1 min read

EmoGalaxy 2: Teaching NPCs to Feel with Unity ML-Agents

A deep-dive into my thesis project: training autonomous game agents to exhibit emergent emotional behaviour using Unity ML-Agents and reinforcement learning.

Overview

EmoGalaxy 2 is my university thesis project exploring how reinforcement learning can produce emergent, emotionally-rich NPC behaviour inside a real-time game environment.

The Problem

Traditional game AI uses finite state machines or behaviour trees that are hand-authored: emotions are scripted, not felt. The goal was to see whether an agent trained purely on reward signals could develop behavioural patterns that look emotional to human observers.

Architecture

  • Unity ML-Agents Toolkit: training environment and policy inference
  • PPO (Proximal Policy Optimization): the core RL algorithm
  • Observation space: positional data, proximity to other agents, recent health delta, sound cues
  • Reward shaping: sparse rewards for survival and cooperation; penalty for isolation
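
To make the last two items more concrete, here is a minimal Python sketch of how an observation vector and a shaped reward along these lines could be put together. It is not the thesis code (the real agent logic lives on the Unity side). The names AgentState, build_observation and step_reward, the scalar sound cue, the isolation radius and every reward magnitude are assumptions picked for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# All constants are illustrative assumptions, not values from the thesis.
SURVIVAL_BONUS = 1.0        # sparse: paid once, at episode end, if still alive
COOPERATION_BONUS = 0.5     # sparse: paid only when a cooperative event occurs
ISOLATION_PENALTY = -0.002  # paid on every step spent far from other agents
ISOLATION_RADIUS = 10.0     # distance beyond which an agent counts as isolated


@dataclass
class AgentState:
    position: tuple[float, float]
    health_delta: float  # change in health over a recent window
    sound_cue: float     # hypothetical scalar loudness of the nearest sound
    alive: bool = True


def nearest_distance(agent: AgentState, others: list[AgentState]) -> float:
    """Distance to the closest other agent, or infinity if there is none."""
    if not others:
        return math.inf
    return min(math.dist(agent.position, o.position) for o in others)


def build_observation(agent: AgentState, others: list[AgentState]) -> list[float]:
    """Flatten position, proximity, health delta and sound cue into one vector."""
    # Clamp and normalise proximity so the policy never sees infinity.
    proximity = min(nearest_distance(agent, others), ISOLATION_RADIUS) / ISOLATION_RADIUS
    return [agent.position[0], agent.position[1], proximity,
            agent.health_delta, agent.sound_cue]


def step_reward(agent: AgentState, others: list[AgentState],
                cooperated: bool, episode_done: bool) -> float:
    """Sparse bonuses for survival and cooperation, per-step isolation penalty."""
    reward = 0.0
    if cooperated:
        reward += COOPERATION_BONUS
    if agent.alive and nearest_distance(agent, others) > ISOLATION_RADIUS:
        reward += ISOLATION_PENALTY
    if episode_done and agent.alive:
        reward += SURVIVAL_BONUS
    return reward
```

With this layout most steps return zero or a small negative number, which is the kind of sparse signal that makes early training slow.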

Key Findings

Training for ~50M steps produced agents that exhibited clustering under threat (fear-analogue), competitive aggression near resources (anger-analogue), and exploratory behaviour during calm periods (curiosity-analogue). None of these were explicitly programmed.
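
One possible way to put a number on "clustering under threat" is to compare the mean pairwise distance between agents in calm frames and in threatened frames. The sketch below is an assumed analysis helper, not the measurement used in the thesis; the function names are hypothetical and the positions would come from logged agent coordinates.

```python
import math
from itertools import combinations


def mean_pairwise_distance(positions):
    """Average Euclidean distance over every unordered pair of agents."""
    pairs = list(combinations(positions, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def clustering_ratio(calm_positions, threat_positions):
    """Values below 1.0 mean agents stand closer together under threat."""
    calm = mean_pairwise_distance(calm_positions)
    if calm == 0.0:
        return 1.0  # no spread to compare against
    return mean_pairwise_distance(threat_positions) / calm
```

A ratio well below 1.0 across many episodes would support the fear-analogue reading; a ratio near 1.0 would suggest the clustering is incidental.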

What I'd Do Differently

Curriculum learning from the start would have cut training time by roughly 40%. The initial reward landscape was too sparse, causing the agents to plateau early.
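
For readers unfamiliar with the idea, a curriculum starts the agents on an easier version of the task and moves to the hard version once they are doing well enough. The sketch below is framework-agnostic Python rather than ML-Agents configuration. The Lesson and Curriculum classes, the reward_density parameter and the thresholds are all assumptions for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Lesson:
    name: str
    reward_density: float  # hypothetical knob: 1.0 = dense shaping, 0.0 = fully sparse
    threshold: float       # mean episode reward required to advance


class Curriculum:
    """Advance to the next lesson once the rolling mean reward clears a threshold."""

    def __init__(self, lessons, window=100):
        self.lessons = lessons
        self.index = 0
        self.recent = deque(maxlen=window)

    @property
    def current(self):
        return self.lessons[self.index]

    def report(self, episode_reward):
        """Record one finished episode and return the lesson to use next."""
        self.recent.append(episode_reward)
        window_full = len(self.recent) == self.recent.maxlen
        is_last = self.index == len(self.lessons) - 1
        if window_full and not is_last:
            mean_reward = sum(self.recent) / len(self.recent)
            if mean_reward >= self.current.threshold:
                self.index += 1
                self.recent.clear()  # do not carry old scores into the new lesson
        return self.current


# Usage: start with dense shaping and finish on the sparse reward scheme.
curriculum = Curriculum(
    [Lesson("dense", 1.0, 0.5), Lesson("mixed", 0.5, 0.7), Lesson("sparse", 0.0, 0.0)],
    window=3,
)
```

Clearing the window on each advance means the agent has to prove itself again under the harder reward scheme before moving on.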


Full thesis write-up and training logs available on request.
