MAS-DUO: Reference Implementation
Repository: github.com/pgayos/MAS-DUO
Overview
MAS-DUO (Multi-Agent System for Dynamic Use and Optimisation) is the reference Python implementation of the architecture described in the doctoral thesis “Improving the Decision Support in Shop Floor Operations by Using Agent-based Systems and Visibility Frameworks” (UCLM, 2024). It translates the theoretical framework of Chapters 3 and 4 into an executable, configurable simulation platform built on the PettingZoo AEC standard, enabling the empirical validation of thesis propositions in both factory and airport settings.
Thesis Concepts vs. Implementation
1. Three-Layer Architecture
The thesis proposes a layered separation of concerns: cognitive agent reasoning (BDI), optimisation over state transitions (MDP), and enterprise-level policy arbitration (IS Platform). MAS-DUO maps this directly into three software layers:
| Thesis concept | Implementation module |
|---|---|
| BDI cognitive loop (beliefs, desires, intentions) | logistics_env/agents/base_agent.py — BDIContext, GeneratedBelief, PhysicalBDIState |
| MDP state-transition engine | MDPEngine.step(state, action) inside base_agent.py |
| IS Platform (ERP + CRM + Expert System) | logistics_env/is_platform/is_platform.py — ISPlatform |
The execution cycle per simulation step is:
Observation → BDI loop (beliefs → intention)
→ MDPEngine.step(state, action) → R(s, s')
→ NegotiationProposal → ISPlatform.evaluate()
→ GlobalPolicy update → propagated to all agents
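The cycle above can be sketched in simplified Python. The class names mirror the tables below, but every body here is an illustrative stand-in, not the repository code; the reward values and the `reward_floor` trigger are assumptions for the sketch.

```python
from dataclasses import dataclass

# Simplified stand-ins for the MAS-DUO pipeline classes (illustrative only).
@dataclass
class GeneratedBelief:
    observation: tuple

@dataclass
class NegotiationProposal:
    agent_id: str
    reward: float

def mdp_step(state: int, action: int) -> tuple[int, float]:
    """Toy MDP transition: advancing the route earns +1.0, waiting costs -0.1."""
    return (state + 1, 1.0) if action == 1 else (state, -0.1)

def simulation_step(state: int, observation: tuple, reward_floor: float):
    """One agent turn: observe -> BDI loop -> MDP step -> maybe negotiate."""
    belief = GeneratedBelief(observation)        # belief revision
    intention = 1 if belief.observation else 0   # desire/intention selection (trivial)
    next_state, reward = mdp_step(state, intention)
    proposal = (NegotiationProposal("flight-0", reward)
                if reward < reward_floor else None)
    return next_state, reward, proposal
```

A low reward relative to the floor is what escalates the turn from local optimisation to platform-level negotiation.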
2. 4W State and RFID/EPCIS Visibility
The thesis introduces the 4W visibility model (What, Where, When, Why) as the bridge between low-level RFID read events and high-level agent beliefs, grounding it in the EPCglobal EPCIS standard. In MAS-DUO, every agent maintains a 4W state object directly derived from this model:
| Thesis dimension | EPCIS field | Implementation |
|---|---|---|
| What — object identity | epcList (EPC URI) | agent.state.what → urn:epc:id:sgtin:0614150.B737.001 |
| Where — physical location | readPoint / bizLocation | agent.state.where → Position(x, y) + zone_id |
| When — event timestamp | eventTime | agent.state.when → simulation step integer |
| Why — business context | bizStep / disposition | agent.state.why → PhysicalBDIState enum |
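A minimal sketch of the 4W state object implied by the table, assuming a plain dataclass; the `PhysicalBDIState` members shown here are an illustrative subset, not the repository's enum.

```python
from dataclasses import dataclass
from enum import Enum

class PhysicalBDIState(Enum):
    # Illustrative subset; the repository enum may define more members.
    IDLE = "idle"
    IN_TRANSIT = "in_transit"
    PROCESSING = "processing"

@dataclass
class FourWState:
    what: str                # EPC pure-identity URI
    where: tuple[int, int]   # (x, y) grid position; zone_id omitted for brevity
    when: int                # simulation step
    why: PhysicalBDIState    # business context
```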
RFID EPC codes are modelled via logistics_env/objects/epc.py using the Pure Identity URI format (urn:epc:id:sgtin:...), maintaining fidelity to the GS1 standard that underpins the thesis traceability architecture.
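The Pure Identity URI format splits cleanly into company prefix, item reference, and serial number. A hedged sketch of the kind of parsing `epc.py` would need (the function and regex here are illustrative, not the module's actual API):

```python
import re

# SGTIN pure-identity URI: urn:epc:id:sgtin:<company>.<item-ref>.<serial>
SGTIN_URI = re.compile(r"^urn:epc:id:sgtin:(\d+)\.(\w+)\.(\w+)$")

def parse_sgtin(uri: str) -> dict:
    """Split an SGTIN pure-identity URI into its three GS1 components."""
    m = SGTIN_URI.match(uri)
    if not m:
        raise ValueError(f"not an SGTIN pure-identity URI: {uri}")
    company, item_ref, serial = m.groups()
    return {"company_prefix": company, "item_ref": item_ref, "serial": serial}
```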
3. Agent Society and Role Specialisation
The thesis models the resource population as a heterogeneous society of autonomous agents, each with a specialised role and action repertoire. MAS-DUO implements four concrete agent types:
| Thesis agent role | Class | Action space | Airport mapping |
|---|---|---|---|
| Physical product / job | ProductAgent | WAIT, REQUEST_MOVE, REQUEST_PROCESS, SIGNAL_READY | Flights (B737, A320, LCC, CARGO) |
| Mobile resource / equipment | RobotAgent | IDLE, MOVE (4 dirs), LIFT, DROP, CHARGE, NAVIGATE | Pushback tractors, stairs, fuel trucks, cargo loaders, passenger buses |
| Human resource / worker | WorkerAgent | IDLE, MOVE (4 dirs), PICK, PLACE, PROCESS, SCAN, REST | Ramp agents, supervisors, crew chiefs |
| Automated conveyor / line | ConveyorAgent | STOP, RUN, RUN_FAST, REVERSE | Baggage belts (gate → terminal) |
ProductAgent is the primary BDI-enabled entity — it carries the full cognitive loop and generates NegotiationProposal objects when its MDP-derived reward triggers a policy renegotiation. The remaining agent types implement lighter reactive loops while still maintaining 4W state for full system observability.
4. BDI–MDP Integration
A central theoretical contribution of the thesis is the hybrid BDI–MDP architecture: BDI provides the intentional structure (goal-directed reasoning with explicit mental states), while MDP provides the formal optimisation substrate (expected reward maximisation over state transitions). In MAS-DUO this integration is realised as follows:
The BDI loop generates a GeneratedBelief from the normalised observation vector, selects a desire (current operative goal from the product’s processing route), and forms an intention (specific action). That action is then passed to MDPEngine.step(state, action), which evaluates the resulting state transition and computes the scalar reward R(s, s') according to Equation 12. If the computed reward deviates significantly from the IS Platform threshold, a NegotiationProposal is raised.
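The escalation condition can be sketched as a simple deviation test; the `tolerance` parameter is an illustrative assumption, since the thesis only states that the deviation must be significant.

```python
def should_negotiate(reward: float, platform_threshold: float,
                     tolerance: float = 0.25) -> bool:
    """True when the MDP reward deviates from the IS Platform threshold by
    more than `tolerance` (an illustrative parameter, not from the thesis)."""
    return abs(reward - platform_threshold) > tolerance
```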
Exported BDI enumerations correspond directly to IATA ground handling BusinessSteps:
```python
from enum import Enum

class BusinessStep(Enum):
    OMS = "OMS"  # Organisation & Management System
    STM = "STM"  # Station Management System
    LOD = "LOD"  # Loading operations
    PAX = "PAX"  # Passenger handling
    BAG = "BAG"  # Baggage handling
    HDL = "HDL"  # Ground handling coordination
    AGM = "AGM"  # Aircraft Ground Movement
    CGM = "CGM"  # Cargo & Ground Movement
```
5. Global Reward Function — Equation 12
The thesis formalises the multi-objective optimisation criterion as a weighted linear combination of four operational dimensions (Equation 12):
$$R(s, s') = A \cdot \text{Delay} + B \cdot \text{Cost} + C \cdot \text{QoS} + D \cdot \text{Energy}$$
MAS-DUO implements this via the GlobalPolicy class with PolicyParameters(A, B, C, D). The calibrated values for the Ciudad Real Central Airport (CRC) Common Use scenario — validated in the thesis — are:
| Parameter | Symbol | Thesis value (CRC) | Rationale |
|---|---|---|---|
| Delay weight | A | 0.5 | Dominant operational constraint in airport turnaround |
| Cost weight | B | 0.4 | Secondary economic objective |
| QoS weight | C | 0.0 | Common Use Model — no airline preference |
| Energy weight | D | 0.1 | Tertiary environmental objective |
The policy is configurable at runtime and can evolve dynamically during a simulation via scheduled_changes, reproducing the adaptive policy renegotiation described in the thesis.
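Equation 12 and the CRC calibration above can be sketched directly; the dataclass layout is an assumption (the document names `PolicyParameters(A, B, C, D)` but not its definition), and the sign convention of each term is left to scenario configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PolicyParameters:
    A: float  # delay weight
    B: float  # cost weight
    C: float  # QoS weight
    D: float  # energy weight

def global_reward(p: PolicyParameters, delay: float, cost: float,
                  qos: float, energy: float) -> float:
    """Equation 12: weighted linear combination of the four dimensions."""
    return p.A * delay + p.B * cost + p.C * qos + p.D * energy

# Calibrated CRC Common Use weights from the table above.
CRC_COMMON_USE = PolicyParameters(A=0.5, B=0.4, C=0.0, D=0.1)
```

With C = 0.0, QoS terms drop out entirely, which is exactly the Common Use assumption: no airline receives preferential service.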
6. IS Platform — Decentralised Negotiation
The thesis proposes that resource allocation in complex environments requires a negotiation mechanism between agent-level proposals and enterprise-level constraints, avoiding both full centralisation (single point of failure) and full decentralisation (global incoherence). The IS Platform implements this via three sub-components:
| Sub-component | Thesis function | Implementation parameters |
|---|---|---|
| ERP | Cost and capacity constraints | max_cost, max_delay, production_capacity |
| CRM | QoS and client priority | min_qos_threshold, client_priority |
| Expert System | Minimum acceptable reward floor | min_reward_threshold |
When an agent raises a NegotiationProposal, ISPlatform.evaluate(proposal, step) returns a NegotiationResult with one of three outcomes: APPROVED, REJECTED, or COUNTER_PROPOSAL with updated PolicyParameters. Accepted proposals may trigger a GlobalPolicy update propagated to all product agents, realising the dynamic collective adaptation described in the thesis.
7. State Space — ReadPoints and BusinessSteps (Equation 11)
The thesis derives the system state space cardinality via Equation 11 using the product of ReadPoints (physical locations with RFID readers) and BusinessSteps (discrete operational phases). The airport scenario in MAS-DUO reproduces this exactly:
ReadPoints (RP): { HANGAR, PARK1, PARK2, PARK3, PARK4 } -> 5 RPs
Resource BusinessSteps (BS): { Free, Busy, InTransit, NotAvailable } -> 4 BS
|S_resource| = 5 × 4 = 20 resource states (Equation 11)
IATA Flight BusinessSteps: { OMS, STM, LOD, PAX, BAG, HDL, AGM, CGM, ... } -> 8+ BS
|S_flight| ≈ 4 stands × 10 BS = 40 states
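Equation 11's cardinality for the resource state space is just the Cartesian product of the two sets above:

```python
from itertools import product

READ_POINTS = ["HANGAR", "PARK1", "PARK2", "PARK3", "PARK4"]
RESOURCE_STEPS = ["Free", "Busy", "InTransit", "NotAvailable"]

# Equation 11: |S_resource| = |ReadPoints| x |BusinessSteps| = 5 x 4 = 20
resource_states = list(product(READ_POINTS, RESOURCE_STEPS))
```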
The grid world (20 × 15 cells at 5 m/cell, covering a 100 × 75 m apron) spatially realises these ReadPoints as zones, with A* pathfinding for resource navigation between them.
8. Airport Case Study — Ciudad Real Central Airport
Chapter 4.1 of the thesis presents CRC Airport as the primary real-world validation scenario. MAS-DUO ships two executable configurations that directly reproduce this case study:
airport_gh_check.py — validation script for 4 concurrent flights (B737, A320, LCC, CARGO) with 21 agents total. Runs 10 random-policy steps to verify the full system stack: configuration loading, 4W state consistency, IS Platform negotiations, and order tracking.
airport_gh_demo.py — full demonstration with 8 concurrent flights and a Greedy EDF (Earliest Deadline First) policy. Fleet composition:
- 12 GH robots: 3 pushback tractors, 3 hydraulic stairs, 2 fuel trucks, 2 ULD cargo loaders, 2 passenger buses
- 7 workers: 5 ramp agents, 1 operations supervisor, 1 ground crew chief
- 4 baggage belts (BAGBELT-P1 to P4, gate → terminal)
The Greedy EDF policy applies a priority-weighted urgency score — urgency = 1/max(1, deadline − step) × priority_boost — to replicate the heuristic scheduling baseline described in the thesis, against which more sophisticated RL policies can be benchmarked.
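The urgency formula quoted above is directly executable; the flight dict shape (`deadline`, `boost` keys) used in the selection helper is an assumption for this sketch.

```python
def edf_urgency(deadline: int, step: int, priority_boost: float = 1.0) -> float:
    """The demo's heuristic score: 1 / max(1, deadline - step) * priority_boost."""
    return 1.0 / max(1, deadline - step) * priority_boost

def next_flight(flights: list[dict], step: int) -> dict:
    """Greedy EDF: serve the flight with the highest urgency score."""
    return max(flights, key=lambda f: edf_urgency(f["deadline"], step,
                                                  f.get("boost", 1.0)))
```

Note that `max(1, ...)` both avoids division by zero and caps the score of already-overdue flights, so a large `priority_boost` is the only way a flight can dominate past its deadline.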
Validated results (demo run, seed=42):
- 4/8 flights completed on time within 30 steps
- Cumulative reward: +287.07
- Total energy consumed: 831 units
- IS Platform approval rate: ~87%
- No agent collision events
9. Observation Space Alignment
The normalised observation vectors fed to each agent type correspond to the state variables identified in the thesis for each agent class:
| Agent type | Vector shape | Components |
|---|---|---|
| ProductAgent | (6,) float32 [0,1] | Normalised position (x, y), route progress, deadline proximity, energy, processing flag |
| WorkerAgent | (6,) float32 [0,1] | Position, energy, fatigue level, zone index, previous action |
| RobotAgent | (5,) float32 [0,1] | Position, battery level, load status, speed |
| ConveyorAgent | (5,) float32 [0,1] | Operational state, occupancy, speed setting, energy consumption |
These compact representations are designed for compatibility with standard deep RL algorithms (PPO, SAC, QMIX) via the PettingZoo AEC interface, enabling the thesis results to be reproduced and extended with learned policies.
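As an example, the ProductAgent row can be realised as below; the exact normalisation (grid extents, deadline clamping) is an assumption for this sketch, not the repository's code, and the list would be cast to float32 at the env boundary.

```python
def product_observation(x, y, grid_w, grid_h, route_done, route_total,
                        deadline, step, energy, energy_max, processing):
    """Sketch of the (6,) ProductAgent observation, all components in [0, 1]."""
    return [
        x / (grid_w - 1),                 # normalised x
        y / (grid_h - 1),                 # normalised y
        route_done / route_total,         # route progress
        max(0.0, 1.0 - step / deadline),  # deadline proximity (1 = far, 0 = due)
        energy / energy_max,              # remaining energy
        1.0 if processing else 0.0,       # processing flag
    ]
```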
10. Framework and Technology Choices
| Thesis assumption | Implementation choice | Rationale |
|---|---|---|
| JADE/JADEX (original prototype) | PettingZoo AEC + Python | Modern RL ecosystem; enables direct connection to PyTorch/TensorFlow |
| EPCIS middleware (Fosstrak) | epc.py (pure Python EPC model) | Self-contained; removes JVM dependency while preserving the data model |
| Simulation-based validation | Pygame renderer + headless mode | Reproducible experiments; visual inspection of emergent behaviours |
| JSON scenario definition | config_loader.py + Pydantic-style configs | Declarative scenario authoring; separates model from configuration |
The transition from JADE to PettingZoo preserves the essential architectural invariants of the thesis (BDI loop, 4W state, IS negotiation, global policy) while opening the system to the contemporary multi-agent reinforcement learning research community.