MAS-DUO: Reference Implementation


Repository: github.com/pgayos/MAS-DUO


Overview

MAS-DUO (Multi-Agent System for Dynamic Use and Optimisation) is the reference Python implementation of the architecture described in the doctoral thesis “Improving the Decision Support in Shop Floor Operations by Using Agent-based Systems and Visibility Frameworks” (UCLM, 2024). It translates the theoretical framework of Chapters 3 and 4 into an executable, configurable simulation platform built on the PettingZoo AEC standard, enabling the empirical validation of thesis propositions in both factory and airport settings.


Thesis Concepts vs. Implementation

1. Three-Layer Architecture

The thesis proposes a layered separation of concerns: cognitive agent reasoning (BDI), optimisation over state transitions (MDP), and enterprise-level policy arbitration (IS Platform). MAS-DUO maps this directly into three software layers:

| Thesis concept | Implementation module |
| --- | --- |
| BDI cognitive loop (beliefs, desires, intentions) | `logistics_env/agents/base_agent.py` — `BDIContext`, `GeneratedBelief`, `PhysicalBDIState` |
| MDP state-transition engine | `MDPEngine.step(state, action)` inside `base_agent.py` |
| IS Platform (ERP + CRM + Expert System) | `logistics_env/is_platform/is_platform.py` — `ISPlatform` |

The execution cycle per simulation step is:

Observation → BDI loop (beliefs → intention)
           → MDPEngine.step(state, action) → R(s, s')
           → NegotiationProposal → ISPlatform.evaluate()
           → GlobalPolicy update → propagated to all agents
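The cycle above can be condensed into a few lines of Python. Every class and function here is an illustrative stand-in for the real MAS-DUO API, not the actual implementation:

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for the MAS-DUO classes named above.
@dataclass
class Belief:
    observation: tuple

@dataclass
class Proposal:
    reward: float

def bdi_loop(observation):
    """Beliefs -> desire -> intention (here: a trivial action choice)."""
    belief = Belief(observation)
    return "REQUEST_MOVE" if belief.observation else "WAIT"

def mdp_step(state, action):
    """Evaluate the state transition and return (next_state, reward)."""
    next_state = state + 1
    reward = 1.0 if action != "WAIT" else 0.0
    return next_state, reward

def simulation_step(state, observation, reward_floor=0.5):
    """One full cycle: BDI loop, MDP step, then negotiate only on low reward."""
    intention = bdi_loop(observation)
    state, reward = mdp_step(state, intention)
    proposal = Proposal(reward) if reward < reward_floor else None
    return state, reward, proposal

state, reward, proposal = simulation_step(0, (0.2, 0.4))
```

A reward above the floor short-circuits negotiation; only underperforming transitions reach the IS Platform.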

2. 4W State and RFID/EPCIS Visibility

The thesis introduces the 4W visibility model (What, Where, When, Why) as the bridge between low-level RFID read events and high-level agent beliefs, grounding it in the EPCglobal EPCIS standard. In MAS-DUO, every agent maintains a 4W state object directly derived from this model:

| Thesis dimension | EPCIS field | Implementation |
| --- | --- | --- |
| What — object identity | `epcList` (EPC URI) | `agent.state.what` → `urn:epc:id:sgtin:0614150.B737.001` |
| Where — physical location | `readPoint` / `bizLocation` | `agent.state.where` → `Position(x, y)` + `zone_id` |
| When — event timestamp | `eventTime` | `agent.state.when` → simulation step integer |
| Why — business context | `bizStep` / `disposition` | `agent.state.why` → `PhysicalBDIState` enum |

RFID EPC codes are modelled via logistics_env/objects/epc.py using the Pure Identity URI format (urn:epc:id:sgtin:...), maintaining fidelity to the GS1 standard that underpins the thesis traceability architecture.
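A minimal parser for the Pure Identity format illustrates the What dimension. The function below is a sketch, not the repo's `epc.py`:

```python
def parse_sgtin_uri(uri: str) -> dict:
    """Split a GS1 Pure Identity SGTIN URI into its components.

    Format: urn:epc:id:sgtin:CompanyPrefix.ItemReference.Serial
    (illustrative sketch; the real epc.py may differ)
    """
    prefix = "urn:epc:id:sgtin:"
    if not uri.startswith(prefix):
        raise ValueError(f"not an SGTIN URI: {uri}")
    company, item_ref, serial = uri[len(prefix):].split(".")
    return {"company_prefix": company, "item_ref": item_ref, "serial": serial}

epc = parse_sgtin_uri("urn:epc:id:sgtin:0614150.B737.001")
```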


3. Agent Society and Role Specialisation

The thesis models the resource population as a heterogeneous society of autonomous agents, each with a specialised role and action repertoire. MAS-DUO implements four concrete agent types:

| Thesis agent role | Class | Action space | Airport mapping |
| --- | --- | --- | --- |
| Physical product / job | `ProductAgent` | WAIT, REQUEST_MOVE, REQUEST_PROCESS, SIGNAL_READY | Flights (B737, A320, LCC, CARGO) |
| Mobile resource / equipment | `RobotAgent` | IDLE, MOVE (4 dirs), LIFT, DROP, CHARGE, NAVIGATE | Pushback tractors, stairs, fuel trucks, cargo loaders, passenger buses |
| Human resource / worker | `WorkerAgent` | IDLE, MOVE (4 dirs), PICK, PLACE, PROCESS, SCAN, REST | Ramp agents, supervisors, crew chiefs |
| Automated conveyor / line | `ConveyorAgent` | STOP, RUN, RUN_FAST, REVERSE | Baggage belts (gate → terminal) |

ProductAgent is the primary BDI-enabled entity — it carries the full cognitive loop and generates NegotiationProposal objects when its MDP-derived reward triggers a policy renegotiation. The remaining agent types implement lighter reactive loops while still maintaining 4W state for full system observability.
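The action repertoires in the table translate naturally to Python enums. The two classes below are illustrative sketches: the member names follow the table, but the class names and values are assumptions:

```python
from enum import Enum, auto

class ProductAction(Enum):
    """Action space of the BDI-enabled product/flight agent (sketch)."""
    WAIT = auto()
    REQUEST_MOVE = auto()
    REQUEST_PROCESS = auto()
    SIGNAL_READY = auto()

class ConveyorAction(Enum):
    """Action space of the reactive conveyor/baggage-belt agent (sketch)."""
    STOP = auto()
    RUN = auto()
    RUN_FAST = auto()
    REVERSE = auto()
```

Each enum's length gives the size of the discrete action space a PettingZoo environment would expose for that agent type.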


4. BDI–MDP Integration

A central theoretical contribution of the thesis is the hybrid BDI–MDP architecture: BDI provides the intentional structure (goal-directed reasoning with explicit mental states), while MDP provides the formal optimisation substrate (expected reward maximisation over state transitions). In MAS-DUO this integration is realised as follows:

The BDI loop generates a GeneratedBelief from the normalised observation vector, selects a desire (current operative goal from the product’s processing route), and forms an intention (specific action). That action is then passed to MDPEngine.step(state, action), which evaluates the resulting state transition and computes the scalar reward R(s, s') according to Equation 12. If the computed reward deviates significantly from the IS Platform threshold, a NegotiationProposal is raised.

Exported BDI enumerations correspond directly to IATA ground handling BusinessSteps:

from enum import Enum, auto

class BusinessStep(Enum):
    OMS = auto()  # Organisation & Management System
    STM = auto()  # Station Management System
    LOD = auto()  # Loading operations
    PAX = auto()  # Passenger handling
    BAG = auto()  # Baggage handling
    HDL = auto()  # Ground handling coordination
    AGM = auto()  # Aircraft Ground Movement
    CGM = auto()  # Cargo & Ground Movement

5. Global Reward Function — Equation 12

The thesis formalises the multi-objective optimisation criterion as a weighted linear combination of four operational dimensions (Equation 12):

$$R(s, s') = A \cdot \text{Delay} + B \cdot \text{Cost} + C \cdot \text{QoS} + D \cdot \text{Energy}$$

MAS-DUO implements this via the GlobalPolicy class with PolicyParameters(A, B, C, D). The calibrated values for the Ciudad Real Central Airport (CRC) Common Use scenario — validated in the thesis — are:

| Parameter | Symbol | Thesis value (CRC) | Rationale |
| --- | --- | --- | --- |
| Delay weight | A | 0.5 | Dominant operational constraint in airport turnaround |
| Cost weight | B | 0.4 | Secondary economic objective |
| QoS weight | C | 0.0 | Common Use Model — no airline preference |
| Energy weight | D | 0.1 | Tertiary environmental objective |

The policy is configurable at runtime and can evolve dynamically during a simulation via scheduled_changes, reproducing the adaptive policy renegotiation described in the thesis.
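With the calibrated CRC weights, Equation 12 reduces to a dot product. The sketch below assumes delay, cost, and energy enter as negative (penalty) terms, which is an assumption about the sign convention; `PolicyParameters` mirrors the constructor named above, but its body is illustrative:

```python
from dataclasses import dataclass

@dataclass
class PolicyParameters:
    A: float  # delay weight
    B: float  # cost weight
    C: float  # QoS weight
    D: float  # energy weight

def global_reward(p: PolicyParameters, delay, cost, qos, energy) -> float:
    """Equation 12: R(s, s') = A*Delay + B*Cost + C*QoS + D*Energy."""
    return p.A * delay + p.B * cost + p.C * qos + p.D * energy

# Calibrated CRC Common Use values from the table above
crc = PolicyParameters(A=0.5, B=0.4, C=0.0, D=0.1)
r = global_reward(crc, delay=-2.0, cost=-1.0, qos=0.8, energy=-0.5)
```

Note that with C = 0.0 the QoS term vanishes entirely, which is exactly the Common Use behaviour: no airline receives preferential service.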


6. IS Platform — Decentralised Negotiation

The thesis proposes that resource allocation in complex environments requires a negotiation mechanism between agent-level proposals and enterprise-level constraints, avoiding both full centralisation (single point of failure) and full decentralisation (global incoherence). The IS Platform implements this via three sub-components:

| Sub-component | Thesis function | Implementation parameters |
| --- | --- | --- |
| ERP | Cost and capacity constraints | `max_cost`, `max_delay`, `production_capacity` |
| CRM | QoS and client priority | `min_qos_threshold`, `client_priority` |
| Expert System | Minimum acceptable reward floor | `min_reward_threshold` |
When an agent raises a NegotiationProposal, ISPlatform.evaluate(proposal, step) returns a NegotiationResult with one of three outcomes: APPROVED, REJECTED, or COUNTER_PROPOSAL with updated PolicyParameters. Accepted proposals may trigger a GlobalPolicy update propagated to all product agents, realising the dynamic collective adaptation described in the thesis.
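The gated evaluation can be sketched as follows. The threshold values and the `Proposal` fields here are illustrative defaults, not the repo's actual parameters:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Outcome(Enum):
    APPROVED = auto()
    REJECTED = auto()
    COUNTER_PROPOSAL = auto()

@dataclass
class Proposal:
    expected_cost: float
    expected_delay: float
    expected_reward: float

def evaluate(p: Proposal, max_cost=100.0, max_delay=10.0, min_reward=-2.0):
    """Sketch: ERP constraint checks first, then the Expert System floor."""
    if p.expected_cost > max_cost or p.expected_delay > max_delay:
        return Outcome.REJECTED
    if p.expected_reward < min_reward:
        # Borderline proposals get a counter-offer rather than a rejection
        return Outcome.COUNTER_PROPOSAL
    return Outcome.APPROVED

result = evaluate(Proposal(expected_cost=40.0, expected_delay=3.0, expected_reward=1.2))
```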


7. State Space — ReadPoints and BusinessSteps (Equation 11)

The thesis derives the system state space cardinality via Equation 11 using the product of ReadPoints (physical locations with RFID readers) and BusinessSteps (discrete operational phases). The airport scenario in MAS-DUO reproduces this exactly:

ReadPoints (RP):  { HANGAR, PARK1, PARK2, PARK3, PARK4 }  ->  5 RPs
Resource BusinessSteps (BS):  { Free, Busy, InTransit, NotAvailable }  ->  4 BS

|S_resource| = 5 × 4 = 20 resource states   (Equation 11)

IATA Flight BusinessSteps:  { OMS, STM, LOD, PAX, BAG, HDL, AGM, CGM, ... }  ->  8+ BS
|S_flight|  ≈  4 stands × 10 BS = 40 states

The grid world (20 × 15 cells at 5 m/cell, covering a 100 × 75 m apron) spatially realises these ReadPoints as zones, with A* pathfinding for resource navigation between them.
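The Equation 11 cardinality can be verified by enumerating the product set directly, using the ReadPoints and resource BusinessSteps listed above:

```python
from itertools import product

read_points = ["HANGAR", "PARK1", "PARK2", "PARK3", "PARK4"]
business_steps = ["Free", "Busy", "InTransit", "NotAvailable"]

# Equation 11: |S_resource| = |RP| x |BS|, enumerated explicitly
resource_states = list(product(read_points, business_steps))
```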


8. Airport Case Study — Ciudad Real Central Airport

Chapter 4.1 of the thesis presents CRC Airport as the primary real-world validation scenario. MAS-DUO ships two executable configurations that directly reproduce this case study:

airport_gh_check.py — validation script for 4 concurrent flights (B737, A320, LCC, CARGO) with 21 agents total. Runs 10 random-policy steps to verify the full system stack: configuration loading, 4W state consistency, IS Platform negotiations, and order tracking.

airport_gh_demo.py — full demonstration with 8 concurrent flights and a Greedy EDF (Earliest Deadline First) policy. Fleet composition:

  • 12 GH robots: 3 pushback tractors, 3 hydraulic stairs, 2 fuel trucks, 2 ULD cargo loaders, 2 passenger buses
  • 7 workers: 5 ramp agents, 1 operations supervisor, 1 ground crew chief
  • 4 baggage belts (BAGBELT-P1 to P4, gate → terminal)

The Greedy EDF policy applies a priority-weighted urgency score — urgency = 1/max(1, deadline − step) × priority_boost — to replicate the heuristic scheduling baseline described in the thesis, against which more sophisticated RL policies can be benchmarked.
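The urgency score as given translates directly to code. The flight deadlines below are hypothetical, chosen only to show the resulting ordering:

```python
def urgency(deadline: int, step: int, priority_boost: float = 1.0) -> float:
    """Greedy EDF score: nearer deadlines score higher, scaled by priority."""
    return 1.0 / max(1, deadline - step) * priority_boost

# Hypothetical deadlines (in simulation steps); nearest deadline is served first
flights = {"B737": 12, "CARGO": 25, "A320": 8}
ranked = sorted(flights, key=lambda f: urgency(flights[f], step=5), reverse=True)
```

The `max(1, ...)` guard keeps the score finite for flights at or past their deadline, which then all tie at the maximum base urgency of 1.0.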

Validated results (demo run, seed=42):

  • 4/8 flights completed on time within 30 steps
  • Cumulative reward: +287.07
  • Total energy consumed: 831 units
  • IS Platform approval rate: ~87%
  • No agent collision events

9. Observation Space Alignment

The normalised observation vectors fed to each agent type correspond to the state variables identified in the thesis for each agent class:

| Agent type | Vector shape | Components |
| --- | --- | --- |
| `ProductAgent` | (6,) float32 [0,1] | Normalised position (x, y), route progress, deadline proximity, energy, processing flag |
| `WorkerAgent` | (6,) float32 [0,1] | Position, energy, fatigue level, zone index, previous action |
| `RobotAgent` | (5,) float32 [0,1] | Position, battery level, load status, speed |
| `ConveyorAgent` | (5,) float32 [0,1] | Operational state, occupancy, speed setting, energy consumption |

These compact representations are designed for compatibility with standard deep RL algorithms (PPO, SAC, QMIX) via the PettingZoo AEC interface, enabling the thesis results to be reproduced and extended with learned policies.
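As a sketch of how one such vector might be assembled, the function below builds the six `ProductAgent` components; only the component list follows the table, while the function name and exact normalisation are assumptions:

```python
def product_observation(x, y, grid_w, grid_h, route_done, route_total,
                        steps_left, horizon, energy, max_energy, processing):
    """Build the 6-component normalised ProductAgent vector (sketch)."""
    return [
        x / (grid_w - 1),            # normalised position x
        y / (grid_h - 1),            # normalised position y
        route_done / route_total,    # route progress
        1.0 - steps_left / horizon,  # deadline proximity (1.0 = deadline now)
        energy / max_energy,         # remaining energy
        1.0 if processing else 0.0,  # processing flag
    ]

obs = product_observation(10, 7, 20, 15, 2, 4, 15, 30, 80, 100, False)
```

Keeping every component in [0, 1] means the same network input layer works unchanged across scenarios of different grid sizes and horizons.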


10. Framework and Technology Choices

| Thesis assumption | Implementation choice | Rationale |
| --- | --- | --- |
| JADE/JADEX (original prototype) | PettingZoo AEC + Python | Modern RL ecosystem; enables direct connection to PyTorch/TensorFlow |
| EPCIS middleware (Fosstrak) | `epc.py` (pure Python EPC model) | Self-contained; removes JVM dependency while preserving the data model |
| Simulation-based validation | Pygame renderer + headless mode | Reproducible experiments; visual inspection of emergent behaviours |
| JSON scenario definition | `config_loader.py` + Pydantic-style configs | Declarative scenario authoring; separates model from configuration |

The transition from JADE to PettingZoo preserves the essential architectural invariants of the thesis (BDI loop, 4W state, IS negotiation, global policy) while opening the system to the contemporary multi-agent reinforcement learning research community.