MAS-DUO: Reference Implementation


Repository: github.com/pgayos/MAS-DUO


Overview

MAS-DUO (Multi-Agent System for Dynamic Use and Optimisation) is the reference Python implementation of the architecture described in the doctoral thesis “Improving the Decision Support in Shop Floor Operations by Using Agent-based Systems and Visibility Frameworks” (UCLM, 2024). It translates the theoretical framework of Chapters 3 and 4 into an executable, configurable simulation platform built on the PettingZoo AEC standard, enabling the empirical validation of thesis propositions in both factory and airport settings.


Thesis Concepts vs. Implementation

1. Three-Layer Architecture

The thesis proposes a layered separation of concerns: cognitive agent reasoning (BDI), optimisation over state transitions (MDP), and enterprise-level policy arbitration (IS Platform). MAS-DUO maps this directly into three software layers:

| Thesis concept | Implementation module |
| --- | --- |
| BDI cognitive loop (beliefs, desires, intentions) | `logistics_env/agents/base_agent.py` — `BDIContext`, `GeneratedBelief`, `PhysicalBDIState` |
| MDP state-transition engine | `MDPEngine.step(state, action)` inside `base_agent.py` |
| IS Platform (ERP + CRM + Expert System) | `logistics_env/is_platform/is_platform.py` — `ISPlatform` |

The execution cycle per simulation step is:

Observation → BDI loop (beliefs → intention)
           → MDPEngine.step(state, action) → R(s, s')
           → NegotiationProposal → ISPlatform.evaluate()
           → GlobalPolicy update → propagated to all agents
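The cycle above can be condensed into a few lines of Python. Every class and function here is an illustrative stand-in for the real MAS-DUO API, not the actual implementation:

```python
from dataclasses import dataclass

# Hypothetical, simplified stand-ins for the MAS-DUO classes named above.
@dataclass
class Belief:
    observation: tuple

@dataclass
class Proposal:
    reward: float

def bdi_loop(observation):
    """Beliefs -> desire -> intention (here: a trivial action choice)."""
    belief = Belief(observation)
    return "REQUEST_MOVE" if belief.observation else "WAIT"

def mdp_step(state, action):
    """Evaluate the state transition and return (next_state, reward)."""
    next_state = state + 1
    reward = 1.0 if action != "WAIT" else 0.0
    return next_state, reward

def simulation_step(state, observation, reward_floor=0.5):
    """One full cycle: BDI loop, MDP step, then negotiate only on low reward."""
    intention = bdi_loop(observation)
    state, reward = mdp_step(state, intention)
    proposal = Proposal(reward) if reward < reward_floor else None
    return state, reward, proposal

state, reward, proposal = simulation_step(0, (0.2, 0.4))
```

A reward above the floor short-circuits negotiation; only underperforming transitions reach the IS Platform.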

2. 4W State and RFID/EPCIS Visibility

The thesis introduces the 4W visibility model (What, Where, When, Why) as the bridge between low-level RFID read events and high-level agent beliefs, grounding it in the EPCglobal EPCIS standard. In MAS-DUO, every agent maintains a 4W state object directly derived from this model:

| Thesis dimension | EPCIS field | Implementation |
| --- | --- | --- |
| What — object identity | `epcList` (EPC URI) | `agent.state.what` → `urn:epc:id:sgtin:0614150.B737.001` |
| Where — physical location | `readPoint` / `bizLocation` | `agent.state.where` → `Position(x, y)` + `zone_id` |
| When — event timestamp | `eventTime` | `agent.state.when` → simulation step integer |
| Why — business context | `bizStep` / `disposition` | `agent.state.why` → `PhysicalBDIState` enum |

RFID EPC codes are modelled via logistics_env/objects/epc.py using the Pure Identity URI format (urn:epc:id:sgtin:...), maintaining fidelity to the GS1 standard that underpins the thesis traceability architecture.
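A minimal parser for the Pure Identity format illustrates the What dimension. The function below is a sketch, not the repo's `epc.py`:

```python
def parse_sgtin_uri(uri: str) -> dict:
    """Split a GS1 Pure Identity SGTIN URI into its components.

    Format: urn:epc:id:sgtin:CompanyPrefix.ItemReference.Serial
    (illustrative sketch; the real epc.py may differ)
    """
    prefix = "urn:epc:id:sgtin:"
    if not uri.startswith(prefix):
        raise ValueError(f"not an SGTIN URI: {uri}")
    company, item_ref, serial = uri[len(prefix):].split(".")
    return {"company_prefix": company, "item_ref": item_ref, "serial": serial}

epc = parse_sgtin_uri("urn:epc:id:sgtin:0614150.B737.001")
```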


3. Agent Society and Role Specialisation

The thesis models the resource population as a heterogeneous society of autonomous agents, each with a specialised role and action repertoire. MAS-DUO implements four concrete agent types:

| Thesis agent role | Class | Action space | Airport mapping |
| --- | --- | --- | --- |
| Physical product / job | `ProductAgent` | WAIT, REQUEST_MOVE, REQUEST_PROCESS, SIGNAL_READY | Flights (B737, A320, LCC, CARGO) |
| Mobile resource / equipment | `RobotAgent` | IDLE, MOVE (4 dirs), LIFT, DROP, CHARGE, NAVIGATE | Pushback tractors, stairs, fuel trucks, cargo loaders, passenger buses |
| Human resource / worker | `WorkerAgent` | IDLE, MOVE (4 dirs), PICK, PLACE, PROCESS, SCAN, REST | Ramp agents, supervisors, crew chiefs |
| Automated conveyor / line | `ConveyorAgent` | STOP, RUN, RUN_FAST, REVERSE | Baggage belts (gate → terminal) |

ProductAgent is the primary BDI-enabled entity — it carries the full cognitive loop and generates NegotiationProposal objects when its MDP-derived reward triggers a policy renegotiation. The remaining agent types implement lighter reactive loops while still maintaining 4W state for full system observability.
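The action repertoires in the table translate naturally to Python enums. The two classes below are illustrative sketches: the member names follow the table, but the class names and values are assumptions:

```python
from enum import Enum, auto

class ProductAction(Enum):
    """Action space of the BDI-enabled product/flight agent (sketch)."""
    WAIT = auto()
    REQUEST_MOVE = auto()
    REQUEST_PROCESS = auto()
    SIGNAL_READY = auto()

class ConveyorAction(Enum):
    """Action space of the reactive conveyor/baggage-belt agent (sketch)."""
    STOP = auto()
    RUN = auto()
    RUN_FAST = auto()
    REVERSE = auto()
```

Each enum's length gives the size of the discrete action space a PettingZoo environment would expose for that agent type.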


4. BDI–MDP Integration

A central theoretical contribution of the thesis is the hybrid BDI–MDP architecture: BDI provides the intentional structure (goal-directed reasoning with explicit mental states), while MDP provides the formal optimisation substrate (expected reward maximisation over state transitions). In MAS-DUO this integration is realised as follows:

The BDI loop generates a GeneratedBelief from the normalised observation vector, selects a desire (current operative goal from the product’s processing route), and forms an intention (specific action). That action is then passed to MDPEngine.step(state, action), which evaluates the resulting state transition and computes the scalar reward R(s, s') according to Equation 12. If the computed reward deviates significantly from the IS Platform threshold, a NegotiationProposal is raised.

Exported BDI enumerations correspond directly to IATA ground handling BusinessSteps:

from enum import Enum, auto

class BusinessStep(Enum):
    OMS = auto()  # Organisation & Management System
    STM = auto()  # Station Management System
    LOD = auto()  # Loading operations
    PAX = auto()  # Passenger handling
    BAG = auto()  # Baggage handling
    HDL = auto()  # Ground handling coordination
    AGM = auto()  # Aircraft Ground Movement
    CGM = auto()  # Cargo & Ground Movement

5. Global Reward Function — Equation 12

The thesis formalises the multi-objective optimisation criterion as a weighted linear combination of four operational dimensions (Equation 12):

$$R(s, s') = A \cdot \text{Delay} + B \cdot \text{Cost} + C \cdot \text{QoS} + D \cdot \text{Energy}$$

MAS-DUO implements this via the GlobalPolicy class with PolicyParameters(A, B, C, D). The calibrated values for the Ciudad Real Central Airport (CRC) Common Use scenario — validated in the thesis — are:

| Parameter | Symbol | Thesis value (CRC) | Rationale |
| --- | --- | --- | --- |
| Delay weight | A | 0.5 | Dominant operational constraint in airport turnaround |
| Cost weight | B | 0.4 | Secondary economic objective |
| QoS weight | C | 0.0 | Common Use Model — no airline preference |
| Energy weight | D | 0.1 | Tertiary environmental objective |

The policy is configurable at runtime and can evolve dynamically during a simulation via scheduled_changes, reproducing the adaptive policy renegotiation described in the thesis.
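With the calibrated CRC weights, Equation 12 reduces to a dot product. The sketch below assumes delay, cost, and energy enter as negative (penalty) terms, which is an assumption about the sign convention; `PolicyParameters` mirrors the constructor named above, but its body is illustrative:

```python
from dataclasses import dataclass

@dataclass
class PolicyParameters:
    A: float  # delay weight
    B: float  # cost weight
    C: float  # QoS weight
    D: float  # energy weight

def global_reward(p: PolicyParameters, delay, cost, qos, energy) -> float:
    """Equation 12: R(s, s') = A*Delay + B*Cost + C*QoS + D*Energy."""
    return p.A * delay + p.B * cost + p.C * qos + p.D * energy

# Calibrated CRC Common Use values from the table above
crc = PolicyParameters(A=0.5, B=0.4, C=0.0, D=0.1)
r = global_reward(crc, delay=-2.0, cost=-1.0, qos=0.8, energy=-0.5)
```

Note that with C = 0.0 the QoS term vanishes entirely, which is exactly the Common Use behaviour: no airline receives preferential service.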


6. IS Platform — Decentralised Negotiation

The thesis proposes that resource allocation in complex environments requires a negotiation mechanism between agent-level proposals and enterprise-level constraints, avoiding both full centralisation (single point of failure) and full decentralisation (global incoherence). The IS Platform implements this via three sub-components:

| Sub-component | Thesis function | Implementation parameters |
| --- | --- | --- |
| ERP | Cost and capacity constraints | `max_cost`, `max_delay`, `production_capacity` |
| CRM | QoS and client priority | `min_qos_threshold`, `client_priority` |
| Expert System | Minimum acceptable reward floor | `min_reward_threshold` |
When an agent raises a NegotiationProposal, ISPlatform.evaluate(proposal, step) returns a NegotiationResult with one of three outcomes: APPROVED, REJECTED, or COUNTER_PROPOSAL with updated PolicyParameters. Accepted proposals may trigger a GlobalPolicy update propagated to all product agents, realising the dynamic collective adaptation described in the thesis.
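The gated evaluation can be sketched as follows. The threshold values and the `Proposal` fields here are illustrative defaults, not the repo's actual parameters:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Outcome(Enum):
    APPROVED = auto()
    REJECTED = auto()
    COUNTER_PROPOSAL = auto()

@dataclass
class Proposal:
    expected_cost: float
    expected_delay: float
    expected_reward: float

def evaluate(p: Proposal, max_cost=100.0, max_delay=10.0, min_reward=-2.0):
    """Sketch: ERP constraint checks first, then the Expert System floor."""
    if p.expected_cost > max_cost or p.expected_delay > max_delay:
        return Outcome.REJECTED
    if p.expected_reward < min_reward:
        # Borderline proposals get a counter-offer rather than a rejection
        return Outcome.COUNTER_PROPOSAL
    return Outcome.APPROVED

result = evaluate(Proposal(expected_cost=40.0, expected_delay=3.0, expected_reward=1.2))
```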


7. State Space — ReadPoints and BusinessSteps (Equation 11)

The thesis derives the system state space cardinality via Equation 11 using the product of ReadPoints (physical locations with RFID readers) and BusinessSteps (discrete operational phases). The airport scenario in MAS-DUO reproduces this exactly:

ReadPoints (RP):  { HANGAR, PARK1, PARK2, PARK3, PARK4 }  ->  5 RPs
Resource BusinessSteps (BS):  { Free, Busy, InTransit, NotAvailable }  ->  4 BS

|S_resource| = 5 × 4 = 20 resource states   (Equation 11)

IATA Flight BusinessSteps:  { OMS, STM, LOD, PAX, BAG, HDL, AGM, CGM, ... }  ->  8+ BS
|S_flight|  ≈  4 stands × 10 BS = 40 states

The grid world (20 × 15 cells at 5 m/cell, covering a 100 × 75 m apron) spatially realises these ReadPoints as zones, with A* pathfinding for resource navigation between them.
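The Equation 11 cardinality can be verified by enumerating the product set directly, using the ReadPoints and resource BusinessSteps listed above:

```python
from itertools import product

read_points = ["HANGAR", "PARK1", "PARK2", "PARK3", "PARK4"]
business_steps = ["Free", "Busy", "InTransit", "NotAvailable"]

# Equation 11: |S_resource| = |RP| x |BS|, enumerated explicitly
resource_states = list(product(read_points, business_steps))
```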


8. Airport Case Study — Ciudad Real Central Airport

Chapter 4.1 of the thesis presents CRC Airport as the primary real-world validation scenario. MAS-DUO ships two executable configurations that directly reproduce this case study:

airport_gh_check.py — validation script for 4 concurrent flights (B737, A320, LCC, CARGO) with 21 agents total. Runs 10 random-policy steps to verify the full system stack: configuration loading, 4W state consistency, IS Platform negotiations, and order tracking.

airport_gh_demo.py — full demonstration with 8 concurrent flights and a Greedy EDF (Earliest Deadline First) policy. Fleet composition:

  • 12 GH robots: 3 pushback tractors, 3 hydraulic stairs, 2 fuel trucks, 2 ULD cargo loaders, 2 passenger buses
  • 7 workers: 5 ramp agents, 1 operations supervisor, 1 ground crew chief
  • 4 baggage belts (BAGBELT-P1 to P4, gate → terminal)

The Greedy EDF policy applies a priority-weighted urgency score — urgency = 1/max(1, deadline − step) × priority_boost — to replicate the heuristic scheduling baseline described in the thesis, against which more sophisticated RL policies can be benchmarked.
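The urgency score as given translates directly to code. The flight deadlines below are hypothetical, chosen only to show the resulting ordering:

```python
def urgency(deadline: int, step: int, priority_boost: float = 1.0) -> float:
    """Greedy EDF score: nearer deadlines score higher, scaled by priority."""
    return 1.0 / max(1, deadline - step) * priority_boost

# Hypothetical deadlines (in simulation steps); nearest deadline is served first
flights = {"B737": 12, "CARGO": 25, "A320": 8}
ranked = sorted(flights, key=lambda f: urgency(flights[f], step=5), reverse=True)
```

The `max(1, ...)` guard keeps the score finite for flights at or past their deadline, which then all tie at the maximum base urgency of 1.0.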

Validated results (demo run, seed=42):

  • 4/8 flights completed on time within 30 steps
  • Cumulative reward: +287.07
  • Total energy consumed: 831 units
  • IS Platform approval rate: ~87%
  • No agent collision events

9. Observation Space Alignment

The normalised observation vectors fed to each agent type correspond to the state variables identified in the thesis for each agent class:

| Agent type | Vector shape | Components |
| --- | --- | --- |
| `ProductAgent` | (6,) float32 [0,1] | Normalised position (x, y), route progress, deadline proximity, energy, processing flag |
| `WorkerAgent` | (6,) float32 [0,1] | Position, energy, fatigue level, zone index, previous action |
| `RobotAgent` | (5,) float32 [0,1] | Position, battery level, load status, speed |
| `ConveyorAgent` | (5,) float32 [0,1] | Operational state, occupancy, speed setting, energy consumption |

These compact representations are designed for compatibility with standard deep RL algorithms (PPO, SAC, QMIX) via the PettingZoo AEC interface, enabling the thesis results to be reproduced and extended with learned policies.
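As a sketch of how one such vector might be assembled, the function below builds the six `ProductAgent` components; only the component list follows the table, while the function name and exact normalisation are assumptions:

```python
def product_observation(x, y, grid_w, grid_h, route_done, route_total,
                        steps_left, horizon, energy, max_energy, processing):
    """Build the 6-component normalised ProductAgent vector (sketch)."""
    return [
        x / (grid_w - 1),            # normalised position x
        y / (grid_h - 1),            # normalised position y
        route_done / route_total,    # route progress
        1.0 - steps_left / horizon,  # deadline proximity (1.0 = deadline now)
        energy / max_energy,         # remaining energy
        1.0 if processing else 0.0,  # processing flag
    ]

obs = product_observation(10, 7, 20, 15, 2, 4, 15, 30, 80, 100, False)
```

Keeping every component in [0, 1] means the same network input layer works unchanged across scenarios of different grid sizes and horizons.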


10. Framework and Technology Choices

| Thesis assumption | Implementation choice | Rationale |
| --- | --- | --- |
| JADE/JADEX (original prototype) | PettingZoo AEC + Python | Modern RL ecosystem; enables direct connection to PyTorch/TensorFlow |
| EPCIS middleware (Fosstrak) | `epc.py` (pure Python EPC model) | Self-contained; removes JVM dependency while preserving the data model |
| Simulation-based validation | Pygame renderer + headless mode | Reproducible experiments; visual inspection of emergent behaviours |
| JSON scenario definition | `config_loader.py` + Pydantic-style configs | Declarative scenario authoring; separates model from configuration |

The transition from JADE to PettingZoo preserves the essential architectural invariants of the thesis (BDI loop, 4W state, IS negotiation, global policy) while opening the system to the contemporary multi-agent reinforcement learning research community.