ACTION SPACE REDUCTION IN REINFORCEMENT LEARNING USING THE MAXIMUM WIDTH OF THE OPERATION GRAPH FOR FLEXIBLE SCHEDULING PROBLEMS

Authors

Keywords:

artificial intelligence, reinforcement learning, flexible job shop scheduling, action space, neural combinatorial optimization, resource allocation

Abstract

The paper addresses approach of action space reduction in reinforcement learning for flexible scheduling. The relevance of the study is determined by the large number of operation-machine alternatives in scheduling environments, where only a limited subset of actions is feasible at each decision step. This creates redundant action evaluation and increases inference computational cost. The objective of the paper is to evaluate whether feasibility-aware action processing can improve the quality and inference time trade-off of reinforcement learning policies for scheduling. Three variants are compared: a three-branch MLP baseline (BR), a branch-aware scheduling policy with candidate action scoring (BASP), and BASP Sparse with feasibility-aware reduced action-space evaluation. The experiments are conducted on Brandimarte and Behnke benchmark instances. Schedule quality is measured by relative makespan reduction with respect to BR, while inference efficiency is evaluated by inference-time ratios. The results show that BASP improves makespan but increases inference time compared with BR. BASP Sparse preserves most of the quality gain while significantly improving inference efficiency. On Brandimarte groups, BASP Sparse achieves up to 5.95% makespan reduction and up to 2.21 speedup relative to BASP. On the larger Behnke group, it achieves 28.21% makespan reduction and 3.18 speedup relative to BASP. The results indicate that proposed action space reduction approach becomes especially beneficial for larger scheduling instances.

References

Dauzère-Pérès S. The flexible job shop scheduling problem: A review / S. Dauzère-Pérès, E. Gurevsky, J.-B. Lasserre, Y. Mati, C. Savey // European Journal of Operational Research. – 2024. – Т. 314, № 2. – С. 409–432. – DOI: https://doi.org/10.1016/j.ejor.2023.05.017.

Bello I. Neural Combinatorial Optimization with Reinforcement Learning / I. Bello, H. Pham, Q. V. Le, M. Norouzi, S. Bengio. – arXiv, 2016. – URL: https://arxiv.org/abs/1611.09940. – DOI: https://doi.org/10.48550/arXiv.1611.09940.

Echeverria I. Solving the flexible job-shop scheduling problem through an enhanced deep reinforcement learning approach / I. Echeverria, M. Murua, R. Santana. – arXiv, 2023. – URL: https://arxiv.org/abs/2310.15706. – DOI: https://doi.org/10.48550/arXiv.2310.15706.

Lassoued S. Policy-Based Reinforcement Learning with Action Masking for Dynamic Job Shop Scheduling under Uncertainty: Handling Random Arrivals and Machine Failures / S. Lassoued, S. Lier, A. Schwung. – arXiv, 2026. – URL: https://arxiv.org/abs/2601.09293. – DOI https://doi.org/10.48550/arXiv.2601.09293.

He Q. On the degree of parallelism for parallel real-time tasks / Q. He, N. Guan, Z. Jiang, M. Lv // Journal of Systems Architecture. – 2024. – Т. 156. – DOI: https://doi.org/10.1016/j.sysarc.2024.103286.

Brandimarte P. Routing and scheduling in a flexible job shop by tabu search / P. Brandimarte // Annals of Operations Research. – 1993. – Т. 41. – С. 157–183. – DOI: https://doi.org/10.1007/BF02023073.

Behnke D. Test instances for the flexible job shop scheduling problem with work centers / D. Behnke, M. J. Geiger. – Hamburg : Helmut-Schmidt-Universität, 2012. – 32 с. – URL: https://d-nb.info/1023241773/34.

Published

2026-05-08

Issue

Section

Plenary Section