Adaptive Knowledge Aggregation in Asynchronous Reinforcement Learning

Title Adaptive Knowledge Aggregation in Asynchronous Reinforcement Learning
Summary Develop an adaptive method to combine knowledge from multiple reinforcement learning agents more efficiently in asynchronous setups.
Keywords Asynchronous Reinforcement Learning, Knowledge Aggregation, Adaptive Aggregation, Sample Efficiency, Robustness
TimeFrame Fall 2025
References Mnih, V. et al., Asynchronous Methods for Deep Reinforcement Learning. (2016)

Shen, H. et al., Towards Understanding Asynchronous Advantage Actor–Critic: Convergence and Linear Speedup. (2020)

Ma, J. et al., FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting. (2024)

Wu, Y. et al., Uncertainty Weighted Actor–Critic for Offline Reinforcement Learning. (2021)

Kumar, A. et al., Adaptive aggregation for RL in average reward MDPs. (2012)

Littlestone, N. & Warmuth, M., The Weighted Majority Algorithm. (1994)

Prerequisites Deep Reinforcement Learning course. Good knowledge of Machine Learning and preferably Reinforcement Learning. Good programming skills, required for the implementation of the investigated methods.
Author
Supervisor Alexander Galozy
Level Master
Status Open


Reinforcement learning (RL) can achieve impressive results but often requires long training times and substantial computational resources. Asynchronous reinforcement learning (AsyncRL) improves efficiency by running many actors in parallel, yet current methods aggregate updates with simple strategies such as parameter averaging or replay sharing, which fail to account for the quality, staleness, or diversity of individual contributions. This project will investigate how to design better aggregation mechanisms by implementing and evaluating several baselines (parameter averaging, replay sharing, ensemble voting) and comparing them to a novel adaptive aggregator that weights actor updates according to their confidence and freshness. Using standard benchmarks such as CartPole, LunarLander, and Pong, the project aims to determine whether adaptive aggregation improves learning speed, robustness, and communication efficiency in AsyncRL.
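To make the idea concrete, the sketch below shows one possible way such an adaptive aggregator could weight actor updates: each actor's parameter update is scaled by a confidence score and an exponential staleness penalty before the contributions are averaged. This is a minimal illustration under assumed interfaces; the function name aggregate_updates, the choice of confidence measure, and the decay rate staleness_decay are placeholders rather than the project's final design.

```python
import numpy as np

def aggregate_updates(updates, confidences, steps, global_step,
                      staleness_decay=0.1):
    """Combine per-actor parameter updates into one weighted update.

    updates      : list of 1-D parameter-update vectors, one per actor
    confidences  : list of non-negative confidence scores per actor
                   (e.g. inverse value loss or ensemble agreement)
    steps        : global step at which each update was computed
    global_step  : current global step on the learner
    """
    updates = np.stack(updates)                        # (n_actors, n_params)
    conf = np.asarray(confidences, dtype=np.float64)
    staleness = global_step - np.asarray(steps, dtype=np.float64)

    # Downweight stale updates exponentially and scale by confidence.
    weights = conf * np.exp(-staleness_decay * staleness)
    weights = weights / weights.sum()                   # convex combination

    return weights @ updates                            # weighted average update


# Example: three actors; the third update is stale and low-confidence.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    updates = [rng.normal(size=4) for _ in range(3)]
    combined = aggregate_updates(updates,
                                 confidences=[0.9, 0.8, 0.2],
                                 steps=[100, 98, 60],
                                 global_step=100)
    print(combined)
```

Plain parameter averaging is recovered as a special case when all confidence scores are equal and staleness_decay is zero, which makes the baseline directly comparable to the adaptive scheme.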

Research Questions:

1. How do existing aggregation strategies perform in asynchronous reinforcement learning?

2. Can an adaptive aggregation method that accounts for update quality and staleness improve performance over standard methods?

3. What are the trade-offs between efficiency, stability, and communication cost for different aggregation approaches?

Expected Outcomes:

1. Working AsyncRL system with multiple aggregation strategies.

2. A novel adaptive aggregator that is systematically evaluated.

3. Thesis report with results, analysis, and discussion of trade-offs.

4. Open-source code for reproducibility.