Adaptive Knowledge Aggregation in Asynchronous Reinforcement Learning

Title Adaptive Knowledge Aggregation in Asynchronous Reinforcement Learning
Summary Develop an adaptive method to combine knowledge from multiple reinforcement learning agents more efficiently in asynchronous setups.
Keywords Asynchronous Reinforcement Learning, Knowledge Aggregation, Adaptive Aggregation, Sample Efficiency, Robustness
TimeFrame Fall 2025
References Mnih, V. et al., Asynchronous Methods for Deep Reinforcement Learning. (2016)

Shen, H. et al., Towards Understanding Asynchronous Advantage Actor–Critic: Convergence and Linear Speedup. (2020)

Ma, J. et al., FedStaleWeight: Buffered Asynchronous Federated Learning with Fair Aggregation via Staleness Reweighting. (2024)

Wu, Y. et al., Uncertainty Weighted Actor–Critic for Offline Reinforcement Learning. (2021)

Kumar, A. et al., Adaptive aggregation for RL in average reward MDPs. (2012)

Littlestone, N. & Warmuth, M., The Weighted Majority Algorithm. (1994)

Prerequisites Deep Reinforcement Learning course. Good knowledge of Machine Learning and preferably Reinforcement Learning. Good programming skills, required for the implementation of the investigated methods.
Author
Supervisor Alexander Galozy
Level Master
Status Open


Reinforcement learning (RL) can achieve impressive results but often requires long training times and substantial computational resources. Asynchronous reinforcement learning (AsyncRL) improves efficiency by running many actors in parallel, yet current methods aggregate updates with simple strategies such as parameter averaging or replay sharing, which fail to account for the quality, staleness, or diversity of individual contributions. This project will investigate how to design better aggregation mechanisms by implementing and evaluating several baselines (parameter averaging, replay sharing, ensemble voting) and comparing them to a novel adaptive aggregator that weights actor updates according to their confidence and freshness. Using standard benchmarks such as CartPole, LunarLander, and Pong, the project aims to determine whether adaptive aggregation improves learning speed, robustness, and communication efficiency in AsyncRL.
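To make the idea concrete, the sketch below shows one possible way such an adaptive aggregator could weight actor updates: each actor's parameter update is scaled by a confidence score and an exponential staleness penalty before the contributions are averaged. This is a minimal illustration under assumed interfaces; the function name aggregate_updates, the choice of confidence measure, and the decay rate staleness_decay are placeholders rather than the project's final design.

```python
import numpy as np

def aggregate_updates(updates, confidences, steps, global_step,
                      staleness_decay=0.1):
    """Combine per-actor parameter updates into one weighted update.

    updates      : list of 1-D parameter-update vectors, one per actor
    confidences  : list of non-negative confidence scores per actor
                   (e.g. inverse value loss or ensemble agreement)
    steps        : global step at which each update was computed
    global_step  : current global step on the learner
    """
    updates = np.stack(updates)                        # (n_actors, n_params)
    conf = np.asarray(confidences, dtype=np.float64)
    staleness = global_step - np.asarray(steps, dtype=np.float64)

    # Downweight stale updates exponentially and scale by confidence.
    weights = conf * np.exp(-staleness_decay * staleness)
    weights = weights / weights.sum()                   # convex combination

    return weights @ updates                            # weighted average update


# Example: three actors; the third update is stale and low-confidence.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    updates = [rng.normal(size=4) for _ in range(3)]
    combined = aggregate_updates(updates,
                                 confidences=[0.9, 0.8, 0.2],
                                 steps=[100, 98, 60],
                                 global_step=100)
    print(combined)
```

Plain parameter averaging is recovered as a special case when all confidence scores are equal and staleness_decay is zero, which makes the baseline directly comparable to the adaptive scheme.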

Research Questions:

1. How do existing aggregation strategies perform in asynchronous reinforcement learning?

2. Can an adaptive aggregation method that accounts for update quality and staleness improve performance over standard methods?

3. What are the trade-offs between efficiency, stability, and communication cost for different aggregation approaches?

Expected Outcomes:

1. Working AsyncRL system with multiple aggregation strategies.

2. A novel adaptive aggregator that is systematically evaluated.

3. Thesis report with results, analysis, and discussion of trade-offs.

4. Open-source code for reproducibility.