A Meta-Learning Approach for Preserving and Transferring Beneficial Behaviors in Asynchronous Multi-Agent Reinforcement Learning
| Title | A Meta-Learning Approach for Preserving and Transferring Beneficial Behaviors in Asynchronous Multi-Agent Reinforcement Learning |
|---|---|
| Summary | Develop a meta-learning system that preserves beneficial behaviors discovered by individual agents and adapts them for transfer across a population in asynchronous reinforcement learning. |
| Keywords | Meta-Learning, Asynchronous Reinforcement Learning, Behavior Preservation, Adaptive Aggregation, Multi-Agent Learning |
| TimeFrame | |
| References | Mnih, V. et al. (2016). Asynchronous Methods for Deep Reinforcement Learning. ICML 2016.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017.
Gupta, A. et al. (2018). Meta-Reinforcement Learning of Structured Exploration Strategies. NeurIPS 2018.
Qi, P. (2024). Model Aggregation Techniques in Federated Learning: A Comprehensive Survey. Future Generation Computer Systems, 139, 1-15.
Wu, H. et al. (2024). Adaptive Multi-Agent Reinforcement Learning for Flexible Resource Management. Applied Energy, 374, 121-135.
OpenAI Gym: https://www.gymlibrary.ml/
Ray RLlib: https://docs.ray.io/en/latest/rllib.html
PyTorch: https://pytorch.org/ |
| Prerequisites | A deep learning course. Good programming skills in Python. Solid knowledge of machine learning; familiarity with meta-learning, reinforcement learning, and OpenAI Gym is preferred. |
| Author | |
| Supervisor | Alexander Galozy |
| Level | Master |
| Status | Open |
Asynchronous reinforcement learning (AsyncRL) allows multiple agents to explore environments in parallel, but standard aggregation methods risk diluting rare yet beneficial behaviors discovered by individual actors. This project proposes a meta-learning approach in which a meta-model dynamically adapts how much each actor’s updates influence the global policy, based on the actor’s state, the novelty of its experience, and its potential usefulness to other agents. The meta-model aims to preserve advantageous behaviors before they are averaged away while evaluating their transferability, improving learning efficiency, stability, and knowledge propagation across the agent population. Experiments will be conducted in benchmark environments such as CartPole, LunarLander, and Pong, comparing the meta-learning approach to standard aggregation baselines. The project investigates how population-level adaptive weighting can balance exploration and exploitation, effectively generalizing classical single-agent meta-RL approaches such as First-Explore to multi-agent asynchronous learning. The outcomes include insights into behavior preservation, transferability, and scalable multi-agent learning.
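To make the adaptive aggregation idea more concrete, the sketch below shows one possible way a meta-model could weight per-actor updates before they are merged into the global policy. It is a minimal illustration in Python/PyTorch: the `AggregationMetaModel` class, the choice of per-actor features (episodic return and a novelty score), and the softmax weighting are assumptions made for this sketch, not the project's final design.

```python
# Minimal sketch of meta-learned adaptive aggregation of actor updates.
# Everything here is illustrative: the meta-model architecture, the feature
# choices (return, novelty), and the weighting scheme are assumptions made
# for this sketch only.

import torch
import torch.nn as nn


class AggregationMetaModel(nn.Module):
    """Scores each actor's update from simple summary features.

    Hypothetical per-actor features: recent episodic return and a novelty
    estimate of the visited states. The output is an unnormalized importance
    score; a softmax over actors yields the aggregation weights.
    """

    def __init__(self, num_features: int = 2, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, actor_features: torch.Tensor) -> torch.Tensor:
        # actor_features: (num_actors, num_features) -> weights: (num_actors,)
        scores = self.net(actor_features).squeeze(-1)
        return torch.softmax(scores, dim=0)


def aggregate_updates(global_params, actor_updates, weights):
    """Weighted aggregation of per-actor parameter updates.

    With uniform weights this reduces to a standard averaging baseline;
    meta-learned weights can instead up-weight actors whose behavior looks
    rare but useful.
    """
    new_params = []
    for layer_idx, p in enumerate(global_params):
        delta = sum(w * upd[layer_idx] for w, upd in zip(weights, actor_updates))
        new_params.append(p + delta)
    return new_params


if __name__ == "__main__":
    num_actors, num_layers = 4, 3
    global_params = [torch.zeros(5) for _ in range(num_layers)]
    actor_updates = [[torch.randn(5) * 0.01 for _ in range(num_layers)]
                     for _ in range(num_actors)]
    # Hypothetical per-actor features: [episodic return, novelty score].
    features = torch.tensor([[200.0, 0.1], [35.0, 0.9], [180.0, 0.2], [40.0, 0.8]])
    meta = AggregationMetaModel()
    weights = meta(features)
    updated = aggregate_updates(global_params, actor_updates, weights)
    print("aggregation weights:", weights.detach().numpy())
```

Setting all weights to 1/num_actors recovers the uniform-averaging baseline that the meta-learned weighting would be compared against in the experiments.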
Research Questions:
How can a meta-model preserve beneficial behaviors discovered by individual agents in asynchronous RL?
How can the meta-model evaluate and transfer these behaviors to improve other agents’ learning?
Does meta-learning-based adaptive aggregation improve population-level learning efficiency, stability, and sample efficiency compared to standard methods?
Outcomes:
A functional meta-model for adaptive aggregation of actor updates, preserving rare but beneficial behaviors in the global policy and transferring useful behaviors across agents
Benchmark evaluation on standard RL environments (CartPole, LunarLander, Pong)
Metrics: return, sample efficiency, stability, behavior retention, and transfer success (one possible operationalization is sketched after this list)
Open-source, reproducible codebase
Complete thesis documentation with methodology, experiments, and analysis
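The sketch below gives one hypothetical operationalization of the evaluation metrics listed above. The concrete definitions (threshold choices, evaluation windows, and what counts as a "behavior") are assumptions for illustration; the thesis may define them differently.

```python
# Hypothetical operationalizations of the evaluation metrics; the project
# may define these differently.
import numpy as np


def mean_return(episode_returns):
    """Average episodic return over an evaluation window."""
    return float(np.mean(episode_returns))


def sample_efficiency(returns_by_step, target_return):
    """Environment steps needed to first reach a target return (inf if never)."""
    for step, ret in returns_by_step:
        if ret >= target_return:
            return step
    return float("inf")


def stability(episode_returns):
    """Variance of returns over the window; lower means more stable learning."""
    return float(np.var(episode_returns))


def behavior_retention(pre_scores, post_scores, threshold):
    """Fraction of behaviors scoring above threshold before aggregation
    that still score above it afterwards."""
    kept = [post >= threshold
            for pre, post in zip(pre_scores, post_scores) if pre >= threshold]
    return sum(kept) / max(len(kept), 1)


def transfer_success(return_before, return_after):
    """Relative improvement of a recipient agent after a behavior transfer."""
    return (return_after - return_before) / max(abs(return_before), 1e-8)
```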