A Meta-Learning Approach for Preserving and Transferring Beneficial Behaviors in Asynchronous Multi-Agent Reinforcement Learning
| Title | A Meta-Learning Approach for Preserving and Transferring Beneficial Behaviors in Asynchronous Multi-Agent Reinforcement Learning |
|---|---|
| Summary | Develop a meta-learning system that preserves beneficial behaviors discovered by individual agents and adapts them for transfer across a population in asynchronous reinforcement learning. |
| Keywords | Meta-Learning, Asynchronous Reinforcement Learning, Behavior Preservation, Adaptive Aggregation, Multi-Agent Learning |
| TimeFrame | |
| References | Mnih, V. et al. (2016). Asynchronous Methods for Deep Reinforcement Learning. ICML 2016.
Finn, C., Abbeel, P., & Levine, S. (2017). Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. ICML 2017.
Gupta, A. et al. (2018). Meta-Reinforcement Learning of Structured Exploration Strategies. NeurIPS 2018.
Qi, P. (2024). Model Aggregation Techniques in Federated Learning: A Comprehensive Survey. Future Generation Computer Systems, 139, 1-15.
Wu, H. et al. (2024). Adaptive Multi-Agent Reinforcement Learning for Flexible Resource Management. Applied Energy, 374, 121-135.
OpenAI Gym: https://www.gymlibrary.ml/
Ray RLlib: https://docs.ray.io/en/latest/rllib.html
PyTorch: https://pytorch.org/ |
| Prerequisites | A deep learning course. Good programming skills in Python. Solid knowledge of machine learning; familiarity with meta-learning, reinforcement learning, and OpenAI Gym is preferred. |
| Author | |
| Supervisor | Alexander Galozy |
| Level | Master |
| Status | Open |
Asynchronous reinforcement learning (AsyncRL) allows multiple agents to explore environments in parallel, but standard aggregation methods risk diluting rare yet beneficial behaviors discovered by individual actors. This project proposes a meta-learning approach in which a meta-model dynamically adapts how much each actor’s updates influence the global policy, based on the actor’s state, the novelty of its experience, and its potential usefulness to other agents. The meta-model aims to preserve advantageous behaviors before they are averaged away while evaluating their transferability, improving learning efficiency, stability, and knowledge propagation across the agent population. Experiments will be conducted in benchmark environments such as CartPole, LunarLander, and Pong, comparing the meta-learning approach to standard aggregation baselines. The project investigates how population-level adaptive weighting can balance exploration and exploitation, effectively generalizing classical single-agent meta-RL approaches such as First-Explore to multi-agent asynchronous learning. The outcomes include insights into behavior preservation, transferability, and scalable multi-agent learning.
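To make the adaptive aggregation idea more concrete, the sketch below shows one possible way a meta-model could weight per-actor updates before they are merged into the global policy. It is a minimal illustration in Python/PyTorch: the `AggregationMetaModel` class, the choice of per-actor features (episodic return and a novelty score), and the softmax weighting are assumptions made for this sketch, not the project's final design.

```python
# Minimal sketch of meta-learned adaptive aggregation of actor updates.
# Everything here is illustrative: the meta-model architecture, the feature
# choices (return, novelty), and the weighting scheme are assumptions made
# for this sketch only.

import torch
import torch.nn as nn


class AggregationMetaModel(nn.Module):
    """Scores each actor's update from simple summary features.

    Hypothetical per-actor features: recent episodic return and a novelty
    estimate of the visited states. The output is an unnormalized importance
    score; a softmax over actors yields the aggregation weights.
    """

    def __init__(self, num_features: int = 2, hidden: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, actor_features: torch.Tensor) -> torch.Tensor:
        # actor_features: (num_actors, num_features) -> weights: (num_actors,)
        scores = self.net(actor_features).squeeze(-1)
        return torch.softmax(scores, dim=0)


def aggregate_updates(global_params, actor_updates, weights):
    """Weighted aggregation of per-actor parameter updates.

    With uniform weights this reduces to a standard averaging baseline;
    meta-learned weights can instead up-weight actors whose behavior looks
    rare but useful.
    """
    new_params = []
    for layer_idx, p in enumerate(global_params):
        delta = sum(w * upd[layer_idx] for w, upd in zip(weights, actor_updates))
        new_params.append(p + delta)
    return new_params


if __name__ == "__main__":
    num_actors, num_layers = 4, 3
    global_params = [torch.zeros(5) for _ in range(num_layers)]
    actor_updates = [[torch.randn(5) * 0.01 for _ in range(num_layers)]
                     for _ in range(num_actors)]
    # Hypothetical per-actor features: [episodic return, novelty score].
    features = torch.tensor([[200.0, 0.1], [35.0, 0.9], [180.0, 0.2], [40.0, 0.8]])
    meta = AggregationMetaModel()
    weights = meta(features)
    updated = aggregate_updates(global_params, actor_updates, weights)
    print("aggregation weights:", weights.detach().numpy())
```

Setting all weights to 1/num_actors recovers the uniform-averaging baseline that the meta-learned weighting would be compared against in the experiments.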
Research Questions:
How can a meta-model preserve beneficial behaviors discovered by individual agents in asynchronous RL?
How can the meta-model evaluate and transfer these behaviors to improve other agents’ learning?
Does meta-learning-based adaptive aggregation improve population-level learning efficiency, stability, and sample efficiency compared to standard methods?
Outcomes:
A functional meta-model for adaptive aggregation of actor updates, preserving rare but beneficial behaviors in the global policy and transferring useful behaviors across agents
Benchmark evaluation on standard RL environments (CartPole, LunarLander, Pong)
Metrics: return, sample efficiency, stability, behavior retention, and transfer success (one possible operationalization is sketched after this list)
Open-source, reproducible codebase
Complete thesis documentation with methodology, experiments, and analysis
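The sketch below gives one hypothetical operationalization of the evaluation metrics listed above. The concrete definitions (threshold choices, evaluation windows, and what counts as a "behavior") are assumptions for illustration; the thesis may define them differently.

```python
# Hypothetical operationalizations of the evaluation metrics; the project
# may define these differently.
import numpy as np


def mean_return(episode_returns):
    """Average episodic return over an evaluation window."""
    return float(np.mean(episode_returns))


def sample_efficiency(returns_by_step, target_return):
    """Environment steps needed to first reach a target return (inf if never)."""
    for step, ret in returns_by_step:
        if ret >= target_return:
            return step
    return float("inf")


def stability(episode_returns):
    """Variance of returns over the window; lower means more stable learning."""
    return float(np.var(episode_returns))


def behavior_retention(pre_scores, post_scores, threshold):
    """Fraction of behaviors scoring above threshold before aggregation
    that still score above it afterwards."""
    kept = [post >= threshold
            for pre, post in zip(pre_scores, post_scores) if pre >= threshold]
    return sum(kept) / max(len(kept), 1)


def transfer_success(return_before, return_after):
    """Relative improvement of a recipient agent after a behavior transfer."""
    return (return_after - return_before) / max(abs(return_before), 1e-8)
```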