Montaser Mohammedalamen

PhD student, University of Alberta

I am a PhD student in the Department of Computing Science at the University of Alberta, Canada, advised by Prof. Michael Bowling. My research focuses on how AI systems learn when feedback is not always observable. How do we design agents that learn to avoid harming humans or other agents, or damaging the environment or themselves, without human supervision or prior knowledge?

Before starting my PhD, I was an AI engineer at SonyAI for three years, where I designed a multi-agent, highly dynamic environment in a physics simulator, trained robust agents cooperatively and competitively with self-play and goal-conditioned RL, transferred learned policies to the real world, and integrated them with vision systems and robot control methods.

Work Experience:

AI Engineer, SonyAI
Designed a multi-agent, highly dynamic environment in a physics simulator; trained robust agents cooperatively and competitively with self-play and goal-conditioned RL; transferred learned policies to the real world and integrated them with vision systems and robot control methods. Direct manager: Peter Duerr.
Tokyo, Japan
June 2020 - October 2022 (remotely June 2020 - October 2020)
Research Associate, University of Alberta
Developed an algorithm that automatically behaves cautiously in novel circumstances using robust optimization. Supervised by Prof. Michael Bowling.
Edmonton, Canada
February 2020 - December 2020
Machine Learning Intern, Sony
Developed reinforcement learning algorithms for skill acquisition via imitation learning in cooking robots with visual feedback. Direct manager: Michael Spranger.
Tokyo, Japan
July 2019 - January 2020

Publications:

Learning to Be Cautious Using Counterfactual Regret Minimization
Montaser Mohammedalamen, Dustin Morrill, Alexander Sieusahai, Yash Satsangi, and Michael Bowling
Under review, 2025
arXiv | code | bibtex | Talk
Generalization in Monitored Markov Decision Processes
Montaser Mohammedalamen and Michael Bowling
Under review, 2025
arXiv | code | bibtex
Monitored Markov Decision Processes
Simone Parisi, Montaser Mohammedalamen, Alireza Kazemipour, Matthew E. Taylor, and Michael Bowling
International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Auckland, New Zealand, 2024
arXiv | code | bibtex | Talk
Transfer Learning for Prosthetics Using Imitation Learning
D. Khamies, Montaser Mohammedalamen, and Benjamin Rosman
Black in AI workshop, NeurIPS, 2018
arXiv | code | bibtex

Education:

PhD student
Department of Computing Science, University of Alberta, Canada
January 2022 - September 2027 (expected)
M.Sc. Machine Intelligence
African Master in Machine Intelligence (AMMI), African Institute for Mathematical Sciences (AIMS), Rwanda
September 2018 - September 2019
B.Sc. (Honors), Electronics & Computer Engineering
First Class Honors, University of Khartoum, Sudan (ranked 3rd in the Electronics department, CGPA 7.5/10)
August 2013 - September 2018

Scholarships & Awards:

African Master in Machine Intelligence (AMMI)
African Institute for Mathematical Sciences (AIMS), Rwanda.
4% acceptance rate Africa-wide
15,000 USD
August 2018 - September 2019

Projects:

Wheelchair Robot Controlled by Brain Signals
Developed a wheelchair robot that helps disabled people move using EEG signals from the brain.
GitHub repo
NeurIPS2018 Challenge: RL for Prosthetics
Built an RL prosthetics controller that learns to walk like a human and uses imitation learning to accelerate learning.
GitHub repo
Monitored Markov Decision Processes
RL assumes that the reward is observable to the agent at all times, but this assumption often does not hold in real-world problems. Mon-MDPs provide a framework for this setting and discuss its theoretical and practical consequences.
GitHub repo
Learning to be Cautious
An algorithm that automatically behaves cautiously and safely in novel circumstances, without prior knowledge or human supervision.
GitHub repo
Benchmark ChainerRL library in OpenAI Gym Environments
Benchmarking RL algorithms: Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO).
GitHub repo