Thesis Blog 2: A Controlled Study of Risk Aversion in Multi-Agent Reinforcement Learning for Trading

2026-05-24

Blog

AI / HFT / RL / Quant

My thesis studies how risk aversion affects the behaviour of learning agents in a financial market simulation. The setting is a limit order book, where orders to buy and sell are submitted, cancelled, and matched continuously. I focus on two common trading tasks: market making and order execution.

A market-making agent provides liquidity by posting bid and ask quotes. Its main risk is accumulating a large inventory. An order-execution agent tries to buy or sell a target quantity over a fixed time horizon. Its main risk is leaving too much of the order unfinished. These two agents interact in the same simulated market, so the behaviour of one agent can affect the performance of the other.

The thesis uses JaxMARL-HFT, a GPU-accelerated multi-agent reinforcement learning environment for high-frequency trading. In this setup, a market-making agent and an execution agent are trained concurrently with independent PPO: each agent optimises its own policy and treats the other agent as part of the environment. Risk aversion is added through quadratic penalties in the reward: a squared-inventory penalty for the market maker and a squared remaining-quantity penalty for the execution agent. The strength of these penalties is controlled by two coefficients, \( \rho_{MM} \) and \( \rho_{EX} \).
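To make the penalty structure concrete, here is a minimal sketch of the two shaped rewards. It is an illustration rather than the JaxMARL-HFT implementation. The function names, the base reward terms, and the choice to apply the penalty at every step are all assumptions made for this example.

```python
def market_maker_reward(pnl_change: float, inventory: float, rho_mm: float) -> float:
    """Step PnL minus a squared-inventory penalty (assumed form)."""
    return pnl_change - rho_mm * inventory ** 2


def execution_reward(base_reward: float, remaining_qty: float, rho_ex: float) -> float:
    """Base step reward minus a squared remaining-quantity penalty (assumed form)."""
    return base_reward - rho_ex * remaining_qty ** 2


# Hypothetical numbers: doubling the inventory quadruples the penalty,
# so large positions are discouraged much more strongly than small ones.
small = market_maker_reward(pnl_change=1.0, inventory=2.0, rho_mm=0.25)
large = market_maker_reward(pnl_change=1.0, inventory=4.0, rho_mm=0.25)
print(small, large)  # 0.0 -3.0
```

Setting either coefficient to zero recovers the risk-neutral reward, which gives the experiments a natural baseline.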

Progress so far is mainly on the research design: the research question, experimental setup, evaluation metrics, and thesis scope are now defined. The planned experiments will vary the two risk-aversion coefficients and compare the resulting policies on portfolio value, inventory exposure, slippage, unfinished quantity, and trading activity. They will also measure cross-play performance, where agents trained under different risk levels are evaluated against each other.
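One way the sweep and the cross-play evaluation could be organised is sketched below. The coefficient values are placeholders, not the grid used in the thesis, and the helper names are hypothetical.

```python
from itertools import product

# Placeholder coefficient values; the actual grid is not fixed in this post.
RHO_MM_VALUES = [0.0, 0.01, 0.1]
RHO_EX_VALUES = [0.0, 0.01, 0.1]


def training_grid(rho_mm_values, rho_ex_values):
    """Return one training configuration per (rho_mm, rho_ex) pair."""
    return [
        {"rho_mm": rho_mm, "rho_ex": rho_ex}
        for rho_mm, rho_ex in product(rho_mm_values, rho_ex_values)
    ]


def cross_play_pairs(runs):
    """Pair the market maker from run i with the execution agent from run j.

    Pairs with i == j are skipped: both agents then come from the same
    training run, which is the ordinary (non-cross-play) evaluation.
    """
    indices = range(len(runs))
    return [(i, j) for i, j in product(indices, repeat=2) if i != j]


runs = training_grid(RHO_MM_VALUES, RHO_EX_VALUES)
pairs = cross_play_pairs(runs)
print(len(runs), "training runs,", len(pairs), "cross-play pairings")
```

The number of cross-play pairings grows quadratically with the number of training runs, so a small grid keeps the evaluation budget manageable.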

The goal is not to build a new reinforcement-learning algorithm, but to test a specific scientific question: whether simple quadratic risk penalties lead to more stable and risk-aware trading behaviour, or whether they mainly change activity levels, such as making the market maker trade less or making the execution agent trade more aggressively. This makes the project a controlled empirical study of how reward design affects learned behaviour in a multi-agent financial environment.

https://fuwari.vercel.app/posts/thesis-2/