1. Damien Berriaud (ETH Zurich)
Title: To Spend or to Gain: Online Learning in Repeated Karma Auctions
Recent years have seen a surge of artificial currency-based mechanisms in contexts where monetary instruments are deemed unfair or inappropriate, e.g., in allocating food donations to food banks, course seats to students, and, more recently, even for traffic congestion management. Yet the applicability of these mechanisms remains limited in repeated auction settings, as it is challenging for users to learn how to bid with an artificial currency that has no value outside the auctions. Indeed, users must jointly learn the value of the currency in addition to how to spend it optimally. Moreover, in the prominent class of karma mechanisms, in which artificial karma payments are redistributed to users at each time step, users not only spend karma to obtain public resources but also gain karma for yielding them. For this novel class of karma auctions, we propose an adaptive karma pacing strategy that learns to bid optimally, and show that this strategy a) is asymptotically optimal for a single user bidding against competing bids drawn from a stationary distribution; b) leads to convergent learning dynamics when all users adopt it; and c) constitutes an approximate Nash equilibrium as the number of users grows. Our results require a novel analysis in comparison to adaptive pacing strategies in monetary auctions, since we depart from the classical assumption that the currency has known value outside the auctions, and consider that the currency is both spent and gained through the redistribution of payments.
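The abstract does not spell out the strategy itself, so the following is only a minimal sketch, under made-up assumptions, of what an adaptive pacing rule of this flavor could look like: a single user shades its bid by a multiplier that is nudged toward balancing karma spent against karma gained. The first-price format, the stationary redistribution stream, and all parameters are hypothetical stand-ins, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 10_000     # auction rounds
karma = 50.0   # initial karma balance (hypothetical)
mu = 1.0       # pacing multiplier: bid = value / (1 + mu)
eps = 0.01     # step size for the multiplier update

for t in range(T):
    value = rng.uniform()                  # private value for this round's resource
    competing_bid = rng.uniform(0.0, 0.8)  # stationary competing-bid distribution
    bid = min(karma, value / (1.0 + mu))   # paced bid, capped by karma on hand

    payment = bid if bid > competing_bid else 0.0  # win and pay own bid (first-price)
    gain = rng.uniform(0.0, 0.4)           # redistributed share of others' payments
                                           # (a stationary stand-in, not the paper's rule)
    karma = max(karma + gain - payment, 0.0)

    # Dual-style update: spending faster than gaining -> bid more cautiously (raise mu);
    # gaining faster than spending -> spend down the surplus (lower mu), since karma
    # has no value outside the auctions.
    mu = min(max(mu + eps * (payment - gain), 0.0), 10.0)

print(mu, karma)
```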
2. Yurong Chen (Inria Paris)
Title: Learning a Stackelberg Leader's Incentives from Optimal Commitments (with Xiaotie Deng, Jiarui Gan, and Yuhao Li)
Stackelberg equilibria, as functions of the players' payoffs, can inversely reveal information about the players' incentives. In this paper, we study to what extent one can learn about the leader's incentives by actively querying the leader's optimal commitments against strategically designed followers. We show that, by using polynomially many queries and operations, one can learn a payoff function that is strategically equivalent to the leader's, in the sense that: 1) it preserves the leader's preference over almost all strategy profiles; and 2) it preserves the set of all possible (strong) Stackelberg equilibria the leader may engage in, considering all possible follower types. As an application, we show that the information acquired by our algorithm is sufficient for a follower to induce the best possible Stackelberg equilibrium by imitating a different follower type. To the best of our knowledge, we are the first to demonstrate that this is possible without knowing the leader's payoffs beforehand.
3. Julian Chitiva (HEC Paris)
Title: Platform-Induced Coordination (with Penélope Hernández)
Social media has emerged as a crucial source of information, shaping public discourse and influencing decision-making. This paper focuses on how platform algorithms affect user behavior. We propose a sequential game between a Platform and privately informed users. Users rely on private information and platform-curated shared information to form expectations about the preferences of others before playing a coordination game. We characterize the users' equilibrium strategies, which are monotone in the platform-curated information. Finally, we characterize the Platform's strategy. In particular, when the Platform wants to bias users towards extreme behaviors, every user receives the same information, which fully separates users by type. Moreover, when the Platform's objective is milder, we characterize the networks that provide the same ex-ante utility to the Platform.
4. Ivan Conjeaud (Paris School of Economics)
Title: Recommender systems and efficient social learning
In this paper, I study a model of an online platform with a catalog of items whose qualities are unknown. Short-lived users arrive in sequence and browse the catalog, paying a small cost each time they examine a new alternative. The platform observes their behavior and uses it to infer the items' qualities so as to enhance future users' experience. I show that the platform can identify high-quality items if and only if a condition holds that links the cost of browsing to the thickness of the tail of the distribution from which item qualities are drawn. This condition captures the platform's ability to use the information gathered from previous users to incentivize exploration of new items. I pin down an easily implementable policy that guarantees efficient learning.
5. Julius Durmann (Technical University of Munich)
Title: Online Optimization Algorithms in Repeated Price Competition: Equilibrium Learning and Algorithmic Collusion (with Martin Bichler and Matthias Oberlechner)
We study whether uncoupled online learning algorithms in pricing competition converge to the Nash equilibrium or facilitate algorithmic collusion. Focusing on mean-based bandit algorithms (e.g., Exp3), we prove convergence to correlated rationalizable strategies, which align with the Nash equilibrium in many versions of Bertrand oligopolies. In additional simulations with bandit-feedback algorithms, we show that algorithmic collusion mainly arises with symmetric implementations of UCB or Q-learning, while other bandit algorithms typically lead to competitive outcomes. Collusion decreases with asymmetric algorithms and as market competition increases. These findings suggest that algorithmic collusion might be less of a concern unless sellers coordinate on specific collusive algorithms.
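As a toy illustration of the setting (not the paper's experiments), the sketch below runs two Exp3 learners, a canonical mean-based bandit algorithm, repeatedly choosing prices from a discrete grid in a symmetric Bertrand market with unit demand and zero cost; the grid, horizon, and exploration rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

prices = np.linspace(0.1, 1.0, 10)          # discrete price grid (hypothetical)
K, T, gamma = len(prices), 20_000, 0.05

def bertrand_profit(p_own, p_rival):
    """Symmetric Bertrand, unit demand, zero cost: the lowest price takes the market."""
    if p_own < p_rival:
        return p_own
    if p_own == p_rival:
        return p_own / 2.0
    return 0.0

weights = [np.ones(K), np.ones(K)]          # one Exp3 learner per seller

for t in range(T):
    probs, picks = [], []
    for w in weights:
        p = (1 - gamma) * w / w.sum() + gamma / K   # Exp3 mixed strategy
        probs.append(p)
        picks.append(rng.choice(K, p=p))
    for i in range(2):
        reward = bertrand_profit(prices[picks[i]], prices[picks[1 - i]])
        est = reward / probs[i][picks[i]]           # importance-weighted bandit feedback
        weights[i][picks[i]] *= np.exp(gamma * est / K)
        weights[i] /= weights[i].max()              # rescale; Exp3 is scale-invariant

print([prices[np.argmax(w)] for w in weights])      # modal prices after learning
```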
6. Miles Elvidge (Lancaster University)
Title: Simultaneously Perturbed Optimistic Gradient Methods for Payoff-Based Learning in Games
We examine the long-run behavior of learning in a repeated game where the agents operate in a low-information environment, only observing their own realized payoffs at each stage. We study this problem in the context of monotone games with unconstrained action spaces, where standard optimistic gradient schemes might lead to cycles of play, even with perfect gradient information. To account for the fact that only a \emph{single} payoff observation can be made at each iteration—and no gradient information is directly observable—we design and deploy a simultaneous perturbation gradient estimation method adapted to the challenges of the problem at hand, namely unbounded action spaces, gradients, and rewards. In contrast to single-timescale approaches, we find that a two-timescale approach is much more effective at controlling the (unbounded) noise introduced by payoff-based gradient estimators in this setting. Owing to the introduction of a second timescale, we show that the proposed simultaneously perturbed optimistic gradient (SPOG) algorithm converges to equilibrium with probability 1. In addition, by developing a new method to assess the rate of convergence of two-timescale stochastic approximation procedures, we show that the sequence of play induced by SPOG converges at a rate of $\tilde{\mathcal{O}}(n^{-2/3})$ in strongly monotone games. To the best of our knowledge, this is the first convergence rate result for games with unbounded action spaces.
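A minimal sketch of the two ingredients named above: a one-point simultaneous-perturbation gradient estimate built from a single payoff observation per player, and an optimistic (extrapolated) update whose step size decays on a faster timescale than the perturbation radius. The quadratic game, the clipping safeguard, and the step-size exponents are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical strongly monotone quadratic game:
# u_i(x) = -(x_i - theta_i)**2 - b * x_1 * x_2, unique Nash at (1.6, -2.4) here.
theta = np.array([1.0, -2.0])
b = 0.5

def payoff(i, x):
    return -(x[i] - theta[i]) ** 2 - b * x[0] * x[1]

x = np.zeros(2)       # unconstrained action profile
prev_g = np.zeros(2)  # last gradient estimate, reused by the optimistic step

for n in range(1, 200_000):
    gamma = 0.05 / n ** (2 / 3)   # fast timescale: action step size
    delta = 0.50 / n ** (1 / 8)   # slow timescale: perturbation radius
    signs = rng.choice([-1.0, 1.0], size=2)
    x_play = x + delta * signs    # each player perturbs its own action

    # One-point simultaneous-perturbation estimate: a single payoff observation per
    # player, no direct gradient feedback (noisy, unbiased to first order in delta).
    g = np.array([payoff(i, x_play) * signs[i] / delta for i in range(2)])
    g = np.clip(g, -100.0, 100.0)        # practical safeguard for this sketch

    x = x + gamma * (2.0 * g - prev_g)   # optimistic step via the past estimate
    prev_g = g

print(x)   # compare with the Nash equilibrium (1.6, -2.4)
```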
7. Gurubharan Ganeson (University of Oxford)
Title: Searching for Expertise
People trying to acquire information often lack information technologies of their own and must therefore search for experts, who can only convey cheap talk messages. To examine this phenomenon, I consider a model of costly search in which an uninformed receiver sequentially consults randomly drawn cheap-talking experts. Crucially, the pool of experts is heterogeneous, and the receiver cannot observe the sender's motives. The dynamic nature of search creates a dynamic inconsistency problem for the receiver, so the receiver searches weakly too much. Search makes it more likely that the receiver encounters a sender who is more aligned with him. But if no sender is very closely aligned with the receiver and senders' preferences are opposed, the receiver can benefit from being less likely to see a more aligned sender, as opposing senders can impose discipline on each other's communication. When search costs are low, the receiver succumbs to the temptation of excess search, which erodes the discipline that senders impose on each other. Applying this model to social media regulation generates a key insight: maximally informative social media algorithms must discriminate across users, by raising search frictions for users more aligned with trolls.
8. Giordano Giambartolomei (King's College London)
Title: IID Prophet Inequality with Random Horizon: Going Beyond Increasing Hazard Rates
Prophet inequalities are a central object of study in optimal stopping theory. In the iid model, a gambler sees values in an online fashion, sampled independently from a given distribution. Upon observing each value, the gambler either accepts it as a reward or irrevocably rejects it and proceeds to observe the next value. The goal of the gambler, who cannot see the future, is maximising the expected value of the reward while competing against the expectation of a prophet (the offline maximum). In other words, one seeks to maximise the gambler-to-prophet ratio of the expectations. This model has been studied with infinite, finite and unknown number of values. When the gambler faces a random number of values, the model is said to have random horizon. We consider the model in which the gambler is given \textit{a priori} knowledge of the horizon's distribution. Alijani et al. (2020) designed a single-threshold algorithm achieving a ratio of $1/2$ when the random horizon has an increasing hazard rate and is independent of the values. We prove that with a single threshold, a ratio of $1/2$ is actually achievable for several larger classes of horizon distributions, with the largest being known as the G class in reliability theory. Moreover, we show that this does not extend to its dual class (which includes the decreasing hazard rate class), while it can be extended to low-variance horizons. Finally, we construct the first example of a family of horizons, for which multiple thresholds are necessary to achieve a nonzero ratio. We establish that the Secretary Problem optimal stopping rule provides one such algorithm, paving the way towards the study of the model beyond single-threshold algorithms.
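For intuition only, here is a small simulation of the model with iid $U(0,1)$ values and a geometric horizon (constant hazard rate, a boundary case of the increasing-hazard-rate class), scanning a few single thresholds against the prophet benchmark; the stopping probability and thresholds below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def ratio(threshold, trials=20_000, p_stop=0.05):
    """Single-threshold gambler vs. prophet under a geometric random horizon."""
    gambler = prophet = 0.0
    for _ in range(trials):
        best, reward = 0.0, 0.0
        while True:
            v = rng.uniform()
            best = max(best, v)                 # the prophet takes the offline maximum
            if reward == 0.0 and v >= threshold:
                reward = v                      # the gambler accepts the first value above tau
            if rng.uniform() < p_stop:          # the horizon ends after this value
                break
        gambler += reward
        prophet += best
    return gambler / prophet

# How close does a single threshold get to the 1/2 guarantee?
for tau in (0.5, 0.7, 0.8, 0.9):
    print(tau, ratio(tau))
```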
9. Mateus Hiro Nagata (HEC Paris)
Title: On the Comparative Performance of Machine Learning and Economic Models for Risky Decision Making
This paper develops a nonparametric neural network that estimates the functions underlying risky choice from datasets not explicitly designed for preference elicitation, while maintaining economic interpretability. Comparing traditional economic models against flexible machine-learning alternatives, I find that even flexible ML approaches cannot significantly improve upon established economic models in predicting aggregate risk behavior, suggesting traditional frameworks capture the essential features of risk preferences. However, individual embeddings improve prediction, indicating that comparing choices to similarly behaving individuals enhances understanding of risk-taking behavior.
10. Sebastian Homrighausen (Aarhus University)
Title: Markov Chain Approximation Algorithms for Random Serial Dictatorship
We consider the social choice problem where, given the preferences of $n$ agents over $m$ alternatives, we have to determine the relative shares of each alternative. Random Serial Dictatorship (RSD) is the only known rule that satisfies anonymity, ex-post efficiency, and strategyproofness. The rule is typically specified as a probabilistic rule but is also used in deterministic strategy-proof settings for deciding on budgeting or time-sharing problems. This motivates the need to compute the probability that RSD selects a given alternative, i.e., its RSD share, which is the focus of the talk (and poster). Computing RSD shares has been shown to be #P-complete, but the problem of whether RSD admits a fully polynomial-time randomized approximation scheme (FPRAS) has remained open for over a decade. We first highlight that some natural approaches to achieving an FPRAS do not work. We then present an algorithmic framework to compute RSD shares via a reduction to Markov chain Monte Carlo sampling. We show that a natural Markov chain we present for computing the RSD share is not rapidly mixing in general, and not even for trichotomous (i.e., three-tiered) preferences. However, we prove that it does provide the first known FPRAS for the important case of dichotomous preferences.
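For concreteness, here is the naive Monte Carlo baseline (not the paper's MCMC framework): sample uniformly random agent orders and run serial dictatorship with weak preferences, where each dictator in turn keeps only their most-preferred surviving alternatives. The preference profile and the uniform tie-break among survivors are illustrative conventions. Note that this naive sampler only achieves additive error, which is precisely why it falls short of an FPRAS when shares are exponentially small.

```python
import random
from collections import Counter

random.seed(4)

# Each agent's weak order over alternatives {0, 1, 2} as tiers: earlier tier = preferred.
# (Dichotomous/trichotomous preferences are where the paper's mixing results live.)
prefs = [
    [[0, 1], [2]],     # agent 0: {0, 1} preferred to {2}
    [[2], [0, 1]],     # agent 1
    [[1], [0], [2]],   # agent 2 (strict)
]

def rsd_winner(prefs, order):
    alive = set(range(3))
    for agent in order:
        for tier in prefs[agent]:
            kept = alive & set(tier)
            if kept:               # dictator keeps their best surviving tier
                alive = kept
                break
    return random.choice(sorted(alive))   # uniform tie-break among survivors

counts = Counter()
samples = 100_000
for _ in range(samples):
    order = random.sample(range(len(prefs)), len(prefs))   # uniform agent order
    counts[rsd_winner(prefs, order)] += 1

print({alt: c / samples for alt, c in sorted(counts.items())})   # estimated RSD shares
```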
11. Stanisław Kaźmierowski (University of Warsaw)
Title: Equilibria of Colonel Blotto Games with Costs
We study a generalized variant of the Colonel Blotto game, referred to as the Colonel Blotto game with costs. Unlike the classic Colonel Blotto game, which imposes the use-it-or-lose-it budget assumption, the Colonel Blotto game with costs captures the strategic importance of costs related both to obtaining resources and assigning them across battlefields. We show that every instance of the Colonel Blotto game with costs is strategically equivalent to an instance of the zero-sum Colonel Blotto game with one additional battlefield. This enables the computation of Nash equilibria of the Colonel Blotto game with costs in polynomial time with respect to the game parameters: the number of battlefields and the number of resources available to the players.
12. Davide Legacci (Université Grenoble Alpes)
Title: A Geometric Decomposition of Continuous Games: Convergence vs. Recurrence under Dual Averaging
Characterising the long-term behaviour of learning dynamics is a fundamental open question in game theory. The absence of a universal mechanism guaranteeing convergence to equilibrium in all games—giving rise to cyclic or chaotic regimes—highlights the importance of understanding which games are learnable, and under which dynamical processes. On the learnable end of the spectrum, potential games exhibit a natural alignment of interests among players, ensuring favourable convergence properties across various settings. In the setting of finite games, harmonic games—where learning dynamics exhibit cyclic behaviour—emerge as the natural complement to potential games. However, in the context of continuous games, little is known about games where convergence fails. This leads to our central question: Which continuous games, as counterparts to potential games, are unlearnable? To answer this question, we introduce via a geometric procedure the class of incompressible games, and establish the following results: (a) in incompressible games, dual averaging returns arbitrarily close to its starting point infinitely often; and (b) any continuous game can be decomposed into a potential game and an incompressible one. These results establish incompressible games as a fundamental class of continuous games where learning fails to converge, offering new insights into the limitations of learning in strategic interactions.
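The textbook example of an incompressible payoff field is the bilinear min-max game $u_1(x,y) = xy = -u_2(x,y)$, whose field $(y, -x)$ is divergence-free. A minimal simulation of dual averaging (with Euclidean regularizer and a small constant step, both arbitrary choices) shows the orbit circling and repeatedly returning near its starting point rather than converging.

```python
import numpy as np

x0 = np.array([1.0, 0.0])   # starting point
z = x0.copy()
grad_sum = np.zeros(2)
eta, N = 0.01, 5000
dists = []

for n in range(N):
    v = np.array([z[1], -z[0]])   # players' individual payoff gradients at z
    grad_sum += v
    z = x0 + eta * grad_sum       # lazy / dual-averaging update (Euclidean regularizer)
    dists.append(float(np.linalg.norm(z - x0)))

# One revolution takes roughly 2*pi/eta ~ 628 steps; after that the orbit
# keeps coming back close to where it started (up to discretization drift).
print(min(dists[700:]))
```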
13. David Lurie (Paris-Dauphine University - PSL)
Title: Ergodic POMDPs with Long-Run Average Objectives (with Krishnendu Chatterjee and Raimundo Saona)
We study partially observable Markov decision processes (POMDPs) with long-run average objectives. This paper addresses two central computational questions: whether an algorithm exists to approximate the value and whether one exists to compute it exactly. Previous results have established that, in general, no algorithm exists to compute, or even approximate, the value of POMDPs. Therefore, we introduce a new subclass of POMDPs, termed ergodic POMDPs. Within this class, we prove the existence of an algorithm that approximates the value. Conversely, we establish that no algorithm can solve the exact problem. We further identify structural conditions on the transition and signal matrices of a POMDP sufficient for ergodicity. These results draw a "tight" computational boundary between the approximation and the exact problems in ergodic POMDPs, showing that they cannot be reduced to a Markov decision process and underscoring the necessity of approximation algorithms.
14. Sayan Mukherjee (Indian Statistical Institute, Kolkata)
Title: Equilibrium Selection Via Stochastic Evolution in Continuum Potential Games
We consider stochastic evolution in large population potential games with a continuous strategy set. Examples include aggregative potential games and doubly symmetric games. We approximate the continuous strategy game with a finite strategy game. We then define a stochastic process based on a noisy exponential strategy revision protocol in the finite strategy game. Existing results imply the existence of a stationary distribution of the process on the finite strategy game. We then characterize this stationary distribution as the number of strategies goes to infinity and the noise level goes to zero. The resulting distribution puts all its mass on the Nash equilibrium that globally maximizes the potential function of the continuous strategy game. This provides us with an exact history-independent equilibrium selection result in our large population potential game with a continuum of strategies.
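For finite-strategy potential games with logit (noisy exponential) revision protocols, the stationary distribution takes the Gibbs form $\pi(s) \propto \exp(f(s)/\eta)$, where $f$ is the potential and $\eta$ the noise level. The sketch below mirrors the double limit described above on a hypothetical potential over a discretized $[0,1]$ strategy set; the potential, grids, and noise levels are made up for illustration.

```python
import numpy as np

# Hypothetical potential on [0, 1] with a unique global maximizer.
f = lambda s: -(s - 0.7) ** 2 + 0.3 * np.sin(8 * s)

for K in (10, 100, 1000):                # finer and finer strategy grids
    grid = np.linspace(0.0, 1.0, K)
    for eta in (0.5, 0.05, 0.005):       # smaller and smaller noise
        w = np.exp((f(grid) - f(grid).max()) / eta)   # stabilized Gibbs weights
        pi = w / w.sum()
        # As K grows and eta shrinks, the mass at the potential maximizer tends to 1.
        print(K, eta, grid[np.argmax(pi)], pi.max())
```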
15. Pedro P. Perez-Velasco (ETH Zurich)
Title: Tolerance and Polarization
This paper studies how voter polarization reduces overall voter welfare through its effects on policymaking in electoral competition, when voters exhibit tolerance toward certain ideologies, perceiving ideologies as either goods or bads. I develop a model of Downsian electoral competition in a two-dimensional policy space, where each policy is defined by both its ideology and its scale of implementation. Voters tolerate some ideologies in the sense that they prefer higher scale for ideologies they perceive as goods, and lower scale for those seen as bads. In equilibrium, candidates adopt policies aligned with the ideology tolerated by the median voter, but the scale of implementation decreases either as polarization increases or as voters become more intolerant. Unlike prior work, my results shed light on how polarization leads to a welfare decline for all voters in the absence of additional mechanisms: Moderates are harmed by reduced policy scale, while extremists are harmed by the unchanging ideological content of the median voter's preferred policy. The driving mechanism is that the median voter is no longer pivotal in determining policy scale—candidates also respond to the preferences of more extreme voters, breaking the classic Hotelling-Downs logic that centers policy solely around the median.
16. Juan Carlos Rodríguez Gómez (Universidad de Sevilla)
Title: New values for cooperative games: two approaches from filtration theory
We present a new perspective on cooperative games through the use of filtration theory in simplicial complexes. Two structural notions of filtrations, bases and vercets, are introduced, capturing respectively a combinatorial view of the game and the timing of player incorporations into coalitions. Building on these notions, we define two solution concepts: the persistence value and the vercet value. These values offer alternative methods for distributing gains in a game by leveraging topological tools and systems of independent sets.
17. Aviman Satpathy (Paris School of Economics)
Title: Can Algorithmic Collusion be Smooth? Asynchronous Q-learning with Boltzmann Exploration in the Iterated Prisoner’s Dilemma (with Philippe Jehiel)
This paper investigates the convergence properties of two independent, stateless, asynchronous Q-learning algorithms interacting repeatedly in an infinite-horizon Prisoner’s Dilemma (PD) game under bandit feedback. In contrast to prior research employing discontinuous exploration policies such as the epsilon-greedy strategy (e.g., Banchio and Mantegazza, 2023), we adopt a smooth Boltzmann (softmax) choice rule. Utilizing techniques from stochastic approximation theory, we derive a deterministic continuous-time ordinary differential equation (ODE) that asymptotically approximates the underlying discrete-time stochastic learning dynamics. We establish that steady-states of this continuous-time system correspond precisely to Logit Quantal Response Equilibria (LQRE) within the space of mixed strategies. Moreover, we demonstrate that in the greedy (high-sensitivity) limit, a unique equilibrium emerges in an arbitrarily small neighborhood of the Nash Equilibrium characterized by mutual defection. We further prove that, for sufficiently large sensitivity parameters, this equilibrium exhibits local asymptotic stability. However, convergence rates are notably slow near neighborhoods of mutual defection and mutual cooperation as sensitivity increases. To address this issue, we suggest the adoption of renormalized Q-learning dynamics (Leslie and Collins, 2005)—where reward prediction errors are scaled inversely by action selection frequencies—to accelerate convergence. With this normalization, asynchronous Q-learning dynamics become asymptotically equivalent to smooth best-response dynamics studied in evolutionary game theory, which are known to converge almost surely to Nash equilibria in two-player binary action games.
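A minimal sketch of the learning system described above: two stateless Q-learners with Boltzmann choice under bandit feedback in the PD. The payoffs, sensitivity, and learning rate are illustrative, and this is the plain (not the renormalized) dynamic; with moderate sensitivity, play typically settles near mutual defection, consistent with the LQRE limit identified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(5)

# Prisoner's Dilemma payoffs for the row player: actions 0 = cooperate, 1 = defect.
R, S, T_, P = 3.0, 0.0, 5.0, 1.0
payoff = np.array([[R, S], [T_, P]])

beta = 5.0     # Boltzmann sensitivity (the "greedy limit" is beta -> infinity)
alpha = 0.05   # learning rate
Q = [np.zeros(2), np.zeros(2)]

def softmax(q):
    z = np.exp(beta * (q - q.max()))   # numerically stable softmax
    return z / z.sum()

for t in range(200_000):
    a = [rng.choice(2, p=softmax(Q[i])) for i in range(2)]
    r = [payoff[a[0], a[1]], payoff[a[1], a[0]]]
    for i in range(2):
        # Stateless, asynchronous: only the played action's Q-value is updated.
        Q[i][a[i]] += alpha * (r[i] - Q[i][a[i]])

print([softmax(Q[i]).round(3) for i in range(2)])   # mass concentrates near defection
```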
18. Yijun Wan (Paris-Dauphine University - PSL)
Title: Time-Dependent Blackwell Approachability and Application to Absorbing Games
Blackwell's approachability is a general online learning framework where a Decision Maker obtains vector-valued outcomes, and aims at the convergence of the average outcome to a given target. Blackwell gave a sufficient condition, along with a strategy guaranteeing such a convergence. Blackwell's approachability has since been applied to numerous problems, in regret minimization and game theory. We extend this framework by allowing the outcome function and the inner product to be time-dependent. We establish a general guarantee for the natural extension to this framework of Blackwell's algorithm. In the case where the target set is an orthant, we present a family of time-dependent inner products which yields different convergence speeds for each coordinate of the average outcome. We apply this framework to absorbing games (an important class of stochastic games) for which we construct $\varepsilon$-uniformly optimal strategies using Blackwell's algorithm in a well-chosen auxiliary approachability problem, thereby giving a novel illustration of the relevance of online learning tools for solving games.
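When the target set is the nonpositive orthant and the outcome is the regret vector, Blackwell's algorithm specializes to regret matching (Hart and Mas-Colell): play each action with probability proportional to the positive part of its cumulative regret. A minimal sketch of this classical time-independent instance, with a random payoff matrix as a stand-in for a game:

```python
import numpy as np

rng = np.random.default_rng(6)

K, T = 3, 50_000
payoff = rng.uniform(size=(K, K))   # hypothetical payoff matrix for the Decision Maker

cum_regret = np.zeros(K)
for t in range(T):
    pos = np.maximum(cum_regret, 0.0)
    # Blackwell's strategy for the orthant target = regret matching.
    p = pos / pos.sum() if pos.sum() > 0 else np.full(K, 1.0 / K)
    a = rng.choice(K, p=p)
    b = rng.integers(K)             # the opponent plays arbitrarily (here: uniformly)
    u = payoff[:, b]                # counterfactual payoffs of all own actions
    cum_regret += u - u[a]          # vector-valued outcome: the regret increment

print(np.maximum(cum_regret, 0.0).max() / T)   # average regret -> 0, i.e., approachability
```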
19. Richard Willis (King's College London)
Title: Aligning incentives via reward transfers
Social dilemmas arise when individual incentives conflict with collective welfare. To address this tension, we introduce a mechanism where agents can transfer portions of their rewards to one another. We quantify the alignment between individual and group interests by measuring the minimum transfers required to resolve a dilemma.
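As a toy instance of the quantification described above (the transfer scheme here, a symmetric fractional reward transfer in a one-shot Prisoner's Dilemma, is an illustrative assumption, not necessarily the paper's mechanism): the smallest fraction each agent must pass to the other so that mutual cooperation becomes a Nash equilibrium.

```python
import numpy as np

# Prisoner's Dilemma payoffs; each agent passes a fraction t of its reward to the other.
R, S, T_, P = 3.0, 0.0, 5.0, 1.0

def effective(own, other, t):
    return (1 - t) * own + t * other

def cooperation_is_nash(t):
    stay = effective(R, R, t)       # payoff from cooperating at (C, C)
    deviate = effective(T_, S, t)   # payoff from unilaterally defecting
    return stay >= deviate

# Smallest symmetric transfer resolving the dilemma; here 3 >= 5*(1 - t) gives t = 0.4.
ts = np.linspace(0.0, 0.5, 5001)
print(ts[np.argmax([cooperation_is_nash(t) for t in ts])])
```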
20. Zixin (Stephen) Xu (Darden Business School, University of Virginia)
Title: Optimal Index Policies for Sequential Search: A Unified Framework Under Bayesian Learning
We develop a unified framework for optimal sequential search under Bayesian learning when the underlying distribution is unknown and must be inferred over time. In many practical settings—ranging from consumer search and technology adoption to algorithmic decision-making—agents face uncertainty not only over outcomes but also over the data-generating process itself. To address this, we introduce a novel family of distributions that combines properties of both the exponential family and the location-scale family. This new distribution class enables dynamic learning with closed-form Bayesian updates while preserving tractability across a wide range of uncertainty structures. Our main contribution is the derivation of a computationally efficient, pre-computable index policy that determines optimal stopping decisions across subfamilies with unknown location, scale, or both. The framework accommodates both full-recall and no-recall settings and incorporates exponential discounting as well as CARA and CRRA utility preferences. A key innovation is the standardization of state variables, which reduces dimensionality and uncovers structural properties such as multiple decision thresholds and adaptive continuation values. Beyond its implications for operations and behavioral decision-making, this work also offers foundational tools for computer science applications, including non-stationary bandits, Bayesian sampling, and online learning under evolving uncertainty.
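As a known-distribution baseline for the index policies described above (the paper's contribution is the unknown-distribution, Bayesian-learning analogue), the classical reservation value of Weitzman-style search solves $c = \mathbb{E}[(X - z)^+]$ and can be computed by bisection; the offer distribution and search cost below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)

samples = rng.normal(loc=1.0, scale=0.5, size=200_000)   # hypothetical offer distribution
c = 0.05                                                 # per-inspection search cost

def expected_gain(z):
    """Monte Carlo estimate of E[(X - z)^+], decreasing in z."""
    return np.maximum(samples - z, 0.0).mean()

lo, hi = samples.min(), samples.max()
for _ in range(60):                  # bisection on the monotone equation c = E[(X - z)^+]
    mid = 0.5 * (lo + hi)
    if expected_gain(mid) > c:
        lo = mid
    else:
        hi = mid

print(0.5 * (lo + hi))   # stop searching once an offer exceeds this index
```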
21. Jiechen Zhang (EPFL)
Title: Prophet Inequality with Moment Knowledge
We study prophet inequalities where the gambler only knows a limited set of moments (up to the $k$-th) for each random variable, rather than their full distributions. Our work seeks to establish near-optimal guarantees on the expected maximum value, $\mathbb{E}[\max_i X_i]$, achievable under such partial distributional information. Key areas of our investigation include the fundamental case where only the mean of each distribution is known ($k=1$), for which we establish performance bounds. We also provide sharper analyses for important distribution families, such as those with a Monotone Hazard Rate (MHR). Additionally, our research explores general methods for constructing varied distributions that conform to a given set of $k$ moments. This leads to new insights into how partial moment knowledge constrains (but does not eliminate) adversarial constructions, and informs the design of algorithms, such as ``exponential bucketing,'' with provable guarantees.
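A quick illustration of why mean-only knowledge ($k=1$) leaves so much adversarial freedom: two value distributions with identical means can have very different prophet benchmarks $\mathbb{E}[\max_i X_i]$. The particular distributions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(8)
n, trials = 5, 100_000

# Two distributions with the same mean (0.5): a point mass vs. a risky scaled Bernoulli.
deterministic = np.full((trials, n), 0.5)
bernoulli = (rng.uniform(size=(trials, n)) < 0.05) * 10.0   # value 10 w.p. 0.05, mean 0.5

print(deterministic.max(axis=1).mean())   # E[max] = 0.5
print(bernoulli.max(axis=1).mean())       # E[max] = 10 * (1 - 0.95**5) ~ 2.26
```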