Galit Ashkenazi-Golan (London School of Economics and Political Science)
Title : Simultaneous Best Response Dynamics in Random Games and Random Potential Games (with Domenico Mergoni, Ed Plumb and Jozef Skokan)
We explore simultaneous best response dynamics in large random games. For two-player random games, we obtain a full characterisation of the dynamics. For games with three or more players, we show that the dynamics does not converge to a pure Nash equilibrium.
For two-player random potential games, we show that the dynamics converges very quickly to a length-two cycle composed of two action profiles that are mis-matched Nash equilibrium actions. For three players, simulations suggest that the dynamics converges quickly to a Nash equilibrium. For four players or more, we prove that the dynamics does converge quickly to a Nash equilibrium. Simulations suggest that this convergence also holds for games that are "close" to potential, i.e., games where the payoffs of the players are highly correlated.
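As an illustrative sketch (not the authors' code), simultaneous best response dynamics can be simulated in a random two-player game with i.i.d. uniform payoffs; the function name and parameters below are hypothetical:

```python
import random

def simultaneous_best_response(n=20, steps=500, seed=0):
    """Run simultaneous best response dynamics in a random two-player
    n x n game with i.i.d. uniform payoffs; return the visited profiles."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n)] for _ in range(n)]  # row player's payoffs
    B = [[rng.random() for _ in range(n)] for _ in range(n)]  # column player's payoffs
    i, j = 0, 0  # initial action profile
    history = [(i, j)]
    for _ in range(steps):
        # Both players switch *simultaneously* to a best response
        # against the opponent's current action.
        i_new = max(range(n), key=lambda a: A[a][j])
        j_new = max(range(n), key=lambda b: B[i][b])
        i, j = i_new, j_new
        history.append((i, j))
    return history

hist = simultaneous_best_response()
# The dynamics is deterministic on a finite state space, so it must
# eventually cycle; the number of distinct profiles in the tail of the
# trajectory is the length of the cycle it settled into.
print(len(set(hist[-50:])))
```

Since there are at most n^2 profiles, after 500 steps the trajectory of a 20 x 20 game is guaranteed to be inside its eventual cycle.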
Martino Bernasconi (Bocconi University)
Title : No-Regret Learning in Bilateral Trade via Global Budget Balance
Bilateral trade models the problem of intermediating between two rational agents — a seller and a buyer — both characterized by a private valuation for an item they want to trade. We study the online learning version of the problem, in which at each time step a new seller and buyer arrive and the learner has to set prices for them without any knowledge about their (adversarially generated) valuations.
In this setting, known impossibility results rule out the existence of no-regret algorithms when budget balance has to be enforced at each time step. In this paper, we introduce the notion of global budget balance, which only requires the learner to maintain budget balance over the entire time horizon. Under this natural relaxation, we provide the first no-regret algorithms for adversarial bilateral trade under various feedback models.
Mario Bravo (Universidad Adolfo Ibáñez)
Title : Near-optimal sampling complexity for average-reward MDPs via anchoring
In this talk, we present a new model-free algorithm for computing $\varepsilon$-optimal policies in average-reward Markov decision processes (MDPs) under the weakly communicating assumption. The algorithm assumes access to a generative model, which allows sampling transitions from any state-action pair. Our method combines a recursive sampling procedure with Halpern's anchored iteration and achieves a sample and time complexity of $\widetilde{O}(|\mathcal{S}||\mathcal{A}|\,\|h^*\|_{\mathrm{sp}}^2/\varepsilon^2)$, both in expectation and with high probability. To the best of our knowledge, this is the best-known complexity for model-free algorithms in this setting, up to a factor involving the span seminorm $\|h^*\|_{\mathrm{sp}}$ of the unknown bias function. Importantly, the algorithm does not require prior knowledge, thanks to a stopping criterion that ensures termination in finite time with probability one.
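The anchoring idea can be illustrated on a toy fixed-point problem. The sketch below applies Halpern's iteration, with the standard anchor weights 1/(k+2), to a known contraction rather than to sampled Bellman operators, so it is only a schematic of the anchoring step, not the paper's algorithm:

```python
def halpern(T, x0, iters=2000):
    """Halpern's anchored iteration: x_{k+1} = b_k*x0 + (1 - b_k)*T(x_k),
    with anchor weights b_k = 1/(k+2) pulling back toward the anchor x0."""
    x = x0
    for k in range(iters):
        beta = 1.0 / (k + 2)
        tx = T(x)
        x = [beta * a + (1 - beta) * b for a, b in zip(x0, tx)]
    return x

# Toy contraction with fixed point (1, 1): T(x) = 0.5*x + 0.5 coordinatewise.
T = lambda x: [0.5 * x[0] + 0.5, 0.5 * x[1] + 0.5]
sol = halpern(T, [0.0, 0.0])
print(sol)  # approaches the fixed point [1.0, 1.0]
```

For nonexpansive operators, Halpern's iteration converges with an O(1/k) residual rate, which is the property the paper's complexity analysis leverages.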
We study a network of agents that learn by interacting with the environment and with the other agents. In the talk, we show the extent to which communication allows us to prove performance bounds that are strictly better than the known bounds for non-communicating agents. Our results are formulated within the online learning setting and hold either when agents learn a single common task or when each agent learns a different task.
I will present a sequential decision-making setting where, at every round t, the learner (a so-called market maker) posts a bid price B_t and an ask price A_t to an incoming trader (the so-called taker) with a private valuation for some asset. If the trader's valuation is lower than the bid price, or higher than the ask price, then a trade (sell or buy) occurs. Letting M_t be the market price (observed only at the end of round t), the maker's utility is M_t-B_t if the maker bought the asset, A_t-M_t if they sold it, and 0 if no trade occurred. I will characterize the maker's regret with respect to the best fixed choice of bid and ask pairs under a variety of assumptions (adversarial, i.i.d., and their variants) on the sequence of market prices and valuations. The upper bound analysis will unveil an intriguing connection relating market making to first-price auctions and dynamic pricing. The lower bound analysis will showcase a unique relationship between the reward and feedback functions that allows learning algorithms to trade off reward for information in a continuous way.
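The per-round utility described above can be written down directly; the following sketch (hypothetical names, not from the talk) encodes the three cases:

```python
def maker_round_utility(bid, ask, valuation, market_price):
    """One round of the market-making game: the taker sells to the maker
    at the bid if their valuation is below it, buys from the maker at the
    ask if their valuation is above it; utility is measured against the
    end-of-round market price."""
    if valuation < bid:      # taker sells, maker buys at the bid: M_t - B_t
        return market_price - bid
    if valuation > ask:      # taker buys, maker sells at the ask: A_t - M_t
        return ask - market_price
    return 0.0               # valuation inside the spread: no trade

print(round(maker_round_utility(bid=0.4, ask=0.6, valuation=0.3, market_price=0.5), 6))  # 0.1
print(round(maker_round_utility(bid=0.4, ask=0.6, valuation=0.9, market_price=0.5), 6))  # 0.1
print(round(maker_round_utility(bid=0.4, ask=0.6, valuation=0.5, market_price=0.5), 6))  # 0.0
```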
Andres Cristi (École Polytechnique Fédérale de Lausanne)
Title : Planning Against a Prophet: A Graph-Theoretic Framework for Making Sequential Decisions
We introduce a general graph-theoretic framework for studying prophet inequalities. In this framework, an agent traverses a directed acyclic graph from a starting node s to a target node t. Each edge has a value that is sampled from a known distribution. When the agent reaches a node v, it observes the realized values of all the outgoing edges from v. The agent's objective is to maximize the expected total value of the path it takes. As in prophet inequalities, we compare the agent's performance against a prophet who observes all the realizations of the edges' values ahead of time. Our analysis reveals that this ratio depends heavily on the number of paths k required to cover all the nodes in the graph. In particular, we provide an algorithm that guarantees a prophet inequality ratio of 1/(2k) and show an upper bound of 1/(k+1).
Our framework captures planning problems in which there is uncertainty regarding the costs/benefits of each action. In particular, it captures an over-time variant of the classic prophet inequality in which a seller leases a durable item, such as an apartment, for n time units. Each period a lessee appears and may have a different value for each lease term. We obtain a tight bound of 1/2 for this variant.
In this talk, I will present the model, some applications, and the main ideas of our proofs. I will also mention some recent progress on this model.
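As a toy illustration of the framework (not from the paper), consider a four-node DAG with two parallel s-to-t paths, so k = 2 paths cover all nodes. The Monte Carlo sketch below, with hypothetical names and uniform edge values, compares an agent who sees only the outgoing edges at each node against a prophet who sees every realization:

```python
import random

def simulate(trials=20000, seed=1):
    """Estimate the agent-to-prophet ratio on the DAG
    s -> v1 -> t and s -> v2 -> t, all edge values Uniform[0, 1]."""
    rng = random.Random(seed)
    agent_total = prophet_total = 0.0
    for _ in range(trials):
        # Fresh edge values each trial: s->v1, s->v2, v1->t, v2->t.
        sv1, sv2, v1t, v2t = (rng.random() for _ in range(4))
        # Agent at s sees only the outgoing edges s->v1, s->v2 and adds
        # the *expected* continuation value (0.5 for either branch).
        agent = (sv1 + v1t) if sv1 >= sv2 else (sv2 + v2t)
        # Prophet sees every realization and takes the best full path.
        prophet = max(sv1 + v1t, sv2 + v2t)
        agent_total += agent
        prophet_total += prophet
    return agent_total / prophet_total

ratio = simulate()
print(round(ratio, 2))  # empirical ratio, comfortably above the 1/(2k) = 1/4 guarantee
```

On this tiny instance the gap between agent and prophet is small; the 1/(2k) guarantee of the paper is for worst-case instances.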
I will describe how to formulate the problem of recovering preferences and utility functions as a binary classification problem, which allows us to use standard ideas from PAC learning and VC dimension. The methods are broadly applicable in environments with noisy binary choice data. Time permitting, I will also cover some very recent results leveraging choice data augmented with response time to recover preferences using recent developments in machine learning.
Any voting rule induces a game on a preference profile: voters are the players, their possible actions are the ballots reflecting their preferences, and their utility is determined by comparing the election's winner to their true preferences. The study of iterative voting began as an analysis of best-response dynamics in voting games, aiming to refine the set of Nash equilibria by eliminating unintuitive profiles. This study has since evolved into the exploration of new voting rules, defined by iterating existing ones and allowing voter strategies to develop under various assumptions. In this line of work, iterative voting has been shown to improve the social welfare of the voters, using simulations, theoretical analyses, and laboratory experiments. In this talk, I will review the recent literature on iterative voting rules and present results ranging from solving voting paradoxes with iteration, to training reinforcement learning agents to vote in an iterative setting.
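Best-response dynamics in a voting game can be sketched on a small plurality example (illustrative code, not from the talk): three voters start from truthful ballots and take turns switching to a ballot that yields a winner they strictly prefer. In the profile below, iteration moves the winner from a (the tie-break winner of the truthful profile) to the Condorcet winner b:

```python
def plurality_winner(ballots):
    """Plurality with lexicographic tie-breaking."""
    tally = {}
    for b in ballots:
        tally[b] = tally.get(b, 0) + 1
    return min(tally, key=lambda c: (-tally[c], c))

def iterative_voting(prefs, max_rounds=20):
    """Start from truthful ballots; repeatedly let one voter switch to a
    ballot whose induced winner they strictly prefer (best response)."""
    ballots = [p[0] for p in prefs]           # truthful: everyone's top choice
    for _ in range(max_rounds):
        moved = False
        for i, pref in enumerate(prefs):
            current = plurality_winner(ballots)
            for c in pref:                    # try each possible ballot
                trial = ballots[:i] + [c] + ballots[i + 1:]
                w = plurality_winner(trial)
                if pref.index(w) < pref.index(current):
                    ballots[i] = c            # strict improvement: switch
                    moved = True
                    break
        if not moved:                         # no voter wants to deviate
            break
    return plurality_winner(ballots), ballots

prefs = [('a', 'b', 'c'), ('b', 'a', 'c'), ('c', 'b', 'a')]
print(iterative_voting(prefs))  # ('b', ['a', 'b', 'b'])
```

Here voter 3 abandons their doomed top choice c for the compromise b, a small instance of the equilibrium-refinement effect described above.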
Yingkai Li (National University of Singapore)
Title : Information Disclosure Makes Item Pricing Competitive (with Yang Cai and Jinzhao Wu)
In classical mechanism design, the prevailing assumption is that the information structure about agents' types is exogenous. This assumption introduces complexity, especially with multi-dimensional agent types, leading to mechanisms that, while optimal, may appear complex and unnatural. Furthermore, Hart and Nisan (2019) show that the gap between the performance of any simple mechanism and the optimal solution could be potentially unbounded. We challenge this conventional view by showing that simple mechanisms can be highly competitive if the information structure is endogenous and can be influenced by the designer.
We study a multi-dimensional generalization of a single-dimensional model proposed by Bergemann and Pesendorfer (2007), where the designer can shape the information structure via information disclosure. Specifically, we consider a fundamental multi-dimensional mechanism design problem, where a seller is selling $m$ items to a single unit-demand buyer to maximize her revenue. The buyer's values can be arbitrarily correlated across the items. Our main result shows that, following an appropriately chosen information disclosure scheme, item pricing, i.e., setting a take-it-or-leave-it price on each item, is highly competitive and is guaranteed to attain at least $50.1\%$ of the optimal revenue. To our knowledge, this is the first result demonstrating the (approximate) optimality of simple mechanisms in this extensively studied multi-dimensional setting, without making any assumptions about the buyer's value distribution. We believe our result not only demonstrates the power of information disclosure in enhancing the performance of simple mechanisms but also suggests a new framework for reevaluating their efficacy in multi-dimensional settings.
Patrick Loiseau (INRIA Saclay)
Title : The Price of Opportunity Fairness in Matroid Allocation Problems
We consider matroid allocation problems under opportunity fairness constraints: resources need to be allocated to a set of agents under matroid constraints (which includes classical problems such as bipartite matching). Agents are divided into C groups according to a sensitive attribute, and an allocation is opportunity-fair if each group receives the same fraction of the maximum feasible allocation it could achieve in isolation. We study the Price of Fairness (PoF), i.e., the ratio between maximum-size allocations and maximum-size opportunity-fair allocations. We first provide a characterization of the PoF leveraging the underlying polymatroid structure of the allocation problem. Based on this characterization, we prove bounds on the PoF in various settings, from fully adversarial (worst-case) to fully random. Notably, one of our main results considers an arbitrary matroid structure with agents randomly divided into groups. In this setting, we prove a PoF bound as a function of the size of the largest group. Our result implies that, as long as there is no dominant group (i.e., the largest group is not too large), opportunity fairness constraints do not induce any loss of social welfare (defined as the allocation size). Overall, our results give insights into which aspects of the problem's structure affect the trade-off between opportunity fairness and social welfare.
Chantal Marlats (Université Panthéon-Assas)
Title : Experimentation in the lab vs in the field (with Lucie Ménager)
We analyse a bandit model with costly information acquisition. More specifically, we consider a setting in which firms are developing products using a common technology with unknown side effects. If the technology is harmful, it carries a risk of breakdown, which could be highly costly for the firm. At any given time, firms can choose how many trials they run to assess the safety of the technology while simultaneously deciding on the quantity of the product to sell. We characterize the symmetric Markovian equilibrium, identifying the equilibrium paths of experimentation and production decisions over time.
Faidra Monachou (Yale University)
Title : Why the Rooney Rule Fumbles: Limitations of Interview-stage Diversity Interventions in Labor Markets
Many industries, including the NFL with the Rooney Rule and law firms with the Mansfield Rule, have adopted interview-stage diversity interventions of inconclusive effectiveness. Motivated by this, we develop a framework of a two-stage hiring process, where rational firms, with limited interview and hiring capacities, aim to maximize the match value of their hires. There are two equi-sized groups, m and w, with identical ex-post match value distributions. Match values are revealed only post-interview, while interview decisions rely on partially informative pre-interview scores. Pre-interview scores are more informative for group m, while interviews reveal more for w. Therefore, if firms could interview all candidates, both groups would be equally hired. However, we show that requiring equal representation in the interview stage does not translate into equal representation in hiring. With or without intervention, a firm may interview more group w candidates but still hire fewer. At an individual level, we show that strong candidates from both groups benefit from the intervention as the candidate-level competition weakens. For borderline candidates, only group w candidates gain at the expense of group m. Furthermore, we study two vertically differentiated firms, where only the top firm adopts the intervention. We show that, due to increased firm-level competition, the lower firm may hire fewer group w candidates, and find examples where fewer group w candidates are hired across the market. At an individual level, while superstar candidates in both groups benefit, surprisingly, borderline candidates may be worse off. More generally, our natural two-stage market framework may be of independent interest.
Zvika Neeman (Tel Aviv University)
Title : Deterrence of Unwanted Behavior: a Theoretical and Experimental Investigation (with Penélope Hernández and Ro'i Zultan)
We examine whether concentrating enforcement resources, thereby increasing sanction probability in targeted areas or times (at the expense of others), effectively reduces socially unwanted activities. Our theoretical model and experiment investigate whether splitting the probability of sanction p into higher p_H and lower p_L can minimize overall unwanted behavior. This study empirically tests the concept of beneficial splitting, central to Bayesian persuasion literature, to determine if it offers practical advantages in a realistic, parametrized setting.
Ludovic Renou (Queen Mary University of London)
Title : Non-Bayesian Learning in Misspecified Models (with Sebastian Bervoets and Mathieu Faure)
Deviations from Bayesian updating are traditionally categorized as biases, errors, or fallacies, thus implying their inherent "sub-optimality." We offer a more nuanced view. We demonstrate that, in learning problems with misspecified models, non-Bayesian updating can outperform Bayesian updating.
Eran Shmaya (Stony Brook University)
Title : Disentangling Exploration from Exploitation (with Alessandro Lizzeri and Leeat Yariv)
Starting from Robbins (1952), the literature on experimentation via multi-armed bandits has wed exploration and exploitation. Nonetheless, in many applications, agents' exploration and exploitation need not be intertwined: a policymaker may assess new policies different from the status quo; an investor may evaluate projects outside her portfolio. We characterize the optimal experimentation policy when exploration and exploitation are disentangled in the case of Poisson bandits, allowing for general news structures. The optimal policy features complete learning asymptotically, exhibits substantial persistence, but cannot be identified by an index à la Gittins. Disentanglement is particularly valuable for intermediate parameter values.
We propose a mechanism design framework that incorporates both soft information, which can be freely manipulated, and semi-hard information, which entails a cost for falsification. The framework captures various contexts such as school choice, public housing, organ transplantation, and manipulation of classification algorithms. We first provide a canonical class of mechanisms for these settings. The key idea is to treat the submission of hard information as an observable and payoff-relevant action, and the contractible part of the mechanism as a mapping from submitted scores to a distribution over decisions (a score-based decision rule). Each type report triggers a distribution over score submission requests and a distribution over decision rules. We provide conditions under which score-based mechanisms are without loss of generality, i.e., situations in which the agent does not make any type report and decides, without a mediator, what score to submit in a score-based decision rule. We proceed to characterize optimal approval mechanisms in the presence of manipulable hard information. In several leading settings, optimal mechanisms are score-based (and thus do not rely on soft information) and involve costly screening. The solution methodology we employ is suitable for both concave and quadratic cost functions and is applicable to a wide range of contexts in economics and in computer science.
Bernhard von Stengel (London School of Economics and Political Science)
Title : Finding an Empirical Equilibrium by Machine Learning in a Pricing Game (with Sahar Jahani and Rahul Savani)
We apply machine learning to a classical multi-period pricing game. The game is a duopoly with demand inertia, introduced by Selten in 1965 and studied around 1990 with economic experiments. Its strategies are too complex to represent explicitly. Unlike standard multi-agent reinforcement learning, which trains agents directly against each other, our framework is that of a Policy-Space Response Oracle (PSRO) and the "double oracle" method. A few initial strategies that play against each other define a "meta-game". A Nash equilibrium of this meta-game is computed and defines the training environment for a learning agent. Once the agent is sufficiently trained to get a higher payoff than in the current equilibrium, it is added as a new strategy to the meta-game, which expands in this way, and the process repeats. Various learning methods such as SAC (Soft Actor Critic) and PPO (Proximal Policy Optimisation) lead to different limit equilibria in this setup, namely either competitive pricing or cooperative price collusion in the duopoly.
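The double-oracle loop can be sketched on a toy zero-sum game in which exact best responses stand in for the trained RL agents, and the meta-game is solved approximately by fictitious play. All names below are illustrative, not the authors' implementation:

```python
# Rock-paper-scissors payoffs for the row player (zero-sum full game).
M = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]

def fictitious_play(sub_r, sub_c, iters=3000):
    """Approximate a Nash equilibrium of the restricted meta-game by
    fictitious play; returns mixed strategies over the restricted sets."""
    cnt_r = [0] * len(sub_r); cnt_c = [0] * len(sub_c)
    cnt_r[0] = cnt_c[0] = 1
    for _ in range(iters):
        # Each side best-responds to the opponent's empirical mixture.
        br_r = max(range(len(sub_r)), key=lambda i: sum(
            cnt_c[j] * M[sub_r[i]][sub_c[j]] for j in range(len(sub_c))))
        br_c = max(range(len(sub_c)), key=lambda j: -sum(
            cnt_r[i] * M[sub_r[i]][sub_c[j]] for i in range(len(sub_r))))
        cnt_r[br_r] += 1; cnt_c[br_c] += 1
    tr, tc = sum(cnt_r), sum(cnt_c)
    return [c / tr for c in cnt_r], [c / tc for c in cnt_c]

def double_oracle():
    sub_r, sub_c = [0], [0]      # seed the meta-game with one strategy each
    for _ in range(10):
        p, q = fictitious_play(sub_r, sub_c)
        # Oracle step: best response in the *full* game to the meta-Nash
        # mixture (exact here; in PSRO this is a freshly trained agent).
        br_r = max(range(3), key=lambda a: sum(
            q[j] * M[a][sub_c[j]] for j in range(len(sub_c))))
        br_c = max(range(3), key=lambda b: -sum(
            p[i] * M[sub_r[i]][b] for i in range(len(sub_r))))
        grew = False
        if br_r not in sub_r: sub_r.append(br_r); grew = True
        if br_c not in sub_c: sub_c.append(br_c); grew = True
        if not grew:             # no profitable new strategy: stop expanding
            break
    return sorted(sub_r), sorted(sub_c)

print(double_oracle())  # ([0, 1, 2], [0, 1, 2])
```

Starting from rock alone, the oracle adds paper, then scissors, after which no new strategy beats the meta-Nash mixture, mirroring how the meta-game in the talk grows one trained strategy at a time.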
Omer Tamuz (Caltech)
Title : Independence of Irrelevant Decisions in Stochastic Choice (with Fedor Sandomirskiy, Po Hyun Sung & Ben Wincelberg)
We investigate stochasticity in choice behavior across diverse decisions. Each decision is modeled as a menu of actions with associated outcomes, and a stochastic choice rule assigns probabilities to actions based on the outcome profile. We characterize rules whose predictions are not affected by whether or not additional, irrelevant decisions are included in the model. Our main result is that such rules form the parametric family of mixed-logit rules.
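A mixed-logit rule is a mixture of logit choice rules. As a minimal illustration (hypothetical parameters, mixing only over the logit noise parameter):

```python
import math

def logit_choice(utilities, beta=1.0):
    """Logit rule: choose action a with probability proportional to exp(beta * u_a)."""
    w = [math.exp(beta * u) for u in utilities]
    s = sum(w)
    return [x / s for x in w]

def mixed_logit_choice(utilities, betas, weights):
    """Mixed logit: a convex combination of logit rules with different parameters."""
    probs = [0.0] * len(utilities)
    for beta, w in zip(betas, weights):
        p = logit_choice(utilities, beta)
        probs = [a + w * b for a, b in zip(probs, p)]
    return probs

# Two actions with utilities 1 and 0, mixing a noisy and a sharp logit rule.
print([round(p, 3) for p in
       mixed_logit_choice([1.0, 0.0], betas=[0.5, 2.0], weights=[0.5, 0.5])])
# [0.752, 0.248]
```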
We study a sender-receiver problem in which the sender communicates about a series of states to a receiver who takes a series of actions. Our model has two distinctive features. First, the set of messages that the sender uses has a fixed size, possibly smaller than both the number of states and the number of actions. Second, the receiver's strategy is fixed beforehand, and the sender reacts optimally to it. This represents the interaction between a rational player and a fixed decoding algorithm. Our goal is to characterize the joint empirical distributions of states and actions that are achievable, given the incentives of the sender and the message limitations.