- Simone Galperti, Associate Professor, University of California, San Diego
- Aleksandr Levkun, Doctoral Candidate, University of California, San Diego
- Jacopo Perego, Associate Professor, Columbia University
This blog article is derived from the authors’ research paper titled “The Value of Data Records,” a project of the Economics of Digital Services (EODS) initiative led by Penn’s Center for Technology, Innovation & Competition (CTIC) and The Warren Center for Network & Data Services. CTIC and the Warren Center are grateful to the John S. and James L. Knight Foundation for its generous support of the EODS initiative.
Personal data is often called the “new oil” of modern economies. Search engines and social media platforms use it to sell targeted advertising; e-commerce platforms use it to intermediate trade between buyers and sellers; and job-matching platforms use it to match workers with employers. In each case, a large quantity of personal data fuels a multi-billion-dollar industry. How much of this total value is created by the data of each single individual? And how do privacy protection policies affect the value of an individual’s data record? These basic questions are at the core of several recent debates about the future of data markets: how to design these markets to compensate individuals for their data, how to conduct demand analysis for data brokers, to what extent data are a source of market power, and how privacy regulation affects data markets and their participants.
Yet the value of personal data is not well understood and can be hard to assess. One reason is that online intermediaries (like search engines, social media, and e-commerce platforms) often use their data to influence the behavior of strategic agents (like consumers and advertisers) by selectively withholding information. This is common practice among digital platforms: Google’s “quality score” pools people’s searches to increase competition among advertisers; Uber conceals riders’ destinations from drivers to increase riders’ welfare; and Airbnb withholds guests’ profile pictures from hosts to reduce discrimination. Withholding information in this way pools the data records of multiple individuals into a common use, which makes it hard to disentangle the value of each single record. To tackle this challenge, we propose a novel and general approach to determining the value of data records. The approach combines modern tools from economics (information- and mechanism-design theory) with classic techniques from mathematics (linear-programming duality).
A core result of our research is a decomposition of the value of a data record into two components. The first is the payoff an intermediary derives directly from the record; for instance, the profit an e-commerce platform earns when a consumer buys a product from a third-party seller on the platform. The second captures the externalities that data records exert on one another when the intermediary pools them to withhold the information they contain. Continuing the e-commerce example, the platform may partition its knowledge of consumers’ wants and needs by grouping them into market segments so as to influence the prices or quality that sellers offer. Importantly, ignoring this second component can significantly bias the evaluation of data records. For instance, it can make the record of a low-spending buyer more valuable to the platform than that of a high-spending buyer.
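To make the second component concrete, here is a toy sketch in Python. It is a hypothetical two-type model chosen for illustration, not the authors’ framework. Buyers value a product at either LOW = 1 or HIGH = 2. In each market segment a monopolist seller posts the revenue-maximizing price, with ties broken toward the low price. The platform is assumed to care about buyers’ total surplus. A record’s value is measured as the change in the platform’s optimal payoff when one more record of that kind is added. The function names (`seller_price`, `platform_value`, `record_value`) are illustrative only. The sketch also restricts the platform to two segments, which is enough in this two-type toy.

```python
"""Toy illustration of record externalities (hypothetical model, not the authors' framework)."""
from itertools import product

LOW, HIGH = 1.0, 2.0  # assumed buyer valuations


def seller_price(n_low: int, n_high: int) -> float:
    """Revenue-maximizing posted price in one segment; ties go to the low price (assumption)."""
    revenue_at_low = LOW * (n_low + n_high)  # every buyer purchases
    revenue_at_high = HIGH * n_high          # only high-value buyers purchase
    return LOW if revenue_at_low >= revenue_at_high else HIGH


def consumer_surplus(n_low: int, n_high: int) -> float:
    """Total buyer surplus in one segment, given the seller's price there."""
    price = seller_price(n_low, n_high)
    return max(HIGH - price, 0.0) * n_high + max(LOW - price, 0.0) * n_low


def platform_value(n_low: int, n_high: int) -> float:
    """Best total surplus from splitting all records into two segments (brute force)."""
    best = 0.0
    for a_low, a_high in product(range(n_low + 1), range(n_high + 1)):
        total = consumer_surplus(a_low, a_high) + consumer_surplus(
            n_low - a_low, n_high - a_high
        )
        best = max(best, total)
    return best


def record_value(n_low: int, n_high: int, kind: str) -> float:
    """Marginal value of adding one more 'low' or 'high' record to the database."""
    base = platform_value(n_low, n_high)
    if kind == "low":
        return platform_value(n_low + 1, n_high) - base
    return platform_value(n_low, n_high + 1) - base


if __name__ == "__main__":
    for kind in ("low", "high"):
        print(kind, record_value(3, 5, kind))
```

With 3 low-value and 5 high-value buyers, an extra low-value record raises the platform’s payoff by 1, even though that buyer’s own surplus is always 0 (every price in this toy is at least her valuation). Its entire value is therefore an externality: it lets the platform pool one more high-value buyer into a segment priced at 1. An extra high-value record adds nothing, because it lands in a segment priced at 2. Reversing the counts (5 low, 3 high) flips the ranking.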
Our characterization of the value of data records is a stepping stone toward addressing various questions about data markets. First, consider the debate about whether and how a data market could compensate individuals for their personal data. The basic idea is that some part of an e-commerce platform’s total payoff (e.g., its profits) would be distributed back to its users as a dividend for the monetization of their data. How much of this payoff should each user receive? Answering this question involves complex considerations, such as which privacy protection laws would grant consumers control over their data and the extent to which compensation is determined competitively. Our approach contributes to this debate by offering a benchmark against which to compare individuals’ actual compensation, however it is defined or calculated. Second, we show that an intermediary’s demand for data can be analyzed with well-known tools from standard consumer theory. This allows us to establish a general “scarcity principle” for data, to derive guidelines for investing in data acquisition, and to identify when a platform has a strict incentive to merge two databases (for instance, via a takeover).
Another key question in data markets is how privacy regulation will affect them, and our approach contributes to this debate as well. How intermediaries use their data influences individuals’ incentives to share their data in the first place. This is irrelevant if data can be collected for free without individuals’ consent. But data privacy regulations, especially in their more recent forms, often give individuals more control over their data and more power to withhold it if they deem its use inappropriate. Our approach is naturally suited to handling the constraints that privacy regulations impose on intermediaries’ use of personal data and to assessing their consequences for the value of that data. This is because we can embed, in the mathematical formalization of the intermediary’s data-use problem, constraints that capture different privacy protection policies: for instance, whether a platform can refuse to serve a customer who does not share her data, or is instead required to provide her with at least a basic service.
Once again, in addition to the direct payoff it generates for the intermediary, the value of an individual’s data record may reflect externalities that arise because the way the record is used helps or hinders the intermediary in convincing other market participants to share their data. One key insight from our research is that privacy rights may not only shift wealth from data users to data sources (i.e., from intermediaries to consumers) but also change the value of data records. For instance, they can increase the value of some people’s records at the expense of others. Thus, privacy can have redistributive effects across data sources, which privacy protection policies should take into account, especially when combined with policies that establish systems for compensating people for third-party use of their data.