A. Feder Cooper, Jonathan Frankle, Christopher De Sa
Legal literature on machine learning (ML) tends to focus on harms, and thus tends to reason about individual model outcomes and summary error rates. This focus has masked important aspects of ML that are rooted in its reliance on randomness — namely, stochasticity and non-determinism. While some recent work has begun to reason about the relationship between stochasticity and arbitrariness in legal contexts, the role of non-determinism more broadly remains unexamined. In this paper, we clarify the overlap and differences between these two concepts, and show that the effects of non-determinism, and consequently its implications for the law, become clearer from the perspective of reasoning about ML outputs as distributions over possible outcomes. This distributional viewpoint accounts for randomness by emphasizing the possible outcomes of ML. Importantly, this type of reasoning is not mutually exclusive with current legal reasoning; it complements (and in fact can strengthen) analyses concerning individual, concrete outcomes for specific automated decisions. By illuminating the important role of non-determinism, we demonstrate that ML code falls outside of the cyberlaw frame of treating “code as law,” as this frame assumes that code is deterministic. We conclude with a brief discussion of what work ML can do to constrain the potentially harm-inducing effects of non-determinism, and we indicate where the law must do work to bridge the gap between its current individual-outcome focus and the distributional approach that we recommend.
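To make the distributional viewpoint concrete, the sketch below (not taken from the paper) trains the same tiny model many times, varying only the random seed that controls initialization and data shuffling, and reports how often one borderline applicant is approved across runs. The model, data, and approval threshold are all invented for illustration.

```python
import math
import random

def train_logreg(data, seed, epochs=50, lr=0.1):
    """Tiny one-feature logistic regression trained with SGD; the seed controls
    both the random initialization and the per-epoch shuffling order."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            p = 1 / (1 + math.exp(-(w * x + b)))
            w += lr * (y - p) * x
            b += lr * (y - p)
    return w, b

# Synthetic, invented data: one feature (say, a normalized credit score) with noisy labels.
base = random.Random(0)
data = [(x / 10, 1 if x / 10 + base.gauss(0, 0.4) > 0.5 else 0) for x in range(-10, 20)]

applicant_x = 0.5  # a borderline individual
approvals, runs = 0, 100
for seed in range(runs):  # the same pipeline, re-run with different seeds
    w, b = train_logreg(list(data), seed)
    p = 1 / (1 + math.exp(-(w * applicant_x + b)))
    approvals += p >= 0.5

print(f"approved in {approvals}/{runs} training runs")  # a distribution over outcomes, not a single verdict
```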
Dor Bitan, Ran Canetti, Shafi Goldwasser, Rebecca Wexler
The use of hidden investigative software to collect evidence of crimes presents courts with a recurring dilemma: On the one hand, there is often clear public interest in keeping the software hidden to preserve its effectiveness in fighting crimes. On the other hand, criminal defendants have rights to inspect and challenge the full evidence against them, including law enforcement’s investigative methods. In fact, in the U.S. adversarial legal system, the defendant’s rights to scrutinize the government’s tools are crucial to the truth-seeking process and to keeping law enforcement conduct lawful and constitutional. Presently, courts balance these conflicting interests on a case-by-case basis through evidentiary privilege law, often voicing their frustration with the challenging dilemma they face. We demonstrate how judicious use of a sophisticated cryptographic tool called Zero Knowledge Proofs (ZKPs) could help to mitigate this dilemma: Based on actual court cases where evidence was collected using a modified version of peer-to-peer software, we demonstrate how law enforcement could, in these cases, augment their investigative software with a ZKP-based mechanism that would allow them to later provide full responses to challenges made by a defense expert — and allow a defense expert to independently verify law enforcement claims — while keeping the software hidden. We demonstrate the technical feasibility of our mechanism via a proof-of-concept implementation. We also propose legal analysis that justifies its use, discusses its merits, and considers the legal implications that the very existence of such a mechanism might have, even in cases where it has not been used. Our proof-of-concept may also extend to other verification dilemmas in the legal landscape.
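The paper’s proof-of-concept is not reproduced here. As a minimal illustration of the underlying primitive, the sketch below implements a toy Schnorr-style zero-knowledge proof of knowledge of a discrete logarithm, made non-interactive with the Fiat-Shamir heuristic. The group parameters are deliberately tiny and insecure; the point is only to show how a claim about a secret can be verified without the secret ever being disclosed.

```python
import hashlib
import random

# Toy parameters, far too small for real use: g generates the order-q subgroup of Z_p*.
p, q, g = 2039, 1019, 4

def fiat_shamir(*values):
    """Derive the challenge by hashing the transcript (Fiat-Shamir heuristic)."""
    digest = hashlib.sha256("|".join(map(str, values)).encode()).hexdigest()
    return int(digest, 16) % q

def prove(x):
    """Prove knowledge of x with y = g^x mod p, without revealing x."""
    y = pow(g, x, p)
    r = random.randrange(1, q)
    t = pow(g, r, p)              # prover's commitment
    c = fiat_shamir(g, y, t)      # challenge
    s = (r + c * x) % q           # response
    return y, (t, c, s)

def verify(y, proof):
    """Check the proof against the public value y alone."""
    t, c, s = proof
    return c == fiat_shamir(g, y, t) and pow(g, s, p) == (t * pow(y, c, p)) % p

secret = 777                       # known only to the prover
public, proof = prove(secret)
print("proof verifies:", verify(public, proof))   # True, yet the secret is never disclosed
```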
Aloni Cohen, Sarah Scheffler, Mayank Varia
If a court knows that a respondent knows the password to a device, can the court compel the respondent to enter that password into the device? In this work, we propose a new approach to the foregone conclusion doctrine from Fisher v. U.S. that governs the answer to this question. The Holy Grail of this line of work would be a framework for reasoning about whether the testimony implicit in any action is already known to the government. In this paper we attempt something narrower. We introduce a framework for specifying actions for which all implicit testimony is, constructively, a foregone conclusion. Our approach is centered around placing the burden of proof on the government to demonstrate that it is not “rely[ing] on the truthtelling” of the respondent. Building on original legal analysis and using precise computer science formalisms, we propose demonstrability as a new central concept for describing compelled acts. We additionally provide a language for expressing whether a compelled action meaningfully requires the respondent to perform in a manner that is ‘as good as’ the government’s desired goal. Then, we apply our definitions to analyze the compellability of several cryptographic primitives including decryption, multifactor authentication, commitment schemes, and hash functions. In particular, our framework reaches a novel conclusion about compelled decryption in the setting that the encryption scheme is deniable: the government can compel but the respondent is free to use any password of her choice.
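Commitment schemes are among the primitives the authors analyze. The hash-based sketch below is an illustration rather than the paper’s construction: it shows how a verifier can check an opened commitment mechanically, without relying on the committer’s truthtelling, since any mismatch between the opening and the earlier commitment is detected.

```python
import hashlib
import os

def commit(message: bytes):
    """Hash-based commitment: the random nonce hides the message, the hash binds it."""
    nonce = os.urandom(16)
    digest = hashlib.sha256(nonce + message).hexdigest()
    return digest, nonce          # publish the digest now; keep nonce and message secret

def verify_opening(digest: str, nonce: bytes, message: bytes) -> bool:
    """Anyone can check the opening without trusting the committer."""
    return hashlib.sha256(nonce + message).hexdigest() == digest

digest, nonce = commit(b"statement made at time of commitment")
print(verify_opening(digest, nonce, b"statement made at time of commitment"))  # True
print(verify_opening(digest, nonce, b"a different statement"))                 # False
```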
Sarah Scheffler, Eran Tromer, Mayank Varia
A central notion in U.S. copyright law is judging the substantial similarity between an original and an (allegedly) derived work. Capturing this notion has proven elusive, and the many approaches offered by case law and legal scholarship are often ill-defined, contradictory, or internally-inconsistent. This work suggests that key parts of the substantial-similarity puzzle are amenable to modeling inspired by theoretical computer science. Our proposed framework quantitatively evaluates how much “novelty” is needed to produce the derived work with access to the original work, versus reproducing it without access to the copyrighted elements of the original work. “Novelty” is captured by a computational notion of description length, in the spirit of Kolmogorov-Levin complexity, which is robust to mechanical transformations and availability of contextual information. This results in an actionable framework that could be used by courts as an aid for deciding substantial similarity. We evaluate it on several pivotal cases in copyright law and observe that the results are consistent with the rulings, and are philosophically aligned with the abstraction-filtration-comparison test of Altai.
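Kolmogorov-style description length is uncomputable, so any practical instantiation must use a proxy. The sketch below is not the authors’ framework; it uses off-the-shelf compression (zlib) as a crude, computable stand-in to contrast how much extra description an allegedly derived text needs when the original is available versus when it is produced from scratch. The texts are invented examples.

```python
import zlib

def clen(data: bytes) -> int:
    """Compressed length as a crude, computable proxy for description length."""
    return len(zlib.compress(data, 9))

def conditional_novelty(derived: bytes, original: bytes) -> int:
    """Approximate extra description needed for `derived` once `original` is available."""
    return clen(original + derived) - clen(original)

original = b"It was the best of times, it was the worst of times, " * 20
close_copy = original.replace(b"best", b"finest")            # near-verbatim reuse
independent = b"An unrelated account of a voyage across the sea. " * 20

print("novelty given the original, close copy: ", conditional_novelty(close_copy, original))
print("novelty given the original, independent:", conditional_novelty(independent, original))
print("from-scratch description of close copy: ", clen(close_copy))
```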
Azer Bestavros, Stacey Dogan, Paul Ohm, Andrew Sellars
Many pressing societal questions can be answered only by bringing experts from different disciplines together. Questions around misinformation and disinformation, platform power, surveillance capitalism, information privacy, and algorithmic bias, among many others, reside at the intersection of computer science and law. We need to develop institutions that bring computer scientists and legal scholars together to work on issues like these, and to train new innovators, thought leaders, counselors, and policymakers with hybrid training in both disciplines. In universities, the disciplines of Computer Science (CS) and Law are separated by many wide chasms. Differences in standards, language, methods, and culture impede professors and other academic researchers who want to collaborate with colleagues on the other side of this divide. Universities place CS and Law in different schools, on different campuses, on different calendars, etc. Researchers in the two disciplines face differing incentives and reward structures for publishing, teaching, funding, and service.
Julissa Milligan Walsh, Mayank Varia, Aloni Cohen, Andrew Sellars, Azer Bestavros
This work examines privacy laws and regulations that limit disclosure of personal data, and explores whether and how these restrictions apply when participants use cryptographically secure multi-party computation (MPC). By protecting data during use, MPC offers the promise of conducting data science in a way that (in some use cases) meets or even exceeds most people’s conceptions of data privacy. With MPC, it is possible to correlate individual records across multiple datasets without revealing the underlying records, to conduct aggregate analysis across datasets which parties are otherwise unwilling to share for competitive reasons, and to analyze aggregate statistics across datasets which no individual party may lawfully hold. However, most adoptions of MPC to date involve data that is not subject to privacy protection under the law. We posit that a major impediment to the adoption of MPC – on the data that society has deemed most worthy of protection – is the difficulty of mapping this new technology onto the design principles of data privacy laws. While a computer scientist might reasonably believe that transforming any data analysis into its privacy-protective variant using MPC is a clear win, we show in this work that the technological guarantees of MPC do not directly imply compliance with privacy laws. Specifically, a lawyer will likely want to ask several important questions about the pre-conditions that are necessary for MPC to succeed, the risk that data might inadvertently or maliciously be disclosed to someone other than the output party, and what recourse to take if this bad event occurs. We have two goals for this work: explaining why the privacy law questions are nuanced and that the lawyer is correct to proceed cautiously, and providing a framework that lawyers can use to reason systematically about whether and how MPC implicates data privacy laws in the context of a specific use case. Our framework revolves around three questions: a definitional question on whether the encodings still constitute ‘personal data,’ a process question about whether the act of executing MPC constitutes a data disclosure event, and a liability question about what happens if something goes wrong. We conclude by providing advice to regulators and suggestions to early adopters to spur uptake of MPC. It is our hope that this work provides the first step toward a methodology that organizations can use when contemplating the use of MPC.
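To give a concrete sense of what “protecting data during use” can mean, the sketch below shows additive secret sharing, one of the simplest MPC building blocks: each input is split into random-looking shares, the computing parties sum shares locally, and only the recombined result reveals the aggregate. The three-hospital scenario and the counts are invented for illustration and are not drawn from the deployments the paper discusses.

```python
import random

PRIME = 2**61 - 1   # shares are taken modulo a prime so each one looks uniformly random

def share(secret: int, n_parties: int):
    """Split `secret` into n additive shares; any n-1 shares reveal nothing about it."""
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

# Three hospitals each hold one patient count that they may not disclose individually.
secrets = [120, 75, 310]
n = len(secrets)

all_shares = [share(s, n) for s in secrets]                          # one sharing per input
shares_per_party = [[sh[party] for sh in all_shares] for party in range(n)]

# Each computing party sums the shares it holds; locally these sums look like noise.
partial_sums = [sum(col) % PRIME for col in shares_per_party]

# Only recombining the partial results reveals the aggregate, never any single input.
print("joint total:", sum(partial_sums) % PRIME)   # 505
```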
Jinshuo Dong, Jason Hartline, Aravindan Vijayaraghavan
We consider multi-party protocols for classification that are motivated by applications such as e-discovery in court proceedings. We identify a protocol that guarantees that the requesting party receives all responsive documents and the sending party discloses the minimal amount of non-responsive documents necessary to prove that all responsive documents have been received. This protocol can be embedded in a machine learning framework that enables automated labeling of points and the resulting multi-party protocol is equivalent to the standard one-party classification problem (if the one-party classification problem satisfies a natural independence-of-irrelevant-alternatives property). Our formal guarantees focus on the case where there is a linear classifier that correctly partitions the documents.
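The paper’s protocol and its guarantees are richer than what fits here. As a rough one-dimensional intuition only, the toy sketch below scores documents with an agreed linear classifier and discloses, beyond the responsive set, just the highest-scoring non-responsive document, the minimal extra disclosure that pins down where the decision boundary falls. Document names, scores, and the threshold are hypothetical.

```python
# Documents scored by an agreed linear classifier; responsive means score >= threshold.
docs = {"memo_1": 0.92, "memo_2": 0.81, "memo_3": 0.55, "memo_4": 0.40, "memo_5": 0.10}
threshold = 0.5

responsive = {d for d, s in docs.items() if s >= threshold}
non_responsive = set(docs) - responsive

# Minimal extra disclosure in one dimension: the non-responsive document whose score
# is closest to (but below) the threshold, which pins down the decision boundary.
boundary_witness = max(non_responsive, key=lambda d: docs[d])

produced = responsive | {boundary_witness}
print("produced to requesting party:", sorted(produced))
print("withheld non-responsive documents:", sorted(set(docs) - produced))
```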
Aniket Kesari
Both law and computer science are concerned with developing frameworks for protecting privacy and ensuring fairness. Both fields often consider these two values separately and develop legal doctrines and machine learning metrics in isolation from one another. Yet, privacy and fairness values can conflict, especially when considered alongside the accuracy of an algorithm. The computer science literature often treats this problem as an “impossibility theorem” – we can have privacy or fairness but not both. Legal doctrine is similarly constrained by a focus on the inputs to a decision – did the decisionmaker intend to use information about protected attributes? Despite these challenges, there is a way forward. The law has integrated economic frameworks to consider tradeoffs in other domains, and a similar approach can clarify policymakers’ thinking around balancing accuracy, privacy, and fairness. This piece illustrates this idea by using a law & economics lens to formalize the notion of a Privacy-Fairness-Accuracy frontier, and demonstrating this framework on a consumer lending dataset. An open-source Python software library and GUI will be made available.
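A minimal sketch of the frontier idea, with made-up scores rather than results from the lending dataset or the authors’ library: each candidate configuration is scored on accuracy, privacy, and fairness (higher is better), and the Pareto-efficient configurations form the frontier among which policymakers must trade off.

```python
# Invented scores for candidate model configurations; higher is better on every axis.
candidates = {
    "no_privacy_no_fairness": {"accuracy": 0.91, "privacy": 0.10, "fairness": 0.60},
    "differential_privacy":   {"accuracy": 0.84, "privacy": 0.80, "fairness": 0.62},
    "fairness_constrained":   {"accuracy": 0.86, "privacy": 0.10, "fairness": 0.90},
    "private_and_fair":       {"accuracy": 0.78, "privacy": 0.80, "fairness": 0.88},
    "dominated_config":       {"accuracy": 0.75, "privacy": 0.70, "fairness": 0.61},
}

def dominates(a, b):
    """a dominates b if it is at least as good on every axis and strictly better on one."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)

frontier = [name for name, score in candidates.items()
            if not any(dominates(other, score)
                       for other_name, other in candidates.items() if other_name != name)]
print("Pareto frontier:", frontier)   # every configuration except the dominated one
```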
Peter Henderson, Ben Chugg, Brandon Anderson, Daniel E. Ho
We explore the promises and challenges of employing sequential decision-making algorithms — such as bandits, reinforcement learning, and active learning — in law and public policy. While such algorithms have well-characterized performance in the private sector (e.g., online advertising), the tendency to naively apply algorithms motivated by one domain, often online advertisements, can be called the “advertisement fallacy.” Our main thesis is that law and public policy pose distinct methodological challenges that the machine learning community has not yet addressed. Machine learning will need to address these methodological problems to move “beyond ads.” Public law, for instance, can pose multiple objectives, necessitate batched and delayed feedback, and require systems to learn rational, causal decision-making policies, each of which presents novel questions at the research frontier. We discuss a wide range of potential applications of sequential decision-making algorithms in regulation and governance, including public health, environmental protection, tax administration, occupational safety, and benefits adjudication. We use these examples to highlight research needed to render sequential decision making policy-compliant, adaptable, and effective in the public sector. We also note the potential risks of such deployments and describe how sequential decision systems can also facilitate the discovery of harms. We hope our work inspires more investigation of sequential decision making in law and public policy, which provide unique challenges for machine learning researchers with potential for significant social benefit.
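The sketch below is not drawn from the paper; it is a toy epsilon-greedy bandit in which two outreach letters with unknown compliance rates are assigned in audit cycles, and feedback arrives only at the end of each cycle, illustrating the batched, delayed feedback that distinguishes many public-sector settings from online advertising. The arm names and rates are invented.

```python
import random

random.seed(1)
true_rate = {"letter_A": 0.30, "letter_B": 0.45}   # unknown compliance rates (invented)
counts = {a: 0 for a in true_rate}
successes = {a: 0 for a in true_rate}

def choose(eps=0.1):
    """Epsilon-greedy: explore occasionally, otherwise pick the best arm observed so far."""
    if random.random() < eps or all(c == 0 for c in counts.values()):
        return random.choice(list(true_rate))
    return max(counts, key=lambda a: successes[a] / counts[a] if counts[a] else 0.0)

# Unlike ad clicks, feedback arrives only after each audit cycle (batched and delayed).
for cycle in range(20):
    assignments = [choose() for _ in range(50)]                     # decided up front
    outcomes = [(a, random.random() < true_rate[a]) for a in assignments]
    for arm, complied in outcomes:                                  # learn only at cycle end
        counts[arm] += 1
        successes[arm] += complied

print({a: round(successes[a] / max(counts[a], 1), 3) for a in counts})   # estimated rates
```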
Moon Duchin, Douglas Spencer
In several areas of law and public policy, there have been longstanding dreams that computers can secure decisionmaking that takes only some things into account, while remaining demonstrably neutral to other factors. In 2022, the U.S. Supreme Court will consider mandating race-neutrality in multiple domains, notably in college admissions and redistricting. In this piece, we clarify the real and imagined uses of computers in redistricting, considering their application to optimization approaches and, more recently, to representative sampling. The current pitch to the Court for a race-blind Voting Rights Act is discussed at length.
Jason D. Hartline, Daniel W. Linna, Liren Shan, Alex Tang
This paper looks at a common law legal system as a learning algorithm, models specific features of legal proceedings, and asks whether this system learns efficiently. A particular feature of our model is explicitly viewing various aspects of court proceedings as learning algorithms. From this viewpoint it becomes clear that when the costs of going to court are not commensurate with the benefits, learning fails and inaccurate outcomes persist in cases that settle. Specifically, cases are brought to court at an insufficient rate. On the other hand, when individuals can be compelled or incentivized to bring their cases to court, the system can learn and inaccuracy vanishes over time.
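A toy simulation, not the authors’ model, conveys the intuition: precedent is a running estimate that improves only when disputes are litigated rather than settled in its shadow, so when litigation costs are high relative to the stakes, few cases reach court and the error embedded in settlements persists. All parameters below are invented for illustration.

```python
import random

random.seed(0)

def simulate(court_cost, true_rule=0.8, n_disputes=2000):
    """Precedent is a running mean of court rulings; it is updated only when a
    dispute is litigated rather than settled in the shadow of current precedent."""
    precedent, rulings, errors = 0.2, 0, []
    for _ in range(n_disputes):
        stakes = random.uniform(0, 1)
        # Parties litigate only when the expected correction is worth the cost of court.
        if stakes * abs(true_rule - precedent) > court_cost:
            ruling = true_rule + random.gauss(0, 0.3)        # noisy but unbiased court
            rulings += 1
            precedent += (ruling - precedent) / rulings      # learning step
        errors.append(abs(true_rule - precedent))            # error borne by settlements
    return rulings, sum(errors[-200:]) / 200

for cost in (0.01, 0.2, 0.5):
    litigated, residual = simulate(cost)
    print(f"court cost {cost}: {litigated} cases litigated, late-stage error {residual:.3f}")
```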
Ayelet Gordon-Tapiero, Alexandra Wood, Katrina Ligett
Personalization on digital platforms drives a broad range of harms, including misinformation, manipulation, social polarization, subversion of autonomy, and discrimination. In recent years, policymakers, civil society advocates, and researchers have proposed a wide range of interventions to address these challenges. In this article, we argue that the emerging toolkit reflects an individualistic view of both personal data and data-driven harms that will likely be inadequate to address growing harms in the global data ecosystem. We maintain that interventions must be grounded in an understanding of the fundamentally collective nature of data, wherein platforms leverage complex patterns of behaviors and characteristics observed across a large population to draw inferences and make predictions about individuals. Using the lens of the collective nature of data, we evaluate various approaches to addressing personalization-driven harms currently under consideration. This lens also allows us to frame concrete guidance for future legislation in this space and advocate meaningful transparency that goes far beyond current proposals. We offer a roadmap for what meaningful transparency must constitute: a collective perspective providing a third party with ongoing insight into the information gathered and observed about individuals and how it correlates with any personalized content they receive, across a large, representative population. These insights would enable the third party to understand, identify, quantify, and address cases of personalization-driven harms. We discuss how such transparency can be achieved without sacrificing privacy and provide guidelines for legislation to support the development of this proposal.
Joshua Bloch, Pamela Samuelson
The technical complexity and functionality of computer programs have made it difficult for courts to apply conventional copyright concepts, such as the idea/expression distinction, in the software copyright case law. This has created fertile ground for significant misconceptions. In this paper, we identify fourteen such misconceptions that arose during the lengthy course of the Google v. Oracle litigation. Most of these misconceptions concern application programming interfaces (APIs). We explain why these misconceptions were strategically significant in Oracle’s lawsuit, rebut them, and urge lawyers and computer scientists involved in software copyright litigation to adopt and insist on the use of terminology that is technically sound and unlikely to perpetuate these misconceptions.
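The litigation concerned Java, but the core distinction that the misconceptions tend to blur can be shown in any language. The Python analogue below separates a declaration (the name and parameter list that callers depend on) from the implementing code behind it, and shows that new implementing code can sit behind the same declaration; the function names are hypothetical.

```python
def max_of(a: int, b: int) -> int:           # the declaration: name and parameters callers rely on
    """Return the larger of two integers."""
    # Implementing code: one of many possible bodies behind the same declaration.
    if a >= b:
        return a
    return b

def max_of_alternative(a: int, b: int) -> int:
    # A reimplementation keeps the declaration, so existing calls still work,
    # while the implementing code is written entirely independently.
    return (a + b + abs(a - b)) // 2

print(max_of(3, 7), max_of_alternative(3, 7))   # both print 7
```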
Fabian Burmeister, Mickey Zar, Tilo Böhmann, Niva Elkin-Koren, Christian Kurtz, Wolfgang Schulz
This paper explores the use of an architectural perspective to study complex data ecosystems and to facilitate a normative discourse on such ecosystems. It argues that an architectural perspective is helpful for bridging discursive and methodological gaps between information systems (IS) research and legal studies. Combining architectural and normative perspectives is a novel interdisciplinary research approach that provides a framework for analyzing techno-legal contexts. The merits and challenges of this approach are demonstrated and discussed in this paper using the example of COVID-19 contact tracing apps. We conceptualize our results on three levels of knowledge: the first is the actual knowledge of the exemplary contact tracing app we studied and its ecosystem; the second is knowledge of the architectural meta-model that we used, its benefits and its shortcomings; and the third is knowledge of the interdisciplinary research process of acquiring common knowledge shared by IS scholars and legal experts.
James Grimmelmann
If code is law, then the language of law is a programming language. Lawyers and legal scholars can learn about law by studying programming-language theory, and programming-language tools can be usefully applied to legal problems. This article surveys the history of research into programming languages and law and presents ten promising avenues for future efforts. Its goals are to explain how the combination of programming languages and law is distinctive within the broader field of computer science and law, and to demonstrate with concrete examples the remarkable power of programming-language concepts in this new domain.
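As a small illustration of the kind of artifact this research program studies, the sketch below encodes a purely hypothetical eligibility rule as executable code; once a rule is in this form, programming-language tools such as testing, static analysis, and verification can be brought to bear on it. The statute, threshold, and field names are invented.

```python
from dataclasses import dataclass

@dataclass
class Applicant:
    age: int
    disabled: bool
    annual_income: int

INCOME_CEILING = 20_000   # invented figure for the hypothetical statute

def eligible(a: Applicant) -> bool:
    """Hypothetical rule: at least 65 years old or disabled, and income under the ceiling."""
    return (a.age >= 65 or a.disabled) and a.annual_income < INCOME_CEILING

# Boundary cases that testing or static analysis tools could surface automatically.
print(eligible(Applicant(age=65, disabled=False, annual_income=19_999)))  # True
print(eligible(Applicant(age=64, disabled=False, annual_income=10_000)))  # False
```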
Ero Balsa, Helen Nissenbaum, Sunoo Park
Privacy technologies support the provision of online services while protecting user privacy. Cryptography lies at the heart of many such technologies, creating remarkable possibilities in terms of functionality while offering robust guarantees of data confidentiality. The cryptography literature and discourse often represent that these technologies eliminate the need to trust service providers, i.e., they enable users to protect their privacy even against untrusted service providers. Despite their apparent promise, privacy technologies have seen limited adoption in practice, and the most successful ones have been implemented by the very service providers these technologies purportedly protect users from. The adoption of privacy technologies by supposedly adversarial service providers highlights a mismatch between traditional models of trust in cryptography and the trust relationships that underlie deployed technologies in practice. Yet this mismatch, while well known to the cryptography and privacy communities, remains relatively poorly documented and examined in the academic literature—let alone broader media. This paper aims to fill that gap. Firstly, we review how the deployment of cryptographic technologies relies on a chain of trust relationships embedded in the modern computing ecosystem, from the development of software to the provision of online services, that is not fully captured by traditional models of trust in cryptography. Secondly, we turn to two case studies—web search and encrypted messaging—to illustrate how, rather than removing trust in service providers, cryptographic privacy technologies shift trust to a broader community of security and privacy experts and others, which in turn enables service providers to implicitly build and reinforce their trust relationship with users. Finally, concluding that the trust models inherent in the traditional cryptographic paradigm elide certain key trust relationships underlying deployed cryptographic systems, we highlight the need for organizational, policy, and legal safeguards to address that mismatch, and suggest some directions for future work.
Johanna Gunawan, Cristiana Santos, Irene Kamara
Internet users are constantly subjected to incessant demands for attention in a noisy digital world. Countless inputs compete for the chance to be clicked, to be seen, and to be interacted with, and they can deploy tactics that take advantage of behavioral psychology to ‘nudge’ users into doing what they want. Some nudges are benign; others deceive, steer, or manipulate users, as the U.S. FTC Commissioner says, “into behavior that is profitable for an online service, but often harmful to [us] or contrary to [our] intent”. These tactics are dark patterns, which are manipulative and deceptive interface designs used at scale in more than ten percent of global shopping websites and more than ninety-five percent of the most popular apps in online services. The literature discusses several types of harms caused by dark patterns, including harms of a material nature, such as financial harms or anticompetitive issues, as well as harms of a non-material nature, such as privacy invasion, time loss, addiction, cognitive burdens, loss of autonomy, and emotional or psychological distress. Through a comprehensive literature review of this scholarship and case law analysis conducted by our interdisciplinary team of HCI and legal scholars, this paper investigates whether harms caused by such dark patterns could give rise to redress for individuals subject to dark pattern practices, using consent interactions and the GDPR consent requirements as a case study.