Compliance Effort: The Timing and Content of Open Source Software Responses to GDPR

Aileen Nielsen (Harvard Law School), Elias Datler (Unaffiliated), Karel Kubicek (ETH Zurich)

Two distinct phenomena - the increasing use and importance of open source software (OSS) and the promulgation of the General Data Protection Regulation (GDPR) - together have created an opportunity to study the details of how law affects code. This work presents a description and analysis of a multi-year data set of GitHub issues covering all contemporaneously public activity on the platform between 2016 and 2022. We identify 40,000 GitHub issues that reference “GDPR”, and we fine-tune a large language model to automate the processing of the natural language text in those issues. With the resulting models, we determine whether issues containing the “GDPR” string are relevant to GDPR compliance and, for each relevant issue, infer which GDPR articles, if any, are tied to the issue’s content.
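A minimal sketch of the kind of relevance classifier the abstract describes, assuming a Hugging Face fine-tuning setup; the model name, label scheme, and the file issues.csv are illustrative stand-ins, not the authors' actual pipeline:

```python
# Sketch: fine-tune a transformer to label GitHub issues as GDPR-relevant or not.
# Model choice, hyperparameters, and data files are hypothetical.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL = "distilbert-base-uncased"  # stand-in for the fine-tuned language model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

# issues.csv (hypothetical): one row per issue, columns "text" (title + body)
# and "label" (1 = relevant to GDPR compliance, 0 = not relevant).
data = load_dataset("csv", data_files="issues.csv")["train"].train_test_split(test_size=0.2)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

data = data.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gdpr-relevance", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
```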

With this automated methodology, we find that a bare majority of issues referencing GDPR are relevant to GDPR compliance, and of those, a bare majority reflects fundamental and highly general compliance issues, such as the basis for processing data. The next largest subpopulations of issues focus on easily observable elements of compliance: notice-and-consent and data access rights. The fourth most common subpopulation, data security issues, appears to be produced and processed differently from other GDPR issues. We also examine the timing of compliance effort, finding that the earliest examples of compliance effort occur only shortly before GDPR’s statutory date of entry into force, and that there is no identifiable effect of GDPR enforcement actions on the volume of GDPR issue filings. We conclude by discussing some policy lessons that can be drawn from the results of our analysis.


No More Trade-Offs. GPT and Fully Informative Privacy Policies

Przemysław Pałka (Jagiellonian University), Marco Lippi (University of Florence), Francesca Lagioia (University of Bologna), Ruta Liepina (University of Bologna), Giovanni Sartor (University of Bologna)

The paper reports the results of an experiment testing to what extent ChatGPT 3.5 and 4 can answer questions about privacy policies written in the new format that we propose. In a world of human-only interpreters, there was a trade-off between the comprehensiveness and the comprehensibility of privacy policies, which led to actual policies not containing enough information for users to learn anything meaningful. Having shown that GPT performs relatively well with the new format, we provide experimental evidence supporting our policy suggestion, namely that the law should require fully comprehensive privacy policies, even if this means they become less concise.

paper: https://arxiv.org/abs/2402.00013
github: https://github.com/ruutaliepina/full-privacy


The impact of online dispute resolution on judicial outcomes in India

Leslie Barrett (Bloomberg), Pranjal Chandra (University of Chicago), Daniel L Chen (Toulouse School of Economics), Viknesh Nagarathinam (Georgetown)

This study assesses the impact of online dispute resolution (ODR) on the efficiency of the Indian judiciary, focusing on Sama's platform. Amidst India's massive case backlogs and the challenges of Covid-19, the introduction of technology, specifically chatbots, in Lok Adalats is analyzed. The research uses a randomized controlled trial (RCT) methodology and empirical specifications to evaluate different ODR features. Results indicate increased user engagement and a maintained resolution rate with chatbot integration. The findings underscore the potential of incorporating procedural justice principles in technological advancements, enhancing user participation, and contributing to the legitimacy and efficacy of the legal system.
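A generic sketch of how a treatment effect in such an RCT might be estimated; the column names (chatbot, engaged, resolved), the file cases.csv, and the specification are hypothetical and are not the paper's actual empirical model:

```python
# Sketch: difference-in-means treatment-effect estimates via OLS with robust SEs.
import pandas as pd
import statsmodels.formula.api as smf

# cases.csv (hypothetical): one row per case, with random assignment to the
# chatbot ODR flow and two outcome indicators.
df = pd.read_csv("cases.csv")  # columns: chatbot (0/1), engaged (0/1), resolved (0/1)

engagement = smf.ols("engaged ~ chatbot", data=df).fit(cov_type="HC1")
resolution = smf.ols("resolved ~ chatbot", data=df).fit(cov_type="HC1")

# The coefficient on `chatbot` estimates the effect of chatbot integration.
print(engagement.summary().tables[1])
print(resolution.summary().tables[1])
```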


Content Sensitivity: Towards a Computational Framework for the Content-Based Test of the First Amendment

Ayelet Gordon-Tapiero (Georgetown), Kobbi Nissim (Georgetown), Paul Ohm (Georgetown) & Muthuramakrishnan Venkitasubramaniam (Georgetown)

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4619075


An Empirical Analysis on the Use and Reporting of National Security Letters

Miro Haller (University of California San Diego), Enze Liu (University of California San Diego), Alex Bellon (University of California San Diego), Andrey Labunets (University of California San Diego), Stefan Savage (University of California San Diego)

National Security Letters (NSLs) are similar to administrative subpoenas and can be issued directly by elements of the executive branch to request sensitive metadata without involving a court or grand jury. Importantly, NSLs authorize the imposition of nondisclosure orders (aka "gag orders") on the receiving party. Controversy about potential abuses of this authority has driven a range of legal and policy discussions. To address these concerns, both the public sector and the private sector have sought to document the usage of NSLs in aggregated form. However, each data source is limited in scope, time, and kind, resulting in little public auditing as interpreting the published raw data is challenging.

In this talk, we discuss our insights from collecting, consolidating, and cleaning all NSL data. We discuss (1) whether the published data allows the public to assess the use of NSLs and (2) the challenges and problems with the data.

We show that longitudinal trends in the usage of NSLs can be observed. For instance, we find a significant increase in NSL requests for non-US persons, and the policy reforms to decrease the mandated nondisclosure period appear to be effective. The observed trends suggest that the current transparency mechanisms are viable safeguards against the excessive use of NSLs. However, aggregating and normalizing the data requires manual reviewing, parsing, and validating, and we even find inconsistencies within and across official data sources. Overall, the laborious data collection process hinders external and internal auditing efforts and demonstrates the need for a unified and more usable data set on NSLs.
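An illustrative sketch of the kind of cross-source consistency check this describes; the file names, column names, and sources are hypothetical placeholders, not the paper's actual data or pipeline:

```python
# Sketch: flag years where two official report series disagree on NSL counts.
import pandas as pd

doj = pd.read_csv("doj_reports.csv")    # hypothetical columns: year, nsl_requests
odni = pd.read_csv("odni_reports.csv")  # hypothetical columns: year, nsl_requests

merged = doj.merge(odni, on="year", suffixes=("_doj", "_odni"))
merged["diff"] = (merged["nsl_requests_doj"] - merged["nsl_requests_odni"]).abs()

# Any nonzero difference indicates an inconsistency that needs manual review.
inconsistent = merged[merged["diff"] > 0]
print(inconsistent[["year", "nsl_requests_doj", "nsl_requests_odni"]])
```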

We publish our unified data and make recommendations for improving the usability of NSL data in the future to facilitate auditing.

The paper is accessible here: https://arxiv.org/abs/2403.02768

Our unified data set on NSLs: https://github.com/ucsdsysnet/nsl-empirical-analysis


Regulatory CI: Adaptively Regulating Privacy as Contextual Integrity

Sebastian Benthall (NYU), Ido Sivan-Sevilla (UMD)

The practice of regulating privacy, largely based on theories of privacy as control or secrecy, has come under scrutiny. The notice and consent paradigm has proven ineffective in the face of opaque technologies and managerialist reactions by the market. We propose an alternative regulatory model for privacy centered on the definition of privacy as Contextual Integrity (CI). Regulating according to CI involves operationalizing the social goods at stake and modeling how appropriate information flow promotes those goods. The social scientific modeling process is informed, deployed, and evaluated through agile regulatory processes – adaptive & responsive regulation – in three learning cycles: (a) the assessment of new risks, (b) real-time monitoring of existing threat actors, and (c) validity assessment of existing regulatory instruments. At the core of our proposal is Regulatory CI, a formalization of Contextual Integrity in which information flows are modeled and audited using Bayesian networks and causal game theory. We use the Cambridge Analytica scandal to demonstrate existing gaps in current regulatory paradigms and the novelty of our proposal.
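A toy sketch of the Bayesian-network intuition behind Regulatory CI: a regulator models whether an information flow is contextually appropriate and infers, from observed harm to the contextual good, how likely the flow was inappropriate. The variables, probabilities, and two-node structure are illustrative only; the paper's formalization (with causal game theory and auditing cycles) is richer:

```python
# Sketch: Bayes' rule over a two-node network (appropriateness -> harm).

# P(appropriate): prior that a given flow respects contextual norms.
p_appropriate = 0.7

# P(harm | appropriate) and P(harm | inappropriate): chance the flow
# undermines the contextual social good (e.g., electoral integrity).
p_harm = {True: 0.05, False: 0.60}

# Marginal probability of harm, enumerating over the single parent node.
p_harm_marginal = sum(
    (p_appropriate if a else 1 - p_appropriate) * p_harm[a] for a in (True, False)
)

# Posterior that the flow was inappropriate given observed harm -- the kind
# of inference a regulator's audit would run.
p_inappropriate_given_harm = ((1 - p_appropriate) * p_harm[False]) / p_harm_marginal

print(f"P(harm) = {p_harm_marginal:.3f}")
print(f"P(inappropriate | harm) = {p_inappropriate_given_harm:.3f}")
```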


Giving Voice to the Silenced: Secure Reporting of Sexual Misconduct NDAs

Peter K Chan (Northwestern University), Alyson Carrel (Northwestern Pritzker School of Law), Mayank Varia (Boston University), Xiao Wang (Northwestern University)

Sexual misconduct allegations are often settled, and such settlements often include non-disclosure agreements (NDAs). The widespread use of NDAs creates a culture of secrecy surrounding such misconduct, making it difficult for society to develop cohesive awareness of, or agreement about, the magnitude of the problem of sexual misconduct. Existing proposals to counter such secrecy, including banning NDAs in sexual misconduct settlements or requiring information deposits in an escrow, have shortcomings.

This work proposes a novel policy option that can measure the overall prevalence of misconduct without intruding into the details of individual cases. Our solution interleaves the design of a new statute and a new cryptographic protocol. Our statutory framework establishes a sexual misconduct settlement registry, incentivizes participation by making it a condition for NDA enforcement, and protects participants from being accused of breaching their NDAs. Complementing this statute, we use secure multi-party computation to reveal statistics about sexual misconduct settlements, along with certificates of participation that we use in our incentive mechanism.
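A minimal sketch of the aggregation idea behind the secure computation: each settlement report is split into additive secret shares so that no single registry server learns any individual report, yet the servers can jointly publish the total count. This toy illustration omits the paper's actual protocol details (certificates of participation, robustness against misbehaving parties, and so on); the modulus and server count are arbitrary:

```python
# Sketch: additive secret sharing of per-reporter bits; only the aggregate is revealed.
import secrets

MOD = 2**61 - 1      # arithmetic modulus for the shares
NUM_SERVERS = 3

def share(value: int) -> list[int]:
    """Split `value` into NUM_SERVERS additive shares modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(NUM_SERVERS - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

# Each reporter submits 1 if they settled a misconduct claim under an NDA.
reports = [1, 0, 1, 1, 0]

# Server i accumulates only its own share of every report.
server_totals = [0] * NUM_SERVERS
for r in reports:
    for i, s in enumerate(share(r)):
        server_totals[i] = (server_totals[i] + s) % MOD

# Only the sum across all servers reveals the aggregate statistic.
print("total settlements reported:", sum(server_totals) % MOD)
```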


Adi Haviv (Tel Aviv University), Uri Hacohen (Tel Aviv University), Shahar Sarfaty (Tel Aviv University), Bruria Friedman (Tel Aviv University), Niva Elkin-Koren (Tel Aviv University), Roi Livni (Tel Aviv University), Amit H. Bermano (Tel Aviv University)

The advent of Generative Artificial Intelligence (GenAI) models, including GitHub Copilot, OpenAI GPT, Stable Diffusion, Midjourney, and DeviantArt, has ushered in a creative revolution, enabling non-professional users to produce high-quality content across various domains. This transformative technology has flooded markets with “cheaply created” synthetic content and sparked legal disputes alleging copyright infringement of human-generated content. To address these challenges, this article introduces a novel approach that harnesses the learning capacity of GenAI models to inform copyright legal analysis. We demonstrate our approach on the GPT-2 language model (LM) and the Stable Diffusion text-to-image model.

Copyright law distinguishes between original expressions and generic ones (Scènes à faire), protecting the former and permitting reproduction of the latter. However, this distinction has historically been challenging to make consistently, leading to over-protection of copyrighted works. GenAI offers an unprecedented opportunity to enhance this legal analysis by revealing shared patterns in preexisting works.

This paper proposes a data-driven approach to identifying the genericity of works created by GenAI. This approach, drawing on interdisciplinary research in computer science and law, employs “data-driven bias” to assess the genericity of expressive compositions, aiding in the determination of copyright scope. We utilize the capabilities of GenAI to identify and prioritize expressive elements and to rank them according to their frequency in the model's dataset, which could allow expressive genericity to be assessed at scale.
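One possible, hedged operationalization of such a score: treat a passage's average token log-likelihood under GPT-2 as a proxy for how predictable, and hence generic, its expression is. This is a sketch of one way to approximate the idea and is not the scoring method used in the paper; the example strings are arbitrary:

```python
# Sketch: higher score = more predictable under the model = plausibly more generic.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def genericity_score(text: str) -> float:
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item()                     # negate: larger is more generic

print(genericity_score("Once upon a time, in a land far, far away"))
print(genericity_score("Quasar-flavored umbrellas negotiate with sleepy algebra"))
```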

The potential implications of measuring expressive genericity for copyright law are profound. Such scoring could assist courts when determining copyright scope during litigation. It could also inform the registration practices of Copyright Offices, allowing registration of only highly original synthetic works. Lastly, it could help copyright owners to signal the value of their works and facilitate fairer licensing deals. More generally, this approach offers valuable insights to policymakers grappling with adapting copyright law to the challenges posed by the era of GenAI.


Guiding large language models to write legal treatises

Colin Doyle (Loyola Law School, Los Angeles)

This talk introduces an open-source architecture for guiding large language models to produce legal treatises based upon a language model’s understanding of a set of legal cases. For this application, a user provides a set of cases and a description of the legal issue to be addressed, and the application manages a sequence of calls to a large language model instructing it to read cases, write notes, and use those notes to construct a legal treatise — all without a human in the loop.
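A minimal sketch of that sequence-of-calls architecture: read each case, take notes, then synthesize the notes into a treatise section. The function `complete` stands in for any LLM chat or completion client, and the prompts and structure are illustrative, not the project's actual prompts or code:

```python
# Sketch: two-stage pipeline (per-case note-taking, then synthesis), no human in the loop.
from typing import Callable

def write_treatise(cases: list[str], issue: str,
                   complete: Callable[[str], str]) -> str:
    notes = []
    for case in cases:
        notes.append(complete(
            "You are a legal research assistant. Read the following case and "
            f"write concise notes on how it bears on this issue: {issue}\n\n{case}"
        ))
    return complete(
        "Using only the case notes below, write a treatise section on the "
        f"issue: {issue}. Cite the cases the notes come from.\n\n" + "\n\n".join(notes)
    )

# Usage: pass in any LLM client wrapped as a string-to-string function,
# e.g. write_treatise(cases, issue, complete=my_llm_call)  # my_llm_call is hypothetical
```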

This project is a prototype and a resource for open-source research into legal reasoning with large language models. Large language models seem poised to take over many legal research and writing tasks, but we still lack a clear picture of their capabilities and limits. We need better insight into legal reasoning with large language models. This project will be released as open-source code so that researchers can glean those insights by remixing components and running experiments.

Opportunities for future research include evaluating the performance of large language models on difficult legal reasoning tasks, exploring ways to improve LLM performance with better architecture and prompt engineering, and using the outputs of this process to critique human performance with similar legal reasoning and writing tasks.


Value Sensitive Design Towards Fair AI Recruitment Tools

Alexandre Puttick (Bern University of Applied Sciences), Carlotta Rigotti (University of Leiden), Mascha Kurpicz-Briki (Bern University of Applied Sciences), Eduard Fosch-Villaronga (University of Leiden)

While AI applications can potentially enhance efficiency and effectiveness in Human Resource (HR) processes, evidence suggests that they can perpetuate discrimination and harm individual dignity and well-being. To avoid these adverse consequences, we propose a proof of concept that integrates value sensitive design principles into AI-driven recruitment and selection processes via prompt engineering. To this end, we conducted several experiments toward an AI recruitment tool guided by fair and trustworthy AI principles. In particular, we lay down the foundations for a decision-support tool that gives HR recruiters a clear choice over the normative stances taken and provides relevant information to the user given that choice. Each model we test demonstrates different priorities in balancing fairness and candidate suitability for the job, which are made explicit in the model's prompt and in the generated justifications. We choose evaluation metrics corresponding to possible value choices, in contrast to the typical case for AI applications, where developers consciously or unconsciously embed a particular normative stance in the tools they build. By shifting the choice of normative stance to the user, with several values provided by EU law, we expect the end user (an HR professional) to be better informed about their specific needs and more directly accountable for the normative stances supported by their AI tool.
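A minimal sketch of the prompt-engineering idea: the recruiter explicitly selects a normative stance, and that choice is injected into the model prompt together with a request to justify the ranking. The stance texts, names, and wording are illustrative assumptions, not the experiments' actual prompts:

```python
# Sketch: build a ranking prompt parameterized by a user-chosen normative stance.
STANCES = {
    "equal_opportunity": "Ignore gender, age, nationality, and career gaps; "
                         "rank candidates only on skills relevant to the job.",
    "affirmative_action": "Where candidates are comparably qualified, prefer "
                          "candidates from groups underrepresented in this role.",
}

def build_prompt(job_ad: str, cvs: list[str], stance: str) -> str:
    return (
        f"Normative stance (chosen by the recruiter): {STANCES[stance]}\n\n"
        f"Job advertisement:\n{job_ad}\n\n"
        "Candidate CVs:\n" + "\n---\n".join(cvs) + "\n\n"
        "Rank the candidates and, for each, justify the ranking with explicit "
        "reference to the stance above."
    )

# Usage: prompt = build_prompt(job_ad_text, cv_texts, stance="equal_opportunity")
```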