Machine Learning & EU Data Sharing Practices

Stanford - Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust and IPR Developments, Stanford University, Issue No. 1/2020

New multidisciplinary research article: ‘Machine Learning & EU Data Sharing Practices’ (PDF, Stanford).

By Mauritz Kop

Citation: Kop, Mauritz, Machine Learning & EU Data Sharing Practices (March 3, 2020). Stanford - Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust and IPR Developments, Stanford University, Issue No. 1/2020. Available at SSRN: https://ssrn.com/abstract=3409712

Download the article here: Kop_Machine Learning and EU Data Sharing Practices-Stanford University

In short, the article connects the dots between intellectual property (IP) in data, data ownership and data protection (GDPR and FFD) in an easy-to-understand manner. It also provides AI and data policy and regulatory recommendations to the EU legislature.

Machine learning & data science can help accelerate many aspects of the development of drugs, antibody prophylaxis, serology tests and vaccines.

New multidisciplinary Stanford Law School research article: ‘Machine Learning & EU Data Sharing Practices’.

Abstract

Data sharing is a prerequisite for a successful Transatlantic Artificial Intelligence (AI) ecosystem. Hand-labelled, annotated training datasets (corpora) are a sine qua non for supervised machine learning. But what about intellectual property (IP) and data protection?

Intellectual Property

Data that represent IP subject matter are protected by IP rights. In Europe, augmented machine learning training datasets are protected by either database copyright or a sui generis database right. Unlicensed (or uncleared) use of machine learning input data can therefore result in an avalanche of copyright (reproduction right) and database right (extraction right) infringements.

TDM Exceptions, Fair learning and Machine Legibility

The article offers three solutions that address the copyright clearance problem for input (training) data and create breathing room for AI developers: a broadly scoped, mandatory text and data mining (TDM) exception in Europe covering all types of data (including news media), the Fair Learning principle in the United States, and an online clearinghouse for machine learning training datasets. A right to machine legibility, which would drastically improve access to data, would greatly benefit the growth of an AI ecosystem.

An absolute data property right is not advisable

Introducing an absolute data property right or a (neighbouring) data producer right for annotated machine learning training datasets or other classes of data is not advisable. Legislative gaps concerning data ownership can be remedied by contracts. In most industrial sectors, a sui generis system of protection for AI-generated creations and inventions is not necessary, since machines do not need incentives to create or invent. Where incentives are needed, IP alternatives exist.

Public domain

Autonomously generated non-personal data should fall into the public domain. It should be open data, excluded from protection under the Database Directive (DD), the Directive on Copyright in the Digital Single Market (CDSM) and the Trade Secrets Directive (TSD).

Shift towards trade secrets, legal reform of the TSD, CDSM & DD

Just as legal uncertainty about the patentability of AI systems is causing a shift towards trade secrets, legal uncertainty about the protection and exclusive use of machine-generated databases is causing a similar shift. This general reliance on trade secrets to preserve competitive advantages creates a disincentive to disclose information and impedes data sharing. In an era of exponential innovation, it is urgent that the EU Commission reform the TSD, the CDSM and the DD with the data-driven economy in mind.

Freedom of expression and information, competition law

Informed IP policy seeks to establish a regime that balances underprotection and overprotection of IP rights in each economic sector. Freedom of expression and freedom of information are core democratic values that should be internalized in our IP framework. The article argues that strengthening and articulating competition law is more advisable than extending IP rights.

Data protection and privacy

More and more datasets consist of both personal data and non-personal, machine-generated data. Both the General Data Protection Regulation (GDPR) and the Regulation on the free flow of non-personal data (FFD) apply to these ‘mixed datasets’. Under these two Regulations, data can move freely within the European Union. The article contends that, in some cases, the GDPR creates market barriers for early-stage AI startups (SMEs). At the same time, the GDPR offers European SMEs some important advantages, since it has become the international data protection standard.

Technical dimension of machine learning training data

Besides the legal dimensions, the article describes the technical dimensions of data in machine learning. Most AI models are trained on centralized data, collected in one place. Federated learning, in contrast, trains algorithms by bringing the code to the data instead of bringing the data to the code: each participant trains the model locally, and only model updates are exchanged. Sharing the raw data is therefore not required.
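To make the "code to the data" idea concrete, here is a minimal sketch of federated averaging (FedAvg), one common federated learning scheme. The article does not prescribe this particular algorithm. The sketch assumes a toy one-dimensional linear model and hypothetical client datasets. The names `local_update` and `federated_averaging`, and all the data, are illustrative; they are not part of any library. Each client runs a few gradient-descent steps on its own records, and the server only averages the returned parameters, weighted by dataset size.

```python
"""Minimal federated averaging (FedAvg-style) sketch.

Hypothetical setup: each client holds private (x, y) pairs for a 1-D linear
model y = w * x + b. Clients train locally and return only (w, b);
the raw data never leaves the client.
"""
from typing import List, Tuple

Dataset = List[Tuple[float, float]]  # hypothetical (x, y) pairs held by one client
Params = Tuple[float, float]          # model parameters (w, b)


def local_update(params: Params, data: Dataset, lr: float = 0.05, epochs: int = 20) -> Params:
    """Run plain gradient descent on mean squared error using only this client's data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_averaging(clients: List[Dataset], rounds: int = 50) -> Params:
    """Server loop: send the global model out, then average the returned parameters.

    The average is weighted by each client's dataset size, as in FedAvg.
    """
    global_params: Params = (0.0, 0.0)
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # "Bring the code to the data": each client trains on its own records.
        updates = [(local_update(global_params, d), len(d)) for d in clients]
        # Only parameters reach the server; no client shares its raw data.
        w = sum(p[0] * n for p, n in updates) / total
        b = sum(p[1] * n for p, n in updates) / total
        global_params = (w, b)
    return global_params


if __name__ == "__main__":
    # Hypothetical example: three clients whose private data follow y = 2x + 1.
    client_a = [(0.0, 1.0), (1.0, 3.0)]
    client_b = [(2.0, 5.0), (3.0, 7.0)]
    client_c = [(1.5, 4.0)]
    w, b = federated_averaging([client_a, client_b, client_c])
    print(f"w={w:.2f}, b={b:.2f}")  # prints: w=2.00, b=1.00
```

All three hypothetical clients hold points on the same line, y = 2x + 1. The averaged model therefore converges to w = 2, b = 1, even though no client ever reveals its records to the server or to the other clients.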

Future AI & Data Regulation

Both data sharing practices and AI regulation are high on the EU Commission’s agenda. The article discusses, inter alia, the Commission’s ‘White Paper on Artificial Intelligence - A European approach to excellence and trust’ and the ‘EU Data Strategy’.

Open data

Important European initiatives in the field of open data and data sharing include the Support Centre for Data Sharing (focused on data sharing practices), the European Data Portal (EDP; data pooling per industry, i.e. sharing open datasets from the public sector), the EU Open Data Portal (ODP; sharing data from European institutions) and the EU Blockchain Observatory and Forum.

Policy strategy and Pareto optimum

Transformative technology is not a zero-sum game but a win-win strategy that creates new value. When developing inclusive policies for transformative technology, the goal should be a Pareto optimum and, where possible, a Pareto improvement that increases overall prosperity.

Modalities of AI-regulation

Society should actively shape technology for good. The alternative is that other societies, with different social norms and democratic standards, impose their values on us through the design of their technology. With built-in public values, including Privacy by Design that safeguards data protection, data security and data access rights, the federated learning model is consistent with Human-Centered AI and the European Trustworthy AI paradigm.

WordPress link: https://ttlfnews.wordpress.com/2020/03/24/machine-learning-eu-data-sharing-practices/

Blog about Artificial Intelligence, Quantum, Deep Learning, Blockchain and Big Data Law