Machine Learning & EU Data Sharing Practices
Stanford - Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust and IPR Developments, Stanford University, Issue No. 1/2020
New multidisciplinary research article: ‘Machine Learning & EU Data Sharing Practices’ (PDF, Stanford).
By Mauritz Kop
Citation: Kop, Mauritz, Machine Learning & EU Data Sharing Practices (March 3, 2020). Stanford - Vienna Transatlantic Technology Law Forum, Transatlantic Antitrust and IPR Developments, Stanford University, Issue No. 1/2020. Available at SSRN: https://ssrn.com/abstract=3409712
In short, the article connects the dots between intellectual property (IP) on data, data ownership and data protection (GDPR and FFD) in an easy-to-understand manner. It also provides AI and Data policy and regulatory recommendations to the EU legislature.
Machine learning and data science can help accelerate many aspects of the development of drugs, antibody prophylaxis, serology tests and vaccines.
Abstract
Data sharing is a prerequisite for a successful Transatlantic Artificial Intelligence (AI) ecosystem. Hand-labelled, annotated training datasets (corpora) are a sine qua non for supervised machine learning. But what about intellectual property (IP) and data protection?
Intellectual Property
Data that represent IP subject matter are protected by IP rights. Augmented machine learning training datasets are awarded with either a database right or a sui generis database right in Europe. Unlicensed (or uncleared) use of machine learning input data potentially results in an avalanche of copyright (reproduction right) and database right (extraction right) infringements.
TDM Exceptions, Fair learning and Machine Legibility
The article offers three solutions that address the input (training) data copyright clearance problem and create breathing room for AI developers: the implementation of a broadly scoped, mandatory TDM exception covering all types of data (including news media) in Europe, the Fair Learning principle in the United States, and the establishment of an online clearinghouse for machine learning training datasets. A right to machine legibility that drastically improves access to data will greatly benefit the growth of an AI ecosystem.
Absolute data property right is not opportune
Introducing an absolute data property right or a (neighbouring) data producer right for annotated machine learning training datasets or other classes of data is not opportune. Legislative gaps concerning ownership of data can be remedied by contracts. Implementing a sui generis system of protection for AI-generated Creations & Inventions is, in most industrial sectors, not necessary since machines do not need incentives to create or invent. Where incentives are needed, IP alternatives exist.
Public domain
Autonomously generated non-personal data should fall into the public domain. It should be open data, excluded from protection by the Database Directive (DD), the Copyright Directive (CDSM) and the Trade Secret Directive (TSD).
Shift towards trade secrets, Legal Reform TSD, CDSM & DD
Just as legal uncertainty about the patentability of AI systems is causing a shift towards trade secrets, legal uncertainty about the protection and exclusive use of machine generated databases is causing a similar shift. This general shift towards trade secrets to preserve competitive advantages creates a disincentive to disclose information and impedes data sharing. In an era of exponential innovation, it is urgent and opportune that the TSD, the CDSM and the DD be reformed by the EU Commission with the data-driven economy in mind.
Freedom of expression and information, competition law
Informed IP policy seeks to compose a regime that balances underprotection and overprotection of IP rights per economic sector. Freedom of expression and information are core democratic values that should be internalized in our IP framework. The article argues that strengthening and articulating competition law is more opportune than extending IP rights.
Data protection and privacy
More and more datasets consist of both personal and non-personal machine generated data. Both the General Data Protection Regulation (GDPR) and the Regulation on the free flow of non-personal data (FFD) apply to these ‘mixed datasets’. Based on these two Regulations, data can move freely within the European Union. The article contends that in some cases, the GDPR creates market barriers for early-stage AI startups (SMEs). The GDPR also has some important advantages for European SMEs since it is now the international data protection standard.
Technical dimension of machine learning training data
Besides the legal dimensions, the article describes the technical dimensions of data in machine learning. Most AI models need centralized data. Federated learning, in contrast, trains algorithms by bringing the code to the data instead of bringing the data to the code, so the training data themselves need not be shared.
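To make the idea of bringing the code to the data concrete, here is a minimal sketch in Python. It assumes federated averaging, one common scheme in which each client trains locally and a coordinator averages the resulting model weights; the article itself does not prescribe an algorithm. All function names, the toy linear model and the synthetic client datasets are hypothetical and chosen only for illustration.

```python
import random


def local_update(weights, data, lr=0.05, epochs=5):
    """Train a copy of the global model on one client's private data.

    The model is a toy linear regression y = w * x + b, fitted with
    plain stochastic gradient descent. Only the weights leave the client.
    """
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return (w, b)


def federated_average(updates, sizes):
    """Combine client weights, weighting each client by its dataset size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def make_client_data(rng, n, lo, hi):
    """Hypothetical private dataset: noisy samples of y = 2x + 1."""
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0, 0.01)) for x in xs]


def run_federated_training(clients, rounds=50):
    """Coordinator loop: send weights out, collect updates, average them."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Each client trains where its data reside; raw records never move.
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights


rng = random.Random(0)
clients = [
    make_client_data(rng, 20, -1.0, 0.0),  # client A sees only negative x
    make_client_data(rng, 30, 0.0, 1.0),   # client B sees only positive x
    make_client_data(rng, 10, -1.0, 1.0),  # client C sees a small mixed sample
]
w, b = run_federated_training(clients)
print(f"learned w={w:.3f}, b={b:.3f}")
```

In this sketch the coordinator only ever sees the pairs of weights returned by `local_update`, never the clients' records. Real deployments typically add safeguards such as secure aggregation or differential privacy, because model updates can still leak information about the underlying data.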
Future AI & Data Regulation
Both data sharing practices and AI regulation are high on the EU Commission’s agenda. The article discusses, inter alia, the EC’s ‘White Paper On Artificial Intelligence - A European approach to excellence and trust’ and the ‘EU Data Strategy’.
Open data
Important European initiatives in the field of open data and data sharing are: the Support Centre for Data Sharing (focused on data sharing practices), the European Data Portal (EDP, data pooling per industry i.e. sharing open datasets from the public sector), the Open Data Europe Portal (ODP, sharing data from European institutions) and the EU Blockchain Observatory and Forum.
Policy strategy and Pareto optimum
Transformative technology is not a zero-sum game, but a win-win strategy that creates new value. When developing inclusive transformative tech-related policies, the goal should be a Pareto optimum and, if possible, a Pareto improvement by increasing overall prosperity.
Modalities of AI-regulation
Society should actively shape technology for good. The alternative is that other societies, with different social norms and democratic standards, impose their values on us through the design of their technology. With built-in public values, including Privacy by Design that safeguards data protection, data security and data access rights, the federated learning model is consistent with Human-Centered AI and the European Trustworthy AI paradigm.
Link Wordpress: https://ttlfnews.wordpress.com/2020/03/24/machine-learning-eu-data-sharing-practices/