R. Sanchez-Reillo, I. Ortega-Fernandez, W. Ponce-Hernandez and H. C. Quiros-Sandoval, "How to implement EU data protection regulation for R&D on personal data," 2017 International Carnahan Conference on Security Technology (ICCST), Madrid, 2017, pp. 1-7.
doi: 10.1109/CCST.2017.8167797
" Abstract:
Biometric R&D has to deal with personal data. From the Universal Declaration of Human Rights, privacy of a human being shall be protected, and this is addressed in different forms in each region of the world. In the case of the European Union, Data Protection Directives, Laws and Regulation have been established, and interpreted in different ways by each European Member State. Such a diversity has pushed the European Union to generate an improved regulation that will be mandatory in 2018. Biometric R&D shall not only comply with the current Directive, but also has to adapt its work to the new Regulation. This work is intended to describe the situation and provide a recommended procedure when having to acquire personal data.

SECTION I.
Introduction

Data Protection Regulations are essential to guarantee the privacy of citizens, particularly in today's society, where personal data is exposed to a large number and variety of attacks and abuses. This protection was first stated in Article 12 of the Universal Declaration of Human Rights, back in 1948, but the article has evolved differently in different parts of the world. One of the regions where it has been strongly articulated is the European Union (EU), where in 1995 a Directive was issued to regulate data protection in all EU Member States (95/46/EC) [1]. Although the ideas behind the Directive were very clear, it was understood and implemented differently in different Member States [2]. The evolution of Information Technology and the growing number of cases of identity theft and personal data abuse have led the EU to develop a new version, this time not as a Directive but as a proper Regulation. This Regulation, referred to as 2016/679 [3], will be of mandatory application from May 25, 2018.

Regulations of this kind, and the new one in particular, create major challenges for R&D activities in which personal data has to be handled. In particular, Biometrics requires acquiring personal data from citizens, both to develop the technology and to evaluate the results achieved. Such evaluation covers not only performance but also security and usability, and not only prototypes but also final products. How to acquire such data, as well as how to store and retain it, are tasks that have to be taken very seriously. In addition, the transmission of such personal data to third parties is under major constraints, which also makes selling or acquiring databases a serious legal problem.

The present work presents the problem of privacy and data protection in activities that require the recognition of human beings. In order to better understand the problem, the current EU data protection law will be studied in Section III, introducing the most important concepts and requirements. Based on that knowledge, the application of the Directive to Biometrics R&D will be shown in Section IV, providing a recommended solution, which will be illustrated with the particular case of a major evaluation of fingerprint technology. The upcoming scenario created by the new EU Regulation will be analysed in Section V, leading to a set of future work and conclusions.
SECTION II.
Privacy vs. Recognition

Human Recognition can be performed in three different ways, or any combination of those three:

By what the user knows (e.g. passwords). The advantages of this method are the lack of additional infrastructure and the possibility of changing such knowledge, and therefore updating the credential. On the other hand, knowledge can be forgotten, or even easily copied by eavesdropping.

By what the user has (e.g. cards). Again, the token used for recognition can be changed, even though that will involve some additional cost. But the token can also be stolen or lost.

By what the user is (e.g. Biometrics). The credentials are expected to be unique (if the discriminative power of the algorithm is high), but they cannot be changed unless using template protection techniques. Well developed and deployed, it can be a comfortable way to identify the individual. But credentials are usually publicly available (e.g. by taking a photo), and they may be spoofed. Therefore Presentation Attack Detection (PAD) mechanisms should be in place.

In any of these cases, the credentials are linked to the real identity of the subject within the system. As identity is a piece of personal data, both the identity and the link shall be protected.

Therefore, an application shall acquire Personal Data (i.e. administrative data), for several reasons:

For registering his participation in the system

To avoid duplicated entries in the system

For future communications between the system and the user

For allowing the user to claim his rights

Biometric Data is a piece of personal data, which is currently considered to be at the same level as administrative data. In addition, for some modalities, the link between the data and the person can be direct (e.g. face recognition or even signatures). Such a direct link is considered as evidence by a trial court.

As there is a direct relationship between Biometrics and Personal Data, there is a huge concern about privacy:

How can the citizen be sure that his personal data is not used outside the claimed purpose?

How feasible is it for the system provider to use such data for other purposes?

Is it possible for the user to belong to a system without providing his administrative data?

In short, if we need to recognize human beings, but handling such data may threaten their privacy, how can this be handled? This question led to the definition of data protection laws and directives.
SECTION III.
Personal Data Protection: the EU Directive

The present section describes the current EU Directive, as background for understanding, later in the text, the changes that the new Regulation will bring.
A. Background

Article 12 of the Universal Declaration of Human Rights (10 December 1948) reads as: “No one shall be subjected to arbitrary interference with his privacy, family, home or correspondence, nor to attacks upon his honour and reputation. Everyone has the right to the protection of the law against such interference or attacks”.

In addition, some countries also declare that law will limit the use of computer science to guarantee the honour and personal privacy of citizens.

With that in mind, some countries developed laws and acts regarding the automatic process of personal data (e.g. LORTAD in Spain in 1992 [4]). These laws should always be sustained by a regulation stating the rules for the treatment of the automated files using personal data.

In 1995, the EU approved the 95/46/EC Directive on the protection of individuals with regard to the processing of personal data and on the free movement of such data [1]. It shall be clear that such directive only applies when data processing is automated or when data is stored in a structured file so that access to such data is simplified.

The Directive is not a directly applicable law: it binds Member States as to the result to be achieved, but leaves the choice of form and methods to each of them. Therefore the Directive was implemented through national laws and regulations in each EU country. Depending on the Member State, the level of coverage of the Directive varied, resulting in laws of different strength.

As a summary, the major principles behind the Directive are:

The citizen is entitled to preserve full control over his personal data: who is collecting it, for what purpose, and where it is going.

The data collector shall implement a relevant policy to preserve the citizen's right, declaring the filing of that data and the person responsible for keeping the policy.

B. Concepts and Requirements

Going more in depth on the Directive, the following concepts and requirements are defined:

The citizen is the one deciding which part of his personal data is to be provided.

The citizen has the right to declare his consent towards the data collection act

The citizen has the right to be informed about who is collecting the data, what is the reason for collecting such data and which processes are going to be applied to his data

The citizen has the right to deny the collection of his personal data

The Directive also defines the concept of Data Quality, including rules such as:

Data should be relevant, adequate and non-excessive

Data cannot be used for a purpose different from the one declared when it was collected

Data shall be maintained exact and accurate

Data shall be cancelled when no longer needed

The Information Right that the citizen has during the collection of his data allows him to ask about the existence, purpose and recipients of the data, the optional or mandatory character of each data item collected, whether there are consequences of not providing some of the data, and how he can exercise his rights. Additionally, he has to know the identity and address of the file responsible.

In order to allow the traceability of such Information Right, a user consent shall be signed. It is extremely important that such consent be unambiguous and revocable.
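
As an illustration only, a traceable and revocable consent could be recorded along the following lines. This is a minimal sketch: the class name, the field names and the example values are assumptions made here, not something defined by the Directive.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    """Hypothetical record of one signed consent (all names are illustrative)."""
    subject_code: str                      # code of the subject, not the identity
    purpose: str                           # purpose declared at collection time
    signed_at: datetime
    revoked_at: Optional[datetime] = None  # None while the consent is in force

    def revoke(self, when: Optional[datetime] = None) -> None:
        """Mark the consent as revoked; the first revocation time is kept."""
        if self.revoked_at is None:
            self.revoked_at = when or datetime.now(timezone.utc)

    def is_valid_for(self, purpose: str) -> bool:
        """Consent covers only the declared purpose, and only until revoked."""
        return self.revoked_at is None and purpose == self.purpose


# Made-up example values.
consent = ConsentRecord(
    subject_code="S-0001",
    purpose="fingerprint interoperability evaluation",
    signed_at=datetime(2017, 1, 10, tzinfo=timezone.utc),
)
```

Checking `is_valid_for` before every processing step would tie each use of the data to the purpose that was declared when it was collected.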

Certain data may be subject to a higher level of protection. Typically 3 levels are defined, from a basic protection to the strongest one:

Level 1: Administrative data (including Biometrics)

Level 2: Health data

Level 3: Political, Religious and Ideological data, Race, Sexual Orientation, etc.
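
The three levels can be sketched as a small lookup, shown here only as an illustration. The category labels are assumed names, and the rule that a file takes the highest level among the data it holds is an assumption of this sketch rather than a statement from the Directive.

```python
from enum import IntEnum


class ProtectionLevel(IntEnum):
    LEVEL_1 = 1  # administrative data, including biometrics
    LEVEL_2 = 2  # health data
    LEVEL_3 = 3  # political, religious and ideological data, race, etc.


# Illustrative category labels (assumed), mapped to the levels listed above.
CATEGORY_LEVEL = {
    "administrative": ProtectionLevel.LEVEL_1,
    "biometric": ProtectionLevel.LEVEL_1,
    "health": ProtectionLevel.LEVEL_2,
    "political": ProtectionLevel.LEVEL_3,
    "religious": ProtectionLevel.LEVEL_3,
    "ideological": ProtectionLevel.LEVEL_3,
    "race": ProtectionLevel.LEVEL_3,
    "sexual_orientation": ProtectionLevel.LEVEL_3,
}


def required_level(categories):
    """Assumed rule: the file is protected at the highest level it contains."""
    return max(CATEGORY_LEVEL[category] for category in categories)
```

For instance, `required_level(["administrative", "biometric"])` stays at Level 1, while adding `"health"` raises the whole file to Level 2.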

The Security of the collected data is implicit within the Directive, but there is no explicit regulation: just the statement requiring the file responsible to adopt all technical and organisational means to secure the data and preserve privacy, and the instruction that data shall not be collected without an active and relevant security policy.

The file responsible, and any person taking part in the collection and/or processing, shall maintain professional secrecy about the data and its organization, even after the termination of the relationship with the file responsible.

Data can only be transferred to a third party after the explicit consent of the citizen.

The Access Rights belong to the citizen, who can exercise them at no cost. In order to limit abuse, a citizen can only exercise his access rights once every 12 months.

The Rectification and Cancellation Rights shall be applied within 10 days of the receipt of the request. Cancellation shall result in the blocking of all data related to that user. If data have been given to a third party, the request also applies to the third party.
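
The two time limits above (12 months between access requests, 10 days to apply a rectification or cancellation) can be checked with a few lines of date arithmetic. The sketch below is illustrative: the function names are hypothetical, and counting the 10 days as calendar days is an assumption, since the text does not say whether working days are meant.

```python
from datetime import date, timedelta

RECTIFICATION_DAYS = 10      # from the text; calendar days are an assumption
ACCESS_INTERVAL_MONTHS = 12  # from the text


def rectification_deadline(received: date) -> date:
    """Last day on which a rectification/cancellation request may be applied."""
    return received + timedelta(days=RECTIFICATION_DAYS)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the end of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    first_of_next = date(year + (month == 12), month % 12 + 1, 1)
    last_day = (first_of_next - timedelta(days=1)).day
    return date(year, month, min(d.day, last_day))


def may_request_access(last_request, today: date) -> bool:
    """True if no earlier request exists or 12 months have already passed."""
    if last_request is None:
        return True
    return today >= add_months(last_request, ACCESS_INTERVAL_MONTHS)
```

A request handler could call `may_request_access` before serving an access request and store `rectification_deadline` next to every rectification or cancellation request it receives.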
C. Coding vs. Anonymization

A very important concept is the definition of coded data and anonymised data. Coded data is data that is not directly linked to a subject, as the link has been substituted by a code that can be reversed under highly secure conditions (typically having to contact the file responsible).

On the other hand, anonymised data is data whose link with the subject is permanently broken, so that it is not possible to re-establish the link through reasonable means.

Data cannot always be anonymised. For example, in the case of biometrics, a face image will always have a direct link with the user, easily found by any other person. Other biometric modalities could be anonymised (e.g. fingerprints), as it is not trivial for a person to determine someone's identity by inspecting a fingerprint image.

These two concepts have led to further controversy, as it is not clear whether anonymised data could be transferred to a third party or not. Another concern is whether the file responsible needs the explicit consent of the citizen in order to anonymise his data. These remain open questions.
SECTION IV.
How to Apply the EU Directive in Biometrics R&D

In order to determine when and how to apply the Directive in the field of Biometrics R&D, it is necessary to study the requirements first. These requirements are technical in the first place, and the technical requirements then lead to privacy requirements. This section covers that study and results in a set of recommended actions, which are also illustrated with a real case.
A. R&D Requirements

R&D in Biometrics requires a deep analysis of the intra-class distribution, which means collecting data from the same subject in several sessions, different use cases, different attitudes, etc.

Requirement 1: Collect identity data and contact information from the user.

In addition, the inter-class distribution shall also be analysed. This means collecting data from several subjects, with the same parameters used for the intra-class distribution.

Requirement 2: Deny the existence of duplicated entries

Another aspect of interest is to analyse the influence of certain subject conditions, such as relevant physical or mental condition, relevant knowledge background, relevant habits, etc.

Requirement 3: Collect additional data for further processing. Some of this additional data may be subject for a higher level of protection.

It is essential to study biometric data stability, which requires acquiring data in multiple sessions separated by a significant time gap. This is also applicable when analysing the aging effect.

Requirement 4: Keep data linked to identity for long periods

When researching on the robustness against Presentation Attacks, some experts shall produce such attacks, allowing them to deeply examine the subject's data.

Requirement 5: Obtain the permission of the subject

Requirement 6: Analyse all ethical implications (e.g. deny the process if there is a relationship between attacker and subject).

Most of the time, the same data needs to be used for several iterations of the development phase.

Requirement 7: Keep data for long periods

As the acquisition of a DB is a time and money consuming process, sometimes it may be better to acquire a previously captured DB.

Requirement 8: Obtain data relevant to our study

Or once the new DB has been captured, you may want to distribute it, either by selling or by donating.

Requirement 9: Establish conditions and regulation of data transfer

B. Privacy Requirements

Each of those 9 requirements leads to a set of privacy-preserving actions.

Requirement 1: Collect identity data and contact information from the user.

    Keep a subject database linked with the collected data

    Assign a code to that subject

    Use the code for the data reference

    Keep both sets of data apart and the link table highly secured
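
A minimal sketch of the coding scheme described for Requirement 1 follows. The class and method names are hypothetical, and plain in-memory dictionaries stand in for the two separately secured databases.

```python
import secrets


class SubjectRegistry:
    """Hypothetical identity store; kept apart from the collected samples."""

    def __init__(self):
        self._link = {}  # link table: subject code -> identity record

    def enrol(self, identity: dict) -> str:
        """Store the identity and return a random code for it."""
        code = secrets.token_hex(8)  # carries no identity information
        while code in self._link:    # regenerate on the unlikely collision
            code = secrets.token_hex(8)
        self._link[code] = dict(identity)
        return code

    def identity_of(self, code: str) -> dict:
        """Reverse the code; only the file responsible should be able to."""
        return self._link[code]


class SampleStore:
    """Hypothetical store of collected data, referenced by code only."""

    def __init__(self):
        self._samples = {}

    def add(self, code: str, sample: bytes) -> None:
        self._samples.setdefault(code, []).append(sample)

    def samples_of(self, code: str) -> list:
        return list(self._samples.get(code, []))
```

Because `SampleStore` never sees a name or document number, a leak of the sample store alone would not reveal identities; that property depends entirely on how well the link table is secured.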

Requirement 2: Deny the existence of duplicated entries

    Implement a functionality for the data collector to ask the database for the ID of the subject being collected

        With an ID/new response

        No further feedback
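
The duplicate check of Requirement 2 could look like the sketch below. It assumes that the lookup key is the ID document number and that a dictionary indexes subject codes by that number; both are illustrative choices, and the function names are hypothetical.

```python
def normalise(id_document: str) -> str:
    """Drop separators and case so that the same document always matches."""
    return "".join(ch for ch in id_document.upper() if ch.isalnum())


def lookup_subject(index: dict, id_document: str) -> str:
    """Return the existing subject code, or the literal "new".

    Nothing else about the stored record is disclosed to the caller.
    """
    return index.get(normalise(id_document), "new")


# Made-up index entry: normalised document number -> subject code.
index = {normalise("1234-5678 z"): "a1b2c3d4"}
```

The caller learns either the code under which to file new samples or that the subject has to be enrolled, and nothing more.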

Requirement 3: Collect additional data for further processing

    Analyse the real need of that data and document such need

    Check the level of protection of each piece of data

    It may be good to work around the data request, so as not to increase the level of protection and still obtain the needed data

Requirement 4: Keep data linked to identity for long periods

    A complete and accurate process to allow the cancellation of data shall be implemented

    All data from that user shall be deleted, not only the link

    Determine at the very beginning the minimum duration of the data in the informed consent
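
The cancellation process of Requirement 4 can be sketched as a single operation that walks over every store holding data for a subject code. The function name is hypothetical and the stores are modelled as dictionaries keyed by code, which is an assumption of this sketch.

```python
def cancel_subject(code: str, *stores: dict) -> bool:
    """Delete every record filed under `code` in all given stores.

    Returns True if at least one record was found and removed.
    """
    found = False
    for store in stores:
        if code in store:
            del store[code]
            found = True
    return found


# Made-up stores: link table, samples and demographic data.
link_table = {"a1": {"name": "Jane"}, "b2": {"name": "John"}}
samples = {"a1": [b"s1", b"s2"], "b2": [b"s3"]}
demographics = {"a1": {"age_group": "18-30"}}
```

Passing all stores in one call makes it harder to delete the link while leaving the samples behind, which is precisely the case the requirement warns against.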

Requirement 5: Obtain the permission of the subject

    Inform the subject about this kind of action

    It may be recommended to state this as an optional consent

Requirement 6: Analyse all ethical implications

    Ensure ethical and legal agreement with attackers

Requirement 7: Keep data for long periods

    Consider the possibility of anonymising the collected data

    Once all data collection has been done, it may be anonymised by breaking the link and (recommended) changing the user codes
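
Breaking the link and changing the user codes, as suggested for Requirement 7, could be sketched as follows. The function name is hypothetical, dictionaries stand in for the real stores, and shuffling the records is an extra precaution added in this sketch so that storage order does not reveal the old codes.

```python
import secrets


def anonymise(samples: dict, link_table: dict) -> dict:
    """Re-key all samples under fresh random codes and wipe the link table.

    No mapping from old to new codes is kept anywhere.
    """
    items = list(samples.items())
    secrets.SystemRandom().shuffle(items)  # hide the original ordering
    anonymised = {}
    for _old_code, data in items:
        new_code = secrets.token_hex(8)
        while new_code in anonymised or new_code in samples:
            new_code = secrets.token_hex(8)
        anonymised[new_code] = data
    link_table.clear()  # the link to identities is permanently broken
    return anonymised
```

The caller is expected to discard the original `samples` mapping afterwards. This only anonymises modalities whose samples do not identify the subject by themselves.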

Requirement 8: Obtain data relevant to our study

    Study the legal implications of acquiring such DB, and how to keep the claimed conditions

Requirement 9: Establish conditions and regulation of data transfer

    Enforce the third party not to distribute the DB, and to react towards all claims for cancellation

        Will the third party be able to do it?

        Do you have any power to force this?

C. Recommended Procedure

With all these requirements, a procedure can be designed to comply with them. This section provides a recommended procedure based on 11 steps:

Analyse the data needed and its level of protection

Create a remote (not available to the external world) database to store subject personal data (without samples)

Create web services needed for the study:

    Insert a subject

    Delete a subject by ID

    Ask for the subject ID by providing a very reduced set of personal data

    Provide discriminative subject data for collateral data impact (e.g. age group)

    Provide statistics of the DB (e.g. number of subjects/samples, age/gender distribution, etc.)

Create a local application to enrol subjects in the database

Create a local database for storing subjects' samples

Create a local application to collect subjects' samples

Create a local application to process samples and obtain statistics

Create an informed consent form

Declare all the procedure, the file structure, and the security mechanisms to the relevant Data Protection Agency

Start data collection and processing

Termination
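
Two of the web services listed in the third step, the one providing discriminative subject data and the one providing statistics, can be sketched as plain functions. The records, field names and function names below are made up for illustration, and the transport layer (web server, authentication) is left out.

```python
from collections import Counter

# Made-up subject records for illustration; keys are subject codes.
SUBJECTS = {
    "c01": {"name": "Jane", "surname": "Doe", "gender": "F", "age_group": "18-30"},
    "c02": {"name": "John", "surname": "Roe", "gender": "M", "age_group": "31-50"},
    "c03": {"name": "Ann", "surname": "Poe", "gender": "F", "age_group": "31-50"},
}

NON_IDENTIFYING_FIELDS = ("gender", "age_group")


def collateral_data(subjects: dict, code: str) -> dict:
    """Return only the non-identifying attributes of one subject."""
    record = subjects[code]
    return {field: record[field] for field in NON_IDENTIFYING_FIELDS}


def db_statistics(subjects: dict) -> dict:
    """Return aggregate counts only; no individual record leaves the service."""
    stats = {"subjects": len(subjects)}
    for field in NON_IDENTIFYING_FIELDS:
        stats[field] = dict(Counter(r[field] for r in subjects.values()))
    return stats
```

Restricting both services to an explicit list of fields means that names and document numbers cannot leak through them, even if more fields are added to the subject records later.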

D. Illustration by Example

The above recommended procedure has been applied to a medium-high scale evaluation of fingerprint systems. In particular, the evaluation was called “Evaluation of Interoperability of Fingerprint Sensors and Algorithms”, and a partial public report is available in [5]. The main characteristics of such evaluation are:

4 semiconductor flat sensors

5 algorithms (NBIS [6] + 4 commercial ones)

589 subjects

6 fingers

6 samples / finger

2 visits:

    1st visit: Enrolment + 1st acquisition session

    2nd visit: 2nd acquisition session

Soft ground truth mechanism to ensure proper data collection

Operator controlled data collection

Compensation to subjects: 2 cinema tickets once the collection is finished

The 1st step was to analyse the data needed, resulting in the following:

ID information:

    Name

    Surname

    ID document number

Contact information:

    Phone

    E-mail

Demographics:

    Gender

    Age group

    Habituation to IT

    Habituation to Biometrics

User condition:

    Lack of fingers

    Skin issues that may impact acquisition (YES / NO — no further detail)

All data is considered to be of Level 1, i.e. Basic Level Protection.

The 2nd step was to create a remote database to store subject personal data (without samples). A MySQL server was used, with authenticated SSL connections, on a computer without access to the external world.

Once the database was created, the 3rd step was to create all web services needed for the study, using PHP scripts executed on the same computer where the MySQL database is installed. For this task, a web server was installed on that same computer, with only authenticated SSL connections available.

The 4th step was to create a local application to register subjects in the database. This was done in Visual Studio, in C#, and the application was executed on a computer without remote connection to the external world.

With the administrative registry application created, an additional database was created for storing subjects' samples. Such database, for operational purposes, was created as a collection of files with particular file naming based on collected data and user code. This was step 5.
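
The exact naming convention is not given in the text. Purely as an illustration, a scheme based on the user code and the collection parameters might look like the sketch below, where the field order, the separators and the `png` extension are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleKey:
    code: str    # subject code, never the subject's name or ID number
    sensor: str
    finger: int
    visit: int
    sample: int


def to_filename(key: SampleKey, ext: str = "png") -> str:
    """Build a file name such as 'a1b2c3_sensorA_f3_v2_s5.png'."""
    if "_" in key.code or "_" in key.sensor:
        raise ValueError("code and sensor must not contain '_'")
    return f"{key.code}_{key.sensor}_f{key.finger}_v{key.visit}_s{key.sample}.{ext}"


def from_filename(name: str) -> SampleKey:
    """Recover the collection parameters from a file name."""
    stem = name.rsplit(".", 1)[0]
    code, sensor, finger, visit, sample = stem.split("_")
    return SampleKey(code, sensor, int(finger[1:]), int(visit[1:]), int(sample[1:]))
```

Keeping only the code in the file name means the sample files can be processed without any access to the subject database.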

The 6th step was to create the local application for data collection. The acquisition environment is seen in Fig. 1, followed by two screen shots, one for the enrolment and a second one for the verification.

Fig. 1. Acquisition environment

Fig. 2. Screenshot of the enrolment phase, showing an incident raised by the ground truth mechanism

Fig. 3. Screenshot of the verification phase

The 7th step was to create a local application to process samples and obtain statistics. Scores were provided in CSV format, to further process data in Matlab or Excel.
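
The layout of the score files is not described in the text. Assuming, for illustration only, one comparison per row with columns `enrol_code`, `probe_code` and `score`, and assuming that a higher score means a better match, the false non-match and false match rates at a given threshold could be computed as in this sketch.

```python
import csv
import io


def error_rates(csv_text: str, threshold: float):
    """Return (FNMR, FMR) at the given threshold.

    A comparison is genuine when both codes are equal, impostor otherwise.
    The file must contain at least one comparison of each kind.
    """
    genuine, impostor = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        score = float(row["score"])
        if row["enrol_code"] == row["probe_code"]:
            genuine.append(score)
        else:
            impostor.append(score)
    fnmr = sum(s < threshold for s in genuine) / len(genuine)
    fmr = sum(s >= threshold for s in impostor) / len(impostor)
    return fnmr, fmr


# Made-up scores, only to show the expected layout.
EXAMPLE = "\n".join([
    "enrol_code,probe_code,score",
    "a,a,0.9",
    "a,a,0.4",
    "b,b,0.8",
    "a,b,0.1",
    "b,a,0.6",
])
```

Only subject codes appear in the file, so this kind of processing can be done without touching the identity data.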

Step number 8 was the creation of the informed consent form, describing the process and the target of evaluation, identifying the File and the File Responsible, and providing information about how the subject can exercise his rights. The form was provided to the subject with his data already filled in. One copy was given to the subject for future reference, and the other signed copy is kept for our records.

Step number 9 was to declare all the procedure, the file structure, and the security mechanisms to the Spanish Data Protection Agency, which was done by the University where the authors work, as it is declared as a relay for the Spanish Data Protection Agency.

With all the previous steps finished we started the data collection and processing, which took 4 months of data collection, plus 3 additional months for data processing.

The last step is the Termination, which is still open as the legal department of the university is still analysing the possibilities and impact of anonymization.
SECTION V.
The New EU Regulation

Regulation 2016/679 [3] has been issued to repeal the 95/46/EC Directive [1], improving, updating and extending its scope. The most important change is that it is no longer a Directive but a Regulation, and it will be mandatory in all Member States from May 25, 2018.

The Regulation takes the most protective interpretation of the Directive, and updates some parts both to improve the protection of the European citizen and to ease some of the bureaucracy.

Some Member States are already quite well aligned with this new Regulation, as their interpretation of the Directive was also highly protective. But other Member States will have to redo all their current laws, regulations and procedures.

The most important changes that the new Regulation introduces are:

It becomes applicable to non-EU established institutions, if data is related to EU citizens.

Creates new categories for personal data

Introduces the concept of pseudonymisation, as an intermediate state between personal data and anonymisation, but not many details are given, so this is creating some controversy.

It regulates additional principles, such as Transparency, Proactive responsibility, Privacy by Design and Privacy Impact Analysis (PIA).

Companies shall/should have a Data Protection Officer.

File declaration is no longer required in advance, but an internal registry shall be kept.

If a Security Breach or Incident is found, it shall be declared immediately to the Data Protection Agency. If the breach or incident impacts subjects, they shall also be informed.

The adhesion to Codes of Conduct is recommended, as well as the establishment of certification mechanisms

International data transfer is considered within the new Regulation, although it is not detailed.

A European Data Protection Board is created

Additional requirements about informing the citizen before collecting data are introduced. The most important one is that silence is no longer a way of providing consent, i.e. the consent shall always be explicit.

It defines additional Rights, such as the Right to Deletion of Data, the Right to be Forgotten, the Right to Limitation of Processing, the Right to Data Portability, and the right not to be subject to a decision taken only by automated means.

It is adapted to the new technologies, such as Big Data and Internet of Things.

One of the issues that has been most criticized is the lack of a clear definition of Video-surveillance.

Unfortunately there are still many concepts, mechanisms and interpretations to be defined. These gaps will have to be covered by Member State laws and regulations, which may again create different environments in different Member States.

This is why the impact of the new EU Regulation on Biometrics R&D cannot yet be defined, although it can be said that, if the recommended procedure is followed, it is very likely to remain valid under the new Regulation.
SECTION VI.
Future Steps and Conclusions

Currently all Member States are working on adapting their laws and regulations to the new EU Regulation, but there is still a long way to go before final documents are available. For the time being, some organizations (e.g. [7]) are starting to disseminate the new EU Regulation and to assess its potential impact.

In addition, each economic sector is trying to influence the writing of the new laws, so as to minimize the impact on its companies and institutions.

It is expected that some Member States will have to modify their laws only lightly, as their previous interpretation was extremely conservative. But some others will have to re-define them completely.

This work has taken such a conservative approach, and defined a procedure to acquire Biometrics DBs, either for R&D work or for performing independent evaluations.

In any case, the Biometrics sector shall also stay aware of the developments that will arise as those aspects of the new Regulation that are not yet well or fully defined become clear.
References

[1] Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, [online] Available: http://eur-lex.europa.eu/legal-content/en/ALL/?uri=CELEX:31995L0046.

[2] Organic Law 15/1999 of 13 December on the Protection of Personal Data, [online] Available: http://www.agpd.es/portalwebAGPD/canaldocumentacion/legislacion/estatal/common/pdfs/2014/Ley_Organica_15-1999de_13_de_diciembre_de_Proteccion_de_Datos_Consolidado.pdf.

[3] Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data and repealing Directive 95/46/EC (General Data Protection Regulation), [online] Available: http://eur-lex.europa.eu/legal-content/EN/TXTl?uri=uriserv:OJ.L_.2016.119.01.0001.01.ENG&toc=OJ:l:2016:119:toc.

[4] Ley Orgánica 5/1992 de 29 de octubre de regulación del tratamiento automatizado de los datos de carácter personal, [online] Available: https://www.boe.es/buscar/doc.php?id=BOE-A-1992-24189.

[5] Raul Sanchez-Reillo et al., Public Report on an Evaluation of 3 fingerprint sensors and 2 algorithms.

[6] NIST Biometric Image Software (NBIS).

[7] El Reglamento General de Protección de Datos de la UE: Una perspectiva empresarial, [online] Available: http://www.fundacionesys.com/es/system/files/documentos/ESTUDIO%20PROTECCIO%CC%81N%20DE%20DATOS%20FUNDACION%20ESYS.pdf.
"