## Security and Privacy in Data Science

Data science is the process from collection of data to the use of new insights gained from this data. It is at the core of the big data and machine learning revolution fueling the digitization of our economy. The integration of data science and machine learning into digital and cyber-physical processes and the often sensitive nature of personally identifiable data used in the process, expose the data science process to security and privacy threats. In this talk I will review three exemplary security and privacy problems in different phases of the data science lifecycle and show potential countermeasures. First, I will show how to enhance the privacy of data collection using secure multi-party computation and differential privacy. Second, I will show how to protect data outsourced to a cloud database system and still perform efficient queries using keyword PIR and homomorphic encryption. Last, I will show that differential privacy does…

1 0

## On the (in)security of ElGamal in OpenPGP

Do you think you know ElGamal encryption? Think twice.

We uncover vulnerabilities in the OpenPGP ecosystem stemming from confusion about the definition of ElGamal encryption (and the lack of an unequivocable standard). The first vulnerability leads to practical plaintext recovery in a limited number of cases. The second one, combined with side-channel leakage we found in some popular OpenPGP libraries, leads to feasible key recovery, in relatively rare cases.

We hope that these attacks, that we dub “cross-configuration”, serve as a cautionary tale for standards designers. Cryptographic algorithms, even when they may appear very simple, hide a great deal of complexity in the choices of parameters and data representation. While an instantiation may appear to be safe in isolation, the interaction of two incompatible instantiations may lead to a security disaster, which can only be avoided by a carefully written standard.

Joint work with Bertram Poettering and Alessandro Sorniotti.

1 0

## Estimating the nonlinearity of cryptographic Boolean functions

Boolean functions are used as one component in the design of symmetric ciphers, e.g. the Sbox in AES, or the filtering function in stream ciphers. If these functions are linear, it opens the way to certain types of attacks (linear and differential cryptanalysis). These attacks also work if the function is not linear but can be approximated well by a linear function. This has led to the notion of nonlinearity, a parameter of the function which measures its distance to the closest linear/affine function. The nonlinearity can be computed by a O(n2^n) algorithm, where n is the number of variables. When n is large, this computation is unfeasible. Therefore, we investigate the possibility of probabilistic estimation of the nonlinearity. To do so, we use the notion of nonhomomorphicity (introduced by Zhang and Zheng), which can be estimated efficiently even for large n. We generalise several techniques developed by Bellare et al, to obtain upper and lower bounds for the honhomomorphicity in terms of the nonlinearity. These bounds then allow us to estimate the nonlinearity once a good estimate of the nonhomomorphicity was obtained. This is joint work with Pante Stanica and was first presented at the SETA 2020 conference.

1 0

## Polynomial Representation Is Tricky: Maliciously Secure Private Set Intersection Revisited

Private Set Intersection protocols (PSIs) allow parties to compute the intersection of their private sets, such that nothing about the sets’ elements beyond the intersection is revealed. PSIs have a variety of applications, primarily in efficiently supporting data sharing in a privacy-preserving manner. At Eurocrypt 2019, Ghosh and Nilges proposed three efficient PSIs based on the polynomial representation of sets and proved their security against active adversaries. In this talk, I will discuss that these three PSIs are susceptible to several serious attacks. The attacks let an adversary (1) learn the correct intersection while making its victim believe that the intersection is empty, (2) learn a certain element of its victim’s set beyond the intersection, and (3) delete multiple elements of its victim’s input set. I will explain why the proofs did not identify these attacks and discuss how the issues can be rectified.

This is a joint work with Steven Murdoch (UCL) and Thomas Zacharias (University of Edinburgh)

1 0

## An Overview of Password-authenticated Key Exchange Protocols

Password-authenticated key exchange (PAKE) is an interesting example that shows the magic of mathematics. It allows two remote users to establish a “high-entropy” key from a “low-entropy” shared secret without involving any trusted third party. Following Bellovin and Merrit’s 1992 Encrypted Key Exchange (EKE), many PAKE protocols have been proposed in the next 30 years. Today, some have been adopted in large-scale applications, e.g., secure messenger, Wi-Fi, iCloud, browser sync and Thread. On the other hand, designing a robust PAKE protocol has proved extremely delicate and error-prone. In this talk, I will provide a review of the three decades research in this field, a summary of the state-of-the-art, and a taxonomy to categorize existing protocols. A comparative analysis of protocol performance is provided, using representative examples from taxonomy categories. Finally, I will review the recent IETF selection of PAKE protocols for standardisation and summarise lessons as well as open problems.…

1 0

## Private Blocklist Lookups with Checklist

In this talk, I will present Checklist, a system for private blocklist lookups. In Checklist, a client can determine whether a particular string appears on a server-held blocklist of strings, without leaking its string to the server. Checklist is the first blocklist-lookup system that (1) leaks no information about the client’s string to the server, (2) does not require the client to store the blocklist in its entirety, and (3) allows the server to respond to the client’s query in time sublinear in the blocklist size. To make this possible, Checklist uses a new two-server private-information-retrieval protocol that is both asymptotically and concretely faster, in terms of server-side time, than those of prior work. We will discuss the evaluation of Checklist in the context of the “Safe Browsing” blocklist, which all major browsers use to prevent web clients from visiting malware-hosting URLs. Joint work with Henry Corrigan-Gibbs.

1 0

## How to Make Private Distributed Cardinality Estimation Practical, and Get Differential Privacy for Free [Usenix Security ’21]

Secure computation is a promising privacy enhancing technology, but it is often not scalable enough for data intensive applications. On the other hand, the use of sketches has gained popularity in data mining, because sketches often give rise to highly efficient and scalable sub-linear algorithms. It is natural to ask: what if we put secure computation and sketches together? We investigated the question and the findings are interesting: we can get security, we can get scalability, and somewhat unexpectedly, we can also get differential privacy — for free. Our study started from building a secure computation protocol based on the Flajolet-Martin (FM) sketches, for solving the Private Distributed Cardinality Estimation (PDCE) problem, which is a fundamental problem with applications ranging from crowd tracking to network monitoring. The state of art protocol for PDCE is computationally expensive and not scalable enough to cope with big data applications, which prompted us to design a better protocol. Our further analysis revealed that if the cardinality to be estimated is large enough, our protocol can achieve [latex](\epsilon,\delta)[/latex]-differential privacy automatically, without requiring any additional manipulation of the output. The result signifies a new approach for achieving differential privacy that departs from the mainstream approach (i.e. adding noise to the result). Free differential privacy can be achieved because of two reasons: secure computation minimizes information leakage, and the intrinsic estimation variance of the FM sketch makes the output of our protocol uncertain. We further show that the result is not just theoretical: the minimal cardinality for differential privacy to hold is only [latex]10^2-10^4[/latex] for typical parameters.

1 0

## Recent Developments in Lightweight Cryptography: Alzette and SPARKLE

In this talk I present the family of lightweight cryptographic permutations SPARKLE with a focus on its core component — the ARX-based S-box Alzette. SPARKLE and Alzette are at the heart of our submission to the ongoing competition for new standards in lightweight cryptography organised by the US National Institute of Standards and Technology (NIST). On March 29, 2021, SPARKLE was selected as one of the ten finalists entering the final round of the competition, out of 56 initial submissions.

1 0