Protecting Privacy of Web Users: Technical and Legal Perspectives
As millions of users browse the Web on a daily basis, their data is continuously collected by numerous companies and agencies with the help of Web tracking technologies. Website owners, however, need to become compliant with recent EU privacy regulations (such as GDPR and ePrivacy) and often rely on consent banners to either inform users or collect their consent to tracking. In this talk, I discuss our recent research in Web tracking and analysis of consent banners from three dimensions:
1) measurement: detection of Web tracking technologies and analysis of consent banners;
2) compliance: multi-disciplinary discussion with legal scholars about potential violations of GDPR and ePrivacy in the discovered practices, and with design scholar of the manipulative tactics and their legality in consent banners;
3) evidence tools: our recent efforts in building browser extensions and evaluating user studies about consent banners for the regulator.
Finally, we present the impact of our work and underline the need for multi-disciplinary research in the area of Web privacy.
Security and Privacy in Data Science
Data science is the process from collection of data to the use of new insights gained from this data. It is at the core of the big data and machine learning revolution fueling the digitization of our economy. The integration of data science and machine learning into digital and cyber-physical processes and the often sensitive nature of personally identifiable data used in the process, expose the data science process to security and privacy threats. In this talk I will review three exemplary security and privacy problems in different phases of the data science lifecycle and show potential countermeasures. First, I will show how to enhance the privacy of data collection using secure multi-party computation and differential privacy. Second, I will show how to protect data outsourced to a cloud database system and still perform efficient queries using keyword PIR and homomorphic encryption. Last, I will show that differential privacy does…
Is Differential Privacy a Silver Bullet for Machine Learning?
Some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information. To address this problem, algorithms for private machine learning have been proposed. In this talk, we first show that training neural networks with rigorous privacy guarantees like differential privacy requires rethinking their architectures with the goals of privacy-preserving gradient descent in mind. Second, we explore how private aggregation surfaces the synergies between privacy and generalization in machine learning. Third, we present recent work towards a form of collaborative machine learning that is both privacy-preserving in the sense of differential privacy, and confidentiality-preserving in the sense of the cryptographic community. We motivate the need for this new approach by showing how existing paradigms like federated learning fail to preserve privacy in these settings.
Failing Content Security Policy? Learning from its past to improve its future
Content Security Policy has been around for 10 years and still only a fraction of sites on the Web leverage its full potential to mitigate XSS and other flaws. In this talk, we will analyze the evolution of CSP over time and how sites could leverage it to secure against three attacks classes. This is based on our NDSS 2020 paper (https://swag.cispa.saarland/papers/roth2020csp.pdf), which sheds light on the usage of CSP on 10,000 sites over a period of six years. Furthermore, we discuss insights on technical roadblocks of CSP (NDSS 2021, https://swag.cispa.saarland/papers/steffens2021blockparty.pdf), which shows that CSP’s success is in large parts blocked by third parties. Finally, we will discuss our most recent work on (un)usability aspects and fundamental roadblocks for developers (CCS 2021, https://swag.cispa.saarland/papers/roth2021usable.pdf).
Trustworthy Machine Learning: Robustness, Privacy, Generalization, and their Interconnections
Advances in machine learning have led to rapid and widespread deployment of learning based inference and decision making for safety-critical applications, such as autonomous driving and security diagnostics. Current machine learning systems, however, assume that training and test data follow the same, or similar, distributions, and do not consider active adversaries manipulating either distribution. Recent work has demonstrated that motivated adversaries can circumvent anomaly detection or other machine learning models at test time through evasion attacks, or can inject well-crafted malicious instances into training data to induce errors in inference time through poisoning attacks. In this talk, I will describe my recent research about security and privacy problems in machine learning systems, with a focus on potential certifiably defense approaches via logic reasoning and domain knowledge integration with neural networks. We will also discuss other defense principles towards developing practical robust learning systems with robustness guarantees. Zoom meeting link: https://newcastleuniversity.zoom.us/j/81238177624?pwd=Nm16blNtakgwMmgrVVZpbmNCU2t5Zz09…
Learning from the People: Responsibly Encouraging Adoption of Contact Tracing Apps
While significant focus was put on developing privacy protocols for these apps, relatively less attention was given to understanding why, and why not, users might adopt them. Yet, for these technological solutions to benefit public health, users must be willing to adopt these apps. In this talk I showcase the value of taking a descriptive ethics approach to setting best practices in this new domain. Descriptive ethics, introduced by the field of moral philosophy, determines best practices by learning directly from the user — observing people’s preferences and inferring best practice from that behavior — instead of exclusively relying on experts’ normative decisions. This talk presents an empirically-validated framework of user’s decision inputs to adopt COVID19 contact tracing apps, including app accuracy, privacy, benefits, and mobile costs. Using predictive models of users’ likelihood to install COVID apps based on quantifications of these factors, I show how high the bar is for achieving adoption. I conclude by discussing a large-scale field study in which we put our survey and experimental results into practice to help the state of Louisiana advertise their COVID app through a series of randomized controlled Google Ads experiments.
Combinatorial String Dissemination
String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge (e.g., trips to mental health clinics from a string representing a user’s location history). In this talk, I will consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility, in two settings that are relevant to many common string processing tasks. In the first setting, the goal is to generate the minimal-length string that preserves the order of appearance and frequency of all non-sensitive patterns. In the second setting, the goal is to generate a string that is at minimal edit distance from the original string, in addition to preserving the order of appearance and frequency of all non-sensitive patterns. I will present algorithms for each setting and experiments evaluating these algorithms.
Human-centric Privacy Design and Engineering
Privacy is ultimately about people. User studies and experiments provide insights on users’ privacy needs, concerns, and expectations, which are essential to understand what a system’s actual privacy issues are from a user perspective. Drawing on the speaker’s research on privacy notices and controls in different contexts, from cookie consent notices to smart speakers, this talk discusses how and why privacy controls are often misaligned with user needs, how public policy aimed at protecting privacy often falls short, and how a human-centric approach to privacy design and engineering can yield usable and useful privacy protections that more effectively meet users’ needs and might also benefit companies. Zoom meeting link: https://newcastleuniversity.zoom.us/j/84890082823?pwd=TEJTKzEvVDJPZy9mYU1GUzNORTRKdz09 Meeting ID: 848 9008 2823 Passcode: 944316 Youtube Live Streaming: https://youtu.be/8WBlfTLoO2k Slides Youtube VoD
Polynomial Representation Is Tricky: Maliciously Secure Private Set Intersection Revisited
Private Set Intersection protocols (PSIs) allow parties to compute the intersection of their private sets, such that nothing about the sets’ elements beyond the intersection is revealed. PSIs have a variety of applications, primarily in efficiently supporting data sharing in a privacy-preserving manner. At Eurocrypt 2019, Ghosh and Nilges proposed three efficient PSIs based on the polynomial representation of sets and proved their security against active adversaries. In this talk, I will discuss that these three PSIs are susceptible to several serious attacks. The attacks let an adversary (1) learn the correct intersection while making its victim believe that the intersection is empty, (2) learn a certain element of its victim’s set beyond the intersection, and (3) delete multiple elements of its victim’s input set. I will explain why the proofs did not identify these attacks and discuss how the issues can be rectified.
This is a joint work with Steven Murdoch (UCL) and Thomas Zacharias (University of Edinburgh)
Private Blocklist Lookups with Checklist
In this talk, I will present Checklist, a system for private blocklist lookups. In Checklist, a client can determine whether a particular string appears on a server-held blocklist of strings, without leaking its string to the server. Checklist is the first blocklist-lookup system that (1) leaks no information about the client’s string to the server, (2) does not require the client to store the blocklist in its entirety, and (3) allows the server to respond to the client’s query in time sublinear in the blocklist size. To make this possible, Checklist uses a new two-server private-information-retrieval protocol that is both asymptotically and concretely faster, in terms of server-side time, than those of prior work. We will discuss the evaluation of Checklist in the context of the “Safe Browsing” blocklist, which all major browsers use to prevent web clients from visiting malware-hosting URLs. Joint work with Henry Corrigan-Gibbs.