Combinatorial String Dissemination
String data are often disseminated to support applications such as location-based service provision or DNA sequence analysis. This dissemination, however, may expose sensitive patterns that model confidential knowledge (e.g., trips to mental health clinics from a string representing a user’s location history). In this talk, I will consider the problem of sanitizing a string by concealing the occurrences of sensitive patterns, while maintaining data utility, in two settings that are relevant to many common string processing tasks. In the first setting, the goal is to generate the minimal-length string that preserves the order of appearance and frequency of all non-sensitive patterns. In the second setting, the goal is to generate a string that is at minimal edit distance from the original string, in addition to preserving the order of appearance and frequency of all non-sensitive patterns. I will present algorithms for each setting and experiments evaluating these algorithms.
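To make the first setting concrete, here is a minimal Python sketch under assumptions that the abstract does not state: patterns are taken to be substrings of a fixed length k, the sensitive patterns are given as a set, and a separator letter `#` that does not occur in the input is used to break up sensitive occurrences. The names `sanitize` and `kgrams` are hypothetical. This greedy pass only illustrates the problem; it is not the algorithm presented in the talk, and it is not claimed to produce a minimal-length string.

```python
def kgrams(x: str, k: int, sep: str = "#") -> list[str]:
    """Length-k substrings of x, in order, skipping any that contain sep."""
    windows = (x[i:i + k] for i in range(len(x) - k + 1))
    return [g for g in windows if sep not in g]


def sanitize(w: str, k: int, sensitive: set[str], sep: str = "#") -> str:
    """Greedy sketch: keep every non-sensitive length-k pattern of w in order.

    Assumptions (not from the abstract): fixed pattern length k, and sep
    does not occur in w.
    """
    pieces = []   # parts of the output string
    tail = None   # last pattern written to the output
    for i in range(len(w) - k + 1):
        pat = w[i:i + k]
        if pat in sensitive:
            continue  # conceal this occurrence by not reproducing it
        if tail is None:
            pieces.append(pat)
        elif tail[1:] == pat[:-1]:
            # The output ends with tail, so one extra letter creates
            # exactly one new length-k window, namely pat.
            pieces.append(pat[-1])
        else:
            # No overlap of length k-1: start afresh after a separator.
            pieces.append(sep + pat)
        tail = pat
    return "".join(pieces)


w, k, sensitive = "abcabd", 3, {"cab"}
x = sanitize(w, k, sensitive)
print(x)  # abca#abd
```

In this example the non-sensitive patterns `abc`, `bca`, `abd` appear in the output in the same order and with the same frequency as in the input, while `cab` no longer occurs. The second setting would additionally compare candidate outputs by their edit distance to the original string.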