Differentially private distributed mean estimation (DP-DME) is a key component in private federated learning. DP-DME (with \(n\) distributed users) traditionally employs either local DP (LDP) or central DP (CDP), achieving MSEs of \(O(1/n)\) and \(O(1/n^2)\), respectively; CDP attains the lower MSE by relying on a trusted party. Cryptographic protocols such as secure aggregation have been integrated into DP-DME systems to achieve CDP-level MSE without a trusted party, but they require multi-round protocols and incur significant communication and computational overhead. We propose CorDP-DME, an alternative DP-DME mechanism based on an information-theoretic framework with optimally correlated Gaussian noise, which effectively navigates the trade-off among privacy, accuracy, and robustness, bridging the gap between the LDP and CDP error bounds. CorDP-DME:
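The gain from correlated noise can be illustrated with a toy simulation. This is only a minimal sketch: the zero-sum construction below is an assumption for illustration, while the actual mechanism optimizes the noise correlation and accounts for dropouts and collusion.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 100, 10, 1.0
x = rng.normal(size=(n, d))             # users' local vectors

# LDP baseline: independent noise at every user -> mean MSE O(sigma^2/n).
ldp_est = (x + rng.normal(scale=sigma, size=(n, d))).mean(axis=0)

# Correlated noise: per-user noise has variance comparable to the LDP
# case, but the correlated components cancel in the sum, leaving a
# single residual Gaussian -> MSE O(sigma^2/n^2), the CDP rate.
g = rng.normal(scale=sigma, size=(n, d))
z = g - g.mean(axis=0)                  # zero-sum correlated component
z += rng.normal(scale=sigma, size=d) / n  # shared residual for privacy
cor_est = (x + z).mean(axis=0)

true_mean = x.mean(axis=0)
mse_ldp = ((ldp_est - true_mean) ** 2).mean()
mse_cor = ((cor_est - true_mean) ** 2).mean()
```

With these parameters the correlated-noise estimator's empirical MSE is roughly two orders of magnitude below the LDP baseline, reflecting the \(O(1/n)\) versus \(O(1/n^2)\) gap.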
In image retrieval and text-to-image generation, ensuring well-represented results for user queries is essential to avoid representational harms—instances where certain groups, individuals, or traits are misrepresented or excluded due to biases in data or model behavior. These biases can perpetuate stereotypes, result in unfair outcomes, and marginalize underrepresented communities. While prior research has explored fairness by enforcing equal or proportional representation across individual attributes such as race or gender, less attention has been given to intersectional groups, which are defined by combinations of multiple attributes (e.g., race and gender together). Our work addresses this gap by developing tools to measure and mitigate representational harms in both image retrieval and text-to-image generation. Specifically, we:
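A worst-case representation gap over intersectional groups gives a flavor of such a measurement. The attribute names, target proportions, and the max-deviation metric below are illustrative assumptions, not the specific measures developed in this work.

```python
from collections import Counter

# Toy retrieved set annotated with two attributes; intersectional
# groups are the joint (gender, age_group) combinations.
retrieved = [("f", "young"), ("m", "young"), ("m", "old"),
             ("m", "young"), ("f", "old"), ("m", "young")]
target = {("f", "young"): 0.25, ("f", "old"): 0.25,
          ("m", "young"): 0.25, ("m", "old"): 0.25}  # desired shares

counts = Counter(retrieved)
n = len(retrieved)
# Representation gap: worst-case deviation of a group's empirical
# share from its target, taken over all intersectional groups.
gap = max(abs(counts[g] / n - p) for g, p in target.items())
```

Note that each marginal attribute can look balanced while an intersectional group (here, young men at 3/6 of the results) is still over-represented, which is why intersectional measurement matters.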
Vector databases have become increasingly popular in ML/AI applications such as retrieval-augmented generation (RAG) and recommendation systems. They store high-dimensional embeddings of text, images, or video data and enable efficient retrieval by converting user queries into vector embeddings and ranking results based on similarity. Given the scale—often millions of entries—they rely on approximate nearest neighbor (ANN) search techniques for fast retrieval. In this work, we propose a novel method for performing ANN search while ensuring perfect privacy of user queries, effectively extending the classical notion of private information retrieval to the setting of vector databases. We introduce an information-theoretic formulation of the private ANN problem and present a scheme based on Reed-Solomon codes that guarantees perfect privacy without sacrificing the accuracy of the underlying non-private ANN algorithm. Our approach achieves lower communication costs compared to existing cryptographic protocols for private ANN search.
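The core idea of recovering exact similarity scores from private queries can be sketched with a two-server additive-sharing toy. This is not the proposed scheme: the actual construction uses Reed-Solomon codes over a finite field for perfect information-theoretic privacy; the sketch below only shows how exact scores are reconstructed from shares that individually reveal nothing useful about the query.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, k = 1000, 16, 5
db = rng.normal(size=(m, d))            # embeddings replicated at both servers
q = rng.normal(size=d)                  # user's private query embedding

# Split the query into two additive shares; each non-colluding server
# sees only one share, a masked version of q.
r = rng.normal(scale=10.0, size=d)
share1, share2 = r, q - r

# Each server answers with inner products against its share.
ans1, ans2 = db @ share1, db @ share2

# The user reconstructs exact scores (ans1 + ans2 == db @ q) and
# ranks locally, so the non-private ANN accuracy is unaffected.
scores = ans1 + ans2
topk = np.argsort(-scores)[:k]
```

Because the reconstruction is exact, the ranking matches the non-private computation, mirroring the "privacy without accuracy loss" property of the Reed-Solomon scheme.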
Private-Read-Update-Write (PRUW) is an information-theoretic framework that enables users to privately download (read) and update (write) sections of a data storage system without revealing the values of the updates or the sections accessed, while ensuring perfect accuracy. PRUW has applications in efficient variants of federated learning (FL) such as federated submodel learning (FSL) and FL with gradient sparsification. In FSL, users only download and update the sections of the model relevant to their limited data types, which reduces communication and computation costs; however, the accessed sections and update values can still disclose users' private information. Similarly, in FL with sparsification, users transmit only the most significant \(r\) fraction of updates, with the indices and values potentially revealing sensitive information. PRUW addresses these concerns with coding-theoretic tools that perfectly hide the values and indices of the downloaded and updated information while maintaining perfect accuracy. We propose two variants:
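For context, plain top-\(r\) gradient sparsification looks as follows; the transmitted indices and values in the last step are exactly the side information that PRUW hides. Parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, r = 1000, 0.01                       # model dimension, sparsification rate
grad = rng.normal(size=d)               # a user's local gradient

# Keep only the r-fraction of coordinates with largest magnitude.
k = max(1, int(r * d))
idx = np.argpartition(np.abs(grad), -k)[-k:]
sparse = np.zeros(d)
sparse[idx] = grad[idx]

# In plain sparsified FL, the pairs (idx, grad[idx]) are sent to the
# server; both components can leak information about the user's data.
payload = list(zip(idx.tolist(), grad[idx].tolist()))
```

Since which coordinates are significant depends on the user's data, even the index set alone is sensitive, motivating schemes that hide indices and values simultaneously.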
Private information retrieval (PIR) has been widely studied in information theory to characterize the fundamental communication rates of perfectly private database retrieval. In PIR, a user downloads a required file from a database system that stores multiple files, without revealing which file was downloaded. We introduced and analyzed the following variants of PIR:
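The flavor of PIR is captured by the classical two-server XOR scheme, included here purely for illustration (a textbook construction, not one of our variants): each query is a uniformly random subset of file indices, so neither server alone learns anything about the desired index.

```python
import secrets

# Two non-colluding servers each hold the same N equal-length files.
files = [b"alpha", b"beta!", b"gamma", b"delta"]
N, i = len(files), 2                    # user privately wants file i

# Server 1 gets a uniformly random 0/1 query vector; server 2 gets
# the same vector with bit i flipped. Each query is marginally
# uniform, so neither server learns i.
q1 = [secrets.randbelow(2) for _ in range(N)]
q2 = q1.copy()
q2[i] ^= 1

def answer(q):
    """XOR of the files selected by the query bits."""
    out = bytes(len(files[0]))
    for bit, f in zip(q, files):
        if bit:
            out = bytes(a ^ b for a, b in zip(out, f))
    return out

# XORing the two answers cancels every file except file i.
rec = bytes(a ^ b for a, b in zip(answer(q1), answer(q2)))
```

The user recovers file \(i\) exactly while each server's view is independent of \(i\), which is the perfect-privacy guarantee the variants we study refine under richer system models.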