Data Minimization and Privacy by Design in LLM Applications
Author: Venkata Sudhakar
ShopMax India processes personal data - names, addresses, purchase history, and browsing behaviour - to power personalised LLM features. Privacy by Design requires that only the minimum necessary data is included in LLM prompts, that personal identifiers are pseudonymised or removed before they leave ShopMax India infrastructure, and that users retain control over how their data is used. These principles are increasingly mandated by data protection regulations that apply to Indian businesses serving domestic and international customers, such as India's Digital Personal Data Protection Act, 2023 and, for customers in the EU, the GDPR.
Data minimisation in LLM applications means replacing personal identifiers with pseudonyms before constructing prompts, limiting the context to the fields the LLM actually needs for the task, and stripping personal information from outputs before logging. A pseudonymisation layer maps real identifiers to synthetic ones using a keyed hash. The hash itself cannot be reversed, but anyone holding the key can recompute the pseudonym for a known identifier, which allows records to be re-linked for authorised purposes while protecting privacy in transit and at rest.
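The pseudonymisation and field-limiting steps described above can be sketched roughly as follows. This is a minimal illustration, not ShopMax India's actual implementation: the key value, the `PROMPT_FIELDS` allow-list, the record layout, and the function names are all assumptions made for the example. It uses HMAC-SHA256 from the Python standard library as the keyed hash.

```python
import hashlib
import hmac

# Hypothetical key for illustration only; in practice it would be fetched
# from a key management service, never hard-coded.
PSEUDONYM_KEY = b"demo-key-not-for-production"

# Assumed allow-list: the only fields a product recommendation prompt needs.
PROMPT_FIELDS = ("city", "category", "budget_range")


def pseudonymise(customer_id: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Derive a stable pseudonym from a real identifier with HMAC-SHA256."""
    digest = hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Truncated to 8 hex characters for readability; a longer prefix
    # lowers the chance of two customers sharing a pseudonym.
    return "CUST_" + digest[:8]


def minimise(record: dict) -> dict:
    """Keep only allow-listed fields and swap the identifier for its pseudonym."""
    context = {field: record[field] for field in PROMPT_FIELDS if field in record}
    context["customer"] = pseudonymise(record["customer_id"])
    return context


def build_prompt(record: dict) -> str:
    """Build the prompt from the minimised context, never from the raw record."""
    context = minimise(record)
    return (
        f"Customer {context['customer']} in {context['city']} is looking for "
        f"{context['category']} in the {context['budget_range']} range. "
        "Recommend two products."
    )


# Made-up record: name, email, and raw ID never reach the prompt.
record = {
    "customer_id": "shopmax-000123",
    "name": "Example Customer",
    "email": "customer@example.com",
    "city": "Bangalore",
    "category": "laptops",
    "budget_range": "Rs 50,000-70,000",
}
print(build_prompt(record))
```

Using an allow-list rather than a block-list means a newly added personal field is excluded from prompts by default. The same identifier always yields the same pseudonym under a given key, so LLM calls for one customer can be correlated without exposing who the customer is.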
The example below shows ShopMax India's pseudonymisation layer, which pseudonymises customer data before passing it to the LLM and verifies that the response contains no personal details that could re-identify the customer.
It gives the following output:
Output PII check: CLEAN
Anonymised ID: CUST_a3f8b21c
Response: For a customer in Bangalore looking for laptops in the Rs 50,000-70,000 range,
I recommend the Dell Inspiron 15 (Rs 58,990) and Lenovo ThinkPad E15 (Rs 64,500).
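The `Output PII check: CLEAN` line above implies that the response is scanned before it is returned or logged. One way such a check might look is sketched below. The regular expressions, the `LEAK` label, and the `check_output` function are assumptions for illustration; real PII detection needs far broader coverage than two patterns.

```python
import re
from typing import Iterable

# Illustrative patterns only, not an exhaustive PII detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed format: 10-digit Indian mobile numbers with an optional +91 prefix.
PHONE_RE = re.compile(r"(?<!\d)(?:\+91[\s-]?)?[6-9]\d{9}(?!\d)")


def check_output(response: str, known_values: Iterable[str]) -> str:
    """Return "CLEAN" if no known personal value or PII pattern appears."""
    lowered = response.lower()
    # First, look for the customer's own details leaking back verbatim.
    for value in known_values:
        if value and value.lower() in lowered:
            return "LEAK"
    # Then, look for generic PII shapes that were never in the prompt.
    if EMAIL_RE.search(response) or PHONE_RE.search(response):
        return "LEAK"
    return "CLEAN"


response = (
    "For a customer in Bangalore looking for laptops in the Rs 50,000-70,000 "
    "range, I recommend the Dell Inspiron 15 (Rs 58,990) and Lenovo ThinkPad "
    "E15 (Rs 64,500)."
)
# Made-up personal values that were withheld from the prompt.
withheld = ["Example Customer", "customer@example.com"]
print("Output PII check:", check_output(response, withheld))
# prints "Output PII check: CLEAN"
```

Checking against the specific values withheld from the prompt catches leaks that pattern matching alone would miss, such as a customer's name.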
Implement data minimisation as a mandatory pre-processing step in the LLM request pipeline rather than an optional add-on. Keep the pseudonymisation key in a key management service and rotate it annually. Log only the anonymised customer ID in LLM audit logs - never the original name or email. Build a data subject access request handler that can reconstruct which LLM calls involved a specific customer using the pseudonymisation key, to satisfy regulatory access requests.
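The audit logging and access request recommendations above can be combined in a small sketch. It assumes audit entries store only the pseudonym, and that retired keys are kept after rotation so pseudonyms in older entries can still be matched. The `AuditEntry` shape, the key values, and `find_calls_for_customer` are hypothetical names introduced for this example.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AuditEntry:
    pseudonym: str  # the only customer reference stored in the log
    purpose: str
    timestamp: str


def pseudonymise(customer_id: str, key: bytes) -> str:
    """Same keyed-hash scheme as the request pipeline (HMAC-SHA256, truncated)."""
    digest = hmac.new(key, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return "CUST_" + digest[:8]


def find_calls_for_customer(
    customer_id: str, keys: Iterable[bytes], audit_log: Iterable[AuditEntry]
) -> List[AuditEntry]:
    """Recompute the pseudonym under each retained key and match log entries."""
    pseudonyms = {pseudonymise(customer_id, key) for key in keys}
    return [entry for entry in audit_log if entry.pseudonym in pseudonyms]


# Placeholder keys; real ones would come from the key management service.
old_key = b"demo-key-previous-year"
new_key = b"demo-key-current-year"

audit_log = [
    AuditEntry(pseudonymise("shopmax-000123", old_key), "recommendation", "2024-03-01T10:00:00Z"),
    AuditEntry(pseudonymise("shopmax-000999", new_key), "recommendation", "2025-03-02T11:00:00Z"),
    AuditEntry(pseudonymise("shopmax-000123", new_key), "support-summary", "2025-03-05T09:30:00Z"),
]

for entry in find_calls_for_customer("shopmax-000123", [old_key, new_key], audit_log):
    print(entry.timestamp, entry.purpose)
```

The handler never needs to reverse a pseudonym. It starts from the requesting customer's real identifier and recomputes the pseudonym, so the audit log stays free of names and emails.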