Cloud DLP
Author: Venkata Sudhakar
Google Cloud Data Loss Prevention (DLP) is a fully managed service that helps you discover, classify, and protect sensitive data. It can inspect text, images, and structured data to find and de-identify personally identifiable information (PII) and other sensitive content.

Key Features:
1. 150+ built-in detectors - Automatically detect PII such as SSNs, credit card numbers, email addresses, phone numbers, and medical information.
2. De-identification - Mask, redact, tokenize, or encrypt sensitive data while preserving data utility.
3. Custom info types - Define custom patterns using regular expressions or dictionaries for organization-specific sensitive data.
4. Multi-format - Inspect text, CSV, JSON, images, BigQuery tables, Cloud Storage files, and Datastore.
5. Risk analysis - Analyze re-identification risk and statistical properties of datasets.

The example inspects a text sample for PII and de-identifies it using the Cloud DLP Java client. It gives the following output:
Findings:
[VERY_LIKELY] PERSON_NAME: John Doe
[VERY_LIKELY] US_SOCIAL_SECURITY_NUMBER: 123-45-6789
[VERY_LIKELY] EMAIL_ADDRESS: [email protected]
[VERY_LIKELY] PHONE_NUMBER: (555) 123-4567
[VERY_LIKELY] CREDIT_CARD_NUMBER: 4111-1111-1111-1111
De-identified text:
Patient [PERSON_NAME], SSN: [US_SOCIAL_SECURITY_NUMBER],
Email: [EMAIL_ADDRESS], Phone: [PHONE_NUMBER],
Credit Card: [CREDIT_CARD_NUMBER]
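The inspect-then-replace flow behind this output can be approximated locally. The following is only a minimal sketch in plain Java and does not call the Cloud DLP API: the class name, the three regular expressions, and the sample text are assumptions made for illustration. It mirrors the idea of a regex-based custom info type combined with replacing each match with its info type name. Real DLP detectors use context and likelihood scoring that simple patterns cannot reproduce.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local illustration only: regex detectors standing in for DLP info types.
class LocalInspectSketch {
    // Hypothetical patterns; insertion order is kept so results are predictable.
    static final Map<String, Pattern> DETECTORS = new LinkedHashMap<>();
    static {
        DETECTORS.put("US_SOCIAL_SECURITY_NUMBER", Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"));
        DETECTORS.put("CREDIT_CARD_NUMBER", Pattern.compile("\\b\\d{4}-\\d{4}-\\d{4}-\\d{4}\\b"));
        DETECTORS.put("PHONE_NUMBER", Pattern.compile("\\(\\d{3}\\) \\d{3}-\\d{4}"));
    }

    // Returns one "INFO_TYPE: matched text" entry per match, grouped by detector.
    static List<String> inspect(String text) {
        List<String> findings = new ArrayList<>();
        for (Map.Entry<String, Pattern> detector : DETECTORS.entrySet()) {
            Matcher m = detector.getValue().matcher(text);
            while (m.find()) {
                findings.add(detector.getKey() + ": " + m.group());
            }
        }
        return findings;
    }

    // Replaces every match with its info type name in square brackets.
    static String deidentify(String text) {
        String result = text;
        for (Map.Entry<String, Pattern> detector : DETECTORS.entrySet()) {
            String tag = Matcher.quoteReplacement("[" + detector.getKey() + "]");
            result = detector.getValue().matcher(result).replaceAll(tag);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "SSN: 123-45-6789, Phone: (555) 123-4567, Credit Card: 4111-1111-1111-1111";
        System.out.println("Findings:");
        for (String finding : inspect(sample)) {
            System.out.println(finding);
        }
        System.out.println("De-identified text:");
        System.out.println(deidentify(sample));
    }
}
```

Names and email addresses are left out of the sketch on purpose, since detecting them reliably needs more than a pattern match.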
De-identification Techniques:
Redaction - Remove sensitive values entirely.
Masking - Replace characters with a mask symbol (e.g., ****-****-****-1111).
Tokenization - Replace with a format-preserving encrypted token. Reversible with the encryption key.
Pseudonymization - Replace with a consistent fake value. Same input always produces the same output.
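To make the differences between these techniques concrete, here is a small sketch in plain Java. It is not the Cloud DLP implementation: the class and method names are hypothetical, and pseudonymization is approximated with a keyed HMAC-SHA256 hash, which is deterministic but one-way. Format-preserving tokenization is not shown because the JDK standard library has no format-preserving encryption primitive.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Local illustration only: hypothetical helpers, not Cloud DLP API calls.
class DeidTechniquesSketch {
    // Redaction: remove the sensitive value from the text entirely.
    static String redact(String text, String sensitiveValue) {
        return text.replace(sensitiveValue, "");
    }

    // Masking: replace every digit except the last keepLast digits with maskChar.
    // Non-digit characters such as '-' are kept so the shape stays recognizable.
    static String maskDigits(String value, char maskChar, int keepLast) {
        int totalDigits = 0;
        for (char c : value.toCharArray()) {
            if (Character.isDigit(c)) totalDigits++;
        }
        StringBuilder masked = new StringBuilder();
        int digitsSeen = 0;
        for (char c : value.toCharArray()) {
            if (Character.isDigit(c)) {
                digitsSeen++;
                masked.append(digitsSeen <= totalDigits - keepLast ? maskChar : c);
            } else {
                masked.append(c);
            }
        }
        return masked.toString();
    }

    // Pseudonymization (assumed approach): keyed HMAC-SHA256, first 8 bytes as hex.
    // The same value and key always give the same surrogate; it cannot be reversed.
    static String pseudonymize(String value, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("TOKEN_");
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Demo key only; a real key would come from a key management system.
        byte[] key = "demo-key-not-for-production".getBytes(StandardCharsets.UTF_8);
        System.out.println(redact("SSN: 123-45-6789", "123-45-6789"));
        System.out.println(maskDigits("4111-1111-1111-1111", '*', 4));
        System.out.println(pseudonymize("John Doe", key));
    }
}
```

Masking keeps part of the value readable for support or display purposes, while the keyed hash keeps values joinable across records without exposing them.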