
Cloud DLP

Author: Venkata Sudhakar

Google Cloud Data Loss Prevention (DLP), now part of Sensitive Data Protection, is a fully managed service that helps you discover, classify, and protect sensitive data. It can inspect text, images, and structured data to find and de-identify personally identifiable information (PII) and other sensitive content.

Key Features:

1. 150+ built-in detectors - Automatically detects PII like SSNs, credit card numbers, emails, phone numbers, and medical info.

2. De-identification - Mask, redact, tokenize, or encrypt sensitive data while preserving data utility.

3. Custom info types - Define custom patterns using regex or dictionaries for organization-specific sensitive data.

4. Multi-format support - Inspect text, CSV, JSON, and images, as well as data stored in BigQuery, Cloud Storage, and Datastore.

5. Risk analysis - Analyze re-identification risk and statistical properties of datasets.
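Custom info types (feature 3) can be pictured locally as named regexes or word lists. The sketch below uses only the Java standard library and does not call Cloud DLP. The class name `CustomDetectorSketch`, the info type names `EMPLOYEE_ID` and `PROJECT_CODENAME`, the pattern, and the codenames are all hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Local illustration only: this does not call the Cloud DLP service.
class CustomDetectorSketch {
    // Hypothetical regex-based custom type: "EMP-" followed by six digits.
    static final Pattern EMPLOYEE_ID = Pattern.compile("\\bEMP-\\d{6}\\b");
    // Hypothetical dictionary-based custom type: internal project codenames.
    static final Set<String> CODENAMES = Set.of("bluebird", "redwood");

    // Returns one "TYPE: value" entry per match, regex matches first.
    static List<String> inspect(String text) {
        List<String> findings = new ArrayList<>();
        Matcher m = EMPLOYEE_ID.matcher(text);
        while (m.find()) {
            findings.add("EMPLOYEE_ID: " + m.group());
        }
        // Dictionary lookup is case-insensitive on whole words.
        for (String word : text.split("\\W+")) {
            if (CODENAMES.contains(word.toLowerCase(Locale.ROOT))) {
                findings.add("PROJECT_CODENAME: " + word);
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        // Prints [EMPLOYEE_ID: EMP-004217, PROJECT_CODENAME: Bluebird]
        System.out.println(inspect("EMP-004217 joined project Bluebird."));
    }
}
```

In the managed service the same idea is expressed as configuration on the inspection request rather than as hand-written matching code, and the service also assigns a likelihood to each finding.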

The example below inspects text for PII and de-identifies it using the Cloud DLP Java client.


It produces the following output:

Findings:
  [VERY_LIKELY] PERSON_NAME: John Doe
  [VERY_LIKELY] US_SOCIAL_SECURITY_NUMBER: 123-45-6789
  [VERY_LIKELY] EMAIL_ADDRESS: [email protected]
  [VERY_LIKELY] PHONE_NUMBER: (555) 123-4567
  [VERY_LIKELY] CREDIT_CARD_NUMBER: 4111-1111-1111-1111

De-identified text:
Patient [PERSON_NAME], SSN: [US_SOCIAL_SECURITY_NUMBER],
Email: [EMAIL_ADDRESS], Phone: [PHONE_NUMBER],
Credit Card: [CREDIT_CARD_NUMBER]
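The bracketed placeholders in the de-identified text come from replacing each finding with the name of its info type. A minimal local sketch of that transformation follows. It uses plain Java regexes as stand-in detectors for two of the info types and does not call Cloud DLP. The class name `ReplaceWithInfoTypeSketch` and both patterns are assumptions, and the patterns are deliberately naive compared with the service's built-in detectors.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Local illustration only: naive stand-in detectors, not the Cloud DLP service.
class ReplaceWithInfoTypeSketch {
    // Info type name -> hypothetical pattern approximating that type.
    static final Map<String, Pattern> DETECTORS = new LinkedHashMap<>();
    static {
        DETECTORS.put("US_SOCIAL_SECURITY_NUMBER",
                Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b"));
        DETECTORS.put("CREDIT_CARD_NUMBER",
                Pattern.compile("\\b\\d{4}(?:-\\d{4}){3}\\b"));
    }

    // Replaces every match with "[INFO_TYPE_NAME]".
    static String deidentify(String text) {
        String out = text;
        for (Map.Entry<String, Pattern> e : DETECTORS.entrySet()) {
            out = e.getValue().matcher(out).replaceAll("[" + e.getKey() + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints SSN: [US_SOCIAL_SECURITY_NUMBER], Credit Card: [CREDIT_CARD_NUMBER]
        System.out.println(
                deidentify("SSN: 123-45-6789, Credit Card: 4111-1111-1111-1111"));
    }
}
```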

De-identification Techniques:

Redaction - Remove sensitive values entirely.

Masking - Replace characters with a mask symbol (e.g., ****-****-****-1111).

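Tokenization - Replace the value with an encrypted token, which can be format-preserving so the token has the same length and character set as the input. Reversible with the encryption key.

Pseudonymization - Replace with a consistent surrogate value. The same input always produces the same output, so records can still be joined or counted on the de-identified field.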
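The techniques above can be sketched with the Java standard library alone. This is a local illustration under stated assumptions, not the Cloud DLP API: the class and method names are hypothetical, and pseudonymization is shown with a keyed hash (HMAC-SHA256), which is consistent but one-way. Reversible, format-preserving tokenization needs a dedicated encryption scheme and is not shown here.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Local illustration only: hypothetical helpers, not the Cloud DLP service.
class DeidTechniquesSketch {
    // Redaction: remove the sensitive value entirely.
    static String redact(String text, String sensitive) {
        return text.replace(sensitive, "");
    }

    // Masking: replace every letter or digit with '*' except the last
    // `keep` ones; separators such as '-' are left in place.
    static String mask(String value, int keep) {
        StringBuilder sb = new StringBuilder(value);
        int seen = 0;
        for (int i = sb.length() - 1; i >= 0; i--) {
            if (Character.isLetterOrDigit(sb.charAt(i))) {
                if (seen >= keep) {
                    sb.setCharAt(i, '*');
                }
                seen++;
            }
        }
        return sb.toString();
    }

    // Pseudonymization (assumed variant): keyed hash, so the same input and
    // key always give the same surrogate, but it cannot be reversed.
    static String pseudonymize(String value, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        // Prints ****-****-****-1111
        System.out.println(mask("4111-1111-1111-1111", 4));
    }
}
```

The key passed to `pseudonymize` must be kept secret and stable: changing it changes every surrogate, which breaks consistency across datasets.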


 
  


  