Cloud DLP

Author: Venkata Sudhakar

Google Cloud Data Loss Prevention (DLP) is a fully managed service that helps you discover, classify, and protect sensitive data. It can inspect text, images, and structured data to find and de-identify personally identifiable information (PII) and other sensitive content.

Key Features:

1. 150+ built-in detectors - Automatically detects PII like SSNs, credit card numbers, emails, phone numbers, and medical info.

2. De-identification - Mask, redact, tokenize, or encrypt sensitive data while preserving data utility.

3. Custom info types - Define custom patterns using regex or dictionaries for organization-specific sensitive data.

4. Multi-format - Inspect text, CSV, JSON, images, BigQuery tables, Cloud Storage files, and Datastore.

5. Risk analysis - Analyze re-identification risk and statistical properties of datasets.
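To make the custom info type idea from item 3 more concrete, here is a minimal local sketch of what a regex detector and a dictionary detector each match. It uses only the Java standard library and does not call Cloud DLP. The class name CustomDetectorSketch, the "EMP-" plus six digits employee ID format, and the project code names are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomDetectorSketch {
    // Hypothetical employee ID format: "EMP-" followed by exactly six digits.
    private static final Pattern EMPLOYEE_ID = Pattern.compile("\\bEMP-\\d{6}\\b");

    // Hypothetical dictionary of internal project code names (lowercase).
    private static final Set<String> CODE_NAMES = Set.of("bluebird", "nightowl");

    /** Returns every substring that matches the regex detector. */
    public static List<String> findByRegex(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = EMPLOYEE_ID.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    /** Returns every word that appears in the dictionary, ignoring case. */
    public static List<String> findByDictionary(String text) {
        List<String> hits = new ArrayList<>();
        for (String word : text.split("\\W+")) {
            if (CODE_NAMES.contains(word.toLowerCase(Locale.ROOT))) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "Ticket for EMP-004217 on project Bluebird, cc EMP-99.";
        System.out.println(findByRegex(text));
        System.out.println(findByDictionary(text));
    }
}
```

A regex suits values with a fixed shape, such as IDs and account numbers, while a dictionary suits a known, finite set of terms. In Cloud DLP itself, the equivalent patterns and word lists would be supplied as custom info types in the inspect configuration rather than matched in application code.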

The example below shows how to inspect text for PII and then de-identify it with the Cloud DLP Java client library.

import com.google.cloud.dlp.v2.DlpServiceClient;
import com.google.privacy.dlp.v2.*;

public class CloudDLPExample {
    public static void main(String[] args) throws Exception {
        String projectId = "my-project";
        String textToInspect = "Patient John Doe, SSN: 123-45-6789, " +
            "Email: john.doe@example.com, Phone: (555) 123-4567, " +
            "Credit Card: 4111-1111-1111-1111";

        // The client is AutoCloseable, so try-with-resources releases its connections.
        try (DlpServiceClient dlp = DlpServiceClient.create()) {
            // Wrap the text to inspect in a ContentItem
            ContentItem item = ContentItem.newBuilder()
                .setValue(textToInspect).build();

            // Choose the info types to look for and the minimum likelihood to report
            InspectConfig inspectConfig = InspectConfig.newBuilder()
                .addInfoTypes(InfoType.newBuilder().setName("PERSON_NAME"))
                .addInfoTypes(InfoType.newBuilder().setName("US_SOCIAL_SECURITY_NUMBER"))
                .addInfoTypes(InfoType.newBuilder().setName("EMAIL_ADDRESS"))
                .addInfoTypes(InfoType.newBuilder().setName("PHONE_NUMBER"))
                .addInfoTypes(InfoType.newBuilder().setName("CREDIT_CARD_NUMBER"))
                .setMinLikelihood(Likelihood.LIKELY)
                // Quotes are omitted by default; without this, getQuote() returns ""
                .setIncludeQuote(true)
                .build();

            InspectContentRequest request = InspectContentRequest.newBuilder()
                .setParent("projects/" + projectId + "/locations/global")
                .setItem(item)
                .setInspectConfig(inspectConfig)
                .build();

            // Inspect the text and print each finding
            InspectContentResponse response = dlp.inspectContent(request);
            System.out.println("Findings:");
            for (Finding f : response.getResult().getFindingsList()) {
                System.out.printf("  [%s] %s: %s%n",
                    f.getLikelihood(), f.getInfoType().getName(), f.getQuote());
            }

            // De-identify: replace each finding with its info type name.
            // No info types are listed on the transformation, so it applies to
            // every info type in the inspect config.
            DeidentifyConfig deidentifyConfig = DeidentifyConfig.newBuilder()
                .setInfoTypeTransformations(InfoTypeTransformations.newBuilder()
                    .addTransformations(InfoTypeTransformations.InfoTypeTransformation.newBuilder()
                        .setPrimitiveTransformation(PrimitiveTransformation.newBuilder()
                            .setReplaceWithInfoTypeConfig(ReplaceWithInfoTypeConfig.getDefaultInstance()))))
                .build();

            DeidentifyContentResponse deidentified = dlp.deidentifyContent(
                DeidentifyContentRequest.newBuilder()
                    .setParent("projects/" + projectId + "/locations/global")
                    .setItem(item)
                    .setDeidentifyConfig(deidentifyConfig)
                    .setInspectConfig(inspectConfig)
                    .build());

            System.out.println("\nDe-identified text:");
            System.out.println(deidentified.getItem().getValue());
        }
    }
}

Running it produces output similar to the following:

Findings:
  [VERY_LIKELY] PERSON_NAME: John Doe
  [VERY_LIKELY] US_SOCIAL_SECURITY_NUMBER: 123-45-6789
  [VERY_LIKELY] EMAIL_ADDRESS: john.doe@example.com
  [VERY_LIKELY] PHONE_NUMBER: (555) 123-4567
  [VERY_LIKELY] CREDIT_CARD_NUMBER: 4111-1111-1111-1111

De-identified text:
Patient [PERSON_NAME], SSN: [US_SOCIAL_SECURITY_NUMBER],
Email: [EMAIL_ADDRESS], Phone: [PHONE_NUMBER],
Credit Card: [CREDIT_CARD_NUMBER]

De-identification Techniques:

Redaction - Remove sensitive values entirely.

Masking - Replace characters with a mask symbol (e.g., ****-****-****-1111).

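Tokenization - Replace each value with a token produced by format-preserving encryption. The original value can be recovered with the encryption key.

Pseudonymization - Replace each value with a consistent surrogate. The same input always produces the same output, so records can still be joined or counted.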
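The masking and pseudonymization ideas above can be sketched locally in plain Java. This is an illustration of the concepts only, not how Cloud DLP implements them: the class DeidentifySketch and its methods are hypothetical, and the keyed hash (HMAC-SHA256) used for pseudonymization is an assumption chosen because it gives the "same input, same output" property. Unlike tokenization with an encryption key, a keyed hash cannot be reversed.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class DeidentifySketch {

    /**
     * Masks every letter or digit except the last `keep` ones.
     * Separators such as '-' are left in place.
     */
    public static String mask(String value, int keep) {
        int alnum = 0;
        for (char c : value.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                alnum++;
            }
        }
        int toMask = Math.max(0, alnum - keep);
        StringBuilder out = new StringBuilder();
        for (char c : value.toCharArray()) {
            if (Character.isLetterOrDigit(c) && toMask > 0) {
                out.append('*');
                toMask--;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    /**
     * Derives a consistent surrogate from the value with HMAC-SHA256.
     * Returns the first 8 bytes of the MAC as 16 hex characters.
     */
    public static String pseudonymize(String value, byte[] key) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            byte[] digest = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    public static void main(String[] args) {
        // Demo key only; a real key would come from a key management service.
        byte[] key = "demo-key".getBytes(StandardCharsets.UTF_8);
        System.out.println(mask("4111-1111-1111-1111", 4)); // ****-****-****-1111
        System.out.println(pseudonymize("john.doe@example.com", key));
        System.out.println(pseudonymize("john.doe@example.com", key)); // same as the line above
    }
}
```

Masking keeps part of the value readable, which helps when a human needs to recognize a record. A consistent surrogate hides the value completely but still lets the same person or card be grouped across rows.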
