
Cloud Dataprep

Author: Venkata Sudhakar

Google Cloud Dataprep is an intelligent data service for visually exploring, cleaning, and preparing structured and unstructured data for analysis. Built on the Trifacta Wrangler platform, it helps data analysts and data scientists prepare data without writing code.

Key Features:

1. Visual interface - Point-and-click UI for data transformation with no coding required.

2. Smart suggestions - ML-powered transformation suggestions based on data patterns.

3. Automatic schema detection - Infers column data types and flags anomalies such as mismatched or missing values.

4. BigQuery integration - Reads from and writes directly to BigQuery.

5. Scalable execution - Runs transformation jobs on Cloud Dataflow, which scales out to large datasets.
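The schema detection in feature 3 happens inside the managed service, and its exact rules are not described here. The plain-Python sketch below illustrates the general idea only. It classifies each raw cell, picks a column type by majority vote, and reports which rows do not fit that type and which are empty. The type labels, the date formats, and the majority-vote rule are all assumptions made for this example, not Dataprep's documented behavior.

```python
from collections import Counter
from datetime import datetime

# Date formats to try; an assumption for this sketch, not Dataprep's list.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def classify(value: str) -> str:
    """Return a coarse type label for a single raw cell value."""
    v = value.strip()
    if not v:
        return "empty"
    try:
        int(v)
        return "integer"
    except ValueError:
        pass
    try:
        float(v)
        return "decimal"
    except ValueError:
        pass
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(v, fmt)
            return "date"
        except ValueError:
            continue
    return "string"


def infer_column(values: list[str]) -> tuple[str, list[int], list[int]]:
    """Infer a column type by majority vote over non-empty values.

    Returns (inferred_type, mismatched_row_indices, missing_row_indices).
    """
    kinds = [classify(v) for v in values]
    counts = Counter(k for k in kinds if k != "empty")
    missing = [i for i, k in enumerate(kinds) if k == "empty"]
    if not counts:
        return "string", [], missing
    inferred = counts.most_common(1)[0][0]
    # A mostly whole-number column that also holds decimals is treated as decimal.
    if inferred == "integer" and counts["decimal"] > 0:
        inferred = "decimal"
    accepted = {"decimal", "integer"} if inferred == "decimal" else {inferred}
    mismatched = [i for i, k in enumerate(kinds) if k != "empty" and k not in accepted]
    return inferred, mismatched, missing


if __name__ == "__main__":
    print(infer_column(["2", "5", "x", ""]))
```

Majority voting keeps one bad value from changing the inferred type. The bad value shows up as a mismatch instead, which is roughly how anomalies surface for review.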

The following example shows a Dataprep Wrangle script that cleans and transforms a CSV dataset.


It produces the following output (a preview of the transformed dataset):

customer_id | order_date_clean | order_year | order_month | product_name    | quantity | unit_price | total_revenue | order_tier
------------|------------------|------------|-------------|-----------------|----------|------------|---------------|----------
CUST001     | 2024-01-15       | 2024       | 1           | LAPTOP PRO X    | 2        | 899.99     | 1799.98       | HIGH
CUST002     | 2024-01-16       | 2024       | 1           | WIRELESS MOUSE  | 5        | 29.99      | 149.95        | STANDARD
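The Wrangle script itself is not reproduced here, but the preview implies several steps. The order date is normalized to ISO format, and the year and month are extracted from it. Product names are upper-cased. Orders are bucketed into tiers. The preview is also consistent with total_revenue = quantity × unit_price (2 × 899.99 = 1799.98, 5 × 29.99 = 149.95).

The plain-Python sketch below approximates those steps locally so their logic is easy to check. It is not Dataprep or Wrangle code. Several details are hypothetical: the raw input layout (an MM/DD/YYYY order_date column and messy product names), the helper names, and the 1000 cut-off for the HIGH tier.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Hypothetical raw input; the article's source CSV is not shown.
RAW_CSV = """customer_id,order_date,product_name,quantity,unit_price
CUST001,01/15/2024,  laptop pro x ,2,899.99
CUST002,01/16/2024,wireless mouse,5,29.99
"""

# Assumed cut-off for the HIGH tier; not taken from the article.
HIGH_TIER_THRESHOLD = Decimal("1000")


def transform_row(row: dict) -> dict:
    """Apply recipe-style cleaning and derivation steps to one raw row."""
    order_date = datetime.strptime(row["order_date"].strip(), "%m/%d/%Y").date()
    quantity = int(row["quantity"])
    unit_price = Decimal(row["unit_price"])
    # Round the derived revenue to cents.
    total = (unit_price * quantity).quantize(Decimal("0.01"))
    return {
        "customer_id": row["customer_id"].strip(),
        "order_date_clean": order_date.isoformat(),
        "order_year": order_date.year,
        "order_month": order_date.month,
        # Collapse extra whitespace, then upper-case.
        "product_name": " ".join(row["product_name"].split()).upper(),
        "quantity": quantity,
        "unit_price": unit_price,
        "total_revenue": total,
        "order_tier": "HIGH" if total >= HIGH_TIER_THRESHOLD else "STANDARD",
    }


def transform_csv(text: str) -> list[dict]:
    """Parse CSV text and transform every row."""
    return [transform_row(r) for r in csv.DictReader(io.StringIO(text))]


if __name__ == "__main__":
    for out in transform_csv(RAW_CSV):
        print(out)
```

Using Decimal rather than float keeps currency totals such as 5 × 29.99 exact at two decimal places.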

Cloud Dataprep Workflow:

1. Import - Connect to Cloud Storage or BigQuery, or upload files directly.

2. Explore - View data distributions, detect anomalies, and understand data quality.

3. Transform - Apply cleaning and transformation steps using the visual recipe editor.

4. Run - Execute the recipe as a Cloud Dataflow job, which scales with the size of the data.

5. Export - Write results to BigQuery or Cloud Storage, or download them locally.
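For the Explore step, a rough local stand-in for Dataprep's column profiles is a per-column summary. The sketch below uses Python's csv module to compute missing values, distinct values, the most frequent values, and the numeric range for each column. The sample data and the choice of statistics are hypothetical, and they only approximate what the visual profile shows.

```python
import csv
import io
from collections import Counter

# Hypothetical sample with one missing quantity and a repeated customer.
SAMPLE_CSV = """customer_id,quantity,unit_price
CUST001,2,899.99
CUST002,5,29.99
CUST002,,29.99
"""


def profile(text: str) -> dict[str, dict]:
    """Summarize each column: missing count, distinct count, top values, numeric range."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    report = {}
    for col in reader.fieldnames or []:
        # Short rows yield None for absent fields; treat them as empty.
        values = [(row.get(col) or "").strip() for row in rows]
        present = [v for v in values if v]
        numbers = []
        for v in present:
            try:
                numbers.append(float(v))
            except ValueError:
                pass  # non-numeric values are ignored for min/max
        report[col] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "top": Counter(present).most_common(3),
            "min": min(numbers) if numbers else None,
            "max": max(numbers) if numbers else None,
        }
    return report


if __name__ == "__main__":
    for column, stats in profile(SAMPLE_CSV).items():
        print(column, stats)
```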


 
  


  