AI Platform

Author: Venkata Sudhakar

Google Cloud AI Platform (now part of Vertex AI) is a managed machine learning platform that enables data scientists and ML engineers to build, train, deploy, and manage ML models at scale. It provides a unified environment for the entire ML workflow.

Key Features of AI Platform:

1. Training - Run distributed training jobs using custom containers or built-in algorithms.

2. Prediction - Deploy trained models for online and batch predictions.

3. Notebooks - Managed JupyterLab instances for experimentation.

4. Pipelines - Orchestrate end-to-end ML workflows using Kubeflow Pipelines.

5. Feature Store - Centralized repository for storing and serving ML features.
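
As a rough illustration of the Prediction feature, an online prediction request against a deployed model is addressed by a resource name and carries a JSON body with an 'instances' list. The sketch below only builds those two pieces. The helper name build_predict_request and its parameters are hypothetical, and the model and version names are placeholders rather than values from this article.

```python
def build_predict_request(project_id, model_name, instances, version_name=None):
    """Build the resource name and JSON body for an online prediction call.

    Hypothetical helper for illustration; it performs no network access.
    """
    # Without a version, the request targets the model's default version.
    name = f'projects/{project_id}/models/{model_name}'
    if version_name:
        name += f'/versions/{version_name}'

    # Each element of 'instances' is one input example for the model.
    body = {'instances': list(instances)}
    return name, body


# With a client created by discovery.build('ml', 'v1'), the request could
# then be sent roughly like this (requires credentials and a deployed model):
#   name, body = build_predict_request('my-project', 'my_model', [[1.0, 2.0]])
#   response = service.projects().predict(name=name, body=body).execute()
```

The shape of each instance depends on the inputs the deployed model expects, so the numeric lists used here are only an assumption.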

The following example shows how to submit a training job to Google Cloud AI Platform using the Google API Client Library for Python.

from googleapiclient import discovery
from googleapiclient import errors

def create_training_job(project_id, bucket_name, job_id):
    # Build a client for the AI Platform Training and Prediction API ('ml', v1).
    service = discovery.build('ml', 'v1')

    # Training configuration: trainer package, entry module, arguments, and runtime.
    training_inputs = {
        'scaleTier': 'BASIC',
        'packageUris': [f'gs://{bucket_name}/trainer-0.1.tar.gz'],
        'pythonModule': 'trainer.task',
        'args': [
            '--train-files', f'gs://{bucket_name}/data/train.csv',
            '--eval-files', f'gs://{bucket_name}/data/eval.csv',
            '--train-steps', '1000',
            '--eval-steps', '100'
        ],
        'region': 'us-central1',
        'runtimeVersion': '2.8',
        'pythonVersion': '3.7',
        'jobDir': f'gs://{bucket_name}/jobs/{job_id}'
    }

    job_spec = {
        'jobId': job_id,
        'trainingInput': training_inputs
    }

    # Prepare the request; nothing is sent until execute() is called.
    request = service.projects().jobs().create(
        parent=f'projects/{project_id}',
        body=job_spec
    )

    try:
        response = request.execute()
        print(f"Job created: {response['jobId']}")
        print(f"State: {response['state']}")
        return response
    except errors.HttpError as err:
        print(f"Error creating job: {err}")
        raise

# Usage
create_training_job('my-project', 'my-bucket', 'my_training_job_001')

A successful submission prints output similar to the following:

Job created: my_training_job_001
State: QUEUED
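
Job IDs have to be unique within a project, so submitting the example again with the same ID is expected to be rejected. One common workaround is to append a timestamp to a fixed prefix. The sketch below is a minimal illustration. The helper name make_job_id is hypothetical, and the pattern check assumes that job IDs start with a letter and contain only letters, digits, and underscores.

```python
import re
from datetime import datetime, timezone

# Assumed job ID rule: starts with a letter, then letters, digits, or underscores.
_JOB_ID_PATTERN = re.compile(r'^[A-Za-z][A-Za-z0-9_]*$')


def make_job_id(prefix, now=None):
    """Return the prefix followed by a UTC timestamp (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    job_id = f"{prefix}_{now.strftime('%Y%m%d_%H%M%S')}"
    # Fail early on characters such as '-' instead of waiting for the API to reject them.
    if not _JOB_ID_PATTERN.match(job_id):
        raise ValueError(f"Invalid job ID: {job_id!r}")
    return job_id
```

The result can be passed as the job_id argument, for example create_training_job('my-project', 'my-bucket', make_job_id('my_training_job')).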

AI Platform Training Job States:

1. QUEUED - Job has been submitted and is waiting to be scheduled.

2. PREPARING - Resources are being allocated for the job.

3. RUNNING - The training job is actively running.

4. SUCCEEDED - The job completed successfully.

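5. FAILED - The job encountered an error and stopped.

6. CANCELLING - A cancellation request has been received and the job is shutting down.

7. CANCELLED - The job was cancelled before it completed.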
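
Because a job moves through these states asynchronously, a script that submits a job usually has to poll until the job reaches a final state. The following is a minimal sketch of such a loop, not an official client feature. The names wait_for_job, get_state, poll_seconds, and timeout_seconds are hypothetical. SUCCEEDED, FAILED, and CANCELLED (the end state of a cancelled job) are treated as final.

```python
import time

# States after which the job will not change any further.
TERMINAL_STATES = {'SUCCEEDED', 'FAILED', 'CANCELLED'}


def wait_for_job(get_state, poll_seconds=30, timeout_seconds=3600, sleep=time.sleep):
    """Call get_state() repeatedly until it returns a terminal state.

    get_state is any zero-argument callable returning the current state string.
    Raises TimeoutError if the job is still active after timeout_seconds.
    """
    waited = 0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in state {state} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# With a client created by discovery.build('ml', 'v1'), get_state could be
# wired up roughly like this (requires credentials and an existing job):
#   name = 'projects/my-project/jobs/my_training_job_001'
#   get_state = lambda: service.projects().jobs().get(name=name).execute()['state']
#   final_state = wait_for_job(get_state)
```

Passing the state lookup in as a callable keeps the loop independent of the API client, so it can be exercised with a fake sequence of states before being pointed at a real job.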

AI Platform has now evolved into Vertex AI, which provides a unified platform combining AutoML and custom training with additional MLOps capabilities including model monitoring, experiment tracking, and managed pipelines.
