# Getting Started with Image Classification Using Vertex AI and BigQuery

This guide provides a complete end-to-end workflow for training models and classifying imagery assets using Google Cloud's Vertex AI platform with Gemini 2.5 Flash. You'll learn to integrate BigQuery for data retrieval, Cloud Storage for asset management, and Vertex AI for machine learning inference in a Python Colab environment.

**Important:** Use Colab Enterprise, as its longer [idle shutdown time](https://cloud.google.com/colab/docs/idle-shutdown) is important for training machine learning models.
Configuration
-------------

Set the following project-specific variables before running the code samples:
    PROJECT_ID = "PROJECT_ID"
    REGION = "REGION"            # e.g., "us-central1"
    LOCATION = "LOCATION"        # e.g., "us"
    CUSTOMER_ID = "CUSTOMER_ID"  # required to subscribe to the dataset
Environment Setup
-----------------

Install the required dependencies and configure authentication to access Google Cloud services:
    # Install Google Cloud SDK dependencies for AI Platform integration
    !pip install google-cloud-aiplatform google-cloud-storage google-cloud-bigquery google-cloud-bigquery-data-exchange -q

    # Import core libraries for cloud services and machine learning operations
    import json
    import os
    from google.cloud import bigquery
    import vertexai
    from vertexai.generative_models import GenerativeModel, Part

    # Configure authentication for Google Cloud service access
    # Initiates an OAuth flow in a new browser tab if authentication is required
    if os.environ.get("VERTEX_PRODUCT") != "COLAB_ENTERPRISE":
        from google.colab import auth
        auth.authenticate_user(project_id=PROJECT_ID)

    # Initialize the Vertex AI client with the project configuration
    vertexai.init(project=PROJECT_ID, location=REGION)

    print(f"Vertex AI initialized for project: {PROJECT_ID} in region: {REGION}")
Subscribe to the Analytics Hub dataset
--------------------------------------

You must also subscribe to the Analytics Hub dataset.
    from google.cloud import bigquery_data_exchange_v1beta1

    ah_client = bigquery_data_exchange_v1beta1.AnalyticsHubServiceClient()

    HUB_PROJECT_ID = 'maps-platform-analytics-hub'
    DATA_EXCHANGE_ID = f"imagery_insights_exchange_{LOCATION}"
    LINKED_DATASET_NAME = f"imagery_insights___preview___{LOCATION}"

    # Subscribe to the listing (create a linked dataset in your consumer project)
    destination_dataset = bigquery_data_exchange_v1beta1.DestinationDataset()
    destination_dataset.dataset_reference.dataset_id = LINKED_DATASET_NAME
    destination_dataset.dataset_reference.project_id = PROJECT_ID
    destination_dataset.location = LOCATION

    LISTING_ID = f"imagery_insights_{CUSTOMER_ID.replace('-', '_')}__{LOCATION}"

    published_listing = f"projects/{HUB_PROJECT_ID}/locations/{LOCATION}/dataExchanges/{DATA_EXCHANGE_ID}/listings/{LISTING_ID}"

    request = bigquery_data_exchange_v1beta1.SubscribeListingRequest(
        destination_dataset=destination_dataset,
        name=published_listing,
    )

    # Request the subscription
    ah_client.subscribe_listing(request=request)
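Subscribing creates a linked dataset in your consumer project. As a quick sanity check before querying it, you can confirm the linked dataset is visible. This is a minimal sketch, not part of the original guide, assuming authentication has already completed during setup:

    # Verify that the subscription created the linked dataset in your project
    verify_client = bigquery.Client(project=PROJECT_ID)

    dataset = verify_client.get_dataset(f"{PROJECT_ID}.{LINKED_DATASET_NAME}")
    print(f"Linked dataset available: {dataset.full_dataset_id} (location: {dataset.location})")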
Data Extraction with BigQuery
-----------------------------

Execute a BigQuery query to extract Google Cloud Storage URIs from the `latest_observations` table. These URIs will be passed directly to the Vertex AI model for classification.

**Note:** The model requires GCS URIs rather than downloaded image files. This approach optimizes data transfer and processing efficiency.
    # Initialize the BigQuery client
    bigquery_client = bigquery.Client(project=PROJECT_ID)

    # Define a SQL query to retrieve observation records from the imagery dataset
    query = f"""
    SELECT
      *
    FROM
      `{PROJECT_ID}.imagery_insights___preview___{LOCATION}.latest_observations`
    LIMIT 10;
    """

    print(f"Executing BigQuery query:\n{query}")

    # Submit the query job to the BigQuery service and await completion
    query_job = bigquery_client.query(query)

    # Convert BigQuery Row objects to dictionaries for downstream processing
    query_response_data = []
    for row in query_job:
        query_response_data.append(dict(row))

    # Extract Cloud Storage URIs from the result set, filtering out null values
    gcs_uris = [item.get("gcs_uri") for item in query_response_data if item.get("gcs_uri")]

    print(f"BigQuery query returned {len(query_response_data)} records.")
    print(f"Extracted {len(gcs_uris)} GCS URIs:")
    for uri in gcs_uris:
        print(uri)
Image Classification Function
-----------------------------

This helper function handles the classification of images using Vertex AI's Gemini 2.5 Flash model:
    def classify_image_with_gemini(gcs_uri: str, prompt: str = "What is in this image?") -> str:
        """
        Performs multimodal image classification using Vertex AI's Gemini 2.5 Flash model.

        Leverages direct Cloud Storage integration to process image assets without
        local downloads, enabling scalable batch processing workflows. Service-level
        errors are caught and reported as a failure message rather than raised.

        Args:
            gcs_uri (str): Fully qualified Google Cloud Storage URI
                (format: gs://bucket-name/path/to/image.jpg)
            prompt (str): Natural language instruction for the classification task

        Returns:
            str: Generated textual description from the generative model, or an
                error message if the classification pipeline fails
        """
        try:
            # Instantiate the Gemini 2.5 Flash model for inference operations
            model = GenerativeModel("gemini-2.5-flash")

            # Construct a multimodal Part object from the Cloud Storage reference
            # Note: MIME type may need dynamic inference for mixed image formats
            image_part = Part.from_uri(uri=gcs_uri, mime_type="image/jpeg")

            # Execute a multimodal inference request with combined visual and textual inputs
            responses = model.generate_content([image_part, prompt])
            return responses.text
        except Exception as e:
            print(f"Error classifying image from URI {gcs_uri}: {e}")
            return "Classification failed."
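The helper above hardcodes `image/jpeg`. If your dataset mixes formats, one option is to infer the MIME type from the file extension before constructing the `Part`. This is a minimal sketch using Python's standard `mimetypes` module; the helper name is illustrative, not part of the original guide:

    import mimetypes

    def infer_image_mime_type(gcs_uri: str) -> str:
        """Guess an image MIME type from a GCS URI's file extension.

        Falls back to image/jpeg when the extension is missing or unrecognized.
        """
        mime_type, _ = mimetypes.guess_type(gcs_uri)
        if mime_type and mime_type.startswith("image/"):
            return mime_type
        return "image/jpeg"

    # Example: build the Part with the inferred type instead of a hardcoded one
    # image_part = Part.from_uri(uri=gcs_uri, mime_type=infer_image_mime_type(gcs_uri))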
Batch Image Classification
--------------------------

Process all extracted URIs and generate classifications:
    classification_results = []

    # Execute the batch classification pipeline across all extracted GCS URIs
    for uri in gcs_uris:
        print(f"\nProcessing: {uri}")

        # Define a comprehensive classification prompt for detailed feature extraction
        classification_prompt = "Describe this image in detail, focusing on any objects, signs, or features visible."

        # Invoke the Gemini model for multimodal inference on the current asset
        result = classify_image_with_gemini(uri, classification_prompt)

        # Aggregate structured results for downstream analytics and reporting
        classification_results.append({"gcs_uri": uri, "classification": result})

        print(f"Classification for {uri}:\n{result}")
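Because `classification_results` is a list of plain dictionaries, it serializes directly to JSON. A minimal sketch that persists the batch output so it survives the Colab session, using the `json` module imported during setup (the filename is arbitrary):

    # Persist the batch results for later review
    output_path = "classification_results.json"  # hypothetical filename

    with open(output_path, "w") as f:
        json.dump(classification_results, f, indent=2, ensure_ascii=False)

    print(f"Wrote {len(classification_results)} classifications to {output_path}")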
Next Steps
----------

With your images classified, consider these advanced workflows:

- **Model Fine-tuning**: Use classification results to train custom models.
- **Automated Processing**: Set up Cloud Functions to classify new images automatically.
- **Data Analysis**: Perform statistical analysis on classification patterns.
- **Integration**: Connect results to downstream applications (see the BigQuery load sketch after this list).
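For the analysis and integration workflows, a common pattern is to land the results back in BigQuery. This is a minimal sketch, assuming the `bigquery_client` created earlier; the destination `imagery_analysis.classifications` table is hypothetical and its dataset must exist before loading:

    # Load classification results into a BigQuery table for downstream analysis
    table_id = f"{PROJECT_ID}.imagery_analysis.classifications"  # example destination

    job_config = bigquery.LoadJobConfig(
        schema=[
            bigquery.SchemaField("gcs_uri", "STRING"),
            bigquery.SchemaField("classification", "STRING"),
        ],
        write_disposition="WRITE_APPEND",
    )

    load_job = bigquery_client.load_table_from_json(
        classification_results, table_id, job_config=job_config
    )
    load_job.result()  # Wait for the load job to complete

    print(f"Loaded {load_job.output_rows} rows into {table_id}")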
Troubleshooting
---------------

Common issues and solutions:

- **Authentication errors**: Ensure proper IAM roles and API enablement.
- **Rate limiting**: Implement exponential backoff for large batches (see the backoff sketch after this list).
- **Memory constraints**: Process images in smaller batches for large datasets.
- **URI format errors**: Verify GCS URIs follow the format `gs://bucket-name/path/to/image`.
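For the rate-limiting case, one approach (a minimal sketch, not from the original guide) is to wrap the classification call in a retry loop with exponential backoff and jitter. Because `classify_image_with_gemini` as written swallows exceptions and returns a sentinel string, this sketch retries on that sentinel:

    import random
    import time

    def classify_with_backoff(gcs_uri: str, prompt: str, max_retries: int = 5) -> str:
        """Call classify_image_with_gemini, retrying failures with exponential backoff."""
        for attempt in range(max_retries):
            result = classify_image_with_gemini(gcs_uri, prompt)
            if result != "Classification failed.":
                return result
            # Sleep 1s, 2s, 4s, ... plus jitter before retrying
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Retry {attempt + 1}/{max_retries} for {gcs_uri} in {delay:.1f}s")
            time.sleep(delay)
        return "Classification failed."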
[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["没有我需要的信息","missingTheInformationINeed","thumb-down"],["太复杂/步骤太多","tooComplicatedTooManySteps","thumb-down"],["内容需要更新","outOfDate","thumb-down"],["翻译问题","translationIssue","thumb-down"],["示例/代码问题","samplesCodeIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-09-05。"],[],[],null,["# Getting Started with Image Classification Using Vertex AI and BigQuery\n\nThis guide provides a complete end-to-end workflow for training models and\nclassifying imagery assets using Google Cloud's Vertex AI platform with Gemini\n2.5 Flash. You'll learn to integrate BigQuery for data retrieval, Cloud Storage\nfor asset management, and Vertex AI for machine learning inference in a\nPython Colab environment.\n| **Important:** Use Colab Enterprise, as its longer [idle shutdown\n| time](https://cloud.google.com/colab/docs/idle-shutdown) is important for training machine learning models.\n\nConfiguration\n-------------\n\nSet the following project-specific variables before running the code samples: \n\n PROJECT_ID = \"\u003cvar label=\"project ID\" translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e\"\n REGION = \"\u003cvar label=\"region\" translate=\"no\"\u003eREGION\u003c/var\u003e \" # e.g., \"us-central1\"\n LOCATION = \"\u003cvar label=\"location\" translate=\"no\"\u003eLOCATION\u003c/var\u003e \" # e.g., \"us\"\n CUSTOMER_ID = \"\u003cvar label=\"customer ID\" translate=\"no\"\u003eCUSTOMER_ID\u003c/var\u003e\" # required to subscribe to the dataset\n\nEnvironment Setup\n-----------------\n\nInstall required dependencies and configure authentication to access Google\nCloud services: \n\n # Install Google Cloud SDK dependencies for AI Platform integration\n !pip install google-cloud-aiplatform google-cloud-storage google-cloud-bigquery google-cloud-bigquery-data-exchange -q\n\n # Import core libraries for cloud services and machine learning operations\n import json\n import os\n from google.cloud import bigquery\n import vertexai\n from vertexai.generative_models import GenerativeModel, Part\n\n # Configure authentication for Google Cloud service access\n # Initiates OAuth flow in new browser tab if authentication required\n from google.colab import auth\n\n if os.environ.get(\"VERTEX_PRODUCT\") != \"COLAB_ENTERPRISE\":\n from google.colab import auth\n auth.authenticate_user(project_id=PROJECT_ID)\n\n # Initialize Vertex AI client with project configuration\n vertexai.init(project=PROJECT_ID, location=REGION)\n\n print(f\"Vertex AI initialized for project: {PROJECT_ID} in region: {REGION}\")\n\nSubscribe to the Analytics Hub dataset\n--------------------------------------\n\nYou must also subscribe to the Analytics Hub dataset. 
\n\n from google.cloud import bigquery_data_exchange_v1beta1\n\n ah_client = bigquery_data_exchange_v1beta1.AnalyticsHubServiceClient()\n\n HUB_PROJECT_ID = 'maps-platform-analytics-hub'\n DATA_EXCHANGE_ID = f\"imagery_insights_exchange_{LOCATION}\"\n LINKED_DATASET_NAME = f\"imagery_insights___preview___{LOCATION}\"\n\n\n # subscribe to the listing (create a linked dataset in your consumer project)\n destination_dataset = bigquery_data_exchange_v1beta1.DestinationDataset()\n destination_dataset.dataset_reference.dataset_id = LINKED_DATASET_NAME\n destination_dataset.dataset_reference.project_id = PROJECT_ID\n destination_dataset.location = LOCATION\n LISTING_ID=f\"imagery_insights_{CUSTOMER_ID.replace('-', '_')}__{LOCATION}\"\n\n published_listing = f\"projects/{HUB_PROJECT_ID}/locations/{LOCATION}/dataExchanges/{DATA_EXCHANGE_ID}/listings/{LISTING_ID}\"\n\n request = bigquery_data_exchange_v1beta1.SubscribeListingRequest(\n destination_dataset=destination_dataset,\n name=published_listing,\n )\n\n # request the subscription\n ah_client.subscribe_listing(request=request)\n\nData Extraction with BigQuery\n-----------------------------\n\nExecute a BigQuery query to extract Google Cloud Storage URIs from the\n`latest_observations` table. These URIs will be passed directly to the Vertex AI\nmodel for classification.\n**Note:** The model requires GCS URIs rather than downloaded image files. This approach optimizes data transfer and processing efficiency. \n\n # Initialize BigQuery client\n bigquery_client = bigquery.Client(project=PROJECT_ID)\n\n # Define SQL query to retrieve observation records from imagery dataset\n query = f\"\"\"\n SELECT\n *\n FROM\n `{PROJECT_ID}.imagery_insights___preview___{LOCATION}.latest_observations`\n LIMIT 10;\n \"\"\"\n\n print(f\"Executing BigQuery query:\\n{query}\")\n\n # Submit query job to BigQuery service and await completion\n query_job = bigquery_client.query(query)\n\n # Transform query results into structured data format for downstream processing\n # Convert BigQuery Row objects to dictionary representations for enhanced accessibility\n query_response_data = []\n for row in query_job:\n query_response_data.append(dict(row))\n\n # Extract Cloud Storage URIs from result set, filtering null values\n gcs_uris = [item.get(\"gcs_uri\") for item in query_response_data if item.get(\"gcs_uri\")]\n\n print(f\"BigQuery query returned {len(query_response_data)} records.\")\n print(f\"Extracted {len(gcs_uris)} GCS URIs:\")\n for uri in gcs_uris:\n print(uri)\n\nImage Classification Function\n-----------------------------\n\nThis helper function handles the classification of images using Vertex AI's\nGemini 2.5 Flash model: \n\n def classify_image_with_gemini(gcs_uri: str, prompt: str = \"What is in this image?\") -\u003e str:\n \"\"\"\n Performs multimodal image classification using Vertex AI's Gemini 2.5 Flash model.\n\n Leverages direct Cloud Storage integration to process image assets without local\n download requirements, enabling scalable batch processing workflows.\n\n Args:\n gcs_uri (str): Fully qualified Google Cloud Storage URI \n (format: gs://bucket-name/path/to/image.jpg)\n prompt (str): Natural language instruction for classification task execution\n\n Returns:\n str: Generated textual description from the generative model, or error message\n if classification pipeline fails\n\n Raises:\n Exception: Captures service-level errors and returns structured failure response\n \"\"\"\n try:\n # Instantiate Gemini 2.5 Flash model for inference operations\n 
model = GenerativeModel(\"gemini-2.5-flash\")\n\n # Construct multimodal Part object from Cloud Storage reference\n # Note: MIME type may need dynamic inference for mixed image formats\n image_part = Part.from_uri(uri=gcs_uri, mime_type=\"image/jpeg\")\n\n # Execute multimodal inference request with combined visual and textual inputs\n responses = model.generate_content([image_part, prompt])\n return responses.text\n except Exception as e:\n print(f\"Error classifying image from URI {gcs_uri}: {e}\")\n return \"Classification failed.\"\n\nBatch Image Classification\n--------------------------\n\nProcess all extracted URIs and generate classifications: \n\n classification_results = []\n\n # Execute batch classification pipeline across all extracted GCS URIs\n for uri in gcs_uris:\n print(f\"\\nProcessing: {uri}\")\n\n # Define comprehensive classification prompt for detailed feature extraction\n classification_prompt = \"Describe this image in detail, focusing on any objects, signs, or features visible.\"\n\n # Invoke Gemini model for multimodal inference on current asset\n result = classify_image_with_gemini(uri, classification_prompt)\n\n # Aggregate structured results for downstream analytics and reporting\n classification_results.append({\"gcs_uri\": uri, \"classification\": result})\n\n print(f\"Classification for {uri}:\\n{result}\")\n\nNext Steps\n----------\n\nWith your images classified, consider these advanced workflows:\n\n- **Model Fine-tuning**: Use classification results to train custom models.\n- **Automated Processing**: Set up Cloud Functions to classify new images automatically.\n- **Data Analysis**: Perform statistical analysis on classification patterns.\n- **Integration**: Connect results to downstream applications.\n\nTroubleshooting\n---------------\n\nCommon issues and solutions:\n\n- **Authentication errors**: Ensure proper IAM roles and API enablement.\n- **Rate limiting**: Implement exponential backoff for large batches.\n- **Memory constraints**: Process images in smaller batches for large datasets.\n- **URI format errors** : Verify GCS URIs follow the format `gs://bucket-name/path/to/image`.\n\nFor additional support, refer to the [Vertex AI\ndocumentation](https://cloud.google.com/vertex-ai/docs) and [BigQuery\ndocumentation](https://cloud.google.com/bigquery/docs)."]]