Build an AI/ML-driven image archive processing workflow: Image archive, analysis & report generation

Build an AI-enhanced image processing enterprise workflow: Image archive, analysis & report generation, or "How to spice up file backups" 🥱
Seattle :: Fall 2025
Wesley Chun (@wescpy)
AI TPM, Red Hat and Principal, CyberWeb

About the speaker (and Red Hat AI)
● Mission: enable developer success with Google tools, platforms & APIs, using OSS AI projects in GCP projects; and train accelerated Python users
● Focus: OSS AI tools, Google Cloud (GCP) & Workspace (GWS) APIs; GAE migrations; Google X-product sol'ns
● Services: technical consulting, training, engineering, speaking, code samples, hands-on tutorials, public technical content (blogs, social, etc.)

Previous experience / background
● Software Engineer & Developer Advocate
  ○ Google, Sun, HP, Cisco, EMC, Xilinx
  ○ Original Yahoo!Mail engineer/SWE
● Technical trainer, teacher, instructor
  ○ Teaching Math, Linux, Python since '83
  ○ Adjunct CS Faculty at local SV colleges
● Python community member
  ○ Popular Core Python series author
  ○ Python Software Foundation Fellow
● AB (Math/CS) & CMP (Music/Piano), UC Berkeley and MSCS, UC Santa Barbara
● Adjunct Computer Science Faculty, Foothill College (Silicon Valley)

GWS Dev Show: goo.gl/JpBQ40
GAE migration: bit.ly/3xk2Swi

Why and Agenda
● Organizations have real-life business problems seeking solutions
● Google Cloud/GCP and Google Workspace/GWS: most Google APIs
  ○ May know GCP for compute, storage, data, and AI/ML services
  ○ GWS known for its apps (Gmail, Drive,...); also for developers(!)
● Many other APIs from Google AI, Maps, YouTube, Firebase, etc.
● Use all of them to build novel solutions to unique business problems

1 Using Google APIs
2 Google APIs sampler
3 AI-enhanced image processing workflow
4 Wrap-up

464! (2024)

01 Using Google APIs
Getting started & the nuts-n-bolts

Google APIs: how to use

General steps
1. Go to Cloud Console
2. Log in to Google/Gmail account (Workspace domain may require admin approval)
3. Create project (per application)
4. Enable APIs to use
5. Enable billing (CC, Free Trial, etc.)
6. Download client library(ies)
7. Create & download credentials
8. Write code*
9. Run code (may need to authorize)

*In your code
1. Import API client library
2. Create API client object
3. Use client to make API calls

Costs & pricing
● GCP & GMP: pay-per-use (CC req'd)
● GWS: "subscription" (incl. $0USD/mo.)
● GMP: $200/mo. free usage
● GCP Free Trial: $300/1Q
● GCP "Always Free" tier
  ○ Some products have free tier
  ○ Daily or monthly quota
  ○ Must exceed to incur billing
● More on both programs at cloud.google.com/free

Cloud/GCP console
console.cloud.google.com
● Hub of all developer activity
● Applications == projects
  ○ New project for new apps
  ○ Projects have a billing acct
● Manage billing accounts
  ○ Financial instrument required
  ○ Personal or corporate credit cards, Free Trial, and education grants
● Access GCP product settings
● Manage users & security
● Manage APIs in devconsole

API manager aka Developers Console (devconsole)
console.developers.google.com
● View application statistics
● En-/disable Google APIs
● Obtain application credentials
Using Google APIs: goo.gl/RbyTFD

Three different credentials types
● Simple: API keys (to access public data)
  ○ Simplest form of authorization: an API key; tied to a project
  ○ Allows access to public data
  ○ Do not put in code, lose, or upload to GitHub! (can be restricted however)
  ○ Supported by: Google Maps, (some) YouTube, (some) GCP, etc.
● Authorized: OAuth client IDs (to access data owned by [human] user)
  ○ Provides additional layer of security via OAuth2 (RFC 6749)
  ○ Owner must grant permission for your app to access their data
  ○ Access granularity determined by requested permissions (user scopes)
  ○ Supported by: Google Workspace, (some) YouTube, (some) GCP, etc.
● Authorized: service accounts (to access data owned by an app/robot user)
  ○ Provides additional layer of security via OAuth2 or JWT (RFC 7519)
  ○ Project owning data grants permission implicitly; requires public-private key-pair
  ○ Access granularity determined by Cloud IAM permissions granted to service account key-pair
  ○ Supported by: GCP, (some) Google Workspace, etc.

Blog series: dev.to/wescpy
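The advice to keep API keys out of source code can be followed with very little machinery. The following is a minimal sketch, assuming the key arrives through an environment variable; the variable name `GOOGLE_API_KEY` and the helpers `load_api_key` and `redact` are hypothetical names chosen for illustration (the talk's own code reads the key from a local `settings` module instead).

```python
import os


def load_api_key(env_var='GOOGLE_API_KEY'):
    """Return the API key stored in an environment variable.

    The variable name is an assumption made for this sketch; use whatever
    your deployment provides (secret manager, CI secret, etc.).
    """
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError('set %s before running this script' % env_var)
    return key


def redact(key, visible=4):
    """Mask a key for log output, keeping only the last few characters."""
    if len(key) <= visible:
        return '*' * len(key)
    return '*' * (len(key) - visible) + key[-visible:]
```

Failing fast on a missing key tends to be easier to debug than a rejected API request later on, and masking the key before logging keeps it out of shared build output.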

Two different client library "styles"
● "Platform" client libraries (lower-level)
  ○ Supports multiple products as a "lowest-common denominator"
  ○ Manage API service endpoints (setup & use)
  ○ Manage authorization (API keys, OAuth client IDs, service accounts)
  ○ Google Workspace, Cloud/GCP, Google Analytics, YouTube, Google Ads APIs, etc.
  ○ developers.google.com/api-client-library
● "Product" client libraries (higher-level)
  ○ Custom client libraries made specifically for each product
  ○ Managing API service endpoints & security mostly taken care of
  ○ Only need to create a "client" to use API services
  ○ Cloud/GCP: cloud.google.com/apis/docs/cloud-client-libraries, Firebase, Maps: developers.google.com/maps/web-services/client-library
● Some Google API families support both, e.g., Cloud/GCP

Google APIs client libraries for common languages; demos in
developers.google.com/api-client-library
cloud.google.com/apis/docs/cloud-client-libraries

(User-)authorized API access (lower-level, older, generic)
OAuth boilerplate: goo.gl/KMfbeK

    from googleapiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools

    SCOPES = ... # at least one (string or array of strings)

    # 'storage.json' - where to store OAuth2 tokens from API
    # 'client_secret.json' - OAuth2 client ID & secret (download from DevConsole)
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
        creds = tools.run_flow(flow, store)

    # create API service endpoint; for example: API='sheets', VERSION='v4'
    SERVICE = discovery.build(API, VERSION, http=creds.authorize(Http()))

(User-)authorized API access (lower-level, newer, generic)

    import os

    from googleapiclient import discovery
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.auth.transport.requests import Request
    from google.oauth2 import credentials

    SCOPES = ... # at least one (string or array of strings)

    # 'storage.json' - where to store OAuth2 tokens from API
    # 'client_secret.json' - OAuth2 client ID & secret (download from DevConsole)
    TOKENS = 'storage.json' # OAuth2 token storage
    creds = None # stays None when no tokens have been saved yet
    if os.path.exists(TOKENS):
        creds = credentials.Credentials.from_authorized_user_file(TOKENS)
    if not (creds and creds.valid):
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file('client_secret.json', SCOPES)
            creds = flow.run_local_server()
        with open(TOKENS, 'w') as token:
            token.write(creds.to_json())

    # create API service endpoint; for example: API='sheets', VERSION='v4'
    # (google.auth credentials are passed directly; no httplib2 authorize step)
    SERVICE = discovery.build(API, VERSION, credentials=creds)

Google APIs request-response workflow
● Application makes request
● Request received by service
● Process data, return response
● Results sent to application
(typical client-server model)
[Diagram: application and Google APIs exchanging HTTP requests (1) and responses (2) over HTTP-based REST APIs, authorized with OAuth2 or API key]

02 Google APIs
Sampler of GCP & GWS APIs
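The request half of this workflow can be made concrete without any client library. The following is a rough sketch, assuming an API key is used for authorization and using the Cloud Vision `images:annotate` REST method as the example service; `build_label_request` is a hypothetical helper, and the request is only constructed here, not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Public REST endpoint of the Cloud Vision API's images:annotate method
VISION_URL = 'https://vision.googleapis.com/v1/images:annotate'


def build_label_request(api_key, image_uri, max_results=5):
    """Build (but do not send) the HTTP request for label detection."""
    body = {'requests': [{
        'image': {'source': {'imageUri': image_uri}},
        'features': [{'type': 'LABEL_DETECTION', 'maxResults': max_results}],
    }]}
    return urllib.request.Request(
        '%s?%s' % (VISION_URL, urlencode({'key': api_key})),
        data=json.dumps(body).encode('utf-8'),       # JSON payload as bytes
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# Sending the request needs a real key and network access, for example:
#   with urllib.request.urlopen(build_label_request(KEY, URI)) as rsp:
#       print(json.load(rsp))
```

The client libraries shown elsewhere in this deck assemble and send essentially this kind of request for you, along with handling authorization and parsing the JSON response.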

Google Workspace (formerly G Suite and Google Apps) (GWS) APIs
● GWS developer home: developers.google.com/gsuite
● GWS developer intro: youtu.be/NqumcYgj5LI
● GWS REST APIs: youtu.be/2VpvWhDdXsI
● GWS Apps Script: youtu.be/xDovB0pu4OU
● Comprehensive overview: youtu.be/kkp0aNGlynw

Google Cloud Platform vs. Google Workspace
[Diagram: service tiers SaaS (Software as a Service), PaaS (Platform as a Service), IaaS (Infrastructure as a Service), with GWS APIs and GCP APIs marked; Google products paired with comparable offerings]
● Google Workspace (was G Suite/Google Apps) | Yahoo!Mail, Hotmail, Salesforce, Netsuite, Office 365
● Google Apps Script | Salesforce1/force.com
● Google App Engine, Cloud Functions | Heroku, Cloud Foundry, Engine Yard, AWS Lambda
● Google BigQuery, Cloud SQL, Vertex AI, Cloud Firestore, NL, Vision, Pub/Sub | AWS Kinesis, RDS; Windows Azure SQL, Docker
● Google Compute Engine, Cloud Storage | AWS EC2 & S3; Rackspace; Joyent

List (first 100) files/folders in Drive (older, OAuth2)

    from __future__ import print_function
    from googleapiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools

    SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
        creds = tools.run_flow(flow, store)
    DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

    files = DRIVE.files().list().execute().get('files', [])
    for f in files:
        print(f['name'], f['mimeType'])

Listing your files: goo.gl/ZIgf8k
github.com/wescpy/gsuite-apis-intro

Migrate SQL data to a Sheet

    import sqlite3

    # SHEETS is an authorized Sheets API client, e.g.,
    # discovery.build('sheets', 'v4', http=creds.authorize(Http()))

    # read SQL data then create new spreadsheet & add rows into it
    FIELDS = ('ID', 'Customer Name', 'Product Code',
              'Units Ordered', 'Unit Price', 'Status')
    cxn = sqlite3.connect('db.sqlite')
    cur = cxn.cursor()
    rows = cur.execute('SELECT * FROM orders').fetchall()
    cxn.close()
    rows.insert(0, FIELDS)

    DATA = {'properties': {'title': 'Customer orders'}}
    SHEET_ID = SHEETS.spreadsheets().create(body=DATA,
            fields='spreadsheetId').execute().get('spreadsheetId')
    SHEETS.spreadsheets().values().update(spreadsheetId=SHEET_ID, range='A1',
            body={'values': rows}, valueInputOption='RAW').execute()

Migrate SQL data to Sheets: goo.gl/N1RPwC
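The Sheets update in the SQL migration expects a two-dimensional array: one inner list per spreadsheet row, header first. The following is a small offline sketch of that conversion step, assuming an in-memory SQLite table filled with made-up sample orders; `rows_to_values` and `demo` are hypothetical helpers, and the table columns are invented to match the `FIELDS` header.

```python
import sqlite3

FIELDS = ('ID', 'Customer Name', 'Product Code',
          'Units Ordered', 'Unit Price', 'Status')


def rows_to_values(cursor_rows, header=FIELDS):
    """Turn DB rows into the 2-D list used as {'values': ...} in Sheets."""
    values = [list(header)]                      # header becomes row 1
    values.extend(list(row) for row in cursor_rows)
    return values


def demo():
    """Build a throwaway in-memory table (made-up data) and convert it."""
    cxn = sqlite3.connect(':memory:')
    cur = cxn.cursor()
    cur.execute('CREATE TABLE orders (id INTEGER, name TEXT, code TEXT, '
                'units INTEGER, price REAL, status TEXT)')
    cur.executemany('INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)', [
        (1, 'Example Co', 'WIDGET-1', 3, 9.99, 'PENDING'),
        (2, 'Sample LLC', 'GADGET-2', 1, 24.5, 'SHIPPED'),
    ])
    rows = cur.execute('SELECT * FROM orders ORDER BY id').fetchall()
    cxn.close()
    return {'values': rows_to_values(rows)}      # body for values().update()
```

Separating the conversion from the API call makes it possible to check the payload shape before any spreadsheet is created.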

Storage: listing a bucket's objects

    from __future__ import print_function
    from googleapiclient import discovery

    GCS = discovery.build('storage', 'v1')
    BUCKET = YOUR_BUCKET

    # send bucket name & return fields to API, display results
    print('\n** Objects in bucket %r...' % BUCKET)
    FIELDS = 'items(name,size)'
    files = GCS.objects().list(bucket=BUCKET, fields=FIELDS
            ).execute().get('items') or [{'name': '(none)', 'size': 'NaN'}]
    for f in files:
        print('    %s (%s)' % (f['name'], f['size']))

Vision: label annotation/object detection

    # API_KEY holds your API key (the full app imports it from a settings module)
    IMG = 'gs://cloud-samples-data/vision/using_curl/shanghai.jpeg'
    body = {'requests': [{
        'image': {'source': {'imageUri': IMG}},
        'features': [{'type': 'LABEL_DETECTION'}],
    }]}
    VISION = discovery.build('vision', 'v1', developerKey=API_KEY)
    labeling = VISION.images().annotate(body=body).execute().get('responses')
    for labels in labeling:
        if 'labelAnnotations' in labels:
            print('** Labels detected (and confidence score):')
            for label in labels['labelAnnotations']:
                print(label['description'], '(%.2f%%)' % (label['score']*100.))
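Both the Drive and Cloud Storage listings above return a single page of results; the APIs signal further pages with a `nextPageToken` in the response, which is passed back as `pageToken` on the next call. The following is a library-free sketch of that loop; `iter_items` and the `fetch_page` callback are hypothetical names, and in real code the callback would wrap something like `GCS.objects().list(bucket=BUCKET, pageToken=token, fields='items(name,size),nextPageToken').execute()`.

```python
def iter_items(fetch_page, items_key='items'):
    """Yield every item across all result pages.

    fetch_page(page_token) must return one response dict; the first call
    receives None, later calls receive the previous nextPageToken.
    """
    token = None
    while True:
        rsp = fetch_page(token)
        for item in rsp.get(items_key, []):
            yield item
        token = rsp.get('nextPageToken')
        if not token:                 # no token means this was the last page
            break
```

Note that a restrictive `fields` mask such as `'items(name,size)'` drops `nextPageToken` from the response, so the mask needs to include it for paging to work. For Drive, the same loop applies with `items_key='files'`.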

Vision: label annotation/object detection
g.co/codelabs/vision-python

    $ python3 label-detect.py
    Labels (and confidence score):
    ==============================
    People (95.05%)
    Street (89.12%)
    Mode of transport (89.09%)
    Transport (85.13%)
    Vehicle (84.69%)
    Snapshot (84.11%)
    Urban area (80.29%)
    Infrastructure (73.14%)
    Road (72.74%)
    Pedestrian (68.90%)

Image: Google

03 AI/ML-driven image processing workflow
Archive and analyze GWS images with GCP

mfg: licensed from iStockPhoto arch: public domain from Piqsels media: by RAEng Pubs from Pixabay advert: by Free-Photos from Pixabay Image licensed from Adobe

Image licensed from iStockPhoto Image: Gerd Altmann from Pixabay

Image: CC BY 2.0 Mike MacKenzie from VPNSRUS Image: CC0 PD from RawPixel

Image processing application architecture
[Diagram: products used, grouped by platform, with the role each plays]
● Google Workspace: Drive; Sheets (record results)
● Google Cloud: Cloud Storage (archive image); Cloud Vision (categorize image)
● Google Maps: Maps (Static) (geolocate image & map)
● Google AI: LLM image analysis

Application initialization and API setup

    from __future__ import print_function
    import argparse, base64, io, os, sys, time, webbrowser

    from googleapiclient import discovery, http
    from httplib2 import Http
    from oauth2client import file, client, tools
    from PIL import Image
    from google import genai
    from settings import API_KEY

    k_ize = lambda b: '%6.2fK' % (b/1000.) # bytes to kBs
    FILE = 'YOUR_IMG_ON_DRIVE'
    BUCKET = 'YOUR_BUCKET_NAME'
    SHEET = 'YOUR_SHEET_ID'
    TOP = 5 # get top Vision API labels
    DEBUG = False

    # create Gemini API client object plus constants
    PROMPT = 'Describe this image in 2-3 sentences'
    MODEL = 'gemini-2.5-flash'
    GENAI = genai.Client(api_key=API_KEY)

    # process credentials for OAuth2 tokens
    SCOPES = (
        'https://www.googleapis.com/auth/drive.readonly',
        'https://www.googleapis.com/auth/devstorage.full_control',
        'https://www.googleapis.com/auth/cloud-vision',
        'https://www.googleapis.com/auth/spreadsheets',
    )
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
        creds = tools.run_flow(flow, store)

    # create Google API client objects
    HTTP = creds.authorize(Http())
    DRIVE = discovery.build('drive', 'v3', http=HTTP)
    GCS = discovery.build('storage', 'v1', http=HTTP)
    VISION = discovery.build('vision', 'v1', http=HTTP)
    SHEETS = discovery.build('sheets', 'v4', http=HTTP)

Individual API components

    def drive_get_file(fname):
        # request modifiedTime explicitly; it is not among the default fields
        rsp = DRIVE.files().list(q="name='%s'" % fname,
                fields='files(id,name,mimeType,modifiedTime)').execute().get('files')[0]
        fileId, fname, mtype = rsp['id'], rsp['name'], rsp['mimeType']
        blob = DRIVE.files().get_media(fileId=fileId).execute()
        return fname, mtype, rsp['modifiedTime'], blob, drive_geoloc_maps(fileId)

    def drive_geoloc_maps(file_id):
        imd = DRIVE.files().get(fileId=file_id,
                fields='imageMediaMetadata').execute().get('imageMediaMetadata')
        if not imd or 'location' not in imd:
            return '' # image is not geotagged
        return 'https://maps.googleapis.com/maps/api/staticmap?size=480x480&markers=%s,%s&key=%s' % (
                imd['location']['latitude'], imd['location']['longitude'], API_KEY)

    def genai_analyze_img(media):
        image = Image.open(io.BytesIO(media))
        return GENAI.models.generate_content(
                model=MODEL, contents=[PROMPT, image]).text.strip()

    def sheet_append_row(sheet, row):
        # 'values' is a list of rows, so wrap the single row in a list
        rsp = SHEETS.spreadsheets().values().append(
                spreadsheetId=sheet, range='Sheet1',
                valueInputOption='USER_ENTERED', body={'values': [row]}).execute()
        return rsp.get('updates').get('updatedCells')

    def vision_label_img(img, top):
        # img must be the image bytes base64-encoded as text
        body = {'requests': [{'image': {'content': img},
                'features': [{'type': 'LABEL_DETECTION', 'maxResults': top}]}]}
        rsp = VISION.images().annotate(
                body=body).execute().get('responses', [{}])[0]
        return ', '.join('%s (%.2f%%)' % (label['description'], label['score']*100.)
                for label in rsp.get('labelAnnotations', []))

    def gcs_blob_upload(fname, bucket, blob, mimetype):
        body = {'name': fname, 'uploadType': 'multipart', 'contentType': mimetype}
        # discovery-based clients accept keyword arguments only
        return GCS.objects().insert(bucket=bucket, body=body,
                media_body=http.MediaIoBaseUpload(io.BytesIO(blob), mimetype)).execute()

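Acquire file info from Drive; Maps geolocation

    def drive_get_file(fname):
        # request modifiedTime explicitly; it is not among the default fields
        rsp = DRIVE.files().list(
                q="name='%s'" % fname,
                fields='files(id,name,mimeType,modifiedTime)'
                ).execute().get('files')[0]
        fileId, fname, mtype = rsp['id'], rsp['name'], rsp['mimeType']
        blob = DRIVE.files().get_media(fileId=fileId).execute()
        return fname, mtype, rsp['modifiedTime'], blob, drive_geoloc_maps(fileId)

Upload file media/binary/blob to Google Cloud Storage

    def gcs_blob_upload(fname, bucket, blob, mimetype):
        body = {
            'name': fname,
            'uploadType': 'multipart',
            'contentType': mimetype
        }
        # discovery-based clients accept keyword arguments only;
        # uses io and googleapiclient.http imported during setup
        return GCS.objects().insert(
                bucket=bucket, body=body,
                media_body=http.MediaIoBaseUpload(io.BytesIO(blob), mimetype)
                ).execute()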

Cloud Vision label annotation (image contents)

    def vision_label_img(img, top):
        # img must be the image bytes base64-encoded as text
        body = [{'image': {'content': img},
            'features': [{
                'type': 'LABEL_DETECTION',
                'maxResults': top,
            }]
        }]
        rsp = VISION.images().annotate(
                body={'requests': body}).execute()['responses'][0]
        return ', '.join('%s (%.2f%%)' % (
                label['description'], label['score']*100.)
                for label in rsp.get('labelAnnotations', []))

Analyze & describe image contents with LLM

    import io

    from PIL import Image
    from google import genai
    from settings import API_KEY

    # gen AI setup
    PROMPT = 'Describe this image in 2-3 sentences'
    MODEL = 'gemini-2.5-flash'
    GENAI = genai.Client(api_key=API_KEY)

    def genai_analyze_img(media):
        'analyze image with genAI LLM and return analysis'
        image = Image.open(io.BytesIO(media))
        return GENAI.models.generate_content(
                model=MODEL, contents=[PROMPT, image]).text.strip()
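One detail of the Cloud Vision step is easy to miss: when image bytes are sent inline (the `content` field) in a JSON request, they must be base64-encoded text rather than raw bytes. The following is an offline sketch of building that request body and formatting the labels from a response; `vision_label_body` and `format_labels` are hypothetical helpers that mirror, but do not replace, the `vision_label_img` function shown above.

```python
import base64


def vision_label_body(image_bytes, top=5):
    """Build the images:annotate request body for raw image bytes."""
    content = base64.b64encode(image_bytes).decode('utf-8')  # bytes -> text
    return {'requests': [{
        'image': {'content': content},
        'features': [{'type': 'LABEL_DETECTION', 'maxResults': top}],
    }]}


def format_labels(response):
    """Format one annotate response as 'Label (xx.xx%), ...'.

    Returns an empty string when the response carries no labels.
    """
    return ', '.join(
        '%s (%.2f%%)' % (label['description'], label['score'] * 100.)
        for label in response.get('labelAnnotations', []))
```

Using `.get('labelAnnotations', [])` keeps the formatter from raising on an image for which the service returns no labels.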

Create Google Maps map if image geolocated

    MAPS_URL = 'https://maps.googleapis.com/maps/api/staticmap?size=480x480&markers='

    def drive_geoloc_maps(file_id):
        imd = DRIVE.files().get(fileId=file_id,
                fields='imageMediaMetadata').execute().get('imageMediaMetadata')
        if not imd or 'location' not in imd:
            return ''
        return '%s%s,%s&key=%s' % (MAPS_URL, imd['location']['latitude'],
                imd['location']['longitude'], API_KEY)

Store all results in one row to Google Sheets

    def sheet_append_row(sheet, row):
        # 'values' is a list of rows, so wrap the single row in a list
        rsp = SHEETS.spreadsheets().values().append(
                spreadsheetId=sheet, range='Sheet1',
                valueInputOption='USER_ENTERED',
                body={'values': [row]}).execute()
        return rsp.get('updates').get('updatedCells')
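The Static Maps URL in the geolocation step is assembled by string formatting. The following is an alternative sketch, assuming the same `size`, `markers`, and `key` query parameters, that lets `urllib.parse.urlencode` handle the escaping; `static_map_url` is a hypothetical helper that takes the `imageMediaMetadata` dict directly so it can be exercised without calling Drive.

```python
from urllib.parse import urlencode

MAPS_BASE = 'https://maps.googleapis.com/maps/api/staticmap'


def static_map_url(image_metadata, api_key, size='480x480'):
    """Return a Static Maps URL for geotagged metadata, else ''."""
    loc = (image_metadata or {}).get('location') or {}
    if 'latitude' not in loc or 'longitude' not in loc:
        return ''                      # image carries no usable coordinates
    params = {
        'size': size,
        'markers': '%s,%s' % (loc['latitude'], loc['longitude']),
        'key': api_key,
    }
    return '%s?%s' % (MAPS_BASE, urlencode(params))
```

Returning an empty string for non-geotagged images matches the behavior of `drive_geoloc_maps`, so the Sheets row simply gets a blank cell in that column.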

Complete image processing workflow

    def main(fname, bucket, sheet_id, top):
        fname, mtype, ftime, data, maps = drive_get_file(fname)
        gcs_blob_upload(fname, bucket, data, mtype)
        # Vision expects inline image bytes as base64-encoded text
        viz = vision_label_img(base64.b64encode(data).decode('utf-8'), top)
        gem = genai_analyze_img(data)
        sheet_append_row(sheet_id,
                [fname, mtype, ftime, len(data), viz, gem, maps])

App summary
● Project goal: imagine an actual enterprise use case and solve it!
● Specific goals: free up a highly utilized resource, archive data to colder/cheaper storage, analyze images, generate report for mgmt
● Download image binary from Google Drive
● If image is geolocated, use Google Maps to create static map
● Upload object to Cloud Storage bucket
● Process image label/object detection by Cloud Vision
● GenAI request for Gemini to summarize image contents
● Write back-up location & analysis results into Google Sheets
● Blog post: goo.gle/3nPxmlc (original post); Cloud X-post
● Codelab: free, online, self-paced, hands-on tutorial
  ○ g.co/codelabs/drive-gcs-vision-sheets
● Application source code (Gemini & Maps version in /alt)
  ○ github.com/wescpy/analyze_gsimg
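Because the workflow is a fixed sequence of independent API calls, it can be dry-run without credentials by passing the steps in as callables. The following is one possible sketch of that idea, not the structure of the actual app; `process_image`, `stub_steps`, and the step names (`fetch`, `archive`, `label`, `describe`, `record`) are all invented for illustration.

```python
def process_image(fname, steps, top=5):
    """Run the archive/analyze/report sequence and return the Sheet row.

    `steps` maps step names to callables so real API wrappers and offline
    stubs can be swapped freely.
    """
    name, mtype, ftime, data, maps_url = steps['fetch'](fname)
    steps['archive'](name, data, mtype)
    labels = steps['label'](data, top)
    summary = steps['describe'](data)
    row = [name, mtype, ftime, len(data), labels, summary, maps_url]
    steps['record'](row)
    return row


def stub_steps(log):
    """Offline stand-ins that only record which step ran (made-up data)."""
    def fetch(fname):
        log.append('fetch')
        return fname, 'image/jpeg', '2025-01-01T00:00:00Z', b'\xff\xd8\xff', ''

    def archive(name, data, mtype):
        log.append('archive')

    def label(data, top):
        log.append('label')
        return 'Street (50.00%)'

    def describe(data):
        log.append('describe')
        return 'A placeholder description.'

    def record(row):
        log.append('record')

    return {'fetch': fetch, 'archive': archive, 'label': label,
            'describe': describe, 'record': record}
```

Calling `process_image('photo.jpg', stub_steps([]))` exercises the ordering and the row layout; swapping in wrappers around the real Drive, Cloud Storage, Vision, Gemini, and Sheets functions would then run the live workflow.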

04 Wrap-up
Summary & resources

Session Summary
● File backup: "boring" ... while AI/ML: "exciting"
● Merge together for possible business scenario solvable with APIs
● Google provides more than just apps
  ○ More than search, YouTube, Android, Chrome, and Gmail/Docs
  ○ "Much" Google technology available to developers via APIs
● Google APIs do vary
  ○ Alas, developer experience differs between product families
  ○ Some products have higher-level product client libraries
  ○ Others require use of lower-level client libraries (or none at all!)
    ■ Lower-level may be useful as lowest common denominator
● Interesting possibilities using multiple Google product APIs

Other Google APIs & platforms
● Firebase (mobile development platform + RT DB; ML Kit)
  ○ firebase.google.com & firebase.google.com/docs/ml-kit
● Google Looker/Data Studio (data visualization, dashboards, etc.)
  ○ datastudio.google.com/overview
  ○ goo.gle/datastudio-course
● Actions on Google/Assistant/DialogFlow (voice apps)
  ○ developers.google.com/actions
● YouTube (Data, Analytics, and Livestreaming APIs)
  ○ developers.google.com/youtube
● Google Maps (Maps, Routes, and Places APIs)
  ○ developers.google.com/maps
● Flutter (native apps [Android, iOS, web] w/1 code base[!])
  ○ flutter.dev

Online resources
● Documentation
  ○ GCP: cloud.google.com/{docs,vision,automl,storage,language,speech,translate,firestore,sql,video-intelligence,bigquery,filestore,identity-platform,vertex-ai,kubernetes,compute,gpu,tpu}
  ○ GWS & other non-GCP: developers.google.com/{gsuite,gmail,drive,calendar,docs,sheets,slides,forms,classroom,chat,apps-script,maps,youtube,analytics,cast,actions,people,ar,books}
● Introductory "codelabs" ([free] self-paced, hands-on tutorials)
  ○ GWS APIs: g.co/codelabs/gsuite-apis-intro (featuring Drive API)
  ○ Cloud Vision API: g.co/codelabs/vision-python (or C#)
  ○ All other codelabs: g.co/codelabs (all Google APIs, all levels)
● Videos
  ○ GWS: goo.gl/JpBQ40, Drive: developers.google.com/drive/web/videos, Sheets: developers.google.com/sheets/api/videos, GCP: youtube.com/GoogleCloudPlatform
● Code: github.com/GoogleCloudPlatform & github.com/googleworkspace
● GCP Free Trial (new users) and Always Free tier: cloud.google.com/free
● Compare GCP to AWS and Azure: cloud.google.com/docs/compare/aws

Bring me to your organization
... it is my job to help you!
● Engineering consulting
● Software development
● Technical seminars/tech talks
● Hands-on workshops
● Technical training courses
● Migration strategy & planning
● cyberwebconsulting.com
● appenginemigrations.com
Images: author

Thank you! Questions?
Wesley Chun
AI TPM, Red Hat & Principal, CyberWeb
AI OSS, Python, GCP & GWS specialist
@wescpy (Tw/X, BS, SO, GH, IG, LI)
● Slides: you're looking at them now
● Work: cyberwebconsulting.com
● Books: corepython.com
● Blog: dev.to/wescpy
● App blog post: goo.gle/3nPxmlc
● GCP+GWS 101 (2019): youtu.be/ri8Bfptgo9Q
● ...& 102 talks (2023): youtu.be/3IQ4Yv80lJg
● Progress bars: goo.gl/69EJVw
