Skip to content

Conversation

@panavenue
Copy link
Collaborator

@panavenue panavenue commented Nov 4, 2025

Description - This pull request introduces a comprehensive system for collecting and exporting client-side metrics from the Cloud SQL Go Connector, providing valuable insights into the connector's performance and behavior. The key changes include the integration of OpenTelemetry for metrics collection and the introduction of a dedicated telemetry package.

  1. Setup Metric meter and records in tel with new metrics -
  • connect_latencies: A histogram that records the duration of dial operations, providing insight into connection latency.
  • open_connections: An up-down counter that tracks the number of currently open connections to a Cloud SQL instance.
  • closed_connection_count: A counter that records the total number of closed connections, which helps in understanding connection churn.
  1. Dialer Integration
  • A MetricRecorder is now lazily initialized for each Cloud SQL instance when a connection is first established
  • The Dial method has been enhanced to record metrics at various stages of a connection's lifecycle. For example, it records the connection latency upon a successful dial and increments the open connections counter
  • When a connection is closed, the open connections counter is decremented, and the closed connection count is incremented
@panavenue panavenue requested a review from a team as a code owner November 4, 2025 23:18
@panavenue panavenue force-pushed the system_metrics_setup branch from 8747506 to 1f835d8 Compare November 4, 2025 23:46
@panavenue panavenue force-pushed the system_metrics_setup branch from 1f835d8 to ad4a6e6 Compare November 4, 2025 23:55
@panavenue panavenue requested a review from enocom November 14, 2025 19:02
Copy link
Member

@enocom enocom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Glad to see the AlloyDB implementation was helpful. Left a couple of comments.

func (d *Dialer) connectionInfoCache(
ctx context.Context, cn instance.ConnName, useIAMAuthN *bool,
) (*monitoredCache, error) {
) (*monitoredCache, bool, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose for the new "ok" result parameter? It looks like it is not used.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now it is "cache hit" Add this to the method comment.


// DefaultExportInterval is the interval that the metric exporter runs. It
// should always be 60s. This value is exposed as a var to faciliate testing.
var DefaultExportInterval = 60 * time.Second
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be a const, right?


// WithApplicationName returns an Option that sets the Application Name.
// This is used to identify the application in metrics.
func WithApplicationName(name string) Option {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WithApplicationID() now, I think.

type Config struct {
// Enabled specifies whether the metrics should be enabled.
Enabled bool
// Version is the version of the alloydbconn.Dialer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cloudsqlconn.Dialer

ConnectorVersion: versionString,
DatabaseEngineType: "DB-Engine-Type-Testing", // TODO: detect database engine type
}
mr := tel.NewMetricRecorder(ctx, d.logger, d.mClient, cfg)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if mClient is not set?


// Resolve the instance connection name to a ConnName struct.
// Note: icn may be a domain name that resolves to an instance connection name.
cn, err := d.resolver.Resolve(ctx, icn)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should count DNS lookup successes and errors. Does this make more sense as a label on our connect count metric, or is DNS resolution + status its own separate counter.

func (d *Dialer) connectionInfoCache(
ctx context.Context, cn instance.ConnName, useIAMAuthN *bool,
) (*monitoredCache, error) {
) (*monitoredCache, bool, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now it is "cache hit" Add this to the method comment.

authCredentials *auth.Credentials
iamLoginTokenProvider auth.TokenProvider
useragents []string
applicationName string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

applicationID with a TODO to set this by default using GCE VM Name, GKE pod name, Serverless app instance name (whatever that is).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

3 participants