Observability

Last Updated: 2025-11-25

This document describes the observability features in Ocmonica, including Prometheus metrics, OpenTelemetry tracing, and structured logging.


Overview

Ocmonica exposes three observability signals:

  • Metrics: Prometheus metrics exposed at /metrics
  • Tracing: OpenTelemetry distributed tracing with Jaeger integration
  • Logging: Structured JSON logging with trace correlation

Accessing Metrics

Metrics are exposed at the /metrics endpoint in Prometheus text format:

curl http://localhost:8080/metrics
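To illustrate what the text format looks like to a consumer, here is a minimal Go sketch that totals one metric across all of its label combinations. The payload in `sample` is made up for illustration, and `sumMetric` is a hypothetical helper rather than part of Ocmonica. It assumes label values never contain a `}` character, which a full parser would not assume.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sample is an illustrative payload, not real Ocmonica output.
const sample = `
# HELP ocmonica_file_uploads_total Total number of file uploads
# TYPE ocmonica_file_uploads_total counter
ocmonica_file_uploads_total{mime_type="text/plain",status="success"} 12
ocmonica_file_uploads_total{mime_type="image/jpeg",status="failure"} 3
ocmonica_file_downloads_total{status="success"} 40
`

// sumMetric adds up every sample of the named metric in a Prometheus
// text-format payload, across all label combinations.
// Simplification: label values are assumed not to contain '}'.
func sumMetric(body, name string) float64 {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(body))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		if !strings.HasPrefix(line, name) {
			continue // a different metric
		}
		rest := strings.TrimPrefix(line, name)
		if rest == "" || (rest[0] != '{' && rest[0] != ' ') {
			continue // a longer metric name that merely shares the prefix
		}
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:] // drop the label set
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		// The first field is the value; an optional timestamp may follow.
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			total += v
		}
	}
	return total
}

func main() {
	fmt.Println(sumMetric(sample, "ocmonica_file_uploads_total"))
}
```

For anything beyond a quick check, a Prometheus server or an existing exposition-format parser is a better choice than hand-rolled parsing.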

Available Metrics

HTTP Request Metrics

Automatically collected by echoprometheus middleware for all HTTP endpoints.

ocmonica_request_duration_seconds

  • Type: Histogram
  • Description: Duration of HTTP requests in seconds
  • Labels:
      • method: HTTP method (GET, POST, PUT, DELETE, etc.)
      • url: Request URL path
      • code: HTTP status code

ocmonica_requests_total

  • Type: Counter
  • Description: Total number of HTTP requests
  • Labels:
      • method: HTTP method
      • url: Request URL path
      • code: HTTP status code

ocmonica_request_size_bytes

  • Type: Histogram
  • Description: Size of HTTP request payloads in bytes
  • Labels: method, url, code

ocmonica_response_size_bytes

  • Type: Histogram
  • Description: Size of HTTP response payloads in bytes
  • Labels: method, url, code

File Operation Metrics

Track file upload, download, and management operations.

ocmonica_file_uploads_total

  • Type: Counter
  • Description: Total number of file uploads
  • Labels:
      • status: Operation status (success or failure)
      • mime_type: MIME type of uploaded file (e.g., text/plain, image/jpeg)

Example PromQL queries:

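# Upload success rate (aggregate across labels first; dividing the raw
# vectors would only match success series against themselves)
sum(rate(ocmonica_file_uploads_total{status="success"}[5m])) /
sum(rate(ocmonica_file_uploads_total[5m]))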

# Uploads by MIME type
sum by (mime_type) (rate(ocmonica_file_uploads_total{status="success"}[5m]))
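As a rough model of what a success-rate calculation does, the Go sketch below computes each counter's increase over a window, adds the increases up across every label combination, and only then divides. The `counterSample` type and both functions are hypothetical illustrations. Unlike the real `rate()` function, the sketch does not extrapolate to the window boundaries, and it detects a counter reset from only two readings.

```go
package main

import "fmt"

// counterSample holds two readings of one counter series
// (one status/mime_type combination).
type counterSample struct {
	status   string
	mimeType string
	prev     float64 // value at the start of the window
	curr     float64 // value at the end of the window
}

// increase returns how much a counter grew over the window. A drop is
// treated as a counter reset (for example a process restart), after
// which counting started again from zero.
func increase(prev, curr float64) float64 {
	if curr < prev {
		return curr
	}
	return curr - prev
}

// successRatio sums the increases over every label combination and
// divides the successful share by the total.
func successRatio(samples []counterSample) float64 {
	var ok, total float64
	for _, s := range samples {
		inc := increase(s.prev, s.curr)
		total += inc
		if s.status == "success" {
			ok += inc
		}
	}
	if total == 0 {
		return 0 // no uploads in the window; avoid dividing by zero
	}
	return ok / total
}

func main() {
	samples := []counterSample{
		{status: "success", mimeType: "text/plain", prev: 100, curr: 190},
		{status: "success", mimeType: "image/jpeg", prev: 40, curr: 45},
		{status: "failure", mimeType: "image/jpeg", prev: 10, curr: 15},
	}
	fmt.Printf("upload success ratio: %.2f\n", successRatio(samples))
}
```

Summing before dividing matters because each series carries both a status and a mime_type label, so a per-series division has no failure series to compare a success series against.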

ocmonica_file_downloads_total

  • Type: Counter
  • Description: Total number of file downloads
  • Labels: status (success or failure)

ocmonica_file_deletions_total

  • Type: Counter
  • Description: Total number of file deletions
  • Labels: status (success or failure)

ocmonica_file_upload_size_bytes

  • Type: Histogram
  • Description: Size distribution of uploaded files in bytes
  • Buckets: 1KB, 10KB, 100KB, 1MB, 10MB, 100MB
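To show what the bucket list means in practice, the following Go sketch models a histogram as cumulative bucket counters plus a running sum and count, which is the shape Prometheus exposes as the `_bucket`, `_sum`, and `_count` series. The `histogram` type is a toy stand-in, not the Prometheus client library, and the bounds assume 1KB means 1024 bytes, which the metric definition may or may not use.

```go
package main

import "fmt"

// uploadBuckets mirrors the documented bucket bounds in bytes
// (assumption: 1KB = 1024 bytes).
var uploadBuckets = []float64{1 << 10, 10 << 10, 100 << 10, 1 << 20, 10 << 20, 100 << 20}

// histogram is a toy model of a Prometheus histogram.
type histogram struct {
	bounds []float64 // upper bounds, ascending
	counts []uint64  // cumulative: counts[i] = observations <= bounds[i]
	sum    float64   // total of all observed values (the _sum series)
	count  uint64    // number of observations (the _count series, i.e. the +Inf bucket)
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]uint64, len(bounds))}
}

// Observe records one value in every bucket whose bound it fits under.
func (h *histogram) Observe(v float64) {
	for i, ub := range h.bounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.sum += v
	h.count++
}

// Mean is the lifetime average, the same ratio a sum/count query computes
// over a time window.
func (h *histogram) Mean() float64 {
	if h.count == 0 {
		return 0
	}
	return h.sum / float64(h.count)
}

func main() {
	h := newHistogram(uploadBuckets)
	for _, size := range []float64{512, 1536, 5 << 20} {
		h.Observe(size)
	}
	fmt.Println("cumulative bucket counts:", h.counts)
	fmt.Println("mean upload size in bytes:", h.Mean())
}
```

An upload larger than the highest bound still increases the sum and count, but lands only in the implicit +Inf bucket.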

Example PromQL queries:

# Average upload size
rate(ocmonica_file_upload_size_bytes_sum[5m]) /
rate(ocmonica_file_upload_size_bytes_count[5m])

# 95th percentile upload size
histogram_quantile(0.95, rate(ocmonica_file_upload_size_bytes_bucket[5m]))
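Because a histogram only stores bucket counts, a percentile query returns an estimate rather than an exact value. The Go sketch below shows the general idea behind `histogram_quantile`: find the bucket that contains the requested rank, then interpolate linearly inside it. The `bucket` type, the `quantile` function, and the counts in `main` are hypothetical, and the sketch leaves out details of the real function such as input validation and per-label grouping.

```go
package main

import (
	"fmt"
	"math"
)

// inf is the upper bound of the implicit last bucket.
var inf = math.Inf(1)

// bucket is one cumulative histogram bucket: count observations <= upperBound.
type bucket struct {
	upperBound float64
	count      float64
}

// quantile estimates the q-quantile (0 <= q <= 1) from cumulative buckets
// sorted by upper bound, where the last bucket is +Inf.
func quantile(q float64, buckets []bucket) float64 {
	if len(buckets) < 2 {
		return math.NaN()
	}
	last := len(buckets) - 1
	total := buckets[last].count
	if total == 0 {
		return math.NaN()
	}
	rank := q * total

	// Find the first bucket whose cumulative count reaches the rank.
	i := 0
	for i < last && buckets[i].count < rank {
		i++
	}
	if i == last {
		// The rank falls in the +Inf bucket: report the highest finite bound.
		return buckets[last-1].upperBound
	}

	// The lowest bucket is assumed to start at zero.
	lower, below := 0.0, 0.0
	if i > 0 {
		lower = buckets[i-1].upperBound
		below = buckets[i-1].count
	}
	inBucket := buckets[i].count - below
	if inBucket == 0 {
		return lower
	}
	// Assume observations are spread evenly inside the bucket.
	return lower + (buckets[i].upperBound-lower)*(rank-below)/inBucket
}

func main() {
	// Illustrative counts for three of the upload-size buckets.
	uploads := []bucket{
		{upperBound: 1 << 10, count: 20},
		{upperBound: 1 << 20, count: 80},
		{upperBound: inf, count: 100},
	}
	fmt.Println("estimated median upload size in bytes:", quantile(0.5, uploads))
}
```

The accuracy of the estimate depends on the bucket layout: with bounds that grow tenfold per bucket, a reported percentile can be off by a large factor within its bucket.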

ocmonica_file_download_size_bytes

  • Type: Histogram
  • Description: Size distribution of downloaded files in bytes
  • Buckets: 1KB, 10KB, 100KB, 1MB, 10MB, 100MB

ocmonica_file_operation_duration_seconds

  • Type: Histogram
  • Description: Duration of file operations in seconds
  • Labels:
      • operation: Type of operation (upload, download, delete, move, copy)

Example PromQL queries:

# Average upload duration
rate(ocmonica_file_operation_duration_seconds_sum{operation="upload"}[5m]) /
rate(ocmonica_file_operation_duration_seconds_count{operation="upload"}[5m])

# 99th percentile download duration
histogram_quantile(0.99, rate(ocmonica_file_operation_duration_seconds_bucket{operation="download"}[5m]))


Authentication Metrics

Track authentication attempts and token operations.

ocmonica_auth_attempts_total

  • Type: Counter
  • Description: Total number of authentication attempts
  • Labels:
      • method: Authentication method (password for username/password login)
      • status: Attempt status (success or failure)

Example PromQL queries:

# Login success rate (aggregate first so success is compared with all attempts)
sum(rate(ocmonica_auth_attempts_total{method="password",status="success"}[5m])) /
sum(rate(ocmonica_auth_attempts_total{method="password"}[5m]))

# Failed login attempts (potential security issue)
rate(ocmonica_auth_attempts_total{status="failure"}[5m])

ocmonica_api_key_usage_total

  • Type: Counter
  • Description: Total number of API key authentication attempts
  • Labels:
      • status: API key status (success, expired, revoked, invalid)

Example PromQL queries:

# API key usage by status
sum by (status) (rate(ocmonica_api_key_usage_total[5m]))

# Rate of expired API key attempts (users may need new keys)
rate(ocmonica_api_key_usage_total{status="expired"}[5m])

ocmonica_token_refreshes_total

  • Type: Counter
  • Description: Total number of JWT token refresh operations

Database Metrics

Planned for future implementation

The following database metrics are defined but not yet instrumented:

  • ocmonica_db_query_duration_seconds: Query execution time
  • ocmonica_db_connections_active: Active database connections
  • ocmonica_db_errors_total: Database errors by type

Search Metrics

Planned for future implementation

  • ocmonica_search_queries_total: Total search queries by type
  • ocmonica_search_duration_seconds: Search query duration
  • ocmonica_search_results_count: Number of results returned

Prometheus Configuration

Add this job to your prometheus.yml to scrape Ocmonica metrics:

scrape_configs:
  - job_name: 'ocmonica'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s

OpenTelemetry Tracing

Ocmonica uses OpenTelemetry for distributed tracing and exports spans with the OTLP HTTP exporter.

Configuration

telemetry:
  enabled: true
  service_name: "ocmonica"
  otlp_endpoint: "http://localhost:4318"  # Jaeger OTLP endpoint

Automatic Instrumentation

  • HTTP requests are automatically traced via Echo middleware
  • gRPC requests are automatically traced via ConnectRPC interceptors
  • Trace context is propagated through all layers via context.Context

Manual Instrumentation

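import (
    "context"

    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
)

// Upload wraps the business logic in a span so it shows up in the trace.
// The parameter list is illustrative; use the service's real signature.
func (s *FileService) Upload(ctx context.Context, filename, userID string) error {
    // Start a child of whatever span ctx already carries.
    tracer := otel.Tracer("file-service")
    ctx, span := tracer.Start(ctx, "FileService.Upload")
    defer span.End()

    // Attach searchable attributes to the span.
    span.SetAttributes(
        attribute.String("file.name", filename),
        attribute.String("user.id", userID),
    )

    // ... business logic; pass ctx to downstream calls so they join this trace
    return nil
}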

Viewing Traces

With the development stack running (task docker:dev):

  • Jaeger UI: http://localhost:16686
  • Search by service name: ocmonica
  • Filter by operation, tags, or duration

Example Dashboards

File Operations Dashboard

Key metrics to monitor:

  • Upload/download throughput (operations/second)
  • Upload/download bandwidth (bytes/second)
  • Operation latency (p50, p95, p99)
  • Error rates by operation type
  • File size distribution
  • Popular MIME types

Authentication Dashboard

Key metrics to monitor:

  • Login attempts (success vs. failure)
  • API key usage by status
  • Token refresh rate
  • Failed authentication rate (security monitoring)

Performance Dashboard

Key metrics to monitor:

  • HTTP request latency by endpoint
  • Request throughput
  • Error rate (4xx, 5xx responses)
  • Request/response payload sizes

Alerting Examples

High Error Rate

- alert: HighFileOperationErrorRate
  expr: |
    sum(rate(ocmonica_file_uploads_total{status="failure"}[5m])) /
    sum(rate(ocmonica_file_uploads_total[5m])) > 0.05
  for: 5m
  annotations:
    summary: "High file upload error rate"
    description: "File upload error rate is {{ $value | humanizePercentage }}"

Authentication Failures

- alert: HighAuthFailureRate
  expr: rate(ocmonica_auth_attempts_total{status="failure"}[5m]) > 5
  for: 2m
  annotations:
    summary: "High authentication failure rate"
    description: "{{ $value }} failed auth attempts/sec (possible attack)"

Slow Operations

- alert: SlowFileUploads
  expr: |
    histogram_quantile(0.95,
      rate(ocmonica_file_operation_duration_seconds_bucket{operation="upload"}[5m])
    ) > 10
  for: 5m
  annotations:
    summary: "Slow file uploads detected"
    description: "95th percentile upload time is {{ $value }}s"

Structured Logging

Ocmonica uses slog for structured JSON logging with trace correlation.

Log Levels

  • debug: Detailed debugging information
  • info: General operational events
  • warn: Warning conditions
  • error: Error conditions

Configuration

logging:
  level: "info"
  format: "json"  # or "text"
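As an illustration of how these two settings could map onto slog, here is a minimal Go sketch. `newLogger` is a hypothetical helper, not Ocmonica's actual startup code. It relies on `slog.Level.UnmarshalText`, which accepts level names such as debug, info, warn, and error regardless of case.

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
	"os"
)

// newLogger builds a slog.Logger from a level name and an output format.
func newLogger(level, format string, w io.Writer) (*slog.Logger, error) {
	var lvl slog.Level
	if err := lvl.UnmarshalText([]byte(level)); err != nil {
		return nil, fmt.Errorf("invalid log level %q: %w", level, err)
	}
	opts := &slog.HandlerOptions{Level: lvl}

	switch format {
	case "json":
		return slog.New(slog.NewJSONHandler(w, opts)), nil
	case "text":
		return slog.New(slog.NewTextHandler(w, opts)), nil
	default:
		return nil, fmt.Errorf("unknown log format %q", format)
	}
}

func main() {
	logger, err := newLogger("info", "json", os.Stdout)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	logger.Debug("not written, because debug is below the info level")
	logger.Info("file uploaded", "file_name", "document.pdf")
}
```

Rejecting unknown values at startup, rather than silently falling back to a default, makes a typo in the configuration visible immediately.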

Trace Correlation

Logs automatically include trace IDs when tracing is enabled:

{
  "time": "2025-11-25T12:00:00Z",
  "level": "INFO",
  "msg": "file uploaded",
  "trace_id": "abc123...",
  "span_id": "def456...",
  "file_name": "document.pdf",
  "user_id": "user-123"
}
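One common way to get this kind of correlation is to wrap the slog handler so that it copies the active trace identifiers from the request context into every record. The Go sketch below shows the pattern using only the standard library. The `traceKey` context key and the `newTraceContext` helper are hypothetical stand-ins for the tracing middleware. In a real setup the identifiers would be read from the OpenTelemetry span context, and Ocmonica's own handler may be implemented differently.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"log/slog"
)

// traceKey is a hypothetical context key standing in for the
// OpenTelemetry span context.
type traceKey struct{}

type traceInfo struct{ traceID, spanID string }

// newTraceContext plays the role of the tracing middleware, which would
// normally place the active span in the request context.
func newTraceContext(traceID, spanID string) context.Context {
	return context.WithValue(context.Background(), traceKey{}, traceInfo{traceID, spanID})
}

// traceHandler wraps another slog.Handler and adds trace_id and span_id
// to every record whose context carries trace information.
type traceHandler struct{ slog.Handler }

func (h traceHandler) Handle(ctx context.Context, r slog.Record) error {
	if ti, ok := ctx.Value(traceKey{}).(traceInfo); ok {
		r.AddAttrs(
			slog.String("trace_id", ti.traceID),
			slog.String("span_id", ti.spanID),
		)
	}
	return h.Handler.Handle(ctx, r)
}

// WithAttrs and WithGroup re-wrap the inner handler so that loggers
// derived with logger.With(...) keep adding the trace fields.
func (h traceHandler) WithAttrs(attrs []slog.Attr) slog.Handler {
	return traceHandler{Handler: h.Handler.WithAttrs(attrs)}
}

func (h traceHandler) WithGroup(name string) slog.Handler {
	return traceHandler{Handler: h.Handler.WithGroup(name)}
}

// logFields writes one info-level line through the wrapped JSON handler
// and returns the decoded fields.
func logFields(ctx context.Context, msg string, args ...any) (map[string]any, error) {
	var buf bytes.Buffer
	logger := slog.New(traceHandler{Handler: slog.NewJSONHandler(&buf, nil)})
	logger.InfoContext(ctx, msg, args...)

	var fields map[string]any
	err := json.Unmarshal(buf.Bytes(), &fields)
	return fields, err
}

func main() {
	ctx := newTraceContext("abc123", "def456")
	fields, err := logFields(ctx, "file uploaded", "file_name", "document.pdf")
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(fields["trace_id"], fields["span_id"], fields["msg"])
}
```

The correlation only works when call sites use the context-aware logging methods such as `InfoContext`, since the handler has no other way to see the active span.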

Best Practices

  1. Monitoring: Set up dashboards for key business metrics (uploads, downloads, auth)
  2. Alerting: Configure alerts for error rates and performance degradation
  3. Capacity Planning: Track growth trends in file operations and storage usage
  4. Security: Monitor failed authentication attempts for potential security issues
  5. Performance: Track operation latencies to identify performance bottlenecks

Development Stack

Run the full observability stack locally:

task docker:dev

This starts:

  • Ocmonica Backend: http://localhost:8080
  • Prometheus: http://localhost:9091
  • Jaeger: http://localhost:16686
  • Grafana: http://localhost:3001

Next Steps

For additional observability features:

  • Profiling: Go pprof endpoints can be added for debugging
  • Custom Dashboards: Import Grafana dashboards from grafana/dashboards/
  • Alertmanager: Configure notification channels for alerts