Observability
Last Updated: 2025-11-25
This document describes the observability features in Ocmonica, including Prometheus metrics, OpenTelemetry tracing, and structured logging.
Overview
Ocmonica provides comprehensive observability through:
- Metrics: Prometheus metrics exposed at /metrics
- Tracing: OpenTelemetry distributed tracing with Jaeger integration
- Logging: Structured JSON logging with trace correlation
Accessing Metrics
Metrics are exposed at the /metrics endpoint in the Prometheus text exposition format.
Available Metrics
HTTP Request Metrics
Collected automatically by the echoprometheus middleware for all HTTP endpoints.
ocmonica_request_duration_seconds
- Type: Histogram
- Description: Duration of HTTP requests in seconds
- Labels:
  - method: HTTP method (GET, POST, PUT, DELETE, etc.)
  - url: Request URL path
  - code: HTTP status code
ocmonica_requests_total
- Type: Counter
- Description: Total number of HTTP requests
- Labels:
  - method: HTTP method
  - url: Request URL path
  - code: HTTP status code
ocmonica_request_size_bytes
- Type: Histogram
- Description: Size of HTTP request payloads in bytes
- Labels: method, url, code
ocmonica_response_size_bytes
- Type: Histogram
- Description: Size of HTTP response payloads in bytes
- Labels: method, url, code
File Operation Metrics
Track file upload, download, and management operations.
ocmonica_file_uploads_total
- Type: Counter
- Description: Total number of file uploads
- Labels:
  - status: Operation status (success or failure)
  - mime_type: MIME type of the uploaded file (e.g., text/plain, image/jpeg)
Example PromQL queries:
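```promql
# Upload success rate (sum first, so successes are divided by all uploads
# rather than each success series being matched only against itself)
sum(rate(ocmonica_file_uploads_total{status="success"}[5m])) /
sum(rate(ocmonica_file_uploads_total[5m]))

# Uploads by MIME type
sum by (mime_type) (rate(ocmonica_file_uploads_total{status="success"}[5m]))
```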
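The success-rate query divides two rates taken over the same 5m window, so the window length cancels and the result is essentially the ratio of counter increases. A minimal Go sketch of that arithmetic, with hypothetical function names and illustrative counter values summed across MIME types. Like rate(), it treats a counter that goes down as a reset to zero; unlike rate(), it does not extrapolate to the window edges.

```go
package main

import "fmt"

// increase returns how much a counter grew between two scrapes. A lower current
// value means the process restarted and the counter reset to zero, so the whole
// current value counts as the increase.
func increase(prev, cur float64) float64 {
	if cur < prev {
		return cur
	}
	return cur - prev
}

// successRatio computes successes / (successes + failures) over one window from
// the counters' values at the window's start (prev) and end (cur).
// ok is false when no uploads happened; PromQL would return NaN there (0/0).
func successRatio(prevOK, curOK, prevFail, curFail float64) (ratio float64, ok bool) {
	succeeded := increase(prevOK, curOK)
	total := succeeded + increase(prevFail, curFail)
	if total == 0 {
		return 0, false
	}
	return succeeded / total, true
}

func main() {
	// Illustrative counter values, summed across MIME types, five minutes apart.
	if ratio, ok := successRatio(100, 190, 5, 15); ok {
		fmt.Printf("upload success rate: %.2f\n", ratio)
	}
}
```

The same reasoning explains why both sides of the PromQL division must be aggregated with sum(): the ratio only makes sense once successes and failures are combined into one total.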
ocmonica_file_downloads_total
- Type: Counter
- Description: Total number of file downloads
- Labels:
  - status: success or failure
ocmonica_file_deletions_total
- Type: Counter
- Description: Total number of file deletions
- Labels:
  - status: success or failure
ocmonica_file_upload_size_bytes
- Type: Histogram
- Description: Size distribution of uploaded files in bytes
- Buckets: 1KB, 10KB, 100KB, 1MB, 10MB, 100MB
Example PromQL queries:
```promql
# Average upload size
rate(ocmonica_file_upload_size_bytes_sum[5m]) /
rate(ocmonica_file_upload_size_bytes_count[5m])

# 95th percentile upload size
histogram_quantile(0.95, rate(ocmonica_file_upload_size_bytes_bucket[5m]))
```
ocmonica_file_download_size_bytes
- Type: Histogram
- Description: Size distribution of downloaded files in bytes
- Buckets: 1KB, 10KB, 100KB, 1MB, 10MB, 100MB
ocmonica_file_operation_duration_seconds
- Type: Histogram
- Description: Duration of file operations in seconds
- Labels:
  - operation: Type of operation (upload, download, delete, move, copy)
Example PromQL queries:
```promql
# Average upload duration
rate(ocmonica_file_operation_duration_seconds_sum{operation="upload"}[5m]) /
rate(ocmonica_file_operation_duration_seconds_count{operation="upload"}[5m])

# 99th percentile download duration
histogram_quantile(0.99, rate(ocmonica_file_operation_duration_seconds_bucket{operation="download"}[5m]))
```
Authentication Metrics
Track authentication attempts and token operations.
ocmonica_auth_attempts_total
- Type: Counter
- Description: Total number of authentication attempts
- Labels:
  - method: Authentication method (password for username/password login)
  - status: Attempt status (success or failure)
Example PromQL queries:
```promql
# Login success rate (sum first, so successes are divided by all attempts)
sum(rate(ocmonica_auth_attempts_total{method="password",status="success"}[5m])) /
sum(rate(ocmonica_auth_attempts_total{method="password"}[5m]))

# Failed login attempts (potential security issue)
rate(ocmonica_auth_attempts_total{status="failure"}[5m])
```
ocmonica_api_key_usage_total
- Type: Counter
- Description: Total number of API key authentication attempts
- Labels:
  - status: API key status (success, expired, revoked, invalid)
Example PromQL queries:
```promql
# API key usage by status
sum by (status) (rate(ocmonica_api_key_usage_total[5m]))

# Rate of expired API key attempts (users may need new keys)
rate(ocmonica_api_key_usage_total{status="expired"}[5m])
```
ocmonica_token_refreshes_total
- Type: Counter
- Description: Total number of JWT token refresh operations
Database Metrics
Planned for future implementation. The following database metrics are defined but not yet instrumented:
- ocmonica_db_query_duration_seconds: Query execution time
- ocmonica_db_connections_active: Active database connections
- ocmonica_db_errors_total: Database errors by type
Search Metrics
Planned for future implementation:
- ocmonica_search_queries_total: Total search queries by type
- ocmonica_search_duration_seconds: Search query duration
- ocmonica_search_results_count: Number of results returned
Prometheus Configuration
Add this job to your prometheus.yml to scrape Ocmonica metrics:
```yaml
scrape_configs:
  - job_name: 'ocmonica'
    static_configs:
      - targets: ['localhost:8080']
    metrics_path: '/metrics'
    scrape_interval: 15s
```
OpenTelemetry Tracing
Ocmonica uses OpenTelemetry for distributed tracing and exports spans with the OTLP HTTP exporter.
Configuration
```yaml
telemetry:
  enabled: true
  service_name: "ocmonica"
  otlp_endpoint: "http://localhost:4318" # Jaeger OTLP HTTP endpoint
```
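The otlp_endpoint value is a base URL. Per the OTLP specification, an OTLP/HTTP exporter configured with a base endpoint sends spans to its /v1/traces path, which Jaeger's OTLP HTTP receiver on port 4318 accepts. A small validation sketch, using a hypothetical tracesURL helper, that derives that URL and rejects values missing a scheme, such as localhost:4318:

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// tracesURL turns a base OTLP/HTTP endpoint such as "http://localhost:4318"
// into the URL spans are posted to, appending the default "/v1/traces" path.
func tracesURL(endpoint string) (string, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return "", fmt.Errorf("parse otlp_endpoint: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("otlp_endpoint %q must start with http:// or https://", endpoint)
	}
	if u.Host == "" {
		return "", fmt.Errorf("otlp_endpoint %q has no host", endpoint)
	}
	// Avoid a double slash when the configured endpoint ends with "/".
	u.Path = strings.TrimSuffix(u.Path, "/") + "/v1/traces"
	return u.String(), nil
}

func main() {
	target, err := tracesURL("http://localhost:4318")
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println("spans are sent to", target)
}
```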
Automatic Instrumentation
- HTTP requests are traced automatically by Echo middleware
- gRPC requests are traced automatically by ConnectRPC interceptors
- Trace context is propagated through all layers
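Manual Instrumentation
Start a custom span from the incoming context so it becomes a child of the request span (parameters below are illustrative; the real signature may differ):
```go
import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/codes"
)

// Upload wraps the upload in a span that is a child of the span already in ctx.
func (s *FileService) Upload(ctx context.Context, filename, userID string) error {
	ctx, span := otel.Tracer("file-service").Start(ctx, "FileService.Upload")
	defer span.End()

	// Attributes make the span searchable by file and user in Jaeger.
	span.SetAttributes(
		attribute.String("file.name", filename),
		attribute.String("user.id", userID),
	)

	// ... business logic; pass ctx on so nested spans attach to this one
	var err error
	if err != nil {
		span.RecordError(err)
		span.SetStatus(codes.Error, err.Error())
	}
	return err
}
```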
Viewing Traces
With the development stack running (task docker:dev):
- Jaeger UI: http://localhost:16686
- Search by service name: ocmonica
- Filter by operation, tags, or duration
Example Dashboards
File Operations Dashboard
Key metrics to monitor:
- Upload/download throughput (operations/second)
- Upload/download bandwidth (bytes/second)
- Operation latency (p50, p95, p99)
- Error rates by operation type
- File size distribution
- Popular MIME types
Authentication Dashboard
Key metrics to monitor:
- Login attempts (success vs. failure)
- API key usage by status
- Token refresh rate
- Failed authentication rate (security monitoring)
Performance Dashboard
Key metrics to monitor:
- HTTP request latency by endpoint
- Request throughput
- Error rate (4xx, 5xx responses)
- Request/response payload sizes
Alerting Examples
High Error Rate
```yaml
- alert: HighFileOperationErrorRate
  expr: |
    sum(rate(ocmonica_file_uploads_total{status="failure"}[5m])) /
    sum(rate(ocmonica_file_uploads_total[5m])) > 0.05
  for: 5m
  annotations:
    summary: "High file upload error rate"
    description: "File upload error rate is {{ $value | humanizePercentage }}"
```
Authentication Failures
```yaml
- alert: HighAuthFailureRate
  expr: rate(ocmonica_auth_attempts_total{status="failure"}[5m]) > 5
  for: 2m
  annotations:
    summary: "High authentication failure rate"
    description: "{{ $value }} failed auth attempts/sec (possible attack)"
```
Slow Operations
```yaml
- alert: SlowFileUploads
  expr: |
    histogram_quantile(0.95,
      rate(ocmonica_file_operation_duration_seconds_bucket{operation="upload"}[5m])
    ) > 10
  for: 5m
  annotations:
    summary: "Slow file uploads detected"
    description: "95th percentile upload time is {{ $value }}s"
```
Structured Logging
Ocmonica uses Go's log/slog package for structured JSON logging with trace correlation.
Log Levels
- debug: Detailed debugging information
- info: General operational events
- warn: Warning conditions
- error: Error conditions
Configuration
Trace Correlation
Logs automatically include trace IDs when tracing is enabled:
```json
{
  "time": "2025-11-25T12:00:00Z",
  "level": "INFO",
  "msg": "file uploaded",
  "trace_id": "abc123...",
  "span_id": "def456...",
  "file_name": "document.pdf",
  "user_id": "user-123"
}
```
Best Practices
- Monitoring: Set up dashboards for key business metrics (uploads, downloads, auth)
- Alerting: Configure alerts for error rates and performance degradation
- Capacity Planning: Track growth trends in file operations and storage usage
- Security: Monitor failed authentication attempts for potential security issues
- Performance: Track operation latencies to identify performance bottlenecks
Development Stack
Run the full observability stack locally with task docker:dev.
This starts:
- Ocmonica Backend: http://localhost:8080
- Prometheus: http://localhost:9091
- Jaeger: http://localhost:16686
- Grafana: http://localhost:3001
Next Steps
For additional observability features:
- Profiling: Go pprof endpoints can be added for debugging
- Custom Dashboards: Import Grafana dashboards from grafana/dashboards/
- Alertmanager: Configure notification channels for alerts