19 Commits

Author SHA1 Message Date
Timothy Jaeryang Baek
f376d4f378 chore: format 2026-02-11 16:24:11 -06:00
Thomas Rehn
654172d757 fix: redis clustermode instrumentation 2026-02-03 15:25:37 +01:00
Classic298
7839d043ff fix: use efficient COUNT queries in telemetry metrics to prevent connection pool exhaustion (#20542)

This fixes database connection pool exhaustion issues reported after v0.7.0,
particularly affecting PostgreSQL deployments on high-latency networks (e.g., AWS Aurora).

## The Problem

The telemetry metrics callbacks (running every 10 seconds via OpenTelemetry's
PeriodicExportingMetricReader) were using inefficient queries that loaded entire
database tables into memory just to count records:

    len(Users.get_users()["users"])  # Loads ALL user records to count them

On high-latency network-attached databases like AWS Aurora, this would:
1. Hold database connections for hundreds of milliseconds while transferring data
2. Deserialize all records into Python objects
3. Only then count the list length

Under concurrent load, these long-held connections would stack up and drain the
connection pool, resulting in:

    sqlalchemy.exc.TimeoutError: QueuePool limit of size 5 overflow 10 reached,
    connection timed out, timeout 30.00
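Little's law gives a rough way to see why long-held connections drain the pool. The average number of connections checked out at once equals the checkout rate multiplied by how long each connection is held. The minimal sketch below applies it to the limits in the error message: pool size 5 plus overflow 10, so at most 15 connections. The checkout rate of 40 per second and the two hold times are illustrative assumptions, not measurements from the report.

```python
# Minimal sketch: Little's law (L = lambda * W) applied to a SQLAlchemy-style
# QueuePool. The checkout rate and hold times below are illustrative assumptions.

POOL_SIZE = 5       # "QueuePool limit of size 5" from the error message
MAX_OVERFLOW = 10   # "overflow 10" from the error message


def average_connections_in_use(checkouts_per_second: float, hold_seconds: float) -> float:
    """Average number of connections held at once, by Little's law."""
    return checkouts_per_second * hold_seconds


def pool_is_saturated(checkouts_per_second: float, hold_seconds: float) -> bool:
    """True when average demand alone meets or exceeds total pool capacity.

    Returning False does not guarantee safety: bursts above the average
    can still exhaust the pool.
    """
    capacity = POOL_SIZE + MAX_OVERFLOW
    return average_connections_in_use(checkouts_per_second, hold_seconds) >= capacity


if __name__ == "__main__":
    # Same hypothetical checkout rate, two different hold times per checkout.
    for hold in (0.010, 0.500):
        demand = average_connections_in_use(40, hold)
        print(f"hold={hold * 1000:.0f} ms -> ~{demand:.1f} connections, "
              f"saturated={pool_is_saturated(40, hold)}")
```

Because average demand scales linearly with hold time, shortening each hold lowers average pool pressure proportionally.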

## The Fix

Replace inefficient full-table loads with efficient COUNT(*) queries using
methods that already exist in the codebase:

- `len(Users.get_users()["users"])` → `Users.get_num_users()`
- Similar changes for other telemetry callbacks as needed

COUNT(*) queries are evaluated inside the database, which can often answer them
from an index, and return a single integer. They complete in ~5-10ms even on
Aurora, versus potentially 500ms+ for loading all records.
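The difference between the two approaches can be sketched with the standard library's `sqlite3` module. The `app_user` table and its columns are hypothetical stand-ins, not Open WebUI's actual schema, and `Users.get_users()` / `Users.get_num_users()` are not reproduced here. The point is only that the first function transfers every row to Python, while the second transfers one integer.

```python
import sqlite3


def count_by_loading_all(conn: sqlite3.Connection) -> int:
    # Anti-pattern: fetch every row (all columns) into Python, then count the list.
    rows = conn.execute("SELECT * FROM app_user").fetchall()
    return len(rows)


def count_in_database(conn: sqlite3.Connection) -> int:
    # Preferred: the database does the counting and returns a single integer.
    (n,) = conn.execute("SELECT COUNT(*) FROM app_user").fetchone()
    return n


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, for illustration only.
    conn.execute("CREATE TABLE app_user (id TEXT PRIMARY KEY, name TEXT, settings TEXT)")
    conn.executemany(
        "INSERT INTO app_user VALUES (?, ?, ?)",
        [(f"u{i}", f"User {i}", "{}") for i in range(1000)],
    )
    # Both return the same number; only the amount of data moved differs.
    print(count_by_loading_all(conn), count_in_database(conn))
```

On a network-attached database, the first approach keeps the connection checked out while every row crosses the network; the second finishes after a single small result.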

## Why Disabling Session Sharing in v0.7.1 "Helped"

The v0.7.1 change to disable DATABASE_ENABLE_SESSION_SHARING by default appeared
to fix the issue, but it was masking the root cause. Disabling session sharing
causes connections to be returned to the pool faster (more connection churn),
which reduced the window for pool exhaustion but didn't address the underlying
inefficient queries.

With this fix, session sharing can be safely re-enabled for deployments that
benefit from it (especially PostgreSQL), as telemetry will no longer hold
connections for extended periods.

## Impact

- Telemetry connection usage drops from potentially seconds to ~30ms total per
  collection cycle
- Connection pool pressure from telemetry becomes negligible (~0.3% utilization)
- Enterprise PostgreSQL deployments (Aurora, RDS, etc.) should no longer
  experience pool exhaustion under normal load
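The ~0.3% figure follows from dividing the time telemetry holds a connection by the collection interval. A quick check using the ~30ms per cycle and the 10-second interval stated above; the 2-second "before" value is a hypothetical stand-in for "potentially seconds":

```python
# Fraction of each collection interval during which telemetry holds a connection.
HOLD_PER_CYCLE_S = 0.030   # ~30 ms of connection time per cycle (after the fix)
INTERVAL_S = 10.0          # PeriodicExportingMetricReader interval


def utilization(hold_s: float, interval_s: float) -> float:
    """Share of one interval spent holding a connection."""
    return hold_s / interval_s


print(f"{utilization(HOLD_PER_CYCLE_S, INTERVAL_S):.1%}")  # → 0.3%
print(f"{utilization(2.0, INTERVAL_S):.1%}")               # hypothetical 2 s before the fix → 20.0%
```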
2026-01-10 15:33:42 +04:00
Classic298
823b9a6dd9 chore/perf: Remove old SRC level log env vars with no impact (#20045)
* Update openai.py

* Update env.py

* Merge pull request open-webui#19030 from open-webui/dev (#119)

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>

---------

Co-authored-by: Tim Baek <tim@openwebui.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-12-20 08:16:14 -05:00
Timothy Jaeryang Baek
70948f8803 enh/refac: deprecate USER_POOL 2025-11-28 07:39:02 -05:00
FlorentMair80
58cff5e482 feat: add a metric to monitor daily unique users (#19236)
#19234
2025-11-17 15:31:24 -05:00
Timothy Jaeryang Baek
b14617a653 refac: otel metrics handle 500 2025-09-16 12:11:32 -05:00
Timothy Jaeryang Baek
919d65f36f feat/enh: ENABLE_OTEL_TRACES granular otel support 2025-08-20 23:03:12 +04:00
expruc
58180c0586 added otel logs specific config 2025-08-02 22:15:22 +03:00
Tim Jaeryang Baek
49926f06ee Merge branch 'dev' into feat/otel-logger-handler 2025-08-02 14:52:16 +04:00
expruc
a679fb3f45 split otel metrics from general otel configuration 2025-08-02 11:30:34 +03:00
expruc
2035eabb1f added otel logging handler 2025-07-31 21:58:49 +03:00
Timothy Jaeryang Baek
aa83ebae58 refac: lazySpanExporter no longer needed 2025-07-31 17:30:37 +04:00
Timothy Jaeryang Baek
d8b80caff3 refac/fix: remove insecure arg for otel http exporter 2025-07-21 16:35:23 +04:00
expruc
cbbc4cfd26 added active and total user metrics 2025-07-16 22:06:36 +03:00
Timothy Jaeryang Baek
8b35ea6eea enh: OTEL_OTLP_SPAN_EXPORTER 2025-06-30 15:52:32 +04:00
Jesper Kristensen
4119ab261e Added support for basic auth with OTEL exporter 2025-06-18 11:42:33 +02:00
Jason Kidd
210dc746f0 feat: Add OpenTelemetry Metrics Support via OTLP Exporter 2025-06-10 10:52:10 -07:00
Timothy Jaeryang Baek
396c28817c refac 2025-03-11 18:55:30 +00:00