Self-Hosted Bitrix24 High Availability: Master-Replica Cluster Setup

Published: 20 Feb 2026 · Updated: 6 Jun 2026 · 12 min read · By: ACP Group Bitrix24 team

A self-hosted Bitrix24 instance running on a single server is a single point of failure; a master-replica cluster with a load balancer eliminates that risk and keeps the portal available even when one node goes down.

Why HA Matters for On-Premise Bitrix24

On-premise Bitrix24 deployments shift all redundancy responsibility to your team, and for a 100–300 user portal, the cost of unplanned downtime on a single server almost always exceeds the investment in a clustered HA architecture that eliminates hardware, OS, and database single points of failure.

Cloud editions of Bitrix24 handle redundancy automatically. The moment you move to the self-hosted (on-premise) box edition — typically to satisfy data-sovereignty requirements, internal security policies, or compliance obligations — that responsibility shifts entirely to your team. A single-server setup with 47 GB RAM and 16 CPU cores can handle a medium-sized portal comfortably, but it provides zero protection against hardware failure, OS-level crashes, or database corruption.

In practice, the cost of unplanned downtime for a 100–300 user portal almost always exceeds the cost of the additional infrastructure required to prevent it. Clustered deployments also make maintenance windows shorter: you can roll updates through nodes one at a time without taking the portal offline. For context on what on-premise deployment involves end-to-end, see Self-Hosted CRM: Why Choose Bitrix24 On-Premise for Data Sovereignty.

Core Components of a Bitrix24 HA Architecture

A production-grade Bitrix24 HA setup requires five fully redundant layers: a load balancer, two or more stateless web nodes, shared file storage, a MySQL master-replica database cluster, and a shared cache/session bus — leaving any single layer unredundant defeats the entire architecture.

A production-grade HA setup for self-hosted Bitrix24 typically consists of five layers:

Layer	Component	Role
Load balancer	nginx / HAProxy	Distributes HTTP/S traffic; detects dead nodes
Web nodes (×2 or more)	Apache + PHP-FPM	Stateless application tier
Shared file storage	NFS / GlusterFS / S3-compatible	Single source of truth for `/home/bitrix/www/upload` and other user content
Database cluster	MySQL/Percona — master + replica(s)	Persistent data with automatic or manual failover
Cache / session bus	Memcached or Redis	Shared session storage so users don't lose sessions on node switchover

All layers must be redundant — hardening the web tier while leaving a standalone database defeats the purpose.

Web Environment Requirements and Version Pinning

All cluster nodes must run the identical Bitrix VM template, with CentOS Stream 9, nginx 1.26+, Apache 2.4.62+, PHP 8.2+, and Percona Server 8.0+; mixed environments between nodes cause subtle, hard-to-diagnose bugs and outdated environments block critical updates.

The Bitrix web environment is versioned and opinionated. Based on project audit data, running on an outdated environment (e.g., web environment 7.5.1 when 9.x is current) introduces instability risks and blocks certain updates. The official Bitrix virtual machine image bundles:

OS: CentOS Stream 9 (CentOS 7 is end-of-life and carries unpatched CVEs)
Reverse proxy: nginx 1.26+
Application server: Apache 2.4.62+
PHP: 8.2+ (PHP-FPM mode; display_errors and display_startup_errors must be off in production)
Database: Percona Server 8.0+ (MySQL 5.7 is EOL)

When building cluster nodes, always deploy each node from the same Bitrix VM template version. Mixed environments — one node on old PHP, another on new — cause subtle, hard-to-diagnose bugs.
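
As a rough illustration of how mixed environments could be caught before they cause trouble, the sketch below compares component versions reported by each node. The inventory dictionary, node names, and version strings are hypothetical; in practice the values would be collected from the nodes themselves (for example over SSH).

```python
from collections import defaultdict


def find_version_drift(inventory):
    """Return {component: {version: [nodes]}} for components whose version differs between nodes.

    Note: a component that is missing entirely on one node is not reported by this sketch.
    """
    by_component = defaultdict(lambda: defaultdict(list))
    for node, components in inventory.items():
        for component, version in components.items():
            by_component[component][version].append(node)
    return {
        component: {version: sorted(nodes) for version, nodes in versions.items()}
        for component, versions in by_component.items()
        if len(versions) > 1  # more than one distinct version means drift
    }


# Hypothetical inventory: web2 was built from an older template.
inventory = {
    "web1": {"php": "8.2.20", "nginx": "1.26.1", "apache": "2.4.62"},
    "web2": {"php": "8.1.29", "nginx": "1.26.1", "apache": "2.4.62"},
}
drift = find_version_drift(inventory)
print(drift)
```

An empty result means every node reports the same version for every component; anything else is worth resolving before the node joins the load balancer pool.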

Key MySQL tuning points for clustered deployments:

Disable query_cache (removed in MySQL 8, but still present in some Percona 5.7 configs, where it causes cache-invalidation races across nodes)
Disable local_infile (security hardening)
Increase innodb_log_file_size well beyond the small stock default (48 MB in upstream MySQL); a medium portal with a ~28 GB database benefits from values of 512 MB–2 GB to reduce checkpoint pressure
Tune join_buffer_size and sort_buffer_size according to workload profiling, not defaults
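
The tuning points above can be turned into a simple audit. The following is a minimal sketch, assuming the server variables have already been read (for example from SHOW GLOBAL VARIABLES) into a dictionary of strings; the function name and the 512 MB threshold constant are illustrative choices, not part of any Bitrix tooling.

```python
MIN_REDO_LOG_BYTES = 512 * 1024 * 1024  # lower end of the 512 MB-2 GB range mentioned above


def audit_mysql_tuning(variables):
    """Check a dict of MySQL global variables (string values) against the tuning points."""
    findings = []
    # query_cache_type does not exist in MySQL/Percona 8.0, so a missing key is fine.
    if variables.get("query_cache_type", "OFF").upper() not in ("OFF", "0"):
        findings.append("query cache is enabled")
    if variables.get("local_infile", "OFF").upper() not in ("OFF", "0"):
        findings.append("local_infile is enabled")
    if int(variables.get("innodb_log_file_size", "0")) < MIN_REDO_LOG_BYTES:
        findings.append("innodb_log_file_size is below 512 MB")
    return findings


# Hypothetical values resembling an untuned legacy server.
legacy = {
    "query_cache_type": "ON",
    "local_infile": "ON",
    "innodb_log_file_size": "50331648",
}
print(audit_mysql_tuning(legacy))
```

join_buffer_size and sort_buffer_size are deliberately left out: the text recommends deriving them from workload profiling, which a static threshold cannot capture.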

MySQL Master-Replica Replication Setup

Configure Bitrix24's .settings.php with separate default (master) and slave (read-only replica) connection profiles, and enforce GTID-based semi-synchronous replication with ROW binary log format and sync_binlog = 1 to guarantee durability. Automatic failover can then be layered on top, with Orchestrator promoting a replica and ProxySQL rerouting connections.

Bitrix24's database connection is defined in /home/bitrix/www/bitrix/.settings.php (the connections block). In a clustered setup you configure two connection profiles:

'default' => [
    'className' => '\Bitrix\Main\DB\MysqliConnection',
    'host'      => 'db-master.internal',
    'database'  => 'sitemanager',
    'login'     => 'bitrix',
    'password'  => '<strong-password>',
    'options'   => 2,
],
'slave'   => [
    'className' => '\Bitrix\Main\DB\MysqliConnection',
    'host'      => 'db-replica.internal',
    'database'  => 'sitemanager',
    'login'     => 'bitrix',
    'password'  => '<strong-password>',
    'options'   => 2,
    'readonly'  => true,
],

Bitrix24 uses the slave connection for read queries automatically when it is present. Key replication checklist:

GTID-based replication — easier failover promotion than file/position-based
Semi-synchronous replication — at least one replica acknowledges each commit before the master returns success; prevents data loss on master crash
Binary log format: ROW — required for Bitrix compatibility
innodb_flush_log_at_trx_commit = 1 and sync_binlog = 1 on master — critical for durability
Replica lag monitoring — alert if Seconds_Behind_Master exceeds 30 seconds; Bitrix's session and cache logic assume near-real-time replication
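
The lag alert in the last checklist item could look roughly like this. It is a sketch only: it assumes one row of SHOW REPLICA STATUS (or SHOW SLAVE STATUS on older servers) has already been fetched into a dictionary, and the function name and message strings are made up for illustration.

```python
LAG_ALERT_SECONDS = 30  # threshold from the checklist above


def check_replica_status(status):
    """Classify one replica status row (a dict) as OK, WARNING or CRITICAL."""
    # MySQL 8.0.22+ reports Seconds_Behind_Source; older versions use Seconds_Behind_Master.
    lag = status.get("Seconds_Behind_Source", status.get("Seconds_Behind_Master"))
    if lag is None:
        # The server reports NULL when the replication SQL thread is not running.
        return "CRITICAL: replication is not running (lag is NULL)"
    if int(lag) > LAG_ALERT_SECONDS:
        return f"WARNING: replica is {int(lag)} s behind"
    return "OK"


print(check_replica_status({"Seconds_Behind_Master": 5}))
print(check_replica_status({"Seconds_Behind_Source": 45}))
print(check_replica_status({"Seconds_Behind_Source": None}))
```

Treating a NULL lag as critical matters: a replica whose replication thread has stopped reports no lag at all, which a naive numeric comparison would miss.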

For automatic failover, Orchestrator can detect a failed master and promote a replica, while a proxy layer such as ProxySQL reroutes application connections to the new master so that the connection settings do not have to be edited by hand.

Shared Storage for Uploaded Files

The /home/bitrix/www/upload directory must be identical across all web nodes in real time; NFS v4 is the pragmatic default for deployments up to ~500 users, but file-based cache must be replaced with Memcached or Redis before adding a second web node.

The /home/bitrix/www/upload directory (and a few others, including cache if you use file-based caching) must be identical across all web nodes in real time. Options ranked by operational complexity:

Solution	Latency	Complexity	Suitable for
NFS v4 over a dedicated storage node	Low	Low	Most deployments up to ~500 users
GlusterFS replicated volume	Medium	Medium	Geo-distributed nodes
Ceph / RBD block device	Very low	High	Large-scale / enterprise
S3-compatible object storage + custom adapter	Low (with CDN)	Medium	Cloud-hybrid setups

NFS remains the pragmatic default for on-premise mid-size deployments. Mount it with noatime,nodiratime to reduce write amplification, and monitor mount availability with a watchdog — a stale NFS mount that hangs rather than errors is a classic cause of web-node freezes.
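
A mount watchdog of the kind mentioned above might be sketched as follows. The idea is to probe the mount from a child process and give up after a timeout, because a stale mount tends to block rather than return an error. The function name and the 5-second timeout are assumptions; the sketch relies on the standard `stat` utility being available on the node.

```python
import subprocess


def mount_is_responsive(path, timeout_seconds=5.0):
    """Return True if `stat` on path finishes successfully within the timeout."""
    proc = subprocess.Popen(
        ["stat", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        return proc.wait(timeout=timeout_seconds) == 0
    except subprocess.TimeoutExpired:
        # Best effort only: a process blocked in uninterruptible I/O may ignore the
        # kill, so we deliberately do not wait for it again (that could hang us too).
        proc.kill()
        return False


print(mount_is_responsive("/"))
```

In a real deployment the path would be the upload mount point, and a False result would feed the alerting system or take the node out of rotation. Running the probe in a child process is the key design choice: a direct `os.stat` call from the watchdog itself could hang along with the mount.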

The cache directory deserves special attention: file-based cache does not scale across nodes. Switch to Memcached or Redis as the cache backend (/bitrix/.settings.php, cache section) before you add the second web node.

Load Balancer and Session Stickiness

The load balancer must use least_conn distribution with max_fails=3 fail_timeout=15s health checks, while sessions must be moved from the default filesystem handler to Redis with persistence enabled so users on any node retain their sessions across switchovers.

The load balancer sits in front of all web nodes and performs two jobs: traffic distribution and health checking.

nginx upstream example (simplified):

upstream bitrix_nodes {
    least_conn;
    server web1.internal:80 max_fails=3 fail_timeout=15s;
    server web2.internal:80 max_fails=3 fail_timeout=15s;
    keepalive 32;
}
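
To make the meaning of max_fails and fail_timeout concrete, here is a toy Python model of the documented behaviour: after max_fails failed attempts within fail_timeout seconds, a server is skipped for fail_timeout seconds, and least_conn picks the available server with the fewest active connections. This is a simplified illustration, not nginx's actual implementation, and all class and function names are invented for the example.

```python
class UpstreamServer:
    def __init__(self, name, max_fails=3, fail_timeout=15.0):
        self.name = name
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.failures = []      # timestamps of recent failed attempts
        self.down_until = 0.0   # server is skipped until this time

    def record_failure(self, now):
        # Only failures inside the fail_timeout window count towards max_fails.
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.failures = []

    def available(self, now):
        return now >= self.down_until


def pick_least_conn(servers, active_connections, now):
    """Return the available server with the fewest active connections, or None."""
    candidates = [s for s in servers if s.available(now)]
    if not candidates:
        return None
    return min(candidates, key=lambda s: active_connections.get(s.name, 0))


web1, web2 = UpstreamServer("web1.internal"), UpstreamServer("web2.internal")
for t in (0.0, 1.0, 2.0):          # three failed attempts against web1
    web1.record_failure(t)
chosen = pick_least_conn([web1, web2], {"web1.internal": 0, "web2.internal": 7}, now=3.0)
print(chosen.name)
```

Even though web1 has fewer connections, it is skipped until 15 seconds after its third failure, so traffic goes to web2 in the meantime.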

Session handling: Bitrix24 stores sessions on the file system by default. In a cluster you must move sessions to shared storage:

Memcached: fastest, but sessions are lost on memcached restart
Redis (with persistence enabled): recommended; survives restarts

Configure in /etc/php.d/session.ini (or the PHP-FPM pool config):

session.save_handler = redis
session.save_path    = "tcp://redis.internal:6379"

Also configure Push & Pull (the real-time notification server) on each web node to point to the same Redis pub/sub channel — otherwise users on different nodes miss chat messages and live feed updates.

HA Architecture Diagram

The canonical Bitrix24 HA topology routes HTTPS traffic through an nginx/HAProxy load balancer to two stateless Apache+PHP-FPM web nodes sharing NFS storage and a Redis session/cache bus, all writing to a Percona 8.0 MySQL master that replicates via GTID to a hot standby replica.

The diagram below shows how incoming HTTPS traffic flows through a load balancer to two stateless web nodes, which share a file storage backend and a Redis session/cache bus, with a MySQL master-replica pair handling all persistence.

flowchart TD
    CLIENT[Users / Browsers] --> LB[Load Balancer\nnginx / HAProxy\nSSL Termination]
    LB --> WEB1[Web Node 1\nApache + PHP-FPM]
    LB --> WEB2[Web Node 2\nApache + PHP-FPM]
    WEB1 --> NFS[Shared Storage\nNFS / GlusterFS\nupload, static files]
    WEB2 --> NFS
    WEB1 --> REDIS[Redis\nSessions + Cache\n+ Push&Pull]
    WEB2 --> REDIS
    WEB1 --> DBMASTER[MySQL Master\nPercona 8.0]
    WEB2 --> DBMASTER
    DBMASTER -- GTID replication --> DBREPLICA[MySQL Replica\nPercona 8.0]
    DBMASTER -.failover.-> DBREPLICA

Security Hardening Across the Cluster

Every additional web node expands the attack surface, so security controls — including IP allowlists for the admin panel, HSTS headers, PHP error display disabled, 2FA for all admin accounts, and a shared IP blocklist synced via ipset — must be applied consistently at the load-balancer level across the entire cluster.

Adding more nodes increases the attack surface proportionally. A security audit of a typical single-node installation surfaces findings that become amplified in a cluster — every node must be hardened consistently. Core checklist:

Admin panel access restricted by IP allowlist — apply at the load balancer level (not just per-node) so it is enforced even if nginx on one node is misconfigured
HSTS header + HTTP → HTTPS redirect — configure on the load balancer, not on individual nodes, to avoid inconsistency
PHP error display disabled — display_errors = Off and display_startup_errors = Off in every node's php.ini; leaking stack traces to the browser exposes file paths, SQL structure, and config details
Two-factor authentication (2FA) for all admin accounts — a cluster-level config in Bitrix admin panel applies to all nodes automatically
Proactive Protection module — ensure the minimum security level is above the platform default; disable frame embedding unless required
Firewall with IP blocklist — maintain a shared blocklist (e.g., via ipset synced across nodes) and update it regularly
Web antivirus — Bitrix's built-in web antivirus adds some CPU overhead; evaluate the performance impact on each node before enabling in production
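
The PHP error display item lends itself to an automated check on every node. The sketch below parses php.ini text and reports which of the two directives are not switched off. It is an assumption-laden illustration: it reads a single ini file only, whereas a PHP-FPM pool config can override these values, and a missing directive is treated as a finding because the effective value then depends on PHP's built-in default.

```python
OFF_VALUES = {"off", "0", "false", "no", "none", ""}
DIRECTIVES = ("display_errors", "display_startup_errors")


def php_display_flags(ini_text):
    """Return the last assigned value of each display directive found in php.ini text."""
    flags = {}
    for raw in ini_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ; comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in DIRECTIVES:
            flags[key] = value.strip('"').lower()  # a later assignment overrides an earlier one
    return flags


def leaking_directives(ini_text):
    """List the directives that are enabled or not set explicitly."""
    flags = php_display_flags(ini_text)
    return [name for name in DIRECTIVES if flags.get(name, "unset") not in OFF_VALUES]


sample = "display_errors = Off\ndisplay_startup_errors = On ; left over from debugging\n"
print(leaking_directives(sample))
```

Running the same check against every node and comparing the results is one way to confirm that hardening is applied consistently across the cluster.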

For update procedures and testing practices, the pattern from real projects is: always test on a staging replica first, back up database + files + configs, then roll out node by node. This is doubly important in a cluster because a failed update on one node can cause split-brain behaviour if session or cache schemas diverge. See Bitrix24 Implementation Cost & Timeline: Real Data from 1,300+ Projects for realistic planning figures if you are sizing the HA project budget.

Update and Maintenance Procedures in a Clustered Environment

Rolling updates in a Bitrix24 cluster require a full backup, staging validation covering CRM, tasks, integrations, and Push & Pull, followed by node-by-node rollout with the updated node drained at the load balancer first, keeping a pre-update database snapshot available for at least 48 hours post-deployment.

Rolling updates are one of the primary operational benefits of an HA cluster. The recommended procedure from real project plans:

Full backup of the master database, all file storage, and all configuration files before any change
Staging validation: spin up a copy of the entire cluster (or at minimum a single-node replica of production), apply the update, and run functional tests covering CRM entity creation and editing; task workflows and business processes; 1C or ERP integrations (sync scripts, POST-based data exchange); and chat and Push & Pull notifications
Code migration: customisations should live in /local/ rather than inside /bitrix/ module directories — this is the officially supported approach and prevents core updates from overwriting your changes. Migrate any files in /bitrix/php_interface/ to /local/ before updating
Node-by-node rollout: drain node 1 at the load balancer (set to down in upstream config), update it, smoke-test, then repeat for node 2
Post-update DB checks: run Bitrix's built-in database structure check and resolve any reported errors (typically a handful of auto-fixable schema mismatches)
Rollback plan: keep the pre-update database snapshot readily accessible for at least 48 hours post-deployment
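
The drain step of the node-by-node rollout can be scripted by regenerating the upstream block with the node under maintenance marked as down. The sketch below reuses the upstream name and parameters from the nginx example earlier in this article; the function name and the safety check are illustrative additions.

```python
def render_upstream(nodes, drained=()):
    """Render the bitrix_nodes upstream block, marking drained nodes as `down`."""
    drained = set(drained)
    if drained >= set(nodes):
        raise ValueError("refusing to drain every node at once")
    lines = ["upstream bitrix_nodes {", "    least_conn;"]
    for node in nodes:
        params = "down" if node in drained else "max_fails=3 fail_timeout=15s"
        lines.append(f"    server {node}:80 {params};")
    lines += ["    keepalive 32;", "}"]
    return "\n".join(lines)


nodes = ["web1.internal", "web2.internal"]
step1 = render_upstream(nodes, drained=["web1.internal"])
print(step1)
```

After writing the generated block to the load balancer's configuration, the usual sequence would be to validate it with `nginx -t` and apply it with `nginx -s reload`, then update and smoke-test the drained node before rendering the block again without the `down` flag.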

If your team is also considering migrating from a cloud CRM to a self-hosted setup, the same staged approach applies — see How to Migrate from HubSpot to Bitrix24: Step-by-Step Plan for a migration framework that transfers smoothly to an on-premise cluster target.

Monitoring and Failover Testing

A Bitrix24 HA cluster must be validated with scheduled failure tests — monthly web-node kills (pass: failover within 15 s, no session loss), quarterly MySQL master kills (pass: reconnection within 60 s), and bi-annual full DR restores — instrumented via Prometheus and Grafana tracking replication lag, Redis hit rate, and upstream health.

A cluster that has never been tested under failure conditions is not an HA cluster — it is a more expensive single-server setup. Build these checks into your runbook:

Test	Frequency	Pass criterion
Kill web node 1	Monthly	Traffic shifts to web node 2 within `fail_timeout` (15 s in the upstream example above); no user session loss
Kill MySQL master	Quarterly	Replica promoted; application reconnects within 60 s
NFS mount failure simulation	Quarterly	Application returns a clean error, does not hang web workers
Full DR restore from backup	Bi-annually	Portal operational from backup in under defined RTO
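
To score a failover drill against the pass criteria in the table, one option is to probe the portal at a fixed interval during the test and compute the longest outage afterwards. The sketch below works on already collected probe results, a time-ordered list of (timestamp in seconds, success flag) pairs; the function name and the sample data are hypothetical.

```python
WEB_FAILOVER_LIMIT_S = 15   # web node pass criterion from the table
DB_RECONNECT_LIMIT_S = 60   # MySQL master pass criterion from the table


def longest_outage(probes):
    """Return the longest span from a first failed probe to the next successful one."""
    longest = 0.0
    outage_start = None
    for timestamp, ok in probes:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None:
        # Still failing when sampling stopped: count up to the last probe.
        longest = max(longest, probes[-1][0] - outage_start)
    return float(longest)


# Hypothetical drill: the portal fails at t=1 and answers again at t=8.
probes = [(0, True), (1, False), (2, False), (8, True), (9, True)]
outage = longest_outage(probes)
print(outage, outage <= WEB_FAILOVER_LIMIT_S)
```

The measured outage is only as precise as the probe interval, so the interval should be well below the limit being tested.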

Instrument your cluster with at minimum: node-level resource metrics (CPU, memory, disk I/O), MySQL replication lag, Memcached/Redis hit rate, and load balancer upstream health. Ideally all of these are surfaced in a single dashboard; Prometheus with Grafana is a common stack for this.
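
The cache hit rate is derived rather than reported directly. A minimal sketch, assuming the counters have already been read into a dictionary: Redis exposes keyspace_hits and keyspace_misses in its INFO output, and Memcached exposes get_hits and get_misses in its stats output. The function names are illustrative.

```python
def hit_rate(hits, misses):
    """Return hits / (hits + misses), or None when there has been no traffic yet."""
    total = hits + misses
    return hits / total if total else None


def redis_hit_rate(info):
    return hit_rate(int(info.get("keyspace_hits", 0)), int(info.get("keyspace_misses", 0)))


def memcached_hit_rate(stats):
    return hit_rate(int(stats.get("get_hits", 0)), int(stats.get("get_misses", 0)))


# Hypothetical counter values.
rate = redis_hit_rate({"keyspace_hits": "900", "keyspace_misses": "100"})
print(rate)
```

Because these counters are cumulative since server start, a dashboard would normally plot the rate computed from the difference between two samples rather than from the raw totals.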

Frequently Asked Questions

Can I run a Bitrix24 HA cluster on virtual machines instead of bare metal?

Yes — most production HA deployments use VMs or cloud instances. The Bitrix VM image works on VMware, KVM, and major cloud providers. What matters is that the network between nodes has low latency (ideally under 1 ms), especially for NFS and MySQL replication traffic.

How many web nodes do I actually need for high availability?

Two web nodes are the minimum for true HA: one can go down and the portal stays up. Three nodes are common when rolling updates must not reduce redundancy: while one node is drained for the update, two remain in service, so a failure during the maintenance window still leaves a working node.

Does Bitrix24's Push & Pull (real-time notifications) work across multiple web nodes?

Yes, but it requires a shared Redis instance as the pub/sub backend (Memcached has no publish/subscribe mechanism, so it cannot fill this role). Each web node's Push & Pull server must point to the same Redis channel; otherwise users on different nodes will miss chat messages and live-feed updates.

What Percona/MySQL version should I use for a new Bitrix24 cluster?

Percona Server 8.0 is the current recommended version. MySQL 5.7 is end-of-life. Ensure binary logging is in ROW format, GTID mode is enabled, and innodb_log_file_size is set well above the small stock default (48 MB in upstream MySQL) for any portal with a database larger than a few gigabytes.

How do I handle custom code (modifications) during cluster updates?

All customisations should live in /local/ rather than inside the core /bitrix/ module directories. This is the officially supported pattern and ensures core updates do not overwrite your changes. Migrate any legacy code from /bitrix/php_interface/ to /local/ before running the first update on the cluster.

Is an IP allowlist for the Bitrix24 admin panel sufficient for cluster security?

It is one of the most effective controls — enforce it at the load balancer level so it applies uniformly across all nodes. Combine it with 2FA for admin accounts, disabled PHP error output, HSTS headers, and Bitrix's Proactive Protection module set above the minimum security level.

Based on real practice

This article is based on 10 internal documents from the practice of ACP Group — work plans, specs, questionnaires and Bitrix24 implementation cases.
