Scaling your Kantree deployment

Understanding the different components of Kantree

Kantree is divided into 4 components:

  • web: the web application server (port 5000 by default)
  • worker: background worker
  • scheduler: the process that launches tasks at fixed times
  • realtime: the realtime server, a socket.io server for realtime collaboration (port 4000 by default)

IMPORTANT: Only one scheduler instance may run at any time across all your machines

Application processes

Kantree uses Gunicorn to run its application processes.

You can adjust the number of Gunicorn workers based on your machine's processing power (CPU). The default number is 2x{number of CPU cores}+1.

Use the --web-scale parameter to modify the number of Gunicorn workers:

$ ./platform run --web-scale NUMBER_OF_WORKERS
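
For example, on a hypothetical 4-core machine, the default formula gives 2x4+1 = 9 workers:

$ ./platform run --web-scale 9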

Background processing

Kantree makes extensive use of background processing. Queues can accumulate tasks at times, so properly scaling workers is important.

Increasing the number of workers

Use the --worker-scale parameter:

$ ./platform run --worker-scale NUMBER_OF_WORKERS

The formula 2x{number of CPU cores}+1 can also be used to calculate the number of possible background workers, but the combined total of application and background workers cannot exceed this number.
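
For example, on a 4-core machine the total budget is 2x4+1 = 9 processes, which you could split between web and background workers (an illustrative split, assuming both parameters can be passed on the same invocation):

$ ./platform run --web-scale 5 --worker-scale 4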

Using multiple queues

Tasks can be dispatched to up to four different queues. By default, a single queue is used for all tasks. To route some tasks to different queues, use the following config keys in your configuration file:

queue_project_hooks: QUEUE_NAME
queue_automations: QUEUE_NAME
queue_related_cards: QUEUE_NAME
queue_compute_formulas: QUEUE_NAME
queue_refresh_charts: QUEUE_NAME
queue_batch_actions: QUEUE_NAME

It is good practice to use at least a dedicated queue for queue_compute_formulas, queue_refresh_charts and queue_automations once you are seeing moderate usage of Kantree. For example:

queue_automations: automations
queue_related_cards: low
queue_compute_formulas: low
queue_refresh_charts: low

Once you have configured the queue names in the configuration file, you need to ensure that some workers will process them. By default, workers only process tasks from the default queue. Use the --worker-queues parameter with a comma-separated list of queue names:

$ ./platform run --worker-queues default,low,automations

This will create 3 processes: worker_default1, worker_low1 and worker_automations1. If you combine this parameter with --worker-scale X, it will start X workers per queue.

You can use the --worker-combine-queues parameter to instead launch X workers that each process tasks from all queues.
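
For example, combining the parameters above, the following would start 2 workers that each consume tasks from all three queues:

$ ./platform run --worker-scale 2 --worker-queues default,low,automations --worker-combine-queues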

Real-time connections

Real-time connections require one socket per connection. On Linux, this means one file descriptor per connection. The default limit of 1024 open files can quickly become a bottleneck.

Increase the limit using the fs.file-max kernel directive:

# sysctl -w fs.file-max=MAX_FILE
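
Note that fs.file-max is the system-wide limit; the 1024 default mentioned above is the per-process limit, which is raised separately, for example for the current shell (or permanently in /etc/security/limits.conf):

# ulimit -n 65536

To make the fs.file-max setting persist across reboots, add it to /etc/sysctl.conf.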

Scaling real-time servers can be hard. If you are serving thousands of users, contact us for support.

Multi-node deployment

Kantree’s architecture makes it very easy to deploy on multiple machines.

You can deploy as many instances of Kantree processes as you want, on as many machines as you want, as long as they all connect to the same PostgreSQL and Redis instances.

Installing a node

First, generate a configuration file that you can use on all your nodes:

$ ./platform gen-config config-prod.yml

Then unpack the Kantree archive on each node, add your configuration file and your license file, and run:

$ ./platform init-node

Upgrade your database from one of the nodes using:

$ ./platform upgrade-db

To start only a specific process, use the run command with the process name as argument:

$ ./platform run web
$ ./platform run worker
$ ./platform run scheduler
$ ./platform run realtime

You don’t have to start all the processes together. You can for example start worker processes on one machine and web and realtime processes on another.

If you are starting workers on a machine dedicated to background processing, remember to properly scale the number of workers.
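
For example, on a 4-core machine dedicated to background processing, the full 2x4+1 = 9 budget can go to workers (an illustrative invocation, assuming --worker-scale can be combined with a process name):

$ ./platform run worker --worker-scale 9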

Load balancing the nodes

Finally, you’ll need to install a load-balancer to serve Kantree using all the nodes. We recommend Nginx (which we use ourselves). You will find an nginx.conf.example file at the root of your installation.

Add the IP addresses of all your nodes running web processes in the upstream app section. Do the same for the nodes running realtime processes under the upstream push section.

You will also need to replace the {{SERVER_NAME}} and {{PATH_TO_KANTREE_ROOT}} placeholders.

IMPORTANT: for the realtime server, sessions need to be sticky (i.e. requests coming from a given client need to always reach the same server).
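
As a minimal sketch (assuming two nodes at 10.0.0.1 and 10.0.0.2 and the default ports; adapt the real blocks from nginx.conf.example), the upstream sections could look like this, using Nginx's ip_hash directive to keep realtime sessions sticky:

upstream app {
    server 10.0.0.1:5000;
    server 10.0.0.2:5000;
}

upstream push {
    ip_hash;  # a given client IP always reaches the same realtime server
    server 10.0.0.1:4000;
    server 10.0.0.2:4000;
}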

Services Kantree depends on

Redis

Given how we use Redis, you shouldn’t need a complicated Redis deployment. For almost all setups, Redis can be installed on the same machine as your Kantree server. If the application server runs too low on RAM, you can move Redis to its own machine and that should be enough.

If you want to ensure high availability, check out Redis Sentinel.

PostgreSQL

PostgreSQL can be a tricky beast to scale. For the vast majority of deployments, PostgreSQL can live on a single machine. If you encounter performance issues due to the database, consider upgrading the machine itself (the hardware) before trying to scale horizontally.

There are many resources to help you scale PostgreSQL or configure high availability. It is not the purpose of this guide to cover these aspects.