Clustering

Prior to v2.8.0, Fremis supported simple clustering to shard and replicate data across multiple nodes, with a coordinator acting as a router. Internally, the data in Fremis is divided into 256 partitions. These partitions can be sharded across multiple nodes, where each node holds between 1 and 256 partitions.
Starting from v2.8.0, in addition to sharding, Fremis can handle load balancing to ensure high availability, and both sharding and balancing can be enabled in a single configuration. The data is still divided into 256 partitions, but now multiple nodes can serve the same or overlapping partitions.
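
To make the partition arithmetic concrete, here is an illustrative sketch (not a Fremis command) of splitting the 256 partitions evenly across N nodes:

```shell
# Illustrative sketch only: split 256 partitions evenly across N nodes.
TOTAL=256
NODES=2
PER_NODE=$((TOTAL / NODES))
i=0
while [ "$i" -lt "$NODES" ]; do
  START=$((i * PER_NODE))
  END=$((START + PER_NODE - 1))
  echo "node$((i + 1)): partitions $START-$END"
  i=$((i + 1))
done
```

With NODES=2 this prints the 0-127 / 128-255 split used in the sharding example later in this page.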

Setup Database Server

Create a user-defined bridge network:
docker network create nf-visionaire
Create a docker volume for postgresdb:
docker volume create postgres-data
Create the Postgres container:
docker run -it -d -p 5432:5432 \
--name=postgresdb \
--network="nf-visionaire" \
--restart unless-stopped \
-e POSTGRES_PASSWORD=nfvisionaire123 \
-e POSTGRES_DB=nfvisionaire \
-e POSTGRES_USER=postgres \
-e PGDATA=/var/lib/postgresql/data/pgdata \
-v postgres-data:/var/lib/postgresql/data \
postgres:12-alpine
Initialize an additional database:
docker run -it --rm \
--network="nf-visionaire" \
-e PGPASSWORD=nfvisionaire123 \
postgres:12-alpine \
sh -c 'psql -h postgresdb -U postgres -c "CREATE DATABASE nfv4;" || true'

Nodes

Currently Fremis only supports sharding mode; replicating a partition on multiple nodes would lead to undefined behavior.
Create a user-defined bridge network for each server:
docker network create nf-visionaire
For example, suppose we want to shard Fremis across 2 nodes, so that node1 holds partitions 0-127 and node2 holds partitions 128-255. Then we run:

Node 1

# node1
docker run --gpus all -it -d -p 4005:4005 \
--name fremisn-node1 \
--network="nf-visionaire" \
--restart unless-stopped nodefluxio/fremis-n:v2.6.0-gpu \
httpserver \
--access-key [VISIONAIRE_ACCESS_KEY] \
--secret-key [VISIONAIRE_SECRET_KEY] \
--dk [NODE1_DEPLOYMENT_KEY] \
--storage postgres \
--db-address [DB_SERVER_ADDRESS] \
--db-port [DB_SERVER_PORT] \
--db-name nfv4 \
--db-username postgres \
--db-password nfvisionaire123 \
--config-path /app/config.yml \
--partition-start 0 \
--partition-end 127 \
-v

Node 2

# node2
docker run --gpus all -it -d -p 4006:4006 \
--name fremisn-node2 \
--network="nf-visionaire" \
--restart unless-stopped nodefluxio/fremis-n:v2.6.0-gpu \
httpserver \
--access-key [VISIONAIRE_ACCESS_KEY] \
--secret-key [VISIONAIRE_SECRET_KEY] \
--dk [NODE2_DEPLOYMENT_KEY] \
--listen-port 4006 \
--storage postgres \
--db-address [DB_SERVER_ADDRESS] \
--db-port [DB_SERVER_PORT] \
--db-name nfv4 \
--db-username postgres \
--db-password nfvisionaire123 \
--config-path /app/config.yml \
--partition-start 128 \
--partition-end 255 \
-v
Notice the --partition-start and --partition-end flags, which assign each node its partition range.
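
The coordinator (next section) routes each request to the node whose range contains the target partition. As an illustration only (this is not Fremis's actual implementation), range-based routing for the two-node setup above could be sketched as:

```shell
# Illustrative sketch only (not Fremis internals): route a partition to
# the node whose range covers it, mirroring the two-node setup above.
route() {
  if [ "$1" -le 127 ]; then
    echo "fremisn-node1"   # partitions 0-127
  else
    echo "fremisn-node2"   # partitions 128-255
  fi
}
route 42    # -> fremisn-node1
route 200   # -> fremisn-node2
```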

Coordinator

To route requests to multiple nodes we need a router/load balancer; this is the function of the coordinator.
The coordinator API is exactly the same as the Fremis API.

Coordinator Config File

The Fremis coordinator expects a configuration file in YAML format, specifying each client's address and partition range (start and end).
Prior to v2.8.0, one partition could only appear in one client. Starting from v2.8.0, one partition can appear in multiple clients. When multiple clients hold the same partition, requests will be balanced across those clients accordingly.
For example, to do simple sharding:
version: "v1"
nodes:
  - address: "[IP_Node_1]:[Port_Node_1]" # node1
    partition_start: 0
    partition_end: 127
  - address: "[IP_Node_2]:[Port_Node_2]" # node2
    partition_start: 128
    partition_end: 255
where [IP_Node_I] and [Port_Node_I] represent the address and port of client I, respectively.
To enable High Availability in Fremis (only valid starting from v2.8.0):
version: "v1"
nodes:
  - address: "[IP_Node_1]:[Port_Node_1]" # node1
    partition_start: 0
    partition_end: 255
  - address: "[IP_Node_2]:[Port_Node_2]" # node2
    partition_start: 0
    partition_end: 255
  - address: "[IP_Node_3]:[Port_Node_3]" # node3
    partition_start: 0
    partition_end: 255
With this configuration, all three clients are expected to load all the data, and the Fremis coordinator will balance requests across them uniformly. When one or two clients are dead or unreachable from the coordinator, the coordinator is expected to keep working normally, but it will only send requests to the healthy client(s). Once the dead client(s) are restarted and available again, the coordinator will resume balancing across all clients.
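
As an illustration only (this is not how the coordinator is implemented), balancing requests across the replicas that are still healthy could be sketched as:

```shell
# Illustrative sketch only (not Fremis internals): round-robin requests
# across the healthy replicas. All three replicas hold partitions 0-255.
NODES="node1 node2 node3"
DEAD="node2"                 # pretend node2 is unreachable
HEALTHY=""
for n in $NODES; do
  case " $DEAD " in
    *" $n "*) ;;             # skip dead replicas
    *) HEALTHY="$HEALTHY $n" ;;
  esac
done
set -- $HEALTHY              # positional params = healthy replicas
i=0
for req in 1 2 3 4; do
  idx=$(( (i % $#) + 1 ))
  eval "target=\${$idx}"
  echo "request $req -> $target"
  i=$((i + 1))
done
```

With node2 marked dead, requests alternate between node1 and node3; once node2 is healthy again it would simply rejoin the rotation.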
Combining both sharding and replication is also allowed in Fremis (only valid starting from v2.8.0):
version: "v1"
nodes:
  - address: "[IP_Node_1]:[Port_Node_1]" # node1
    partition_start: 0
    partition_end: 127
  - address: "[IP_Node_2]:[Port_Node_2]" # node2
    partition_start: 0
    partition_end: 127
  - address: "[IP_Node_3]:[Port_Node_3]" # node3
    partition_start: 128
    partition_end: 255
  - address: "[IP_Node_4]:[Port_Node_4]" # node4
    partition_start: 128
    partition_end: 255
With this configuration, client 1 shares data and compute with client 2, while client 3 pairs with client 4. As in the previous example, the coordinator is expected to keep working when one or two clients are dead, as long as every partition is still covered by a live client. For example, the coordinator works normally with clients 1 and 4 dead, but not when clients 1 and 2 are both dead.
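
The "partition coverage" condition can be sketched as follows; this is an illustration only, not Fremis internals. A partition can be served as long as at least one live client's range covers it:

```shell
# Illustrative sketch only: check whether a partition is still served.
# Each entry is client:start:end, mirroring the four-client config above.
CLIENTS="c1:0:127 c2:0:127 c3:128:255 c4:128:255"
DEAD="c1 c4"                 # pretend clients 1 and 4 are dead
covered() {                  # covered <partition>: succeeds if served
  p=$1
  for c in $CLIENTS; do
    name=${c%%:*}; rest=${c#*:}
    s=${rest%%:*}; e=${rest#*:}
    case " $DEAD " in *" $name "*) continue ;; esac
    if [ "$p" -ge "$s" ] && [ "$p" -le "$e" ]; then return 0; fi
  done
  return 1
}
covered 10  && echo "partition 10 still served (by c2)"
covered 200 && echo "partition 200 still served (by c3)"
```

Setting DEAD="c1 c2" instead would leave partitions 0-127 uncovered, which matches the failure case described above.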
Run the coordinator:
docker run --gpus all -it -d -p 4004:4004 \
--name fremisn-coordinator \
--network="nf-visionaire" \
-v ~:/config --restart unless-stopped \
nodefluxio/fremis-n:v2.6.0-gpu coordinator \
--listen-port 4004 \
--config-path /config/config.yaml \
--access-key [VISIONAIRE_ACCESS_KEY] \
--secret-key [VISIONAIRE_SECRET_KEY] \
--dk [COORDINATOR_DEPLOYMENT_KEY] \
--storage postgres \
--db-address [DB_SERVER_ADDRESS] \
--db-port [DB_SERVER_PORT] \
--db-name nfv4 \
--db-username postgres \
--db-password nfvisionaire123 \
-v
Then we can use the coordinator just like a regular Fremis instance.