Clustering

Setup Database Server

Create a user-defined bridge network:
docker network create nf-visionaire
Create docker volumes for postgresdb:
docker volume create postgres-data
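To double-check that both resources exist before continuing, you can list them:
# Confirm the network and the volume were created.
docker network ls --filter name=nf-visionaire
docker volume inspect postgres-data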
Create the PostgreSQL container:
docker run -it -d -p 5432:5432 \
--name=postgresdb \
--network="nf-visionaire" \
--restart unless-stopped \
-e POSTGRES_PASSWORD=nfvisionaire123 \
-e POSTGRES_DB=nfvisionaire \
-e POSTGRES_USER=postgres \
-e PGDATA=/var/lib/postgresql/data/pgdata \
-v postgres-data:/var/lib/postgresql/data \
postgres:12-alpine
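Optionally, confirm that PostgreSQL is accepting connections before moving on; pg_isready ships with the postgres image:
# Should report "accepting connections" once the server is ready.
docker exec postgresdb pg_isready -U postgres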
Initialize an additional database:
docker run -it --rm \
--network="nf-visionaire" \
-e PGPASSWORD=nfvisionaire123 \
postgres:12-alpine \
sh -c 'psql -h postgresdb -U postgres -c "CREATE DATABASE nfv4;" || true'
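To verify that the nfv4 database was created, list the databases the same way:
# List all databases; nfv4 should appear in the output.
docker run -it --rm \
--network="nf-visionaire" \
-e PGPASSWORD=nfvisionaire123 \
postgres:12-alpine \
psql -h postgresdb -U postgres -c '\l'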

Nodes

Currently, Face Searching only supports sharding mode; replicating a partition across multiple nodes leads to undefined behavior.
Create a user-defined bridge network for each server:
docker network create nf-visionaire
For example, suppose we want to shard Face Searching into 2 nodes, so that node1 holds partitions 0-127 and node2 holds partitions 128-255. Then we run:

Node 1 (GPU)

# node1
docker run --gpus device=0 -it -d -p 4005:4005 \
--env FREMISN_BATCH_SEARCH_MAX_SIZE=1 \
--name fremisn \
--network="nf-visionaire" \
-v <HOST FREMIS-N CONFIG FOLDER>:/config/config.yml \
--restart unless-stopped \
nodefluxio/fremis-n:v2.12.18-gpu \
httpserver \
--access-key ${VISIONAIRE_CLOUD_ACCESS_KEY} \
--secret-key ${VISIONAIRE_CLOUD_SECRET_KEY} \
--dk ${DEPLOYMENT_KEY_SNAPSHOT} \
--db-address postgresdb \
--db-port 5432 \
--db-name nfvisionaire \
--db-username postgres \
--db-password nfvisionaire123 \
--listen-port 4005 \
--listen-port-monitoring 5005 \
--storage postgres \
--config-path /config/config.yml \
--precision=16 \
--partition-start 0 \
--partition-end 127 \
--verbose

Node 2 (GPU)

# node2
docker run --gpus device=0 -it -d -p 4005:4005 \
--env FREMISN_BATCH_SEARCH_MAX_SIZE=1 \
--name fremisn \
--network="nf-visionaire" \
-v <HOST FREMIS-N CONFIG FOLDER>:/config/config.yml \
--restart unless-stopped \
nodefluxio/fremis-n:v2.12.18-gpu \
httpserver \
--access-key ${VISIONAIRE_CLOUD_ACCESS_KEY} \
--secret-key ${VISIONAIRE_CLOUD_SECRET_KEY} \
--dk ${DEPLOYMENT_KEY_SNAPSHOT} \
--db-address postgresdb \
--db-port 5432 \
--db-name nfvisionaire \
--db-username postgres \
--db-password nfvisionaire123 \
--listen-port 4005 \
--listen-port-monitoring 5005 \
--storage postgres \
--config-path /config/config.yml \
--precision=16 \
--partition-start 128 \
--partition-end 255 \
--verbose
Notice the --partition-start and --partition-end flags: these should be filled in with the partition range you want the specific node to serve. Also note that --db-address must point at the database server; postgresdb resolves when the node shares the nf-visionaire network with the database container, otherwise use the database server's address.
If you want to run Face Searching in CPU mode, please check the Single Node page.
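Before wiring up the coordinator, it is worth confirming that each node came up cleanly. A minimal check on each node server, assuming the container name fremisn from the commands above:
# Confirm the node container is running and inspect its startup logs.
docker ps --filter name=fremisn --format '{{.Names}}: {{.Status}}'
docker logs --tail 100 fremisn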

Coordinator

To route requests to multiple nodes we need a router/load balancer; this is the function of the coordinator.
The coordinator API is exactly the same as the Fremis API.

Coordinator Config File

The Fremis coordinator expects a configuration file in YAML format, specifying each client's address and its partition start and end.
Prior to v2.8.0, one partition could only appear in one client. Starting from v2.8.0, one partition can appear in multiple clients; when multiple clients have the same partition, requests are balanced across those clients accordingly.
For example, to do simple sharding:
version: "v1"
nodes:
  - address: "[IP_Node_1]:[Port_Node_1]" # node1
    partition_start: 0
    partition_end: 127
  - address: "[IP_Node_2]:[Port_Node_2]" # node2
    partition_start: 128
    partition_end: 255
where [IP_Node_I] and [Port_Node_I] represent the address and port of client I, respectively.
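As a sketch, you could write this configuration to a host path and then substitute that path for <HOST FREMIS-N CONFIG FOLDER> in the run command, so it appears at /config/config.yml inside the container. The /opt/fremisn path and the addresses below are example placeholders, not requirements:
# Write the coordinator config to the folder that will be mounted into the container.
mkdir -p /opt/fremisn
cat > /opt/fremisn/config.yml <<'EOF'
version: "v1"
nodes:
  - address: "10.0.0.11:4005" # node1 (example address)
    partition_start: 0
    partition_end: 127
  - address: "10.0.0.12:4005" # node2 (example address)
    partition_start: 128
    partition_end: 255
EOF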
To enable High Availability in Fremis (only valid starting from v2.8.0):
version: "v1"
nodes:
  - address: "[IP_Node_1]:[Port_Node_1]" # node1
    partition_start: 0
    partition_end: 255
  - address: "[IP_Node_2]:[Port_Node_2]" # node2
    partition_start: 0
    partition_end: 255
  - address: "[IP_Node_3]:[Port_Node_3]" # node3
    partition_start: 0
    partition_end: 255
With this configuration, all three clients are expected to load all the data, and the Fremis coordinator will balance requests across them uniformly. When one or two clients are dead or unreachable from the coordinator, the coordinator is expected to keep working normally, but it will only send requests to the healthy client(s). Once the dead client(s) are restarted/available, the coordinator will start balancing again.
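A simple way to exercise this behavior is to stop one node and watch the coordinator keep serving; a sketch, assuming the node container is named fremisn as in the examples above:
# On one node server: simulate a failure.
docker stop fremisn
# Requests to the coordinator should still succeed via the remaining replicas.
# Restore the node; the coordinator is expected to resume balancing to it.
docker start fremisn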
Both approaches can also be combined in Fremis (only valid starting from v2.8.0):
version: "v1"
nodes:
  - address: "[IP_Node_1]:[Port_Node_1]" # node1
    partition_start: 0
    partition_end: 127
  - address: "[IP_Node_2]:[Port_Node_2]" # node2
    partition_start: 0
    partition_end: 127
  - address: "[IP_Node_3]:[Port_Node_3]" # node3
    partition_start: 128
    partition_end: 255
  - address: "[IP_Node_4]:[Port_Node_4]" # node4
    partition_start: 128
    partition_end: 255
With this configuration, client 1 shares data and compute with client 2, while client 3 pairs with client 4. As in the previous example, the coordinator is expected to keep working when one or two clients are dead, as long as the partition range is still completely covered. For example, the coordinator is expected to work normally with client 1 and client 4 dead, but not when client 1 and client 2 are both dead.
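The rule of thumb is that the surviving clients' partition ranges must still cover 0-255 completely. A small shell sketch of that check, using the example ranges above:
# List the partition ranges of the clients that are still alive,
# then verify that together they cover all 256 partitions.
live_ranges="0-127 128-255"   # e.g. only client 2 and client 4 survive
covered=$(for r in $live_ranges; do seq "${r%-*}" "${r#*-}"; done | sort -nu | wc -l)
if [ "$covered" -eq 256 ]; then echo "coverage complete"; else echo "coverage incomplete"; fi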
Run the coordinator:
docker run --gpus device=0 -it -d -p 4005:4005 \
--env FREMISN_BATCH_SEARCH_MAX_SIZE=1 \
--name fremisn \
--network="nf-visionaire" \
-v <HOST FREMIS-N CONFIG FOLDER>:/config/config.yml \
--restart unless-stopped \
nodefluxio/fremis-n:v2.12.18-gpu coordinator \
--access-key ${VISIONAIRE_CLOUD_ACCESS_KEY} \
--secret-key ${VISIONAIRE_CLOUD_SECRET_KEY} \
--dk ${DEPLOYMENT_KEY_SNAPSHOT} \
--db-address postgresdb \
--db-port 5432 \
--db-name nfvisionaire \
--db-username postgres \
--db-password nfvisionaire123 \
--listen-port 4005 \
--listen-port-monitoring 5005 \
--storage postgres \
--config-path /config/config.yml \
--precision=16 \
--verbose
Then we can use the coordinator just like a regular Fremis instance.
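As a quick smoke test on the coordinator host (the container name and port follow the run command above; any HTTP response at all simply confirms the listener is up):
# Existing Fremis clients only need to point at the coordinator's address,
# e.g. http://<coordinator-host>:4005 instead of a node's address.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:4005/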