
Clustering

Set Up the Database Server

Create a user-defined bridge network:

docker network create nf-visionaire

Create a docker volume for postgresdb:

docker volume create postgres-data

Create the Postgres Container

docker run -it -d -p 5432:5432 \
--name=postgresdb \
--network="nf-visionaire" \
--restart unless-stopped \
-e POSTGRES_PASSWORD=nfvisionaire123 \
-e POSTGRES_DB=nfvisionaire \
-e POSTGRES_USER=postgres \
-e PGDATA=/var/lib/postgresql/data/pgdata \
-v postgres-data:/var/lib/postgresql/data \
postgres:12-alpine

Initialize an additional database

docker run -it --rm \
--network="nf-visionaire" \
-e PGPASSWORD=nfvisionaire123 \
postgres:12-alpine \
sh -c 'psql -h postgresdb -U postgres -c "CREATE DATABASE nfv4;" || true'

Nodes

Currently, Face Searching only supports sharding mode; replicating a partition on multiple nodes would lead to undefined behavior.

Create a user-defined bridge network for each server:

docker network create nf-visionaire

For example, suppose we want to shard Face Searching into 2 nodes, so that node1 holds partitions 0-127 and node2 holds partitions 128-255. Then we run:
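The partition ranges above follow from splitting the 256-partition space evenly across the nodes. A minimal sketch of that arithmetic (256 is the partition count used throughout this guide; `partition_ranges` is an illustrative helper, not part of Fremis-N):

```python
# Split the 256 face-search partitions evenly across N nodes,
# returning inclusive (start, end) ranges, as used by
# --partition-start / --partition-end.
def partition_ranges(num_nodes, total=256):
    base, extra = divmod(total, num_nodes)
    ranges, start = [], 0
    for i in range(num_nodes):
        size = base + (1 if i < extra else 0)  # spread any remainder
        ranges.append((start, start + size - 1))
        start += size
    return ranges

print(partition_ranges(2))  # [(0, 127), (128, 255)]
```

For the two-node example this reproduces exactly the 0-127 and 128-255 ranges passed to node1 and node2 below.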

Node 1 (GPU)

# node1
docker run --gpus device=0 -it -d -p 4005:4005 \
--env FREMISN_BATCH_SEARCH_MAX_SIZE=1 \
--name fremisn \
--network="nf-visionaire" \
-v <HOST FREMIS-N CONFIG FOLDER>:/config \
--restart unless-stopped \
nodefluxio/fremis-n:v2.12.18-gpu \
httpserver \
--access-key ${VISIONAIRE_CLOUD_ACCESS_KEY} \
--secret-key ${VISIONAIRE_CLOUD_SECRET_KEY} \
--dk ${DEPLOYMENT_KEY_SNAPSHOT} \
--db-address postgresdb \
--db-port 5432 \
--db-name nfvisionaire \
--db-username postgres \
--db-password nfvisionaire123 \
--listen-port 4005 \
--listen-port-monitoring 5005 \
--storage postgres \
--config-path /config/config.yml \
--precision=16 \
--partition-start 0 \
--partition-end 127 \
--verbose 

Node 2 (GPU)

# node2
docker run --gpus device=0 -it -d -p 4005:4005 \
--env FREMISN_BATCH_SEARCH_MAX_SIZE=1 \
--name fremisn \
--network="nf-visionaire" \
-v <HOST FREMIS-N CONFIG FOLDER>:/config \
--restart unless-stopped \
nodefluxio/fremis-n:v2.12.18-gpu \
httpserver \
--access-key ${VISIONAIRE_CLOUD_ACCESS_KEY} \
--secret-key ${VISIONAIRE_CLOUD_SECRET_KEY} \
--dk ${DEPLOYMENT_KEY_SNAPSHOT} \
--db-address postgresdb \
--db-port 5432 \
--db-name nfvisionaire \
--db-username postgres \
--db-password nfvisionaire123 \
--listen-port 4005 \
--listen-port-monitoring 5005 \
--storage postgres \
--config-path /config/config.yml \
--precision=16 \
--partition-start 128 \
--partition-end 255 \
--verbose 

If you want to run Face Searching in CPU mode, please refer to the Single Node page.

Coordinator

To route requests to multiple nodes, we need a router/load balancer; this is the function of the coordinator.

The coordinator API is exactly the same as the Fremis API.

Coordinator Config File

The Fremis coordinator expects a configuration file in YAML format, specifying each client's address and its partition start and end.

Prior to v2.8.0, one partition can only appear in one client. Starting from v2.8.0, one partition can appear in multiple clients. When multiple clients have the same partition, the requests will be balanced to clients accordingly.
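The balancing described above can be pictured as a rotation over the clients that own a given partition. A minimal sketch, assuming round-robin selection (the addresses are placeholders, and the real Fremis balancing internals may differ):

```python
import itertools

# Each client owns an inclusive partition range [start, end],
# mirroring the coordinator's YAML node entries.
clients = [
    {"address": "10.0.0.1:4005", "partition_start": 0, "partition_end": 255},
    {"address": "10.0.0.2:4005", "partition_start": 0, "partition_end": 255},
]

def owners(partition):
    """All clients whose range contains the given partition."""
    return [c["address"] for c in clients
            if c["partition_start"] <= partition <= c["partition_end"]]

# Round-robin over the owners of partition 42.
rr = itertools.cycle(owners(42))
print([next(rr) for _ in range(4)])  # alternates between the two addresses
```

When only one client owns a partition (pre-v2.8.0 style sharding), `owners` returns a single address and the "balancing" degenerates to plain routing.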

For example, to do simple sharding:

version: "v1"
nodes:
- address: "[IP_Node_1]:[Port_Node_1]" # node1
  partition_start: 0
  partition_end: 127
- address: "[IP_Node_2]:[Port_Node_2]" # node2
  partition_start: 128
  partition_end: 255

where [IP_Node_I] and [Port_Node_I] represent the address and port of client I, respectively.
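A coordinator config is only usable if its node entries jointly cover every partition. A small sketch of that check, using plain dicts in place of the parsed YAML (`covers_all` is an illustrative helper, not part of Fremis):

```python
def covers_all(nodes, total=256):
    """True if the union of the nodes' inclusive
    [partition_start, partition_end] ranges covers 0..total-1."""
    covered = set()
    for n in nodes:
        covered.update(range(n["partition_start"], n["partition_end"] + 1))
    return covered == set(range(total))

# The simple sharding layout from the config above.
sharded = [
    {"partition_start": 0, "partition_end": 127},    # node1
    {"partition_start": 128, "partition_end": 255},  # node2
]
print(covers_all(sharded))      # True
print(covers_all(sharded[:1]))  # False: partitions 128-255 are missing
```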

To enable High Availability in fremis (only valid starting from v2.8.0):

version: "v1"
nodes:
- address: "[IP_Node_1]:[Port_Node_1]" # node1
  partition_start: 0
  partition_end: 255
- address: "[IP_Node_2]:[Port_Node_2]" # node2
  partition_start: 0
  partition_end: 255
- address: "[IP_Node_3]:[Port_Node_3]" # node3
  partition_start: 0
  partition_end: 255

With this configuration, all three clients are expected to load all the data, and the Fremis coordinator will balance requests across them uniformly. When one or two clients are dead or unreachable from the coordinator, the coordinator is expected to keep working normally, but it will only send requests to the healthy client(s). Once the dead client(s) are restarted and available again, the coordinator will resume balancing across all of them.

Combining both approaches is also allowed in Fremis (only valid starting from v2.8.0):

version: "v1"
nodes:
- address: "[IP_Node_1]:[Port_Node_1]" # node1
  partition_start: 0
  partition_end: 127
- address: "[IP_Node_2]:[Port_Node_2]" # node2
  partition_start: 0
  partition_end: 127
- address: "[IP_Node_3]:[Port_Node_3]" # node3
  partition_start: 128
  partition_end: 255
- address: "[IP_Node_4]:[Port_Node_4]" # node4
  partition_start: 128
  partition_end: 255

With this configuration, client 1 shares the data and compute with client 2, while client 3 pairs with client 4. As in the previous example, the coordinator is expected to keep working when one or two clients are dead, as long as the partition space is still complete. For example, the coordinator keeps working with client 1 and client 4 dead, but not when client 1 and client 2 are both dead.
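The failure cases above reduce to one question: do the healthy clients still cover every partition? A sketch of that check under the four-node layout above (`still_serviceable` is an illustrative helper, not part of Fremis):

```python
def still_serviceable(nodes, dead, total=256):
    """True if the clients not in `dead` (a set of indices) still
    jointly cover every partition in 0..total-1."""
    healthy = [n for i, n in enumerate(nodes) if i not in dead]
    covered = set()
    for n in healthy:
        covered.update(range(n["partition_start"], n["partition_end"] + 1))
    return covered == set(range(total))

nodes = [
    {"partition_start": 0, "partition_end": 127},    # client 1
    {"partition_start": 0, "partition_end": 127},    # client 2
    {"partition_start": 128, "partition_end": 255},  # client 3
    {"partition_start": 128, "partition_end": 255},  # client 4
]
print(still_serviceable(nodes, dead={0, 3}))  # True: clients 2 and 3 cover everything
print(still_serviceable(nodes, dead={0, 1}))  # False: partitions 0-127 are uncovered
```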

Run the coordinator

docker run --gpus device=0 -it -d -p 4005:4005 \
--env FREMISN_BATCH_SEARCH_MAX_SIZE=1 \
--name fremisn \
--network="nf-visionaire" \
-v <HOST FREMIS-N CONFIG FOLDER>:/config \
--restart unless-stopped \
nodefluxio/fremis-n:v2.12.18-gpu coordinator \
--access-key ${VISIONAIRE_CLOUD_ACCESS_KEY} \
--secret-key ${VISIONAIRE_CLOUD_SECRET_KEY} \
--dk ${DEPLOYMENT_KEY_SNAPSHOT} \
--db-address postgresdb \
--db-port 5432 \
--db-name nfvisionaire \
--db-username postgres \
--db-password nfvisionaire123 \
--listen-port 4005 \
--listen-port-monitoring 5005 \
--storage postgres \
--config-path /config/config.yml \
--precision=16 \
--verbose 

The coordinator can then be used just like a regular Fremis instance.
