Compare commits: ef3d998ead...feature/no (10 Commits)

| SHA1 |
|---|
| 46dee0a6cb |
| 2108df5283 |
| 380de4c80b |
| 83285c2ab5 |
| edcfecd24b |
| 80117ee36f |
| 780a0c530d |
| d0af40f4c7 |
| 83cef6e6f6 |
| 3751ec554d |

ARCHITECTURE.md (deleted, 495 lines)
@@ -1,495 +0,0 @@
# oc-discovery: Architecture and technical analysis

> **Reading convention**
> Items marked ✅ have been fixed in the code. Items marked ⚠️ remain open.

## Table of contents

1. [Overview](#1-overview)
2. [Role hierarchy](#2-role-hierarchy)
3. [Main mechanisms](#3-main-mechanisms)
   - 3.1 Long-lived heartbeat (node → indexer)
   - 3.2 Trust scoring
   - 3.3 Registration with natives (indexer → native)
   - 3.4 Indexer pool: fetch + consensus
   - 3.5 Self-delegation and offload loop
   - 3.6 Native mesh resilience
   - 3.7 Shared DHT
   - 3.8 PubSub gossip (indexer registry)
   - 3.9 Application streams (node ↔ node)
4. [Summary table](#4-summary-table)
5. [Global risks and limitations](#5-global-risks-and-limitations)
6. [Possible improvements](#6-possible-improvements)
---
## 1. Overview

`oc-discovery` is a P2P discovery service for the OpenCloud network. It is built on
**libp2p** (TCP transport + private-network PSK) and a **Kademlia DHT** (prefix `oc`)
to index peers. The architecture is intentionally hierarchical: stable _natives_
act as authoritative hubs with which _indexers_ register, and ordinary _nodes_
discover indexers through those natives.

```
┌──────────────┐       heartbeat        ┌──────────────────┐
│     Node     │ ─────────────────────► │     Indexer      │
│   (libp2p)   │ ◄───────────────────── │   (DHT server)   │
└──────────────┘   application stream   └────────┬─────────┘
                                                 │ subscribe / heartbeat
                                                 ▼
                                        ┌──────────────────┐
                                        │  Native Indexer  │ ◄──► other natives
                                        │ (authoritative   │      (mesh)
                                        │  hub)            │
                                        └──────────────────┘
```

All participants share a **pre-shared key (PSK)** that isolates the network
from unauthorized external libp2p connections.
---
## 2. Role hierarchy

| Role | Binary | Responsibility |
|---|---|---|
| **Node** | `node_mode=node` | Gets itself indexed, publishes and reads DHT records |
| **Indexer** | `node_mode=indexer` | Receives heartbeats, writes to the DHT, registers with natives |
| **Native Indexer** | `node_mode=native` | Hub: keeps the registry of live indexers, evaluates consensus, serves as fallback |

A single process can combine the node+indexer or indexer+native roles.
---
## 3. Main mechanisms

### 3.1 Long-lived heartbeat (node → indexer)

**How it works**

A **persistent** libp2p stream (`/opencloud/heartbeat/1.0`) is opened from the node
to each indexer in its pool (`StaticIndexers`). Every 20 seconds the node sends a
JSON `Heartbeat` on that stream. The indexer responds by recording the peer in
`StreamRecords[ProtocolHeartbeat]` with a 2-minute expiry.

If `sendHeartbeat` fails (stream reset, EOF, timeout), the peer is removed from
`StaticIndexers` and `replenishIndexersFromNative` is triggered. (A minimal sketch of
the sending loop follows this section.)

**Advantages**
- Fast disconnection detection (error on the next encode).
- A single stream per peer reduces pressure on TCP connections.
- The nudge channel (`indexerHeartbeatNudge`) allows an immediate reconnect without
  waiting for the 20 s ticker.

**Limitations / risks**
- ⚠️ A single persistent stream: if the TCP layer stays open but "frozen" (middlebox,
  silent NAT), the error may not surface for several minutes.
- ⚠️ `StaticIndexers` is a shared global map: if two goroutines call
  `replenishIndexersFromNative` simultaneously (several peers lost at once),
  concurrent writes can occur outside the protected critical sections.
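A minimal sketch of the node-side loop, assuming stand-in `send` and `replenish` helpers
(the real code lives in `sendHeartbeat` / `replenishIndexersFromNative`; names and types
here are illustrative, not the actual implementation):

```go
import (
	"context"
	"time"
)

// runHeartbeatLoop drives the 20 s heartbeat ticker for every indexer in the pool
// and reacts to a nudge channel so a freshly added indexer is contacted immediately.
func runHeartbeatLoop(ctx context.Context, pool map[string]bool,
	send func(addr string) error, replenish func(), nudge <-chan struct{}) {

	ticker := time.NewTicker(20 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-nudge: // fire right away, e.g. after the pool changed
		case <-ticker.C:
		}
		for addr := range pool {
			if err := send(addr); err != nil {
				delete(pool, addr) // evict the failing indexer
				replenish()        // and try to refill the pool from a native
			}
		}
	}
}
```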
---
### 3.2 Trust scoring

**How it works**

Before recording a heartbeat in `StreamRecords`, the indexer checks a **minimum
score** computed by `CheckHeartbeat`:

```
Score = (0.4 × uptime_ratio + 0.4 × bpms + 0.2 × diversity) × 100
```

- `uptime_ratio`: how long the peer has been present / time since the indexer started.
- `bpms`: throughput measured over a dedicated stream (`/opencloud/probe/1.0`), normalized by 50 Mbps.
- `diversity`: ratio of distinct /24 IP prefixes among the indexers the peer declares.

Two thresholds apply depending on the peer's state (see the sketch after this section):
- **First heartbeat** (peer absent from `StreamRecords`, uptime = 0): threshold of **40**.
- **Subsequent heartbeats** (accumulated uptime): threshold of **75**.

**Advantages**
- Discourages ephemeral or slow peers from cluttering the registry.
- Network diversity reduces the risk of concentration in a single subnet.
- The dedicated probe stream keeps binary data out of the JSON heartbeat stream.
- The dual threshold lets new peers be admitted on their very first connection.

**Limitations / risks**
- ✅ **Startup logic deadlock fixed**: with uptime = 0 the maximum score was 60,
  below the 75 threshold, so new peers were silently rejected forever.
  → Threshold lowered to **40** for the first heartbeat (`isFirstHeartbeat`), 75 afterwards.
- ⚠️ The thresholds (40 / 75) are still hard-coded, with no configuration option.
- ⚠️ The bandwidth probe sends between 512 and 2048 bytes per heartbeat: at a 20 s
  interval and up to 500 nodes, that is roughly 50 KB/s of continuous probe traffic.
- ⚠️ `diversity` is computed from the addresses the node *claims* to have: the field is
  self-reported and unverified, so it is easy to forge.
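A sketch of the documented formula and the dual admission threshold; the weights and the
40/75 values come from the text above, while function and parameter names are illustrative:

```go
import "time"

// trustScore implements Score = (0.4*uptime_ratio + 0.4*bpms + 0.2*diversity) * 100.
// bpms and diversity are expected to already be normalized into [0,1].
func trustScore(peerUptime, indexerUptime time.Duration, bpms, diversity float64) float64 {
	uptimeRatio := 0.0
	if indexerUptime > 0 {
		uptimeRatio = float64(peerUptime) / float64(indexerUptime)
	}
	return (0.4*uptimeRatio + 0.4*bpms + 0.2*diversity) * 100
}

// admit applies the dual threshold: 40 for a first heartbeat (uptime still zero),
// 75 once the peer has accumulated uptime.
func admit(score float64, isFirstHeartbeat bool) bool {
	if isFirstHeartbeat {
		return score >= 40
	}
	return score >= 75
}
```

With uptime = 0 the formula tops out at 0.4×1 + 0.2×1 = 0.6, i.e. a score of 60, which is
why a single 75 threshold used to reject every new peer.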
---
### 3.3 Registration with natives (indexer → native)

**How it works**

Every (non-native) indexer periodically (every 60 s) sends an `IndexerRegistration`
JSON message over a one-shot stream (`/opencloud/native/subscribe/1.0`) to each
configured native. The native then:

1. Stores the entry in its local cache with a **90 s** TTL (`IndexerTTL`).
2. Gossips the `PeerID` on the PubSub topic `oc-indexer-registry` to the other natives.
3. Persists the entry to the DHT asynchronously (retrying until it succeeds).

A minimal sketch of the TTL cache follows this section.

**Advantages**
- Disposable stream: no long-lived resource on the native side for registrations.
- The local cache is immediately available to `handleNativeGetIndexers` without
  waiting for the DHT.
- PubSub dissemination lets other natives learn about the indexer without it
  having to register with them directly.

**Limitations / risks**
- ✅ **Overly tight TTL fixed**: the 66 s TTL was only 10% above the 60 s interval;
  a slight network delay could expire a healthy indexer between two renewals.
  → `IndexerTTL` raised to **90 s** (+50%).
- ⚠️ If the DHT `PutValue` fails permanently (partitioned network), the native holds
  the entry but natives that missed the PubSub message never learn about it:
  a silent inconsistency.
- ⚠️ `RegisterWithNative` skips `127.0.0.1` addresses but does not handle private
  (RFC 1918) addresses, which would be unroutable from other hosts.
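A minimal sketch of the native-side registration cache with the 90 s `IndexerTTL`, using
an illustrative map-based layout (the real `liveIndexerEntry` structure may differ):

```go
import (
	"sync"
	"time"
)

// registrationCache keeps one expiry per registered indexer PeerID.
type registrationCache struct {
	mu      sync.Mutex
	entries map[string]time.Time // indexer PeerID -> expiry
}

// register refreshes the entry on every IndexerRegistration received.
func (c *registrationCache) register(peerID string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[peerID] = time.Now().Add(90 * time.Second) // IndexerTTL
}

// alive reports whether the indexer renewed its registration recently enough.
func (c *registrationCache) alive(peerID string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	exp, ok := c.entries[peerID]
	return ok && time.Now().Before(exp)
}
```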
---
### 3.4 Indexer pool: fetch + consensus

**How it works**

During `ConnectToNatives` (startup or replenish), the node/indexer:

1. **Fetch**: sends a `GetIndexersRequest` to the first native that answers
   (`/opencloud/native/indexers/1.0`) and receives a list of candidates.
2. **Consensus (round 1)**: queries **all** configured natives in parallel
   (`/opencloud/native/consensus/1.0`, 3 s timeout, 4 s collection window).
   An indexer is confirmed if **strictly more than 50%** of the responding natives
   consider it alive (see the sketch after this section).
3. **Consensus (round 2)**: if the pool is still too small, the natives' suggestions
   (indexers they know about but that were not among the initial candidates)
   go through a second round.

**Advantages**
- The absolute-majority rule prevents a compromised or out-of-sync native from
  injecting phantom indexers.
- The second round fills the pool with alternatives known to the natives without
  giving up verification.
- If the fetch returns a **fallback** (a native acting as indexer), consensus is skipped,
  which is consistent since there is only one source.

**Limitations / risks**
- ⚠️ With a **single native** configured (very common in dev/test), consensus is trivial
  (100% of a single vote), so the majority rule protects nothing in that case.
- ⚠️ `fetchIndexersFromNative` stops at the **first responding native** (sequentially):
  if that native has a stale or partial cache, the node ends up with a sub-optimal pool
  without ever consulting the others.
- ⚠️ The global collection timeout (4 s) is fixed: on a slow or geographically
  distributed network, valid natives can be dropped simply for answering too late.
- ⚠️ `replaceStaticIndexers` only **adds** and never removes stale indexers:
  the pool can accumulate dead entries that only the heartbeat purges later.
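A sketch of the strict-majority rule, under the assumption that each native's answer is
reduced to the set of indexer IDs it considers alive (data shapes are illustrative):

```go
// confirmIndexers keeps only the candidates that strictly more than 50% of the
// responding natives report as alive. votes[nativeID] is that native's view.
func confirmIndexers(candidates []string, votes map[string]map[string]bool) []string {
	total := len(votes)
	if total == 0 {
		return nil
	}
	confirmed := []string{}
	for _, idx := range candidates {
		count := 0
		for _, alive := range votes {
			if alive[idx] {
				count++
			}
		}
		if count*2 > total { // strictly more than half of the responding natives
			confirmed = append(confirmed, idx)
		}
	}
	return confirmed
}
```

With a single native, `total` is 1 and any positive vote confirms the candidate, which is
exactly the "trivial consensus" limitation noted above.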
---
### 3.5 Self-delegation and offload loop

**How it works**

If a native has no live indexer when handling `handleNativeGetIndexers`, it designates
itself as a temporary indexer (`selfDelegate`): it returns its own multiaddr and adds the
requester to `responsiblePeers`, up to `maxFallbackPeers` (50). Beyond that, delegation is
refused and an empty response is returned so the node tries another native.

Every 30 s, `runOffloadLoop` checks whether real indexers are available again.
If so, for each responsible peer (a minimal sketch follows this section):
- **Stream present**: `Reset()` the heartbeat stream; the peer gets an error,
  triggers `replenishIndexersFromNative` and migrates to real indexers.
- **Stream absent** (peer never admitted by scoring): `ClosePeer()` on the network
  connection; the peer reconnects and asks the native for indexers again.

**Advantages**
- Service continuity: a node is never stuck during a temporary lack of indexers.
- The migration is automatic and transparent for the node.
- `Reset()` (vs `Close()`) tears down both directions of the stream, guaranteeing that
  the peer actually sees an error.
- The limit of 50 keeps the native from being overloaded during prolonged shortages.

**Limitations / risks**
- ✅ **Offload without a stream fixed**: if the heartbeat had never been recorded in
  `StreamRecords` (score below threshold, a case amplified by the scoring bug), the
  offload failed silently and the peer stayed in `responsiblePeers` forever.
  → `else` branch added: `ClosePeer()` + removal from `responsiblePeers`.
- ✅ **Unbounded `responsiblePeers` fixed**: the native accepted an arbitrary number of
  peers via self-delegation, becoming an overloaded indexer itself.
  → `selfDelegate` checks `len(responsiblePeers) >= maxFallbackPeers` and returns
  `false` when saturated.
- ⚠️ Delegation is still uncoordinated between natives: an overloaded native refuses
  (returns an empty list) but does not explicitly redirect to a neighbouring native
  that would have spare capacity.
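A sketch of one offload pass, with `resetStream` and `closePeer` standing in for the
libp2p `Reset()` and `ClosePeer()` calls described above (types and helpers are illustrative):

```go
// offloadPass migrates every peer the native is temporarily responsible for, once
// real indexers are back. resetStream resets the peer's heartbeat stream and reports
// whether one existed; closePeer drops the network connection.
func offloadPass(responsible map[string]struct{}, haveRealIndexers bool,
	resetStream func(peerID string) bool, closePeer func(peerID string)) {

	if !haveRealIndexers {
		return // nothing to migrate to yet
	}
	for peerID := range responsible {
		if !resetStream(peerID) { // no heartbeat stream: peer was never admitted by scoring
			closePeer(peerID) // force a reconnect so it re-asks for indexers
		}
		delete(responsible, peerID) // native is no longer responsible for this peer
	}
}
```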
---
### 3.6 Native mesh resilience

**How it works**

When the heartbeat to a native fails, `replenishNativesFromPeers` tries to find a
replacement in this order:

1. `fetchNativeFromNatives`: asks every live native (`/opencloud/native/peers/1.0`)
   for a native address it does not know yet.
2. `fetchNativeFromIndexers`: asks every known indexer
   (`/opencloud/indexer/natives/1.0`) for its configured natives.
3. If no replacement is found and `remaining ≤ 1`: `retryLostNative` starts a 30 s
   ticker that keeps retrying a direct connection to the lost native.

`EnsureNativePeers` maintains native-to-native heartbeats via `ProtocolHeartbeat`,
with a **single goroutine** covering the whole `StaticNatives` map.

**Advantages**
- Multi-hop gossip through indexers makes it possible to recover a native even when
  no direct peer knows it.
- `retryLostNative` handles the single-native case (minimal deployment).
- The automatic reconnection (`retryLostNative`) also triggers `replenishIndexersIfNeeded`
  to restore the indexer pool.

**Limitations / risks**
- ✅ **Multiple heartbeat goroutines fixed**: `EnsureNativePeers` started one
  `SendHeartbeat` goroutine per native address (N natives → N goroutines → N² heartbeats
  per tick). → `nativeMeshHeartbeatOnce` is now used: a single goroutine iterates over
  `StaticNatives`.
- ⚠️ `retryLostNative` runs forever with no stop condition tied to the process lifetime
  (no `context.Context`). On a graceful shutdown this goroutine can hang.
- ⚠️ Transitive discovery (native → indexer → native) is one-way: an indexer only knows
  the natives from its own config, not natives that joined after it started.
---
### 3.7 Shared DHT

**How it works**

All indexers and natives participate in a Kademlia DHT (prefix `oc`, mode
`ModeServer`). Two namespaces are used:

- `/node/<DID>` → signed JSON `PeerRecord` (published by indexers on node heartbeats).
- `/indexer/<PeerID>` → JSON `liveIndexerEntry` with a TTL (published by natives).

Every native runs `refreshIndexersFromDHT` (every 30 s), which re-hydrates its local
cache from the DHT for known PeerIDs (`knownPeerIDs`) whose local entry has expired.
A minimal publication sketch follows this section.

**Advantages**
- Decentralized persistence: a record survives the loss of a single native or indexer.
- Entry validation: `PeerRecordValidator` and `IndexerRecordValidator` reject malformed
  or expired records at `PutValue` time.
- The secondary `/name/<name>` index allows resolution by human-readable name.

**Limitations / risks**
- ⚠️ The Kademlia DHT works over the private (PSK) network, but bootstrap nodes are not
  configured explicitly: discovery depends on already-established connections, which can
  slow down convergence at startup.
- ⚠️ `PutValue` is retried in an infinite loop on `"failed to find any peer in table"`:
  a prolonged network outage produces stuck goroutines.
- ⚠️ If the PSK is compromised, an attacker can write to the DHT; `liveIndexerEntry`
  records for indexers are not signed, unlike `PeerRecord`s.
- ⚠️ `refreshIndexersFromDHT` prunes `knownPeerIDs` when the DHT has no fresh entry,
  but does not prune `liveIndexers`, so an expired entry stays in memory until the GC
  or the next refresh.
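A sketch of publishing into the `/node/<DID>` namespace. Unlike the current code it uses a
bounded retry, which is one way to address the infinite-retry concern above; `putValue`
stands in for the Kademlia `PutValue` call and the retry policy is illustrative:

```go
import (
	"context"
	"fmt"
	"time"
)

// publishPeerRecord writes a signed PeerRecord under /node/<DID>, retrying a few
// times with a linear backoff instead of looping forever on routing-table errors.
func publishPeerRecord(ctx context.Context,
	putValue func(ctx context.Context, key string, val []byte) error,
	did string, signedRecord []byte) error {

	key := fmt.Sprintf("/node/%s", did) // /indexer/<PeerID> is the other namespace
	var err error
	for attempt := 0; attempt < 5; attempt++ {
		if err = putValue(ctx, key, signedRecord); err == nil {
			return nil
		}
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(time.Duration(attempt+1) * 2 * time.Second):
		}
	}
	return err // give up after a few attempts instead of leaking a goroutine
}
```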
---
### 3.8 PubSub gossip (indexer registry)

**How it works**

When an indexer registers with a native, the native publishes the address on the
GossipSub topic `oc-indexer-registry`. The other subscribed natives update their
`knownPeerIDs` without waiting for the DHT.

The `TopicValidator` rejects any message whose content is not a valid, parseable
multiaddr before it reaches the processing loop (see the sketch after this section).

**Advantages**
- Near-instant dissemination between connected natives.
- A useful complement to the DHT for recent registrations that have not yet been persisted.
- The syntactic filter blocks malformed messages before they propagate through the mesh.

**Limitations / risks**
- ✅ **No-op `TopicValidator` fixed**: the validator used to accept every message
  (`return true`), allowing a compromised native to gossip arbitrary data.
  → The validator now checks that the message is a parseable multiaddr
  (`pp.AddrInfoFromString`).
- ⚠️ Validation is still purely syntactic: the origin of the message (is the sender a
  legitimate native?) is not verified.
- ⚠️ If a native restarts, it loses its subscription and misses messages published while
  it was away. Re-hydration from the DHT compensates, but with a delay of up to 30 s.
- ⚠️ The gossip carries only the indexer's `Addr`, neither its TTL nor a signature.
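A sketch of the syntactic check, assuming the go-libp2p-pubsub validator signature; it
mirrors the `pp.AddrInfoFromString` check described above and, like it, does not
authenticate the sender:

```go
import (
	"context"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
	"github.com/libp2p/go-libp2p/core/peer"
)

// indexerRegistryValidator only lets through messages whose payload parses as a
// multiaddr AddrInfo; anything else is dropped before reaching the processing loop.
func indexerRegistryValidator(ctx context.Context, from peer.ID, msg *pubsub.Message) bool {
	_, err := peer.AddrInfoFromString(string(msg.GetData()))
	return err == nil
}
```

Such a validator would typically be registered on the topic (e.g. via
`RegisterTopicValidator`) before subscribing, so invalid messages are not re-gossiped.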
---
### 3.9 Application streams (node ↔ node)

**How it works**

`StreamService` manages streams between partner nodes (`PARTNER` relations stored in
the database) over dedicated protocols (`/opencloud/resource/*`). A partner heartbeat
(`ProtocolHeartbeatPartner`) keeps the connections alive. Events are routed through
`handleEvent` and the NATS system in parallel.

**Advantages**
- Per-protocol TTLs (`PersistantStream`, `WaitResponse`) adapt the behaviour to the
  type of exchange (long-lived for the planner, short for CRUD operations).
- The GC (`gc()` every 8 s, started once in `InitStream`) quickly frees expired streams.

**Limitations / risks**
- ✅ **GC goroutine leak fixed**: `HandlePartnerHeartbeat` called `go s.StartGC(30s)` on
  every received heartbeat (~every 20 s), creating a new never-ending ticker goroutine each time.
  → Call removed; the GC started by `InitStream` is sufficient.
- ✅ **Infinite loop on EOF fixed**: `readLoop` did `s.Stream.Close(); continue` after a
  decoding error, retrying forever on a closed stream.
  → Replaced by `return`; the defers (`Close`, `delete`) clean up correctly.
- ⚠️ Partner retrieval from `conf.PeerIDS` is marked `TO REMOVE`: provisional code is
  present in production.
---
## 4. Summary table

| Mechanism | Protocol | Main advantage | Risk status |
|---|---|---|---|
| Node→indexer heartbeat | `/opencloud/heartbeat/1.0` | Fast loss detection | ⚠️ Frozen TCP stream not detected |
| Trust scoring | (inline in heartbeat) | Filters unstable peers | ✅ Deadlock fixed (40/75 thresholds) |
| Native registration | `/opencloud/native/subscribe/1.0` | Generous TTL, immediate cache | ✅ TTL raised to 90 s |
| Indexer pool fetch | `/opencloud/native/indexers/1.0` | Fast (first responding native) | ⚠️ Native with stale cache possible |
| Consensus | `/opencloud/native/consensus/1.0` | Absolute majority | ⚠️ Trivial with a single native |
| Self-delegation + offload | (in-memory) | Availability without indexers | ✅ 50-peer limit + ClosePeer |
| Native mesh | `/opencloud/native/peers/1.0` | Multi-hop gossip | ✅ Goroutines deduplicated |
| DHT | `/oc/kad/1.0.0` | Decentralized persistence | ⚠️ Infinite retry, no bootstrap |
| PubSub registry | `oc-indexer-registry` | Fast dissemination | ✅ Multiaddr validation |
| Application streams | `/opencloud/resource/*` | Per-protocol TTL | ✅ GC leak + EOF fixed |
---
## 5. Global risks and limitations

### Security

- ⚠️ **Unverified self-reported addresses**: the `IndexersBinded` field in the heartbeat
  is self-declared by the node and feeds the diversity calculation. A malicious peer can
  inflate its score by declaring fake addresses.
- ⚠️ **PSK as the only entry barrier**: if the PSK is compromised (it is static and
  file-based), the entire network isolation collapses. There is no key rotation and no
  additional per-peer authentication.
- ⚠️ **DHT without ACLs on indexer entries**: `PeerRecord` signatures are verified on
  read, but `liveIndexerEntry` records are not signed. PubSub validation blocks invalid
  multiaddrs but not spoofed addresses of legitimate indexers.

### Availability

- ⚠️ **Native as a single point of failure**: with a single native, losing it stops all
  indexer assignment. `retryLostNative` mitigates this, but without indexers the nodes
  cannot publish.
- ⚠️ **DHT bootstrap**: without explicit bootstrap nodes, the DHT converges slowly when
  initial connections are scarce.

### Consistency

- ⚠️ **`replaceStaticIndexers` never removes**: dead indexers stay in `StaticIndexers`
  until their heartbeat fails. A node can hold an inflated pool containing unreachable
  entries.
- ⚠️ **Global `TimeWatcher`**: set once when `ConnectToIndexers` starts. If the indexer
  has been running for a long time, new nodes get a durably low `uptime_ratio`. The
  threshold lowered to 40 for the first heartbeat softens the initial impact, but
  subsequent heartbeats still have to accumulate enough uptime.
---
## 6. Possible improvements

Improvements already implemented are marked ✅. Open ones remain to be addressed.

### ✅ Scoring: dual threshold for new peers
~~Replace the single fixed threshold~~ **Implemented**: threshold of 40 for the first
heartbeat (peer absent from `StreamRecords`), 75 afterwards. A peer can now be admitted on
its very first connection without being blocked by a zero uptime.
_File: `common/common_stream.go`, `CheckHeartbeat`_

### ✅ Indexer TTL aligned with the renewal interval
~~66 s TTL too close to 60 s~~ **Implemented**: `IndexerTTL` raised to **90 s**.
_File: `indexer/native.go`_

### ✅ Self-delegation limit
~~Unbounded `responsiblePeers`~~ **Implemented**: `selfDelegate` returns `false` when
`len(responsiblePeers) >= maxFallbackPeers` (50). The call site returns an empty response
and logs a warning.
_File: `indexer/native.go`_

### ✅ PubSub validation of gossiped addresses
~~`TopicValidator` accepts everything~~ **Implemented**: the validator checks that the
message is a parseable multiaddr via `pp.AddrInfoFromString`.
_File: `indexer/native.go`, `subscribeIndexerRegistry`_

### ✅ Heartbeat goroutines deduplicated in `EnsureNativePeers`
~~One goroutine per native address~~ **Implemented**: `nativeMeshHeartbeatOnce`
guarantees that a single `SendHeartbeat` goroutine covers the whole `StaticNatives` map.
_File: `common/native_stream.go`_

### ✅ GC goroutine leak in `HandlePartnerHeartbeat`
~~`go s.StartGC(30s)` on every heartbeat~~ **Implemented**: call removed; the GC from
`InitStream` is sufficient.
_File: `stream/service.go`_

### ✅ Infinite loop on EOF in `readLoop`
~~`continue` after `Stream.Close()`~~ **Implemented**: replaced by `return` so the
defers clean up properly.
_File: `stream/service.go`_
---
### ⚠️ Pool fetch: query all natives in parallel

`fetchIndexersFromNative` stops at the first responding native. Querying all natives in
parallel and merging the lists (similar to `clientSideConsensus`) would prevent a native
with a stale cache from providing a sub-optimal pool.

### ⚠️ Consensus with a configurable quorum

The confirmation threshold (`count*2 > total`) is hard-coded. Making it configurable
(e.g. `consensus_quorum: 0.67`) would allow tightening the rule on deployments with
3+ natives without touching the code.

### ⚠️ Explicit deregistration

Add a `/opencloud/native/unsubscribe/1.0` protocol: when an indexer shuts down cleanly,
it notifies the natives so its TTL is invalidated immediately instead of waiting 90 s.

### ⚠️ Explicit DHT bootstrap

Configure the natives as DHT bootstrap nodes via `dht.BootstrapPeers` to speed up
Kademlia convergence at startup.

### ⚠️ Context propagated to long-lived goroutines

`retryLostNative`, `refreshIndexersFromDHT` and `runOffloadLoop` receive no
`context.Context`. Passing one down from `InitNative` would allow a clean stop when the
process shuts down (see the sketch after this list).

### ⚠️ Explicit redirection when self-delegation is refused

When a native refuses self-delegation (saturated pool), returning an empty response
forces the node to retry without telling it where to go. A list of alternative natives
in the response (`AlternativeNatives []string`) would let the node find a less-loaded
native directly.
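A minimal sketch of the context-propagation improvement for the long-lived loops mentioned
above (`tick` stands in for the actual loop body; the loop names and wiring from
`InitNative` are the ones discussed in the text):

```go
import (
	"context"
	"time"
)

// runPeriodic runs tick every interval and exits as soon as the context is cancelled,
// so goroutines like retryLostNative / refreshIndexersFromDHT / runOffloadLoop stop
// cleanly on process shutdown instead of hanging.
func runPeriodic(ctx context.Context, interval time.Duration, tick func()) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return // graceful shutdown
		case <-t.C:
			tick()
		}
	}
}
```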
@@ -9,8 +9,7 @@ type Config struct {
	PublicKeyPath          string
	PrivateKeyPath         string
	NodeEndpointPort       int64
	IndexerAddresses       string
	NativeIndexerAddresses string // multiaddrs of native indexers, comma-separated; bypasses IndexerAddresses when set
	IndexerAddresses       string

	PeerIDS string // TO REMOVE

@@ -18,6 +17,19 @@

	MinIndexer int
	MaxIndexer int
	// SearchTimeout is the max duration without a new result before the
	// distributed peer search stream is closed. Default: 5s.
	SearchTimeout int // seconds; 0 → use default (5)

	// Indexer connection burst guard: max new connections accepted within the window.
	// 0 → use defaults (20 new peers per 30s).
	MaxConnPerWindow int // default 20
	ConnWindowSecs   int // default 30

	// Per-node behavioral limits (sliding 60s window). 0 → use built-in defaults.
	MaxHBPerMinute      int // default 5
	MaxPublishPerMinute int // default 10
	MaxGetPerMinute     int // default 50
}

var instance *Config
daemons/node/common/common_cache.go (new file, 294 lines)
@@ -0,0 +1,294 @@
```go
package common

import (
	"errors"
	"sync"
	"time"

	pp "github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

type Score struct {
	FirstContacted time.Time
	UptimeTracker  *UptimeTracker
	LastFillRate   float64
	Score          float64
	// IsSeed marks indexers that came from the IndexerAddresses static config.
	// Seeds are sticky: they are never evicted by the score threshold alone.
	// A seed is only removed when: (a) heartbeat fails, or (b) it sends
	// SuggestMigrate and the node already has MinIndexer non-seed alternatives.
	IsSeed bool
	// challenge bookkeeping (2-3 peers per batch, raw data returned by indexer)
	hbCount          int // heartbeats sent since last challenge batch
	nextChallenge    int // send challenges when hbCount reaches this (rand 1-10)
	challengeTotal   int // number of own-PeerID challenges sent (ground truth)
	challengeCorrect int // own PeerID found AND lastSeen within 2×interval
	// fill rate consistency: cross-check reported fillRate vs peerCount/maxNodes
	fillChecked    int
	fillConsistent int
	// BornAt stability
	LastBornAt    time.Time
	bornAtChanges int
	// DHT challenge
	dhtChecked      int
	dhtSuccess      int
	dhtBatchCounter int
	// Peer witnesses
	witnessChecked    int
	witnessConsistent int
}

// ComputeNodeSideScore computes the node's quality assessment of an indexer from raw metrics.
// All ratios are in [0,1]; result is in [0,100].
//   - uptimeRatio : gap-aware fraction of lifetime the indexer was reachable
//   - challengeAccuracy: own-PeerID challenges answered correctly (found + recent lastSeen)
//   - latencyScore : 1 - RTT/maxRTT, clamped [0,1]
//   - fillScore : 1 - fillRate — prefer less-loaded indexers
//   - fillConsistency : fraction of ticks where peerCount/maxNodes ≈ fillRate (±10%)
func (s *Score) ComputeNodeSideScore(latencyScore float64) float64 {
	uptime := s.UptimeTracker.UptimeRatio()
	challengeAccuracy := 1.0
	if s.challengeTotal > 0 {
		challengeAccuracy = float64(s.challengeCorrect) / float64(s.challengeTotal)
	}
	fillScore := 1.0 - s.LastFillRate
	fillConsistency := 1.0
	if s.fillChecked > 0 {
		fillConsistency = float64(s.fillConsistent) / float64(s.fillChecked)
	}
	witnessConsistency := 1.0
	if s.witnessChecked > 0 {
		witnessConsistency = float64(s.witnessConsistent) / float64(s.witnessChecked)
	}
	dhtSuccessRate := 1.0
	if s.dhtChecked > 0 {
		dhtSuccessRate = float64(s.dhtSuccess) / float64(s.dhtChecked)
	}
	base := ((0.20 * uptime) +
		(0.20 * challengeAccuracy) +
		(0.15 * latencyScore) +
		(0.10 * fillScore) +
		(0.10 * fillConsistency) +
		(0.15 * witnessConsistency) +
		(0.10 * dhtSuccessRate)) * 100
	// BornAt stability: each unexpected BornAt change penalises by 30%.
	bornAtPenalty := 1.0 - 0.30*float64(s.bornAtChanges)
	if bornAtPenalty < 0 {
		bornAtPenalty = 0
	}
	return base * bornAtPenalty
}

type Directory struct {
	MuAddr   sync.RWMutex
	MuScore  sync.RWMutex
	MuStream sync.RWMutex
	Addrs    map[string]*pp.AddrInfo
	Scores   map[string]*Score
	Nudge    chan struct{}
	Streams  ProtocolStream
}

func (d *Directory) ExistsScore(a string) bool {
	d.MuScore.RLock()
	defer d.MuScore.RUnlock()
	for addr, ai := range d.Scores {
		if ai != nil && (a == addr) {
			return true
		}
	}
	return false
}

func (d *Directory) GetScore(a string) *Score {
	d.MuScore.RLock()
	defer d.MuScore.RUnlock()
	for addr, s := range d.Scores {
		if s != nil && (a == addr) {
			sCopy := *s
			return &sCopy
		}
	}
	return nil
}

func (d *Directory) GetScores() map[string]*Score {
	d.MuScore.RLock()
	defer d.MuScore.RUnlock()
	score := map[string]*Score{}
	for addr, s := range d.Scores {
		score[addr] = s
	}
	return score
}

func (d *Directory) DeleteScore(a string) {
	d.MuScore.Lock()
	defer d.MuScore.Unlock()
	score := map[string]*Score{}
	for addr, s := range d.Scores {
		if a != addr {
			score[addr] = s
		}
	}
	d.Scores = score
}

func (d *Directory) SetScore(addr string, score *Score) *pp.AddrInfo {
	d.MuScore.Lock()
	defer d.MuScore.Unlock()
	d.Scores[addr] = score
	return nil
}

func (d *Directory) ExistsAddr(addrOrId string) bool {
	d.MuAddr.RLock()
	defer d.MuAddr.RUnlock()
	for addr, ai := range d.Addrs {
		if ai != nil && (addrOrId == ai.ID.String() || addrOrId == addr) {
			return true
		}
	}
	return false
}

func (d *Directory) GetAddr(addrOrId string) *pp.AddrInfo {
	d.MuAddr.RLock()
	defer d.MuAddr.RUnlock()
	for addr, ai := range d.Addrs {
		if ai != nil && (addrOrId == ai.ID.String() || addrOrId == addr) {
			aiCopy := *ai
			return &aiCopy
		}
	}
	return nil
}

func (d *Directory) DeleteAddr(a string) {
	d.MuAddr.Lock()
	defer d.MuAddr.Unlock()
	addrs := map[string]*pp.AddrInfo{}
	for addr, s := range d.Addrs {
		if a != addr {
			addrs[addr] = s
		}
	}
	d.Addrs = addrs
}

func (d *Directory) SetAddr(addr string, info *pp.AddrInfo) *pp.AddrInfo {
	d.MuAddr.Lock()
	defer d.MuAddr.Unlock()
	d.Addrs[addr] = info
	return nil
}

func (d *Directory) GetAddrIDs() []pp.ID {
	d.MuAddr.RLock()
	defer d.MuAddr.RUnlock()
	indexers := make([]pp.ID, 0, len(d.Addrs))
	for _, ai := range d.Addrs {
		if ai != nil {
			indexers = append(indexers, ai.ID)
		}
	}
	return Shuffle(indexers)
}

func (d *Directory) GetAddrsStr() []string {
	d.MuAddr.RLock()
	defer d.MuAddr.RUnlock()
	indexers := make([]string, 0, len(d.Addrs))
	for s, ai := range d.Addrs {
		if ai != nil {
			indexers = append(indexers, s)
		}
	}

	return Shuffle(indexers)
}

type Entry struct {
	Addr string
	Info *pp.AddrInfo
}

func (d *Directory) GetAddrs() []Entry {
	d.MuAddr.RLock()
	defer d.MuAddr.RUnlock()
	indexers := make([]Entry, 0, len(d.Addrs))
	for addr, ai := range d.Addrs {
		if ai != nil {
			indexers = append(indexers, Entry{
				Addr: addr,
				Info: ai,
			})
		}
	}
	return Shuffle(indexers)
}

// NudgeIt signals the indexer heartbeat goroutine to fire immediately.
func (d *Directory) NudgeIt() {
	select {
	case d.Nudge <- struct{}{}:
	default: // nudge already pending, skip
	}
}

type ProtocolStream map[protocol.ID]map[pp.ID]*Stream

func (ps ProtocolStream) Get(protocol protocol.ID) map[pp.ID]*Stream {
	if ps[protocol] == nil {
		ps[protocol] = map[pp.ID]*Stream{}
	}

	return ps[protocol]
}

func (ps ProtocolStream) GetPerID(protocol protocol.ID, peerID pp.ID) *Stream {
	if ps[protocol] == nil {
		ps[protocol] = map[pp.ID]*Stream{}
	}
	return ps[protocol][peerID]
}

func (ps ProtocolStream) Add(protocol protocol.ID, peerID *pp.ID, s *Stream) error {
	if ps[protocol] == nil {
		ps[protocol] = map[pp.ID]*Stream{}
	}
	if peerID != nil {
		if s != nil {
			ps[protocol][*peerID] = s
		} else {
			return errors.New("unable to add stream: stream missing")
		}
	}
	return nil
}

func (ps ProtocolStream) Delete(protocol protocol.ID, peerID *pp.ID) {
	if streams, ok := ps[protocol]; ok {
		if peerID != nil && streams[*peerID] != nil && streams[*peerID].Stream != nil {
			streams[*peerID].Stream.Close()
			delete(streams, *peerID)
		} else {
			// No specific peer: close every stream of this protocol and drop the entry.
			for _, v := range streams {
				if v.Stream != nil {
					v.Stream.Close()
				}
			}
			delete(ps, protocol)
		}
	}
}

var Indexers = &Directory{
	Addrs:   map[string]*pp.AddrInfo{},
	Scores:  map[string]*Score{},
	Nudge:   make(chan struct{}, 1),
	Streams: ProtocolStream{},
}
```
daemons/node/common/common_heartbeat.go (new file, 255 lines)
@@ -0,0 +1,255 @@
```go
package common

import (
	"context"
	"encoding/json"
	"io"
	"time"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	pp "github.com/libp2p/go-libp2p/core/peer"

	oclib "cloud.o-forge.io/core/oc-lib"
)

type Heartbeat struct {
	Name           string   `json:"name"`
	Stream         *Stream  `json:"stream"`
	DID            string   `json:"did"`
	PeerID         string   `json:"peer_id"`
	Timestamp      int64    `json:"timestamp"`
	IndexersBinded []string `json:"indexers_binded"`
	Score          float64
	// Record carries a fresh signed PeerRecord (JSON) so the receiving indexer
	// can republish it to the DHT without an extra round-trip.
	// Only set by nodes (not indexers heartbeating other indexers).
	Record json.RawMessage `json:"record,omitempty"`
	// Need is how many more indexers this node wants (MaxIndexer - current pool size).
	// The receiving indexer uses this to know how many suggestions to return.
	// 0 means the pool is full — no suggestions needed unless SuggestMigrate.
	Need int `json:"need,omitempty"`
	// Challenges is a list of PeerIDs the node asks the indexer to spot-check.
	// Always includes the node's own PeerID (ground truth) + up to 2 additional
	// known peers. Nil means no challenge this tick.
	Challenges []string `json:"challenges,omitempty"`
	// ChallengeDID asks the indexer to retrieve this DID from the DHT (every 5th batch).
	ChallengeDID string `json:"challenge_did,omitempty"`
	// Referent marks this indexer as the node's designated search referent.
	// Only one indexer per node receives Referent=true at a time (the best-scored one).
	// The indexer stores the node in its referencedNodes for distributed search.
	Referent bool `json:"referent,omitempty"`
}

// SearchPeerRequest is sent by a node to an indexer via ProtocolSearchPeer.
// The indexer broadcasts it on the GossipSub search mesh and streams results back.
type SearchPeerRequest struct {
	QueryID string `json:"query_id"`
	// At least one of PeerID, DID, Name must be set.
	PeerID string `json:"peer_id,omitempty"`
	DID    string `json:"did,omitempty"`
	Name   string `json:"name,omitempty"`
}

// SearchQuery is broadcast on TopicSearchPeer by the receiving indexer.
// EmitterID is the indexer's own PeerID — responding indexers open a
// ProtocolSearchPeerResponse stream back to it.
type SearchQuery struct {
	QueryID   string `json:"query_id"`
	PeerID    string `json:"peer_id,omitempty"`
	DID       string `json:"did,omitempty"`
	Name      string `json:"name,omitempty"`
	EmitterID string `json:"emitter_id"`
}

// SearchPeerResult is sent by a responding indexer to the emitting indexer
// via ProtocolSearchPeerResponse, and forwarded by the emitting indexer to
// the node on the open ProtocolSearchPeer stream.
type SearchPeerResult struct {
	QueryID string      `json:"query_id"`
	Records []SearchHit `json:"records"`
}

// SearchHit is a single peer found during distributed search.
type SearchHit struct {
	PeerID string `json:"peer_id"`
	DID    string `json:"did"`
	Name   string `json:"name"`
}

// ChallengeEntry is the indexer's raw answer for one challenged peer.
type ChallengeEntry struct {
	PeerID   string    `json:"peer_id"`
	Found    bool      `json:"found"`
	LastSeen time.Time `json:"last_seen,omitempty"` // zero if not found
}

// HeartbeatResponse carries raw metrics only — no pre-cooked score.
type HeartbeatResponse struct {
	FillRate   float64          `json:"fill_rate"`
	PeerCount  int              `json:"peer_count"`
	MaxNodes   int              `json:"max_nodes"` // capacity — lets node cross-check fillRate
	BornAt     time.Time        `json:"born_at"`
	Challenges []ChallengeEntry `json:"challenges,omitempty"`
	// DHTFound / DHTPayload: response to a ChallengeDID request.
	DHTFound   bool            `json:"dht_found,omitempty"`
	DHTPayload json.RawMessage `json:"dht_payload,omitempty"`
	// Witnesses: random sample of connected nodes so the querying node can cross-check.
	Witnesses []pp.AddrInfo `json:"witnesses,omitempty"`
	// Suggestions: better indexers this indexer knows about via its DHT cache.
	// The node should open heartbeat connections to these (they become StaticIndexers).
	Suggestions []pp.AddrInfo `json:"suggestions,omitempty"`
	// SuggestMigrate: set when this indexer is overloaded (fill rate > threshold)
	// and is actively trying to hand the node off to the Suggestions list.
	// Seeds: node de-stickies this indexer once it has MinIndexer non-seed alternatives.
	// Non-seeds: node removes this indexer immediately if it has enough alternatives.
	SuggestMigrate bool `json:"suggest_migrate,omitempty"`
}

// ComputeIndexerScore computes a composite quality score [0, 100] for the connecting peer.
//   - uptimeRatio: fraction of tracked lifetime online (gap-aware) — peer reliability
//   - bpms: bandwidth normalized to MaxExpectedMbps — link capacity
//   - diversity: indexer's own /24 subnet diversity — network topology quality
//   - latencyScore: 1 - RTT/maxRoundTrip — link responsiveness
//   - fillRate: fraction of indexer slots used (0=empty, 1=full) — collective trust signal:
//     a fuller indexer has been chosen and retained by many peers, which is evidence of quality.
func (hb *Heartbeat) ComputeIndexerScore(uptimeRatio float64, bpms float64, diversity float64, latencyScore float64, fillRate float64) {
	hb.Score = ((0.20 * uptimeRatio) +
		(0.20 * bpms) +
		(0.20 * diversity) +
		(0.15 * latencyScore) +
		(0.25 * fillRate)) * 100
}

type HeartbeatInfo []struct {
	Info []byte `json:"info"`
}

// WitnessRequest is sent by a node to a peer to ask its view of a given indexer.
type WitnessRequest struct {
	IndexerPeerID string `json:"indexer_peer_id"`
}

// WitnessReport is returned by a peer in response to a WitnessRequest.
type WitnessReport struct {
	Seen     bool      `json:"seen"`
	BornAt   time.Time `json:"born_at,omitempty"`
	FillRate float64   `json:"fill_rate,omitempty"`
	Score    float64   `json:"score,omitempty"`
}

// HandleBandwidthProbe echoes back everything written on the stream, then closes.
// It is registered by all participants so the measuring side (the heartbeat receiver)
// can open a dedicated probe stream and read the round-trip latency + throughput.
func HandleBandwidthProbe(s network.Stream) {
	defer s.Close()
	s.SetDeadline(time.Now().Add(10 * time.Second))
	io.Copy(s, s) // echo every byte back to the sender
}

// HandleWitnessQuery answers a witness query: the caller wants to know
// what this node thinks of a given indexer (identified by its PeerID).
func HandleWitnessQuery(h host.Host, s network.Stream) {
	defer s.Close()
	s.SetDeadline(time.Now().Add(5 * time.Second))
	var req WitnessRequest
	if err := json.NewDecoder(s).Decode(&req); err != nil {
		return
	}
	report := WitnessReport{}
	for _, ai := range Indexers.GetAddrs() {
		if ai.Info == nil || ai.Info.ID.String() != req.IndexerPeerID {
			continue
		}
		if score := Indexers.GetScore(addrKey(*ai.Info)); score != nil {
			report.Seen = true
			report.BornAt = score.LastBornAt
			report.FillRate = score.LastFillRate
			report.Score = score.Score
		}
		break
	}
	json.NewEncoder(s).Encode(report)
}

// SupportsHeartbeat probes pid with a short-lived stream to verify it has
// a ProtocolHeartbeat handler (i.e. it is an indexer, not a plain node).
// Only protocol negotiation is performed — no data is sent.
// Returns false on any error, including "protocol not supported".
func SupportsHeartbeat(h host.Host, pid pp.ID) bool {
	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	s, err := h.NewStream(ctx, pid, ProtocolHeartbeat)
	if err != nil {
		return false
	}
	s.Reset()
	return true
}

// queryWitnesses contacts each witness in parallel, collects their view of the
// indexer, and updates score.witnessChecked / score.witnessConsistent.
// Called in a goroutine — must not hold any lock.
func queryWitnesses(h host.Host, indexerPeerID string, indexerBornAt time.Time, indexerFillRate float64, witnesses []pp.AddrInfo, score *Score) {
	logger := oclib.GetLogger()
	type result struct{ consistent bool }
	results := make(chan result, len(witnesses))

	for _, ai := range witnesses {
		if ai.ID == h.ID() {
			// Never query ourselves — skip and count as inconclusive.
			results <- result{}
			continue
		}
		go func(ai pp.AddrInfo) {
			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			defer cancel()
			s, err := h.NewStream(ctx, ai.ID, ProtocolWitnessQuery)
			if err != nil {
				results <- result{}
				return
			}
			defer s.Close()
			s.SetDeadline(time.Now().Add(5 * time.Second))
			if err := json.NewEncoder(s).Encode(WitnessRequest{IndexerPeerID: indexerPeerID}); err != nil {
				results <- result{}
				return
			}
			var rep WitnessReport
			if err := json.NewDecoder(s).Decode(&rep); err != nil || !rep.Seen {
				results <- result{}
				return
			}
			// BornAt must be identical (fixed timestamp).
			bornAtOK := !rep.BornAt.IsZero() && rep.BornAt.Equal(indexerBornAt)
			// FillRate coherent within ±25% (it fluctuates normally).
			diff := rep.FillRate - indexerFillRate
			if diff < 0 {
				diff = -diff
			}
			fillOK := diff < 0.25
			consistent := bornAtOK && fillOK
			logger.Debug().
				Str("witness", ai.ID.String()).
				Bool("bornAt_ok", bornAtOK).
				Bool("fill_ok", fillOK).
				Msg("witness report")
			results <- result{consistent: consistent}
		}(ai)
	}

	checked, consistent := 0, 0
	for range witnesses {
		r := <-results
		checked++
		if r.consistent {
			consistent++
		}
	}

	if checked == 0 {
		return
	}
	score.witnessChecked += checked
	score.witnessConsistent += consistent
}
```
daemons/node/common/common_indexer_hb.go (new file, 589 lines)
@@ -0,0 +1,589 @@
```go
package common

import (
	"context"
	"encoding/json"
	"fmt"
	"math/rand"
	"strings"
	"sync/atomic"
	"time"

	"oc-discovery/conf"

	oclib "cloud.o-forge.io/core/oc-lib"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	pp "github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

var TimeWatcher time.Time

// retryRunning guards against launching multiple retryUntilSeedResponds goroutines.
var retryRunning atomic.Bool

func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, recordFn ...func() json.RawMessage) error {
	TimeWatcher = time.Now().UTC()
	logger := oclib.GetLogger()

	// Bootstrap from IndexerAddresses seed set.
	addresses := strings.Split(conf.GetConfig().IndexerAddresses, ",")
	if len(addresses) > maxIndexer {
		addresses = addresses[0:maxIndexer]
	}
	for _, indexerAddr := range addresses {
		indexerAddr = strings.TrimSpace(indexerAddr)
		if indexerAddr == "" {
			continue
		}
		ad, err := pp.AddrInfoFromString(indexerAddr)
		if err != nil {
			logger.Err(err).Msg("invalid indexer address, skipping")
			continue
		}
		key := ad.ID.String()
		Indexers.SetAddr(key, ad)
		// Pre-create score entry with IsSeed=true so the sticky flag is set before
		// the first heartbeat tick (lazy creation in doTick would lose the flag).
		if !Indexers.ExistsScore(key) {
			Indexers.SetScore(key, &Score{
				FirstContacted: time.Now().UTC(),
				UptimeTracker:  &UptimeTracker{FirstSeen: time.Now().UTC()},
				nextChallenge:  rand.Intn(10) + 1,
				IsSeed:         true,
			})
		}
	}
	seeds := Indexers.GetAddrs()
	indexerCount := len(seeds)

	if indexerCount < minIndexer {
		return fmt.Errorf("running a node without indexers: you're going to be isolated")
	}

	// Start long-lived heartbeat to seed indexers. The single goroutine follows
	// all subsequent StaticIndexers changes.
	SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name,
		h, Indexers, 20*time.Second, maxIndexer, recordFn...)

	// Watch for inbound connections: if a peer connects to us and our pool has
	// room, probe it first to confirm it supports ProtocolHeartbeat (i.e. it is
	// an indexer). Plain nodes don't register the handler — the negotiation fails
	// instantly so we never pollute the pool with non-indexer peers.
	h.Network().Notify(&network.NotifyBundle{
		ConnectedF: func(n network.Network, c network.Conn) {
			if c.Stat().Direction != network.DirInbound {
				return
			}
			if len(Indexers.GetAddrs()) >= maxIndexer {
				return
			}
			peerID := c.RemotePeer()
			if Indexers.ExistsAddr(peerID.String()) {
				return
			}
			// Probe in a goroutine — ConnectedF must not block.
			go func(pid pp.ID) {
				if !SupportsHeartbeat(h, pid) {
					return // plain node, skip
				}
				if len(Indexers.GetAddrs()) >= maxIndexer {
					return
				}
				if Indexers.ExistsAddr(pid.String()) {
					return
				}
				addrs := h.Peerstore().Addrs(pid)
				if len(addrs) == 0 {
					return
				}
				ai := FilterLoopbackAddrs(pp.AddrInfo{ID: pid, Addrs: addrs})
				if len(ai.Addrs) == 0 {
					return
				}
				adCopy := ai
				Indexers.SetAddr(pid.String(), &adCopy)
				Indexers.NudgeIt()
				log := oclib.GetLogger()
				log.Info().Str("peer", pid.String()).
					Msg("[pool] inbound indexer peer added as candidate")
			}(peerID)
		},
	})

	// Proactive DHT upgrade: once seeds are connected and the DHT routing table
	// is warm, discover better indexers and add them to the pool alongside the seeds.
	// Seeds stay as guaranteed anchors; scoring will demote poor performers over time.
	go func(seeds []Entry) {
		// Let seed connections establish and the DHT routing table warm up.
		time.Sleep(5 * time.Second)
		// For pure nodes (no IndexerService), spin up a lightweight DHT client.
		if discoveryDHT == nil {
			if len(seeds) == 0 {
				return
			}
			initNodeDHT(h, seeds)
		}
		if discoveryDHT == nil {
			return
		}
		current := len(Indexers.GetAddrs())
		need := maxIndexer - current
		if need <= 0 {
			need = maxIndexer / 2 // diversify even when pool is already at capacity
		}
		logger.Info().Int("need", need).Msg("[dht] proactive indexer discovery from DHT")
		replenishIndexersFromDHT(h, need)
	}(seeds)

	return nil
}

// reconnectToSeeds re-adds the configured seed indexers to StaticIndexers as
// sticky fallback entries. Called when the pool drops to zero so the node
// never becomes completely isolated.
func reconnectToSeeds() {
	logger := oclib.GetLogger()
	logger.Warn().Msg("[pool] all indexers lost, reconnecting to configured seeds")
	addresses := strings.Split(conf.GetConfig().IndexerAddresses, ",")
	for _, addrStr := range addresses {
		addrStr = strings.TrimSpace(addrStr)
		if addrStr == "" {
			continue
		}
		ad, err := pp.AddrInfoFromString(addrStr)
		if err != nil {
			continue
		}
		key := ad.ID.String()
		Indexers.SetAddr(key, ad)
		if score := Indexers.GetScore(key); score == nil {
			Indexers.SetScore(key, &Score{
				FirstContacted: time.Now().UTC(),
				UptimeTracker:  &UptimeTracker{FirstSeen: time.Now().UTC()},
				nextChallenge:  rand.Intn(10) + 1,
				IsSeed:         true,
			})
		} else {
			// Restore sticky flag so the seed is not immediately re-ejected.
			// GetScore returns a copy, so write the modified entry back.
			score.IsSeed = true
			Indexers.SetScore(key, score)
		}
	}
}

// retryUntilSeedResponds loops with exponential backoff until at least one
// configured seed is reachable again. Once seeds are back in the pool it
// nudges the heartbeat loop and lets the normal DHT upgrade path take over.
// Should be called in a goroutine — it blocks until the situation resolves.
// If no seeds are configured, it simply waits for an inbound indexer to refill the pool.
func retryUntilSeedResponds() {
	if !retryRunning.CompareAndSwap(false, true) {
		return // another goroutine is already running the retry loop
	}
	defer retryRunning.Store(false)

	logger := oclib.GetLogger()
	rawAddresses := strings.TrimSpace(conf.GetConfig().IndexerAddresses)
	if rawAddresses == "" {
		// No seeds configured: rely on the inbound-connection notifee to fill
		// the pool. Just wait patiently — the loop below will return as soon
		// as any peer connects and NudgeIt() is called.
		logger.Warn().Msg("[pool] pool empty and no seeds configured — waiting for inbound indexer")
	}
	backoff := 10 * time.Second
	const maxBackoff = 5 * time.Minute
	for {
		time.Sleep(backoff)
		if backoff < maxBackoff {
			backoff *= 2
		}
		// Check whether someone else already refilled the pool.
		if len(Indexers.GetAddrs()) > 0 {
			logger.Info().Msg("[pool] pool refilled externally, stopping seed retry")
			return
		}
		logger.Warn().Dur("backoff", backoff).Msg("[pool] still isolated, retrying seeds")
		reconnectToSeeds()
		if len(Indexers.GetAddrs()) > 0 {
			Indexers.NudgeIt()
			// Re-bootstrap DHT now that we have at least one connection candidate.
			if discoveryDHT != nil {
				ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
				discoveryDHT.Bootstrap(ctx) //nolint:errcheck
				cancel()
				return
			}
		}
	}
}

// ensureScore returns the Score for addr, creating it if absent.
func ensureScore(d *Directory, addr string) *Score {
	if !d.ExistsScore(addr) {
		d.SetScore(addr, &Score{
			FirstContacted: time.Now().UTC(),
			UptimeTracker:  &UptimeTracker{FirstSeen: time.Now().UTC()},
			nextChallenge:  rand.Intn(10) + 1,
		})
	}
	return d.GetScore(addr)
}

// evictPeer removes addr from directory atomically and returns a snapshot of
// remaining AddrInfos (for consensus voter selection).
func evictPeer(d *Directory, addr string, id pp.ID, proto protocol.ID) []pp.AddrInfo {
	d.Streams.Delete(proto, &id)
	d.DeleteAddr(addr)
	voters := make([]pp.AddrInfo, 0, len(d.Addrs))
	for _, ai := range d.GetAddrs() {
		if ai.Info != nil {
			voters = append(voters, *ai.Info)
		}
	}
	d.DeleteScore(addr)
	return voters
}

// handleSuggestions adds unknown suggested indexers to the directory.
func handleSuggestions(d *Directory, from string, suggestions []pp.AddrInfo) {
	added := 0
	for _, sug := range suggestions {
		key := addrKey(sug)
		if !d.ExistsAddr(key) {
			cpy := sug
			d.SetAddr(key, &cpy)
			added++
		}
	}
	if added > 0 {
		logger := oclib.GetLogger()
		logger.Info().Int("added", added).Str("from", from).
			Msg("added suggested indexers from heartbeat response")
		d.NudgeIt()
	}
}

// SendHeartbeat starts a goroutine that sends periodic heartbeats to peers.
// recordFn, when provided, is called on each tick and its output is embedded in
// the heartbeat as a fresh signed PeerRecord so the receiving indexer can
// republish it to the DHT without an extra round-trip.
// Pass no recordFn (or nil) for indexer→indexer / native heartbeats.
func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, directory *Directory, interval time.Duration, maxPool int, recordFn ...func() json.RawMessage) {
	logger := oclib.GetLogger()
	isIndexerHB := directory == Indexers
	var recFn func() json.RawMessage
	if len(recordFn) > 0 {
		recFn = recordFn[0]
	}
	go func() {
		logger.Info().Str("proto", string(proto)).Int("peers", len(directory.Addrs)).Msg("heartbeat started")
		t := time.NewTicker(interval)
		defer t.Stop()

		// peerEntry pairs addr key with AddrInfo so doTick can update score maps directly.
		type peerEntry struct {
			addr string
			ai   *pp.AddrInfo
		}

		doTick := func() {
			addrs := directory.GetAddrsStr()
			need := maxPool - len(addrs)
			if need < 0 {
				need = 0
			}
			baseHB := Heartbeat{
				Name:           name,
				PeerID:         h.ID().String(),
				Timestamp:      time.Now().UTC().Unix(),
				IndexersBinded: addrs,
				Need:           need,
			}
			if recFn != nil {
				baseHB.Record = recFn()
			}
			// Determine the referent indexer: highest-scored one receives Referent=true
			// so it stores us in its referencedNodes for distributed search.
			var referentAddr string
			if isIndexerHB {
				var bestScore float64 = -1
				for _, ai2 := range directory.GetAddrs() {
					if s := directory.GetScore(ai2.Addr); s != nil && s.Score > bestScore {
						bestScore = s.Score
						referentAddr = ai2.Addr
					}
				}
			}

			for _, ai := range directory.GetAddrs() {
				// Build per-peer heartbeat copy so challenge injection is peer-specific.
				hb := baseHB
				if isIndexerHB && referentAddr != "" && ai.Addr == referentAddr {
					hb.Referent = true
				}
				// Ensure an IndexerScore entry exists for this peer.
				var score *Score
				if isIndexerHB {
					score = ensureScore(directory, ai.Addr)

					// Inject challenge batch if due (random 1-10 HBs between batches).
					score.hbCount++
					if score.hbCount >= score.nextChallenge {
						// Ground truth: node's own PeerID — indexer MUST have us.
						challenges := []string{h.ID().String()}
						// Add up to 2 more known peers (other indexers) for richer data.
						// Use the already-snapshotted entries to avoid re-locking.
						for _, ai2 := range directory.GetAddrs() {
							if ai2.Addr != ai.Addr && ai2.Info != nil {
|
||||
challenges = append(challenges, ai2.Info.ID.String())
|
||||
if len(challenges) >= 3 {
|
||||
break
|
||||
}
|
||||
}
|
||||
}
|
||||
hb.Challenges = challenges
|
||||
score.hbCount = 0
|
||||
score.nextChallenge = rand.Intn(10) + 1
|
||||
score.challengeTotal++ // count own-PeerID challenge (ground truth)
|
||||
score.dhtBatchCounter++
|
||||
// DHT challenge every 5th batch: ask indexer to retrieve our own DID.
|
||||
if score.dhtBatchCounter%5 == 0 {
|
||||
var selfDID string
|
||||
if len(baseHB.Record) > 0 {
|
||||
var partial struct {
|
||||
DID string `json:"did"`
|
||||
}
|
||||
if json.Unmarshal(baseHB.Record, &partial) == nil {
|
||||
selfDID = partial.DID
|
||||
}
|
||||
}
|
||||
if selfDID != "" {
|
||||
hb.ChallengeDID = selfDID
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
resp, rtt, err := sendHeartbeat(ctx, h, proto, ai.Info, hb, directory.Streams, interval)
|
||||
if err != nil { // Heartbeat fails
|
||||
HeartbeatFailure(h, proto, directory, ai.Addr, ai.Info, isIndexerHB, maxPool, err)
|
||||
continue
|
||||
}
|
||||
|
||||
// Update IndexerScore — uptime recorded on any successful send,
|
||||
// even if the indexer does not support bidirectional heartbeat (Fix 1).
|
||||
if isIndexerHB && score != nil {
|
||||
score.UptimeTracker.RecordHeartbeat()
|
||||
score.UptimeTracker.ConsecutiveFails = 0 // reset on success
|
||||
|
||||
maxRTT := BaseRoundTrip * 10
|
||||
latencyScore := 1.0 - float64(rtt)/float64(maxRTT)
|
||||
if latencyScore < 0 {
|
||||
latencyScore = 0
|
||||
}
|
||||
if latencyScore > 1 {
|
||||
latencyScore = 1
|
||||
}
|
||||
|
||||
// Update fill / challenge fields only when the indexer responded.
|
||||
if resp != nil {
|
||||
// BornAt stability check.
|
||||
if score.LastBornAt.IsZero() {
|
||||
score.LastBornAt = resp.BornAt
|
||||
} else if !resp.BornAt.IsZero() && !resp.BornAt.Equal(score.LastBornAt) {
|
||||
score.bornAtChanges++
|
||||
score.LastBornAt = resp.BornAt
|
||||
logger.Warn().Str("peer", ai.Info.ID.String()).
|
||||
Int("changes", score.bornAtChanges).
|
||||
Msg("indexer BornAt changed — possible restart or impersonation")
|
||||
}
|
||||
score.LastFillRate = resp.FillRate
|
||||
|
||||
// Fill rate consistency: cross-check peerCount/maxNodes vs reported fillRate.
|
||||
if resp.MaxNodes > 0 {
|
||||
expected := float64(resp.PeerCount) / float64(resp.MaxNodes)
|
||||
diff := expected - resp.FillRate
|
||||
if diff < 0 {
|
||||
diff = -diff
|
||||
}
|
||||
score.fillChecked++
|
||||
if diff < 0.1 {
|
||||
score.fillConsistent++
|
||||
}
|
||||
}
|
||||
|
||||
// Validate challenge responses. Only own-PeerID counts as ground truth.
|
||||
if len(hb.Challenges) > 0 && len(resp.Challenges) > 0 {
|
||||
ownID := h.ID().String()
|
||||
for _, ce := range resp.Challenges {
|
||||
if ce.PeerID != ownID {
|
||||
continue // informational only
|
||||
}
|
||||
recentEnough := !ce.LastSeen.IsZero() &&
|
||||
time.Since(ce.LastSeen) < 2*RecommendedHeartbeatInterval
|
||||
if ce.Found && recentEnough {
|
||||
score.challengeCorrect++
|
||||
}
|
||||
logger.Info().Str("peer", ai.Info.ID.String()).
|
||||
Bool("found", ce.Found).
|
||||
Bool("recent", recentEnough).
|
||||
Msg("own-PeerID challenge result")
|
||||
break
|
||||
}
|
||||
}
|
||||
|
||||
// DHT challenge result.
|
||||
if hb.ChallengeDID != "" {
|
||||
score.dhtChecked++
|
||||
if resp.DHTFound {
|
||||
score.dhtSuccess++
|
||||
}
|
||||
}
|
||||
|
||||
// Launch witness cross-check asynchronously (must not hold lock).
|
||||
if len(resp.Witnesses) > 0 {
|
||||
go queryWitnesses(h, ai.Info.ID.String(), resp.BornAt, resp.FillRate, resp.Witnesses, score)
|
||||
} else if resp.MaxNodes > 0 {
|
||||
// No witnesses offered. Valid if indexer only has us (PeerCount==1).
|
||||
// Cross-check: FillRate should equal 1/MaxNodes within ±10%.
|
||||
expected := 1.0 / float64(resp.MaxNodes)
|
||||
diff := resp.FillRate - expected
|
||||
if diff < 0 {
|
||||
diff = -diff
|
||||
}
|
||||
score.witnessChecked++
|
||||
if resp.PeerCount == 1 && diff < 0.1 {
|
||||
score.witnessConsistent++
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
score.Score = score.ComputeNodeSideScore(latencyScore)
|
||||
age := score.UptimeTracker.Uptime()
|
||||
minScore := dynamicMinScore(age)
|
||||
// Fix 4: grace period — at least 2 full heartbeat cycles before ejecting.
|
||||
isSeed := score.IsSeed
|
||||
// Seeds are sticky: never evicted by score alone (SuggestMigrate handles it).
|
||||
// Never eject the last indexer by score alone — we would lose all connectivity.
|
||||
belowThreshold := score.Score < minScore &&
|
||||
score.UptimeTracker.TotalOnline >= 2*RecommendedHeartbeatInterval &&
|
||||
!isSeed &&
|
||||
len(directory.Addrs) > 1
|
||||
|
||||
if belowThreshold {
|
||||
logger.Info().Str("peer", ai.Info.ID.String()).
|
||||
Float64("score", score.Score).Float64("min", minScore).
|
||||
Msg("indexer score below threshold, removing from pool")
|
||||
voters := evictPeer(directory, ai.Addr, ai.Info.ID, proto)
|
||||
need := max(maxPool-len(voters), 1)
|
||||
if len(voters) > 0 {
|
||||
go TriggerConsensus(h, voters, need)
|
||||
} else {
|
||||
go replenishIndexersFromDHT(h, need)
|
||||
}
|
||||
}
|
||||
|
||||
// Accept suggestions from this indexer — add unknown ones to the directory.
|
||||
if resp != nil && len(resp.Suggestions) > 0 {
|
||||
handleSuggestions(directory, ai.Info.ID.String(), resp.Suggestions)
|
||||
}
|
||||
|
||||
// Handle SuggestMigrate: indexer is overloaded and wants us to move.
|
||||
if resp != nil && resp.SuggestMigrate && isIndexerHB {
|
||||
nonSeedCount := 0
|
||||
for _, sc := range directory.GetScores() {
|
||||
if !sc.IsSeed {
|
||||
nonSeedCount++
|
||||
}
|
||||
}
|
||||
if nonSeedCount >= conf.GetConfig().MinIndexer {
|
||||
if isSeed {
|
||||
// Seed has offloaded us: clear sticky flag, score eviction takes over.
|
||||
score.IsSeed = false
|
||||
logger.Info().Str("peer", ai.Info.ID.String()).
|
||||
Msg("seed discharged via SuggestMigrate, de-stickied")
|
||||
} else {
|
||||
evictPeer(directory, ai.Addr, ai.Info.ID, proto)
|
||||
logger.Info().Str("peer", ai.Info.ID.String()).Msg("accepted migration from overloaded indexer")
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
for {
|
||||
select {
|
||||
case <-t.C:
|
||||
doTick()
|
||||
case <-directory.Nudge:
|
||||
if isIndexerHB {
|
||||
logger.Info().Msg("nudge received, heartbeating new indexers immediately")
|
||||
doTick()
|
||||
}
|
||||
case <-ctx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
}
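// Usage sketch (illustrative only): a node typically starts this loop once at boot
// and lets the goroutine follow pool changes. The maxPool value and the
// buildSignedRecord helper below are hypothetical, not part of this change.
//
//	SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name,
//		h, Indexers, RecommendedHeartbeatInterval, 5,
//		func() json.RawMessage { return buildSignedRecord(h) })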
|
||||
|
||||
func HeartbeatFailure(h host.Host, proto protocol.ID, directory *Directory,
|
||||
addr string, info *pp.AddrInfo, isIndexerHB bool, maxPool int, err error) {
|
||||
logger := oclib.GetLogger()
|
||||
logger.Err(err)
|
||||
// Seeds are never evicted on heartbeat failure.
|
||||
// Keeping them in the pool lets the regular 60-second ticker retry them
|
||||
// at a natural cadence — no reconnect storm, no libp2p dial-backoff accumulation.
|
||||
// A seed will self-heal once it comes back; DHT and inbound peers fill the gap.
|
||||
if isIndexerHB {
|
||||
if score := directory.GetScore(addr); score != nil {
|
||||
if score.IsSeed {
|
||||
logger.Warn().Str("peer", info.ID.String()).
|
||||
Msg("[pool] seed heartbeat failed — keeping in pool, ticker will retry " + err.Error())
|
||||
return
|
||||
}
|
||||
// Indirect probing via other alive indexers:
|
||||
// If other indexers in the pool are still responding, they act as implicit
|
||||
// third-party witnesses confirming our connectivity is fine — the failed
|
||||
// indexer is genuinely dead, evict immediately.
|
||||
// If this is the last indexer, there is no third party. Retry up to 3 times
|
||||
// (consecutive failures tracked in UptimeTracker) before declaring it dead.
|
||||
if len(directory.GetAddrs()) <= 1 {
|
||||
score.UptimeTracker.ConsecutiveFails++
|
||||
if score.UptimeTracker.ConsecutiveFails < 3 {
|
||||
logger.Warn().Str("peer", info.ID.String()).
|
||||
Int("attempt", score.UptimeTracker.ConsecutiveFails).
|
||||
Msg("[indirect] last indexer failed, retrying before eviction")
|
||||
return
|
||||
}
|
||||
logger.Warn().Str("peer", info.ID.String()).
|
||||
Msg("[indirect] last indexer failed 3 times consecutively, evicting")
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
logger.Info().Str("peer", info.ID.String()).Str("proto", string(proto)).
|
||||
Msg("heartbeat failed, removing peer from pool : " + err.Error())
|
||||
consensusVoters := evictPeer(directory, addr, info.ID, proto)
|
||||
if isIndexerHB {
|
||||
need := maxPool - len(consensusVoters)
|
||||
if need < 1 {
|
||||
need = 1
|
||||
}
|
||||
logger.Info().Int("remaining", len(consensusVoters)).Int("need", need).Msg("pool state after removal")
|
||||
poolSize := len(directory.GetAddrs())
|
||||
if poolSize == 0 {
|
||||
// Pool is truly empty (no seeds configured or no seeds in pool).
|
||||
// Start the backoff retry loop — it will re-add seeds and nudge
|
||||
// only once a seed actually responds.
|
||||
go retryUntilSeedResponds()
|
||||
} else if len(consensusVoters) > 0 {
|
||||
go TriggerConsensus(h, consensusVoters, need)
|
||||
} else {
|
||||
go replenishIndexersFromDHT(h, need)
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -19,7 +19,8 @@ type Event struct {
|
||||
Type string `json:"type"`
|
||||
From string `json:"from"` // peerID
|
||||
|
||||
User string
|
||||
|
||||
Groups []string
|
||||
|
||||
DataType int64 `json:"datatype"`
|
||||
Timestamp int64 `json:"ts"`
|
||||
@@ -112,6 +113,12 @@ func NewLongLivedPubSubService(h host.Host) *LongLivedPubSubService {
|
||||
}
|
||||
}
|
||||
|
||||
func (s *LongLivedPubSubService) GetPubSub(topicName string) *pubsub.Topic {
|
||||
s.PubsubMu.Lock()
|
||||
defer s.PubsubMu.Unlock()
|
||||
return s.LongLivedPubSubs[topicName]
|
||||
}
|
||||
|
||||
func (s *LongLivedPubSubService) processEvent(
|
||||
ctx context.Context,
|
||||
p *peer.Peer,
|
||||
@@ -123,26 +130,8 @@ func (s *LongLivedPubSubService) processEvent(
|
||||
return handler(ctx, topicName, event)
|
||||
}
|
||||
|
||||
const TopicPubSubNodeActivity = "oc-node-activity"
|
||||
const TopicPubSubSearch = "oc-node-search"
|
||||
|
||||
func (s *LongLivedPubSubService) SubscribeToNodeActivity(ps *pubsub.PubSub, f *func(context.Context, TopicNodeActivityPub, string)) error {
|
||||
ps.RegisterTopicValidator(TopicPubSubNodeActivity, func(ctx context.Context, p pp.ID, m *pubsub.Message) bool {
|
||||
return true
|
||||
})
|
||||
if topic, err := ps.Join(TopicPubSubNodeActivity); err != nil {
|
||||
return err
|
||||
} else {
|
||||
s.PubsubMu.Lock()
|
||||
defer s.PubsubMu.Unlock()
|
||||
s.LongLivedPubSubs[TopicPubSubNodeActivity] = topic
|
||||
}
|
||||
if f != nil {
|
||||
return SubscribeEvents(s, context.Background(), TopicPubSubNodeActivity, -1, *f)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (s *LongLivedPubSubService) SubscribeToSearch(ps *pubsub.PubSub, f *func(context.Context, Event, string)) error {
|
||||
ps.RegisterTopicValidator(TopicPubSubSearch, func(ctx context.Context, p pp.ID, m *pubsub.Message) bool {
|
||||
return true
|
||||
@@ -151,8 +140,8 @@ func (s *LongLivedPubSubService) SubscribeToSearch(ps *pubsub.PubSub, f *func(co
|
||||
return err
|
||||
} else {
|
||||
s.PubsubMu.Lock()
|
||||
defer s.PubsubMu.Unlock()
|
||||
s.LongLivedPubSubs[TopicPubSubSearch] = topic
|
||||
|
||||
}
|
||||
if f != nil {
|
||||
return SubscribeEvents(s, context.Background(), TopicPubSubSearch, -1, *f)
|
||||
@@ -182,6 +171,7 @@ func waitResults[T interface{}](s *LongLivedPubSubService, ctx context.Context,
|
||||
for {
|
||||
s.PubsubMu.Lock() // safely check whether the service is still subscribed to the topic
|
||||
if s.LongLivedPubSubs[proto] == nil { // if not, exit the loop
|
||||
s.PubsubMu.Unlock()
|
||||
break
|
||||
}
|
||||
s.PubsubMu.Unlock()
|
||||
|
||||
daemons/node/common/common_scoring.go (new file, +183)
@@ -0,0 +1,183 @@
|
||||
package common
|
||||
|
||||
import (
|
||||
"context"
|
||||
cr "crypto/rand"
|
||||
"io"
|
||||
"net"
|
||||
"slices"
|
||||
"time"
|
||||
|
||||
"github.com/libp2p/go-libp2p/core/host"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
)
|
||||
|
||||
const MaxExpectedMbps = 100.0
|
||||
const MinPayloadChallenge = 512
|
||||
const MaxPayloadChallenge = 2048
|
||||
const BaseRoundTrip = 400 * time.Millisecond
|
||||
|
||||
type UptimeTracker struct {
|
||||
FirstSeen time.Time
|
||||
LastSeen time.Time
|
||||
TotalOnline time.Duration
|
||||
ConsecutiveFails int // incremented on each heartbeat failure; reset to 0 on success
|
||||
}
|
||||
|
||||
// RecordHeartbeat accumulates online time gap-aware: only counts the interval if
|
||||
// the gap since the last heartbeat is within 2× the recommended interval (i.e. no
|
||||
// extended outage). Call this each time a heartbeat is successfully processed.
|
||||
func (u *UptimeTracker) RecordHeartbeat() {
|
||||
now := time.Now().UTC()
|
||||
if !u.LastSeen.IsZero() {
|
||||
gap := now.Sub(u.LastSeen)
|
||||
if gap <= 2*RecommendedHeartbeatInterval {
|
||||
u.TotalOnline += gap
|
||||
}
|
||||
}
|
||||
u.LastSeen = now
|
||||
}
|
||||
|
||||
func (u *UptimeTracker) Uptime() time.Duration {
|
||||
return time.Since(u.FirstSeen)
|
||||
}
|
||||
|
||||
// UptimeRatio returns the fraction of tracked lifetime during which the peer was
|
||||
// continuously online (gap ≤ 2×RecommendedHeartbeatInterval). Returns 0 before
|
||||
// the first heartbeat interval has elapsed.
|
||||
func (u *UptimeTracker) UptimeRatio() float64 {
|
||||
total := time.Since(u.FirstSeen)
|
||||
if total <= 0 {
|
||||
return 0
|
||||
}
|
||||
ratio := float64(u.TotalOnline) / float64(total)
|
||||
if ratio > 1 {
|
||||
ratio = 1
|
||||
}
|
||||
return ratio
|
||||
}
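// Illustrative sketch (assuming RecommendedHeartbeatInterval is the 20 s node
// heartbeat interval): two beats 20 s apart add 20 s to TotalOnline, while a
// 5-minute outage adds nothing, so UptimeRatio degrades after gaps instead of
// crediting downtime.
//
//	u := &UptimeTracker{FirstSeen: time.Now().UTC()}
//	u.RecordHeartbeat()      // first beat: LastSeen set, nothing accumulated yet
//	// ... 20 s later ...
//	u.RecordHeartbeat()      // gap ≤ 2×interval: TotalOnline += 20 s
//	_ = u.UptimeRatio()      // ≈ 1.0 so far; drops if beats are missed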
|
||||
|
||||
func (u *UptimeTracker) IsEligible(min time.Duration) bool {
|
||||
return u.Uptime() >= min
|
||||
}
|
||||
|
||||
// getBandwidthChallengeRate opens a dedicated ProtocolBandwidthProbe stream to
|
||||
// remotePeer, sends a random payload, reads the echo, and computes throughput
|
||||
// and a latency score. Returns (ok, bpms, latencyScore, error).
|
||||
// latencyScore is 1.0 when RTT is very fast and 0.0 when at or beyond maxRoundTrip.
|
||||
// Using a separate stream avoids mixing binary data on the JSON heartbeat stream
|
||||
// and ensures the echo handler is actually running on the remote side.
|
||||
func getBandwidthChallengeRate(h host.Host, remotePeer pp.ID, payloadSize int) (bool, float64, float64, error) {
|
||||
payload := make([]byte, payloadSize)
|
||||
if _, err := cr.Read(payload); err != nil {
|
||||
return false, 0, 0, err
|
||||
}
|
||||
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
||||
defer cancel()
|
||||
s, err := h.NewStream(ctx, remotePeer, ProtocolBandwidthProbe)
|
||||
if err != nil {
|
||||
return false, 0, 0, err
|
||||
}
|
||||
defer s.Reset()
|
||||
s.SetDeadline(time.Now().Add(10 * time.Second))
|
||||
start := time.Now()
|
||||
if _, err = s.Write(payload); err != nil {
|
||||
return false, 0, 0, err
|
||||
}
|
||||
s.CloseWrite()
|
||||
// Half-close the write side so the handler's io.Copy sees EOF and stops.
|
||||
// Read the echo.
|
||||
response := make([]byte, payloadSize)
|
||||
if _, err = io.ReadFull(s, response); err != nil {
|
||||
return false, 0, 0, err
|
||||
}
|
||||
|
||||
duration := time.Since(start)
|
||||
maxRoundTrip := BaseRoundTrip + (time.Duration(payloadSize) * (100 * time.Millisecond))
|
||||
mbps := float64(payloadSize*8) / duration.Seconds() / 1e6
|
||||
|
||||
// latencyScore: 1.0 = instant, 0.0 = at maxRoundTrip or beyond.
|
||||
latencyScore := 1.0 - float64(duration)/float64(maxRoundTrip)
|
||||
if latencyScore < 0 {
|
||||
latencyScore = 0
|
||||
}
|
||||
if latencyScore > 1 {
|
||||
latencyScore = 1
|
||||
}
|
||||
|
||||
if duration > maxRoundTrip || mbps < 5.0 {
|
||||
return false, float64(mbps / MaxExpectedMbps), latencyScore, nil
|
||||
}
|
||||
return true, float64(mbps / MaxExpectedMbps), latencyScore, nil
|
||||
}
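// Usage sketch (illustrative): the heartbeat receiver probes the sender with a
// random payload between MinPayloadChallenge and MaxPayloadChallenge and feeds the
// resulting throughput and latency scores into the indexer score. The payload size
// below is arbitrary.
//
//	ok, bpms, latency, err := getBandwidthChallengeRate(h, remotePeer, 1024)
//	if err == nil && ok {
//		// bpms is a fraction of MaxExpectedMbps, latency is in [0,1]
//	}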
|
||||
|
||||
func getDiversityRate(h host.Host, peers []string) float64 {
|
||||
peers, _ = checkPeers(h, peers)
|
||||
diverse := []string{}
|
||||
for _, p := range peers {
|
||||
ip, err := ExtractIP(p)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
div := ip.Mask(net.CIDRMask(24, 32)).String()
|
||||
if !slices.Contains(diverse, div) {
|
||||
diverse = append(diverse, div)
|
||||
}
|
||||
}
|
||||
if len(diverse) == 0 || len(peers) == 0 {
|
||||
return 1
|
||||
}
|
||||
return float64(len(diverse)) / float64(len(peers))
|
||||
}
|
||||
|
||||
// getOwnDiversityRate measures subnet /24 diversity of the indexer's own connected peers.
|
||||
// This evaluates the indexer's network position rather than the connecting node's topology.
|
||||
func getOwnDiversityRate(h host.Host) float64 {
|
||||
diverse := map[string]struct{}{}
|
||||
total := 0
|
||||
for _, pid := range h.Network().Peers() {
|
||||
for _, maddr := range h.Peerstore().Addrs(pid) {
|
||||
total++
|
||||
ip, err := ExtractIP(maddr.String())
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
diverse[ip.Mask(net.CIDRMask(24, 32)).String()] = struct{}{}
|
||||
}
|
||||
}
|
||||
if total == 0 {
|
||||
return 1
|
||||
}
|
||||
return float64(len(diverse)) / float64(total)
|
||||
}
|
||||
|
||||
func checkPeers(h host.Host, peers []string) ([]string, []string) {
|
||||
concretePeer := []string{}
|
||||
ips := []string{}
|
||||
for _, p := range peers {
|
||||
ad, err := pp.AddrInfoFromString(p)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
if PeerIsAlive(h, *ad) {
|
||||
concretePeer = append(concretePeer, p)
|
||||
if ip, err := ExtractIP(p); err == nil {
|
||||
ips = append(ips, ip.Mask(net.CIDRMask(24, 32)).String())
|
||||
}
|
||||
}
|
||||
}
|
||||
return concretePeer, ips
|
||||
}
|
||||
|
||||
// dynamicMinScore returns the minimum acceptable score for a peer, starting
|
||||
// permissive (20%) for brand-new peers and hardening linearly to 80% over 24h.
|
||||
// This prevents ejecting newcomers in fresh networks while filtering parasites.
|
||||
func dynamicMinScore(age time.Duration) float64 {
|
||||
hours := age.Hours()
|
||||
score := 20.0 + 60.0*(hours/24.0)
|
||||
if score > 80.0 {
|
||||
score = 80.0
|
||||
}
|
||||
return score
|
||||
}
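// Worked values derived from the formula above (threshold = 20 + 60·hours/24, capped at 80):
//
//	dynamicMinScore(0)              // 20.0
//	dynamicMinScore(6 * time.Hour)  // 35.0
//	dynamicMinScore(12 * time.Hour) // 50.0
//	dynamicMinScore(48 * time.Hour) // 80.0 (capped)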
|
||||
daemons/node/common/common_service.go (new file, +305)
@@ -0,0 +1,305 @@
|
||||
package common
|
||||
|
||||
import (
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"math/rand"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
"github.com/libp2p/go-libp2p/core/host"
|
||||
"github.com/libp2p/go-libp2p/core/network"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
"github.com/libp2p/go-libp2p/core/protocol"
|
||||
)
|
||||
|
||||
type LongLivedStreamRecordedService[T interface{}] struct {
|
||||
*LongLivedPubSubService
|
||||
StreamRecords map[protocol.ID]map[pp.ID]*StreamRecord[T]
|
||||
StreamMU sync.RWMutex
|
||||
maxNodesConn int
|
||||
// AllowInbound, when set, is called once at stream open before any heartbeat
|
||||
// is decoded. remotePeer is the connecting peer; isNew is true when no
|
||||
// StreamRecord exists yet (first-ever connection). Return a non-nil error
|
||||
// to immediately reset the stream and refuse the peer.
|
||||
AllowInbound func(remotePeer pp.ID, isNew bool) error
|
||||
// ValidateHeartbeat, when set, is called inside the heartbeat loop after
|
||||
// each successful CheckHeartbeat decode. Return a non-nil error to reset
|
||||
// the stream and terminate the session.
|
||||
ValidateHeartbeat func(remotePeer pp.ID) error
|
||||
// AfterHeartbeat is called after each successful heartbeat with the full
|
||||
// decoded Heartbeat so the hook can use the fresh embedded PeerRecord.
|
||||
AfterHeartbeat func(hb *Heartbeat)
|
||||
// AfterDelete is called after gc() evicts an expired peer, outside the lock.
|
||||
// name and did may be empty if the HeartbeatStream had no metadata.
|
||||
AfterDelete func(pid pp.ID, name string, did string)
|
||||
// BuildHeartbeatResponse, when set, is called after each successfully decoded
|
||||
// heartbeat to build the response sent back to the node.
|
||||
// remotePeer is the peer that sent the heartbeat (used for offload routing).
|
||||
// need is how many more indexers the node wants (from hb.Need).
|
||||
// referent is true when the node designated this indexer as its search referent.
|
||||
// rawRecord is the fresh signed PeerRecord embedded in the heartbeat (hb.Record),
|
||||
// passed directly so the handler does not race with AfterHeartbeat goroutine
|
||||
// updating StreamRecord.Record.
|
||||
BuildHeartbeatResponse func(remotePeer pp.ID, need int, challenges []string, challengeDID string, referent bool, rawRecord json.RawMessage) *HeartbeatResponse
|
||||
}
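// Illustrative wiring sketch (hypothetical record type and helpers, not part of
// this change): an indexer plugs its admission and republish policies through the
// hooks above.
//
//	svc := NewStreamRecordedService[PeerRecord](h, 128)
//	svc.AllowInbound = func(p pp.ID, isNew bool) error {
//		if isBanned(p) { // isBanned is a hypothetical ban-list lookup
//			return errors.New("peer is banned")
//		}
//		return nil
//	}
//	svc.AfterHeartbeat = func(hb *Heartbeat) { republishToDHT(hb) } // hypothetical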
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) MaxNodesConn() int {
|
||||
return ix.maxNodesConn
|
||||
}
|
||||
|
||||
func NewStreamRecordedService[T interface{}](h host.Host, maxNodesConn int) *LongLivedStreamRecordedService[T] {
|
||||
service := &LongLivedStreamRecordedService[T]{
|
||||
LongLivedPubSubService: NewLongLivedPubSubService(h),
|
||||
StreamRecords: map[protocol.ID]map[pp.ID]*StreamRecord[T]{},
|
||||
maxNodesConn: maxNodesConn,
|
||||
}
|
||||
go service.StartGC(30 * time.Second)
|
||||
// Garbage collection is needed on every map of long-lived streams; this may warrant a top-level redesign.
|
||||
go service.Snapshot(1 * time.Hour)
|
||||
return service
|
||||
}
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) StartGC(interval time.Duration) {
|
||||
go func() {
|
||||
t := time.NewTicker(interval)
|
||||
defer t.Stop()
|
||||
for range t.C {
|
||||
fmt.Println("ACTUALLY RELATED INDEXERS", Indexers.Addrs, len(Indexers.Addrs))
|
||||
ix.gc()
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) gc() {
|
||||
ix.StreamMU.Lock()
|
||||
now := time.Now().UTC()
|
||||
if ix.StreamRecords[ProtocolHeartbeat] == nil {
|
||||
ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
|
||||
ix.StreamMU.Unlock()
|
||||
return
|
||||
}
|
||||
streams := ix.StreamRecords[ProtocolHeartbeat]
|
||||
|
||||
type gcEntry struct {
|
||||
pid pp.ID
|
||||
name string
|
||||
did string
|
||||
}
|
||||
var evicted []gcEntry
|
||||
for pid, rec := range streams {
|
||||
if now.After(rec.HeartbeatStream.Expiry) || now.Sub(rec.HeartbeatStream.UptimeTracker.LastSeen) > 2*rec.HeartbeatStream.Expiry.Sub(now) {
|
||||
name, did := "", ""
|
||||
if rec.HeartbeatStream != nil {
|
||||
name = rec.HeartbeatStream.Name
|
||||
did = rec.HeartbeatStream.DID
|
||||
}
|
||||
evicted = append(evicted, gcEntry{pid, name, did})
|
||||
for _, sstreams := range ix.StreamRecords {
|
||||
if sstreams[pid] != nil {
|
||||
delete(sstreams, pid)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
ix.StreamMU.Unlock()
|
||||
|
||||
if ix.AfterDelete != nil {
|
||||
for _, e := range evicted {
|
||||
ix.AfterDelete(e.pid, e.name, e.did)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) Snapshot(interval time.Duration) {
|
||||
go func() {
|
||||
logger := oclib.GetLogger()
|
||||
t := time.NewTicker(interval)
|
||||
defer t.Stop()
|
||||
for range t.C {
|
||||
infos := ix.snapshot()
|
||||
for _, inf := range infos {
|
||||
logger.Info().Msg(" -> " + inf.DID)
|
||||
}
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
// -------- Snapshot / Query --------
|
||||
func (ix *LongLivedStreamRecordedService[T]) snapshot() []*StreamRecord[T] {
|
||||
ix.StreamMU.Lock()
|
||||
defer ix.StreamMU.Unlock()
|
||||
|
||||
out := make([]*StreamRecord[T], 0, len(ix.StreamRecords))
|
||||
for _, streams := range ix.StreamRecords {
|
||||
for _, stream := range streams {
|
||||
out = append(out, stream)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) HandleHeartbeat(s network.Stream) {
|
||||
logger := oclib.GetLogger()
|
||||
defer s.Close()
|
||||
|
||||
// AllowInbound: burst guard + ban check before the first byte is read.
|
||||
if ix.AllowInbound != nil {
|
||||
remotePeer := s.Conn().RemotePeer()
|
||||
ix.StreamMU.RLock()
|
||||
_, exists := ix.StreamRecords[ProtocolHeartbeat][remotePeer]
|
||||
ix.StreamMU.RUnlock()
|
||||
if err := ix.AllowInbound(remotePeer, !exists); err != nil {
|
||||
logger.Warn().Err(err).Str("peer", remotePeer.String()).Msg("inbound connection refused")
|
||||
s.Reset()
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
dec := json.NewDecoder(s)
|
||||
for {
|
||||
ix.StreamMU.Lock()
|
||||
if ix.StreamRecords[ProtocolHeartbeat] == nil {
|
||||
ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
|
||||
}
|
||||
streams := ix.StreamRecords[ProtocolHeartbeat]
|
||||
streamsAnonym := map[pp.ID]HeartBeatStreamed{}
|
||||
for k, v := range streams {
|
||||
streamsAnonym[k] = v
|
||||
}
|
||||
ix.StreamMU.Unlock()
|
||||
pid, hb, err := CheckHeartbeat(ix.Host, s, dec, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
|
||||
if err != nil {
|
||||
// Stream-level errors (EOF, reset, closed) mean the connection is gone
|
||||
// — exit so the goroutine doesn't spin forever on a dead stream.
|
||||
// Policy errors such as "too many connections" are also stream-terminal;
|
||||
// transient metric errors (e.g. score too low) are retried on the same stream.
|
||||
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
|
||||
strings.Contains(err.Error(), "reset") ||
|
||||
strings.Contains(err.Error(), "closed") ||
|
||||
strings.Contains(err.Error(), "too many connections") {
|
||||
logger.Info().Err(err).Msg("heartbeat stream terminated, closing handler")
|
||||
return
|
||||
}
|
||||
logger.Warn().Err(err).Msg("heartbeat check failed, retrying on same stream")
|
||||
continue
|
||||
}
|
||||
// ValidateHeartbeat: per-tick behavioral check (rate limiting, bans).
|
||||
if ix.ValidateHeartbeat != nil {
|
||||
if err := ix.ValidateHeartbeat(*pid); err != nil {
|
||||
logger.Warn().Err(err).Str("peer", pid.String()).Msg("heartbeat rejected, closing stream")
|
||||
s.Reset()
|
||||
return
|
||||
}
|
||||
}
|
||||
ix.StreamMU.Lock()
|
||||
// if record already seen update last seen
|
||||
if rec, ok := streams[*pid]; ok {
|
||||
rec.DID = hb.DID
|
||||
// Preserve the existing UptimeTracker so TotalOnline accumulates correctly.
|
||||
// hb.Stream is a fresh Stream with no UptimeTracker; carry the old one over.
|
||||
oldTracker := rec.GetUptimeTracker()
|
||||
rec.HeartbeatStream = hb.Stream
|
||||
if oldTracker != nil {
|
||||
rec.HeartbeatStream.UptimeTracker = oldTracker
|
||||
} else {
|
||||
rec.HeartbeatStream.UptimeTracker = &UptimeTracker{FirstSeen: time.Now().UTC()}
|
||||
}
|
||||
rec.HeartbeatStream.UptimeTracker.RecordHeartbeat()
|
||||
rec.LastScore = hb.Score
|
||||
logger.Info().Msg("A new node is updated : " + pid.String())
|
||||
} else {
|
||||
tracker := &UptimeTracker{FirstSeen: time.Now().UTC()}
|
||||
tracker.RecordHeartbeat()
|
||||
hb.Stream.UptimeTracker = tracker
|
||||
streams[*pid] = &StreamRecord[T]{
|
||||
DID: hb.DID,
|
||||
HeartbeatStream: hb.Stream,
|
||||
LastScore: hb.Score,
|
||||
}
|
||||
logger.Info().Msg("A new node is subscribed : " + pid.String())
|
||||
}
|
||||
ix.StreamMU.Unlock()
|
||||
// Enrich hb.DID before calling the hook: nodes never set hb.DID directly;
|
||||
// extract it from the embedded signed PeerRecord if available, then fall
|
||||
// back to the DID stored by handleNodePublish in the stream record.
|
||||
if hb.DID == "" && len(hb.Record) > 0 {
|
||||
var partial struct {
|
||||
DID string `json:"did"`
|
||||
}
|
||||
if json.Unmarshal(hb.Record, &partial) == nil && partial.DID != "" {
|
||||
hb.DID = partial.DID
|
||||
}
|
||||
}
|
||||
if hb.DID == "" {
|
||||
ix.StreamMU.RLock()
|
||||
if rec, ok := streams[*pid]; ok {
|
||||
hb.DID = rec.DID
|
||||
}
|
||||
ix.StreamMU.RUnlock()
|
||||
}
|
||||
if ix.AfterHeartbeat != nil && hb.DID != "" {
|
||||
go ix.AfterHeartbeat(hb)
|
||||
}
|
||||
// Send response back to the node (bidirectional heartbeat).
|
||||
if ix.BuildHeartbeatResponse != nil {
|
||||
if resp := ix.BuildHeartbeatResponse(s.Conn().RemotePeer(), hb.Need, hb.Challenges, hb.ChallengeDID, hb.Referent, hb.Record); resp != nil {
|
||||
s.SetWriteDeadline(time.Now().Add(3 * time.Second))
|
||||
json.NewEncoder(s).Encode(resp)
|
||||
s.SetWriteDeadline(time.Time{})
|
||||
}
|
||||
}
|
||||
}
|
||||
}
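// Registration sketch (illustrative): the indexer exposes the long-lived handler on
// its libp2p host so nodes can open the persistent heartbeat stream.
//
//	h.SetStreamHandler(ProtocolHeartbeat, svc.HandleHeartbeat)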
|
||||
|
||||
func CheckHeartbeat(h host.Host, s network.Stream, dec *json.Decoder, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
|
||||
if len(h.Network().Peers()) >= maxNodes {
|
||||
return nil, nil, fmt.Errorf("too many connections, try another indexer")
|
||||
}
|
||||
var hb Heartbeat
|
||||
if err := dec.Decode(&hb); err != nil {
|
||||
return nil, nil, err
|
||||
}
|
||||
_, bpms, latencyScore, _ := getBandwidthChallengeRate(h, s.Conn().RemotePeer(), MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)))
|
||||
{
|
||||
pid, err := pp.Decode(hb.PeerID)
|
||||
if err != nil {
|
||||
return nil, nil, err
|
||||
}
|
||||
uptimeRatio := float64(0)
|
||||
age := time.Duration(0)
|
||||
lock.Lock()
|
||||
if rec, ok := streams[pid]; ok && rec.GetUptimeTracker() != nil {
|
||||
uptimeRatio = rec.GetUptimeTracker().UptimeRatio()
|
||||
age = rec.GetUptimeTracker().Uptime()
|
||||
}
|
||||
lock.Unlock()
|
||||
// E: measure the indexer's own subnet diversity, not the node's view.
|
||||
diversity := getOwnDiversityRate(h)
|
||||
// fillRate: fraction of indexer capacity used — higher = more peers trust this indexer.
|
||||
fillRate := 0.0
|
||||
if maxNodes > 0 {
|
||||
fillRate = float64(len(h.Network().Peers())) / float64(maxNodes)
|
||||
if fillRate > 1 {
|
||||
fillRate = 1
|
||||
}
|
||||
}
|
||||
hb.ComputeIndexerScore(uptimeRatio, bpms, diversity, latencyScore, fillRate)
|
||||
// B: dynamic minScore — starts at 20% for brand-new peers, ramps to 80% at 24h.
|
||||
minScore := dynamicMinScore(age)
|
||||
if hb.Score < minScore {
|
||||
return nil, nil, errors.New("not enough trusting value")
|
||||
}
|
||||
hb.Stream = &Stream{
|
||||
Name: hb.Name,
|
||||
DID: hb.DID,
|
||||
Stream: s,
|
||||
Expiry: time.Now().UTC().Add(2 * time.Minute),
|
||||
} // here is the long-lived bidirectional heartbeat.
|
||||
return &pid, &hb, err
|
||||
}
|
||||
}
|
||||
@@ -2,16 +2,8 @@ package common
|
||||
|
||||
import (
|
||||
"context"
|
||||
cr "crypto/rand"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"io"
|
||||
"math/rand"
|
||||
"net"
|
||||
"oc-discovery/conf"
|
||||
"slices"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
@@ -22,330 +14,29 @@ import (
|
||||
"github.com/libp2p/go-libp2p/core/protocol"
|
||||
)
|
||||
|
||||
type LongLivedStreamRecordedService[T interface{}] struct {
|
||||
*LongLivedPubSubService
|
||||
StreamRecords map[protocol.ID]map[pp.ID]*StreamRecord[T]
|
||||
StreamMU sync.RWMutex
|
||||
maxNodesConn int
|
||||
// AfterHeartbeat is an optional hook called after each successful heartbeat update.
|
||||
// The indexer sets it to republish the embedded signed record to the DHT.
|
||||
AfterHeartbeat func(pid pp.ID)
|
||||
// AfterDelete is called after gc() evicts an expired peer, outside the lock.
|
||||
// name and did may be empty if the HeartbeatStream had no metadata.
|
||||
AfterDelete func(pid pp.ID, name string, did string)
|
||||
}
|
||||
const (
|
||||
ProtocolPublish = "/opencloud/record/publish/1.0"
|
||||
ProtocolGet = "/opencloud/record/get/1.0"
|
||||
)
|
||||
|
||||
func NewStreamRecordedService[T interface{}](h host.Host, maxNodesConn int) *LongLivedStreamRecordedService[T] {
|
||||
service := &LongLivedStreamRecordedService[T]{
|
||||
LongLivedPubSubService: NewLongLivedPubSubService(h),
|
||||
StreamRecords: map[protocol.ID]map[pp.ID]*StreamRecord[T]{},
|
||||
maxNodesConn: maxNodesConn,
|
||||
}
|
||||
go service.StartGC(30 * time.Second)
|
||||
// Garbage collection is needed on every map of long-lived streams; this may warrant a top-level redesign.
|
||||
go service.Snapshot(1 * time.Hour)
|
||||
return service
|
||||
}
|
||||
const ProtocolHeartbeat = "/opencloud/heartbeat/1.0"
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) StartGC(interval time.Duration) {
|
||||
go func() {
|
||||
t := time.NewTicker(interval)
|
||||
defer t.Stop()
|
||||
for range t.C {
|
||||
ix.gc()
|
||||
}
|
||||
}()
|
||||
}
|
||||
// ProtocolWitnessQuery is opened by a node to ask a peer what it thinks of a given indexer.
|
||||
const ProtocolWitnessQuery = "/opencloud/witness/1.0"
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) gc() {
|
||||
ix.StreamMU.Lock()
|
||||
now := time.Now().UTC()
|
||||
if ix.StreamRecords[ProtocolHeartbeat] == nil {
|
||||
ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
|
||||
ix.StreamMU.Unlock()
|
||||
return
|
||||
}
|
||||
streams := ix.StreamRecords[ProtocolHeartbeat]
|
||||
fmt.Println(StaticNatives, StaticIndexers, streams)
|
||||
// ProtocolSearchPeer is opened by a node toward one of its indexers to start a
|
||||
// distributed peer search. The stream stays open; the indexer writes
|
||||
// SearchPeerResult JSON objects as results arrive from the GossipSub mesh.
|
||||
const ProtocolSearchPeer = "/opencloud/search/peer/1.0"
|
||||
|
||||
type gcEntry struct {
|
||||
pid pp.ID
|
||||
name string
|
||||
did string
|
||||
}
|
||||
var evicted []gcEntry
|
||||
for pid, rec := range streams {
|
||||
if now.After(rec.HeartbeatStream.Expiry) || now.Sub(rec.HeartbeatStream.UptimeTracker.LastSeen) > 2*rec.HeartbeatStream.Expiry.Sub(now) {
|
||||
name, did := "", ""
|
||||
if rec.HeartbeatStream != nil {
|
||||
name = rec.HeartbeatStream.Name
|
||||
did = rec.HeartbeatStream.DID
|
||||
}
|
||||
evicted = append(evicted, gcEntry{pid, name, did})
|
||||
for _, sstreams := range ix.StreamRecords {
|
||||
if sstreams[pid] != nil {
|
||||
delete(sstreams, pid)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
ix.StreamMU.Unlock()
|
||||
// ProtocolSearchPeerResponse is opened by an indexer back toward the emitting
|
||||
// indexer to deliver search results found in its referencedNodes.
|
||||
const ProtocolSearchPeerResponse = "/opencloud/search/peer/response/1.0"
|
||||
|
||||
if ix.AfterDelete != nil {
|
||||
for _, e := range evicted {
|
||||
ix.AfterDelete(e.pid, e.name, e.did)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) Snapshot(interval time.Duration) {
|
||||
go func() {
|
||||
logger := oclib.GetLogger()
|
||||
t := time.NewTicker(interval)
|
||||
defer t.Stop()
|
||||
for range t.C {
|
||||
infos := ix.snapshot()
|
||||
for _, inf := range infos {
|
||||
logger.Info().Msg(" -> " + inf.DID)
|
||||
}
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
// -------- Snapshot / Query --------
|
||||
func (ix *LongLivedStreamRecordedService[T]) snapshot() []*StreamRecord[T] {
|
||||
ix.StreamMU.Lock()
|
||||
defer ix.StreamMU.Unlock()
|
||||
|
||||
out := make([]*StreamRecord[T], 0, len(ix.StreamRecords))
|
||||
for _, streams := range ix.StreamRecords {
|
||||
for _, stream := range streams {
|
||||
out = append(out, stream)
|
||||
}
|
||||
}
|
||||
return out
|
||||
}
|
||||
|
||||
func (ix *LongLivedStreamRecordedService[T]) HandleHeartbeat(s network.Stream) {
|
||||
logger := oclib.GetLogger()
|
||||
defer s.Close()
|
||||
dec := json.NewDecoder(s)
|
||||
for {
|
||||
ix.StreamMU.Lock()
|
||||
if ix.StreamRecords[ProtocolHeartbeat] == nil {
|
||||
ix.StreamRecords[ProtocolHeartbeat] = map[pp.ID]*StreamRecord[T]{}
|
||||
}
|
||||
streams := ix.StreamRecords[ProtocolHeartbeat]
|
||||
streamsAnonym := map[pp.ID]HeartBeatStreamed{}
|
||||
for k, v := range streams {
|
||||
streamsAnonym[k] = v
|
||||
}
|
||||
ix.StreamMU.Unlock()
|
||||
pid, hb, err := CheckHeartbeat(ix.Host, s, dec, streamsAnonym, &ix.StreamMU, ix.maxNodesConn)
|
||||
if err != nil {
|
||||
// Stream-level errors (EOF, reset, closed) mean the connection is gone
|
||||
// — exit so the goroutine doesn't spin forever on a dead stream.
|
||||
// Policy errors such as "too many connections" are also stream-terminal;
|
||||
// transient metric errors (e.g. score too low) are retried on the same stream.
|
||||
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
|
||||
strings.Contains(err.Error(), "reset") ||
|
||||
strings.Contains(err.Error(), "closed") ||
|
||||
strings.Contains(err.Error(), "too many connections") {
|
||||
logger.Info().Err(err).Msg("heartbeat stream terminated, closing handler")
|
||||
return
|
||||
}
|
||||
logger.Warn().Err(err).Msg("heartbeat check failed, retrying on same stream")
|
||||
continue
|
||||
}
|
||||
ix.StreamMU.Lock()
|
||||
// if record already seen update last seen
|
||||
if rec, ok := streams[*pid]; ok {
|
||||
rec.DID = hb.DID
|
||||
rec.HeartbeatStream = hb.Stream
|
||||
if rec.HeartbeatStream.UptimeTracker == nil {
|
||||
rec.HeartbeatStream.UptimeTracker = &UptimeTracker{
|
||||
FirstSeen: time.Now().UTC(),
|
||||
LastSeen: time.Now().UTC(),
|
||||
}
|
||||
}
|
||||
logger.Info().Msg("A new node is updated : " + pid.String())
|
||||
} else {
|
||||
hb.Stream.UptimeTracker = &UptimeTracker{
|
||||
FirstSeen: time.Now().UTC(),
|
||||
LastSeen: time.Now().UTC(),
|
||||
}
|
||||
streams[*pid] = &StreamRecord[T]{
|
||||
DID: hb.DID,
|
||||
HeartbeatStream: hb.Stream,
|
||||
}
|
||||
logger.Info().Msg("A new node is subscribed : " + pid.String())
|
||||
}
|
||||
ix.StreamMU.Unlock()
|
||||
// Let the indexer republish the embedded signed record to the DHT.
|
||||
if ix.AfterHeartbeat != nil {
|
||||
ix.AfterHeartbeat(*pid)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func CheckHeartbeat(h host.Host, s network.Stream, dec *json.Decoder, streams map[pp.ID]HeartBeatStreamed, lock *sync.RWMutex, maxNodes int) (*pp.ID, *Heartbeat, error) {
|
||||
if len(h.Network().Peers()) >= maxNodes {
|
||||
return nil, nil, fmt.Errorf("too many connections, try another indexer")
|
||||
}
|
||||
var hb Heartbeat
|
||||
if err := dec.Decode(&hb); err != nil {
|
||||
return nil, nil, err
|
||||
}
|
||||
_, bpms, _ := getBandwidthChallengeRate(h, s.Conn().RemotePeer(), MinPayloadChallenge+int(rand.Float64()*(MaxPayloadChallenge-MinPayloadChallenge)))
|
||||
{
|
||||
pid, err := pp.Decode(hb.PeerID)
|
||||
if err != nil {
|
||||
return nil, nil, err
|
||||
}
|
||||
upTime := float64(0)
|
||||
isFirstHeartbeat := true
|
||||
lock.Lock()
|
||||
if rec, ok := streams[pid]; ok && rec.GetUptimeTracker() != nil {
|
||||
upTime = rec.GetUptimeTracker().Uptime().Hours() / float64(time.Since(TimeWatcher).Hours())
|
||||
isFirstHeartbeat = false
|
||||
}
|
||||
lock.Unlock()
|
||||
diversity := getDiversityRate(h, hb.IndexersBinded)
|
||||
fmt.Println(upTime, bpms, diversity)
|
||||
hb.ComputeIndexerScore(upTime, bpms, diversity)
|
||||
// First heartbeat: uptime is always 0 so the score ceiling is 60, below the
|
||||
// steady-state threshold of 75. Use a lower admission threshold so new peers
|
||||
// can enter and start accumulating uptime. Subsequent heartbeats must meet
|
||||
// the full threshold once uptime is tracked.
|
||||
minScore := float64(50)
|
||||
if isFirstHeartbeat {
|
||||
minScore = 40
|
||||
}
|
||||
fmt.Println(hb.Score, minScore)
|
||||
if hb.Score < minScore {
|
||||
return nil, nil, errors.New("not enough trusting value")
|
||||
}
|
||||
hb.Stream = &Stream{
|
||||
Name: hb.Name,
|
||||
DID: hb.DID,
|
||||
Stream: s,
|
||||
Expiry: time.Now().UTC().Add(2 * time.Minute),
|
||||
} // here is the long-lived bidirectional heartbeat.
|
||||
return &pid, &hb, err
|
||||
}
|
||||
}
|
||||
|
||||
func getDiversityRate(h host.Host, peers []string) float64 {
|
||||
|
||||
peers, _ = checkPeers(h, peers)
|
||||
diverse := []string{}
|
||||
for _, p := range peers {
|
||||
ip, err := ExtractIP(p)
|
||||
if err != nil {
|
||||
fmt.Println("NO IP", p, err)
|
||||
continue
|
||||
}
|
||||
div := ip.Mask(net.CIDRMask(24, 32)).String()
|
||||
if !slices.Contains(diverse, div) {
|
||||
diverse = append(diverse, div)
|
||||
}
|
||||
}
|
||||
if len(diverse) == 0 || len(peers) == 0 {
|
||||
return 1
|
||||
}
|
||||
return float64(len(diverse)) / float64(len(peers))
|
||||
}
|
||||
|
||||
func checkPeers(h host.Host, peers []string) ([]string, []string) {
|
||||
concretePeer := []string{}
|
||||
ips := []string{}
|
||||
for _, p := range peers {
|
||||
ad, err := pp.AddrInfoFromString(p)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
if PeerIsAlive(h, *ad) {
|
||||
concretePeer = append(concretePeer, p)
|
||||
if ip, err := ExtractIP(p); err == nil {
|
||||
ips = append(ips, ip.Mask(net.CIDRMask(24, 32)).String())
|
||||
}
|
||||
}
|
||||
}
|
||||
return concretePeer, ips
|
||||
}
|
||||
|
||||
const MaxExpectedMbps = 100.0
|
||||
const MinPayloadChallenge = 512
|
||||
const MaxPayloadChallenge = 2048
|
||||
const BaseRoundTrip = 400 * time.Millisecond
|
||||
|
||||
// getBandwidthChallengeRate opens a dedicated ProtocolBandwidthProbe stream to
|
||||
// remotePeer, sends a random payload, reads the echo, and computes throughput.
|
||||
// Using a separate stream avoids mixing binary data on the JSON heartbeat stream
|
||||
// and ensures the echo handler is actually running on the remote side.
|
||||
func getBandwidthChallengeRate(h host.Host, remotePeer pp.ID, payloadSize int) (bool, float64, error) {
|
||||
payload := make([]byte, payloadSize)
|
||||
if _, err := cr.Read(payload); err != nil {
|
||||
return false, 0, err
|
||||
}
|
||||
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
||||
defer cancel()
|
||||
s, err := h.NewStream(ctx, remotePeer, ProtocolBandwidthProbe)
|
||||
if err != nil {
|
||||
return false, 0, err
|
||||
}
|
||||
defer s.Reset()
|
||||
s.SetDeadline(time.Now().Add(10 * time.Second))
|
||||
start := time.Now()
|
||||
if _, err = s.Write(payload); err != nil {
|
||||
return false, 0, err
|
||||
}
|
||||
s.CloseWrite()
|
||||
// Half-close the write side so the handler's io.Copy sees EOF and stops.
|
||||
// Read the echo.
|
||||
response := make([]byte, payloadSize)
|
||||
if _, err = io.ReadFull(s, response); err != nil {
|
||||
return false, 0, err
|
||||
}
|
||||
|
||||
duration := time.Since(start)
|
||||
maxRoundTrip := BaseRoundTrip + (time.Duration(payloadSize) * (100 * time.Millisecond))
|
||||
mbps := float64(payloadSize*8) / duration.Seconds() / 1e6
|
||||
if duration > maxRoundTrip || mbps < 5.0 {
|
||||
return false, float64(mbps / MaxExpectedMbps), nil
|
||||
}
|
||||
return true, float64(mbps / MaxExpectedMbps), nil
|
||||
}
|
||||
|
||||
type UptimeTracker struct {
|
||||
FirstSeen time.Time
|
||||
LastSeen time.Time
|
||||
}
|
||||
|
||||
func (u *UptimeTracker) Uptime() time.Duration {
|
||||
return time.Since(u.FirstSeen)
|
||||
}
|
||||
|
||||
func (u *UptimeTracker) IsEligible(min time.Duration) bool {
|
||||
return u.Uptime() >= min
|
||||
}
|
||||
|
||||
type StreamRecord[T interface{}] struct {
|
||||
DID string
|
||||
HeartbeatStream *Stream
|
||||
Record T
|
||||
}
|
||||
|
||||
func (s *StreamRecord[T]) GetUptimeTracker() *UptimeTracker {
|
||||
if s.HeartbeatStream == nil {
|
||||
return nil
|
||||
}
|
||||
return s.HeartbeatStream.UptimeTracker
|
||||
}
|
||||
// ProtocolBandwidthProbe is a dedicated short-lived stream used exclusively
|
||||
// for bandwidth/latency measurement. The handler echoes any bytes it receives.
|
||||
// All nodes and indexers register this handler so peers can measure them.
|
||||
const ProtocolBandwidthProbe = "/opencloud/probe/1.0"
|
||||
|
||||
type Stream struct {
|
||||
Name string `json:"name"`
|
||||
@@ -367,112 +58,115 @@ func NewStream[T interface{}](s network.Stream, did string, record T) *Stream {
|
||||
}
|
||||
}
|
||||
|
||||
type ProtocolStream map[protocol.ID]map[pp.ID]*Stream
|
||||
|
||||
func (ps ProtocolStream) Get(protocol protocol.ID) map[pp.ID]*Stream {
|
||||
if ps[protocol] == nil {
|
||||
ps[protocol] = map[pp.ID]*Stream{}
|
||||
}
|
||||
|
||||
return ps[protocol]
|
||||
type StreamRecord[T interface{}] struct {
|
||||
DID string
|
||||
HeartbeatStream *Stream
|
||||
Record T
|
||||
LastScore float64
|
||||
}
|
||||
|
||||
func (ps ProtocolStream) Add(protocol protocol.ID, peerID *pp.ID, s *Stream) error {
|
||||
if ps[protocol] == nil {
|
||||
ps[protocol] = map[pp.ID]*Stream{}
|
||||
}
|
||||
if peerID != nil {
|
||||
if s != nil {
|
||||
ps[protocol][*peerID] = s
|
||||
} else {
|
||||
return errors.New("unable to add stream : stream missing")
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (ps ProtocolStream) Delete(protocol protocol.ID, peerID *pp.ID) {
|
||||
if streams, ok := ps[protocol]; ok {
|
||||
if peerID != nil && streams[*peerID] != nil {
|
||||
streams[*peerID].Stream.Close()
|
||||
delete(streams, *peerID)
|
||||
} else {
|
||||
for _, s := range ps {
|
||||
for _, v := range s {
|
||||
v.Stream.Close()
|
||||
}
|
||||
}
|
||||
delete(ps, protocol)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
const (
|
||||
ProtocolPublish = "/opencloud/record/publish/1.0"
|
||||
ProtocolGet = "/opencloud/record/get/1.0"
|
||||
)
|
||||
|
||||
var TimeWatcher time.Time
|
||||
|
||||
var StaticIndexers map[string]*pp.AddrInfo = map[string]*pp.AddrInfo{}
|
||||
var StreamMuIndexes sync.RWMutex
|
||||
var StreamIndexers ProtocolStream = ProtocolStream{}
|
||||
|
||||
// indexerHeartbeatNudge allows replenishIndexersFromNative to trigger an immediate
|
||||
// heartbeat tick after adding new entries to StaticIndexers, without waiting up
|
||||
// to 20s for the regular ticker. Buffered(1) so the sender never blocks.
|
||||
var indexerHeartbeatNudge = make(chan struct{}, 1)
|
||||
|
||||
// NudgeIndexerHeartbeat signals the indexer heartbeat goroutine to fire immediately.
|
||||
func NudgeIndexerHeartbeat() {
|
||||
select {
|
||||
case indexerHeartbeatNudge <- struct{}{}:
|
||||
default: // nudge already pending, skip
|
||||
}
|
||||
}
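// Illustrative sketch: after adding fresh entries to StaticIndexers, the caller
// nudges the heartbeat goroutine instead of waiting up to 20 s for the next tick.
//
//	StreamMuIndexes.Lock()
//	StaticIndexers[addrStr] = ad
//	StreamMuIndexes.Unlock()
//	NudgeIndexerHeartbeat()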
|
||||
|
||||
func ConnectToIndexers(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID, recordFn ...func() json.RawMessage) error {
|
||||
TimeWatcher = time.Now().UTC()
|
||||
logger := oclib.GetLogger()
|
||||
|
||||
// If native addresses are configured, get the indexer pool from the native mesh,
|
||||
// then start the long-lived heartbeat goroutine toward those indexers.
|
||||
if conf.GetConfig().NativeIndexerAddresses != "" {
|
||||
if err := ConnectToNatives(h, minIndexer, maxIndexer, myPID); err != nil {
|
||||
return err
|
||||
}
|
||||
// Step 2: start the long-lived heartbeat goroutine toward the indexer pool.
|
||||
// replaceStaticIndexers/replenishIndexersFromNative update the map in-place
|
||||
// so this single goroutine follows all pool changes automatically.
|
||||
logger.Info().Msg("[native] step 2 — starting long-lived heartbeat to indexer pool")
|
||||
SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name,
|
||||
h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...)
|
||||
func (s *StreamRecord[T]) GetUptimeTracker() *UptimeTracker {
|
||||
if s.HeartbeatStream == nil {
|
||||
return nil
|
||||
}
|
||||
return s.HeartbeatStream.UptimeTracker
|
||||
}
|
||||
|
||||
addresses := strings.Split(conf.GetConfig().IndexerAddresses, ",")
|
||||
type ProtocolInfo struct {
|
||||
PersistantStream bool
|
||||
WaitResponse bool
|
||||
TTL time.Duration
|
||||
}
|
||||
|
||||
if len(addresses) > maxIndexer {
|
||||
addresses = addresses[0:maxIndexer]
|
||||
func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, streams ProtocolStream, pts map[protocol.ID]*ProtocolInfo, mu *sync.RWMutex) (ProtocolStream, error) {
|
||||
expiry := 2 * time.Second
|
||||
if pts[proto] != nil {
|
||||
expiry = pts[proto].TTL
|
||||
}
|
||||
|
||||
StreamMuIndexes.Lock()
|
||||
for _, indexerAddr := range addresses {
|
||||
ad, err := pp.AddrInfoFromString(indexerAddr)
|
||||
if err != nil {
|
||||
logger.Err(err)
|
||||
continue
|
||||
ctxTTL, cancelTTL := context.WithTimeout(context.Background(), expiry)
|
||||
defer cancelTTL()
|
||||
if h.Network().Connectedness(ad.ID) != network.Connected {
|
||||
if err := h.Connect(ctxTTL, ad); err != nil {
|
||||
return streams, err
|
||||
}
|
||||
StaticIndexers[indexerAddr] = ad
|
||||
}
|
||||
indexerCount := len(StaticIndexers)
|
||||
StreamMuIndexes.Unlock()
|
||||
|
||||
SendHeartbeat(context.Background(), ProtocolHeartbeat, conf.GetConfig().Name, h, StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second, recordFn...) // your indexer is just like a node for the next indexer.
|
||||
if indexerCount < minIndexer {
|
||||
return errors.New("you run a node without indexers... your gonna be isolated.")
|
||||
if streams[proto] != nil && streams[proto][ad.ID] != nil {
|
||||
return streams, nil
|
||||
} else if s, err := h.NewStream(ctxTTL, ad.ID, proto); err == nil {
|
||||
mu.Lock()
|
||||
if streams[proto] == nil {
|
||||
streams[proto] = map[pp.ID]*Stream{}
|
||||
}
|
||||
mu.Unlock()
|
||||
time.AfterFunc(expiry, func() {
|
||||
mu.Lock()
|
||||
delete(streams[proto], ad.ID)
|
||||
mu.Unlock()
|
||||
})
|
||||
mu.Lock()
|
||||
streams[proto][ad.ID] = &Stream{
|
||||
DID: did,
|
||||
Stream: s,
|
||||
Expiry: time.Now().UTC().Add(expiry),
|
||||
}
|
||||
mu.Unlock()
|
||||
return streams, nil
|
||||
} else {
|
||||
return streams, err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func sendHeartbeat(ctx context.Context, h host.Host, proto protocol.ID, p *pp.AddrInfo,
|
||||
hb Heartbeat, ps ProtocolStream, interval time.Duration) (*HeartbeatResponse, time.Duration, error) {
|
||||
logger := oclib.GetLogger()
|
||||
if ps[proto] == nil {
|
||||
ps[proto] = map[pp.ID]*Stream{}
|
||||
}
|
||||
streams := ps[proto]
|
||||
pss, exists := streams[p.ID]
|
||||
ctxTTL, cancel := context.WithTimeout(ctx, 3*interval)
|
||||
defer cancel()
|
||||
if h.Network().Connectedness(p.ID) != network.Connected {
|
||||
if err := h.Connect(ctxTTL, *p); err != nil {
|
||||
logger.Err(err)
|
||||
return nil, 0, err
|
||||
}
|
||||
exists = false
|
||||
}
|
||||
if !exists || pss.Stream == nil {
|
||||
logger.Info().Msg("New Stream engaged as Heartbeat " + fmt.Sprintf("%v", proto) + " " + p.ID.String())
|
||||
s, err := h.NewStream(ctx, p.ID, proto)
|
||||
if err != nil {
|
||||
logger.Err(err).Msg(err.Error())
|
||||
return nil, 0, err
|
||||
}
|
||||
pss = &Stream{
|
||||
Stream: s,
|
||||
Expiry: time.Now().UTC().Add(2 * time.Minute),
|
||||
}
|
||||
streams[p.ID] = pss
|
||||
}
|
||||
|
||||
sentAt := time.Now()
|
||||
if err := json.NewEncoder(pss.Stream).Encode(&hb); err != nil {
|
||||
pss.Stream.Close()
|
||||
pss.Stream = nil
|
||||
return nil, 0, err
|
||||
}
|
||||
pss.Expiry = time.Now().UTC().Add(2 * time.Minute)
|
||||
|
||||
// Try to read a response (indexers that support bidirectional heartbeat respond).
|
||||
pss.Stream.SetReadDeadline(time.Now().Add(5 * time.Second))
|
||||
var resp HeartbeatResponse
|
||||
rtt := time.Since(sentAt)
|
||||
if err := json.NewDecoder(pss.Stream).Decode(&resp); err == nil {
|
||||
rtt = time.Since(sentAt)
|
||||
pss.Stream.SetReadDeadline(time.Time{})
|
||||
return &resp, rtt, nil
|
||||
}
|
||||
pss.Stream.SetReadDeadline(time.Time{})
|
||||
return nil, rtt, nil
|
||||
}
|
||||
|
||||
func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host, proto protocol.ID, id pp.ID, mypid pp.ID, force bool, onStreamCreated *func(network.Stream)) ProtocolStream {
|
||||
@@ -509,332 +203,3 @@ func AddStreamProtocol(ctx *context.Context, protoS ProtocolStream, h host.Host,
|
||||
}
|
||||
return protoS
|
||||
}
|
||||
|
||||
type Heartbeat struct {
|
||||
Name string `json:"name"`
|
||||
Stream *Stream `json:"stream"`
|
||||
DID string `json:"did"`
|
||||
PeerID string `json:"peer_id"`
|
||||
Timestamp int64 `json:"timestamp"`
|
||||
IndexersBinded []string `json:"indexers_binded"`
|
||||
Score float64
|
||||
// Record carries a fresh signed PeerRecord (JSON) so the receiving indexer
|
||||
// can republish it to the DHT without an extra round-trip.
|
||||
// Only set by nodes (not indexers heartbeating other indexers).
|
||||
Record json.RawMessage `json:"record,omitempty"`
|
||||
}
|
||||
|
||||
func (hb *Heartbeat) ComputeIndexerScore(uptimeHours float64, bpms float64, diversity float64) {
|
||||
hb.Score = ((0.3 * uptimeHours) +
|
||||
(0.3 * bpms) +
|
||||
(0.4 * diversity)) * 100
|
||||
}
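// Worked example of the weighting above: uptimeHours = 0.5, bpms = 0.8,
// diversity = 1.0 gives ((0.3×0.5) + (0.3×0.8) + (0.4×1.0)) × 100 = 79.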
|
||||
|
||||
type HeartbeatInfo []struct {
|
||||
Info []byte `json:"info"`
|
||||
}
|
||||
|
||||
const ProtocolHeartbeat = "/opencloud/heartbeat/1.0"

// ProtocolBandwidthProbe is a dedicated short-lived stream used exclusively
// for bandwidth/latency measurement. The handler echoes any bytes it receives.
// All nodes and indexers register this handler so peers can measure them.
const ProtocolBandwidthProbe = "/opencloud/probe/1.0"

// HandleBandwidthProbe echoes back everything written on the stream, then closes.
// It is registered by all participants so the measuring side (the heartbeat receiver)
// can open a dedicated probe stream and read the round-trip latency + throughput.
func HandleBandwidthProbe(s network.Stream) {
    defer s.Close()
    s.SetDeadline(time.Now().Add(10 * time.Second))
    io.Copy(s, s) // echo every byte back to the sender
}

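// Illustrative sketch (not from the repository): one way the measuring side could
// drive the echo handler above — write a payload, read it back, and derive the
// round-trip latency plus an effective throughput. probeBandwidth is a hypothetical
// helper name; the real measurement code lives elsewhere.
func probeBandwidth(ctx context.Context, h host.Host, p pp.ID) (time.Duration, float64, error) {
    s, err := h.NewStream(ctx, p, ProtocolBandwidthProbe)
    if err != nil {
        return 0, 0, err
    }
    defer s.Close()
    payload := make([]byte, 64*1024) // probe size is arbitrary
    start := time.Now()
    if _, err := s.Write(payload); err != nil {
        return 0, 0, err
    }
    echo := make([]byte, len(payload))
    if _, err := io.ReadFull(s, echo); err != nil {
        return 0, 0, err
    }
    rtt := time.Since(start)
    throughput := float64(2*len(payload)) / rtt.Seconds() // bytes/s over the round trip
    return rtt, throughput, nil
}
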
// SendHeartbeat starts a goroutine that sends periodic heartbeats to peers.
// recordFn, when provided, is called on each tick and its output is embedded in
// the heartbeat as a fresh signed PeerRecord so the receiving indexer can
// republish it to the DHT without an extra round-trip.
// Pass no recordFn (or nil) for indexer→indexer / native heartbeats.
func SendHeartbeat(ctx context.Context, proto protocol.ID, name string, h host.Host, ps ProtocolStream, peers map[string]*pp.AddrInfo, mu *sync.RWMutex, interval time.Duration, recordFn ...func() json.RawMessage) {
    logger := oclib.GetLogger()
    // isIndexerHB is true when this goroutine drives the indexer heartbeat.
    // isNativeHB is true when it drives the native heartbeat.
    isIndexerHB := mu == &StreamMuIndexes
    isNativeHB := mu == &StreamNativeMu
    var recFn func() json.RawMessage
    if len(recordFn) > 0 {
        recFn = recordFn[0]
    }
    go func() {
        logger.Info().Str("proto", string(proto)).Int("peers", len(peers)).Msg("heartbeat started")
        t := time.NewTicker(interval)
        defer t.Stop()

        // doTick sends one round of heartbeats to the current peer snapshot.
        doTick := func() {
            // Build the heartbeat payload — snapshot current indexer addresses.
            StreamMuIndexes.RLock()
            addrs := make([]string, 0, len(StaticIndexers))
            for addr := range StaticIndexers {
                addrs = append(addrs, addr)
            }
            StreamMuIndexes.RUnlock()
            hb := Heartbeat{
                Name:           name,
                PeerID:         h.ID().String(),
                Timestamp:      time.Now().UTC().Unix(),
                IndexersBinded: addrs,
            }
            if recFn != nil {
                hb.Record = recFn()
            }

            // Snapshot the peer list under a read lock so we don't hold the
            // write lock during network I/O.
            if mu != nil {
                mu.RLock()
            }
            snapshot := make([]*pp.AddrInfo, 0, len(peers))
            for _, ix := range peers {
                snapshot = append(snapshot, ix)
            }
            if mu != nil {
                mu.RUnlock()
            }

            for _, ix := range snapshot {
                wasConnected := h.Network().Connectedness(ix.ID) == network.Connected
                if err := sendHeartbeat(ctx, h, proto, ix, hb, ps, interval*time.Second); err != nil {
                    // Step 3: heartbeat failed — remove from pool and trigger replenish.
                    logger.Info().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 3 — heartbeat failed, removing peer from pool")

                    // Remove the dead peer and clean up its stream.
                    // mu already covers ps when isIndexerHB (same mutex), so one
                    // lock acquisition is sufficient — no re-entrant double-lock.
                    if mu != nil {
                        mu.Lock()
                    }
                    if ps[proto] != nil {
                        if s, ok := ps[proto][ix.ID]; ok {
                            if s.Stream != nil {
                                s.Stream.Close()
                            }
                            delete(ps[proto], ix.ID)
                        }
                    }
                    lostAddr := ""
                    for addr, ad := range peers {
                        if ad.ID == ix.ID {
                            lostAddr = addr
                            delete(peers, addr)
                            break
                        }
                    }
                    need := conf.GetConfig().MinIndexer - len(peers)
                    remaining := len(peers)
                    if mu != nil {
                        mu.Unlock()
                    }
                    logger.Info().Int("remaining", remaining).Int("min", conf.GetConfig().MinIndexer).Int("need", need).Msg("[native] step 3 — pool state after removal")

                    // Step 4: ask the native for the missing indexer count.
                    if isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
                        if need < 1 {
                            need = 1
                        }
                        logger.Info().Int("need", need).Msg("[native] step 3→4 — triggering replenish")
                        go replenishIndexersFromNative(h, need)
                    }

                    // Native heartbeat failed — find a replacement native.
                    // Case 1: if the dead native was also serving as an indexer, evict it
                    // from StaticIndexers immediately without waiting for the indexer HB tick.
                    if isNativeHB {
                        logger.Info().Str("addr", lostAddr).Msg("[native] step 3 — native heartbeat failed, triggering native replenish")
                        if lostAddr != "" && conf.GetConfig().NativeIndexerAddresses != "" {
                            StreamMuIndexes.Lock()
                            if _, wasIndexer := StaticIndexers[lostAddr]; wasIndexer {
                                delete(StaticIndexers, lostAddr)
                                if s := StreamIndexers[ProtocolHeartbeat]; s != nil {
                                    if stream, ok := s[ix.ID]; ok {
                                        if stream.Stream != nil {
                                            stream.Stream.Close()
                                        }
                                        delete(s, ix.ID)
                                    }
                                }
                                idxNeed := conf.GetConfig().MinIndexer - len(StaticIndexers)
                                StreamMuIndexes.Unlock()
                                if idxNeed < 1 {
                                    idxNeed = 1
                                }
                                logger.Info().Str("addr", lostAddr).Msg("[native] dead native evicted from indexer pool, triggering replenish")
                                go replenishIndexersFromNative(h, idxNeed)
                            } else {
                                StreamMuIndexes.Unlock()
                            }
                        }
                        go replenishNativesFromPeers(h, lostAddr, proto)
                    }
                } else {
                    // Case 2: native-as-indexer reconnected after a restart.
                    // If the peer was disconnected before this tick and the heartbeat just
                    // succeeded (transparent reconnect), the native may have restarted with
                    // blank state (responsiblePeers empty). Evict it from StaticIndexers and
                    // re-request an assignment so the native re-tracks us properly and
                    // runOffloadLoop can eventually migrate us to real indexers.
                    if !wasConnected && isIndexerHB && conf.GetConfig().NativeIndexerAddresses != "" {
                        StreamNativeMu.RLock()
                        isNativeIndexer := false
                        for _, ad := range StaticNatives {
                            if ad.ID == ix.ID {
                                isNativeIndexer = true
                                break
                            }
                        }
                        StreamNativeMu.RUnlock()
                        if isNativeIndexer {
                            if mu != nil {
                                mu.Lock()
                            }
                            if ps[proto] != nil {
                                if s, ok := ps[proto][ix.ID]; ok {
                                    if s.Stream != nil {
                                        s.Stream.Close()
                                    }
                                    delete(ps[proto], ix.ID)
                                }
                            }
                            reconnectedAddr := ""
                            for addr, ad := range peers {
                                if ad.ID == ix.ID {
                                    reconnectedAddr = addr
                                    delete(peers, addr)
                                    break
                                }
                            }
                            idxNeed := conf.GetConfig().MinIndexer - len(peers)
                            if mu != nil {
                                mu.Unlock()
                            }
                            if idxNeed < 1 {
                                idxNeed = 1
                            }
                            logger.Info().Str("addr", reconnectedAddr).Str("peer", ix.ID.String()).Msg(
                                "[native] native-as-indexer reconnected after restart — evicting and re-requesting assignment")
                            go replenishIndexersFromNative(h, idxNeed)
                        }
                    }
                    logger.Debug().Str("peer", ix.ID.String()).Str("proto", string(proto)).Msg("[native] step 2 — heartbeat sent ok")
                }
            }
        }

        for {
            select {
            case <-t.C:
                doTick()
            case <-indexerHeartbeatNudge:
                if isIndexerHB {
                    logger.Info().Msg("[native] step 2 — nudge received, heartbeating new indexers immediately")
                    doTick()
                }
            case <-nativeHeartbeatNudge:
                if isNativeHB {
                    logger.Info().Msg("[native] native nudge received, heartbeating replacement native immediately")
                    doTick()
                }
            case <-ctx.Done():
                return
            }
        }
    }()
}

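// Illustrative wiring only (not from the repository): a node starts one heartbeat
// loop toward its indexer pool, embedding a fresh signed record on every tick.
// makeSignedRecord is a hypothetical stand-in for whatever builds the PeerRecord.
func startNodeHeartbeat(ctx context.Context, h host.Host, makeSignedRecord func() json.RawMessage) {
    SendHeartbeat(ctx, ProtocolHeartbeat, conf.GetConfig().Name, h,
        StreamIndexers, StaticIndexers, &StreamMuIndexes, 20*time.Second,
        makeSignedRecord)
}
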
type ProtocolInfo struct {
    PersistantStream bool
    WaitResponse     bool
    TTL              time.Duration
}

func TempStream(h host.Host, ad pp.AddrInfo, proto protocol.ID, did string, streams ProtocolStream, pts map[protocol.ID]*ProtocolInfo, mu *sync.RWMutex) (ProtocolStream, error) {
    expiry := 2 * time.Second
    if pts[proto] != nil {
        expiry = pts[proto].TTL
    }
    ctxTTL, _ := context.WithTimeout(context.Background(), expiry)
    if h.Network().Connectedness(ad.ID) != network.Connected {
        if err := h.Connect(ctxTTL, ad); err != nil {
            return streams, err
        }
    }
    if streams[proto] != nil && streams[proto][ad.ID] != nil {
        return streams, nil
    } else if s, err := h.NewStream(ctxTTL, ad.ID, proto); err == nil {
        mu.Lock()
        if streams[proto] == nil {
            streams[proto] = map[pp.ID]*Stream{}
        }
        mu.Unlock()
        time.AfterFunc(expiry, func() {
            mu.Lock()
            delete(streams[proto], ad.ID)
            mu.Unlock()
        })
        mu.Lock()
        streams[proto][ad.ID] = &Stream{
            DID:    did,
            Stream: s,
            Expiry: time.Now().UTC().Add(expiry),
        }
        mu.Unlock()
        return streams, nil
    } else {
        return streams, err
    }
}

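// Illustrative sketch (not from the repository): how a caller might use TempStream
// for a one-shot application stream. The protocol ID and the caller-owned
// maps/mutex below are hypothetical.
var (
    exampleStreams  = ProtocolStream{}
    exampleStreamMu sync.RWMutex
    exampleProtos   = map[protocol.ID]*ProtocolInfo{
        "/opencloud/example/1.0": {PersistantStream: false, WaitResponse: true, TTL: 5 * time.Second},
    }
)

func openExampleStream(h host.Host, target pp.AddrInfo, did string) error {
    var err error
    exampleStreams, err = TempStream(h, target, "/opencloud/example/1.0", did, exampleStreams, exampleProtos, &exampleStreamMu)
    return err
}
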
func sendHeartbeat(ctx context.Context, h host.Host, proto protocol.ID, p *pp.AddrInfo,
    hb Heartbeat, ps ProtocolStream, interval time.Duration) error {
    logger := oclib.GetLogger()
    if ps[proto] == nil {
        ps[proto] = map[pp.ID]*Stream{}
    }
    streams := ps[proto]
    pss, exists := streams[p.ID]
    ctxTTL, cancel := context.WithTimeout(ctx, 3*interval)
    defer cancel()
    // Connect if necessary
    if h.Network().Connectedness(p.ID) != network.Connected {
        if err := h.Connect(ctxTTL, *p); err != nil {
            logger.Err(err)
            return err
        }
        exists = false // the stream will have to be recreated
    }
    // Create the stream if it is missing or closed
    if !exists || pss.Stream == nil {
        logger.Info().Msg("New Stream engaged as Heartbeat " + fmt.Sprintf("%v", proto) + " " + p.ID.String())
        s, err := h.NewStream(ctx, p.ID, proto)
        if err != nil {
            logger.Err(err)
            return err
        }
        pss = &Stream{
            Stream: s,
            Expiry: time.Now().UTC().Add(2 * time.Minute),
        }
        streams[p.ID] = pss
    }

    // Send the heartbeat
    ss := json.NewEncoder(pss.Stream)
    err := ss.Encode(&hb)
    if err != nil {
        pss.Stream.Close()
        pss.Stream = nil // will be recreated on the next tick
        return err
    }
    pss.Expiry = time.Now().UTC().Add(2 * time.Minute)
    return nil
}

199 daemons/node/common/consensus.go Normal file
@@ -0,0 +1,199 @@
package common

import (
    "context"
    "encoding/json"
    "sort"
    "time"

    oclib "cloud.o-forge.io/core/oc-lib"
    "github.com/libp2p/go-libp2p/core/host"
    "github.com/libp2p/go-libp2p/core/network"
    pp "github.com/libp2p/go-libp2p/core/peer"
)

// ProtocolIndexerCandidates is opened by a node toward its remaining indexers
// to request candidate replacement indexers after an ejection event.
const ProtocolIndexerCandidates = "/opencloud/indexer/candidates/1.0"

// IndexerCandidatesRequest is sent by a node to one of its indexers.
// Count is how many candidates are needed.
type IndexerCandidatesRequest struct {
    Count int `json:"count"`
}

// IndexerCandidatesResponse carries a random sample of known indexers from
// the responding indexer's DHT cache.
type IndexerCandidatesResponse struct {
    Candidates []pp.AddrInfo `json:"candidates"`
}

// TriggerConsensus asks each remaining indexer for a random pool of candidates,
// scores them asynchronously via a one-shot probe heartbeat, and admits the
// best ones to StaticIndexers. Falls back to DHT replenishment for any gap.
//
// Must be called in a goroutine — it blocks until all probes have returned
// (or timed out), which can take up to ~10s.
func TriggerConsensus(h host.Host, remaining []pp.AddrInfo, need int) {
    if need <= 0 || len(remaining) == 0 {
        return
    }
    logger := oclib.GetLogger()
    logger.Info().Int("voters", len(remaining)).Int("need", need).
        Msg("[consensus] starting indexer candidate consensus")

    // Phase 1 — collect candidates from all remaining indexers in parallel.
    type collectResult struct{ candidates []pp.AddrInfo }
    collectCh := make(chan collectResult, len(remaining))
    for _, ai := range remaining {
        go func(ai pp.AddrInfo) {
            ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
            defer cancel()
            s, err := h.NewStream(ctx, ai.ID, ProtocolIndexerCandidates)
            if err != nil {
                collectCh <- collectResult{}
                return
            }
            defer s.Close()
            s.SetDeadline(time.Now().Add(5 * time.Second))
            if err := json.NewEncoder(s).Encode(IndexerCandidatesRequest{Count: need + 2}); err != nil {
                collectCh <- collectResult{}
                return
            }
            var resp IndexerCandidatesResponse
            if err := json.NewDecoder(s).Decode(&resp); err != nil {
                collectCh <- collectResult{}
                return
            }
            collectCh <- collectResult{candidates: resp.Candidates}
        }(ai)
    }

    // Merge and deduplicate, excluding indexers already in the pool.
    seen := map[pp.ID]struct{}{}
    for _, ai := range Indexers.GetAddrIDs() {
        seen[ai] = struct{}{}
    }
    var candidates []pp.AddrInfo
    for range remaining {
        r := <-collectCh
        for _, ai := range r.candidates {
            if _, dup := seen[ai.ID]; !dup {
                seen[ai.ID] = struct{}{}
                candidates = append(candidates, ai)
            }
        }
    }

    if len(candidates) == 0 {
        logger.Info().Msg("[consensus] no candidates from voters, falling back to DHT")
        replenishIndexersFromDHT(h, need)
        return
    }
    logger.Info().Int("candidates", len(candidates)).Msg("[consensus] scoring candidates")

    // Phase 2 — score all candidates in parallel via a one-shot probe heartbeat.
    type scoreResult struct {
        ai    pp.AddrInfo
        score float64
    }
    scoreCh := make(chan scoreResult, len(candidates))
    for _, ai := range candidates {
        go func(ai pp.AddrInfo) {
            resp, rtt, err := probeIndexer(h, ai)
            if err != nil {
                scoreCh <- scoreResult{ai: ai, score: 0}
                return
            }
            scoreCh <- scoreResult{ai: ai, score: quickScore(resp, rtt)}
        }(ai)
    }

    results := make([]scoreResult, 0, len(candidates))
    for range candidates {
        results = append(results, <-scoreCh)
    }

    // Sort descending by quick score, admit top `need` above the minimum bar.
    sort.Slice(results, func(i, j int) bool { return results[i].score > results[j].score })
    minQ := dynamicMinScore(0) // fresh peer: threshold starts at 20

    admitted := 0
    for _, res := range results {
        if admitted >= need {
            break
        }
        if res.score < minQ {
            break // sorted desc: everything after is worse
        }
        key := addrKey(res.ai)
        if Indexers.ExistsAddr(key) {
            continue // already in pool (race with heartbeat path)
        }
        cpy := res.ai
        Indexers.SetAddr(key, &cpy)
        admitted++
    }

    if admitted > 0 {
        logger.Info().Int("admitted", admitted).Msg("[consensus] candidates admitted to pool")
        Indexers.NudgeIt()
    }

    // Fill any remaining gap with DHT discovery.
    if gap := need - admitted; gap > 0 {
        logger.Info().Int("gap", gap).Msg("[consensus] gap after consensus, falling back to DHT")
        replenishIndexersFromDHT(h, gap)
    }
}

// probeIndexer dials the candidate, sends one lightweight heartbeat, and
// returns the HeartbeatResponse (nil if the indexer doesn't support it) and RTT.
func probeIndexer(h host.Host, ai pp.AddrInfo) (*HeartbeatResponse, time.Duration, error) {
    ctx, cancel := context.WithTimeout(context.Background(), 8*time.Second)
    defer cancel()
    if h.Network().Connectedness(ai.ID) != network.Connected {
        if err := h.Connect(ctx, ai); err != nil {
            return nil, 0, err
        }
    }
    s, err := h.NewStream(ctx, ai.ID, ProtocolHeartbeat)
    if err != nil {
        return nil, 0, err
    }
    defer s.Close()

    hb := Heartbeat{PeerID: h.ID().String(), Timestamp: time.Now().UTC().Unix()}
    s.SetWriteDeadline(time.Now().Add(3 * time.Second))
    if err := json.NewEncoder(s).Encode(hb); err != nil {
        return nil, 0, err
    }
    s.SetWriteDeadline(time.Time{})

    sentAt := time.Now()
    s.SetReadDeadline(time.Now().Add(5 * time.Second))
    var resp HeartbeatResponse
    if err := json.NewDecoder(s).Decode(&resp); err != nil {
        // Indexer connected but no response: connection itself is the signal.
        return nil, time.Since(sentAt), nil
    }
    return &resp, time.Since(sentAt), nil
}

// quickScore computes a lightweight score [0,100] from a probe result.
// Uses only fill rate (inverse) and latency — the two signals available
// without a full heartbeat history.
func quickScore(resp *HeartbeatResponse, rtt time.Duration) float64 {
    maxRTT := BaseRoundTrip * 10
    latencyScore := 1.0 - float64(rtt)/float64(maxRTT)
    if latencyScore < 0 {
        latencyScore = 0
    }
    if resp == nil {
        // Connection worked but no response (old indexer): moderate score.
        return latencyScore * 50
    }
    fillScore := 1.0 - resp.FillRate // prefer less-loaded indexers
    return (0.5*latencyScore + 0.5*fillScore) * 100
}

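// Worked example of quickScore (illustrative values only; the real BaseRoundTrip
// constant is defined elsewhere in the package): assuming BaseRoundTrip = 50ms,
// a candidate answering in 50ms with FillRate 0.2 gets
//   latencyScore = 1 - 50/500 = 0.9,  fillScore = 1 - 0.2 = 0.8
//   quickScore   = (0.5*0.9 + 0.5*0.8) * 100 = 85
// while a reachable candidate that never replies (resp == nil) scores 0.9 * 50 = 45.
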
219 daemons/node/common/dht_discovery.go Normal file
@@ -0,0 +1,219 @@
package common

import (
    "context"
    "math/rand"
    "strings"
    "time"

    oclib "cloud.o-forge.io/core/oc-lib"
    "github.com/ipfs/go-cid"
    dht "github.com/libp2p/go-libp2p-kad-dht"
    "github.com/libp2p/go-libp2p/core/host"
    pp "github.com/libp2p/go-libp2p/core/peer"
    ma "github.com/multiformats/go-multiaddr"
    mh "github.com/multiformats/go-multihash"
)

// FilterLoopbackAddrs strips loopback (127.x, ::1) and unspecified addresses
// from an AddrInfo so we never hand peers an address they cannot dial externally.
func FilterLoopbackAddrs(ai pp.AddrInfo) pp.AddrInfo {
    filtered := make([]ma.Multiaddr, 0, len(ai.Addrs))
    for _, addr := range ai.Addrs {
        ip, err := ExtractIP(addr.String())
        if err != nil || ip.IsLoopback() || ip.IsUnspecified() {
            continue
        }
        filtered = append(filtered, addr)
    }
    return pp.AddrInfo{ID: ai.ID, Addrs: filtered}
}

// RecommendedHeartbeatInterval is the target period between heartbeat ticks.
// Indexers use this as the DHT Provide refresh interval.
const RecommendedHeartbeatInterval = 60 * time.Second

// discoveryDHT is the DHT instance used for indexer discovery.
// Set by SetDiscoveryDHT once the indexer service initialises its DHT.
var discoveryDHT *dht.IpfsDHT

// SetDiscoveryDHT stores the DHT instance used by replenishIndexersFromDHT.
// Called by NewIndexerService once the DHT is ready.
func SetDiscoveryDHT(d *dht.IpfsDHT) {
    discoveryDHT = d
}

// initNodeDHT creates a lightweight DHT client for pure nodes (no IndexerService).
// Uses the seed indexers as bootstrap peers. Called lazily by ConnectToIndexers
// when discoveryDHT is still nil after the initial warm-up delay.
func initNodeDHT(h host.Host, seeds []Entry) {
    logger := oclib.GetLogger()
    bootstrapPeers := []pp.AddrInfo{}
    for _, s := range seeds {
        bootstrapPeers = append(bootstrapPeers, *s.Info)
    }
    d, err := dht.New(context.Background(), h,
        dht.Mode(dht.ModeClient),
        dht.ProtocolPrefix("oc"),
        dht.BootstrapPeers(bootstrapPeers...),
    )
    if err != nil {
        logger.Warn().Err(err).Msg("[dht] node DHT client init failed")
        return
    }
    SetDiscoveryDHT(d)
    ctx, cancel := context.WithTimeout(context.Background(), 15*time.Second)
    defer cancel()
    if err := d.Bootstrap(ctx); err != nil {
        logger.Warn().Err(err).Msg("[dht] node DHT client bootstrap failed")
    }
    logger.Info().Msg("[dht] node DHT client ready")
}

// IndexerCID returns the well-known CID under which all indexers advertise.
func IndexerCID() cid.Cid {
    h, _ := mh.Sum([]byte("/opencloud/indexers"), mh.SHA2_256, -1)
    return cid.NewCidV1(cid.Raw, h)
}

// DiscoverIndexersFromDHT uses the DHT to find up to count indexers advertising
// under the well-known key. Excludes self. Resolves addresses when the provider
// record carries none.
func DiscoverIndexersFromDHT(h host.Host, d *dht.IpfsDHT, count int) []pp.AddrInfo {
    logger := oclib.GetLogger()
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()

    c := IndexerCID()
    ch := d.FindProvidersAsync(ctx, c, count*2)
    seen := map[pp.ID]struct{}{}
    var results []pp.AddrInfo
    for ai := range ch {
        if ai.ID == h.ID() {
            continue
        }
        if _, dup := seen[ai.ID]; dup {
            continue
        }
        seen[ai.ID] = struct{}{}
        if len(ai.Addrs) == 0 {
            resolved, err := d.FindPeer(ctx, ai.ID)
            if err != nil {
                logger.Warn().Str("peer", ai.ID.String()).Msg("[dht] no addrs and FindPeer failed, skipping")
                continue
            }
            ai = resolved
        }
        ai = FilterLoopbackAddrs(ai)
        if len(ai.Addrs) == 0 {
            continue
        }
        results = append(results, ai)
        if len(results) >= count {
            break
        }
    }
    logger.Info().Int("found", len(results)).Msg("[dht] indexer discovery complete")
    return results
}

// SelectByFillRate picks up to want providers using fill-rate weighted random
// selection w(F) = F*(1-F) — peaks at F=0.5, prefers less-loaded indexers.
// Providers with unknown fill rate receive F=0.5 (neutral prior).
// Enforces subnet /24 diversity: at most one indexer per /24.
func SelectByFillRate(providers []pp.AddrInfo, fillRates map[pp.ID]float64, want int) []pp.AddrInfo {
    if len(providers) == 0 || want <= 0 {
        return nil
    }
    type weighted struct {
        ai     pp.AddrInfo
        weight float64
    }
    ws := make([]weighted, 0, len(providers))
    for _, ai := range providers {
        f, ok := fillRates[ai.ID]
        if !ok {
            f = 0.5
        }
        ws = append(ws, weighted{ai: ai, weight: f * (1 - f)})
    }
    // Shuffle first for fairness among equal-weight peers.
    rand.Shuffle(len(ws), func(i, j int) { ws[i], ws[j] = ws[j], ws[i] })
    // Sort descending by weight (simple insertion sort — small N).
    for i := 1; i < len(ws); i++ {
        for j := i; j > 0 && ws[j].weight > ws[j-1].weight; j-- {
            ws[j], ws[j-1] = ws[j-1], ws[j]
        }
    }

    subnets := map[string]struct{}{}
    var selected []pp.AddrInfo
    for _, w := range ws {
        if len(selected) >= want {
            break
        }
        subnet := subnetOf(w.ai)
        if subnet != "" {
            if _, dup := subnets[subnet]; dup {
                continue
            }
            subnets[subnet] = struct{}{}
        }
        selected = append(selected, w.ai)
    }
    return selected
}

// subnetOf returns the /24 subnet string for the first non-loopback address of ai.
func subnetOf(ai pp.AddrInfo) string {
    for _, ma := range ai.Addrs {
        ip, err := ExtractIP(ma.String())
        if err != nil || ip.IsLoopback() {
            continue
        }
        parts := strings.Split(ip.String(), ".")
        if len(parts) >= 3 {
            return parts[0] + "." + parts[1] + "." + parts[2]
        }
    }
    return ""
}

// replenishIndexersFromDHT is called when an indexer heartbeat fails and more
// indexers are needed. Queries the DHT and adds fresh entries to StaticIndexers.
func replenishIndexersFromDHT(h host.Host, need int) {
    if need <= 0 || discoveryDHT == nil {
        return
    }
    logger := oclib.GetLogger()
    logger.Info().Int("need", need).Msg("[dht] replenishing indexer pool from DHT")

    providers := DiscoverIndexersFromDHT(h, discoveryDHT, need*3)
    selected := SelectByFillRate(providers, nil, need)
    if len(selected) == 0 {
        logger.Warn().Msg("[dht] no indexers found in DHT for replenishment")
        return
    }

    added := 0
    for _, ai := range selected {
        addr := addrKey(ai)
        if !Indexers.ExistsAddr(addr) {
            adCopy := ai
            Indexers.SetAddr(addr, &adCopy)
            added++
        }
    }

    if added > 0 {
        logger.Info().Int("added", added).Msg("[dht] indexers added from DHT")
        Indexers.NudgeIt()
    }
}

// addrKey returns the canonical map key for an AddrInfo.
// The PeerID is used as key so the same peer is never stored twice regardless
// of which of its addresses was seen first.
func addrKey(ai pp.AddrInfo) string {
    return ai.ID.String()
}

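// Numeric shape of the w(F) = F*(1-F) weighting used by SelectByFillRate
// (illustrative note, not from the repository):
//   w(0.1) = 0.09, w(0.3) = 0.21, w(0.5) = 0.25, w(0.7) = 0.21, w(0.9) = 0.09
// so half-full indexers are sampled most often, while both nearly idle and nearly
// saturated ones are de-emphasised; unknown fill rates default to the neutral F = 0.5.
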
@@ -4,6 +4,7 @@ import (
    "context"

    "cloud.o-forge.io/core/oc-lib/models/peer"
    pubsub "github.com/libp2p/go-libp2p-pubsub"
)

type HeartBeatStreamed interface {
@@ -12,4 +13,5 @@ type HeartBeatStreamed interface {

type DiscoveryPeer interface {
    GetPeerRecord(ctx context.Context, key string) ([]*peer.Peer, error)
    GetPubSub(topicName string) *pubsub.Topic
}

@@ -1,777 +0,0 @@
|
||||
package common
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"math/rand"
|
||||
"oc-discovery/conf"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
"github.com/libp2p/go-libp2p/core/host"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
"github.com/libp2p/go-libp2p/core/protocol"
|
||||
)
|
||||
|
||||
const (
|
||||
ProtocolNativeSubscription = "/opencloud/native/subscribe/1.0"
|
||||
ProtocolNativeGetIndexers = "/opencloud/native/indexers/1.0"
|
||||
// ProtocolNativeConsensus is used by nodes/indexers to cross-validate an indexer
|
||||
// pool against all configured native peers.
|
||||
ProtocolNativeConsensus = "/opencloud/native/consensus/1.0"
|
||||
RecommendedHeartbeatInterval = 60 * time.Second
|
||||
|
||||
// TopicIndexerRegistry is the PubSub topic used by native indexers to gossip
|
||||
// newly registered indexer PeerIDs to neighbouring natives.
|
||||
TopicIndexerRegistry = "oc-indexer-registry"
|
||||
|
||||
// consensusQueryTimeout is the per-native timeout for a consensus query.
|
||||
consensusQueryTimeout = 3 * time.Second
|
||||
// consensusCollectTimeout is the total wait for all native responses.
|
||||
consensusCollectTimeout = 4 * time.Second
|
||||
)
|
||||
|
||||
// ConsensusRequest is sent by a node/indexer to a native to validate a candidate
|
||||
// indexer list. The native replies with what it trusts and what it suggests instead.
|
||||
type ConsensusRequest struct {
|
||||
Candidates []string `json:"candidates"`
|
||||
}
|
||||
|
||||
// ConsensusResponse is returned by a native during a consensus challenge.
|
||||
// Trusted = candidates the native considers alive.
|
||||
// Suggestions = extras the native knows and trusts but that were not in the candidate list.
|
||||
type ConsensusResponse struct {
|
||||
Trusted []string `json:"trusted"`
|
||||
Suggestions []string `json:"suggestions,omitempty"`
|
||||
}
|
||||
|
||||
// IndexerRegistration is sent by an indexer to a native to signal its alive state.
|
||||
// Only Addr is required; PeerID is derived from it if omitted.
|
||||
type IndexerRegistration struct {
|
||||
PeerID string `json:"peer_id,omitempty"`
|
||||
Addr string `json:"addr"`
|
||||
}
|
||||
|
||||
// GetIndexersRequest asks a native for a pool of live indexers.
|
||||
type GetIndexersRequest struct {
|
||||
Count int `json:"count"`
|
||||
From string `json:"from"`
|
||||
}
|
||||
|
||||
// GetIndexersResponse is returned by the native with live indexer multiaddrs.
|
||||
type GetIndexersResponse struct {
|
||||
Indexers []string `json:"indexers"`
|
||||
IsSelfFallback bool `json:"is_self_fallback,omitempty"`
|
||||
}
|
||||
|
||||
var StaticNatives = map[string]*pp.AddrInfo{}
|
||||
var StreamNativeMu sync.RWMutex
|
||||
var StreamNatives ProtocolStream = ProtocolStream{}
|
||||
|
||||
// nativeHeartbeatOnce ensures we start exactly one long-lived heartbeat goroutine
|
||||
// toward the native mesh, even when ConnectToNatives is called from recovery paths.
|
||||
var nativeHeartbeatOnce sync.Once
|
||||
|
||||
// nativeMeshHeartbeatOnce guards the native-to-native heartbeat goroutine started
|
||||
// by EnsureNativePeers so only one goroutine covers the whole StaticNatives map.
|
||||
var nativeMeshHeartbeatOnce sync.Once
|
||||
|
||||
// ConnectToNatives is the initial setup for nodes/indexers in native mode:
|
||||
// 1. Parses native addresses → StaticNatives.
|
||||
// 2. Starts a single long-lived heartbeat goroutine toward the native mesh.
|
||||
// 3. Fetches an initial indexer pool from the first responsive native.
|
||||
// 4. Runs consensus when real (non-fallback) indexers are returned.
|
||||
// 5. Replaces StaticIndexers with the confirmed pool.
|
||||
func ConnectToNatives(h host.Host, minIndexer int, maxIndexer int, myPID pp.ID) error {
|
||||
logger := oclib.GetLogger()
|
||||
logger.Info().Msg("[native] step 1 — parsing native addresses")
|
||||
|
||||
// Parse native addresses — safe to call multiple times.
|
||||
StreamNativeMu.Lock()
|
||||
orderedAddrs := []string{}
|
||||
for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
|
||||
addr = strings.TrimSpace(addr)
|
||||
if addr == "" {
|
||||
continue
|
||||
}
|
||||
ad, err := pp.AddrInfoFromString(addr)
|
||||
if err != nil {
|
||||
logger.Err(err).Msg("[native] step 1 — invalid native addr")
|
||||
continue
|
||||
}
|
||||
StaticNatives[addr] = ad
|
||||
orderedAddrs = append(orderedAddrs, addr)
|
||||
logger.Info().Str("addr", addr).Msg("[native] step 1 — native registered")
|
||||
}
|
||||
if len(StaticNatives) == 0 {
|
||||
StreamNativeMu.Unlock()
|
||||
return errors.New("no valid native addresses configured")
|
||||
}
|
||||
StreamNativeMu.Unlock()
|
||||
logger.Info().Int("count", len(orderedAddrs)).Msg("[native] step 1 — natives parsed")
|
||||
|
||||
// Step 1: one long-lived heartbeat to each native.
|
||||
nativeHeartbeatOnce.Do(func() {
|
||||
logger.Info().Msg("[native] step 1 — starting long-lived heartbeat to native mesh")
|
||||
SendHeartbeat(context.Background(), ProtocolHeartbeat,
|
||||
conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
|
||||
})
|
||||
|
||||
// Fetch initial pool from the first responsive native.
|
||||
logger.Info().Int("want", maxIndexer).Msg("[native] step 1 — fetching indexer pool from native")
|
||||
candidates, isFallback := fetchIndexersFromNative(h, orderedAddrs, maxIndexer)
|
||||
if len(candidates) == 0 {
|
||||
logger.Warn().Msg("[native] step 1 — no candidates returned by any native")
|
||||
if minIndexer > 0 {
|
||||
return errors.New("ConnectToNatives: no indexers available from any native")
|
||||
}
|
||||
return nil
|
||||
}
|
||||
logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 1 — pool received")
|
||||
|
||||
// Step 2: populate StaticIndexers — consensus for real indexers, direct for fallback.
|
||||
pool := resolvePool(h, candidates, isFallback, maxIndexer)
|
||||
replaceStaticIndexers(pool)
|
||||
|
||||
StreamMuIndexes.RLock()
|
||||
indexerCount := len(StaticIndexers)
|
||||
StreamMuIndexes.RUnlock()
|
||||
logger.Info().Int("pool_size", indexerCount).Msg("[native] step 2 — StaticIndexers replaced")
|
||||
|
||||
if minIndexer > 0 && indexerCount < minIndexer {
|
||||
return errors.New("not enough majority-confirmed indexers available")
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// replenishIndexersFromNative is called when an indexer heartbeat fails (step 3→4).
|
||||
// It asks the native for exactly `need` replacement indexers, runs consensus when
|
||||
// real indexers are returned, and adds the results to StaticIndexers without
|
||||
// clearing the existing pool.
|
||||
func replenishIndexersFromNative(h host.Host, need int) {
|
||||
if need <= 0 {
|
||||
return
|
||||
}
|
||||
logger := oclib.GetLogger()
|
||||
logger.Info().Int("need", need).Msg("[native] step 4 — replenishing indexer pool from native")
|
||||
|
||||
StreamNativeMu.RLock()
|
||||
addrs := make([]string, 0, len(StaticNatives))
|
||||
for addr := range StaticNatives {
|
||||
addrs = append(addrs, addr)
|
||||
}
|
||||
StreamNativeMu.RUnlock()
|
||||
|
||||
candidates, isFallback := fetchIndexersFromNative(h, addrs, need)
|
||||
if len(candidates) == 0 {
|
||||
logger.Warn().Msg("[native] step 4 — no candidates returned by any native")
|
||||
return
|
||||
}
|
||||
logger.Info().Int("candidates", len(candidates)).Bool("fallback", isFallback).Msg("[native] step 4 — candidates received")
|
||||
|
||||
pool := resolvePool(h, candidates, isFallback, need)
|
||||
if len(pool) == 0 {
|
||||
logger.Warn().Msg("[native] step 4 — consensus yielded no confirmed indexers")
|
||||
return
|
||||
}
|
||||
|
||||
// Add new indexers to the pool — do NOT clear existing ones.
|
||||
StreamMuIndexes.Lock()
|
||||
for addr, ad := range pool {
|
||||
StaticIndexers[addr] = ad
|
||||
}
|
||||
total := len(StaticIndexers)
|
||||
|
||||
StreamMuIndexes.Unlock()
|
||||
logger.Info().Int("added", len(pool)).Int("total", total).Msg("[native] step 4 — pool replenished")
|
||||
|
||||
// Nudge the heartbeat goroutine to connect immediately instead of waiting
|
||||
// for the next 20s tick.
|
||||
NudgeIndexerHeartbeat()
|
||||
logger.Info().Msg("[native] step 4 — heartbeat goroutine nudged")
|
||||
}
|
||||
|
||||
// fetchIndexersFromNative opens a ProtocolNativeGetIndexers stream to the first
|
||||
// responsive native and returns the candidate list and fallback flag.
|
||||
func fetchIndexersFromNative(h host.Host, nativeAddrs []string, count int) (candidates []string, isFallback bool) {
|
||||
logger := oclib.GetLogger()
|
||||
for _, addr := range nativeAddrs {
|
||||
ad, err := pp.AddrInfoFromString(addr)
|
||||
if err != nil {
|
||||
logger.Warn().Str("addr", addr).Msg("[native] fetch — skipping invalid addr")
|
||||
continue
|
||||
}
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
if err := h.Connect(ctx, *ad); err != nil {
|
||||
cancel()
|
||||
logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — connect failed")
|
||||
continue
|
||||
}
|
||||
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetIndexers)
|
||||
cancel()
|
||||
if err != nil {
|
||||
logger.Warn().Str("addr", addr).Err(err).Msg("[native] fetch — stream open failed")
|
||||
continue
|
||||
}
|
||||
req := GetIndexersRequest{Count: count, From: h.ID().String()}
|
||||
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
|
||||
s.Close()
|
||||
logger.Warn().Str("addr", addr).Err(encErr).Msg("[native] fetch — encode request failed")
|
||||
continue
|
||||
}
|
||||
var resp GetIndexersResponse
|
||||
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
|
||||
s.Close()
|
||||
logger.Warn().Str("addr", addr).Err(decErr).Msg("[native] fetch — decode response failed")
|
||||
continue
|
||||
}
|
||||
s.Close()
|
||||
logger.Info().Str("native", addr).Int("indexers", len(resp.Indexers)).Bool("fallback", resp.IsSelfFallback).Msg("[native] fetch — response received")
|
||||
return resp.Indexers, resp.IsSelfFallback
|
||||
}
|
||||
logger.Warn().Msg("[native] fetch — no native responded")
|
||||
return nil, false
|
||||
}
|
||||
|
||||
// resolvePool converts a candidate list to a validated addr→AddrInfo map.
|
||||
// When isFallback is true the native itself is the indexer — no consensus needed.
|
||||
// When isFallback is false, consensus is run before accepting the candidates.
|
||||
func resolvePool(h host.Host, candidates []string, isFallback bool, maxIndexer int) map[string]*pp.AddrInfo {
|
||||
logger := oclib.GetLogger()
|
||||
if isFallback {
|
||||
logger.Info().Strs("addrs", candidates).Msg("[native] resolve — fallback mode, skipping consensus")
|
||||
pool := make(map[string]*pp.AddrInfo, len(candidates))
|
||||
for _, addr := range candidates {
|
||||
ad, err := pp.AddrInfoFromString(addr)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
pool[addr] = ad
|
||||
}
|
||||
return pool
|
||||
}
|
||||
|
||||
// Round 1.
|
||||
logger.Info().Int("candidates", len(candidates)).Msg("[native] resolve — consensus round 1")
|
||||
confirmed, suggestions := clientSideConsensus(h, candidates)
|
||||
logger.Info().Int("confirmed", len(confirmed)).Int("suggestions", len(suggestions)).Msg("[native] resolve — consensus round 1 done")
|
||||
|
||||
// Round 2: fill gaps from suggestions if below target.
|
||||
if len(confirmed) < maxIndexer && len(suggestions) > 0 {
|
||||
rand.Shuffle(len(suggestions), func(i, j int) { suggestions[i], suggestions[j] = suggestions[j], suggestions[i] })
|
||||
gap := maxIndexer - len(confirmed)
|
||||
if gap > len(suggestions) {
|
||||
gap = len(suggestions)
|
||||
}
|
||||
logger.Info().Int("gap", gap).Msg("[native] resolve — consensus round 2 (filling gaps)")
|
||||
confirmed2, _ := clientSideConsensus(h, append(confirmed, suggestions[:gap]...))
|
||||
if len(confirmed2) > 0 {
|
||||
confirmed = confirmed2
|
||||
}
|
||||
logger.Info().Int("confirmed", len(confirmed)).Msg("[native] resolve — consensus round 2 done")
|
||||
}
|
||||
|
||||
pool := make(map[string]*pp.AddrInfo, len(confirmed))
|
||||
for _, addr := range confirmed {
|
||||
ad, err := pp.AddrInfoFromString(addr)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
pool[addr] = ad
|
||||
}
|
||||
logger.Info().Int("pool_size", len(pool)).Msg("[native] resolve — pool ready")
|
||||
return pool
|
||||
}
|
||||
|
||||
// replaceStaticIndexers atomically replaces the active indexer pool.
|
||||
// Peers no longer in next have their heartbeat streams closed so the SendHeartbeat
|
||||
// goroutine stops sending to them on the next tick.
|
||||
func replaceStaticIndexers(next map[string]*pp.AddrInfo) {
|
||||
StreamMuIndexes.Lock()
|
||||
defer StreamMuIndexes.Unlock()
|
||||
for addr, ad := range next {
|
||||
StaticIndexers[addr] = ad
|
||||
}
|
||||
}
|
||||
|
||||
// clientSideConsensus challenges a candidate list to ALL configured native peers
|
||||
// in parallel. Each native replies with the candidates it trusts plus extras it
|
||||
// recommends. An indexer is confirmed when strictly more than 50% of responding
|
||||
// natives trust it.
|
||||
func clientSideConsensus(h host.Host, candidates []string) (confirmed []string, suggestions []string) {
|
||||
if len(candidates) == 0 {
|
||||
return nil, nil
|
||||
}
|
||||
|
||||
StreamNativeMu.RLock()
|
||||
peers := make([]*pp.AddrInfo, 0, len(StaticNatives))
|
||||
for _, ad := range StaticNatives {
|
||||
peers = append(peers, ad)
|
||||
}
|
||||
StreamNativeMu.RUnlock()
|
||||
|
||||
if len(peers) == 0 {
|
||||
return candidates, nil
|
||||
}
|
||||
|
||||
type nativeResult struct {
|
||||
trusted []string
|
||||
suggestions []string
|
||||
responded bool
|
||||
}
|
||||
ch := make(chan nativeResult, len(peers))
|
||||
|
||||
for _, ad := range peers {
|
||||
go func(ad *pp.AddrInfo) {
|
||||
ctx, cancel := context.WithTimeout(context.Background(), consensusQueryTimeout)
|
||||
defer cancel()
|
||||
if err := h.Connect(ctx, *ad); err != nil {
|
||||
ch <- nativeResult{}
|
||||
return
|
||||
}
|
||||
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeConsensus)
|
||||
if err != nil {
|
||||
ch <- nativeResult{}
|
||||
return
|
||||
}
|
||||
defer s.Close()
|
||||
if err := json.NewEncoder(s).Encode(ConsensusRequest{Candidates: candidates}); err != nil {
|
||||
ch <- nativeResult{}
|
||||
return
|
||||
}
|
||||
var resp ConsensusResponse
|
||||
if err := json.NewDecoder(s).Decode(&resp); err != nil {
|
||||
ch <- nativeResult{}
|
||||
return
|
||||
}
|
||||
ch <- nativeResult{trusted: resp.Trusted, suggestions: resp.Suggestions, responded: true}
|
||||
}(ad)
|
||||
}
|
||||
|
||||
timer := time.NewTimer(consensusCollectTimeout)
|
||||
defer timer.Stop()
|
||||
|
||||
trustedCounts := map[string]int{}
|
||||
suggestionPool := map[string]struct{}{}
|
||||
total := 0
|
||||
collected := 0
|
||||
|
||||
collect:
|
||||
for collected < len(peers) {
|
||||
select {
|
||||
case r := <-ch:
|
||||
collected++
|
||||
if !r.responded {
|
||||
continue
|
||||
}
|
||||
total++
|
||||
seen := map[string]struct{}{}
|
||||
for _, addr := range r.trusted {
|
||||
if _, already := seen[addr]; !already {
|
||||
trustedCounts[addr]++
|
||||
seen[addr] = struct{}{}
|
||||
}
|
||||
}
|
||||
for _, addr := range r.suggestions {
|
||||
suggestionPool[addr] = struct{}{}
|
||||
}
|
||||
case <-timer.C:
|
||||
break collect
|
||||
}
|
||||
}
|
||||
|
||||
if total == 0 {
|
||||
return candidates, nil
|
||||
}
|
||||
|
||||
confirmedSet := map[string]struct{}{}
|
||||
for addr, count := range trustedCounts {
|
||||
if count*2 > total {
|
||||
confirmed = append(confirmed, addr)
|
||||
confirmedSet[addr] = struct{}{}
|
||||
}
|
||||
}
|
||||
for addr := range suggestionPool {
|
||||
if _, ok := confirmedSet[addr]; !ok {
|
||||
suggestions = append(suggestions, addr)
|
||||
}
|
||||
}
|
||||
return
|
||||
}
|
||||
|
||||
// RegisterWithNative sends a one-shot registration to each configured native indexer.
|
||||
// Should be called periodically every RecommendedHeartbeatInterval.
|
||||
func RegisterWithNative(h host.Host, nativeAddressesStr string) {
|
||||
logger := oclib.GetLogger()
|
||||
myAddr := ""
|
||||
if !strings.Contains(h.Addrs()[len(h.Addrs())-1].String(), "127.0.0.1") {
|
||||
myAddr = h.Addrs()[len(h.Addrs())-1].String() + "/p2p/" + h.ID().String()
|
||||
}
|
||||
if myAddr == "" {
|
||||
logger.Warn().Msg("RegisterWithNative: no routable address yet, skipping")
|
||||
return
|
||||
}
|
||||
reg := IndexerRegistration{
|
||||
PeerID: h.ID().String(),
|
||||
Addr: myAddr,
|
||||
}
|
||||
for _, addr := range strings.Split(nativeAddressesStr, ",") {
|
||||
addr = strings.TrimSpace(addr)
|
||||
if addr == "" {
|
||||
continue
|
||||
}
|
||||
ad, err := pp.AddrInfoFromString(addr)
|
||||
if err != nil {
|
||||
logger.Err(err).Msg("RegisterWithNative: invalid addr")
|
||||
continue
|
||||
}
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
if err := h.Connect(ctx, *ad); err != nil {
|
||||
cancel()
|
||||
continue
|
||||
}
|
||||
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeSubscription)
|
||||
cancel()
|
||||
if err != nil {
|
||||
logger.Err(err).Msg("RegisterWithNative: stream open failed")
|
||||
continue
|
||||
}
|
||||
if err := json.NewEncoder(s).Encode(reg); err != nil {
|
||||
logger.Err(err).Msg("RegisterWithNative: encode failed")
|
||||
}
|
||||
s.Close()
|
||||
}
|
||||
}
|
||||
|
||||
// EnsureNativePeers populates StaticNatives from config and starts a single
|
||||
// heartbeat goroutine toward the native mesh. Safe to call multiple times;
|
||||
// the heartbeat goroutine is started at most once (nativeMeshHeartbeatOnce).
|
||||
func EnsureNativePeers(h host.Host) {
|
||||
logger := oclib.GetLogger()
|
||||
nativeAddrs := conf.GetConfig().NativeIndexerAddresses
|
||||
if nativeAddrs == "" {
|
||||
return
|
||||
}
|
||||
StreamNativeMu.Lock()
|
||||
for _, addr := range strings.Split(nativeAddrs, ",") {
|
||||
addr = strings.TrimSpace(addr)
|
||||
if addr == "" {
|
||||
continue
|
||||
}
|
||||
ad, err := pp.AddrInfoFromString(addr)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
StaticNatives[addr] = ad
|
||||
logger.Info().Str("addr", addr).Msg("native: registered peer in native mesh")
|
||||
}
|
||||
StreamNativeMu.Unlock()
|
||||
// One heartbeat goroutine iterates over all of StaticNatives on each tick;
|
||||
// starting one per address would multiply heartbeats by the native count.
|
||||
nativeMeshHeartbeatOnce.Do(func() {
|
||||
logger.Info().Msg("native: starting mesh heartbeat goroutine")
|
||||
SendHeartbeat(context.Background(), ProtocolHeartbeat,
|
||||
conf.GetConfig().Name, h, StreamNatives, StaticNatives, &StreamNativeMu, 20*time.Second)
|
||||
})
|
||||
}
|
||||
|
||||
func StartNativeRegistration(h host.Host, nativeAddressesStr string) {
|
||||
go func() {
|
||||
// Poll until a routable (non-loopback) address is available before the first
|
||||
// registration attempt. libp2p may not have discovered external addresses yet
|
||||
// at startup. Cap at 12 retries (~1 minute) so we don't spin indefinitely.
|
||||
for i := 0; i < 12; i++ {
|
||||
hasRoutable := false
|
||||
if !strings.Contains(h.Addrs()[len(h.Addrs())-1].String(), "127.0.0.1") {
|
||||
hasRoutable = true
|
||||
break
|
||||
}
|
||||
|
||||
if hasRoutable {
|
||||
break
|
||||
}
|
||||
time.Sleep(5 * time.Second)
|
||||
}
|
||||
RegisterWithNative(h, nativeAddressesStr)
|
||||
t := time.NewTicker(RecommendedHeartbeatInterval)
|
||||
defer t.Stop()
|
||||
for range t.C {
|
||||
RegisterWithNative(h, nativeAddressesStr)
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
// ── Lost-native replacement ───────────────────────────────────────────────────
|
||||
|
||||
const (
|
||||
// ProtocolNativeGetPeers lets a node/indexer ask a native for a random
|
||||
// selection of that native's own native contacts (to replace a dead native).
|
||||
ProtocolNativeGetPeers = "/opencloud/native/peers/1.0"
|
||||
// ProtocolIndexerGetNatives lets nodes/indexers ask a connected indexer for
|
||||
// its configured native addresses (fallback when no alive native responds).
|
||||
ProtocolIndexerGetNatives = "/opencloud/indexer/natives/1.0"
|
||||
// retryNativeInterval is how often retryLostNative polls a dead native.
|
||||
retryNativeInterval = 30 * time.Second
|
||||
)
|
||||
|
||||
// GetNativePeersRequest is sent to a native to ask for its known native contacts.
|
||||
type GetNativePeersRequest struct {
|
||||
Exclude []string `json:"exclude"`
|
||||
Count int `json:"count"`
|
||||
}
|
||||
|
||||
// GetNativePeersResponse carries native addresses returned by a native's peer list.
|
||||
type GetNativePeersResponse struct {
|
||||
Peers []string `json:"peers"`
|
||||
}
|
||||
|
||||
// GetIndexerNativesRequest is sent to an indexer to ask for its configured native addresses.
|
||||
type GetIndexerNativesRequest struct {
|
||||
Exclude []string `json:"exclude"`
|
||||
}
|
||||
|
||||
// GetIndexerNativesResponse carries native addresses returned by an indexer.
|
||||
type GetIndexerNativesResponse struct {
|
||||
Natives []string `json:"natives"`
|
||||
}
|
||||
|
||||
// nativeHeartbeatNudge allows replenishNativesFromPeers to trigger an immediate
|
||||
// native heartbeat tick after adding a replacement native to the pool.
|
||||
var nativeHeartbeatNudge = make(chan struct{}, 1)
|
||||
|
||||
// NudgeNativeHeartbeat signals the native heartbeat goroutine to fire immediately.
|
||||
func NudgeNativeHeartbeat() {
|
||||
select {
|
||||
case nativeHeartbeatNudge <- struct{}{}:
|
||||
default: // nudge already pending, skip
|
||||
}
|
||||
}
|
||||
|
||||
// replenishIndexersIfNeeded checks if the indexer pool is below the configured
|
||||
// minimum (or empty) and, if so, asks the native mesh for replacements.
|
||||
// Called whenever a native is recovered so the indexer pool is restored.
|
||||
func replenishIndexersIfNeeded(h host.Host) {
|
||||
logger := oclib.GetLogger()
|
||||
minIdx := conf.GetConfig().MinIndexer
|
||||
if minIdx < 1 {
|
||||
minIdx = 1
|
||||
}
|
||||
StreamMuIndexes.RLock()
|
||||
indexerCount := len(StaticIndexers)
|
||||
StreamMuIndexes.RUnlock()
|
||||
if indexerCount < minIdx {
|
||||
need := minIdx - indexerCount
|
||||
logger.Info().Int("need", need).Int("current", indexerCount).Msg("[native] native recovered — replenishing indexer pool")
|
||||
go replenishIndexersFromNative(h, need)
|
||||
}
|
||||
}
|
||||
|
||||
// replenishNativesFromPeers is called when the heartbeat to a native fails.
|
||||
// Flow:
|
||||
// 1. Ask other alive natives for one of their native contacts (ProtocolNativeGetPeers).
|
||||
// 2. If none respond or return a new address, ask connected indexers (ProtocolIndexerGetNatives).
|
||||
// 3. If no replacement found:
|
||||
// - remaining > 1 → ignore (enough natives remain).
|
||||
// - remaining ≤ 1 → start periodic retry (retryLostNative).
|
||||
func replenishNativesFromPeers(h host.Host, lostAddr string, proto protocol.ID) {
|
||||
if lostAddr == "" {
|
||||
return
|
||||
}
|
||||
logger := oclib.GetLogger()
|
||||
logger.Info().Str("lost", lostAddr).Msg("[native] replenish natives — start")
|
||||
|
||||
// Build exclude list: the lost addr + all currently alive natives.
|
||||
// lostAddr has already been removed from StaticNatives by doTick.
|
||||
StreamNativeMu.RLock()
|
||||
remaining := len(StaticNatives)
|
||||
exclude := make([]string, 0, remaining+1)
|
||||
exclude = append(exclude, lostAddr)
|
||||
for addr := range StaticNatives {
|
||||
exclude = append(exclude, addr)
|
||||
}
|
||||
StreamNativeMu.RUnlock()
|
||||
|
||||
logger.Info().Int("remaining", remaining).Msg("[native] replenish natives — step 1: ask alive natives for a peer")
|
||||
|
||||
// Step 1: ask other alive natives for a replacement.
|
||||
newAddr := fetchNativeFromNatives(h, exclude)
|
||||
|
||||
// Step 2: fallback — ask connected indexers for their native addresses.
|
||||
if newAddr == "" {
|
||||
logger.Info().Msg("[native] replenish natives — step 2: ask indexers for their native addresses")
|
||||
newAddr = fetchNativeFromIndexers(h, exclude)
|
||||
}
|
||||
|
||||
if newAddr != "" {
|
||||
ad, err := pp.AddrInfoFromString(newAddr)
|
||||
if err == nil {
|
||||
StreamNativeMu.Lock()
|
||||
StaticNatives[newAddr] = ad
|
||||
StreamNativeMu.Unlock()
|
||||
logger.Info().Str("new", newAddr).Msg("[native] replenish natives — replacement added, nudging heartbeat")
|
||||
NudgeNativeHeartbeat()
|
||||
replenishIndexersIfNeeded(h)
|
||||
return
|
||||
}
|
||||
}
|
||||
|
||||
// Step 3: no replacement found.
|
||||
logger.Warn().Int("remaining", remaining).Msg("[native] replenish natives — no replacement found")
|
||||
if remaining > 1 {
|
||||
logger.Info().Msg("[native] replenish natives — enough natives remain, ignoring loss")
|
||||
return
|
||||
}
|
||||
// Last (or only) native — retry periodically.
|
||||
logger.Info().Str("addr", lostAddr).Msg("[native] replenish natives — last native lost, starting periodic retry")
|
||||
go retryLostNative(h, lostAddr, proto)
|
||||
}
|
||||
|
||||
// fetchNativeFromNatives asks each alive native for one of its own native contacts
|
||||
// not in exclude. Returns the first new address found or "" if none.
|
||||
func fetchNativeFromNatives(h host.Host, exclude []string) string {
|
||||
logger := oclib.GetLogger()
|
||||
excludeSet := make(map[string]struct{}, len(exclude))
|
||||
for _, e := range exclude {
|
||||
excludeSet[e] = struct{}{}
|
||||
}
|
||||
|
||||
StreamNativeMu.RLock()
|
||||
natives := make([]*pp.AddrInfo, 0, len(StaticNatives))
|
||||
for _, ad := range StaticNatives {
|
||||
natives = append(natives, ad)
|
||||
}
|
||||
StreamNativeMu.RUnlock()
|
||||
|
||||
rand.Shuffle(len(natives), func(i, j int) { natives[i], natives[j] = natives[j], natives[i] })
|
||||
|
||||
for _, ad := range natives {
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
if err := h.Connect(ctx, *ad); err != nil {
|
||||
cancel()
|
||||
logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — connect failed")
|
||||
continue
|
||||
}
|
||||
s, err := h.NewStream(ctx, ad.ID, ProtocolNativeGetPeers)
|
||||
cancel()
|
||||
if err != nil {
|
||||
logger.Warn().Str("native", ad.ID.String()).Err(err).Msg("[native] fetch native peers — stream failed")
|
||||
continue
|
||||
}
|
||||
req := GetNativePeersRequest{Exclude: exclude, Count: 1}
|
||||
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
|
||||
s.Close()
|
||||
continue
|
||||
}
|
||||
var resp GetNativePeersResponse
|
||||
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
|
||||
s.Close()
|
||||
continue
|
||||
}
|
||||
s.Close()
|
||||
for _, peer := range resp.Peers {
|
||||
if _, excluded := excludeSet[peer]; !excluded && peer != "" {
|
||||
logger.Info().Str("from", ad.ID.String()).Str("new", peer).Msg("[native] fetch native peers — got replacement")
|
||||
return peer
|
||||
}
|
||||
}
|
||||
logger.Debug().Str("native", ad.ID.String()).Msg("[native] fetch native peers — no new native from this peer")
|
||||
}
|
||||
return ""
|
||||
}
|
||||
|
||||
// fetchNativeFromIndexers asks connected indexers for their configured native addresses,
|
||||
// returning the first one not in exclude.
|
||||
func fetchNativeFromIndexers(h host.Host, exclude []string) string {
|
||||
logger := oclib.GetLogger()
|
||||
excludeSet := make(map[string]struct{}, len(exclude))
|
||||
for _, e := range exclude {
|
||||
excludeSet[e] = struct{}{}
|
||||
}
|
||||
|
||||
StreamMuIndexes.RLock()
|
||||
indexers := make([]*pp.AddrInfo, 0, len(StaticIndexers))
|
||||
for _, ad := range StaticIndexers {
|
||||
indexers = append(indexers, ad)
|
||||
}
|
||||
StreamMuIndexes.RUnlock()
|
||||
|
||||
rand.Shuffle(len(indexers), func(i, j int) { indexers[i], indexers[j] = indexers[j], indexers[i] })
|
||||
|
||||
for _, ad := range indexers {
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
if err := h.Connect(ctx, *ad); err != nil {
|
||||
cancel()
|
||||
continue
|
||||
}
|
||||
s, err := h.NewStream(ctx, ad.ID, ProtocolIndexerGetNatives)
|
||||
cancel()
|
||||
if err != nil {
|
||||
logger.Warn().Str("indexer", ad.ID.String()).Err(err).Msg("[native] fetch indexer natives — stream failed")
|
||||
continue
|
||||
}
|
||||
req := GetIndexerNativesRequest{Exclude: exclude}
|
||||
if encErr := json.NewEncoder(s).Encode(req); encErr != nil {
|
||||
s.Close()
|
||||
continue
|
||||
}
|
||||
var resp GetIndexerNativesResponse
|
||||
if decErr := json.NewDecoder(s).Decode(&resp); decErr != nil {
|
||||
s.Close()
|
||||
continue
|
||||
}
|
||||
s.Close()
|
||||
for _, nativeAddr := range resp.Natives {
|
||||
if _, excluded := excludeSet[nativeAddr]; !excluded && nativeAddr != "" {
|
||||
logger.Info().Str("indexer", ad.ID.String()).Str("native", nativeAddr).Msg("[native] fetch indexer natives — got native")
|
||||
return nativeAddr
|
||||
}
|
||||
}
|
||||
}
|
||||
logger.Warn().Msg("[native] fetch indexer natives — no native found from indexers")
|
||||
return ""
|
||||
}
|
||||
|
||||
// retryLostNative periodically retries connecting to a lost native address until
// it becomes reachable again or was already restored by another path.
func retryLostNative(h host.Host, addr string, nativeProto protocol.ID) {
	logger := oclib.GetLogger()
	logger.Info().Str("addr", addr).Msg("[native] retry — periodic retry for lost native started")
	t := time.NewTicker(retryNativeInterval)
	defer t.Stop()
	for range t.C {
		StreamNativeMu.RLock()
		_, alreadyRestored := StaticNatives[addr]
		StreamNativeMu.RUnlock()
		if alreadyRestored {
			logger.Info().Str("addr", addr).Msg("[native] retry — native already restored, stopping retry")
			return
		}

		ad, err := pp.AddrInfoFromString(addr)
		if err != nil {
			logger.Warn().Str("addr", addr).Msg("[native] retry — invalid addr, stopping retry")
			return
		}
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		err = h.Connect(ctx, *ad)
		cancel()
		if err != nil {
			logger.Warn().Str("addr", addr).Msg("[native] retry — still unreachable")
			continue
		}
		// Reachable again — add back to pool.
		StreamNativeMu.Lock()
		StaticNatives[addr] = ad
		StreamNativeMu.Unlock()
		logger.Info().Str("addr", addr).Msg("[native] retry — native reconnected and added back to pool")
		NudgeNativeHeartbeat()
		replenishIndexersIfNeeded(h)
		if nativeProto == ProtocolNativeGetIndexers {
			StartNativeRegistration(h, addr) // register back
		}
		return
	}
}
94
daemons/node/common/search_tracker.go
Normal file
@@ -0,0 +1,94 @@
|
||||
package common
|
||||
|
||||
import (
|
||||
"context"
|
||||
"oc-discovery/conf"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"github.com/google/uuid"
|
||||
)
|
||||
|
||||
// SearchIdleTimeout returns the configured search idle timeout (default 5s).
|
||||
func SearchIdleTimeout() time.Duration {
|
||||
if t := conf.GetConfig().SearchTimeout; t > 0 {
|
||||
return time.Duration(t) * time.Second
|
||||
}
|
||||
return 5 * time.Second
|
||||
}
|
||||
|
||||
// searchEntry holds the lifecycle state for one active search.
|
||||
type searchEntry struct {
|
||||
cancel context.CancelFunc
|
||||
timer *time.Timer
|
||||
idleTimeout time.Duration
|
||||
}
|
||||
|
||||
// SearchTracker tracks one active search per user (peer or resource).
|
||||
// Each search is keyed by a composite "user:searchID" so that a replaced
|
||||
// search's late-arriving results can be told apart from the current one.
|
||||
//
|
||||
// Typical usage:
|
||||
//
|
||||
// ctx, cancel := context.WithCancel(parent)
|
||||
// key := tracker.Register(userKey, cancel, idleTimeout)
|
||||
// defer tracker.Cancel(key)
|
||||
// // ... on each result: tracker.ResetIdle(key) + tracker.IsActive(key)
|
||||
type SearchTracker struct {
|
||||
mu sync.Mutex
|
||||
entries map[string]*searchEntry
|
||||
}
|
||||
|
||||
func NewSearchTracker() *SearchTracker {
|
||||
return &SearchTracker{entries: map[string]*searchEntry{}}
|
||||
}
|
||||
|
||||
// Register starts a new search for baseUser, cancelling any previous one.
|
||||
// Returns the composite key "baseUser:searchID" to be used as the search identifier.
|
||||
func (t *SearchTracker) Register(baseUser string, cancel context.CancelFunc, idleTimeout time.Duration) string {
|
||||
compositeKey := baseUser + ":" + uuid.New().String()
|
||||
t.mu.Lock()
|
||||
t.cancelByPrefix(baseUser)
|
||||
e := &searchEntry{cancel: cancel, idleTimeout: idleTimeout}
|
||||
e.timer = time.AfterFunc(idleTimeout, func() { t.Cancel(compositeKey) })
|
||||
t.entries[compositeKey] = e
|
||||
t.mu.Unlock()
|
||||
return compositeKey
|
||||
}
|
||||
|
||||
// Cancel cancels the search(es) matching user (bare user key or composite key).
|
||||
func (t *SearchTracker) Cancel(user string) {
|
||||
t.mu.Lock()
|
||||
t.cancelByPrefix(user)
|
||||
t.mu.Unlock()
|
||||
}
|
||||
|
||||
// ResetIdle resets the idle timer for compositeKey after a response arrives.
|
||||
func (t *SearchTracker) ResetIdle(compositeKey string) {
|
||||
t.mu.Lock()
|
||||
if e, ok := t.entries[compositeKey]; ok {
|
||||
e.timer.Reset(e.idleTimeout)
|
||||
}
|
||||
t.mu.Unlock()
|
||||
}
|
||||
|
||||
// IsActive returns true if compositeKey is still the current active search.
|
||||
func (t *SearchTracker) IsActive(compositeKey string) bool {
|
||||
t.mu.Lock()
|
||||
_, ok := t.entries[compositeKey]
|
||||
t.mu.Unlock()
|
||||
return ok
|
||||
}
|
||||
|
||||
// cancelByPrefix cancels all entries whose key equals user or starts with "user:".
|
||||
// Must be called with t.mu held.
|
||||
func (t *SearchTracker) cancelByPrefix(user string) {
|
||||
for k, e := range t.entries {
|
||||
if k == user || strings.HasPrefix(k, user+":") {
|
||||
e.timer.Stop()
|
||||
e.cancel()
|
||||
delete(t.entries, k)
|
||||
}
|
||||
}
|
||||
}
|
||||
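// Illustrative usage sketch (not part of the diff): a search handler driving
// SearchTracker as described in the doc comment above. The runSearch,
// streamResults and handleResult names are hypothetical.
//
//	func runSearch(tracker *SearchTracker, user string, parent context.Context) {
//		ctx, cancel := context.WithCancel(parent)
//		key := tracker.Register(user, cancel, SearchIdleTimeout())
//		defer tracker.Cancel(key)
//		for result := range streamResults(ctx, user) {
//			if !tracker.IsActive(key) {
//				return // a newer search for this user replaced us
//			}
//			tracker.ResetIdle(key) // keep the idle timer alive while results flow
//			handleResult(result)   // hypothetical consumer
//		}
//	}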
@@ -3,6 +3,7 @@ package common
|
||||
import (
|
||||
"context"
|
||||
"fmt"
|
||||
"math/rand"
|
||||
"net"
|
||||
"time"
|
||||
|
||||
@@ -37,3 +38,31 @@ func ExtractIP(addr string) (net.IP, error) {
|
||||
}
|
||||
return ip, nil
|
||||
}
|
||||
|
||||
func GetIndexer(addrOrId string) *pp.AddrInfo {
|
||||
return Indexers.GetAddr(addrOrId)
|
||||
}
|
||||
|
||||
func GetIndexersIDs() []pp.ID {
|
||||
return Indexers.GetAddrIDs()
|
||||
}
|
||||
|
||||
func GetIndexersStr() []string {
|
||||
return Indexers.GetAddrsStr()
|
||||
}
|
||||
|
||||
func GetIndexers() []*pp.AddrInfo {
|
||||
entries := Indexers.GetAddrs()
|
||||
result := make([]*pp.AddrInfo, 0, len(entries))
|
||||
for _, e := range entries {
|
||||
result = append(result, e.Info)
|
||||
}
|
||||
return result
|
||||
}
|
||||
|
||||
func Shuffle[T any](slice []T) []T {
|
||||
rand.Shuffle(len(slice), func(i, j int) {
|
||||
slice[i], slice[j] = slice[j], slice[i]
|
||||
})
|
||||
return slice
|
||||
}
|
||||
|
||||
31
daemons/node/connection_gater.go
Normal file
@@ -0,0 +1,31 @@
|
||||
package node

import (
	"github.com/libp2p/go-libp2p/core/control"
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/network"
	pp "github.com/libp2p/go-libp2p/core/peer"
	ma "github.com/multiformats/go-multiaddr"
)

// OCConnectionGater allows all connections unconditionally.
// Peer validation (local DB + DHT by peer_id) is enforced at the stream level
// in each handler, so ProtocolHeartbeat and ProtocolPublish — through which a
// node first registers itself — are never blocked.
type OCConnectionGater struct {
	host host.Host
}

func newOCConnectionGater(h host.Host) *OCConnectionGater {
	return &OCConnectionGater{host: h}
}

func (g *OCConnectionGater) InterceptPeerDial(_ pp.ID) bool                 { return true }
func (g *OCConnectionGater) InterceptAddrDial(_ pp.ID, _ ma.Multiaddr) bool { return true }
func (g *OCConnectionGater) InterceptAccept(_ network.ConnMultiaddrs) bool  { return true }
func (g *OCConnectionGater) InterceptSecured(_ network.Direction, _ pp.ID, _ network.ConnMultiaddrs) bool {
	return true
}
func (g *OCConnectionGater) InterceptUpgraded(_ network.Conn) (bool, control.DisconnectReason) {
	return true, 0
}
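// Illustrative wiring (not part of the diff): the gater is attached when the
// libp2p host is built. This assumes the standard go-libp2p options; psk is the
// network's pre-shared key and is a placeholder here.
//
//	h, err := libp2p.New(
//		libp2p.PrivateNetwork(psk),                   // PSK-isolated private network
//		libp2p.ConnectionGater(&OCConnectionGater{}), // allow-all; checks live in the handlers
//	)
//	if err != nil {
//		panic(err)
//	}
//	_ = h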
|
||||
254
daemons/node/indexer/behavior.go
Normal file
@@ -0,0 +1,254 @@
|
||||
package indexer
|
||||
|
||||
import (
|
||||
"errors"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"oc-discovery/conf"
|
||||
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
)
|
||||
|
||||
// ── defaults ──────────────────────────────────────────────────────────────────
|
||||
|
||||
const (
|
||||
defaultMaxConnPerWindow = 20
|
||||
defaultConnWindowSecs = 30
|
||||
defaultMaxHBPerMinute = 5
|
||||
defaultMaxPublishPerMin = 10
|
||||
defaultMaxGetPerMin = 50
|
||||
strikeThreshold = 3
|
||||
banDuration = 10 * time.Minute
|
||||
behaviorWindowDur = 60 * time.Second
|
||||
)
|
||||
|
||||
func cfgOr(v, def int) int {
|
||||
if v > 0 {
|
||||
return v
|
||||
}
|
||||
return def
|
||||
}
|
||||
|
||||
// ── ConnectionRateGuard ───────────────────────────────────────────────────────
|
||||
|
||||
// ConnectionRateGuard limits the number of NEW incoming connections accepted
|
||||
// within a sliding time window. It protects public indexers against coordinated
|
||||
// registration floods (Sybil bursts).
|
||||
type ConnectionRateGuard struct {
|
||||
mu sync.Mutex
|
||||
window []time.Time
|
||||
maxInWindow int
|
||||
windowDur time.Duration
|
||||
}
|
||||
|
||||
func newConnectionRateGuard() *ConnectionRateGuard {
|
||||
cfg := conf.GetConfig()
|
||||
return &ConnectionRateGuard{
|
||||
maxInWindow: cfgOr(cfg.MaxConnPerWindow, defaultMaxConnPerWindow),
|
||||
windowDur: time.Duration(cfgOr(cfg.ConnWindowSecs, defaultConnWindowSecs)) * time.Second,
|
||||
}
|
||||
}
|
||||
|
||||
// Allow returns true if a new connection may be accepted.
|
||||
// The internal window is pruned on each call so memory stays bounded.
|
||||
func (g *ConnectionRateGuard) Allow() bool {
|
||||
g.mu.Lock()
|
||||
defer g.mu.Unlock()
|
||||
now := time.Now()
|
||||
cutoff := now.Add(-g.windowDur)
|
||||
i := 0
|
||||
for i < len(g.window) && g.window[i].Before(cutoff) {
|
||||
i++
|
||||
}
|
||||
g.window = g.window[i:]
|
||||
if len(g.window) >= g.maxInWindow {
|
||||
return false
|
||||
}
|
||||
g.window = append(g.window, now)
|
||||
return true
|
||||
}
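// Illustrative sketch (not part of the diff): shedding connection bursts with the
// guard before any per-peer work is done. The handler closure shown is hypothetical;
// the real wiring lives in the indexer's stream setup.
//
//	guard := newConnectionRateGuard()
//	h.SetStreamHandler(common.ProtocolHeartbeat, func(s network.Stream) {
//		if !guard.Allow() {
//			s.Reset() // over the per-window budget: drop without processing
//			return
//		}
//		// ... normal heartbeat handling ...
//	})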
|
||||
|
||||
// ── per-node state ────────────────────────────────────────────────────────────
|
||||
|
||||
type nodeBehavior struct {
|
||||
mu sync.Mutex
|
||||
knownDID string
|
||||
hbTimes []time.Time
|
||||
pubTimes []time.Time
|
||||
getTimes []time.Time
|
||||
strikes int
|
||||
bannedUntil time.Time
|
||||
}
|
||||
|
||||
func (nb *nodeBehavior) isBanned() bool {
|
||||
return time.Now().UTC().Before(nb.bannedUntil)
|
||||
}
|
||||
|
||||
func (nb *nodeBehavior) strike(n int) {
|
||||
nb.strikes += n
|
||||
if nb.strikes >= strikeThreshold {
|
||||
nb.bannedUntil = time.Now().Add(banDuration)
|
||||
}
|
||||
}
|
||||
|
||||
func pruneWindow(ts []time.Time, dur time.Duration) []time.Time {
|
||||
cutoff := time.Now().Add(-dur)
|
||||
i := 0
|
||||
for i < len(ts) && ts[i].Before(cutoff) {
|
||||
i++
|
||||
}
|
||||
return ts[i:]
|
||||
}
|
||||
|
||||
// recordInWindow appends now to the window slice and returns false (+ adds a
|
||||
// strike) when the count exceeds max.
|
||||
func (nb *nodeBehavior) recordInWindow(ts *[]time.Time, max int) bool {
|
||||
*ts = pruneWindow(*ts, behaviorWindowDur)
|
||||
if len(*ts) >= max {
|
||||
nb.strike(1)
|
||||
return false
|
||||
}
|
||||
*ts = append(*ts, time.Now())
|
||||
return true
|
||||
}
|
||||
|
||||
// ── NodeBehaviorTracker ───────────────────────────────────────────────────────
|
||||
|
||||
// NodeBehaviorTracker is the indexer-side per-node compliance monitor.
|
||||
// It is entirely local: no state is shared with other indexers.
|
||||
type NodeBehaviorTracker struct {
|
||||
mu sync.RWMutex
|
||||
nodes map[pp.ID]*nodeBehavior
|
||||
|
||||
maxHB int
|
||||
maxPub int
|
||||
maxGet int
|
||||
}
|
||||
|
||||
func newNodeBehaviorTracker() *NodeBehaviorTracker {
|
||||
cfg := conf.GetConfig()
|
||||
return &NodeBehaviorTracker{
|
||||
nodes: make(map[pp.ID]*nodeBehavior),
|
||||
maxHB: cfgOr(cfg.MaxHBPerMinute, defaultMaxHBPerMinute),
|
||||
maxPub: cfgOr(cfg.MaxPublishPerMinute, defaultMaxPublishPerMin),
|
||||
maxGet: cfgOr(cfg.MaxGetPerMinute, defaultMaxGetPerMin),
|
||||
}
|
||||
}
|
||||
|
||||
func (t *NodeBehaviorTracker) get(pid pp.ID) *nodeBehavior {
|
||||
t.mu.RLock()
|
||||
nb := t.nodes[pid]
|
||||
t.mu.RUnlock()
|
||||
if nb != nil {
|
||||
return nb
|
||||
}
|
||||
t.mu.Lock()
|
||||
defer t.mu.Unlock()
|
||||
if nb = t.nodes[pid]; nb == nil {
|
||||
nb = &nodeBehavior{}
|
||||
t.nodes[pid] = nb
|
||||
}
|
||||
return nb
|
||||
}
|
||||
|
||||
// IsBanned returns true when the peer is in an active ban period.
|
||||
func (t *NodeBehaviorTracker) IsBanned(pid pp.ID) bool {
|
||||
nb := t.get(pid)
|
||||
nb.mu.Lock()
|
||||
defer nb.mu.Unlock()
|
||||
return nb.isBanned()
|
||||
}
|
||||
|
||||
// RecordHeartbeat checks heartbeat cadence. Returns an error if the peer is
|
||||
// flooding (too many heartbeats in the sliding window).
|
||||
func (t *NodeBehaviorTracker) RecordHeartbeat(pid pp.ID) error {
|
||||
nb := t.get(pid)
|
||||
nb.mu.Lock()
|
||||
defer nb.mu.Unlock()
|
||||
if nb.isBanned() {
|
||||
return errors.New("peer is banned")
|
||||
}
|
||||
if !nb.recordInWindow(&nb.hbTimes, t.maxHB) {
|
||||
return errors.New("heartbeat flood detected")
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// CheckIdentity verifies that the DID associated with a PeerID never changes.
|
||||
// A DID change is a strong signal of identity spoofing.
|
||||
func (t *NodeBehaviorTracker) CheckIdentity(pid pp.ID, did string) error {
|
||||
if did == "" {
|
||||
return nil
|
||||
}
|
||||
nb := t.get(pid)
|
||||
nb.mu.Lock()
|
||||
defer nb.mu.Unlock()
|
||||
if nb.knownDID == "" {
|
||||
nb.knownDID = did
|
||||
return nil
|
||||
}
|
||||
if nb.knownDID != did {
|
||||
nb.strike(2) // identity change is severe
|
||||
return errors.New("DID mismatch for peer " + pid.String())
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// RecordBadSignature registers a cryptographic verification failure.
|
||||
// A single bad signature is worth 2 strikes (near-immediate ban).
|
||||
func (t *NodeBehaviorTracker) RecordBadSignature(pid pp.ID) {
|
||||
nb := t.get(pid)
|
||||
nb.mu.Lock()
|
||||
defer nb.mu.Unlock()
|
||||
nb.strike(2)
|
||||
}
|
||||
|
||||
// RecordPublish checks publish volume. Returns an error if the peer is
|
||||
// sending too many publish requests.
|
||||
func (t *NodeBehaviorTracker) RecordPublish(pid pp.ID) error {
|
||||
nb := t.get(pid)
|
||||
nb.mu.Lock()
|
||||
defer nb.mu.Unlock()
|
||||
if nb.isBanned() {
|
||||
return errors.New("peer is banned")
|
||||
}
|
||||
if !nb.recordInWindow(&nb.pubTimes, t.maxPub) {
|
||||
return errors.New("publish volume exceeded")
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// RecordGet checks get volume. Returns an error if the peer is enumerating
|
||||
// the DHT at an abnormal rate.
|
||||
func (t *NodeBehaviorTracker) RecordGet(pid pp.ID) error {
|
||||
nb := t.get(pid)
|
||||
nb.mu.Lock()
|
||||
defer nb.mu.Unlock()
|
||||
if nb.isBanned() {
|
||||
return errors.New("peer is banned")
|
||||
}
|
||||
if !nb.recordInWindow(&nb.getTimes, t.maxGet) {
|
||||
return errors.New("get volume exceeded")
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// Cleanup removes the behavior entry for a peer if it is not currently banned.
|
||||
// Called when the peer is evicted from StreamRecords by the GC.
|
||||
func (t *NodeBehaviorTracker) Cleanup(pid pp.ID) {
|
||||
t.mu.RLock()
|
||||
nb := t.nodes[pid]
|
||||
t.mu.RUnlock()
|
||||
if nb == nil {
|
||||
return
|
||||
}
|
||||
nb.mu.Lock()
|
||||
banned := nb.isBanned()
|
||||
nb.mu.Unlock()
|
||||
if !banned {
|
||||
t.mu.Lock()
|
||||
delete(t.nodes, pid)
|
||||
t.mu.Unlock()
|
||||
}
|
||||
}
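// Illustrative sketch (not part of the diff): how an indexer handler is expected
// to consult the tracker before doing any work. The handler body and the
// didFromPayload value are hypothetical.
//
//	func (ix *IndexerService) exampleHandler(s network.Stream) {
//		pid := s.Conn().RemotePeer()
//		if ix.behavior.IsBanned(pid) {
//			s.Reset()
//			return
//		}
//		if err := ix.behavior.RecordHeartbeat(pid); err != nil {
//			s.Reset() // flooding: the tracker already recorded a strike
//			return
//		}
//		if err := ix.behavior.CheckIdentity(pid, didFromPayload); err != nil {
//			s.Reset() // DID changed for this PeerID: 2 strikes, near-immediate ban
//			return
//		}
//		// ... handle the request ...
//	}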
|
||||
@@ -5,18 +5,20 @@ import (
|
||||
"encoding/base64"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"oc-discovery/conf"
|
||||
"io"
|
||||
"math/rand"
|
||||
"oc-discovery/daemons/node/common"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
"cloud.o-forge.io/core/oc-lib/dbs"
|
||||
pp "cloud.o-forge.io/core/oc-lib/models/peer"
|
||||
"cloud.o-forge.io/core/oc-lib/models/utils"
|
||||
"cloud.o-forge.io/core/oc-lib/tools"
|
||||
"github.com/libp2p/go-libp2p/core/crypto"
|
||||
"github.com/libp2p/go-libp2p/core/network"
|
||||
"github.com/libp2p/go-libp2p/core/peer"
|
||||
lpp "github.com/libp2p/go-libp2p/core/peer"
|
||||
)
|
||||
|
||||
type PeerRecordPayload struct {
|
||||
@@ -83,30 +85,15 @@ func (pr *PeerRecord) ExtractPeer(ourkey string, key string, pubKey crypto.PubKe
|
||||
NATSAddress: pr.NATSAddress,
|
||||
WalletAddress: pr.WalletAddress,
|
||||
}
|
||||
b, err := json.Marshal(p)
|
||||
if err != nil {
|
||||
return pp.SELF == p.Relation, nil, err
|
||||
}
|
||||
|
||||
if time.Now().UTC().After(pr.ExpiryDate) {
|
||||
return pp.SELF == p.Relation, nil, errors.New("peer " + key + " is offline")
|
||||
}
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.CREATE_RESOURCE, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.PEER,
|
||||
Method: int(tools.CREATE_RESOURCE),
|
||||
SearchAttr: "peer_id",
|
||||
Payload: b,
|
||||
})
|
||||
|
||||
return pp.SELF == p.Relation, p, nil
|
||||
}
|
||||
|
||||
type GetValue struct {
	Key    string `json:"key"`
	PeerID string `json:"peer_id,omitempty"`
}
|
||||
|
||||
type GetResponse struct {
|
||||
@@ -118,233 +105,297 @@ func (ix *IndexerService) genKey(did string) string {
|
||||
return "/node/" + did
|
||||
}
|
||||
|
||||
func (ix *IndexerService) genNameKey(name string) string {
|
||||
return "/name/" + name
|
||||
}
|
||||
|
||||
func (ix *IndexerService) genPIDKey(peerID string) string {
|
||||
return "/pid/" + peerID
|
||||
}
|
||||
|
||||
// isPeerKnown is the stream-level gate: returns true if pid is allowed.
|
||||
// Check order (fast → slow):
|
||||
// 1. In-memory stream records — currently heartbeating to this indexer.
|
||||
// 2. Local DB by peer_id — known peer, blacklist enforced here.
|
||||
// 3. DHT /pid/{peerID} → /node/{DID} — registered on any indexer.
|
||||
//
|
||||
// ProtocolHeartbeat and ProtocolPublish handlers do NOT call this — they are
|
||||
// the streams through which a node first makes itself known.
|
||||
func (ix *IndexerService) isPeerKnown(pid lpp.ID) bool {
|
||||
// 1. Fast path: active heartbeat session.
|
||||
ix.StreamMU.RLock()
|
||||
_, active := ix.StreamRecords[common.ProtocolHeartbeat][pid]
|
||||
ix.StreamMU.RUnlock()
|
||||
if active {
|
||||
return true
|
||||
}
|
||||
// 2. Local DB: known peer (handles blacklist).
|
||||
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
|
||||
results := access.Search(&dbs.Filters{
|
||||
And: map[string][]dbs.Filter{
|
||||
"peer_id": {{Operator: dbs.EQUAL.String(), Value: pid.String()}},
|
||||
},
|
||||
}, pid.String(), false)
|
||||
for _, item := range results.Data {
|
||||
p, ok := item.(*pp.Peer)
|
||||
if !ok || p.PeerID != pid.String() {
|
||||
continue
|
||||
}
|
||||
return p.Relation != pp.BLACKLIST
|
||||
}
|
||||
// 3. DHT lookup by peer_id.
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
|
||||
did, err := ix.DHT.GetValue(ctx, ix.genPIDKey(pid.String()))
|
||||
cancel()
|
||||
if err != nil || len(did) == 0 {
|
||||
return false
|
||||
}
|
||||
ctx2, cancel2 := context.WithTimeout(context.Background(), 3*time.Second)
|
||||
_, err = ix.DHT.GetValue(ctx2, ix.genKey(string(did)))
|
||||
cancel2()
|
||||
return err == nil
|
||||
}
|
||||
|
||||
func (ix *IndexerService) initNodeHandler() {
	logger := oclib.GetLogger()
	logger.Info().Msg("Init Node Handler")

	// Each heartbeat from a node carries a freshly signed PeerRecord.
	// Republish it to the DHT so the record never expires as long as the node
	// is alive — no separate publish stream needed from the node side.
	ix.AfterHeartbeat = func(hb *common.Heartbeat) {
		// Priority 1: use the fresh signed PeerRecord embedded in the heartbeat.
		// Each heartbeat tick, the node re-signs with ExpiryDate = now+2min, so
		// this record is always fresh. Fetching from DHT would give a stale expiry.
		var rec PeerRecord
		if len(hb.Record) > 0 {
			if err := json.Unmarshal(hb.Record, &rec); err != nil {
				logger.Warn().Err(err).Msg("indexer: heartbeat embedded record unmarshal failed")
				return
			}
		} else {
			// Fallback: node didn't embed a record yet (first heartbeat before claimInfo).
			// Fetch from DHT using the DID resolved by HandleHeartbeat.
			ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
			res, err := ix.DHT.GetValue(ctx2, ix.genKey(hb.DID))
			cancel2()
			if err != nil {
				logger.Warn().Err(err).Str("did", hb.DID).Msg("indexer: DHT fetch for refresh failed")
				return
			}
			if err := json.Unmarshal(res, &rec); err != nil {
				logger.Warn().Err(err).Str("did", hb.DID).Msg("indexer: heartbeat record unmarshal failed")
				return
			}
		}
		if _, err := rec.Verify(); err != nil {
			logger.Warn().Err(err).Str("did", rec.DID).Msg("indexer: heartbeat record signature invalid")
			return
		}
		// Keep StreamRecord.Record in sync so BuildHeartbeatResponse always
		// sees a populated PeerRecord (Name, DID, etc.) regardless of whether
		// handleNodePublish ran before or after the heartbeat stream was opened.
		if pid, err := lpp.Decode(rec.PeerID); err == nil {
			ix.StreamMU.Lock()
			if srec, ok := ix.StreamRecords[common.ProtocolHeartbeat][pid]; ok {
				srec.Record = rec
			}
			ix.StreamMU.Unlock()
		}
		data, err := json.Marshal(rec)
		if err != nil {
			return
		}
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		logger.Info().Msg("REFRESH PutValue " + ix.genKey(rec.DID))
		if err := ix.DHT.PutValue(ctx, ix.genKey(rec.DID), data); err != nil {
			logger.Warn().Err(err).Str("did", rec.DID).Msg("indexer: DHT refresh /node/ failed")
		}
		cancel()
		// /pid/ is written unconditionally — the gater queries by PeerID and this
		// index must stay fresh regardless of whether the /node/ write succeeded.
		if rec.PeerID != "" {
			ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
			if err := ix.DHT.PutValue(ctx2, ix.genPIDKey(rec.PeerID), []byte(rec.DID)); err != nil {
				logger.Warn().Err(err).Str("pid", rec.PeerID).Msg("indexer: DHT refresh /pid/ failed")
			}
			cancel2()
		}
	}
	ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat)
	ix.Host.SetStreamHandler(common.ProtocolPublish, ix.handleNodePublish)
	ix.Host.SetStreamHandler(common.ProtocolGet, ix.handleNodeGet)
	ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
	ix.Host.SetStreamHandler(common.ProtocolIndexerCandidates, ix.handleCandidateRequest)
	ix.initSearchHandlers()
}
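// Key layout maintained by the indexer for one node (summary of the handlers in
// this file; illustrative, not an exhaustive schema):
//
//	/node/<DID>   → signed PeerRecord (JSON), refreshed on every heartbeat
//	/name/<name>  → DID, written by handleNodePublish for name lookups
//	/pid/<peerID> → DID, written unconditionally so the stream-level gate can
//	                resolve a caller's PeerID back to its DID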
|
||||
|
||||
// handleCandidateRequest responds to a node's consensus candidate request.
// Returns a random sample of indexers from the local DHT cache.
func (ix *IndexerService) handleCandidateRequest(s network.Stream) {
	defer s.Close()
	if !ix.isPeerKnown(s.Conn().RemotePeer()) {
		logger := oclib.GetLogger()
		logger.Warn().Str("peer", s.Conn().RemotePeer().String()).Msg("[candidates] unknown peer, rejecting stream")
		s.Reset()
		return
	}
	s.SetDeadline(time.Now().Add(5 * time.Second))
	var req common.IndexerCandidatesRequest
	if err := json.NewDecoder(s).Decode(&req); err != nil {
		return
	}
	if req.Count <= 0 || req.Count > 10 {
		req.Count = 3
	}
	ix.dhtCacheMu.RLock()
	cache := make([]dhtCacheEntry, len(ix.dhtCache))
	copy(cache, ix.dhtCache)
	ix.dhtCacheMu.RUnlock()

	// Shuffle for randomness: each voter offers a different subset.
	rand.Shuffle(len(cache), func(i, j int) { cache[i], cache[j] = cache[j], cache[i] })
	candidates := make([]lpp.AddrInfo, 0, req.Count)
	for _, e := range cache {
		if len(candidates) >= req.Count {
			break
		}
		candidates = append(candidates, e.AI)
	}
	json.NewEncoder(s).Encode(common.IndexerCandidatesResponse{Candidates: candidates})
}
|
||||
|
||||
func (ix *IndexerService) handleNodePublish(s network.Stream) {
	defer s.Close()
	logger := oclib.GetLogger()
	remotePeer := s.Conn().RemotePeer()
	if err := ix.behavior.RecordPublish(remotePeer); err != nil {
		logger.Warn().Err(err).Str("peer", remotePeer.String()).Msg("publish refused")
		s.Reset()
		return
	}
	for {
		var rec PeerRecord
		if err := json.NewDecoder(s).Decode(&rec); err != nil {
			logger.Err(err)
			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
				strings.Contains(err.Error(), "reset") ||
				strings.Contains(err.Error(), "closed") ||
				strings.Contains(err.Error(), "too many connections") {
				return
			}
			continue
		}
		if _, err := rec.Verify(); err != nil {
			ix.behavior.RecordBadSignature(remotePeer)
			logger.Warn().Err(err).Str("peer", remotePeer.String()).Msg("bad signature on publish")
			return
		}
		if err := ix.behavior.CheckIdentity(remotePeer, rec.DID); err != nil {
			logger.Warn().Err(err).Msg("identity mismatch on publish")
			s.Reset()
			return
		}
		if rec.PeerID == "" || rec.ExpiryDate.Before(time.Now().UTC()) {
			logger.Err(errors.New(rec.PeerID + " is expired."))
			return
		}
		pid, err := lpp.Decode(rec.PeerID)
		if err != nil {
			return
		}

		ix.StreamMU.Lock()
		if ix.StreamRecords[common.ProtocolHeartbeat] == nil {
			ix.StreamRecords[common.ProtocolHeartbeat] = map[lpp.ID]*common.StreamRecord[PeerRecord]{}
		}
		if srec, ok := ix.StreamRecords[common.ProtocolHeartbeat][pid]; ok {
			srec.DID = rec.DID
			srec.Record = rec
			srec.HeartbeatStream.UptimeTracker.LastSeen = time.Now().UTC()
		}
		ix.StreamMU.Unlock()

		key := ix.genKey(rec.DID)
		data, err := json.Marshal(rec)
		if err != nil {
			logger.Err(err)
			return
		}
		ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
		if err := ix.DHT.PutValue(ctx, key, data); err != nil {
			logger.Err(err)
			cancel()
			return
		}
		cancel()

		// Secondary index: /name/<name> → DID, so peers can resolve by human-readable name.
		if rec.Name != "" {
			ctx2, cancel2 := context.WithTimeout(context.Background(), 10*time.Second)
			if err := ix.DHT.PutValue(ctx2, ix.genNameKey(rec.Name), []byte(rec.DID)); err != nil {
				logger.Err(err).Str("name", rec.Name).Msg("indexer: failed to write name index")
			}
			cancel2()
		}
		// Secondary index: /pid/<peerID> → DID, so peers can resolve by libp2p PeerID.
		if rec.PeerID != "" {
			ctx3, cancel3 := context.WithTimeout(context.Background(), 10*time.Second)
			if err := ix.DHT.PutValue(ctx3, ix.genPIDKey(rec.PeerID), []byte(rec.DID)); err != nil {
				logger.Err(err).Str("pid", rec.PeerID).Msg("indexer: failed to write pid index")
			}
			cancel3()
		}
	}
}
|
||||
|
||||
func (ix *IndexerService) handleNodeGet(s network.Stream) {
	defer s.Close()
	logger := oclib.GetLogger()

	remotePeer := s.Conn().RemotePeer()
	if !ix.isPeerKnown(remotePeer) {
		logger.Warn().Str("peer", remotePeer.String()).Msg("[get] unknown peer, rejecting stream")
		s.Reset()
		return
	}
	if err := ix.behavior.RecordGet(remotePeer); err != nil {
		logger.Warn().Err(err).Str("peer", remotePeer.String()).Msg("get refused")
		s.Reset()
		return
	}
	for {
		var req GetValue
		if err := json.NewDecoder(s).Decode(&req); err != nil {
			if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
				strings.Contains(err.Error(), "reset") ||
				strings.Contains(err.Error(), "closed") ||
				strings.Contains(err.Error(), "too many connections") {
				return
			}
			logger.Err(err)
			continue
		}

		resp := GetResponse{Found: false, Records: map[string]PeerRecord{}}

		// Resolve the DID key: by PeerID (secondary /pid/ index) or direct DID key.
		var key string
		if req.PeerID != "" {
			pidCtx, pidCancel := context.WithTimeout(context.Background(), 5*time.Second)
			did, err := ix.DHT.GetValue(pidCtx, ix.genPIDKey(req.PeerID))
			pidCancel()
			if err == nil {
				key = string(did)
			}
		} else {
			key = req.Key
		}

		// DHT record fetch by DID key.
		if key != "" {
			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
			c, err := ix.DHT.GetValue(ctx, ix.genKey(key))
			cancel()
			if err == nil {
				var rec PeerRecord
				if json.Unmarshal(c, &rec) == nil {
					resp.Records[rec.PeerID] = rec
				}
			} else {
				logger.Err(err).Msg("Failed to fetch PeerRecord from DHT " + key)
			}
		}

		resp.Found = len(resp.Records) > 0
		_ = json.NewEncoder(s).Encode(resp)
		break
	}
}

// handleGetNatives returns this indexer's configured native addresses,
// excluding any in the request's Exclude list.
func (ix *IndexerService) handleGetNatives(s network.Stream) {
	defer s.Close()
	logger := oclib.GetLogger()

	var req common.GetIndexerNativesRequest
	if err := json.NewDecoder(s).Decode(&req); err != nil {
		logger.Err(err).Msg("indexer get natives: decode")
		return
	}

	excludeSet := make(map[string]struct{}, len(req.Exclude))
	for _, e := range req.Exclude {
		excludeSet[e] = struct{}{}
	}

	resp := common.GetIndexerNativesResponse{}
	for _, addr := range strings.Split(conf.GetConfig().NativeIndexerAddresses, ",") {
		addr = strings.TrimSpace(addr)
		if addr == "" {
			continue
		}
		if _, excluded := excludeSet[addr]; !excluded {
			resp.Natives = append(resp.Natives, addr)
		}
	}

	if err := json.NewEncoder(s).Encode(resp); err != nil {
		logger.Err(err).Msg("indexer get natives: encode response")
	}
}
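// Illustrative node-side lookup (not part of the diff): how a node could query an
// indexer over ProtocolGet with the new GetValue shape. Stream setup and error
// handling are abbreviated; indexerID and targetPeerID are placeholders.
//
//	s, _ := h.NewStream(ctx, indexerID, common.ProtocolGet)
//	_ = json.NewEncoder(s).Encode(GetValue{PeerID: targetPeerID}) // or GetValue{Key: did}
//	var resp GetResponse
//	_ = json.NewDecoder(s).Decode(&resp)
//	for pid, rec := range resp.Records {
//		_ = pid
//		_ = rec // rec is the signed PeerRecord stored under /node/<DID>
//	}
//	s.Close()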
|
||||
|
||||
|
||||
@@ -1,168 +0,0 @@
|
||||
package indexer
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"oc-discovery/daemons/node/common"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
pubsub "github.com/libp2p/go-libp2p-pubsub"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
)
|
||||
|
||||
// TopicNameIndex is the GossipSub topic shared by regular indexers to exchange
|
||||
// add/delete events for the distributed name→peerID mapping.
|
||||
const TopicNameIndex = "oc-name-index"
|
||||
|
||||
// nameIndexDedupWindow suppresses re-emission of the same (action, name, peerID)
|
||||
// tuple within this window, reducing duplicate events when a node is registered
|
||||
// with multiple indexers simultaneously.
|
||||
const nameIndexDedupWindow = 30 * time.Second
|
||||
|
||||
// NameIndexAction indicates whether a name mapping is being added or removed.
|
||||
type NameIndexAction string
|
||||
|
||||
const (
|
||||
NameIndexAdd NameIndexAction = "add"
|
||||
NameIndexDelete NameIndexAction = "delete"
|
||||
)
|
||||
|
||||
// NameIndexEvent is published on TopicNameIndex by each indexer when a node
|
||||
// registers (add) or is evicted by the GC (delete).
|
||||
type NameIndexEvent struct {
|
||||
Action NameIndexAction `json:"action"`
|
||||
Name string `json:"name"`
|
||||
PeerID string `json:"peer_id"`
|
||||
DID string `json:"did"`
|
||||
}
|
||||
|
||||
// nameIndexState holds the local in-memory name index and the sender-side
|
||||
// deduplication tracker.
|
||||
type nameIndexState struct {
|
||||
// index: name → peerID → DID, built from events received from all indexers.
|
||||
index map[string]map[string]string
|
||||
indexMu sync.RWMutex
|
||||
|
||||
// emitted tracks the last emission time for each (action, name, peerID) key
|
||||
// to suppress duplicates within nameIndexDedupWindow.
|
||||
emitted map[string]time.Time
|
||||
emittedMu sync.Mutex
|
||||
}
|
||||
|
||||
// shouldEmit returns true if the (action, name, peerID) tuple has not been
|
||||
// emitted within nameIndexDedupWindow, updating the tracker if so.
|
||||
func (s *nameIndexState) shouldEmit(action NameIndexAction, name, peerID string) bool {
|
||||
key := string(action) + ":" + name + ":" + peerID
|
||||
s.emittedMu.Lock()
|
||||
defer s.emittedMu.Unlock()
|
||||
if t, ok := s.emitted[key]; ok && time.Since(t) < nameIndexDedupWindow {
|
||||
return false
|
||||
}
|
||||
s.emitted[key] = time.Now()
|
||||
return true
|
||||
}
|
||||
|
||||
// onEvent applies a received NameIndexEvent to the local index.
|
||||
// "add" inserts/updates the mapping; "delete" removes it.
|
||||
// Operations are idempotent — duplicate events from multiple indexers are harmless.
|
||||
func (s *nameIndexState) onEvent(evt NameIndexEvent) {
|
||||
if evt.Name == "" || evt.PeerID == "" {
|
||||
return
|
||||
}
|
||||
s.indexMu.Lock()
|
||||
defer s.indexMu.Unlock()
|
||||
switch evt.Action {
|
||||
case NameIndexAdd:
|
||||
if s.index[evt.Name] == nil {
|
||||
s.index[evt.Name] = map[string]string{}
|
||||
}
|
||||
s.index[evt.Name][evt.PeerID] = evt.DID
|
||||
case NameIndexDelete:
|
||||
if s.index[evt.Name] != nil {
|
||||
delete(s.index[evt.Name], evt.PeerID)
|
||||
if len(s.index[evt.Name]) == 0 {
|
||||
delete(s.index, evt.Name)
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// initNameIndex joins TopicNameIndex and starts consuming events.
|
||||
// Must be called after ix.PS is ready.
|
||||
func (ix *IndexerService) initNameIndex(ps *pubsub.PubSub) {
|
||||
logger := oclib.GetLogger()
|
||||
ix.nameIndex = &nameIndexState{
|
||||
index: map[string]map[string]string{},
|
||||
emitted: map[string]time.Time{},
|
||||
}
|
||||
|
||||
ps.RegisterTopicValidator(TopicNameIndex, func(_ context.Context, _ pp.ID, _ *pubsub.Message) bool {
|
||||
return true
|
||||
})
|
||||
topic, err := ps.Join(TopicNameIndex)
|
||||
if err != nil {
|
||||
logger.Err(err).Msg("name index: failed to join topic")
|
||||
return
|
||||
}
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Lock()
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex] = topic
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Unlock()
|
||||
|
||||
common.SubscribeEvents(
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService,
|
||||
context.Background(),
|
||||
TopicNameIndex,
|
||||
-1,
|
||||
func(_ context.Context, evt NameIndexEvent, _ string) {
|
||||
ix.nameIndex.onEvent(evt)
|
||||
},
|
||||
)
|
||||
}
|
||||
|
||||
// publishNameEvent emits a NameIndexEvent on TopicNameIndex, subject to the
|
||||
// sender-side deduplication window.
|
||||
func (ix *IndexerService) publishNameEvent(action NameIndexAction, name, peerID, did string) {
|
||||
if ix.nameIndex == nil || name == "" || peerID == "" {
|
||||
return
|
||||
}
|
||||
if !ix.nameIndex.shouldEmit(action, name, peerID) {
|
||||
return
|
||||
}
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RLock()
|
||||
topic := ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicNameIndex]
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RUnlock()
|
||||
if topic == nil {
|
||||
return
|
||||
}
|
||||
evt := NameIndexEvent{Action: action, Name: name, PeerID: peerID, DID: did}
|
||||
b, err := json.Marshal(evt)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
_ = topic.Publish(context.Background(), b)
|
||||
}
|
||||
|
||||
// LookupNameIndex searches the distributed name index for peers whose name
|
||||
// contains needle (case-insensitive). Returns peerID → DID for matched peers.
|
||||
// Returns nil if the name index is not initialised (e.g. native indexers).
|
||||
func (ix *IndexerService) LookupNameIndex(needle string) map[string]string {
|
||||
if ix.nameIndex == nil {
|
||||
return nil
|
||||
}
|
||||
result := map[string]string{}
|
||||
needleLow := strings.ToLower(needle)
|
||||
ix.nameIndex.indexMu.RLock()
|
||||
defer ix.nameIndex.indexMu.RUnlock()
|
||||
for name, peers := range ix.nameIndex.index {
|
||||
if strings.Contains(strings.ToLower(name), needleLow) {
|
||||
for peerID, did := range peers {
|
||||
result[peerID] = did
|
||||
}
|
||||
}
|
||||
}
|
||||
return result
|
||||
}
|
||||
@@ -1,579 +0,0 @@
|
||||
package indexer
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"math/rand"
|
||||
"slices"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
"oc-discovery/daemons/node/common"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
pubsub "github.com/libp2p/go-libp2p-pubsub"
|
||||
"github.com/libp2p/go-libp2p/core/network"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
)
|
||||
|
||||
const (
|
||||
// IndexerTTL is the lifetime of a live-indexer cache entry. Set to 50% above
|
||||
// the recommended 60s heartbeat interval so a single delayed renewal does not
|
||||
// evict a healthy indexer from the native's cache.
|
||||
IndexerTTL = 90 * time.Second
|
||||
// offloadInterval is how often the native checks if it can release responsible peers.
|
||||
offloadInterval = 30 * time.Second
|
||||
// dhtRefreshInterval is how often the background goroutine queries the DHT for
|
||||
// known-but-expired indexer entries (written by neighbouring natives).
|
||||
dhtRefreshInterval = 30 * time.Second
|
||||
// maxFallbackPeers caps how many peers the native will accept in self-delegation
|
||||
// mode. Beyond this limit the native refuses to act as a fallback indexer so it
|
||||
// is not overwhelmed during prolonged indexer outages.
|
||||
maxFallbackPeers = 50
|
||||
)
|
||||
|
||||
// liveIndexerEntry tracks a registered indexer in the native's in-memory cache and DHT.
|
||||
type liveIndexerEntry struct {
|
||||
PeerID string `json:"peer_id"`
|
||||
Addr string `json:"addr"`
|
||||
ExpiresAt time.Time `json:"expires_at"`
|
||||
}
|
||||
|
||||
// NativeState holds runtime state specific to native indexer operation.
|
||||
type NativeState struct {
|
||||
liveIndexers map[string]*liveIndexerEntry // keyed by PeerID, local cache with TTL
|
||||
liveIndexersMu sync.RWMutex
|
||||
responsiblePeers map[pp.ID]struct{} // peers for which the native is fallback indexer
|
||||
responsibleMu sync.RWMutex
|
||||
// knownPeerIDs accumulates all indexer PeerIDs ever seen (local stream or gossip).
|
||||
// Used by refreshIndexersFromDHT to re-hydrate expired entries from the shared DHT,
|
||||
// including entries written by other natives.
|
||||
knownPeerIDs map[string]string
|
||||
knownMu sync.RWMutex
|
||||
}
|
||||
|
||||
func newNativeState() *NativeState {
|
||||
return &NativeState{
|
||||
liveIndexers: map[string]*liveIndexerEntry{},
|
||||
responsiblePeers: map[pp.ID]struct{}{},
|
||||
knownPeerIDs: map[string]string{},
|
||||
}
|
||||
}
|
||||
|
||||
// IndexerRecordValidator validates indexer DHT entries under the "indexer" namespace.
|
||||
type IndexerRecordValidator struct{}
|
||||
|
||||
func (v IndexerRecordValidator) Validate(_ string, value []byte) error {
|
||||
var e liveIndexerEntry
|
||||
if err := json.Unmarshal(value, &e); err != nil {
|
||||
return err
|
||||
}
|
||||
if e.Addr == "" {
|
||||
return errors.New("missing addr")
|
||||
}
|
||||
if e.ExpiresAt.Before(time.Now().UTC()) {
|
||||
return errors.New("expired indexer record")
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (v IndexerRecordValidator) Select(_ string, values [][]byte) (int, error) {
|
||||
var newest time.Time
|
||||
index := 0
|
||||
for i, val := range values {
|
||||
var e liveIndexerEntry
|
||||
if err := json.Unmarshal(val, &e); err != nil {
|
||||
continue
|
||||
}
|
||||
if e.ExpiresAt.After(newest) {
|
||||
newest = e.ExpiresAt
|
||||
index = i
|
||||
}
|
||||
}
|
||||
return index, nil
|
||||
}
|
||||
|
||||
// InitNative registers native-specific stream handlers and starts background loops.
|
||||
// Must be called after DHT is initialized.
|
||||
func (ix *IndexerService) InitNative() {
|
||||
ix.Native = newNativeState()
|
||||
ix.Host.SetStreamHandler(common.ProtocolHeartbeat, ix.HandleHeartbeat) // specific heartbeat for Indexer.
|
||||
ix.Host.SetStreamHandler(common.ProtocolNativeSubscription, ix.handleNativeSubscription)
|
||||
ix.Host.SetStreamHandler(common.ProtocolNativeGetIndexers, ix.handleNativeGetIndexers)
|
||||
ix.Host.SetStreamHandler(common.ProtocolNativeConsensus, ix.handleNativeConsensus)
|
||||
ix.Host.SetStreamHandler(common.ProtocolNativeGetPeers, ix.handleNativeGetPeers)
|
||||
ix.Host.SetStreamHandler(common.ProtocolIndexerGetNatives, ix.handleGetNatives)
|
||||
ix.subscribeIndexerRegistry()
|
||||
// Ensure long connections to other configured natives (native-to-native mesh).
|
||||
common.EnsureNativePeers(ix.Host)
|
||||
go ix.runOffloadLoop()
|
||||
go ix.refreshIndexersFromDHT()
|
||||
}
|
||||
|
||||
// subscribeIndexerRegistry joins the PubSub topic used by natives to gossip newly
|
||||
// registered indexer PeerIDs to one another, enabling cross-native DHT discovery.
|
||||
func (ix *IndexerService) subscribeIndexerRegistry() {
|
||||
logger := oclib.GetLogger()
|
||||
ix.PS.RegisterTopicValidator(common.TopicIndexerRegistry, func(_ context.Context, _ pp.ID, msg *pubsub.Message) bool {
|
||||
// Reject empty or syntactically invalid multiaddrs before they reach the
|
||||
// message loop. A compromised native could otherwise gossip arbitrary data.
|
||||
addr := string(msg.Data)
|
||||
if addr == "" {
|
||||
return false
|
||||
}
|
||||
_, err := pp.AddrInfoFromString(addr)
|
||||
return err == nil
|
||||
})
|
||||
topic, err := ix.PS.Join(common.TopicIndexerRegistry)
|
||||
if err != nil {
|
||||
logger.Err(err).Msg("native: failed to join indexer registry topic")
|
||||
return
|
||||
}
|
||||
sub, err := topic.Subscribe()
|
||||
if err != nil {
|
||||
logger.Err(err).Msg("native: failed to subscribe to indexer registry topic")
|
||||
return
|
||||
}
|
||||
ix.PubsubMu.Lock()
|
||||
ix.LongLivedPubSubs[common.TopicIndexerRegistry] = topic
|
||||
ix.PubsubMu.Unlock()
|
||||
|
||||
go func() {
|
||||
for {
|
||||
msg, err := sub.Next(context.Background())
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
addr := string(msg.Data)
|
||||
if addr == "" {
|
||||
continue
|
||||
}
|
||||
if peer, err := pp.AddrInfoFromString(addr); err == nil {
|
||||
ix.Native.knownMu.Lock()
|
||||
ix.Native.knownPeerIDs[peer.ID.String()] = addr
|
||||
ix.Native.knownMu.Unlock()
|
||||
|
||||
}
|
||||
// A neighbouring native registered this PeerID; add to known set for DHT refresh.
|
||||
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
// handleNativeSubscription stores an indexer's alive registration in the local cache
|
||||
// immediately, then persists it to the DHT asynchronously.
|
||||
// The stream is temporary: indexer sends one IndexerRegistration and closes.
|
||||
func (ix *IndexerService) handleNativeSubscription(s network.Stream) {
|
||||
defer s.Close()
|
||||
logger := oclib.GetLogger()
|
||||
|
||||
logger.Info().Msg("Subscription")
|
||||
|
||||
var reg common.IndexerRegistration
|
||||
if err := json.NewDecoder(s).Decode(®); err != nil {
|
||||
logger.Err(err).Msg("native subscription: decode")
|
||||
return
|
||||
}
|
||||
logger.Info().Msg("Subscription " + reg.Addr)
|
||||
|
||||
if reg.Addr == "" {
|
||||
logger.Error().Msg("native subscription: missing addr")
|
||||
return
|
||||
}
|
||||
if reg.PeerID == "" {
|
||||
ad, err := pp.AddrInfoFromString(reg.Addr)
|
||||
if err != nil {
|
||||
logger.Err(err).Msg("native subscription: invalid addr")
|
||||
return
|
||||
}
|
||||
reg.PeerID = ad.ID.String()
|
||||
}
|
||||
|
||||
// Build entry with a fresh TTL — must happen before the cache write so the 66s
|
||||
// window is not consumed by DHT retries.
|
||||
entry := &liveIndexerEntry{
|
||||
PeerID: reg.PeerID,
|
||||
Addr: reg.Addr,
|
||||
ExpiresAt: time.Now().UTC().Add(IndexerTTL),
|
||||
}
|
||||
|
||||
// Update local cache and known set immediately so concurrent GetIndexers calls
|
||||
// can already see this indexer without waiting for the DHT write to complete.
|
||||
ix.Native.liveIndexersMu.Lock()
|
||||
_, isRenewal := ix.Native.liveIndexers[reg.PeerID]
|
||||
ix.Native.liveIndexers[reg.PeerID] = entry
|
||||
ix.Native.liveIndexersMu.Unlock()
|
||||
|
||||
ix.Native.knownMu.Lock()
|
||||
ix.Native.knownPeerIDs[reg.PeerID] = reg.Addr
|
||||
ix.Native.knownMu.Unlock()
|
||||
|
||||
// Gossip PeerID to neighbouring natives so they discover it via DHT.
|
||||
ix.PubsubMu.RLock()
|
||||
topic := ix.LongLivedPubSubs[common.TopicIndexerRegistry]
|
||||
ix.PubsubMu.RUnlock()
|
||||
if topic != nil {
|
||||
if err := topic.Publish(context.Background(), []byte(reg.Addr)); err != nil {
|
||||
logger.Err(err).Msg("native subscription: registry gossip publish")
|
||||
}
|
||||
}
|
||||
|
||||
if isRenewal {
|
||||
logger.Debug().Str("peer", reg.PeerID).Msg("native: indexer TTL renewed : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
|
||||
} else {
|
||||
logger.Info().Str("peer", reg.PeerID).Msg("native: indexer registered : " + fmt.Sprintf("%v", len(ix.Native.liveIndexers)))
|
||||
}
|
||||
|
||||
// Persist in DHT asynchronously — retries must not block the handler or consume
|
||||
// the local cache TTL.
|
||||
key := ix.genIndexerKey(reg.PeerID)
|
||||
data, err := json.Marshal(entry)
|
||||
if err != nil {
|
||||
logger.Err(err).Msg("native subscription: marshal entry")
|
||||
return
|
||||
}
|
||||
go func() {
|
||||
for {
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
|
||||
if err := ix.DHT.PutValue(ctx, key, data); err != nil {
|
||||
cancel()
|
||||
logger.Err(err).Msg("native subscription: DHT put " + key)
|
||||
if strings.Contains(err.Error(), "failed to find any peer in table") {
|
||||
time.Sleep(10 * time.Second)
|
||||
continue
|
||||
}
|
||||
return
|
||||
}
|
||||
cancel()
|
||||
return
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
// handleNativeGetIndexers returns this native's own list of reachable indexers.
|
||||
// Self-delegation (native acting as temporary fallback indexer) is only permitted
|
||||
// for nodes — never for peers that are themselves registered indexers in knownPeerIDs.
|
||||
// The consensus across natives is the responsibility of the requesting node/indexer.
|
||||
func (ix *IndexerService) handleNativeGetIndexers(s network.Stream) {
|
||||
defer s.Close()
|
||||
logger := oclib.GetLogger()
|
||||
|
||||
var req common.GetIndexersRequest
|
||||
if err := json.NewDecoder(s).Decode(&req); err != nil {
|
||||
logger.Err(err).Msg("native get indexers: decode")
|
||||
return
|
||||
}
|
||||
if req.Count <= 0 {
|
||||
req.Count = 3
|
||||
}
|
||||
callerPeerID := s.Conn().RemotePeer().String()
|
||||
reachable := ix.reachableLiveIndexers(req.Count, callerPeerID)
|
||||
var resp common.GetIndexersResponse
|
||||
|
||||
if len(reachable) == 0 {
|
||||
// No live indexers reachable — try to self-delegate.
|
||||
if ix.selfDelegate(s.Conn().RemotePeer(), &resp) {
|
||||
logger.Info().Str("peer", callerPeerID).Msg("native: no indexers, acting as fallback for node")
|
||||
} else {
|
||||
// Fallback pool saturated: return empty so the caller retries another
|
||||
// native instead of piling more load onto this one.
|
||||
logger.Warn().Str("peer", callerPeerID).Int("pool", maxFallbackPeers).Msg(
|
||||
"native: fallback pool saturated, refusing self-delegation")
|
||||
}
|
||||
} else {
|
||||
rand.Shuffle(len(reachable), func(i, j int) { reachable[i], reachable[j] = reachable[j], reachable[i] })
|
||||
if req.Count > len(reachable) {
|
||||
req.Count = len(reachable)
|
||||
}
|
||||
resp.Indexers = reachable[:req.Count]
|
||||
}
|
||||
|
||||
if err := json.NewEncoder(s).Encode(resp); err != nil {
|
||||
logger.Err(err).Msg("native get indexers: encode response")
|
||||
}
|
||||
}
|
||||
|
||||
// handleNativeConsensus answers a consensus challenge from a node/indexer.
|
||||
// It returns:
|
||||
// - Trusted: which of the candidates it considers alive.
|
||||
// - Suggestions: extras it knows and trusts that were not in the candidate list.
|
||||
func (ix *IndexerService) handleNativeConsensus(s network.Stream) {
|
||||
defer s.Close()
|
||||
logger := oclib.GetLogger()
|
||||
|
||||
var req common.ConsensusRequest
|
||||
if err := json.NewDecoder(s).Decode(&req); err != nil {
|
||||
logger.Err(err).Msg("native consensus: decode")
|
||||
return
|
||||
}
|
||||
|
||||
myList := ix.reachableLiveIndexers(-1, s.Conn().RemotePeer().String())
|
||||
mySet := make(map[string]struct{}, len(myList))
|
||||
for _, addr := range myList {
|
||||
mySet[addr] = struct{}{}
|
||||
}
|
||||
|
||||
trusted := []string{}
|
||||
candidateSet := make(map[string]struct{}, len(req.Candidates))
|
||||
for _, addr := range req.Candidates {
|
||||
candidateSet[addr] = struct{}{}
|
||||
if _, ok := mySet[addr]; ok {
|
||||
trusted = append(trusted, addr) // candidate we also confirm as reachable
|
||||
}
|
||||
}
|
||||
|
||||
// Extras we trust but that the requester didn't include → suggestions.
|
||||
suggestions := []string{}
|
||||
for _, addr := range myList {
|
||||
if _, inCandidates := candidateSet[addr]; !inCandidates {
|
||||
suggestions = append(suggestions, addr)
|
||||
}
|
||||
}
|
||||
|
||||
resp := common.ConsensusResponse{Trusted: trusted, Suggestions: suggestions}
|
||||
if err := json.NewEncoder(s).Encode(resp); err != nil {
|
||||
logger.Err(err).Msg("native consensus: encode response")
|
||||
}
|
||||
}
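// Illustrative requester-side tally (not part of the diff): one way a node could
// combine ConsensusResponse answers from several natives into a trusted set. The
// quorum rule shown (simple majority of responding natives) is an assumption,
// as is the responses slice.
//
//	votes := map[string]int{}
//	for _, resp := range responses { // one ConsensusResponse per queried native
//		for _, addr := range resp.Trusted {
//			votes[addr]++
//		}
//		for _, addr := range resp.Suggestions {
//			votes[addr]++ // suggestions count as endorsements too
//		}
//	}
//	trusted := []string{}
//	for addr, n := range votes {
//		if n*2 > len(responses) {
//			trusted = append(trusted, addr)
//		}
//	}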
|
||||
|
||||
// selfDelegate marks the caller as a responsible peer and exposes this native's own
|
||||
// address as its temporary indexer. Returns false when the fallback pool is saturated
|
||||
// (maxFallbackPeers reached) — the caller must return an empty response so the node
|
||||
// retries later instead of pinning indefinitely to an overloaded native.
|
||||
func (ix *IndexerService) selfDelegate(remotePeer pp.ID, resp *common.GetIndexersResponse) bool {
|
||||
ix.Native.responsibleMu.Lock()
|
||||
defer ix.Native.responsibleMu.Unlock()
|
||||
if len(ix.Native.responsiblePeers) >= maxFallbackPeers {
|
||||
return false
|
||||
}
|
||||
ix.Native.responsiblePeers[remotePeer] = struct{}{}
|
||||
resp.IsSelfFallback = true
|
||||
resp.Indexers = []string{ix.Host.Addrs()[len(ix.Host.Addrs())-1].String() + "/p2p/" + ix.Host.ID().String()}
|
||||
return true
|
||||
}
|
||||
|
||||
// reachableLiveIndexers returns the multiaddrs of non-expired, pingable indexers
|
||||
// from the local cache (kept fresh by refreshIndexersFromDHT in background).
|
||||
func (ix *IndexerService) reachableLiveIndexers(count int, from ...string) []string {
|
||||
ix.Native.liveIndexersMu.RLock()
|
||||
now := time.Now().UTC()
|
||||
candidates := []*liveIndexerEntry{}
|
||||
for _, e := range ix.Native.liveIndexers {
|
||||
fmt.Println("liveIndexers", slices.Contains(from, e.PeerID), from, e.PeerID)
|
||||
if e.ExpiresAt.After(now) && !slices.Contains(from, e.PeerID) {
|
||||
candidates = append(candidates, e)
|
||||
}
|
||||
}
|
||||
ix.Native.liveIndexersMu.RUnlock()
|
||||
|
||||
fmt.Println("midway...", candidates, from, ix.Native.knownPeerIDs)
|
||||
|
||||
if (count > 0 && len(candidates) < count) || count < 0 {
|
||||
ix.Native.knownMu.RLock()
|
||||
for k, v := range ix.Native.knownPeerIDs {
|
||||
// Include peers whose liveIndexers entry is absent OR expired.
|
||||
// A non-nil but expired entry means the peer was once known but
|
||||
// has since timed out — PeerIsAlive below will decide if it's back.
|
||||
fmt.Println("knownPeerIDs", slices.Contains(from, k), from, k)
|
||||
if !slices.Contains(from, k) {
|
||||
candidates = append(candidates, &liveIndexerEntry{
|
||||
PeerID: k,
|
||||
Addr: v,
|
||||
})
|
||||
}
|
||||
}
|
||||
ix.Native.knownMu.RUnlock()
|
||||
}
|
||||
|
||||
fmt.Println("midway...1", candidates)
|
||||
|
||||
reachable := []string{}
|
||||
for _, e := range candidates {
|
||||
ad, err := pp.AddrInfoFromString(e.Addr)
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
if common.PeerIsAlive(ix.Host, *ad) {
|
||||
reachable = append(reachable, e.Addr)
|
||||
}
|
||||
}
|
||||
return reachable
|
||||
}
|
||||
|
||||
// refreshIndexersFromDHT runs in background and queries the shared DHT for every known
|
||||
// indexer PeerID whose local cache entry is missing or expired. This supplements the
|
||||
// local cache with entries written by neighbouring natives.
|
||||
func (ix *IndexerService) refreshIndexersFromDHT() {
|
||||
t := time.NewTicker(dhtRefreshInterval)
|
||||
defer t.Stop()
|
||||
logger := oclib.GetLogger()
|
||||
for range t.C {
|
||||
ix.Native.knownMu.RLock()
|
||||
peerIDs := make([]string, 0, len(ix.Native.knownPeerIDs))
|
||||
for pid := range ix.Native.knownPeerIDs {
|
||||
peerIDs = append(peerIDs, pid)
|
||||
}
|
||||
ix.Native.knownMu.RUnlock()
|
||||
|
||||
now := time.Now().UTC()
|
||||
for _, pid := range peerIDs {
|
||||
ix.Native.liveIndexersMu.RLock()
|
||||
existing := ix.Native.liveIndexers[pid]
|
||||
ix.Native.liveIndexersMu.RUnlock()
|
||||
if existing != nil && existing.ExpiresAt.After(now) {
|
||||
continue // still fresh in local cache
|
||||
}
|
||||
key := ix.genIndexerKey(pid)
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
ch, err := ix.DHT.SearchValue(ctx, key)
|
||||
if err != nil {
|
||||
cancel()
|
||||
continue
|
||||
}
|
||||
var best *liveIndexerEntry
|
||||
for b := range ch {
|
||||
var e liveIndexerEntry
|
||||
if err := json.Unmarshal(b, &e); err != nil {
|
||||
continue
|
||||
}
|
||||
if e.ExpiresAt.After(time.Now().UTC()) {
|
||||
if best == nil || e.ExpiresAt.After(best.ExpiresAt) {
|
||||
best = &e
|
||||
}
|
||||
}
|
||||
}
|
||||
cancel()
|
||||
if best != nil {
|
||||
ix.Native.liveIndexersMu.Lock()
|
||||
ix.Native.liveIndexers[best.PeerID] = best
|
||||
ix.Native.liveIndexersMu.Unlock()
|
||||
logger.Info().Str("peer", best.PeerID).Msg("native: refreshed indexer from DHT")
|
||||
} else {
|
||||
// DHT has no fresh entry — peer is gone, prune from known set.
|
||||
ix.Native.knownMu.Lock()
|
||||
delete(ix.Native.knownPeerIDs, pid)
|
||||
ix.Native.knownMu.Unlock()
|
||||
logger.Info().Str("peer", pid).Msg("native: pruned stale peer from knownPeerIDs")
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
func (ix *IndexerService) genIndexerKey(peerID string) string {
|
||||
return "/indexer/" + peerID
|
||||
}
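// Sketch (assumption, not part of this hunk): refreshIndexersFromDHT above reads a
// JSON-encoded liveIndexerEntry back from the DHT, so the registering side is
// expected to publish it under the same "/indexer/<peerID>" key, roughly like this.
func (ix *IndexerService) publishIndexerRecordSketch(ctx context.Context, e liveIndexerEntry) error {
	b, err := json.Marshal(e)
	if err != nil {
		return err
	}
	// The "indexer" namespace is checked by IndexerRecordValidator (see the DHT
	// validator configuration elsewhere in this change).
	return ix.DHT.PutValue(ctx, ix.genIndexerKey(e.PeerID), b)
}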
|
||||
|
||||
// runOffloadLoop periodically checks if real indexers are available and releases
|
||||
// responsible peers so they can reconnect to actual indexers on their next attempt.
|
||||
func (ix *IndexerService) runOffloadLoop() {
|
||||
t := time.NewTicker(offloadInterval)
|
||||
defer t.Stop()
|
||||
logger := oclib.GetLogger()
|
||||
for range t.C {
|
||||
fmt.Println("runOffloadLoop", ix.Native.responsiblePeers)
|
||||
ix.Native.responsibleMu.RLock()
|
||||
count := len(ix.Native.responsiblePeers)
|
||||
ix.Native.responsibleMu.RUnlock()
|
||||
if count == 0 {
|
||||
continue
|
||||
}
|
||||
ix.Native.responsibleMu.RLock()
|
||||
peerIDS := []string{}
|
||||
for p := range ix.Native.responsiblePeers {
|
||||
peerIDS = append(peerIDS, p.String())
|
||||
}
|
||||
fmt.Println("COUNT --> ", count, len(ix.reachableLiveIndexers(-1, peerIDS...)))
|
||||
ix.Native.responsibleMu.RUnlock()
|
||||
if len(ix.reachableLiveIndexers(-1, peerIDS...)) > 0 {
|
||||
ix.Native.responsibleMu.RLock()
|
||||
released := ix.Native.responsiblePeers
|
||||
ix.Native.responsibleMu.RUnlock()
|
||||
|
||||
// Reset (not Close) heartbeat streams of released peers.
|
||||
// Close() only half-closes the native's write direction — the peer's write
|
||||
// direction stays open and sendHeartbeat never sees an error.
|
||||
// Reset() abruptly terminates both directions, making the peer's next
|
||||
// json.Encode return an error which triggers replenishIndexersFromNative.
|
||||
ix.StreamMU.Lock()
|
||||
if streams := ix.StreamRecords[common.ProtocolHeartbeat]; streams != nil {
|
||||
for pid := range released {
|
||||
if rec, ok := streams[pid]; ok {
|
||||
if rec.HeartbeatStream != nil && rec.HeartbeatStream.Stream != nil {
|
||||
rec.HeartbeatStream.Stream.Reset()
|
||||
}
|
||||
ix.Native.responsibleMu.Lock()
|
||||
delete(ix.Native.responsiblePeers, pid)
|
||||
ix.Native.responsibleMu.Unlock()
|
||||
|
||||
delete(streams, pid)
|
||||
logger.Info().Str("peer", pid.String()).Str("proto", string(common.ProtocolHeartbeat)).Msg(
|
||||
"native: offload — stream reset, peer will reconnect to real indexer")
|
||||
} else {
|
||||
// No recorded heartbeat stream for this peer: either it never
|
||||
// passed the score check (new peer, uptime=0 → score<75) or the
|
||||
// stream was GC'd. We cannot send a Reset signal, so close the
|
||||
// whole connection instead — this makes the peer's sendHeartbeat
|
||||
// return an error, which triggers replenishIndexersFromNative and
|
||||
// migrates it to a real indexer.
|
||||
ix.Native.responsibleMu.Lock()
|
||||
delete(ix.Native.responsiblePeers, pid)
|
||||
ix.Native.responsibleMu.Unlock()
|
||||
go ix.Host.Network().ClosePeer(pid)
|
||||
logger.Info().Str("peer", pid.String()).Msg(
|
||||
"native: offload — no heartbeat stream, closing connection so peer re-requests real indexers")
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
ix.StreamMU.Unlock()
|
||||
|
||||
logger.Info().Int("released", count).Msg("native: offloaded responsible peers to real indexers")
|
||||
}
|
||||
}
|
||||
}
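// Node-side sketch (replenishIndexersFromNative and the heartbeat loop live in other
// files; the names are taken from the comment above, not from this hunk): once the
// native resets the stream, the node's next encode fails and it migrates.
//
//	if err := json.NewEncoder(hbStream).Encode(hb); err != nil {
//		replenishIndexersFromNative(h) // stream was reset during offload
//	}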
|
||||
|
||||
// handleNativeGetPeers returns a random selection of this native's known native
|
||||
// contacts, excluding any in the request's Exclude list.
|
||||
func (ix *IndexerService) handleNativeGetPeers(s network.Stream) {
|
||||
defer s.Close()
|
||||
logger := oclib.GetLogger()
|
||||
|
||||
var req common.GetNativePeersRequest
|
||||
if err := json.NewDecoder(s).Decode(&req); err != nil {
|
||||
logger.Err(err).Msg("native get peers: decode")
|
||||
return
|
||||
}
|
||||
if req.Count <= 0 {
|
||||
req.Count = 1
|
||||
}
|
||||
|
||||
excludeSet := make(map[string]struct{}, len(req.Exclude))
|
||||
for _, e := range req.Exclude {
|
||||
excludeSet[e] = struct{}{}
|
||||
}
|
||||
|
||||
common.StreamNativeMu.RLock()
|
||||
candidates := make([]string, 0, len(common.StaticNatives))
|
||||
for addr := range common.StaticNatives {
|
||||
if _, excluded := excludeSet[addr]; !excluded {
|
||||
candidates = append(candidates, addr)
|
||||
}
|
||||
}
|
||||
common.StreamNativeMu.RUnlock()
|
||||
|
||||
rand.Shuffle(len(candidates), func(i, j int) { candidates[i], candidates[j] = candidates[j], candidates[i] })
|
||||
if req.Count > len(candidates) {
|
||||
req.Count = len(candidates)
|
||||
}
|
||||
|
||||
resp := common.GetNativePeersResponse{Peers: candidates[:req.Count]}
|
||||
if err := json.NewEncoder(s).Encode(resp); err != nil {
|
||||
logger.Err(err).Msg("native get peers: encode response")
|
||||
}
|
||||
}
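// Client-side sketch of the exchange handled above (request/response types are the
// ones visible in this hunk; the protocol ID used to open s is assumed):
func requestNativePeersSketch(s network.Stream, known []string) ([]string, error) {
	req := common.GetNativePeersRequest{Count: 2, Exclude: known}
	if err := json.NewEncoder(s).Encode(req); err != nil {
		return nil, err
	}
	var resp common.GetNativePeersResponse
	if err := json.NewDecoder(s).Decode(&resp); err != nil {
		return nil, err
	}
	return resp.Peers, nil
}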
|
||||
|
||||
// StartNativeRegistration starts a goroutine that periodically registers this
|
||||
// indexer with all configured native indexers (every RecommendedHeartbeatInterval).
|
||||
daemons/node/indexer/search.go (new file, 233 lines)
@@ -0,0 +1,233 @@
|
||||
package indexer
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"strings"
|
||||
"time"
|
||||
|
||||
"oc-discovery/conf"
|
||||
"oc-discovery/daemons/node/common"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
"github.com/libp2p/go-libp2p/core/network"
|
||||
)
|
||||
|
||||
const TopicSearchPeer = "oc-search-peer"
|
||||
|
||||
// searchTimeout returns the configured search timeout, defaulting to 5s.
|
||||
func searchTimeout() time.Duration {
|
||||
if t := conf.GetConfig().SearchTimeout; t > 0 {
|
||||
return time.Duration(t) * time.Second
|
||||
}
|
||||
return 5 * time.Second
|
||||
}
|
||||
|
||||
// initSearchHandlers registers ProtocolSearchPeer and ProtocolSearchPeerResponse
|
||||
// and subscribes to TopicSearchPeer on GossipSub.
|
||||
func (ix *IndexerService) initSearchHandlers() {
|
||||
ix.Host.SetStreamHandler(common.ProtocolSearchPeer, ix.handleSearchPeer)
|
||||
ix.Host.SetStreamHandler(common.ProtocolSearchPeerResponse, ix.handleSearchPeerResponse)
|
||||
ix.initSearchSubscription()
|
||||
}
|
||||
|
||||
// updateReferent is called from HandleHeartbeat when Referent flag changes.
|
||||
// If referent=true the node is added to referencedNodes; if false it is removed.
|
||||
func (ix *IndexerService) updateReferent(pid pp.ID, rec PeerRecord, referent bool) {
|
||||
ix.referencedNodesMu.Lock()
|
||||
defer ix.referencedNodesMu.Unlock()
|
||||
if referent {
|
||||
ix.referencedNodes[pid] = rec
|
||||
} else {
|
||||
delete(ix.referencedNodes, pid)
|
||||
}
|
||||
}
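// Node-side sketch (the exact heartbeat field name is assumed from the `referent`
// argument above): a node flags only its best-scored indexer as its search referent.
//
//	hb.Referent = (indexerID == bestScoredIndexerID)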
|
||||
|
||||
// searchReferenced looks up nodes in referencedNodes matching the query.
|
||||
// Matches on peerID (exact), DID (exact), or name (case-insensitive contains).
|
||||
func (ix *IndexerService) searchReferenced(peerID, did, name string) []common.SearchHit {
|
||||
ix.referencedNodesMu.RLock()
|
||||
defer ix.referencedNodesMu.RUnlock()
|
||||
nameLow := strings.ToLower(name)
|
||||
var hits []common.SearchHit
|
||||
for pid, rec := range ix.referencedNodes {
|
||||
pidStr := pid.String()
|
||||
matchPeerID := peerID != "" && pidStr == peerID
|
||||
matchDID := did != "" && rec.DID == did
|
||||
matchName := name != "" && strings.Contains(strings.ToLower(rec.Name), nameLow)
|
||||
if matchPeerID || matchDID || matchName {
|
||||
hits = append(hits, common.SearchHit{
|
||||
PeerID: pidStr,
|
||||
DID: rec.DID,
|
||||
Name: rec.Name,
|
||||
})
|
||||
}
|
||||
}
|
||||
return hits
|
||||
}
|
||||
|
||||
// handleSearchPeer is the ProtocolSearchPeer handler.
|
||||
// The node opens this stream, sends a SearchPeerRequest, and reads results
|
||||
// as they stream in. The stream stays open until timeout or node closes it.
|
||||
func (ix *IndexerService) handleSearchPeer(s network.Stream) {
|
||||
logger := oclib.GetLogger()
|
||||
defer s.Reset()
|
||||
|
||||
if !ix.isPeerKnown(s.Conn().RemotePeer()) {
|
||||
logger.Warn().Str("peer", s.Conn().RemotePeer().String()).Msg("[search] unknown peer, rejecting stream")
|
||||
return
|
||||
}
|
||||
|
||||
var req common.SearchPeerRequest
|
||||
if err := json.NewDecoder(s).Decode(&req); err != nil || req.QueryID == "" {
|
||||
return
|
||||
}
|
||||
|
||||
// streamCtx is cancelled when the node closes its end of the stream.
|
||||
streamCtx, streamCancel := context.WithCancel(context.Background())
|
||||
go func() {
|
||||
// Block until the stream is reset/closed, then cancel our context.
|
||||
buf := make([]byte, 1)
|
||||
s.Read(buf) //nolint:errcheck — we only care about EOF/reset
|
||||
streamCancel()
|
||||
}()
|
||||
defer streamCancel()
|
||||
|
||||
resultCh := make(chan []common.SearchHit, 16)
|
||||
ix.pendingSearchesMu.Lock()
|
||||
ix.pendingSearches[req.QueryID] = resultCh
|
||||
ix.pendingSearchesMu.Unlock()
|
||||
defer func() {
|
||||
ix.pendingSearchesMu.Lock()
|
||||
delete(ix.pendingSearches, req.QueryID)
|
||||
ix.pendingSearchesMu.Unlock()
|
||||
}()
|
||||
|
||||
// Check own referencedNodes immediately.
|
||||
if hits := ix.searchReferenced(req.PeerID, req.DID, req.Name); len(hits) > 0 {
|
||||
resultCh <- hits
|
||||
}
|
||||
|
||||
// Broadcast search on GossipSub so other indexers can respond.
|
||||
ix.publishSearchQuery(req.QueryID, req.PeerID, req.DID, req.Name)
|
||||
|
||||
// Stream results back to node as they arrive; reset idle timer on each result.
|
||||
enc := json.NewEncoder(s)
|
||||
idleTimer := time.NewTimer(searchTimeout())
|
||||
defer idleTimer.Stop()
|
||||
for {
|
||||
select {
|
||||
case hits := <-resultCh:
|
||||
if err := enc.Encode(common.SearchPeerResult{QueryID: req.QueryID, Records: hits}); err != nil {
|
||||
logger.Debug().Err(err).Msg("[search] stream write failed")
|
||||
return
|
||||
}
|
||||
// Reset idle timeout: keep alive as long as results trickle in.
|
||||
if !idleTimer.Stop() {
|
||||
select {
|
||||
case <-idleTimer.C:
|
||||
default:
|
||||
}
|
||||
}
|
||||
idleTimer.Reset(searchTimeout())
|
||||
case <-idleTimer.C:
|
||||
// No new result within timeout — close gracefully.
|
||||
return
|
||||
case <-streamCtx.Done():
|
||||
// Node closed the stream (new search superseded this one).
|
||||
return
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// handleSearchPeerResponse is the ProtocolSearchPeerResponse handler.
|
||||
// Another indexer opens this stream to deliver hits for a pending queryID.
|
||||
func (ix *IndexerService) handleSearchPeerResponse(s network.Stream) {
|
||||
defer s.Reset()
|
||||
var result common.SearchPeerResult
|
||||
if err := json.NewDecoder(s).Decode(&result); err != nil || result.QueryID == "" {
|
||||
return
|
||||
}
|
||||
ix.pendingSearchesMu.Lock()
|
||||
ch := ix.pendingSearches[result.QueryID]
|
||||
ix.pendingSearchesMu.Unlock()
|
||||
if ch != nil {
|
||||
select {
|
||||
case ch <- result.Records:
|
||||
default: // channel full, drop — node may be slow
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// publishSearchQuery broadcasts a SearchQuery on TopicSearchPeer.
|
||||
func (ix *IndexerService) publishSearchQuery(queryID, peerID, did, name string) {
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RLock()
|
||||
topic := ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicSearchPeer]
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.RUnlock()
|
||||
if topic == nil {
|
||||
return
|
||||
}
|
||||
q := common.SearchQuery{
|
||||
QueryID: queryID,
|
||||
PeerID: peerID,
|
||||
DID: did,
|
||||
Name: name,
|
||||
EmitterID: ix.Host.ID().String(),
|
||||
}
|
||||
b, err := json.Marshal(q)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
_ = topic.Publish(context.Background(), b)
|
||||
}
|
||||
|
||||
// initSearchSubscription joins TopicSearchPeer and dispatches incoming queries.
|
||||
func (ix *IndexerService) initSearchSubscription() {
|
||||
logger := oclib.GetLogger()
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Lock()
|
||||
topic, err := ix.PS.Join(TopicSearchPeer)
|
||||
if err != nil {
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Unlock()
|
||||
logger.Err(err).Msg("[search] failed to join search topic")
|
||||
return
|
||||
}
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.LongLivedPubSubs[TopicSearchPeer] = topic
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService.PubsubMu.Unlock()
|
||||
|
||||
common.SubscribeEvents(
|
||||
ix.LongLivedStreamRecordedService.LongLivedPubSubService,
|
||||
context.Background(),
|
||||
TopicSearchPeer,
|
||||
-1,
|
||||
func(_ context.Context, q common.SearchQuery, _ string) {
|
||||
ix.onSearchQuery(q)
|
||||
},
|
||||
)
|
||||
}
|
||||
|
||||
// onSearchQuery handles an incoming GossipSub search broadcast.
|
||||
// If we have matching referencedNodes, we respond to the emitting indexer.
|
||||
func (ix *IndexerService) onSearchQuery(q common.SearchQuery) {
|
||||
// Don't respond to our own broadcasts.
|
||||
if q.EmitterID == ix.Host.ID().String() {
|
||||
return
|
||||
}
|
||||
hits := ix.searchReferenced(q.PeerID, q.DID, q.Name)
|
||||
if len(hits) == 0 {
|
||||
return
|
||||
}
|
||||
emitterID, err := pp.Decode(q.EmitterID)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
|
||||
defer cancel()
|
||||
s, err := ix.Host.NewStream(ctx, emitterID, common.ProtocolSearchPeerResponse)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
defer s.Reset()
|
||||
s.SetDeadline(time.Now().Add(5 * time.Second))
|
||||
json.NewEncoder(s).Encode(common.SearchPeerResult{QueryID: q.QueryID, Records: hits})
|
||||
}
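// End-to-end recap (descriptive only): a node opens ProtocolSearchPeer on its
// referent indexer (handleSearchPeer), which answers from its own referencedNodes
// and broadcasts the query on TopicSearchPeer (publishSearchQuery). Any indexer
// with matching referenced nodes dials ProtocolSearchPeerResponse back to the
// emitter (onSearchQuery), handleSearchPeerResponse routes the hits to the pending
// query, and handleSearchPeer streams them to the node until the idle timeout.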
|
||||
@@ -2,9 +2,14 @@ package indexer
|
||||
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"math/rand"
|
||||
"oc-discovery/conf"
|
||||
"oc-discovery/daemons/node/common"
|
||||
"strings"
|
||||
"sync"
|
||||
"time"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
dht "github.com/libp2p/go-libp2p-kad-dht"
|
||||
@@ -14,28 +19,69 @@ import (
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
)
|
||||
|
||||
// dhtCacheEntry holds one indexer discovered via DHT for use in suggestion responses.
|
||||
type dhtCacheEntry struct {
|
||||
AI pp.AddrInfo
|
||||
LastSeen time.Time
|
||||
}
|
||||
|
||||
// offloadState tracks which nodes we've already proposed migration to.
|
||||
// When an indexer is overloaded (fill rate > offloadThreshold) it only sends
|
||||
// SuggestMigrate to a small batch at a time; peers that don't migrate within
|
||||
// offloadGracePeriod are moved to alreadyTried so a new batch can be picked.
|
||||
type offloadState struct {
|
||||
inBatch map[pp.ID]time.Time // peer → time added to current batch
|
||||
alreadyTried map[pp.ID]struct{} // peers proposed to that didn't migrate
|
||||
mu sync.Mutex
|
||||
}
|
||||
|
||||
const (
|
||||
offloadThreshold = 0.80 // fill rate above which to start offloading
|
||||
offloadBatchSize = 5 // max concurrent "please migrate" proposals
|
||||
offloadGracePeriod = 3 * common.RecommendedHeartbeatInterval
|
||||
)
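// Worked example (assuming the 20 s heartbeat interval described in the
// architecture notes, so offloadGracePeriod = 3 * 20 s = 60 s): above 80% fill the
// indexer proposes migration to at most 5 peers at a time; a peer still attached
// 60 s later moves to alreadyTried, freeing a slot for the next candidate.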
|
||||
|
||||
// IndexerService manages the indexer node's state: stream records, DHT, pubsub.
|
||||
type IndexerService struct {
|
||||
*common.LongLivedStreamRecordedService[PeerRecord]
|
||||
PS *pubsub.PubSub
|
||||
DHT *dht.IpfsDHT
|
||||
isStrictIndexer bool
|
||||
mu sync.RWMutex
|
||||
IsNative bool
|
||||
Native *NativeState // non-nil when IsNative == true
|
||||
nameIndex *nameIndexState
|
||||
PS *pubsub.PubSub
|
||||
DHT *dht.IpfsDHT
|
||||
isStrictIndexer bool
|
||||
mu sync.RWMutex
|
||||
dhtProvideCancel context.CancelFunc
|
||||
bornAt time.Time
|
||||
// Passive DHT cache: refreshed every 2 min in background, used for suggestions.
|
||||
dhtCache []dhtCacheEntry
|
||||
dhtCacheMu sync.RWMutex
|
||||
// Offload state for overloaded-indexer migration proposals.
|
||||
offload offloadState
|
||||
// referencedNodes holds nodes that have designated this indexer as their
|
||||
// search referent (Heartbeat.Referent=true). Used for distributed search.
|
||||
referencedNodes map[pp.ID]PeerRecord
|
||||
referencedNodesMu sync.RWMutex
|
||||
// pendingSearches maps queryID → result channel for in-flight searches.
|
||||
pendingSearches map[string]chan []common.SearchHit
|
||||
pendingSearchesMu sync.Mutex
|
||||
// behavior tracks per-node compliance (heartbeat rate, publish/get volume,
|
||||
// identity consistency, signature failures).
|
||||
behavior *NodeBehaviorTracker
|
||||
// connGuard limits new-connection bursts to protect public indexers.
|
||||
connGuard *ConnectionRateGuard
|
||||
}
|
||||
|
||||
// NewIndexerService creates an IndexerService.
|
||||
// If ps is nil, this is a strict indexer (no pre-existing gossip sub from a node).
|
||||
func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative bool) *IndexerService {
|
||||
func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int) *IndexerService {
|
||||
logger := oclib.GetLogger()
|
||||
logger.Info().Msg("open indexer mode...")
|
||||
var err error
|
||||
ix := &IndexerService{
|
||||
LongLivedStreamRecordedService: common.NewStreamRecordedService[PeerRecord](h, maxNode),
|
||||
isStrictIndexer: ps == nil,
|
||||
IsNative: isNative,
|
||||
referencedNodes: map[pp.ID]PeerRecord{},
|
||||
pendingSearches: map[string]chan []common.SearchHit{},
|
||||
behavior: newNodeBehaviorTracker(),
|
||||
connGuard: newConnectionRateGuard(),
|
||||
}
|
||||
if ps == nil {
|
||||
ps, err = pubsub.NewGossipSub(context.Background(), ix.Host)
|
||||
@@ -45,56 +91,410 @@ func NewIndexerService(h host.Host, ps *pubsub.PubSub, maxNode int, isNative boo
|
||||
}
|
||||
ix.PS = ps
|
||||
|
||||
if ix.isStrictIndexer && !isNative {
|
||||
if ix.isStrictIndexer {
|
||||
logger.Info().Msg("connect to indexers as strict indexer...")
|
||||
common.ConnectToIndexers(h, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, ix.Host.ID())
|
||||
common.ConnectToIndexers(h, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer*2)
|
||||
logger.Info().Msg("subscribe to decentralized search flow as strict indexer...")
|
||||
go ix.SubscribeToSearch(ix.PS, nil)
|
||||
}
|
||||
|
||||
if !isNative {
|
||||
logger.Info().Msg("init distributed name index...")
|
||||
ix.initNameIndex(ps)
|
||||
ix.LongLivedStreamRecordedService.AfterDelete = func(pid pp.ID, name, did string) {
|
||||
ix.publishNameEvent(NameIndexDelete, name, pid.String(), did)
|
||||
}
|
||||
ix.LongLivedStreamRecordedService.AfterDelete = func(pid pp.ID, name, did string) {
|
||||
// Remove behavior state for peers that are no longer connected and
|
||||
// have no active ban — keeps memory bounded to the live node set.
|
||||
ix.behavior.Cleanup(pid)
|
||||
}
|
||||
|
||||
if ix.DHT, err = dht.New(
|
||||
context.Background(),
|
||||
ix.Host,
|
||||
// AllowInbound: fired once per stream open, before any heartbeat is decoded.
|
||||
// 1. Reject peers that are currently banned (behavioral strikes).
|
||||
// 2. For genuinely new connections, apply the burst guard.
|
||||
ix.AllowInbound = func(remotePeer pp.ID, isNew bool) error {
|
||||
if ix.behavior.IsBanned(remotePeer) {
|
||||
return errors.New("peer is banned")
|
||||
}
|
||||
if isNew && !ix.connGuard.Allow() {
|
||||
return errors.New("connection rate limit exceeded, retry later")
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// ValidateHeartbeat: fired on every heartbeat tick for an established stream.
|
||||
// Checks heartbeat cadence — rejects if the node is sending too fast.
|
||||
ix.ValidateHeartbeat = func(remotePeer pp.ID) error {
|
||||
return ix.behavior.RecordHeartbeat(remotePeer)
|
||||
}
|
||||
|
||||
// Parse bootstrap peers from configured indexer addresses so the DHT can
|
||||
// find its routing table entries even in a fresh deployment.
|
||||
var bootstrapPeers []pp.AddrInfo
|
||||
for _, addrStr := range strings.Split(conf.GetConfig().IndexerAddresses, ",") {
|
||||
addrStr = strings.TrimSpace(addrStr)
|
||||
if addrStr == "" {
|
||||
continue
|
||||
}
|
||||
if ad, err := pp.AddrInfoFromString(addrStr); err == nil {
|
||||
bootstrapPeers = append(bootstrapPeers, *ad)
|
||||
}
|
||||
}
|
||||
dhtOpts := []dht.Option{
|
||||
dht.Mode(dht.ModeServer),
|
||||
dht.ProtocolPrefix("oc"), // 🔥 réseau privé
|
||||
dht.ProtocolPrefix("oc"),
|
||||
dht.Validator(record.NamespacedValidator{
|
||||
"node": PeerRecordValidator{},
|
||||
"indexer": IndexerRecordValidator{}, // for native indexer registry
|
||||
"name": DefaultValidator{},
|
||||
"pid": DefaultValidator{},
|
||||
"node": PeerRecordValidator{},
|
||||
"name": DefaultValidator{},
|
||||
"pid": DefaultValidator{},
|
||||
}),
|
||||
); err != nil {
|
||||
}
|
||||
if len(bootstrapPeers) > 0 {
|
||||
dhtOpts = append(dhtOpts, dht.BootstrapPeers(bootstrapPeers...))
|
||||
}
|
||||
if ix.DHT, err = dht.New(context.Background(), ix.Host, dhtOpts...); err != nil {
|
||||
logger.Info().Msg(err.Error())
|
||||
return nil
|
||||
}
|
||||
|
||||
// InitNative must happen after DHT is ready
|
||||
if isNative {
|
||||
ix.InitNative()
|
||||
} else {
|
||||
ix.initNodeHandler()
|
||||
// Register with configured natives so this indexer appears in their cache
|
||||
if nativeAddrs := conf.GetConfig().NativeIndexerAddresses; nativeAddrs != "" {
|
||||
common.StartNativeRegistration(ix.Host, nativeAddrs)
|
||||
// Make the DHT available for replenishment from other packages.
|
||||
common.SetDiscoveryDHT(ix.DHT)
|
||||
|
||||
ix.bornAt = time.Now().UTC()
|
||||
ix.offload.inBatch = make(map[pp.ID]time.Time)
|
||||
ix.offload.alreadyTried = make(map[pp.ID]struct{})
|
||||
ix.initNodeHandler()
|
||||
|
||||
// Build and send a HeartbeatResponse after each received node heartbeat.
|
||||
// Raw metrics only — no pre-cooked score. Node computes the score itself.
|
||||
ix.BuildHeartbeatResponse = func(remotePeer pp.ID, need int, challenges []string, challengeDID string, referent bool, rawRecord json.RawMessage) *common.HeartbeatResponse {
|
||||
ix.StreamMU.RLock()
|
||||
peerCount := len(ix.StreamRecords[common.ProtocolHeartbeat])
|
||||
// Collect lastSeen per active peer for challenge responses.
|
||||
type peerMeta struct {
|
||||
found bool
|
||||
lastSeen time.Time
|
||||
}
|
||||
peerLookup := make(map[string]peerMeta, peerCount)
|
||||
var remotePeerRecord PeerRecord
|
||||
for pid, rec := range ix.StreamRecords[common.ProtocolHeartbeat] {
|
||||
var ls time.Time
|
||||
if rec.HeartbeatStream != nil && rec.HeartbeatStream.UptimeTracker != nil {
|
||||
ls = rec.HeartbeatStream.UptimeTracker.LastSeen
|
||||
}
|
||||
peerLookup[pid.String()] = peerMeta{found: true, lastSeen: ls}
|
||||
if pid == remotePeer {
|
||||
remotePeerRecord = rec.Record
|
||||
}
|
||||
}
|
||||
ix.StreamMU.RUnlock()
|
||||
|
||||
// AfterHeartbeat updates srec.Record asynchronously — it may not have run yet.
|
||||
// Use rawRecord (the fresh signed PeerRecord embedded in the heartbeat) directly
|
||||
// so referencedNodes always gets the current Name/DID regardless of timing.
|
||||
if remotePeerRecord.Name == "" && len(rawRecord) > 0 {
|
||||
var fresh PeerRecord
|
||||
if json.Unmarshal(rawRecord, &fresh) == nil {
|
||||
remotePeerRecord = fresh
|
||||
}
|
||||
}
|
||||
|
||||
// Update referent designation: node marks its best-scored indexer with Referent=true.
|
||||
ix.updateReferent(remotePeer, remotePeerRecord, referent)
|
||||
|
||||
maxN := ix.MaxNodesConn()
|
||||
fillRate := 0.0
|
||||
if maxN > 0 {
|
||||
fillRate = float64(peerCount) / float64(maxN)
|
||||
if fillRate > 1 {
|
||||
fillRate = 1
|
||||
}
|
||||
}
|
||||
|
||||
resp := &common.HeartbeatResponse{
|
||||
FillRate: fillRate,
|
||||
PeerCount: peerCount,
|
||||
MaxNodes: maxN,
|
||||
BornAt: ix.bornAt,
|
||||
}
|
||||
|
||||
// Answer each challenged PeerID with raw found + lastSeen.
|
||||
for _, pidStr := range challenges {
|
||||
meta := peerLookup[pidStr] // zero value if not found
|
||||
entry := common.ChallengeEntry{
|
||||
PeerID: pidStr,
|
||||
Found: meta.found,
|
||||
LastSeen: meta.lastSeen,
|
||||
}
|
||||
resp.Challenges = append(resp.Challenges, entry)
|
||||
}
|
||||
|
||||
// DHT challenge: retrieve the node's own DID to prove DHT is functional.
|
||||
if challengeDID != "" {
|
||||
ctx3, cancel3 := context.WithTimeout(context.Background(), 3*time.Second)
|
||||
val, err := ix.DHT.GetValue(ctx3, "/node/"+challengeDID)
|
||||
cancel3()
|
||||
resp.DHTFound = err == nil
|
||||
if err == nil {
|
||||
resp.DHTPayload = json.RawMessage(val)
|
||||
}
|
||||
}
|
||||
|
||||
// Random sample of connected nodes as witnesses (up to 3).
|
||||
// Never include the requesting peer itself — asking a node to witness
|
||||
// itself is circular and meaningless.
|
||||
ix.StreamMU.RLock()
|
||||
for pidStr := range peerLookup {
|
||||
if len(resp.Witnesses) >= 3 {
|
||||
break
|
||||
}
|
||||
pid, err := pp.Decode(pidStr)
|
||||
if err != nil || pid == remotePeer || pid == ix.Host.ID() {
|
||||
continue
|
||||
}
|
||||
addrs := ix.Host.Peerstore().Addrs(pid)
|
||||
ai := common.FilterLoopbackAddrs(pp.AddrInfo{ID: pid, Addrs: addrs})
|
||||
if len(ai.Addrs) > 0 {
|
||||
resp.Witnesses = append(resp.Witnesses, ai)
|
||||
}
|
||||
}
|
||||
ix.StreamMU.RUnlock()
|
||||
|
||||
// Attach suggestions: exactly `need` entries from the DHT cache.
|
||||
// If the indexer is overloaded (SuggestMigrate will be set below), always
|
||||
// provide at least 1 suggestion even when need == 0, so the node has
|
||||
// somewhere to go.
|
||||
suggestionsNeeded := need
|
||||
if fillRate > offloadThreshold && suggestionsNeeded < 1 {
|
||||
suggestionsNeeded = 1
|
||||
}
|
||||
if suggestionsNeeded > 0 {
|
||||
ix.dhtCacheMu.RLock()
|
||||
// When offloading, pick from a random offset within the top N of the
|
||||
// cache so concurrent migrations spread across multiple targets rather
|
||||
// than all rushing to the same least-loaded indexer (thundering herd).
|
||||
// For normal need-based suggestions the full sorted order is fine.
|
||||
cache := ix.dhtCache
|
||||
if fillRate > offloadThreshold && len(cache) > suggestionsNeeded {
|
||||
const spreadWindow = 5 // sample from the top-5 least-loaded
|
||||
window := spreadWindow
|
||||
if window > len(cache) {
|
||||
window = len(cache)
|
||||
}
|
||||
start := rand.Intn(window)
|
||||
cache = cache[start:]
|
||||
}
|
||||
for _, e := range cache {
|
||||
if len(resp.Suggestions) >= suggestionsNeeded {
|
||||
break
|
||||
}
|
||||
// Never suggest the requesting peer itself or this indexer.
|
||||
if e.AI.ID == remotePeer || e.AI.ID == h.ID() {
|
||||
continue
|
||||
}
|
||||
resp.Suggestions = append(resp.Suggestions, e.AI)
|
||||
}
|
||||
ix.dhtCacheMu.RUnlock()
|
||||
}
|
||||
|
||||
// Offload logic: when fill rate is too high, selectively ask nodes to migrate.
|
||||
if fillRate > offloadThreshold && len(resp.Suggestions) > 0 {
|
||||
now := time.Now()
|
||||
ix.offload.mu.Lock()
|
||||
// Expire stale batch entries -> move to alreadyTried.
|
||||
for pid, addedAt := range ix.offload.inBatch {
|
||||
if now.Sub(addedAt) > offloadGracePeriod {
|
||||
ix.offload.alreadyTried[pid] = struct{}{}
|
||||
delete(ix.offload.inBatch, pid)
|
||||
}
|
||||
}
|
||||
// Reset alreadyTried if we've exhausted the whole pool.
|
||||
if len(ix.offload.alreadyTried) >= peerCount {
|
||||
ix.offload.alreadyTried = make(map[pp.ID]struct{})
|
||||
}
|
||||
_, tried := ix.offload.alreadyTried[remotePeer]
|
||||
_, inBatch := ix.offload.inBatch[remotePeer]
|
||||
if !tried {
|
||||
if inBatch {
|
||||
resp.SuggestMigrate = true
|
||||
} else if len(ix.offload.inBatch) < offloadBatchSize {
|
||||
ix.offload.inBatch[remotePeer] = now
|
||||
resp.SuggestMigrate = true
|
||||
}
|
||||
}
|
||||
ix.offload.mu.Unlock()
|
||||
} else if fillRate <= offloadThreshold {
|
||||
// Fill rate back to normal: reset offload state.
|
||||
ix.offload.mu.Lock()
|
||||
if len(ix.offload.inBatch) > 0 || len(ix.offload.alreadyTried) > 0 {
|
||||
ix.offload.inBatch = make(map[pp.ID]time.Time)
|
||||
ix.offload.alreadyTried = make(map[pp.ID]struct{})
|
||||
}
|
||||
ix.offload.mu.Unlock()
|
||||
}
|
||||
|
||||
// Bootstrap: if this indexer has no indexers of its own, probe the
|
||||
// connecting peer to check it supports ProtocolHeartbeat (i.e. it is
|
||||
// itself an indexer). Plain nodes do not register the handler and the
|
||||
// negotiation fails instantly — no wasted heartbeat cycle.
|
||||
// Run in a goroutine: the probe is a short blocking stream open.
|
||||
if len(common.Indexers.GetAddrs()) == 0 && remotePeer != h.ID() {
|
||||
pid := remotePeer
|
||||
go func() {
|
||||
if !common.SupportsHeartbeat(h, pid) {
|
||||
logger.Debug().Str("peer", pid.String()).
|
||||
Msg("[bootstrap] inbound peer has no heartbeat handler — not an indexer, skipping")
|
||||
return
|
||||
}
|
||||
addrs := h.Peerstore().Addrs(pid)
|
||||
ai := common.FilterLoopbackAddrs(pp.AddrInfo{ID: pid, Addrs: addrs})
|
||||
if len(ai.Addrs) == 0 {
|
||||
return
|
||||
}
|
||||
key := pid.String()
|
||||
if !common.Indexers.ExistsAddr(key) {
|
||||
adCopy := ai
|
||||
common.Indexers.SetAddr(key, &adCopy)
|
||||
common.Indexers.NudgeIt()
|
||||
logger.Info().Str("peer", key).Msg("[bootstrap] no indexers — added inbound indexer peer as candidate")
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
return resp
|
||||
}
|
||||
|
||||
// Advertise this indexer in the DHT so nodes can discover it.
|
||||
fillRateFn := func() float64 {
|
||||
ix.StreamMU.RLock()
|
||||
n := len(ix.StreamRecords[common.ProtocolHeartbeat])
|
||||
ix.StreamMU.RUnlock()
|
||||
maxN := ix.MaxNodesConn()
|
||||
if maxN <= 0 {
|
||||
return 0
|
||||
}
|
||||
rate := float64(n) / float64(maxN)
|
||||
if rate > 1 {
|
||||
rate = 1
|
||||
}
|
||||
return rate
|
||||
}
|
||||
ix.startDHTCacheRefresh()
|
||||
ix.startDHTProvide(fillRateFn)
|
||||
|
||||
return ix
|
||||
}
|
||||
|
||||
// startDHTCacheRefresh periodically queries the DHT for peer indexers and
|
||||
// refreshes ix.dhtCache. This passive cache is used by BuildHeartbeatResponse
|
||||
// to suggest better indexers to connected nodes without any per-request cost.
|
||||
func (ix *IndexerService) startDHTCacheRefresh() {
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
// Store cancel alongside the provide cancel so Close() stops both.
|
||||
prevCancel := ix.dhtProvideCancel
|
||||
ix.dhtProvideCancel = func() {
|
||||
if prevCancel != nil {
|
||||
prevCancel()
|
||||
}
|
||||
cancel()
|
||||
}
|
||||
go func() {
|
||||
logger := oclib.GetLogger()
|
||||
refresh := func() {
|
||||
if ix.DHT == nil {
|
||||
return
|
||||
}
|
||||
// Fetch more than needed so SelectByFillRate can filter for diversity.
|
||||
raw := common.DiscoverIndexersFromDHT(ix.Host, ix.DHT, 30)
|
||||
if len(raw) == 0 {
|
||||
return
|
||||
}
|
||||
// Remove self before selection.
|
||||
filtered := raw[:0]
|
||||
for _, ai := range raw {
|
||||
if ai.ID != ix.Host.ID() {
|
||||
filtered = append(filtered, ai)
|
||||
}
|
||||
}
|
||||
// SelectByFillRate applies /24 subnet diversity and fill-rate weighting.
|
||||
// Fill rates are unknown at this stage (nil map) so all peers get
|
||||
// the neutral prior f=0.5 — diversity filtering still applies.
|
||||
selected := common.SelectByFillRate(filtered, nil, 10)
|
||||
now := time.Now()
|
||||
ix.dhtCacheMu.Lock()
|
||||
ix.dhtCache = ix.dhtCache[:0]
|
||||
for _, ai := range selected {
|
||||
ix.dhtCache = append(ix.dhtCache, dhtCacheEntry{AI: ai, LastSeen: now})
|
||||
}
|
||||
ix.dhtCacheMu.Unlock()
|
||||
logger.Info().Int("cached", len(selected)).Msg("[dht] indexer suggestion cache refreshed")
|
||||
}
|
||||
// Initial delay: let the DHT routing table warm up first.
|
||||
select {
|
||||
case <-time.After(30 * time.Second):
|
||||
case <-ctx.Done():
|
||||
return
|
||||
}
|
||||
refresh()
|
||||
t := time.NewTicker(2 * time.Minute)
|
||||
defer t.Stop()
|
||||
for {
|
||||
select {
|
||||
case <-t.C:
|
||||
refresh()
|
||||
case <-ctx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
}
|
||||
|
||||
// startDHTProvide bootstraps the DHT and starts a goroutine that periodically
|
||||
// advertises this indexer under the well-known provider key.
|
||||
func (ix *IndexerService) startDHTProvide(fillRateFn func() float64) {
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
ix.dhtProvideCancel = cancel
|
||||
go func() {
|
||||
logger := oclib.GetLogger()
|
||||
// Wait until a routable (non-loopback) address is available.
|
||||
for i := 0; i < 12; i++ {
|
||||
addrs := ix.Host.Addrs()
|
||||
if len(addrs) > 0 && !strings.Contains(addrs[len(addrs)-1].String(), "127.0.0.1") {
|
||||
break
|
||||
}
|
||||
select {
|
||||
case <-ctx.Done():
|
||||
return
|
||||
case <-time.After(5 * time.Second):
|
||||
}
|
||||
}
|
||||
if err := ix.DHT.Bootstrap(ctx); err != nil {
|
||||
logger.Warn().Err(err).Msg("[dht] bootstrap failed")
|
||||
}
|
||||
provide := func() {
|
||||
pCtx, pCancel := context.WithTimeout(ctx, 30*time.Second)
|
||||
defer pCancel()
|
||||
if err := ix.DHT.Provide(pCtx, common.IndexerCID(), true); err != nil {
|
||||
logger.Warn().Err(err).Msg("[dht] Provide failed")
|
||||
} else {
|
||||
logger.Info().Float64("fill_rate", fillRateFn()).Msg("[dht] indexer advertised in DHT")
|
||||
}
|
||||
}
|
||||
provide()
|
||||
t := time.NewTicker(common.RecommendedHeartbeatInterval)
|
||||
defer t.Stop()
|
||||
for {
|
||||
select {
|
||||
case <-t.C:
|
||||
provide()
|
||||
case <-ctx.Done():
|
||||
return
|
||||
}
|
||||
}
|
||||
}()
|
||||
}
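// Node/indexer-side counterpart (sketch; DiscoverIndexersFromDHT is referenced in
// startDHTCacheRefresh above but its body is not part of this hunk): discovery is
// the mirror of Provide, i.e. listing providers of the same well-known CID.
func discoverIndexersSketch(ctx context.Context, h host.Host, d *dht.IpfsDHT, max int) []pp.AddrInfo {
	ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
	defer cancel()
	out := []pp.AddrInfo{}
	for ai := range d.FindProvidersAsync(ctx, common.IndexerCID(), max) {
		if ai.ID != h.ID() && len(ai.Addrs) > 0 {
			out = append(out, ai)
		}
	}
	return out
}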
|
||||
|
||||
func (ix *IndexerService) Close() {
|
||||
if ix.dhtProvideCancel != nil {
|
||||
ix.dhtProvideCancel()
|
||||
}
|
||||
ix.DHT.Close()
|
||||
ix.PS.UnregisterTopicValidator(common.TopicPubSubSearch)
|
||||
if ix.nameIndex != nil {
|
||||
ix.PS.UnregisterTopicValidator(TopicNameIndex)
|
||||
}
|
||||
for _, s := range ix.StreamRecords {
|
||||
for _, ss := range s {
|
||||
ss.HeartbeatStream.Stream.Close()
|
||||
|
||||
@@ -6,10 +6,10 @@ import (
|
||||
"fmt"
|
||||
"oc-discovery/daemons/node/common"
|
||||
"oc-discovery/daemons/node/stream"
|
||||
"slices"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
"cloud.o-forge.io/core/oc-lib/config"
|
||||
"cloud.o-forge.io/core/oc-lib/models/peer"
|
||||
"cloud.o-forge.io/core/oc-lib/tools"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
"github.com/libp2p/go-libp2p/core/protocol"
|
||||
@@ -25,89 +25,23 @@ type executionConsidersPayload struct {
|
||||
|
||||
func ListenNATS(n *Node) {
|
||||
tools.NewNATSCaller().ListenNats(map[tools.NATSMethod]func(tools.NATSResponse){
|
||||
/*tools.VERIFY_RESOURCE: func(resp tools.NATSResponse) {
|
||||
if resp.FromApp == config.GetAppName() {
|
||||
return
|
||||
}
|
||||
if res, err := resources.ToResource(resp.Datatype.EnumIndex(), resp.Payload); err == nil {
|
||||
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
|
||||
p := access.LoadOne(res.GetCreatorID())
|
||||
realP := p.ToPeer()
|
||||
if realP == nil {
|
||||
return
|
||||
} else if realP.Relation == peer.SELF {
|
||||
pubKey, err := common.PubKeyFromString(realP.PublicKey) // extract pubkey from pubkey str
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
ok, _ := pubKey.Verify(resp.Payload, res.GetSignature())
|
||||
if b, err := json.Marshal(stream.Verify{
|
||||
IsVerified: ok,
|
||||
}); err == nil {
|
||||
tools.NewNATSCaller().SetNATSPub(tools.VERIFY_RESOURCE, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Method: int(tools.VERIFY_RESOURCE),
|
||||
Payload: b,
|
||||
})
|
||||
}
|
||||
} else if realP.Relation != peer.BLACKLIST {
|
||||
n.StreamService.PublishVerifyResources(&resp.Datatype, resp.User, realP.PeerID, resp.Payload)
|
||||
}
|
||||
}
|
||||
},*/
|
||||
tools.CREATE_RESOURCE: func(resp tools.NATSResponse) {
|
||||
if resp.FromApp == config.GetAppName() && resp.Datatype != tools.PEER && resp.Datatype != tools.WORKFLOW {
|
||||
return
|
||||
}
|
||||
logger := oclib.GetLogger()
|
||||
m := map[string]interface{}{}
|
||||
err := json.Unmarshal(resp.Payload, &m)
|
||||
if err != nil {
|
||||
logger.Err(err)
|
||||
return
|
||||
}
|
||||
p := &peer.Peer{}
|
||||
p = p.Deserialize(m, p).(*peer.Peer)
|
||||
|
||||
ad, err := pp.AddrInfoFromString(p.StreamAddress)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
n.StreamService.Mu.Lock()
|
||||
defer n.StreamService.Mu.Unlock()
|
||||
|
||||
if p.Relation == peer.PARTNER {
|
||||
n.StreamService.ConnectToPartner(p.StreamAddress)
|
||||
} else {
|
||||
ps := common.ProtocolStream{}
|
||||
for p, s := range n.StreamService.Streams {
|
||||
m := map[pp.ID]*common.Stream{}
|
||||
for k := range s {
|
||||
if ad.ID != k {
|
||||
m[k] = s[k]
|
||||
} else {
|
||||
s[k].Stream.Close()
|
||||
}
|
||||
}
|
||||
ps[p] = m
|
||||
}
|
||||
n.StreamService.Streams = ps
|
||||
}
|
||||
|
||||
},
|
||||
tools.PROPALGATION_EVENT: func(resp tools.NATSResponse) {
|
||||
fmt.Println("PROPALGATION")
|
||||
if resp.FromApp == config.GetAppName() {
|
||||
return
|
||||
}
|
||||
p, err := oclib.GetMySelf()
|
||||
if err != nil || p == nil || p.PeerID != n.PeerID.String() {
|
||||
return
|
||||
}
|
||||
var propalgation tools.PropalgationMessage
|
||||
err := json.Unmarshal(resp.Payload, &propalgation)
|
||||
err = json.Unmarshal(resp.Payload, &propalgation)
|
||||
var dt *tools.DataType
|
||||
if propalgation.DataType > 0 {
|
||||
dtt := tools.DataType(propalgation.DataType)
|
||||
dt = &dtt
|
||||
}
|
||||
fmt.Println("PROPALGATION ACT", propalgation.Action, propalgation.Action == tools.PB_CREATE, err)
|
||||
fmt.Println("PROPALGATION ACT", propalgation.DataType, propalgation.Action, propalgation.Action == tools.PB_CREATE, err)
|
||||
if err == nil {
|
||||
switch propalgation.Action {
|
||||
case tools.PB_ADMIRALTY_CONFIG, tools.PB_MINIO_CONFIG:
|
||||
@@ -116,31 +50,40 @@ func ListenNATS(n *Node) {
|
||||
if propalgation.Action == tools.PB_MINIO_CONFIG {
|
||||
proto = stream.ProtocolMinioConfigResource
|
||||
}
|
||||
if err := json.Unmarshal(resp.Payload, &m); err == nil {
|
||||
if err := json.Unmarshal(propalgation.Payload, &m); err == nil {
|
||||
peers, _ := n.GetPeerRecord(context.Background(), m.PeerID)
|
||||
for _, p := range peers {
|
||||
n.StreamService.PublishCommon(&resp.Datatype, resp.User,
|
||||
p.PeerID, proto, resp.Payload)
|
||||
n.StreamService.PublishCommon(&resp.Datatype, resp.User, resp.Groups,
|
||||
p.PeerID, proto, propalgation.Payload)
|
||||
}
|
||||
}
|
||||
case tools.PB_CREATE, tools.PB_UPDATE, tools.PB_DELETE:
|
||||
fmt.Println(propalgation.Action, dt, resp.User, propalgation.Payload)
|
||||
fmt.Println(n.StreamService.ToPartnerPublishEvent(
|
||||
context.Background(),
|
||||
propalgation.Action,
|
||||
dt, resp.User,
|
||||
propalgation.Payload,
|
||||
))
|
||||
if slices.Contains([]tools.DataType{tools.BOOKING, tools.PURCHASE_RESOURCE}, resp.Datatype) {
|
||||
m := map[string]interface{}{}
|
||||
if err := json.Unmarshal(propalgation.Payload, &m); err == nil {
|
||||
if m["peer_id"] != nil {
|
||||
n.StreamService.PublishCommon(&resp.Datatype, resp.User, resp.Groups,
|
||||
fmt.Sprintf("%v", m["peer_id"]), stream.ProtocolCreateResource, propalgation.Payload)
|
||||
}
|
||||
}
|
||||
} else {
|
||||
fmt.Println(n.StreamService.ToPartnerPublishEvent(
|
||||
context.Background(),
|
||||
propalgation.Action,
|
||||
dt, resp.User, resp.Groups,
|
||||
propalgation.Payload,
|
||||
))
|
||||
}
|
||||
case tools.PB_CONSIDERS:
|
||||
switch resp.Datatype {
|
||||
case tools.BOOKING, tools.PURCHASE_RESOURCE, tools.WORKFLOW_EXECUTION:
|
||||
var m executionConsidersPayload
|
||||
if err := json.Unmarshal(resp.Payload, &m); err == nil {
|
||||
if err := json.Unmarshal(propalgation.Payload, &m); err == nil {
|
||||
for _, p := range m.PeerIDs {
|
||||
peers, _ := n.GetPeerRecord(context.Background(), p)
|
||||
for _, pp := range peers {
|
||||
n.StreamService.PublishCommon(&resp.Datatype, resp.User,
|
||||
pp.PeerID, stream.ProtocolConsidersResource, resp.Payload)
|
||||
n.StreamService.PublishCommon(&resp.Datatype, resp.User, resp.Groups,
|
||||
pp.PeerID, stream.ProtocolConsidersResource, propalgation.Payload)
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -152,29 +95,29 @@ func ListenNATS(n *Node) {
|
||||
if err := json.Unmarshal(propalgation.Payload, &m); err == nil && m.OriginID != "" {
|
||||
peers, _ := n.GetPeerRecord(context.Background(), m.OriginID)
|
||||
for _, p := range peers {
|
||||
n.StreamService.PublishCommon(nil, resp.User,
|
||||
n.StreamService.PublishCommon(nil, resp.User, resp.Groups,
|
||||
p.PeerID, stream.ProtocolConsidersResource, propalgation.Payload)
|
||||
}
|
||||
}
|
||||
}
|
||||
case tools.PB_PLANNER:
|
||||
m := map[string]interface{}{}
|
||||
if err := json.Unmarshal(resp.Payload, &m); err == nil {
|
||||
if err := json.Unmarshal(propalgation.Payload, &m); err == nil {
|
||||
b := []byte{}
|
||||
if len(m) > 1 {
|
||||
b = resp.Payload
|
||||
b = propalgation.Payload
|
||||
}
|
||||
if m["peer_id"] == nil { // send to every active stream
|
||||
n.StreamService.Mu.Lock()
|
||||
if n.StreamService.Streams[stream.ProtocolSendPlanner] != nil {
|
||||
for pid := range n.StreamService.Streams[stream.ProtocolSendPlanner] {
|
||||
n.StreamService.PublishCommon(nil, resp.User, pid.String(), stream.ProtocolSendPlanner, b)
|
||||
for pid := range n.StreamService.Streams[stream.ProtocolSendPlanner] { // send Planner can be long lived - it's a conn
|
||||
n.StreamService.PublishCommon(nil, resp.User, resp.Groups, pid.String(), stream.ProtocolSendPlanner, b)
|
||||
}
|
||||
}
|
||||
n.StreamService.Mu.Unlock()
|
||||
} else {
|
||||
n.StreamService.PublishCommon(nil, resp.User, fmt.Sprintf("%v", m["peer_id"]), stream.ProtocolSendPlanner, b)
|
||||
n.StreamService.PublishCommon(nil, resp.User, resp.Groups, fmt.Sprintf("%v", m["peer_id"]), stream.ProtocolSendPlanner, b)
|
||||
}
|
||||
n.StreamService.Mu.Unlock()
|
||||
}
|
||||
case tools.PB_CLOSE_PLANNER:
|
||||
m := map[string]interface{}{}
|
||||
@@ -188,22 +131,28 @@ func ListenNATS(n *Node) {
|
||||
}
|
||||
n.StreamService.Mu.Unlock()
|
||||
}
|
||||
case tools.PB_CLOSE_SEARCH:
|
||||
if propalgation.DataType == int(tools.PEER) {
|
||||
n.peerSearches.Cancel(resp.User)
|
||||
} else {
|
||||
n.StreamService.ResourceSearches.Cancel(resp.User)
|
||||
}
|
||||
case tools.PB_SEARCH:
|
||||
if propalgation.DataType == int(tools.PEER) {
|
||||
m := map[string]interface{}{}
|
||||
if err := json.Unmarshal(propalgation.Payload, &m); err == nil {
|
||||
if peers, err := n.GetPeerRecord(context.Background(), fmt.Sprintf("%v", m["search"])); err == nil {
|
||||
for _, p := range peers {
|
||||
if b, err := json.Marshal(p); err == nil {
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.SEARCH_EVENT, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.DataType(tools.PEER),
|
||||
Method: int(tools.SEARCH_EVENT),
|
||||
Payload: b,
|
||||
})
|
||||
}
|
||||
needle := fmt.Sprintf("%v", m["search"])
|
||||
userKey := resp.User
|
||||
go n.SearchPeerRecord(userKey, needle, func(hit common.SearchHit) {
|
||||
if b, err := json.Marshal(hit); err == nil {
|
||||
tools.NewNATSCaller().SetNATSPub(tools.SEARCH_EVENT, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.DataType(tools.PEER),
|
||||
Method: int(tools.SEARCH_EVENT),
|
||||
Payload: b,
|
||||
})
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
} else {
|
||||
@@ -213,7 +162,7 @@ func ListenNATS(n *Node) {
|
||||
context.Background(),
|
||||
dt,
|
||||
fmt.Sprintf("%v", m["type"]),
|
||||
resp.User,
|
||||
resp.User, resp.Groups,
|
||||
fmt.Sprintf("%v", m["search"]),
|
||||
)
|
||||
}
|
||||
|
||||
@@ -5,7 +5,6 @@ import (
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"fmt"
|
||||
"maps"
|
||||
"oc-discovery/conf"
|
||||
"oc-discovery/daemons/node/common"
|
||||
"oc-discovery/daemons/node/indexer"
|
||||
@@ -22,8 +21,10 @@ import (
|
||||
"github.com/libp2p/go-libp2p"
|
||||
pubsubs "github.com/libp2p/go-libp2p-pubsub"
|
||||
"github.com/libp2p/go-libp2p/core/crypto"
|
||||
"github.com/libp2p/go-libp2p/core/network"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
"github.com/libp2p/go-libp2p/core/protocol"
|
||||
"github.com/libp2p/go-libp2p/p2p/security/noise"
|
||||
)
|
||||
|
||||
type Node struct {
|
||||
@@ -36,10 +37,13 @@ type Node struct {
|
||||
isIndexer bool
|
||||
peerRecord *indexer.PeerRecord
|
||||
|
||||
// peerSearches: one active peer search per user; new search cancels previous.
|
||||
peerSearches *common.SearchTracker
|
||||
|
||||
Mu sync.RWMutex
|
||||
}
|
||||
|
||||
func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error) {
|
||||
func InitNode(isNode bool, isIndexer bool) (*Node, error) {
|
||||
if !isNode && !isIndexer {
|
||||
return nil, errors.New("wait... what ? your node need to at least something. Retry we can't be friend in that case")
|
||||
}
|
||||
@@ -55,13 +59,17 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
|
||||
return nil, nil
|
||||
}
|
||||
logger.Info().Msg("open a host...")
|
||||
gater := newOCConnectionGater(nil) // host set below after creation
|
||||
h, err := libp2p.New(
|
||||
libp2p.PrivateNetwork(psk),
|
||||
libp2p.Identity(priv),
|
||||
libp2p.Security(noise.ID, noise.New),
|
||||
libp2p.ListenAddrStrings(
|
||||
fmt.Sprintf("/ip4/0.0.0.0/tcp/%d", conf.GetConfig().NodeEndpointPort),
|
||||
),
|
||||
libp2p.ConnectionGater(gater),
|
||||
)
|
||||
gater.host = h // wire host back into gater now that it exists
|
||||
logger.Info().Msg("Host open on " + h.ID().String())
|
||||
if err != nil {
|
||||
return nil, errors.New("no host no node")
|
||||
@@ -70,10 +78,15 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
|
||||
PeerID: h.ID(),
|
||||
isIndexer: isIndexer,
|
||||
LongLivedStreamRecordedService: common.NewStreamRecordedService[interface{}](h, 1000),
|
||||
peerSearches: common.NewSearchTracker(),
|
||||
}
|
||||
// Register the bandwidth probe handler so any peer measuring this node's
|
||||
// throughput can open a dedicated probe stream and read the echo.
|
||||
h.SetStreamHandler(common.ProtocolBandwidthProbe, common.HandleBandwidthProbe)
|
||||
// Register the witness query handler so peers can ask this node's view of indexers.
|
||||
h.SetStreamHandler(common.ProtocolWitnessQuery, func(s network.Stream) {
|
||||
common.HandleWitnessQuery(h, s)
|
||||
})
|
||||
var ps *pubsubs.PubSub
|
||||
if isNode {
|
||||
logger.Info().Msg("generate opencloud node...")
|
||||
@@ -105,35 +118,71 @@ func InitNode(isNode bool, isIndexer bool, isNativeIndexer bool) (*Node, error)
|
||||
return json.RawMessage(b)
|
||||
}
|
||||
logger.Info().Msg("connect to indexers...")
|
||||
common.ConnectToIndexers(node.Host, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, node.PeerID, buildRecord)
|
||||
common.ConnectToIndexers(node.Host, conf.GetConfig().MinIndexer, conf.GetConfig().MaxIndexer, buildRecord)
|
||||
logger.Info().Msg("claims my node...")
|
||||
if _, err := node.claimInfo(conf.GetConfig().Name, conf.GetConfig().Hostname); err != nil {
|
||||
panic(err)
|
||||
}
|
||||
logger.Info().Msg("subscribe to decentralized search flow...")
|
||||
logger.Info().Msg("run garbage collector...")
|
||||
node.StartGC(30 * time.Second)
|
||||
|
||||
if node.StreamService, err = stream.InitStream(context.Background(), node.Host, node.PeerID, 1000, node); err != nil {
|
||||
panic(err)
|
||||
}
|
||||
node.StreamService.IsPeerKnown = func(pid pp.ID) bool {
|
||||
// 1. Local DB: known peer (handles blacklist).
|
||||
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
|
||||
results := access.Search(&dbs.Filters{
|
||||
And: map[string][]dbs.Filter{
|
||||
"peer_id": {{Operator: dbs.EQUAL.String(), Value: pid.String()}},
|
||||
},
|
||||
}, pid.String(), false)
|
||||
for _, item := range results.Data {
|
||||
p, ok := item.(*peer.Peer)
|
||||
if !ok || p.PeerID != pid.String() {
|
||||
continue
|
||||
}
|
||||
return p.Relation != peer.BLACKLIST
|
||||
}
|
||||
// 2. Ask a connected indexer → DHT lookup by peer_id.
|
||||
for _, addr := range common.Indexers.GetAddrs() {
|
||||
ctx, cancel := context.WithTimeout(context.Background(), 4*time.Second)
|
||||
s, err := h.NewStream(ctx, addr.Info.ID, common.ProtocolGet)
|
||||
cancel()
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
json.NewEncoder(s).Encode(indexer.GetValue{PeerID: pid.String()})
|
||||
var resp indexer.GetResponse
|
||||
json.NewDecoder(s).Decode(&resp)
|
||||
s.Reset()
|
||||
return resp.Found
|
||||
}
|
||||
return false
|
||||
}
|
||||
|
||||
if node.PubSubService, err = pubsub.InitPubSub(context.Background(), node.Host, node.PS, node, node.StreamService); err != nil {
|
||||
panic(err)
|
||||
}
|
||||
f := func(ctx context.Context, evt common.Event, topic string) {
|
||||
if p, err := node.GetPeerRecord(ctx, evt.From); err == nil && len(p) > 0 {
|
||||
node.StreamService.SendResponse(p[0], &evt)
|
||||
m := map[string]interface{}{}
|
||||
err := json.Unmarshal(evt.Payload, &m)
|
||||
if err != nil || evt.From == node.PeerID.String() {
|
||||
return
|
||||
}
|
||||
if p, err := node.GetPeerRecord(ctx, evt.From); err == nil && len(p) > 0 && m["search"] != nil {
|
||||
node.StreamService.SendResponse(p[0], &evt, fmt.Sprintf("%v", m["search"]))
|
||||
}
|
||||
}
|
||||
node.SubscribeToSearch(node.PS, &f)
|
||||
logger.Info().Msg("subscribe to decentralized search flow...")
|
||||
go node.SubscribeToSearch(node.PS, &f)
|
||||
logger.Info().Msg("connect to NATS")
|
||||
go ListenNATS(node)
|
||||
logger.Info().Msg("Node is actually running.")
|
||||
}
|
||||
if isIndexer {
|
||||
logger.Info().Msg("generate opencloud indexer...")
|
||||
node.IndexerService = indexer.NewIndexerService(node.Host, ps, 500, isNativeIndexer)
|
||||
node.IndexerService = indexer.NewIndexerService(node.Host, ps, 500)
|
||||
}
|
||||
return node, nil
|
||||
}
|
||||
@@ -154,20 +203,14 @@ func (d *Node) publishPeerRecord(
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
common.StreamMuIndexes.RLock()
|
||||
indexerSnapshot := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
|
||||
for _, ad := range common.StaticIndexers {
|
||||
indexerSnapshot = append(indexerSnapshot, ad)
|
||||
}
|
||||
common.StreamMuIndexes.RUnlock()
|
||||
|
||||
for _, ad := range indexerSnapshot {
|
||||
for _, ad := range common.Indexers.GetAddrs() {
|
||||
var err error
|
||||
if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolPublish, "", common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{},
|
||||
&common.StreamMuIndexes); err != nil {
|
||||
if common.Indexers.Streams, err = common.TempStream(d.Host, *ad.Info, common.ProtocolPublish, "", common.Indexers.Streams, map[protocol.ID]*common.ProtocolInfo{},
|
||||
&common.Indexers.MuStream); err != nil {
|
||||
continue
|
||||
}
|
||||
stream := common.StreamIndexers[common.ProtocolPublish][ad.ID]
|
||||
stream := common.Indexers.Streams.GetPerID(common.ProtocolPublish, ad.Info.ID)
|
||||
base := indexer.PeerRecordPayload{
|
||||
Name: rec.Name,
|
||||
DID: rec.DID,
|
||||
@@ -184,38 +227,99 @@ func (d *Node) publishPeerRecord(
|
||||
return nil
|
||||
}
|
||||
|
||||
// SearchPeerRecord starts a distributed peer search via ProtocolSearchPeer.
|
||||
// A new call for the same userKey cancels any previous search.
|
||||
// Results are pushed to onResult as they arrive; the function returns when
|
||||
// the stream closes (idle timeout, explicit cancel, or indexer unreachable).
|
||||
func (d *Node) SearchPeerRecord(userKey, needle string, onResult func(common.SearchHit)) {
|
||||
logger := oclib.GetLogger()
|
||||
|
||||
idleTimeout := common.SearchIdleTimeout()
|
||||
ctx, cancel := context.WithCancel(context.Background())
|
||||
// Register cancels any previous search for userKey and starts the idle timer.
|
||||
// The composite key doubles as QueryID so the indexer echoes it back.
|
||||
searchKey := d.peerSearches.Register(userKey, cancel, idleTimeout)
|
||||
defer d.peerSearches.Cancel(searchKey)
|
||||
|
||||
req := common.SearchPeerRequest{QueryID: searchKey}
|
||||
if pid, err := pp.Decode(needle); err == nil {
|
||||
req.PeerID = pid.String()
|
||||
} else if _, err := uuid.Parse(needle); err == nil {
|
||||
req.DID = needle
|
||||
} else {
|
||||
req.Name = needle
|
||||
}
|
||||
|
||||
for _, ad := range common.Indexers.GetAddrs() {
|
||||
if ad.Info == nil {
|
||||
continue
|
||||
}
|
||||
dialCtx, dialCancel := context.WithTimeout(ctx, 5*time.Second)
|
||||
s, err := d.Host.NewStream(dialCtx, ad.Info.ID, common.ProtocolSearchPeer)
|
||||
dialCancel()
|
||||
if err != nil {
|
||||
continue
|
||||
}
|
||||
if err := json.NewEncoder(s).Encode(req); err != nil {
|
||||
s.Reset()
|
||||
continue
|
||||
}
|
||||
// Interrupt the blocking Decode as soon as the context is cancelled
|
||||
// (idle timer, explicit PB_CLOSE_SEARCH, or replacement search).
|
||||
go func() {
|
||||
<-ctx.Done()
|
||||
s.SetReadDeadline(time.Now())
|
||||
}()
|
||||
seen := map[string]struct{}{}
|
||||
dec := json.NewDecoder(s)
|
||||
for {
|
||||
var result common.SearchPeerResult
|
||||
if err := dec.Decode(&result); err != nil {
|
||||
break
|
||||
}
|
||||
if result.QueryID != searchKey || !d.peerSearches.IsActive(searchKey) {
|
||||
break
|
||||
}
|
||||
d.peerSearches.ResetIdle(searchKey)
|
||||
for _, hit := range result.Records {
|
||||
key := hit.PeerID
|
||||
if key == "" {
|
||||
key = hit.DID
|
||||
}
|
||||
if _, already := seen[key]; already {
|
||||
continue
|
||||
}
|
||||
seen[key] = struct{}{}
|
||||
onResult(hit)
|
||||
}
|
||||
}
|
||||
s.Reset()
|
||||
return
|
||||
}
|
||||
logger.Warn().Str("user", userKey).Msg("[search] no reachable indexer for peer search")
|
||||
}
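// Usage sketch (the only caller in this change is ListenNATS; the literals below
// are illustrative):
//
//	n.SearchPeerRecord("user-42", "alice-node", func(hit common.SearchHit) {
//		logger.Info().Str("peer", hit.PeerID).Str("name", hit.Name).Msg("search hit")
//	})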
|
||||
|
||||
func (d *Node) GetPeerRecord(
|
||||
ctx context.Context,
|
||||
pidOrdid string,
|
||||
) ([]*peer.Peer, error) {
|
||||
var err error
|
||||
var info map[string]indexer.PeerRecord
|
||||
common.StreamMuIndexes.RLock()
|
||||
indexerSnapshot2 := make([]*pp.AddrInfo, 0, len(common.StaticIndexers))
|
||||
for _, ad := range common.StaticIndexers {
|
||||
indexerSnapshot2 = append(indexerSnapshot2, ad)
|
||||
}
|
||||
common.StreamMuIndexes.RUnlock()
|
||||
|
||||
// Build the GetValue request: if pidOrdid is neither a UUID DID nor a libp2p
|
||||
// PeerID, treat it as a human-readable name and let the indexer resolve it.
|
||||
// GetPeerRecord resolves by PeerID or DID only.
|
||||
// Name-based search goes through SearchPeerRecord (ProtocolSearchPeer).
|
||||
getReq := indexer.GetValue{Key: pidOrdid}
|
||||
isNameSearch := false
|
||||
if pidR, pidErr := pp.Decode(pidOrdid); pidErr == nil {
|
||||
getReq.PeerID = pidR
|
||||
} else if _, uuidErr := uuid.Parse(pidOrdid); uuidErr != nil {
|
||||
// Not a UUID DID → treat pidOrdid as a name substring search.
|
||||
getReq.Name = pidOrdid
|
||||
getReq.PeerID = pidR.String()
|
||||
getReq.Key = ""
|
||||
isNameSearch = true
|
||||
}
|
||||
|
||||
for _, ad := range indexerSnapshot2 {
|
||||
if common.StreamIndexers, err = common.TempStream(d.Host, *ad, common.ProtocolGet, "",
|
||||
common.StreamIndexers, map[protocol.ID]*common.ProtocolInfo{}, &common.StreamMuIndexes); err != nil {
|
||||
for _, ad := range common.Indexers.GetAddrs() {
|
||||
if common.Indexers.Streams, err = common.TempStream(d.Host, *ad.Info, common.ProtocolGet, "",
|
||||
common.Indexers.Streams, map[protocol.ID]*common.ProtocolInfo{}, &common.Indexers.MuStream); err != nil {
|
||||
continue
|
||||
}
|
||||
stream := common.StreamIndexers[common.ProtocolGet][ad.ID]
|
||||
stream := common.Indexers.Streams.GetPerID(common.ProtocolGet, ad.Info.ID)
|
||||
if err := json.NewEncoder(stream.Stream).Encode(getReq); err != nil {
|
||||
continue
|
||||
}
|
||||
@@ -224,28 +328,17 @@ func (d *Node) GetPeerRecord(
|
||||
continue
|
||||
}
|
||||
if resp.Found {
|
||||
if info == nil {
|
||||
info = resp.Records
|
||||
} else {
|
||||
// Aggregate results from all indexers for name searches.
|
||||
maps.Copy(info, resp.Records)
|
||||
}
|
||||
// For exact lookups (PeerID / DID) stop at the first hit.
|
||||
if !isNameSearch {
|
||||
break
|
||||
}
|
||||
info = resp.Records
|
||||
}
|
||||
break
|
||||
}
|
||||
var ps []*peer.Peer
|
||||
for _, pr := range info {
|
||||
if pk, err := pr.Verify(); err != nil {
|
||||
return nil, err
|
||||
} else if ok, p, err := pr.ExtractPeer(d.PeerID.String(), pr.PeerID, pk); err != nil {
|
||||
} else if _, p, err := pr.ExtractPeer(d.PeerID.String(), pr.PeerID, pk); err != nil {
|
||||
return nil, err
|
||||
} else {
|
||||
if ok {
|
||||
d.publishPeerRecord(&pr)
|
||||
}
|
||||
ps = append(ps, p)
|
||||
}
|
||||
}
|
||||
@@ -316,6 +409,17 @@ func (d *Node) claimInfo(
|
||||
return nil, err
|
||||
} else {
|
||||
_, p, err := rec.ExtractPeer(did, did, pub)
|
||||
b, err := json.Marshal(p)
|
||||
if err != nil {
|
||||
return p, err
|
||||
}
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.CREATE_RESOURCE, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.PEER,
|
||||
Method: int(tools.CREATE_RESOURCE),
|
||||
SearchAttr: "peer_id",
|
||||
Payload: b,
|
||||
})
|
||||
return p, err
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,40 +0,0 @@
|
||||
package pubsub
|
||||
|
||||
import (
|
||||
"context"
|
||||
"oc-discovery/daemons/node/common"
|
||||
|
||||
"cloud.o-forge.io/core/oc-lib/tools"
|
||||
)
|
||||
|
||||
func (ps *PubSubService) handleEvent(ctx context.Context, topicName string, evt *common.Event) error {
|
||||
action := ps.getTopicName(topicName)
|
||||
if err := ps.handleEventSearch(ctx, evt, action); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (ps *PubSubService) handleEventSearch( // applies only to partner followings: 3 channels per partner.
|
||||
ctx context.Context,
|
||||
evt *common.Event,
|
||||
action tools.PubSubAction,
|
||||
) error {
|
||||
if !(action == tools.PB_SEARCH) {
|
||||
return nil
|
||||
}
|
||||
if p, err := ps.Node.GetPeerRecord(ctx, evt.From); err == nil && len(p) > 0 { // peerFrom is Unique
|
||||
if err := evt.Verify(p[0]); err != nil {
|
||||
return err
|
||||
}
|
||||
switch action {
|
||||
case tools.PB_SEARCH: // when someone ask for search.
|
||||
if err := ps.StreamService.SendResponse(p[0], evt); err != nil {
|
||||
return err
|
||||
}
|
||||
default:
|
||||
return nil
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
@@ -4,8 +4,11 @@ import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"errors"
|
||||
"oc-discovery/conf"
|
||||
"oc-discovery/daemons/node/common"
|
||||
"oc-discovery/daemons/node/stream"
|
||||
"oc-discovery/models"
|
||||
"time"
|
||||
|
||||
"cloud.o-forge.io/core/oc-lib/dbs"
|
||||
"cloud.o-forge.io/core/oc-lib/models/peer"
|
||||
@@ -13,24 +16,16 @@ import (
|
||||
)
|
||||
|
||||
func (ps *PubSubService) SearchPublishEvent(
|
||||
ctx context.Context, dt *tools.DataType, typ string, user string, search string) error {
|
||||
ctx context.Context, dt *tools.DataType, typ string, user string, groups []string, search string) error {
|
||||
b, err := json.Marshal(map[string]string{"search": search})
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
switch typ {
|
||||
case "known": // define Search Strategy
|
||||
return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
|
||||
And: map[string][]dbs.Filter{
|
||||
"": {{Operator: dbs.NOT.String(), Value: dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
|
||||
And: map[string][]dbs.Filter{
|
||||
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.BLACKLIST}},
|
||||
},
|
||||
}}},
|
||||
},
|
||||
}, b, stream.ProtocolSearchResource) //if partners focus only them*/
|
||||
return ps.StreamService.PublishesCommon(dt, user, groups, nil, b, stream.ProtocolSearchResource) //if partners focus only them*/
|
||||
case "partner": // define Search Strategy
|
||||
return ps.StreamService.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
|
||||
return ps.StreamService.PublishesCommon(dt, user, groups, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
|
||||
And: map[string][]dbs.Filter{
|
||||
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
|
||||
},
|
||||
@@ -40,23 +35,37 @@ func (ps *PubSubService) SearchPublishEvent(
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
return ps.publishEvent(ctx, dt, tools.PB_SEARCH, user, b)
|
||||
idleTimeout := func() time.Duration {
|
||||
if t := conf.GetConfig().SearchTimeout; t > 0 {
|
||||
return time.Duration(t) * time.Second
|
||||
}
|
||||
return 5 * time.Second
|
||||
}()
|
||||
searchCtx, cancel := context.WithCancel(ctx)
|
||||
// Register cancels any previous search for this user and starts the idle timer.
|
||||
// The returned composite key is used as User in the GossipSub event so that
|
||||
// remote peers echo it back unchanged, allowing IsActive to validate results.
|
||||
searchKey := ps.StreamService.ResourceSearches.Register(user, cancel, idleTimeout)
|
||||
return ps.publishEvent(searchCtx, dt, tools.PB_SEARCH, common.TopicPubSubSearch, searchKey, b)
|
||||
default:
|
||||
return errors.New("no type of research found")
|
||||
}
|
||||
}
|
||||
|
||||
func (ps *PubSubService) publishEvent(
|
||||
ctx context.Context, dt *tools.DataType, action tools.PubSubAction, user string, payload []byte,
|
||||
ctx context.Context, dt *tools.DataType, action tools.PubSubAction, topicName string, user string, payload []byte,
|
||||
) error {
|
||||
priv, err := tools.LoadKeyFromFilePrivate()
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
msg, _ := json.Marshal(models.NewEvent(action.String(), ps.Host.ID().String(), dt, user, payload, priv))
|
||||
topic, err := ps.PS.Join(action.String())
|
||||
if err != nil {
|
||||
return err
|
||||
topic := ps.Node.GetPubSub(topicName)
|
||||
if topic == nil {
|
||||
topic, err = ps.PS.Join(topicName)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
return topic.Publish(ctx, msg)
|
||||
}
|
||||
|
||||
@@ -4,17 +4,13 @@ import (
|
||||
"context"
|
||||
"oc-discovery/daemons/node/common"
|
||||
"oc-discovery/daemons/node/stream"
|
||||
"strings"
|
||||
"sync"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
"cloud.o-forge.io/core/oc-lib/tools"
|
||||
pubsub "github.com/libp2p/go-libp2p-pubsub"
|
||||
"github.com/libp2p/go-libp2p/core/host"
|
||||
)
|
||||
|
||||
type PubSubService struct {
|
||||
*common.LongLivedPubSubService
|
||||
Node common.DiscoveryPeer
|
||||
Host host.Host
|
||||
PS *pubsub.PubSub
|
||||
@@ -24,24 +20,12 @@ type PubSubService struct {
|
||||
}
|
||||
|
||||
func InitPubSub(ctx context.Context, h host.Host, ps *pubsub.PubSub, node common.DiscoveryPeer, streamService *stream.StreamService) (*PubSubService, error) {
|
||||
service := &PubSubService{
|
||||
LongLivedPubSubService: common.NewLongLivedPubSubService(h),
|
||||
Node: node,
|
||||
StreamService: streamService,
|
||||
PS: ps,
|
||||
}
|
||||
logger := oclib.GetLogger()
|
||||
logger.Info().Msg("subscribe to events...")
|
||||
service.initSubscribeEvents(ctx)
|
||||
return service, nil
|
||||
}
|
||||
|
||||
func (ps *PubSubService) getTopicName(topicName string) tools.PubSubAction {
|
||||
ns := strings.Split(topicName, ".")
|
||||
if len(ns) > 0 {
|
||||
return tools.GetActionString(ns[0])
|
||||
}
|
||||
return tools.NONE
|
||||
return &PubSubService{
|
||||
Host: h,
|
||||
Node: node,
|
||||
StreamService: streamService,
|
||||
PS: ps,
|
||||
}, nil
|
||||
}
|
||||
|
||||
func (ix *PubSubService) Close() {
|
||||
|
||||
@@ -1,45 +0,0 @@
|
||||
package pubsub
|
||||
|
||||
import (
|
||||
"context"
|
||||
"oc-discovery/daemons/node/common"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
"cloud.o-forge.io/core/oc-lib/models/peer"
|
||||
"cloud.o-forge.io/core/oc-lib/tools"
|
||||
)
|
||||
|
||||
func (ps *PubSubService) initSubscribeEvents(ctx context.Context) error {
|
||||
if err := ps.subscribeEvents(ctx, nil, tools.PB_SEARCH, ""); err != nil {
|
||||
return err
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
// generic function to subscribe to DHT flow of event
|
||||
func (ps *PubSubService) subscribeEvents(
|
||||
ctx context.Context, dt *tools.DataType, action tools.PubSubAction, peerID string,
|
||||
) error {
|
||||
logger := oclib.GetLogger()
|
||||
// define a name app.action#peerID
|
||||
name := action.String() + "#" + peerID
|
||||
if dt != nil { // if a datatype is precised then : app.action.datatype#peerID
|
||||
name = action.String() + "." + (*dt).String() + "#" + peerID
|
||||
}
|
||||
f := func(ctx context.Context, evt common.Event, topicName string) {
|
||||
if p, err := ps.Node.GetPeerRecord(ctx, evt.From); err == nil && len(p) > 0 {
|
||||
if err := ps.processEvent(ctx, p[0], &evt, topicName); err != nil {
|
||||
logger.Err(err)
|
||||
}
|
||||
}
|
||||
}
|
||||
return common.SubscribeEvents(ps.LongLivedPubSubService, ctx, name, -1, f)
|
||||
}
|
||||
|
||||
func (ps *PubSubService) processEvent(
|
||||
ctx context.Context, p *peer.Peer, event *common.Event, topicName string) error {
|
||||
if err := event.Verify(p); err != nil {
|
||||
return err
|
||||
}
|
||||
return ps.handleEvent(ctx, topicName, event)
|
||||
}
|
||||
@@ -9,6 +9,7 @@ import (
|
||||
"oc-discovery/daemons/node/common"
|
||||
|
||||
oclib "cloud.o-forge.io/core/oc-lib"
|
||||
"cloud.o-forge.io/core/oc-lib/dbs"
|
||||
"cloud.o-forge.io/core/oc-lib/models/booking/planner"
|
||||
"cloud.o-forge.io/core/oc-lib/models/peer"
|
||||
"cloud.o-forge.io/core/oc-lib/models/resources"
|
||||
@@ -44,17 +45,17 @@ func (ps *StreamService) handleEvent(protocol string, evt *common.Event) error {
|
||||
}
|
||||
}
|
||||
if protocol == ProtocolConsidersResource {
|
||||
if err := ps.pass(evt, tools.PB_CONSIDERS); err != nil {
|
||||
if err := ps.pass(evt, tools.CONSIDERS_EVENT); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
if protocol == ProtocolAdmiraltyConfigResource {
|
||||
if err := ps.pass(evt, tools.PB_ADMIRALTY_CONFIG); err != nil {
|
||||
if err := ps.pass(evt, tools.ADMIRALTY_CONFIG_EVENT); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
if protocol == ProtocolMinioConfigResource {
|
||||
if err := ps.pass(evt, tools.PB_MINIO_CONFIG); err != nil {
|
||||
if err := ps.pass(evt, tools.MINIO_CONFIG_EVENT); err != nil {
|
||||
return err
|
||||
}
|
||||
}
|
||||
@@ -79,50 +80,50 @@ func (abs *StreamService) verifyResponse(event *common.Event) error { //
|
||||
}
|
||||
}
|
||||
if b, err := json.Marshal(verify); err == nil {
|
||||
abs.PublishCommon(nil, "", event.From, ProtocolVerifyResource, b)
|
||||
abs.PublishCommon(nil, event.User, event.Groups, event.From, ProtocolVerifyResource, b)
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (abs *StreamService) sendPlanner(event *common.Event) error { //
|
||||
fmt.Println("sendPlanner", len(event.Payload))
|
||||
if len(event.Payload) == 0 {
|
||||
if plan, err := planner.GenerateShallow(&tools.APIRequest{Admin: true}); err == nil {
|
||||
if b, err := json.Marshal(plan); err == nil {
|
||||
abs.PublishCommon(nil, event.User, event.From, ProtocolSendPlanner, b)
|
||||
abs.PublishCommon(nil, event.User, event.Groups, event.From, ProtocolSendPlanner, b)
|
||||
} else {
|
||||
return err
|
||||
}
|
||||
} else {
|
||||
m := map[string]interface{}{}
|
||||
if err := json.Unmarshal(event.Payload, &m); err == nil {
|
||||
m["peer_id"] = event.From
|
||||
if pl, err := json.Marshal(m); err == nil {
|
||||
if b, err := json.Marshal(tools.PropalgationMessage{
|
||||
DataType: -1,
|
||||
Action: tools.PB_PLANNER,
|
||||
Payload: pl,
|
||||
}); err == nil {
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.PROPALGATION_EVENT, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.DataType(oclib.BOOKING),
|
||||
Method: int(tools.PROPALGATION_EVENT),
|
||||
Payload: b,
|
||||
})
|
||||
}
|
||||
}
|
||||
return err
|
||||
}
|
||||
} else { // if not empty so it's
|
||||
m := map[string]interface{}{}
|
||||
if err := json.Unmarshal(event.Payload, &m); err == nil {
|
||||
m["peer_id"] = event.From
|
||||
if pl, err := json.Marshal(m); err == nil {
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.PLANNER_EXECUTION, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.DataType(oclib.BOOKING),
|
||||
Method: int(tools.PLANNER_EXECUTION),
|
||||
Payload: pl,
|
||||
})
|
||||
}
|
||||
}
|
||||
} else {
|
||||
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (abs *StreamService) retrieveResponse(event *common.Event) error { //
|
||||
if !abs.ResourceSearches.IsActive(event.User) {
|
||||
return nil // search already closed or timed out
|
||||
}
|
||||
res, err := resources.ToResource(int(event.DataType), event.Payload)
|
||||
if err != nil || res == nil {
|
||||
return nil
|
||||
}
|
||||
// A response arrived — reset the idle timeout.
|
||||
abs.ResourceSearches.ResetIdle(event.User)
|
||||
b, err := json.Marshal(res.Serialize(res))
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.SEARCH_EVENT, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
@@ -133,38 +134,48 @@ func (abs *StreamService) retrieveResponse(event *common.Event) error { //
|
||||
return nil
|
||||
}
|
||||
|
||||
func (abs *StreamService) pass(event *common.Event, action tools.PubSubAction) error { //
|
||||
if b, err := json.Marshal(&tools.PropalgationMessage{
|
||||
Action: action,
|
||||
DataType: int(event.DataType),
|
||||
func (abs *StreamService) pass(event *common.Event, method tools.NATSMethod) error { //
|
||||
go tools.NewNATSCaller().SetNATSPub(method, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.DataType(event.DataType),
|
||||
Method: int(method),
|
||||
Payload: event.Payload,
|
||||
}); err == nil {
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.PROPALGATION_EVENT, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.DataType(event.DataType),
|
||||
Method: int(tools.PROPALGATION_EVENT),
|
||||
Payload: b,
|
||||
})
|
||||
}
|
||||
})
|
||||
return nil
|
||||
}
|
||||
|
||||
func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol string) error {
|
||||
switch protocol {
|
||||
case ProtocolSearchResource:
|
||||
if evt.DataType < 0 {
|
||||
m := map[string]interface{}{}
|
||||
err := json.Unmarshal(evt.Payload, &m)
|
||||
if err != nil {
|
||||
return err
|
||||
}
|
||||
if search, ok := m["search"]; ok {
|
||||
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
|
||||
peers := access.Search(nil, evt.From, false)
|
||||
peers := access.Search(&dbs.Filters{
|
||||
And: map[string][]dbs.Filter{
|
||||
"peer_id": {{Operator: dbs.EQUAL.String(), Value: evt.From}},
|
||||
},
|
||||
}, evt.From, false)
|
||||
if len(peers.Data) > 0 {
|
||||
p := peers.Data[0].(*peer.Peer)
|
||||
// TODO: handle the case where the peer is missing on our side.
|
||||
ps.SendResponse(p, evt)
|
||||
ps.SendResponse(p, evt, fmt.Sprintf("%v", search))
|
||||
} else if p, err := ps.Node.GetPeerRecord(context.Background(), evt.From); err == nil && len(p) > 0 { // peer from is peerID
|
||||
ps.SendResponse(p[0], evt)
|
||||
ps.SendResponse(p[0], evt, fmt.Sprintf("%v", search))
|
||||
}
|
||||
} else {
|
||||
fmt.Println("SEND SEARCH_EVENT SetNATSPub", m)
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.SEARCH_EVENT, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.DataType(evt.DataType),
|
||||
Method: int(tools.SEARCH_EVENT),
|
||||
Payload: evt.Payload,
|
||||
})
|
||||
}
|
||||
case ProtocolCreateResource, ProtocolUpdateResource:
|
||||
fmt.Println("RECEIVED Protocol.Update")
|
||||
fmt.Println("RECEIVED Protocol.Update", string(evt.Payload))
|
||||
go tools.NewNATSCaller().SetNATSPub(tools.CREATE_RESOURCE, tools.NATSResponse{
|
||||
FromApp: "oc-discovery",
|
||||
Datatype: tools.DataType(evt.DataType),
|
||||
@@ -184,32 +195,26 @@ func (ps *StreamService) handleEventFromPartner(evt *common.Event, protocol stri
|
||||
return nil
|
||||
}
|
||||
|
||||
func (abs *StreamService) SendResponse(p *peer.Peer, event *common.Event) error {
|
||||
dts := []oclib.LibDataEnum{oclib.LibDataEnum(event.DataType)}
|
||||
func (abs *StreamService) SendResponse(p *peer.Peer, event *common.Event, search string) error {
|
||||
dts := []tools.DataType{tools.DataType(event.DataType)}
|
||||
if event.DataType == -1 { // expect all resources
|
||||
dts = []oclib.LibDataEnum{
|
||||
oclib.LibDataEnum(oclib.COMPUTE_RESOURCE),
|
||||
oclib.LibDataEnum(oclib.STORAGE_RESOURCE),
|
||||
oclib.LibDataEnum(oclib.PROCESSING_RESOURCE),
|
||||
oclib.LibDataEnum(oclib.DATA_RESOURCE),
|
||||
oclib.LibDataEnum(oclib.WORKFLOW_RESOURCE)}
|
||||
dts = []tools.DataType{
|
||||
tools.COMPUTE_RESOURCE,
|
||||
tools.STORAGE_RESOURCE,
|
||||
tools.PROCESSING_RESOURCE,
|
||||
tools.DATA_RESOURCE,
|
||||
tools.WORKFLOW_RESOURCE,
|
||||
}
|
||||
}
|
||||
var m map[string]string
|
||||
err := json.Unmarshal(event.Payload, &m)
|
||||
if err != nil {
|
||||
if self, err := oclib.GetMySelf(); err != nil {
|
||||
return err
|
||||
}
|
||||
for _, dt := range dts {
|
||||
access := oclib.NewRequestAdmin(oclib.LibDataEnum(event.DataType), nil)
|
||||
peerID := p.GetID()
|
||||
searched := access.Search(abs.FilterPeer(peerID, m["search"]), "", false)
|
||||
for _, ss := range searched.Data {
|
||||
if j, err := json.Marshal(ss); err == nil {
|
||||
if event.DataType != -1 {
|
||||
ndt := tools.DataType(dt.EnumIndex())
|
||||
abs.PublishCommon(&ndt, event.User, peerID, ProtocolSearchResource, j)
|
||||
} else {
|
||||
abs.PublishCommon(nil, event.User, peerID, ProtocolSearchResource, j)
|
||||
} else {
|
||||
for _, dt := range dts {
|
||||
access := oclib.NewRequestAdmin(oclib.LibDataEnum(dt), nil)
|
||||
searched := access.Search(abs.FilterPeer(self.GetID(), event.Groups, search), "", false)
|
||||
for _, ss := range searched.Data {
|
||||
if j, err := json.Marshal(ss); err == nil {
|
||||
abs.PublishCommon(&dt, event.User, event.Groups, p.PeerID, ProtocolSearchResource, j)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -15,22 +15,28 @@ import (
|
||||
"github.com/libp2p/go-libp2p/core/protocol"
|
||||
)
|
||||
|
||||
func (ps *StreamService) PublishesCommon(dt *tools.DataType, user string, filter *dbs.Filters, resource []byte, protos ...protocol.ID) error {
|
||||
func (ps *StreamService) PublishesCommon(dt *tools.DataType, user string, groups []string, filter *dbs.Filters, resource []byte, protos ...protocol.ID) error {
|
||||
access := oclib.NewRequestAdmin(oclib.LibDataEnum(oclib.PEER), nil)
|
||||
p := access.Search(filter, "", false)
|
||||
var p oclib.LibDataShallow
|
||||
if filter == nil {
|
||||
p = access.LoadAll(false)
|
||||
} else {
|
||||
p = access.Search(filter, "", false)
|
||||
}
|
||||
for _, pes := range p.Data {
|
||||
for _, proto := range protos {
|
||||
if _, err := ps.PublishCommon(dt, user, pes.(*peer.Peer).PeerID, proto, resource); err != nil {
|
||||
return err
|
||||
if _, err := ps.PublishCommon(dt, user, groups, pes.(*peer.Peer).PeerID, proto, resource); err != nil {
|
||||
continue
|
||||
}
|
||||
}
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
|
||||
func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, groups []string, toPeerID string, proto protocol.ID, resource []byte) (*common.Stream, error) {
|
||||
fmt.Println("PublishCommon")
|
||||
if toPeerID == ps.Key.String() {
	fmt.Println("can't send to ourselves")
	return nil, errors.New("can't send to ourselves")
}
|
||||
|
||||
@@ -47,7 +53,7 @@ func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID
|
||||
pe = pps[0]
|
||||
}
|
||||
if pe != nil {
|
||||
ad, err := pp.AddrInfoFromString(p.Data[0].(*peer.Peer).StreamAddress)
|
||||
ad, err := pp.AddrInfoFromString(pe.StreamAddress)
|
||||
if err != nil {
|
||||
return nil, err
|
||||
}
|
||||
@@ -57,14 +63,13 @@ func (ps *StreamService) PublishCommon(dt *tools.DataType, user string, toPeerID
|
||||
}
|
||||
|
||||
func (ps *StreamService) ToPartnerPublishEvent(
|
||||
ctx context.Context, action tools.PubSubAction, dt *tools.DataType, user string, payload []byte) error {
|
||||
ctx context.Context, action tools.PubSubAction, dt *tools.DataType, user string, groups []string, payload []byte) error {
|
||||
if *dt == tools.PEER {
|
||||
var p peer.Peer
|
||||
if err := json.Unmarshal(payload, &p); err != nil {
|
||||
return err
|
||||
}
|
||||
pid, err := pp.Decode(p.PeerID)
|
||||
if err != nil {
|
||||
if _, err := pp.Decode(p.PeerID); err != nil {
|
||||
return err
|
||||
}
|
||||
|
||||
@@ -76,22 +81,10 @@ func (ps *StreamService) ToPartnerPublishEvent(
|
||||
pe.Relation = p.Relation
|
||||
pe.Verify = false
|
||||
if b2, err := json.Marshal(pe); err == nil {
|
||||
if _, err := ps.PublishCommon(dt, user, p.PeerID, ProtocolUpdateResource, b2); err != nil {
|
||||
if _, err := ps.PublishCommon(dt, user, groups, p.PeerID, ProtocolUpdateResource, b2); err != nil {
|
||||
return err
|
||||
}
|
||||
if p.Relation == peer.PARTNER {
|
||||
if ps.Streams[ProtocolHeartbeatPartner] == nil {
|
||||
ps.Streams[ProtocolHeartbeatPartner] = map[pp.ID]*common.Stream{}
|
||||
}
|
||||
fmt.Println("SHOULD CONNECT")
|
||||
ps.ConnectToPartner(p.StreamAddress)
|
||||
} else if ps.Streams[ProtocolHeartbeatPartner] != nil && ps.Streams[ProtocolHeartbeatPartner][pid] != nil {
|
||||
for _, pids := range ps.Streams {
|
||||
if pids[pid] != nil {
|
||||
delete(pids, pid)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
return nil
|
||||
@@ -100,11 +93,19 @@ func (ps *StreamService) ToPartnerPublishEvent(
|
||||
for k := range protocolsPartners {
|
||||
ks = append(ks, k)
|
||||
}
|
||||
ps.PublishesCommon(dt, user, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
|
||||
var proto protocol.ID
|
||||
proto = ProtocolCreateResource
|
||||
switch action {
|
||||
case tools.PB_DELETE:
|
||||
proto = ProtocolDeleteResource
|
||||
case tools.PB_UPDATE:
|
||||
proto = ProtocolUpdateResource
|
||||
}
|
||||
ps.PublishesCommon(dt, user, groups, &dbs.Filters{ // filter by like name, short_description, description, owner, url if no filters are provided
|
||||
And: map[string][]dbs.Filter{
|
||||
"relation": {{Operator: dbs.EQUAL.String(), Value: peer.PARTNER}},
|
||||
},
|
||||
}, payload, ks...)
|
||||
}, payload, proto)
|
||||
return nil
|
||||
}
|
||||
|
||||
@@ -129,13 +130,17 @@ func (s *StreamService) write(
|
||||
return nil, errors.New("no stream available for protocol " + fmt.Sprintf("%v", proto) + " from PID " + peerID.ID.String())
|
||||
|
||||
}
|
||||
|
||||
stream := s.Streams[proto][peerID.ID]
|
||||
evt := common.NewEvent(string(proto), peerID.ID.String(), dt, user, payload)
|
||||
fmt.Println("SEND EVENT ", evt.From, evt.DataType, evt.Timestamp)
|
||||
evt := common.NewEvent(string(proto), s.Host.ID().String(), dt, user, payload)
|
||||
fmt.Println("SEND EVENT ", peerID, proto, evt.From, evt.DataType, evt.Timestamp)
|
||||
if err := json.NewEncoder(stream.Stream).Encode(evt); err != nil {
|
||||
stream.Stream.Close()
|
||||
logger.Err(err)
|
||||
return stream, nil
|
||||
return nil, err
|
||||
}
|
||||
if protocolInfo, ok := protocols[proto]; ok && protocolInfo.WaitResponse {
|
||||
go s.readLoop(stream, peerID.ID, proto, &common.ProtocolInfo{PersistantStream: true})
|
||||
}
|
||||
return stream, nil
|
||||
}
|
||||
|
||||
@@ -3,7 +3,8 @@ package stream
|
||||
import (
|
||||
"context"
|
||||
"encoding/json"
|
||||
"fmt"
|
||||
"errors"
|
||||
"io"
|
||||
"oc-discovery/conf"
|
||||
"oc-discovery/daemons/node/common"
|
||||
"strings"
|
||||
@@ -19,7 +20,6 @@ import (
|
||||
"github.com/libp2p/go-libp2p/core/network"
|
||||
pp "github.com/libp2p/go-libp2p/core/peer"
|
||||
"github.com/libp2p/go-libp2p/core/protocol"
|
||||
ma "github.com/multiformats/go-multiaddr"
|
||||
)
|
||||
|
||||
const ProtocolConsidersResource = "/opencloud/resource/considers/1.0"
|
||||
@@ -51,28 +51,30 @@ var protocolsPartners = map[protocol.ID]*common.ProtocolInfo{
|
||||
}
|
||||
|
||||
type StreamService struct {
|
||||
Key pp.ID
|
||||
Host host.Host
|
||||
Node common.DiscoveryPeer
|
||||
Streams common.ProtocolStream
|
||||
maxNodesConn int
|
||||
Mu sync.RWMutex
|
||||
// Stream map[protocol.ID]map[pp.ID]*daemons.Stream
|
||||
Key pp.ID
|
||||
Host host.Host
|
||||
Node common.DiscoveryPeer
|
||||
Streams common.ProtocolStream
|
||||
maxNodesConn int
|
||||
Mu sync.RWMutex
|
||||
ResourceSearches *common.SearchTracker
|
||||
// IsPeerKnown, when set, is called at stream open for every inbound protocol.
|
||||
// Return false to reset the stream immediately. Left nil until wired by the node.
|
||||
IsPeerKnown func(pid pp.ID) bool
|
||||
}
|
||||
|
||||
func InitStream(ctx context.Context, h host.Host, key pp.ID, maxNode int, node common.DiscoveryPeer) (*StreamService, error) {
|
||||
logger := oclib.GetLogger()
|
||||
service := &StreamService{
|
||||
Key: key,
|
||||
Node: node,
|
||||
Host: h,
|
||||
Streams: common.ProtocolStream{},
|
||||
maxNodesConn: maxNode,
|
||||
Key: key,
|
||||
Node: node,
|
||||
Host: h,
|
||||
Streams: common.ProtocolStream{},
|
||||
maxNodesConn: maxNode,
|
||||
ResourceSearches: common.NewSearchTracker(),
|
||||
}
|
||||
logger.Info().Msg("handle to partner heartbeat protocol...")
|
||||
service.Host.SetStreamHandler(ProtocolHeartbeatPartner, service.HandlePartnerHeartbeat)
|
||||
for proto := range protocols {
|
||||
service.Host.SetStreamHandler(proto, service.HandleResponse)
|
||||
service.Host.SetStreamHandler(proto, service.gate(service.HandleResponse))
|
||||
}
|
||||
logger.Info().Msg("connect to partners...")
|
||||
service.connectToPartners() // we set up a stream
|
||||
@@ -80,8 +82,24 @@ func InitStream(ctx context.Context, h host.Host, key pp.ID, maxNode int, node c
|
||||
return service, nil
|
||||
}
|
||||
|
||||
// gate wraps a stream handler with IsPeerKnown validation.
|
||||
// If the peer is unknown the entire connection is closed and the handler is not called.
|
||||
// IsPeerKnown is read at stream-open time so it works even when set after InitStream.
|
||||
func (s *StreamService) gate(h func(network.Stream)) func(network.Stream) {
|
||||
return func(stream network.Stream) {
|
||||
if s.IsPeerKnown != nil && !s.IsPeerKnown(stream.Conn().RemotePeer()) {
|
||||
logger := oclib.GetLogger()
|
||||
logger.Warn().Str("peer", stream.Conn().RemotePeer().String()).Msg("[stream] unknown peer, closing connection")
|
||||
stream.Conn().Close()
|
||||
return
|
||||
}
|
||||
h(stream)
|
||||
}
|
||||
}
|
||||
|
||||
func (s *StreamService) HandleResponse(stream network.Stream) {
|
||||
s.Mu.Lock()
|
||||
defer s.Mu.Unlock()
|
||||
stream.Protocol()
|
||||
if s.Streams[stream.Protocol()] == nil {
|
||||
s.Streams[stream.Protocol()] = map[pp.ID]*common.Stream{}
|
||||
@@ -98,46 +116,15 @@ func (s *StreamService) HandleResponse(stream network.Stream) {
|
||||
Stream: stream,
|
||||
Expiry: time.Now().UTC().Add(expiry + 1*time.Minute),
|
||||
}
|
||||
s.Mu.Unlock()
|
||||
|
||||
go s.readLoop(s.Streams[stream.Protocol()][stream.Conn().RemotePeer()],
|
||||
stream.Conn().RemotePeer(),
|
||||
stream.Protocol(), protocols[stream.Protocol()])
|
||||
}
|
||||
|
||||
func (s *StreamService) HandlePartnerHeartbeat(stream network.Stream) {
|
||||
s.Mu.Lock()
|
||||
if s.Streams[ProtocolHeartbeatPartner] == nil {
|
||||
s.Streams[ProtocolHeartbeatPartner] = map[pp.ID]*common.Stream{}
|
||||
}
|
||||
streams := s.Streams[ProtocolHeartbeatPartner]
|
||||
streamsAnonym := map[pp.ID]common.HeartBeatStreamed{}
|
||||
for k, v := range streams {
|
||||
streamsAnonym[k] = v
|
||||
}
|
||||
s.Mu.Unlock()
|
||||
pid, hb, err := common.CheckHeartbeat(s.Host, stream, json.NewDecoder(stream), streamsAnonym, &s.Mu, s.maxNodesConn)
|
||||
if err != nil {
|
||||
return
|
||||
}
|
||||
s.Mu.Lock()
|
||||
defer s.Mu.Unlock()
|
||||
// if record already seen update last seen
|
||||
if rec, ok := streams[*pid]; ok {
|
||||
rec.DID = hb.DID
|
||||
rec.Expiry = time.Now().UTC().Add(10 * time.Second)
|
||||
} else { // if not in stream ?
|
||||
val, err := stream.Conn().RemoteMultiaddr().ValueForProtocol(ma.P_IP4)
|
||||
if err == nil {
|
||||
s.ConnectToPartner(val)
|
||||
}
|
||||
}
|
||||
// GC is already running via InitStream — starting a new ticker goroutine on
|
||||
// every heartbeat would leak an unbounded number of goroutines.
|
||||
}
|
||||
|
||||
func (s *StreamService) connectToPartners() error {
|
||||
logger := oclib.GetLogger()
|
||||
// Register handlers for partner resource protocols (create/update/delete).
|
||||
// Connections to partners happen on-demand via TempStream when needed.
|
||||
for proto, info := range protocolsPartners {
|
||||
f := func(ss network.Stream) {
|
||||
if s.Streams[proto] == nil {
|
||||
@@ -150,28 +137,11 @@ func (s *StreamService) connectToPartners() error {
|
||||
go s.readLoop(s.Streams[proto][ss.Conn().RemotePeer()], ss.Conn().RemotePeer(), proto, info)
|
||||
}
|
||||
logger.Info().Msg("SetStreamHandler " + string(proto))
|
||||
s.Host.SetStreamHandler(proto, f)
|
||||
}
|
||||
peers, err := s.searchPeer(fmt.Sprintf("%v", peer.PARTNER.EnumIndex()))
|
||||
if err != nil {
|
||||
logger.Err(err)
|
||||
return err
|
||||
}
|
||||
for _, p := range peers {
|
||||
s.ConnectToPartner(p.StreamAddress)
|
||||
s.Host.SetStreamHandler(proto, s.gate(f))
|
||||
}
|
||||
return nil
|
||||
}
|
||||
|
||||
func (s *StreamService) ConnectToPartner(address string) {
|
||||
logger := oclib.GetLogger()
|
||||
if ad, err := pp.AddrInfoFromString(address); err == nil {
|
||||
logger.Info().Msg("Connect to Partner " + ProtocolHeartbeatPartner + " " + address)
|
||||
common.SendHeartbeat(context.Background(), ProtocolHeartbeatPartner, conf.GetConfig().Name,
|
||||
s.Host, s.Streams, map[string]*pp.AddrInfo{address: ad}, nil, 20*time.Second)
|
||||
}
|
||||
}
|
||||
|
||||
func (s *StreamService) searchPeer(search string) ([]*peer.Peer, error) {
|
||||
ps := []*peer.Peer{}
|
||||
if conf.GetConfig().PeerIDS != "" {
|
||||
@@ -219,11 +189,7 @@ func (s *StreamService) gc() {
|
||||
defer s.Mu.Unlock()
|
||||
now := time.Now().UTC()
|
||||
|
||||
if s.Streams[ProtocolHeartbeatPartner] == nil {
|
||||
s.Streams[ProtocolHeartbeatPartner] = map[pp.ID]*common.Stream{}
|
||||
}
|
||||
streams := s.Streams[ProtocolHeartbeatPartner]
|
||||
for pid, rec := range streams {
|
||||
for pid, rec := range s.Streams[ProtocolHeartbeatPartner] {
|
||||
if now.After(rec.Expiry) {
|
||||
for _, sstreams := range s.Streams {
|
||||
if sstreams[pid] != nil {
|
||||
@@ -256,7 +222,13 @@ func (ps *StreamService) readLoop(s *common.Stream, id pp.ID, proto protocol.ID,
|
||||
if err := json.NewDecoder(s.Stream).Decode(&evt); err != nil {
|
||||
// Any decode error (EOF, reset, malformed JSON) terminates the loop;
|
||||
// continuing on a dead/closed stream creates an infinite spin.
|
||||
return
|
||||
if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) ||
|
||||
strings.Contains(err.Error(), "reset") ||
|
||||
strings.Contains(err.Error(), "closed") ||
|
||||
strings.Contains(err.Error(), "too many connections") {
|
||||
return
|
||||
}
|
||||
continue
|
||||
}
|
||||
ps.handleEvent(evt.Type, &evt)
|
||||
if protocolInfo.WaitResponse && !protocolInfo.PersistantStream {
|
||||
@@ -265,21 +237,22 @@ func (ps *StreamService) readLoop(s *common.Stream, id pp.ID, proto protocol.ID,
|
||||
}
|
||||
}
|
||||
|
||||
func (abs *StreamService) FilterPeer(peerID string, search string) *dbs.Filters {
|
||||
id, err := oclib.GetMySelf()
|
||||
func (abs *StreamService) FilterPeer(peerID string, groups []string, search string) *dbs.Filters {
|
||||
p, err := oclib.GetMySelf()
|
||||
if err != nil {
|
||||
return nil
|
||||
}
|
||||
groups = append(groups, "*")
|
||||
filter := map[string][]dbs.Filter{
|
||||
"creator_id": {{Operator: dbs.EQUAL.String(), Value: id}}, // is my resource...
|
||||
"abstractinstanciatedresource.abstractresource.abstractobject.creator_id": {{Operator: dbs.EQUAL.String(), Value: p.GetID()}}, // is my resource...
|
||||
"": {{Operator: dbs.OR.String(), Value: &dbs.Filters{
|
||||
Or: map[string][]dbs.Filter{
|
||||
"abstractobject.access_mode": {{Operator: dbs.EQUAL.String(), Value: 1}}, // if public
|
||||
"abstractinstanciatedresource.abstractresource.abstractobject.access_mode": {{Operator: dbs.EQUAL.String(), Value: 1}}, // if public
|
||||
"abstractinstanciatedresource.instances": {{Operator: dbs.ELEMMATCH.String(), Value: &dbs.Filters{ // or got a partners instances
|
||||
And: map[string][]dbs.Filter{
|
||||
"resourceinstance.partnerships": {{Operator: dbs.ELEMMATCH.String(), Value: &dbs.Filters{
|
||||
And: map[string][]dbs.Filter{
|
||||
"resourcepartnership.peer_groups." + peerID: {{Operator: dbs.EXISTS.String(), Value: true}},
|
||||
"resourcepartnership.peer_groups." + peerID: {{Operator: dbs.IN.String(), Value: groups}},
|
||||
},
|
||||
}}},
|
||||
},
|
||||
@@ -287,15 +260,15 @@ func (abs *StreamService) FilterPeer(peerID string, search string) *dbs.Filters
|
||||
},
|
||||
}}},
|
||||
}
|
||||
|
||||
if search != "" {
|
||||
filter[" "] = []dbs.Filter{{Operator: dbs.OR.String(), Value: &dbs.Filters{
|
||||
Or: map[string][]dbs.Filter{ // filter by like name, short_description, description, owner, url if no filters are provided
|
||||
"abstractintanciatedresource.abstractresource.abstractobject.name": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractintanciatedresource.abstractresource.type": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractintanciatedresource.abstractresource.short_description": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractintanciatedresource.abstractresource.description": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractintanciatedresource.abstractresource.owners.name": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractintanciatedresource.abstractresource.abstractobject.creator_id": {{Operator: dbs.EQUAL.String(), Value: search}},
|
||||
"abstractinstanciatedresource.abstractresource.abstractobject.name": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractinstanciatedresource.abstractresource.type": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractinstanciatedresource.abstractresource.short_description": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractinstanciatedresource.abstractresource.description": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
"abstractinstanciatedresource.abstractresource.owners.name": {{Operator: dbs.LIKE.String(), Value: search}},
|
||||
},
|
||||
}}}
|
||||
}
|
||||
|
||||
@@ -1,10 +0,0 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "node",
|
||||
"NODE_ENDPOINT_PORT": 4010,
|
||||
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
|
||||
"MIN_INDEXER": 2,
|
||||
"PEER_IDS": "/ip4/172.40.0.9/tcp/4009/p2p/12D3KooWGnQfKwX9E4umCPE8dUKZuig4vw5BndDowRLEbGmcZyta"
|
||||
}
|
||||
@@ -4,5 +4,6 @@
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "node",
|
||||
"NODE_ENDPOINT_PORT": 4003,
|
||||
"NAME": "opencloud-demo-1",
|
||||
"INDEXER_ADDRESSES": "/ip4/172.40.0.2/tcp/4002/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
|
||||
}
|
||||
@@ -4,6 +4,6 @@
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "node",
|
||||
"NODE_ENDPOINT_PORT": 4004,
|
||||
"INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu",
|
||||
"PEER_IDS": "/ip4/172.40.0.3/tcp/4003/p2p/12D3KooWBh9kZrekBAE5G33q4jCLNRAzygem3gP1mMdK8mhoCTaw"
|
||||
"NAME": "opencloud-demo-2",
|
||||
"INDEXER_ADDRESSES": "/ip4/172.40.0.1/tcp/4001/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
|
||||
}
|
||||
|
||||
@@ -1,7 +0,0 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "native-indexer",
|
||||
"NODE_ENDPOINT_PORT": 4005
|
||||
}
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "native-indexer",
|
||||
"NODE_ENDPOINT_PORT": 4006,
|
||||
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
|
||||
}
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "indexer",
|
||||
"NODE_ENDPOINT_PORT": 4007,
|
||||
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u"
|
||||
}
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "indexer",
|
||||
"NODE_ENDPOINT_PORT": 4008,
|
||||
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
|
||||
}
|
||||
@@ -1,8 +0,0 @@
|
||||
{
|
||||
"MONGO_URL":"mongodb://mongo:27017/",
|
||||
"MONGO_DATABASE":"DC_myDC",
|
||||
"NATS_URL": "nats://nats:4222",
|
||||
"NODE_MODE": "node",
|
||||
"NODE_ENDPOINT_PORT": 4009,
|
||||
"NATIVE_INDEXER_ADDRESSES": "/ip4/172.40.0.6/tcp/4006/p2p/12D3KooWC3GNStak8KCYtJq11Dxiq45EJV53z1ZvKetMcZBeBX6u,/ip4/172.40.0.5/tcp/4005/p2p/12D3KooWGn3j4XqTSrjJDGGpTQERdDV5TPZdhQp87rAUnvQssvQu"
|
||||
}
|
||||
1030  docs/DECENTRALIZED_SYSTEMS_COMPARISON.txt  (new file; diff suppressed because it is too large)

362  docs/FUTURE_DHT_ARCHITECTURE.txt  (new file)
@@ -0,0 +1,362 @@
================================================================================
  OC-DISCOVERY: TARGET ARCHITECTURE — A DHT NETWORK WITHOUT NATIVES
  Long-term evolution vision, derived from a comparative analysis
================================================================================

Written from the analysis of the current architecture and from the comparative
discussion of Tapestry, Kademlia, EigenTrust and distributed reputation
systems.

Reference: DECENTRALIZED_SYSTEMS_COMPARISON.txt §9


================================================================================
1. MOTIVATION
================================================================================

The current architecture (node → indexer → native indexer) is robust and well
suited to an early phase of the network. Its limits at scale are:

- Static native pool at startup → dependency on configuration
- Local cache of natives = single point of failure (loss = empty pool)
- Blocking inter-native consensus (~7s) triggered at every node bootstrap
- O(N indexers) state per native → grows linearly with the network
- Structurally privileged nodes → relative SPOFs

The target described here removes the notion of native indexer as an
architectural tier. The network becomes flat: indexers and nodes are actors of
the same nature, differentiated only by their voluntary role.


================================================================================
2. FUNDAMENTAL PRINCIPLES
================================================================================

P1. No node is structurally privileged.
P2. Trust is a product of time and verification, not of an arbiter.
P3. An actor's claims can be verified independently by any peer.
P4. Reputation emerges from collective behaviour, not from central reporting.
P5. The DHT is neutral infrastructure — it stores facts, not judgements.
P6. Static configuration no longer exists at runtime — only at bootstrap.


================================================================================
3. ROLES
================================================================================

3.1 Node
--------
Consumer of the network. Starts up, selects a pool of indexers via the DHT,
heartbeats its indexers, accumulates scores locally. Publishes nothing as a
routine. Takes part in consensus challenges on demand.

3.2 Indexer
-----------
Voluntary actor. Registers in the DHT at birth, maintains its record, serves
node traffic (heartbeat, Publish, Get). Declares its metrics in every
heartbeat response. Maintains a score aggregated from its connected nodes.

Difference with the current design: the indexer no longer has any link to a
native. It is autonomous. Its existence in the network is proven by its DHT
record and by the nodes that contact it directly.

3.3 DHT infrastructure node (ex-native)
----------------------------------------
Any sufficiently stable node can maintain the DHT without being an indexer.
It is a configuration, not an architectural type: `dht_mode: server`.
These nodes maintain the Kademlia k-buckets and store the indexer records.
They are unaware of node↔indexer traffic and do not orchestrate it. A minimal
sketch of what this mode amounts to is given below.
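
A minimal sketch, assuming `dht_mode: server` maps onto go-libp2p-kad-dht with
the network's existing `oc` protocol prefix. The option names are from the
public kad-dht API; the host setup (PSK, identity) is omitted and the helper
name is an assumption:

    package main

    import (
        "context"

        "github.com/libp2p/go-libp2p"
        dht "github.com/libp2p/go-libp2p-kad-dht"
        "github.com/libp2p/go-libp2p/core/host"
    )

    // newInfrastructureDHT joins the shared DHT as a server: it stores indexer
    // records and answers queries, without taking the indexer role itself.
    func newInfrastructureDHT(ctx context.Context, h host.Host) (*dht.IpfsDHT, error) {
        return dht.New(ctx, h,
            dht.Mode(dht.ModeServer),  // hold records and serve queries
            dht.ProtocolPrefix("/oc"), // same "oc" prefix as the rest of the network
        )
    }

    func main() {
        ctx := context.Background()
        h, err := libp2p.New() // PSK / identity options omitted for brevity
        if err != nil {
            panic(err)
        }
        kad, err := newInfrastructureDHT(ctx, h)
        if err != nil {
            panic(err)
        }
        _ = kad.Bootstrap(ctx)
    }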


================================================================================
4. NODE BOOTSTRAP
================================================================================

4.1 Joining the network
-------------------------
The node starts with 1 to 3 addresses of known DHT nodes (bootstrap peers).
These are the only static pieces of information required. These peers carry no
semantic role — they are only used to enter the DHT overlay.

4.2 Discovering the indexer pool
----------------------------------

    Node → DHT.FindProviders(hash("/opencloud/indexers"))
         → receives a list of N candidates with their records

Selection of the initial pool:

    1. Latency filter  : ping < threshold → real network proximity
    2. Fill-rate filter: prefer less loaded indexers
    3. Weighted draw   : probability ∝ (1 - fill_rate), curve w(F) = F×(1-F)
                         indexer at 20% load → very likely
                         indexer at 80% load → unlikely
    4. Diversity filter: a different /24 subnet for each pool entry

No consensus is needed at this step. The node starts with a low tolerance
(see §7) — it accepts imperfect indexers and evaluates them over time.
A weighted-selection sketch follows.
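
A possible shape for the pool selection above, in Go. Everything here is
illustrative: the candidate struct, the thresholds and the helper names are
assumptions, and the draw simply weights by (1 - fill_rate) as stated in step 3.

    package pool

    import (
        "math/rand"
        "sort"
        "time"
    )

    // Candidate is one indexer record returned by the DHT lookup (hypothetical shape).
    type Candidate struct {
        ID       string
        Subnet   string        // /24 prefix extracted from the multiaddr
        RTT      time.Duration // measured by a ping at discovery time
        FillRate float64       // self-declared, 0..1
    }

    // SelectPool applies the four steps: latency filter, fill-rate preference,
    // weighted draw, subnet diversity. It returns up to size candidates.
    func SelectPool(cands []Candidate, size int, maxRTT time.Duration) []Candidate {
        // 1. latency filter
        eligible := make([]Candidate, 0, len(cands))
        for _, c := range cands {
            if c.RTT < maxRTT {
                eligible = append(eligible, c)
            }
        }
        // 2. prefer less loaded indexers
        sort.Slice(eligible, func(i, j int) bool { return eligible[i].FillRate < eligible[j].FillRate })

        picked := []Candidate{}
        usedSubnets := map[string]bool{}
        for len(picked) < size && len(eligible) > 0 {
            // 3. weighted draw: weight proportional to free capacity (1 - fill_rate)
            total := 0.0
            for _, c := range eligible {
                total += 1 - c.FillRate
            }
            r := rand.Float64() * total
            idx := 0
            for i, c := range eligible {
                r -= 1 - c.FillRate
                if r <= 0 {
                    idx = i
                    break
                }
            }
            c := eligible[idx]
            eligible = append(eligible[:idx], eligible[idx+1:]...)
            // 4. subnet diversity: keep at most one entry per /24
            if usedSubnets[c.Subnet] {
                continue
            }
            usedSubnets[c.Subnet] = true
            picked = append(picked, c)
        }
        return picked
    }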


================================================================================
5. INDEXER REGISTRATION IN THE DHT
================================================================================

At birth, the indexer publishes its DHT record:

    key   : hash("/opencloud/indexers")         ← fixed key, known to all
    value : {
        multiaddr  : <network address>,
        region     : <subnet /24>,
        capacity   : <maxNodesConn>,
        fill_rate  : <float 0-1>,               ← self-declared, verifiable
        peer_count : <int>,                     ← self-declared, verifiable
        peers      : [hash(nodeID1), ...],      ← hashed list of connected nodes
        born_at    : <timestamp>,
        sig        : <indexer key signature>,   ← non-forgeable (PSK context)
    }

The record is refreshed roughly every 60s (before the TTL expires).
If the indexer goes down, the TTL expires and it disappears from the DHT automatically.

The peer list is hashed for confidentiality but remains verifiable:
a challenger can ask a node directly whether it is connected to this indexer.
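
A sketch of the record and its refresh loop. The struct fields mirror the
layout above; the marshalling, the currentRecord helper and the PutValue key
layout are assumptions (the design above talks about providing under a single
fixed key, this sketch only illustrates the 60s refresh / TTL-expiry behaviour):

    package indexer

    import (
        "context"
        "encoding/json"
        "time"

        dht "github.com/libp2p/go-libp2p-kad-dht"
    )

    // IndexerRecord mirrors the value published for this indexer.
    type IndexerRecord struct {
        Multiaddr string   `json:"multiaddr"`
        Region    string   `json:"region"`     // /24 subnet
        Capacity  int      `json:"capacity"`   // maxNodesConn
        FillRate  float64  `json:"fill_rate"`  // self-declared, verifiable
        PeerCount int      `json:"peer_count"` // self-declared, verifiable
        Peers     []string `json:"peers"`      // hash(nodeID) of connected nodes
        BornAt    int64    `json:"born_at"`
        Sig       []byte   `json:"sig"`        // signature with the indexer key
    }

    // Indexer is a minimal holder for this sketch.
    type Indexer struct {
        ID            string
        DHT           *dht.IpfsDHT
        currentRecord func() IndexerRecord // snapshot of the live metrics
    }

    // refreshLoop re-publishes the record every ~60s so the TTL never lapses
    // while the indexer is alive; when the process dies the record simply expires.
    func (ix *Indexer) refreshLoop(ctx context.Context) {
        t := time.NewTicker(60 * time.Second)
        defer t.Stop()
        for {
            if b, err := json.Marshal(ix.currentRecord()); err == nil {
                _ = ix.DHT.PutValue(ctx, "/oc/indexers/"+ix.ID, b) // key layout is illustrative
            }
            select {
            case <-ctx.Done():
                return
            case <-t.C:
            }
        }
    }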


================================================================================
6. HEARTBEAT PROTOCOL — QUESTION AND ANSWER
================================================================================

The heartbeat becomes bidirectional: the node asks questions, the indexer
answers with its current declarations.

6.1 Structure
-------------

    Node → Indexer:
    {
        ts        : now,
        challenge : <optional, see §8>
    }

    Indexer → Node:
    {
        ts                 : now,
        fill_rate          : 0.42,
        peer_count         : 87,
        cached_score       : 0.74,   ← score aggregated from all its connected nodes
        challenge_response : {...}   ← if a challenge was present in the request
    }

The normal heartbeat (without a challenge) is nearly identical in weight to
the current one. The indexer's cached_score is updated progressively from the
feedback it receives.

6.2 The indexer's cached_score
---------------------------------
The indexer aggregates the scores its connected nodes communicate to it
(implicitly, through the fact that they stay connected, or explicitly during a
consensus). This score gives it a view of its own network quality.

A node can compare its local score for the indexer with the declared
cached_score. A strong divergence is a warning signal:

    Local node score : 0.40   ← this indexer is mediocre for me
    Cached score     : 0.91   ← it claims to be excellent globally
    → triggers a verification challenge
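
The message shapes and the divergence test above, sketched in Go. The type
names, JSON tags and the 0.3 divergence threshold are assumptions made for
illustration only:

    package heartbeat

    import "time"

    // HeartbeatRequest is what the node sends every cycle.
    type HeartbeatRequest struct {
        TS        time.Time  `json:"ts"`
        Challenge *Challenge `json:"challenge,omitempty"` // optional, see §8
    }

    // HeartbeatResponse carries the indexer's self-declarations.
    type HeartbeatResponse struct {
        TS                time.Time          `json:"ts"`
        FillRate          float64            `json:"fill_rate"`
        PeerCount         int                `json:"peer_count"`
        CachedScore       float64            `json:"cached_score"`
        ChallengeResponse *ChallengeResponse `json:"challenge_response,omitempty"`
    }

    type Challenge struct{ ClaimKey string }
    type ChallengeResponse struct{ Proof []byte }

    // needsChallenge flags a strong divergence between the score the node has
    // measured locally and the score the indexer claims for itself.
    func needsChallenge(localScore float64, resp HeartbeatResponse) bool {
        const maxDivergence = 0.3 // illustrative threshold
        return resp.CachedScore-localScore > maxDivergence
    }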


================================================================================
7. PROGRESSIVE TRUST MODEL
================================================================================

7.1 Lifecycle of a node
---------------------------

    Birth
      → low tolerance: accepts almost any indexer found in the DHT
      → low switching cost: little accumulated context
      → minScore ≈ 20% (existing dynamicMinScore, kept)

    A few hours in
      → uptime accumulates on every known indexer
      → scores stabilise
      → the replacement threshold rises progressively

    Long term (days)
      → stable pool, high trust in the known indexers
      → switching is costly but triggered on a clear disappointment
      → minScore ≈ 80% (maturity)

7.2 Underlying model: an implicit beta distribution
------------------------------------------------------

    α = accumulated successes (heartbeats OK, probes OK, challenges passed)
    β = accumulated failures  (timeouts, failed probes, failed challenges)

    trust = α / (α + β)

    New indexer          : α=0, β=0 → neutral prior, low tolerance
    After 10 days        : high α   → stable trust, high switching threshold
    Clear disappointment : β rises  → trust drops → switch triggered

7.3 What "disappointing" means
--------------------------------

    Heartbeat rate        → too many timeouts                      → reliability drops
    Bandwidth probe       → falls below the declared value         → degradation or a lie
    Real fill rate        → higher than declared                   → overloaded or dishonest indexer
    Failed challenge      → declared peer absent from the network  → invalid claim
    Latency               → progressive drift                      → degraded network quality
    Inflated cached_score → strong divergence with the local score → suspicion

A sketch of this accounting follows.
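
A compact Go sketch of the beta-style accounting from §7.2, fed by the kinds
of failures listed just above. The +1 smoothing (a Beta(1,1) prior) is an
assumption so that a fresh indexer starts at a neutral 0.5 rather than an
undefined 0/0; the method names are illustrative:

    package trust

    import "sync"

    // TrustScore keeps the implicit beta counters for one indexer.
    type TrustScore struct {
        mu        sync.Mutex
        successes float64 // α: heartbeats OK, probes OK, challenges passed
        failures  float64 // β: timeouts, failed probes, failed challenges
    }

    // Value returns α / (α + β) with a Beta(1,1) prior so an unknown indexer
    // starts at 0.5 instead of dividing by zero.
    func (t *TrustScore) Value() float64 {
        t.mu.Lock()
        defer t.mu.Unlock()
        return (t.successes + 1) / (t.successes + t.failures + 2)
    }

    // Reward is called on a success; weight lets a passed challenge count more
    // than a routine heartbeat.
    func (t *TrustScore) Reward(weight float64) {
        t.mu.Lock()
        t.successes += weight
        t.mu.Unlock()
    }

    // Disappoint covers the §7.3 signals; a failed challenge can be weighted
    // more heavily than a single missed heartbeat.
    func (t *TrustScore) Disappoint(weight float64) {
        t.mu.Lock()
        t.failures += weight
        t.mu.Unlock()
    }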


================================================================================
8. CLAIM VERIFICATION — THREE LAYERS
================================================================================

8.1 Layer 1: passive (every heartbeat, 60s)
-----------------------------------------------
Automatic measurements, zero extra cost.

- heartbeat RTT         → direct latency
- declared fill_rate    → tiny payload in the response
- declared peer_count   → tiny payload
- indexer cached_score  → compared with the local score

8.2 Layer 2: active sampling (1 heartbeat out of N)
--------------------------------------------------
Periodic, asynchronous, lightweight verifications (a scheduling sketch follows).

    Every 5 HB  (~5 min) : spot-check 1 random peer (see §8.4)
    Every 10 HB (~10 min): subnet diversity check (lightweight DHT lookups)
    Every 15 HB (~15 min): bandwidth probe (real transfer, dedicated protocol)
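
One way to drive the layer-2 sampling off the heartbeat counter, sketched in
Go; the hook functions are placeholders for the spot-check, diversity check
and bandwidth probe described above, not an existing API:

    package sampling

    // Sampler fires the layer-2 verifications every Nth heartbeat.
    type Sampler struct {
        count          int
        SpotCheckPeer  func() // every 5 heartbeats (~5 min), see §8.4
        CheckDiversity func() // every 10 heartbeats (~10 min)
        ProbeBandwidth func() // every 15 heartbeats (~15 min)
    }

    // OnHeartbeat is called once per successful heartbeat cycle.
    func (s *Sampler) OnHeartbeat() {
        s.count++
        if s.count%5 == 0 && s.SpotCheckPeer != nil {
            go s.SpotCheckPeer()
        }
        if s.count%10 == 0 && s.CheckDiversity != nil {
            go s.CheckDiversity()
        }
        if s.count%15 == 0 && s.ProbeBandwidth != nil {
            go s.ProbeBandwidth()
        }
    }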

8.3 Layer 3: consensus (event-driven)
-----------------------------------------
Triggered by the admission of a new indexer into the pool, or by a detected suspicion.

    The node picks a verifiable claim of the target indexer X
    The node verifies it itself
    The node asks its trusted indexers: "verify this claim about X"
    Each indexer verifies independently
    Results converge → X is honest  → admission
    Results diverge  → X is suspect → rejection or probation

The consensus is lightweight: a few out-of-band contacts, no blocking round.
It is not continuous — it is event-driven.

8.4 Out-of-band verification (no DHT writes by nodes)
----------------------------------------------------------------
Nodes do NOT publish continuous contact records in the DHT.
That would mean N×M records to refresh (a high DHT cost at scale).

Instead, during a challenge:

    The challenger selects 2-3 peers from the peer list declared by X
      → contacts those peers directly: "are you connected to indexer X?"
      → direct answer (out-of-band, not via the DHT)
      → verification without any DHT write

The indexer cannot make peers that are not connected to it answer "yes".
The verification is non-forgeable and has no DHT cost. A sketch of the
spot-check follows.
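
A sketch of the direct spot-check, assuming a dedicated stream protocol for
the question. The protocol ID, message shapes and sample size are illustrative,
and the sketch assumes the challenger can resolve entries of the declared
(hashed) peer list to dialable peer IDs, which the design above leaves open:

    package challenge

    import (
        "context"
        "encoding/json"
        "math/rand"
        "time"

        "github.com/libp2p/go-libp2p/core/host"
        "github.com/libp2p/go-libp2p/core/peer"
    )

    // protoAreYouConnected is a hypothetical protocol ID for the spot-check question.
    const protoAreYouConnected = "/opencloud/challenge/connected/1.0"

    type connectedQuery struct {
        IndexerID string `json:"indexer_id"`
    }
    type connectedAnswer struct {
        Connected bool `json:"connected"`
    }

    // SpotCheck asks up to sample peers from the indexer's declared peer list
    // whether they are really connected to it, and returns the yes-ratio.
    func SpotCheck(ctx context.Context, h host.Host, indexerID string, declared []peer.ID, sample int) float64 {
        rand.Shuffle(len(declared), func(i, j int) { declared[i], declared[j] = declared[j], declared[i] })
        if sample > len(declared) {
            sample = len(declared)
        }
        yes := 0
        for _, pid := range declared[:sample] {
            sctx, cancel := context.WithTimeout(ctx, 5*time.Second)
            s, err := h.NewStream(sctx, pid, protoAreYouConnected)
            cancel()
            if err != nil {
                continue
            }
            _ = json.NewEncoder(s).Encode(connectedQuery{IndexerID: indexerID})
            var ans connectedAnswer
            if json.NewDecoder(s).Decode(&ans) == nil && ans.Connected {
                yes++
            }
            s.Reset()
        }
        if sample == 0 {
            return 0
        }
        return float64(yes) / float64(sample)
    }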

8.5 Why X cannot cheat
------------------------------------
X cannot coordinate different answers towards simultaneous challengers.
Each challenger contacts the same peers independently.
If X lies about its peer list:

- Challenger A contacts peer P → "no, not connected to X"
- Challenger B contacts peer P → "no, not connected to X"
- Consensus: X is lying → its score drops for every challenger
- Network effect: X progressively loses its connections
- Its DHT peer list empties → future claims are even less credible


================================================================================
9. NETWORK EFFECT WITHOUT CENTRAL REPORTING
================================================================================

A node that penalises an indexer sends no "report" to anyone.
Its local actions produce the network effect by aggregation:

    Node lowers X's score       → X receives less traffic from that node
    Node switches to Y          → X loses a client
    Node refuses X's challenges → X can no longer take part in consensus

If 200 nodes do the same:

    X loses most of its connections
    Its DHT peer list empties (directly contacted peers say "no")
    Its cached_score collapses (few nodes remain)
    New nodes that see X in the DHT get failed challenges
    X is naturally excluded without any central decision

Conversely, an honest indexer sees its scores rise on all of its connected
nodes, its peer list grow denser, and its challenges pass systematically.
Its reputation is an observable, verifiable product.


================================================================================
10. ARCHITECTURE SUMMARY
================================================================================

DHT        → neutral directory, source of truth for indexer records,
             maintained by any stable node (dht_mode: server)

Indexer    → voluntary actor: registers, maintains its claims,
             serves traffic, accumulates its own aggregated score

Node       → consumer: passive scoring + sampling + lightweight consensus,
             progressive trust, adaptive switching

Heartbeat  → 60s metronome + a vector of light declarations + optional challenge

Consensus  → event-driven, independent multi-challengers,
             out-of-band verification of DHT claims

Trust      → implicit beta, progressive, switching cost grows with age

Reputation → emerges from collective behaviour, no central arbiter

Bootstrap  → 1-3 known DHT peers → the only static configuration required


================================================================================
11. MIGRATION PATH
================================================================================

Phase 1 (current)
    Static natives, dynamic indexer pool, inter-native consensus
    → robust, suited to the early phase

Phase 2 (intermediate)
    Dynamic native pool via DHT (bootstrap + gossip)
    Same native protocol; only discovery becomes dynamic
    → removes the dependency on static native configuration
    → see DECENTRALIZED_SYSTEMS_COMPARISON.txt §9.2

Phase 3 (target)
    The architecture described in this document
    Natives disappear as an architectural tier
    DHT = infrastructure, indexers = autonomous actors
    Scoring and consensus entirely on the node side
    → no privileged node, O(log N) scalability

The Phase 2 → Phase 3 migration is a redesign of the control plane.
The data plane (node↔indexer heartbeat, Publish, Get) is unchanged.
The libp2p primitives (Kademlia DHT, GossipSub) are already in place.


================================================================================
12. PROPERTIES OF THE TARGET SYSTEM
================================================================================

Scalability        O(log N) — Kademlia DHT routing
Resilience         No structural SPOF, TTL = the only source of truth
Trust              Progressive, verifiable, emergent
Sybil resistance   PSK — only nodes holding the key can publish
Cold start         Low initial tolerance, progressive ramp-up (already existing)
Honesty            Claims verifiable out-of-band, non-forgeable
Decentralisation   No node knows the complete global state

================================================================================
@@ -1,56 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Node Initialization — Pair A (InitNode)
|
||||
|
||||
participant MainA as main (Pair A)
|
||||
participant NodeA as Node A
|
||||
participant libp2pA as libp2p (Pair A)
|
||||
participant DBA as DB Pair A (oc-lib)
|
||||
participant NATSA as NATS A
|
||||
participant IndexerA as Indexer (partagé)
|
||||
participant StreamA as StreamService A
|
||||
participant PubSubA as PubSubService A
|
||||
|
||||
MainA->>NodeA: InitNode(isNode, isIndexer, isNativeIndexer)
|
||||
|
||||
NodeA->>NodeA: LoadKeyFromFilePrivate() → priv
|
||||
NodeA->>NodeA: LoadPSKFromFile() → psk
|
||||
|
||||
NodeA->>libp2pA: New(PrivateNetwork(psk), Identity(priv), ListenAddr:4001)
|
||||
libp2pA-->>NodeA: host A (PeerID_A)
|
||||
|
||||
Note over NodeA: isNode == true
|
||||
|
||||
NodeA->>libp2pA: NewGossipSub(ctx, host)
|
||||
libp2pA-->>NodeA: ps (GossipSub)
|
||||
|
||||
NodeA->>IndexerA: ConnectToIndexers → SendHeartbeat /opencloud/heartbeat/1.0
|
||||
Note over IndexerA: Heartbeat long-lived établi<br/>Score qualité calculé (bw + uptime + diversité)
|
||||
IndexerA-->>NodeA: OK
|
||||
|
||||
NodeA->>NodeA: claimInfo(name, hostname)
|
||||
NodeA->>IndexerA: TempStream /opencloud/record/publish/1.0
|
||||
NodeA->>IndexerA: json.Encode(PeerRecord A signé)
|
||||
IndexerA->>IndexerA: DHT.PutValue("/node/"+DID_A, record)
|
||||
|
||||
NodeA->>DBA: NewRequestAdmin(PEER).Search(SELF)
|
||||
DBA-->>NodeA: peer A local (ou UUID généré)
|
||||
|
||||
NodeA->>NodeA: StartGC(30s) — GC sur StreamRecords
|
||||
|
||||
NodeA->>StreamA: InitStream(ctx, host, PeerID_A, 1000, nodeA)
|
||||
StreamA->>StreamA: SetStreamHandler(heartbeat/partner, search, planner, ...)
|
||||
StreamA->>DBA: Search(PEER, PARTNER) → liste partenaires
|
||||
DBA-->>StreamA: [] (aucun partenaire au démarrage)
|
||||
StreamA-->>NodeA: StreamService A
|
||||
|
||||
NodeA->>PubSubA: InitPubSub(ctx, host, ps, nodeA, streamA)
|
||||
PubSubA->>PubSubA: subscribeEvents(PB_SEARCH, timeout=-1)
|
||||
PubSubA-->>NodeA: PubSubService A
|
||||
|
||||
NodeA->>NodeA: SubscribeToSearch(ps, callback)
|
||||
Note over NodeA: callback: GetPeerRecord(evt.From)<br/>→ StreamService.SendResponse
|
||||
|
||||
NodeA->>NATSA: ListenNATS(nodeA)
|
||||
Note over NATSA: Enregistre handlers:<br/>CREATE_RESOURCE, PROPALGATION_EVENT
|
||||
|
||||
NodeA-->>MainA: *Node A prêt
|
||||
@@ -1,46 +1,50 @@
|
||||
@startuml
|
||||
title Node Initialization — Pair A (InitNode)
|
||||
title Node Initialization — Peer A (InitNode)
|
||||
|
||||
participant "main (Pair A)" as MainA
|
||||
participant "main (Peer A)" as MainA
|
||||
participant "Node A" as NodeA
|
||||
participant "libp2p (Pair A)" as libp2pA
|
||||
participant "DB Pair A (oc-lib)" as DBA
|
||||
participant "libp2p (Peer A)" as libp2pA
|
||||
participant "ConnectionGater A" as GaterA
|
||||
participant "DB Peer A (oc-lib)" as DBA
|
||||
participant "NATS A" as NATSA
|
||||
participant "Indexer (partagé)" as IndexerA
|
||||
participant "Indexer (shared)" as IndexerA
|
||||
participant "DHT A" as DHTA
|
||||
participant "StreamService A" as StreamA
|
||||
participant "PubSubService A" as PubSubA
|
||||
|
||||
MainA -> NodeA: InitNode(isNode, isIndexer, isNativeIndexer)
|
||||
MainA -> NodeA: InitNode(isNode=true, isIndexer=false)
|
||||
|
||||
NodeA -> NodeA: LoadKeyFromFilePrivate() → priv
|
||||
NodeA -> NodeA: LoadPSKFromFile() → psk
|
||||
|
||||
NodeA -> libp2pA: New(PrivateNetwork(psk), Identity(priv), ListenAddr:4001)
|
||||
NodeA -> GaterA: newOCConnectionGater(nil)
|
||||
NodeA -> libp2pA: New(\n PrivateNetwork(psk),\n Identity(priv),\n ListenAddr: tcp/4001,\n ConnectionGater(gater)\n)
|
||||
libp2pA --> NodeA: host A (PeerID_A)
|
||||
NodeA -> GaterA: gater.host = host A
|
||||
|
||||
note over NodeA: isNode == true
|
||||
note over GaterA: InterceptSecured (inbound):\n1. DB lookup by peer_id\n → BLACKLIST : refuse\n → found : accept\n2. Not found → DHT sequential check\n (transport-error fallthrough only)
|
||||
|
||||
NodeA -> libp2pA: NewGossipSub(ctx, host)
|
||||
libp2pA --> NodeA: ps (GossipSub)
|
||||
NodeA -> libp2pA: SetStreamHandler(/opencloud/probe/1.0, HandleBandwidthProbe)
|
||||
NodeA -> libp2pA: SetStreamHandler(/opencloud/witness/1.0, HandleWitnessQuery)
|
||||
|
||||
NodeA -> IndexerA: ConnectToIndexers → SendHeartbeat /opencloud/heartbeat/1.0
|
||||
note over IndexerA: Heartbeat long-lived établi\nScore qualité calculé (bw + uptime + diversité)
|
||||
IndexerA --> NodeA: OK
|
||||
NodeA -> libp2pA: NewGossipSub(ctx, host) → ps (GossipSub)
|
||||
|
||||
NodeA -> NodeA: buildRecord() closure\n→ signs fresh PeerRecord (expiry=now+2min)\n embedded in each heartbeat tick
|
||||
|
||||
NodeA -> IndexerA: ConnectToIndexers(host, minIndexer=1, maxIndexer=5, buildRecord)
|
||||
note over IndexerA: Reads IndexerAddresses from config\nAdds seeds → Indexers Directory (IsSeed=true)\nLaunches SendHeartbeat goroutine (20s ticker)
|
||||
|
||||
IndexerA -> DHTA: proactive DHT discovery (after 5s warmup)\ninitNodeDHT(h, seeds)\nDiscoverIndexersFromDHT → SelectByFillRate\n→ add to Indexers Directory + NudgeIt()
|
||||
|
||||
NodeA -> NodeA: claimInfo(name, hostname)
|
||||
NodeA -> IndexerA: TempStream /opencloud/record/publish/1.0
|
||||
NodeA -> IndexerA: json.Encode(PeerRecord A signé)
|
||||
IndexerA -> IndexerA: DHT.PutValue("/node/"+DID_A, record)
|
||||
NodeA -> IndexerA: stream.Encode(Signed PeerRecord A)
|
||||
IndexerA -> DHTA: PutValue("/node/"+DID_A, record)
|
||||
|
||||
NodeA -> DBA: NewRequestAdmin(PEER).Search(SELF)
|
||||
DBA --> NodeA: peer A local (ou UUID généré)
|
||||
|
||||
NodeA -> NodeA: StartGC(30s) — GC sur StreamRecords
|
||||
NodeA -> NodeA: StartGC(30s)
|
||||
|
||||
NodeA -> StreamA: InitStream(ctx, host, PeerID_A, 1000, nodeA)
|
||||
StreamA -> StreamA: SetStreamHandler(heartbeat/partner, search, planner, ...)
|
||||
StreamA -> DBA: Search(PEER, PARTNER) → liste partenaires
|
||||
DBA --> StreamA: [] (aucun partenaire au démarrage)
|
||||
StreamA -> StreamA: SetStreamHandler(resource/search, create, update,\n delete, planner, verify, considers)
|
||||
StreamA --> NodeA: StreamService A
|
||||
|
||||
NodeA -> PubSubA: InitPubSub(ctx, host, ps, nodeA, streamA)
|
||||
@@ -48,11 +52,13 @@ PubSubA -> PubSubA: subscribeEvents(PB_SEARCH, timeout=-1)
|
||||
PubSubA --> NodeA: PubSubService A
|
||||
|
||||
NodeA -> NodeA: SubscribeToSearch(ps, callback)
|
||||
note over NodeA: callback: GetPeerRecord(evt.From)\n→ StreamService.SendResponse
|
||||
note over NodeA: callback: if evt.From != self\n → GetPeerRecord(evt.From)\n → StreamService.SendResponse
|
||||
|
||||
NodeA -> NATSA: ListenNATS(nodeA)
|
||||
note over NATSA: Enregistre handlers:\nCREATE_RESOURCE, PROPALGATION_EVENT
|
||||
note over NATSA: Subscribes:\nCREATE_RESOURCE → partner on-demand\nPROPALGATION_EVENT → resource propagation
|
||||
|
||||
NodeA --> MainA: *Node A prêt
|
||||
NodeA --> MainA: *Node A is ready
|
||||
|
||||
note over NodeA,IndexerA: SendHeartbeat goroutine (permanent, 20s ticker):\nNode → Indexer : Heartbeat{name, PeerID, indexersBinded, need, challenges?, record}\nIndexer → Node : HeartbeatResponse{fillRate, challenges, suggestions, witnesses, suggestMigrate}\nScore updated (7 dimensions), pool managed autonomously
|
||||
|
||||
@enduml
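To make the SendHeartbeat goroutine described in the note above concrete, here is a small, self-contained Go sketch of a 20s ticker whose every tick embeds a freshly built, short-lived record. Names such as buildRecord and the Heartbeat fields follow the diagram, but the types, the fake record builder and the printed output are simplifications, not the actual oc-discovery code.

package main

import (
    "encoding/json"
    "fmt"
    "time"
)

// Heartbeat mirrors the fields shown in the diagram; the real message
// carries more (indexersBinded, need, challenges, ...).
type Heartbeat struct {
    Name      string          `json:"name"`
    PeerID    string          `json:"peer_id"`
    Timestamp time.Time       `json:"timestamp"`
    Record    json.RawMessage `json:"record"` // freshly signed PeerRecord
}

func main() {
    // buildRecord closure: produces a fresh record with expiry = now + 2min
    // on every call, so each heartbeat carries a non-replayable claim.
    buildRecord := func() json.RawMessage {
        rec := map[string]any{
            "did":    "uuid-did-A",
            "expiry": time.Now().Add(2 * time.Minute),
            // "signature": priv.Sign(...) in the real service
        }
        b, _ := json.Marshal(rec)
        return b
    }

    ticker := time.NewTicker(20 * time.Second)
    defer ticker.Stop()
    for range ticker.C {
        hb := Heartbeat{Name: "node-A", PeerID: "PeerID_A", Timestamp: time.Now(), Record: buildRecord()}
        b, _ := json.Marshal(hb)
        fmt.Println(string(b)) // the real code writes this on the long-lived libp2p stream
    }
}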
|
||||
|
||||
@@ -1,38 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Node Claim — Pair A publie son PeerRecord (claimInfo + publishPeerRecord)
|
||||
|
||||
participant DBA as DB Pair A (oc-lib)
|
||||
participant NodeA as Node A
|
||||
participant IndexerA as Indexer (partagé)
|
||||
participant DHT as DHT Kademlia
|
||||
participant NATSA as NATS A
|
||||
|
||||
NodeA->>DBA: NewRequestAdmin(PEER).Search(SELF)
|
||||
DBA-->>NodeA: existing peer (DID_A) ou nouveau UUID
|
||||
|
||||
NodeA->>NodeA: LoadKeyFromFilePrivate() → priv A
|
||||
NodeA->>NodeA: LoadKeyFromFilePublic() → pub A
|
||||
NodeA->>NodeA: crypto.MarshalPublicKey(pub A) → pubBytes
|
||||
|
||||
NodeA->>NodeA: Build PeerRecord A {<br/> Name, DID, PubKey,<br/> PeerID: PeerID_A,<br/> APIUrl: hostname,<br/> StreamAddress: /ip4/.../tcp/4001/p2p/PeerID_A,<br/> NATSAddress, WalletAddress<br/>}
|
||||
|
||||
NodeA->>NodeA: sha256(json(rec)) → hash
|
||||
NodeA->>NodeA: priv.Sign(hash) → signature
|
||||
NodeA->>NodeA: rec.ExpiryDate = now + 150s
|
||||
|
||||
loop Pour chaque StaticIndexer (Indexer A, B, …)
|
||||
NodeA->>IndexerA: TempStream /opencloud/record/publish/1.0
|
||||
NodeA->>IndexerA: json.Encode(PeerRecord A signé)
|
||||
|
||||
IndexerA->>IndexerA: Verify signature
|
||||
IndexerA->>IndexerA: Check heartbeat stream actif pour PeerID_A
|
||||
IndexerA->>DHT: PutValue("/node/"+DID_A, PeerRecord A)
|
||||
DHT-->>IndexerA: ok
|
||||
end
|
||||
|
||||
NodeA->>NodeA: rec.ExtractPeer(DID_A, DID_A, pub A)
|
||||
NodeA->>NATSA: SetNATSPub(CREATE_RESOURCE, {PEER, Peer A JSON})
|
||||
NATSA->>DBA: Upsert Peer A (SearchAttr: peer_id)
|
||||
DBA-->>NATSA: ok
|
||||
|
||||
NodeA-->>NodeA: *peer.Peer A (SELF)
|
||||
@@ -1,31 +1,29 @@
|
||||
@startuml
|
||||
title Node Claim — Pair A publie son PeerRecord (claimInfo + publishPeerRecord)
|
||||
title Node Claim — Peer A publishes its PeerRecord (claimInfo + publishPeerRecord)
|
||||
|
||||
participant "DB Pair A (oc-lib)" as DBA
|
||||
participant "DB Peer A (oc-lib)" as DBA
|
||||
participant "Node A" as NodeA
|
||||
participant "Indexer (partagé)" as IndexerA
|
||||
participant "Indexer (shared)" as IndexerA
|
||||
participant "DHT Kademlia" as DHT
|
||||
participant "NATS A" as NATSA
|
||||
|
||||
NodeA -> DBA: NewRequestAdmin(PEER).Search(SELF)
|
||||
DBA --> NodeA: existing peer (DID_A) ou nouveau UUID
|
||||
NodeA -> DBA: DB(PEER).Search(SELF)
|
||||
DBA --> NodeA: existing peer (DID_A) or new UUID
|
||||
|
||||
NodeA -> NodeA: LoadKeyFromFilePrivate() → priv A
|
||||
NodeA -> NodeA: LoadKeyFromFilePublic() → pub A
|
||||
NodeA -> NodeA: crypto.MarshalPublicKey(pub A) → pubBytes
|
||||
|
||||
NodeA -> NodeA: Build PeerRecord A {\n Name, DID, PubKey,\n PeerID: PeerID_A,\n APIUrl: hostname,\n StreamAddress: /ip4/.../tcp/4001/p2p/PeerID_A,\n NATSAddress, WalletAddress\n}
|
||||
|
||||
NodeA -> NodeA: sha256(json(rec)) → hash
|
||||
NodeA -> NodeA: priv.Sign(hash) → signature
|
||||
NodeA -> NodeA: priv.Sign(rec) → signature
|
||||
NodeA -> NodeA: rec.ExpiryDate = now + 150s
|
||||
|
||||
loop Pour chaque StaticIndexer (Indexer A, B, ...)
|
||||
loop For every indexer bound to the node (Indexer A, B, ...)
|
||||
NodeA -> IndexerA: TempStream /opencloud/record/publish/1.0
|
||||
NodeA -> IndexerA: json.Encode(PeerRecord A signé)
|
||||
NodeA -> IndexerA: stream.Encode(Signed PeerRecord A)
|
||||
|
||||
IndexerA -> IndexerA: Verify signature
|
||||
IndexerA -> IndexerA: Check heartbeat stream actif pour PeerID_A
|
||||
IndexerA -> IndexerA: Check PeerID_A heartbeat stream
|
||||
IndexerA -> DHT: PutValue("/node/"+DID_A, PeerRecord A)
|
||||
DHT --> IndexerA: ok
|
||||
end
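The claim step above hashes the JSON form of the record, signs the hash, then attaches a short expiry. A minimal stand-alone illustration with a standard-library Ed25519 key is sketched below; the real service signs with its libp2p identity key and a richer PeerRecord, so treat the field set and the 150s expiry as assumptions copied from the diagram.

package main

import (
    "crypto/ed25519"
    "crypto/rand"
    "crypto/sha256"
    "encoding/json"
    "fmt"
    "time"
)

type PeerRecord struct {
    Name       string    `json:"name"`
    DID        string    `json:"did"`
    PeerID     string    `json:"peer_id"`
    APIUrl     string    `json:"api_url"`
    ExpiryDate time.Time `json:"expiry_date,omitempty"`
    Signature  []byte    `json:"signature,omitempty"`
}

func main() {
    _, priv, _ := ed25519.GenerateKey(rand.Reader)

    rec := PeerRecord{Name: "peer-a", DID: "uuid-did-A", PeerID: "PeerID_A", APIUrl: "host-a"}

    // Sign the hash of the minimal record (no expiry, no signature yet),
    // then attach the expiry used by the indexer-side GC (~150s here).
    payload, _ := json.Marshal(rec)
    hash := sha256.Sum256(payload)
    rec.Signature = ed25519.Sign(priv, hash[:])
    rec.ExpiryDate = time.Now().Add(150 * time.Second)

    out, _ := json.Marshal(rec)
    fmt.Println(string(out))
}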
|
||||
|
||||
@@ -1,47 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Indexer — Heartbeat double (Pair A + Pair B → Indexer partagé)
|
||||
|
||||
participant NodeA as Node A
|
||||
participant NodeB as Node B
|
||||
participant Indexer as IndexerService (partagé)
|
||||
|
||||
Note over NodeA,NodeB: Chaque pair tick toutes les 20s
|
||||
|
||||
par Pair A heartbeat
|
||||
NodeA->>Indexer: NewStream /opencloud/heartbeat/1.0
|
||||
NodeA->>Indexer: json.Encode(Heartbeat A {Name, DID_A, PeerID_A, IndexersBinded})
|
||||
|
||||
Indexer->>Indexer: CheckHeartbeat(host, stream, streams, mu, maxNodes)
|
||||
Note over Indexer: len(peers) < maxNodes ?
|
||||
|
||||
Indexer->>Indexer: getBandwidthChallenge(512–2048 bytes, stream)
|
||||
Indexer->>NodeA: Write(random payload)
|
||||
NodeA->>Indexer: Echo(same payload)
|
||||
Indexer->>Indexer: Mesure round-trip → Mbps A
|
||||
|
||||
Indexer->>Indexer: getDiversityRate(host, IndexersBinded_A)
|
||||
Note over Indexer: /24 subnet diversity des indexeurs liés
|
||||
|
||||
Indexer->>Indexer: ComputeIndexerScore(uptimeA%, MbpsA%, diversityA%)
|
||||
Note over Indexer: Score = 0.4×uptime + 0.4×bpms + 0.2×diversity
|
||||
|
||||
alt Score A < 75
|
||||
Indexer->>NodeA: (close stream)
|
||||
else Score A ≥ 75
|
||||
Indexer->>Indexer: StreamRecord[PeerID_A] = {DID_A, Heartbeat, UptimeTracker}
|
||||
end
|
||||
and Pair B heartbeat
|
||||
NodeB->>Indexer: NewStream /opencloud/heartbeat/1.0
|
||||
NodeB->>Indexer: json.Encode(Heartbeat B {Name, DID_B, PeerID_B, IndexersBinded})
|
||||
|
||||
Indexer->>Indexer: CheckHeartbeat → getBandwidthChallenge
|
||||
Indexer->>NodeB: Write(random payload)
|
||||
NodeB->>Indexer: Echo(same payload)
|
||||
Indexer->>Indexer: ComputeIndexerScore(uptimeB%, MbpsB%, diversityB%)
|
||||
|
||||
alt Score B ≥ 75
|
||||
Indexer->>Indexer: StreamRecord[PeerID_B] = {DID_B, Heartbeat, UptimeTracker}
|
||||
end
|
||||
end
|
||||
|
||||
Note over Indexer: Les deux pairs sont désormais<br/>enregistrés avec leurs streams actifs
|
||||
@@ -1,49 +1,59 @@
|
||||
@startuml
|
||||
title Indexer — Heartbeat double (Pair A + Pair B → Indexer partagé)
|
||||
@startuml indexer_heartbeat
|
||||
title Bidirectional heartbeat node → indexer (7-dimension scoring + challenges)
|
||||
|
||||
participant "Node A" as NodeA
|
||||
participant "Node B" as NodeB
|
||||
participant "IndexerService (partagé)" as Indexer
|
||||
participant "IndexerService" as Indexer
|
||||
|
||||
note over NodeA,NodeB: Chaque pair tick toutes les 20s
|
||||
note over NodeA,NodeB: SendHeartbeat goroutine — tick every 20s
|
||||
|
||||
par Pair A heartbeat
|
||||
NodeA -> Indexer: NewStream /opencloud/heartbeat/1.0
|
||||
NodeA -> Indexer: json.Encode(Heartbeat A {Name, DID_A, PeerID_A, IndexersBinded})
|
||||
== Tick Node A ==
|
||||
|
||||
Indexer -> Indexer: CheckHeartbeat(host, stream, streams, mu, maxNodes)
|
||||
note over Indexer: len(peers) < maxNodes ?
|
||||
NodeA -> Indexer: NewStream /opencloud/heartbeat/1.0\n(long-lived, reused on subsequent ticks)
|
||||
NodeA -> Indexer: stream.Encode(Heartbeat{\n  name, PeerID_A, timestamp,\n  indexersBinded: [addr1, addr2],\n  need: maxPool - len(pool),\n  challenges: [PeerID_A, PeerID_B],   ← batch (every 1-10 HBs)\n  challengeDID: "uuid-did-A",         ← DHT challenge (every 5 batches)\n  record: SignedPeerRecord_A          ← expiry=now+2min\n})
|
||||
|
||||
Indexer -> Indexer: getBandwidthChallenge(512-2048 bytes, stream)
|
||||
Indexer -> NodeA: Write(random payload)
|
||||
NodeA -> Indexer: Echo(same payload)
|
||||
Indexer -> Indexer: Mesure round-trip → Mbps A
|
||||
Indexer -> Indexer: CheckHeartbeat(stream, maxNodes)\n→ len(Peers()) >= maxNodes → reject
|
||||
|
||||
Indexer -> Indexer: getDiversityRate(host, IndexersBinded_A)
|
||||
note over Indexer: /24 subnet diversity des indexeurs liés
|
||||
Indexer -> Indexer: HandleHeartbeat → UptimeTracker.RecordHeartbeat()\n→ gap ≤ 2×interval : TotalOnline += gap
|
||||
|
||||
Indexer -> Indexer: ComputeIndexerScore(uptimeA%, MbpsA%, diversityA%)
|
||||
note over Indexer: Score = 0.4×uptime + 0.4×bpms + 0.2×diversity
|
||||
Indexer -> Indexer: Republish PeerRecord A to DHT\nDHT.PutValue("/node/"+DID_A, record_A)
|
||||
|
||||
alt Score A < 75
|
||||
Indexer -> NodeA: (close stream)
|
||||
else Score A >= 75
|
||||
Indexer -> Indexer: StreamRecord[PeerID_A] = {DID_A, Heartbeat, UptimeTracker}
|
||||
== Indexer response → node A ==
|
||||
|
||||
Indexer -> Indexer: BuildHeartbeatResponse(remotePeer=A, need, challenges, challengeDID)\n\nfillRate = connected_nodes / MaxNodesConn()\npeerCount = connected_nodes\nmaxNodes = MaxNodesConn()\nbornAt = time of indexer startup\n\nChallenges: for each challenged PeerID\n  found = PeerID in StreamRecords[ProtocolHeartbeat]?\n  lastSeen = HeartbeatStream.UptimeTracker.LastSeen\n\nDHT challenge:\n  DHT.GetValue("/node/"+challengeDID, timeout=3s)\n  → dhtFound + dhtPayload\n\nWitnesses: up to 3 AddrInfos of connected nodes\n  (addresses known in the Peerstore)\n\nSuggestions: up to `need` indexers from the dhtCache\n  (async refresh 2min, SelectByFillRate)\n\nSuggestMigrate: fillRate > 80%\n  AND node in offload.inBatch (batch ≤ 5, grace 3×HB)
|
||||
|
||||
Indexer --> NodeA: stream.Encode(HeartbeatResponse{\n fillRate, peerCount, maxNodes, bornAt,\n challenges, dhtFound, dhtPayload,\n witnesses, suggestions, suggestMigrate\n})
|
||||
|
||||
== Node A-side score processing ==
|
||||
|
||||
NodeA -> NodeA: score = ensureScore(Indexers, addr_indexer)\nscore.UptimeTracker.RecordHeartbeat()\n\nlatencyScore = max(0, 1 - RTT / (BaseRoundTrip × 10))\n\nBornAt stability:\n bornAt changed? → score.bornAtChanges++\n\nfillConsistency:\n expected = peerCount / maxNodes\n |expected - fillRate| < 10% → fillConsistent++\n\nChallenge PeerID (ground truth own PeerID):\n found=true AND lastSeen < 2×interval → challengeCorrect++\n\nDHT challenge:\n dhtFound=true → dhtSuccess++\n\nWitness query (async):\n go queryWitnesses(h, indexerID, bornAt, fillRate, witnesses, score)
|
||||
|
||||
NodeA -> NodeA: score.Score = ComputeNodeSideScore(latencyScore)\n\nScore = (\n 0.20 × uptimeRatio\n+ 0.20 × challengeAccuracy\n+ 0.15 × latencyScore\n+ 0.10 × fillScore ← 1 - fillRate\n+ 0.10 × fillConsistency\n+ 0.15 × witnessConsistency\n+ 0.10 × dhtSuccessRate\n) × 100 × bornAtPenalty\n\nbornAtPenalty = max(0, 1 - 0.30 × bornAtChanges)\nminScore = clamp(20 + 60 × (age.Hours/24), 20, 80)
|
||||
|
||||
alt score < minScore\n AND TotalOnline ≥ 2×interval\n AND !IsSeed\n AND len(pool) > 1
|
||||
NodeA -> NodeA: evictPeer(dir, addr, id, proto)\n→ delete Addr + Score + Stream\ngo TriggerConsensus(h, voters, need)\n  or replenishIndexersFromDHT(h, need)
|
||||
end
|
||||
|
||||
alt resp.SuggestMigrate == true AND nonSeedCount >= MinIndexer
|
||||
alt IsSeed
|
||||
NodeA -> NodeA: score.IsSeed = false\n(de-stickied — score-based eviction now possible)
|
||||
else !IsSeed
|
||||
NodeA -> NodeA: evictPeer → migration accepted
|
||||
end
|
||||
else Pair B heartbeat
|
||||
NodeB -> Indexer: NewStream /opencloud/heartbeat/1.0
|
||||
NodeB -> Indexer: json.Encode(Heartbeat B {Name, DID_B, PeerID_B, IndexersBinded})
|
||||
end
|
||||
|
||||
Indexer -> Indexer: CheckHeartbeat → getBandwidthChallenge
|
||||
Indexer -> NodeB: Write(random payload)
|
||||
NodeB -> Indexer: Echo(same payload)
|
||||
Indexer -> Indexer: ComputeIndexerScore(uptimeB%, MbpsB%, diversityB%)
|
||||
alt len(resp.Suggestions) > 0
|
||||
NodeA -> NodeA: handleSuggestions(dir, indexerID, suggestions)\n→ unknown entries added to the Indexers Directory\n→ NudgeIt() if anything was actually added
|
||||
end
|
||||
|
||||
alt Score B >= 75
|
||||
Indexer -> Indexer: StreamRecord[PeerID_B] = {DID_B, Heartbeat, UptimeTracker}
|
||||
end
|
||||
end par
|
||||
== Tick Node B (concurrent) ==
|
||||
|
||||
note over Indexer: Les deux pairs sont désormais\nenregistrés avec leurs streams actifs
|
||||
NodeB -> Indexer: stream.Encode(Heartbeat{PeerID_B, ...})
|
||||
Indexer -> Indexer: CheckHeartbeat → UptimeTracker → BuildHeartbeatResponse
|
||||
Indexer --> NodeB: HeartbeatResponse{...}
|
||||
|
||||
== Indexer-side GC ==
|
||||
|
||||
note over Indexer: GC ticker 30s — gc()\nnow.After(Expiry) where Expiry = lastHBTime + 2min\n→ AfterDelete(pid, name, did) outside the lock\n→ publishNameEvent(NameIndexDelete, ...)\nFillRate recomputed automatically
|
||||
|
||||
@enduml
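The node-side score formula shown in the response-processing step is easier to read as code. The sketch below is a direct transcription of the weights from the diagram (20% uptime, 20% challenge accuracy, 15% latency, 10% fill, 10% fill consistency, 15% witness consistency, 10% DHT success, with a 30% penalty per bornAt change); the struct and its field names are hypothetical.

package score

import "math"

// IndexerStats holds the per-indexer observations a node accumulates
// from heartbeat responses (hypothetical field names).
type IndexerStats struct {
    UptimeRatio        float64 // 0..1
    ChallengeAccuracy  float64 // 0..1
    LatencyScore       float64 // 0..1, max(0, 1 - RTT/(BaseRoundTrip*10))
    FillRate           float64 // 0..1, reported by the indexer
    FillConsistency    float64 // 0..1
    WitnessConsistency float64 // 0..1
    DHTSuccessRate     float64 // 0..1
    BornAtChanges      int     // observed restarts / identity instability
}

// ComputeNodeSideScore mirrors the weighted sum shown in the diagram,
// scaled to 0..100 and penalised 30% per observed bornAt change.
func ComputeNodeSideScore(s IndexerStats) float64 {
    base := 0.20*s.UptimeRatio +
        0.20*s.ChallengeAccuracy +
        0.15*s.LatencyScore +
        0.10*(1-s.FillRate) +
        0.10*s.FillConsistency +
        0.15*s.WitnessConsistency +
        0.10*s.DHTSuccessRate
    penalty := math.Max(0, 1-0.30*float64(s.BornAtChanges))
    return base * 100 * penalty
}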
|
||||
|
||||
@@ -1,41 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Indexer — Pair A publie, Pair B publie (handleNodePublish → DHT)
|
||||
|
||||
participant NodeA as Node A
|
||||
participant NodeB as Node B
|
||||
participant Indexer as IndexerService (partagé)
|
||||
participant DHT as DHT Kademlia
|
||||
|
||||
Note over NodeA: Après claimInfo ou refresh TTL
|
||||
|
||||
par Pair A publie son PeerRecord
|
||||
NodeA->>Indexer: TempStream /opencloud/record/publish/1.0
|
||||
NodeA->>Indexer: json.Encode(PeerRecord A {DID_A, PeerID_A, PubKey_A, Expiry, Sig_A})
|
||||
|
||||
Indexer->>Indexer: Verify sig_A (reconstruit rec minimal, pubKey_A.Verify)
|
||||
Indexer->>Indexer: Check StreamRecords[Heartbeat][PeerID_A] existe
|
||||
|
||||
alt Heartbeat actif pour A
|
||||
Indexer->>Indexer: StreamRecord A → DID_A, Record=PeerRecord A, LastSeen=now
|
||||
Indexer->>DHT: PutValue("/node/"+DID_A, PeerRecord A JSON)
|
||||
DHT-->>Indexer: ok
|
||||
else Pas de heartbeat
|
||||
Indexer->>NodeA: (erreur "no heartbeat", stream close)
|
||||
end
|
||||
and Pair B publie son PeerRecord
|
||||
NodeB->>Indexer: TempStream /opencloud/record/publish/1.0
|
||||
NodeB->>Indexer: json.Encode(PeerRecord B {DID_B, PeerID_B, PubKey_B, Expiry, Sig_B})
|
||||
|
||||
Indexer->>Indexer: Verify sig_B
|
||||
Indexer->>Indexer: Check StreamRecords[Heartbeat][PeerID_B] existe
|
||||
|
||||
alt Heartbeat actif pour B
|
||||
Indexer->>Indexer: StreamRecord B → DID_B, Record=PeerRecord B, LastSeen=now
|
||||
Indexer->>DHT: PutValue("/node/"+DID_B, PeerRecord B JSON)
|
||||
DHT-->>Indexer: ok
|
||||
else Pas de heartbeat
|
||||
Indexer->>NodeB: (erreur "no heartbeat", stream close)
|
||||
end
|
||||
end
|
||||
|
||||
Note over DHT: DHT contient maintenant<br/>"/node/DID_A" et "/node/DID_B"
|
||||
@@ -1,43 +1,47 @@
|
||||
@startuml
|
||||
title Indexer — Pair A publie, Pair B publie (handleNodePublish → DHT)
|
||||
title Indexer — Peer A publishing, Peer B publishing (handleNodePublish → DHT)
|
||||
|
||||
participant "Node A" as NodeA
|
||||
participant "Node B" as NodeB
|
||||
participant "IndexerService (partagé)" as Indexer
|
||||
participant "IndexerService (shared)" as Indexer
|
||||
participant "DHT Kademlia" as DHT
|
||||
|
||||
note over NodeA: Après claimInfo ou refresh TTL
|
||||
note over NodeA: Start after claimInfo or refresh TTL
|
||||
|
||||
par Pair A publie son PeerRecord
|
||||
par Peer A publishes its PeerRecord
|
||||
NodeA -> Indexer: TempStream /opencloud/record/publish/1.0
|
||||
NodeA -> Indexer: json.Encode(PeerRecord A {DID_A, PeerID_A, PubKey_A, Expiry, Sig_A})
|
||||
NodeA -> Indexer: stream.Encode(PeerRecord A {DID_A, PeerID_A, PubKey_A, Expiry, Sig_A})
|
||||
|
||||
Indexer -> Indexer: Verify sig_A (reconstruit rec minimal, pubKey_A.Verify)
|
||||
Indexer -> Indexer: Check StreamRecords[Heartbeat][PeerID_A] existe
|
||||
|
||||
alt Heartbeat actif pour A
|
||||
alt Active heartbeat for A
|
||||
Indexer -> Indexer: StreamRecord A → DID_A, Record=PeerRecord A, LastSeen=now
|
||||
Indexer -> DHT: PutValue("/node/"+DID_A, PeerRecord A JSON)
|
||||
Indexer -> DHT: PutValue("/name/"+name_A, DID_A)
|
||||
Indexer -> DHT: PutValue("/peer/"+peer_id_A, DID_A)
|
||||
DHT --> Indexer: ok
|
||||
else Pas de heartbeat
|
||||
Indexer -> NodeA: (erreur "no heartbeat", stream close)
|
||||
end
|
||||
else Pair B publie son PeerRecord
|
||||
else Peer B publishes its PeerRecord
|
||||
NodeB -> Indexer: TempStream /opencloud/record/publish/1.0
|
||||
NodeB -> Indexer: json.Encode(PeerRecord B {DID_B, PeerID_B, PubKey_B, Expiry, Sig_B})
|
||||
NodeB -> Indexer: stream.Encode(PeerRecord B {DID_B, PeerID_B, PubKey_B, Expiry, Sig_B})
|
||||
|
||||
Indexer -> Indexer: Verify sig_B
|
||||
Indexer -> Indexer: Check StreamRecords[Heartbeat][PeerID_B] existe
|
||||
|
||||
alt Heartbeat actif pour B
|
||||
alt Active heartbeat for B
|
||||
Indexer -> Indexer: StreamRecord B → DID_B, Record=PeerRecord B, LastSeen=now
|
||||
Indexer -> DHT: PutValue("/node/"+DID_B, PeerRecord B JSON)
|
||||
Indexer -> DHT: PutValue("/name/"+name_B, DID_B)
|
||||
Indexer -> DHT: PutValue("/peer/"+peer_id_B, DID_B)
|
||||
DHT --> Indexer: ok
|
||||
else Pas de heartbeat
|
||||
Indexer -> NodeB: (erreur "no heartbeat", stream close)
|
||||
end
|
||||
end par
|
||||
|
||||
note over DHT: DHT contient maintenant\n"/node/DID_A" et "/node/DID_B"
|
||||
note over DHT: DHT now contains\n"/node/DID_A" and "/node/DID_B"
|
||||
|
||||
@enduml
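A compact sketch of the indexer-side publish handler described above: verify the signature, refuse records from peers without a live heartbeat stream, then write the /node/, /name/ and /peer/ keys. The interfaces and helper functions are assumptions; only the key layout and the ordering come from the diagram.

package indexer

import (
    "context"
    "errors"
    "fmt"
)

// DHTPutter abstracts the Kademlia PutValue call (assumed interface).
type DHTPutter interface {
    PutValue(ctx context.Context, key string, value []byte) error
}

type SignedPeerRecord struct {
    DID    string
    Name   string
    PeerID string
    Raw    []byte // JSON as received, stored verbatim in the DHT
}

// handleNodePublish assumes verify() checks the embedded signature and
// hasHeartbeat() reports whether a long-lived heartbeat stream exists
// for that PeerID (both hypothetical helpers).
func handleNodePublish(ctx context.Context, d DHTPutter, rec SignedPeerRecord,
    verify func(SignedPeerRecord) bool, hasHeartbeat func(peerID string) bool) error {

    if !verify(rec) {
        return errors.New("invalid signature")
    }
    if !hasHeartbeat(rec.PeerID) {
        return errors.New("no heartbeat") // the diagram closes the stream with this error
    }
    // Primary record plus the two alias keys used for lookups by name / peer id.
    if err := d.PutValue(ctx, "/node/"+rec.DID, rec.Raw); err != nil {
        return fmt.Errorf("put /node: %w", err)
    }
    if err := d.PutValue(ctx, "/name/"+rec.Name, []byte(rec.DID)); err != nil {
        return fmt.Errorf("put /name: %w", err)
    }
    return d.PutValue(ctx, "/peer/"+rec.PeerID, []byte(rec.DID))
}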
|
||||
|
||||
@@ -1,49 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Indexer — Pair A résout Pair B (GetPeerRecord + handleNodeGet)
|
||||
|
||||
participant NATSA as NATS A
|
||||
participant DBA as DB Pair A (oc-lib)
|
||||
participant NodeA as Node A
|
||||
participant Indexer as IndexerService (partagé)
|
||||
participant DHT as DHT Kademlia
|
||||
participant NATSA2 as NATS A (retour)
|
||||
|
||||
Note over NodeA: Déclenché par : NATS PB_SEARCH PEER<br/>ou callback SubscribeToSearch
|
||||
|
||||
NodeA->>DBA: NewRequestAdmin(PEER).Search(DID_B ou PeerID_B)
|
||||
DBA-->>NodeA: Peer B local (si connu) → résout DID_B + PeerID_B<br/>sinon utilise la valeur brute
|
||||
|
||||
loop Pour chaque StaticIndexer
|
||||
NodeA->>Indexer: TempStream /opencloud/record/get/1.0
|
||||
NodeA->>Indexer: json.Encode(GetValue{Key: DID_B, PeerID: PeerID_B})
|
||||
|
||||
Indexer->>Indexer: key = "/node/" + DID_B
|
||||
Indexer->>DHT: SearchValue(ctx 10s, "/node/"+DID_B)
|
||||
DHT-->>Indexer: channel de bytes (PeerRecord B)
|
||||
|
||||
loop Pour chaque résultat DHT
|
||||
Indexer->>Indexer: Unmarshal → PeerRecord B
|
||||
alt PeerRecord.PeerID == PeerID_B
|
||||
Indexer->>Indexer: resp.Found=true, resp.Records[PeerID_B]=PeerRecord B
|
||||
Indexer->>Indexer: StreamRecord B.LastSeen = now (si heartbeat actif)
|
||||
end
|
||||
end
|
||||
|
||||
Indexer->>NodeA: json.Encode(GetResponse{Found:true, Records:{PeerID_B: PeerRecord B}})
|
||||
end
|
||||
|
||||
loop Pour chaque PeerRecord retourné
|
||||
NodeA->>NodeA: rec.Verify() → valide signature de B
|
||||
NodeA->>NodeA: rec.ExtractPeer(ourDID_A, DID_B, pubKey_B)
|
||||
|
||||
alt ourDID_A == DID_B (c'est notre propre entrée)
|
||||
Note over NodeA: Republier pour rafraîchir le TTL
|
||||
NodeA->>Indexer: publishPeerRecord(rec) [refresh 2 min]
|
||||
end
|
||||
|
||||
NodeA->>NATSA2: SetNATSPub(CREATE_RESOURCE, {PEER, Peer B JSON,<br/>SearchAttr:"peer_id"})
|
||||
NATSA2->>DBA: Upsert Peer B dans DB A
|
||||
DBA-->>NATSA2: ok
|
||||
end
|
||||
|
||||
NodeA-->>NodeA: []*peer.Peer → [Peer B]
|
||||
@@ -1,5 +1,5 @@
|
||||
@startuml
|
||||
title Indexer — Pair A résout Pair B (GetPeerRecord + handleNodeGet)
|
||||
title Indexer — Peer A discovers Peer B (GetPeerRecord + handleNodeGet)
|
||||
|
||||
participant "NATS A" as NATSA
|
||||
participant "DB Pair A (oc-lib)" as DBA
|
||||
@@ -8,41 +8,41 @@ participant "IndexerService (partagé)" as Indexer
|
||||
participant "DHT Kademlia" as DHT
|
||||
participant "NATS A (retour)" as NATSA2
|
||||
|
||||
note over NodeA: Déclenché par : NATS PB_SEARCH PEER\nou callback SubscribeToSearch
|
||||
note over NodeA: Triggered by: NATS PB_SEARCH PEER\nor the SubscribeToSearch callback
|
||||
|
||||
NodeA -> DBA: NewRequestAdmin(PEER).Search(DID_B ou PeerID_B)
|
||||
DBA --> NodeA: Peer B local (si connu) → résout DID_B + PeerID_B\nsinon utilise la valeur brute
|
||||
NodeA -> DBA: (PEER).Search(DID_B or PeerID_B)
|
||||
DBA --> NodeA: Local Peer B (if known) → resolve DID_B + PeerID_B\notherwise use the raw search value
|
||||
|
||||
loop Pour chaque StaticIndexer
|
||||
NodeA -> Indexer: TempStream /opencloud/record/get/1.0
|
||||
NodeA -> Indexer: json.Encode(GetValue{Key: DID_B, PeerID: PeerID_B})
|
||||
loop For every indexer bound to Peer A
|
||||
NodeA -> Indexer: TempStream /opencloud/record/get/1.0 -> streamAI
|
||||
NodeA -> Indexer: streamAI.Encode(GetValue{Key: DID_B, PeerID: PeerID_B})
|
||||
|
||||
Indexer -> Indexer: key = "/node/" + DID_B
|
||||
Indexer -> DHT: SearchValue(ctx 10s, "/node/"+DID_B)
|
||||
DHT --> Indexer: channel de bytes (PeerRecord B)
|
||||
|
||||
loop Pour chaque résultat DHT
|
||||
Indexer -> Indexer: Unmarshal → PeerRecord B
|
||||
loop For every DHT result
|
||||
Indexer -> Indexer: read → PeerRecord B
|
||||
alt PeerRecord.PeerID == PeerID_B
|
||||
Indexer -> Indexer: resp.Found=true, resp.Records[PeerID_B]=PeerRecord B
|
||||
Indexer -> Indexer: StreamRecord B.LastSeen = now (si heartbeat actif)
|
||||
Indexer -> Indexer: StreamRecord B.LastSeen = now (if active heartbeat)
|
||||
end
|
||||
end
|
||||
|
||||
Indexer -> NodeA: json.Encode(GetResponse{Found:true, Records:{PeerID_B: PeerRecord B}})
|
||||
Indexer -> NodeA: streamAI.Encode(GetResponse{Found:true, Records:{PeerID_B: PeerRecord B}})
|
||||
end
|
||||
|
||||
loop Pour chaque PeerRecord retourné
|
||||
NodeA -> NodeA: rec.Verify() → valide signature de B
|
||||
loop For every PeerRecord found
|
||||
NodeA -> NodeA: rec.Verify() → validate B's signature
|
||||
NodeA -> NodeA: rec.ExtractPeer(ourDID_A, DID_B, pubKey_B)
|
||||
|
||||
alt ourDID_A == DID_B (c'est notre propre entrée)
|
||||
note over NodeA: Republier pour rafraîchir le TTL
|
||||
alt ourDID_A == DID_B (it is our own entry)
|
||||
note over NodeA: Republish to refresh TTL
|
||||
NodeA -> Indexer: publishPeerRecord(rec) [refresh 2 min]
|
||||
end
|
||||
|
||||
NodeA -> NATSA2: SetNATSPub(CREATE_RESOURCE, {PEER, Peer B JSON,\nSearchAttr:"peer_id"})
|
||||
NATSA2 -> DBA: Upsert Peer B dans DB A
|
||||
NATSA2 -> DBA: Upsert Peer B in DB A
|
||||
DBA --> NATSA2: ok
|
||||
end
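The DHT side of the lookup above (SearchValue on /node/<DID> with a 10s budget, keeping the first record whose PeerID matches) can be sketched against the go-libp2p-kad-dht API as follows; the record type is reduced to the two fields the match needs, and error handling is minimal.

package indexer

import (
    "context"
    "encoding/json"
    "time"

    dht "github.com/libp2p/go-libp2p-kad-dht"
)

type PeerRecord struct {
    DID    string `json:"did"`
    PeerID string `json:"peer_id"`
}

// resolveNode searches "/node/"+did for up to 10s and returns the first
// record whose PeerID matches the expected one, as in the diagram.
func resolveNode(ctx context.Context, d *dht.IpfsDHT, did, wantPeerID string) (*PeerRecord, bool) {
    ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
    defer cancel()

    ch, err := d.SearchValue(ctx, "/node/"+did)
    if err != nil {
        return nil, false
    }
    for raw := range ch {
        var rec PeerRecord
        if json.Unmarshal(raw, &rec) != nil {
            continue
        }
        if rec.PeerID == wantPeerID {
            return &rec, true
        }
    }
    return nil, false
}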
|
||||
|
||||
|
||||
@@ -1,39 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Native Indexer — Enregistrement d'un Indexer auprès du Native
|
||||
|
||||
participant IndexerA as Indexer A
|
||||
participant IndexerB as Indexer B
|
||||
participant Native as Native Indexer (partagé)
|
||||
participant DHT as DHT Kademlia
|
||||
participant PubSub as GossipSub (oc-indexer-registry)
|
||||
|
||||
Note over IndexerA,IndexerB: Au démarrage + toutes les 60s (StartNativeRegistration)
|
||||
|
||||
par Indexer A s'enregistre
|
||||
IndexerA->>IndexerA: Build IndexerRegistration{PeerID_A, Addr_A}
|
||||
IndexerA->>Native: NewStream /opencloud/native/subscribe/1.0
|
||||
IndexerA->>Native: json.Encode(IndexerRegistration A)
|
||||
|
||||
Native->>Native: Decode → liveIndexerEntry{PeerID_A, Addr_A, ExpiresAt=now+66s}
|
||||
Native->>DHT: PutValue("/indexer/"+PeerID_A, entry A)
|
||||
DHT-->>Native: ok
|
||||
Native->>Native: liveIndexers[PeerID_A] = entry A
|
||||
Native->>Native: knownPeerIDs[PeerID_A] = {}
|
||||
|
||||
Native->>PubSub: topic.Publish([]byte(PeerID_A))
|
||||
Note over PubSub: Gossipé aux autres Natives<br/>→ ils ajoutent PeerID_A à knownPeerIDs<br/>→ refresh DHT au prochain tick 30s
|
||||
IndexerA->>Native: stream.Close()
|
||||
and Indexer B s'enregistre
|
||||
IndexerB->>IndexerB: Build IndexerRegistration{PeerID_B, Addr_B}
|
||||
IndexerB->>Native: NewStream /opencloud/native/subscribe/1.0
|
||||
IndexerB->>Native: json.Encode(IndexerRegistration B)
|
||||
|
||||
Native->>Native: Decode → liveIndexerEntry{PeerID_B, Addr_B, ExpiresAt=now+66s}
|
||||
Native->>DHT: PutValue("/indexer/"+PeerID_B, entry B)
|
||||
DHT-->>Native: ok
|
||||
Native->>Native: liveIndexers[PeerID_B] = entry B
|
||||
Native->>PubSub: topic.Publish([]byte(PeerID_B))
|
||||
IndexerB->>Native: stream.Close()
|
||||
end
|
||||
|
||||
Note over Native: liveIndexers = {PeerID_A: entryA, PeerID_B: entryB}
|
||||
@@ -1,41 +1,49 @@
|
||||
@startuml
|
||||
title Native Indexer — Enregistrement d'un Indexer auprès du Native
|
||||
@startuml native_registration
|
||||
title Native Indexer — Indexer Subscription (StartNativeRegistration)
|
||||
|
||||
participant "Indexer A" as IndexerA
|
||||
participant "Indexer B" as IndexerB
|
||||
participant "Native Indexer (partagé)" as Native
|
||||
participant "Native Indexer" as Native
|
||||
participant "DHT Kademlia" as DHT
|
||||
participant "GossipSub (oc-indexer-registry)" as PubSub
|
||||
|
||||
note over IndexerA,IndexerB: Au démarrage + toutes les 60s (StartNativeRegistration)
|
||||
note over IndexerA,IndexerB: At start + every 60s (RecommendedHeartbeatInterval)\\nStartNativeRegistration → RegisterWithNative
|
||||
|
||||
par Indexer A subscribes
|
||||
IndexerA -> IndexerA: fillRateFn()\\n= len(StreamRecords[HB]) / maxNodes
|
||||
|
||||
IndexerA -> IndexerA: Build IndexerRegistration{\\n PeerID_A, Addr_A,\\n Timestamp=now.UnixNano(),\\n FillRate=fillRateFn(),\\n PubKey, Signature\\n}\\nreg.Sign(h)
|
||||
|
||||
par Indexer A s'enregistre
|
||||
IndexerA -> IndexerA: Build IndexerRegistration{PeerID_A, Addr_A}
|
||||
IndexerA -> Native: NewStream /opencloud/native/subscribe/1.0
|
||||
IndexerA -> Native: json.Encode(IndexerRegistration A)
|
||||
IndexerA -> Native: stream.Encode(IndexerRegistration A)
|
||||
|
||||
Native -> Native: reg.Verify() — verify signature
|
||||
Native -> Native: liveIndexerEntry{\\n PeerID_A, Addr_A,\\n ExpiresAt = now + IndexerTTL (90s),\\n FillRate = reg.FillRate,\\n PubKey, Signature\\n}
|
||||
Native -> Native: liveIndexers[PeerID_A] = entry A
|
||||
Native -> Native: knownPeerIDs[PeerID_A] = Addr_A
|
||||
|
||||
Native -> Native: Decode → liveIndexerEntry{PeerID_A, Addr_A, ExpiresAt=now+66s}
|
||||
Native -> DHT: PutValue("/indexer/"+PeerID_A, entry A)
|
||||
DHT --> Native: ok
|
||||
Native -> Native: liveIndexers[PeerID_A] = entry A
|
||||
Native -> Native: knownPeerIDs[PeerID_A] = {}
|
||||
|
||||
Native -> PubSub: topic.Publish([]byte(PeerID_A))
|
||||
note over PubSub: Gossipé aux autres Natives\n→ ils ajoutent PeerID_A à knownPeerIDs\n→ refresh DHT au prochain tick 30s
|
||||
IndexerA -> Native: stream.Close()
|
||||
else Indexer B s'enregistre
|
||||
IndexerB -> IndexerB: Build IndexerRegistration{PeerID_B, Addr_B}
|
||||
IndexerB -> Native: NewStream /opencloud/native/subscribe/1.0
|
||||
IndexerB -> Native: json.Encode(IndexerRegistration B)
|
||||
note over PubSub: Gossiped to the other Natives\\n→ they add PeerID_A to knownPeerIDs\\n→ DHT refreshed at the next 30s tick
|
||||
|
||||
Native -> Native: Decode → liveIndexerEntry{PeerID_B, Addr_B, ExpiresAt=now+66s}
|
||||
Native -> DHT: PutValue("/indexer/"+PeerID_B, entry B)
|
||||
DHT --> Native: ok
|
||||
IndexerA -> Native: stream.Close()
|
||||
|
||||
else Indexer B subscribes
|
||||
IndexerB -> IndexerB: fillRateFn() + reg.Sign(h)
|
||||
IndexerB -> Native: NewStream /opencloud/native/subscribe/1.0
|
||||
IndexerB -> Native: stream.Encode(IndexerRegistration B)
|
||||
|
||||
Native -> Native: reg.Verify() + liveIndexerEntry{FillRate=reg.FillRate, ExpiresAt=now+90s}
|
||||
Native -> Native: liveIndexers[PeerID_B] = entry B
|
||||
Native -> DHT: PutValue("/indexer/"+PeerID_B, entry B)
|
||||
Native -> PubSub: topic.Publish([]byte(PeerID_B))
|
||||
IndexerB -> Native: stream.Close()
|
||||
end par
|
||||
|
||||
note over Native: liveIndexers = {PeerID_A: entryA, PeerID_B: entryB}
|
||||
note over Native: liveIndexers = {PeerID_A: {FillRate:0.3}, PeerID_B: {FillRate:0.6}}\\nTTL 90s — IndexerTTL
|
||||
|
||||
note over Native: Explicit unsubscribe on stop:\\nUnregisterFromNative → /opencloud/native/unsubscribe/1.0\\nNative drops the registration immediately.
|
||||
|
||||
@enduml
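The native's registry of live indexers shown above is essentially a TTL map keyed by PeerID: each registration pushes the expiry forward and a periodic sweep drops anything past its TTL. A minimal sketch, assuming the 90s IndexerTTL from the note and illustrative field names.

package native

import (
    "sync"
    "time"
)

const IndexerTTL = 90 * time.Second

type liveIndexerEntry struct {
    PeerID    string
    Addr      string
    FillRate  float64
    ExpiresAt time.Time
}

type Registry struct {
    mu   sync.Mutex
    live map[string]liveIndexerEntry
}

func NewRegistry() *Registry { return &Registry{live: map[string]liveIndexerEntry{}} }

// Register is called on /opencloud/native/subscribe/1.0 after the signature
// check; re-registering simply pushes the expiry forward.
func (r *Registry) Register(peerID, addr string, fillRate float64) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.live[peerID] = liveIndexerEntry{PeerID: peerID, Addr: addr, FillRate: fillRate, ExpiresAt: time.Now().Add(IndexerTTL)}
}

// Sweep drops expired entries; the TTL is the only liveness criterion.
func (r *Registry) Sweep(now time.Time) {
    r.mu.Lock()
    defer r.mu.Unlock()
    for id, e := range r.live {
        if now.After(e.ExpiresAt) {
            delete(r.live, id)
        }
    }
}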
|
||||
|
||||
@@ -1,60 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Native — ConnectToNatives + Consensus (Pair A bootstrap)
|
||||
|
||||
participant NodeA as Node A
|
||||
participant Native1 as Native #1 (primary)
|
||||
participant Native2 as Native #2
|
||||
participant NativeN as Native #N
|
||||
participant DHT as DHT Kademlia
|
||||
|
||||
Note over NodeA: NativeIndexerAddresses configuré<br/>Appelé pendant InitNode → ConnectToIndexers
|
||||
|
||||
NodeA->>NodeA: Parse NativeIndexerAddresses → StaticNatives
|
||||
NodeA->>Native1: SendHeartbeat /opencloud/heartbeat/1.0 (20s tick)
|
||||
NodeA->>Native2: SendHeartbeat /opencloud/heartbeat/1.0 (20s tick)
|
||||
|
||||
%% Étape 1 : récupérer un pool initial
|
||||
NodeA->>Native1: Connect + NewStream /opencloud/native/indexers/1.0
|
||||
NodeA->>Native1: json.Encode(GetIndexersRequest{Count: maxIndexer})
|
||||
|
||||
Native1->>Native1: reachableLiveIndexers()
|
||||
Note over Native1: Filtre liveIndexers par TTL<br/>ping chaque candidat (PeerIsAlive)
|
||||
|
||||
alt Aucun indexer connu par Native1
|
||||
Native1->>Native1: selfDelegate(NodeA.PeerID, resp)
|
||||
Note over Native1: IsSelfFallback=true<br/>Indexers=[native1 addr]
|
||||
Native1->>NodeA: GetIndexersResponse{IsSelfFallback:true, Indexers:[native1]}
|
||||
NodeA->>NodeA: StaticIndexers[native1] = native1
|
||||
Note over NodeA: Pas de consensus — native1 utilisé directement comme indexeur
|
||||
else Indexers disponibles
|
||||
Native1->>NodeA: GetIndexersResponse{Indexers:[Addr_IndexerA, Addr_IndexerB, ...]}
|
||||
|
||||
%% Étape 2 : consensus
|
||||
Note over NodeA: clientSideConsensus(candidates)
|
||||
|
||||
par Requêtes consensus parallèles
|
||||
NodeA->>Native1: NewStream /opencloud/native/consensus/1.0
|
||||
NodeA->>Native1: ConsensusRequest{Candidates:[Addr_A, Addr_B]}
|
||||
Native1->>Native1: Croiser avec liveIndexers propres
|
||||
Native1->>NodeA: ConsensusResponse{Trusted:[Addr_A, Addr_B], Suggestions:[]}
|
||||
and
|
||||
NodeA->>Native2: NewStream /opencloud/native/consensus/1.0
|
||||
NodeA->>Native2: ConsensusRequest{Candidates:[Addr_A, Addr_B]}
|
||||
Native2->>Native2: Croiser avec liveIndexers propres
|
||||
Native2->>NodeA: ConsensusResponse{Trusted:[Addr_A], Suggestions:[Addr_C]}
|
||||
and
|
||||
NodeA->>NativeN: NewStream /opencloud/native/consensus/1.0
|
||||
NodeA->>NativeN: ConsensusRequest{Candidates:[Addr_A, Addr_B]}
|
||||
NativeN->>NativeN: Croiser avec liveIndexers propres
|
||||
NativeN->>NodeA: ConsensusResponse{Trusted:[Addr_A, Addr_B], Suggestions:[]}
|
||||
end
|
||||
|
||||
Note over NodeA: Aggrège les votes (timeout 4s)<br/>Addr_A → 3/3 votes → confirmé ✓<br/>Addr_B → 2/3 votes → confirmé ✓
|
||||
|
||||
alt confirmed < maxIndexer && suggestions disponibles
|
||||
Note over NodeA: Round 2 — rechallenge avec suggestions
|
||||
NodeA->>NodeA: clientSideConsensus(confirmed + sample(suggestions))
|
||||
end
|
||||
|
||||
NodeA->>NodeA: StaticIndexers = adresses confirmées à majorité
|
||||
end
|
||||
@@ -1,62 +1,70 @@
|
||||
@startuml
|
||||
title Native — ConnectToNatives + Consensus (Pair A bootstrap)
|
||||
@startuml native_get_consensus
|
||||
title Native — ConnectToNatives : fetch pool + Phase 1 + Phase 2
|
||||
|
||||
participant "Node A" as NodeA
|
||||
participant "Native #1 (primary)" as Native1
|
||||
participant "Native #2" as Native2
|
||||
participant "Native #N" as NativeN
|
||||
participant "DHT Kademlia" as DHT
|
||||
participant "Node / Indexer\\n(appelant)" as Caller
|
||||
participant "Native A" as NA
|
||||
participant "Native B" as NB
|
||||
participant "Indexer A\\n(stable voter)" as IA
|
||||
|
||||
note over NodeA: NativeIndexerAddresses configuré\nAppelé pendant InitNode → ConnectToIndexers
|
||||
note over Caller: NativeIndexerAddresses configured\\nConnectToNatives() called from ConnectToIndexers
|
||||
|
||||
NodeA -> NodeA: Parse NativeIndexerAddresses → StaticNatives
|
||||
NodeA -> Native1: SendHeartbeat /opencloud/heartbeat/1.0 (20s tick)
|
||||
NodeA -> Native2: SendHeartbeat /opencloud/heartbeat/1.0 (20s tick)
|
||||
== Step 1 : heartbeat to the native mesh (nativeHeartbeatOnce) ==
|
||||
Caller -> NA: SendHeartbeat /opencloud/heartbeat/1.0
|
||||
Caller -> NB: SendHeartbeat /opencloud/heartbeat/1.0
|
||||
|
||||
' Étape 1 : récupérer un pool initial
|
||||
NodeA -> Native1: Connect + NewStream /opencloud/native/indexers/1.0
|
||||
NodeA -> Native1: json.Encode(GetIndexersRequest{Count: maxIndexer})
|
||||
== Step 2 : parallel pool fetch (timeout 6s) ==
|
||||
par fetchIndexersFromNative — parallel
|
||||
Caller -> NA: NewStream /opencloud/native/indexers/1.0\\nGetIndexersRequest{Count: maxIndexer, From: PeerID}
|
||||
NA -> NA: reachableLiveIndexers()\\nsorted by w(F) = fillRate×(1−fillRate) desc
|
||||
NA --> Caller: GetIndexersResponse{Indexers:[IA,IB], FillRates:{IA:0.3,IB:0.6}}
|
||||
else
|
||||
Caller -> NB: NewStream /opencloud/native/indexers/1.0
|
||||
NB -> NB: reachableLiveIndexers()
|
||||
NB --> Caller: GetIndexersResponse{Indexers:[IA,IB], FillRates:{IA:0.3,IB:0.6}}
|
||||
end par
|
||||
|
||||
Native1 -> Native1: reachableLiveIndexers()
|
||||
note over Native1: Filtre liveIndexers par TTL\nping chaque candidat (PeerIsAlive)
|
||||
note over Caller: Merged → candidates=[IA,IB]\\nisFallback=false
|
||||
|
||||
alt Aucun indexer connu par Native1
|
||||
Native1 -> Native1: selfDelegate(NodeA.PeerID, resp)
|
||||
note over Native1: IsSelfFallback=true\nIndexers=[native1 addr]
|
||||
Native1 -> NodeA: GetIndexersResponse{IsSelfFallback:true, Indexers:[native1]}
|
||||
NodeA -> NodeA: StaticIndexers[native1] = native1
|
||||
note over NodeA: Pas de consensus — native1 utilisé directement comme indexeur
|
||||
else Indexers disponibles
|
||||
Native1 -> NodeA: GetIndexersResponse{Indexers:[Addr_IndexerA, Addr_IndexerB, ...]}
|
||||
|
||||
' Étape 2 : consensus
|
||||
note over NodeA: clientSideConsensus(candidates)
|
||||
|
||||
par Requêtes consensus parallèles
|
||||
NodeA -> Native1: NewStream /opencloud/native/consensus/1.0
|
||||
NodeA -> Native1: ConsensusRequest{Candidates:[Addr_A, Addr_B]}
|
||||
Native1 -> Native1: Croiser avec liveIndexers propres
|
||||
Native1 -> NodeA: ConsensusResponse{Trusted:[Addr_A, Addr_B], Suggestions:[]}
|
||||
alt isFallback=true (the native offers itself as fallback indexer)
|
||||
note over Caller: resolvePool : avoid consensus\\nadmittedAt = Now (zero)\\nStaticIndexers = {native_addr}
|
||||
else isFallback=false → Phase 1 + Phase 2
|
||||
== Phase 1 — clientSideConsensus (timeout 3s per native, 4s total) ==
|
||||
par Parallel consensus
|
||||
Caller -> NA: NewStream /opencloud/native/consensus/1.0\\nConsensusRequest{Candidates:[IA,IB]}
|
||||
NA -> NA: cross-check against its own liveIndexers
|
||||
NA --> Caller: ConsensusResponse{Trusted:[IA,IB], Suggestions:[]}
|
||||
else
|
||||
NodeA -> Native2: NewStream /opencloud/native/consensus/1.0
|
||||
NodeA -> Native2: ConsensusRequest{Candidates:[Addr_A, Addr_B]}
|
||||
Native2 -> Native2: Croiser avec liveIndexers propres
|
||||
Native2 -> NodeA: ConsensusResponse{Trusted:[Addr_A], Suggestions:[Addr_C]}
|
||||
else
|
||||
NodeA -> NativeN: NewStream /opencloud/native/consensus/1.0
|
||||
NodeA -> NativeN: ConsensusRequest{Candidates:[Addr_A, Addr_B]}
|
||||
NativeN -> NativeN: Croiser avec liveIndexers propres
|
||||
NativeN -> NodeA: ConsensusResponse{Trusted:[Addr_A, Addr_B], Suggestions:[]}
|
||||
Caller -> NB: NewStream /opencloud/native/consensus/1.0
|
||||
NB --> Caller: ConsensusResponse{Trusted:[IA], Suggestions:[IC]}
|
||||
end par
|
||||
|
||||
note over NodeA: Aggrège les votes (timeout 4s)\nAddr_A → 3/3 votes → confirmé ✓\nAddr_B → 2/3 votes → confirmé ✓
|
||||
note over Caller: IA → 2/2 votes → confirmed ✓\\nIB → 1/2 vote → rejected ✗\\nIC → suggestion → round 2 if confirmed < maxIndexer
|
||||
|
||||
alt confirmed < maxIndexer && suggestions disponibles
|
||||
note over NodeA: Round 2 — rechallenge avec suggestions
|
||||
NodeA -> NodeA: clientSideConsensus(confirmed + sample(suggestions))
|
||||
alt confirmed < maxIndexer && suggestions available
|
||||
note over Caller: Round 2 — rechallenge with confirmed + sample(suggestions)\\nclientSideConsensus([IA, IC])
|
||||
end
|
||||
|
||||
NodeA -> NodeA: StaticIndexers = adresses confirmées à majorité
|
||||
note over Caller: admittedAt = time.Now()
|
||||
|
||||
== Phase 2 — indexerLivenessVote (timeout 3s per voter, 4s total) ==
|
||||
note over Caller: Search for stable voters in Subscribed Indexers\\nAdmittedAt != zero && age >= MinStableAge (2min)
|
||||
|
||||
alt Stable Voters are available
|
||||
par Phase 2 parallel
|
||||
Caller -> IA: NewStream /opencloud/indexer/consensus/1.0\\nIndexerConsensusRequest{Candidates:[IA]}
|
||||
IA -> IA: StreamRecords[ProtocolHB][candidate]\\ntime.Since(LastSeen) <= 120s && LastScore >= 30.0
|
||||
IA --> Caller: IndexerConsensusResponse{Alive:[IA]}
|
||||
end par
|
||||
note over Caller: IA confirmed alive by quorum > 0.5\\npool = {IA}
|
||||
else No voters are stable (startup)
|
||||
note over Caller: Phase 1 result kept directly\\n(no indexer has reached MinStableAge yet)
|
||||
end
|
||||
|
||||
== Replacement pool ==
|
||||
Caller -> Caller: replaceStaticIndexers(pool, admittedAt)\\nStaticIndexerMeta[IA].AdmittedAt = admittedAt
|
||||
end
|
||||
|
||||
== Step 3 : heartbeat to the indexer pool (ConnectToIndexers) ==
|
||||
Caller -> Caller: SendHeartbeat /opencloud/heartbeat/1.0\\nto StaticIndexers
|
||||
|
||||
@enduml
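Both consensus phases above reduce to counting, per candidate, how many voters vouched for it and keeping those above a strict majority (quorum > 0.5 in the Phase 2 note). A transport-independent sketch of that aggregation:

package consensus

// confirmByQuorum keeps every candidate approved by a strict majority of
// voters. responses maps, for each voter, candidate → vouched-for flag.
func confirmByQuorum(candidates []string, responses []map[string]bool) []string {
    if len(responses) == 0 {
        return nil
    }
    votes := map[string]int{}
    for _, r := range responses {
        for c, ok := range r {
            if ok {
                votes[c]++
            }
        }
    }
    var confirmed []string
    for _, c := range candidates {
        if float64(votes[c])/float64(len(responses)) > 0.5 {
            confirmed = append(confirmed, c)
        }
    }
    return confirmed
}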
|
||||
|
||||
@@ -1,49 +0,0 @@
|
||||
sequenceDiagram
|
||||
title NATS — CREATE_RESOURCE : Pair A découvre Pair B et établit le stream
|
||||
|
||||
participant AppA as App Pair A (oc-api)
|
||||
participant NATSA as NATS A
|
||||
participant NodeA as Node A
|
||||
participant StreamA as StreamService A
|
||||
participant NodeB as Node B
|
||||
participant StreamB as StreamService B
|
||||
participant DBA as DB Pair A (oc-lib)
|
||||
|
||||
Note over AppA: Pair B vient d'être découvert<br/>(via indexeur ou manuel)
|
||||
|
||||
AppA->>NATSA: Publish(CREATE_RESOURCE, {<br/> FromApp:"oc-api",<br/> Datatype:PEER,<br/> Payload: Peer B {StreamAddress_B, Relation:PARTNER}<br/>})
|
||||
|
||||
NATSA->>NodeA: ListenNATS callback → CREATE_RESOURCE
|
||||
|
||||
NodeA->>NodeA: resp.FromApp == "oc-discovery" ? → Non, continuer
|
||||
NodeA->>NodeA: json.Unmarshal(payload) → peer.Peer B
|
||||
NodeA->>NodeA: pp.AddrInfoFromString(B.StreamAddress)
|
||||
Note over NodeA: ad_B = {ID: PeerID_B, Addrs: [...]}
|
||||
|
||||
NodeA->>StreamA: Mu.Lock()
|
||||
|
||||
alt peer B.Relation == PARTNER
|
||||
NodeA->>StreamA: ConnectToPartner(B.StreamAddress)
|
||||
StreamA->>StreamA: AddrInfoFromString(B.StreamAddress) → ad_B
|
||||
StreamA->>NodeB: Connect (libp2p)
|
||||
StreamA->>NodeB: NewStream /opencloud/resource/heartbeat/partner/1.0
|
||||
StreamA->>NodeB: json.Encode(Heartbeat{Name_A, DID_A, PeerID_A})
|
||||
|
||||
NodeB->>StreamB: HandlePartnerHeartbeat(stream)
|
||||
StreamB->>StreamB: CheckHeartbeat → bandwidth challenge
|
||||
StreamB->>StreamA: Echo(payload)
|
||||
StreamB->>StreamB: streams[ProtocolHeartbeatPartner][PeerID_A] = {DID_A, Expiry=now+10s}
|
||||
|
||||
StreamA->>StreamA: streams[ProtocolHeartbeatPartner][PeerID_B] = {DID_B, Expiry=now+10s}
|
||||
Note over StreamA,StreamB: Stream partner long-lived établi<br/>dans les deux sens
|
||||
|
||||
else peer B.Relation != PARTNER (révocation / blacklist)
|
||||
Note over NodeA: Supprimer tous les streams vers Pair B
|
||||
loop Pour chaque protocole dans Streams
|
||||
NodeA->>StreamA: streams[proto][PeerID_B].Stream.Close()
|
||||
NodeA->>StreamA: delete(streams[proto], PeerID_B)
|
||||
end
|
||||
end
|
||||
|
||||
NodeA->>StreamA: Mu.Unlock()
|
||||
NodeA->>DBA: (pas de write direct ici — géré par l'app source)
|
||||
@@ -1,50 +0,0 @@
|
||||
@startuml
|
||||
title NATS — CREATE_RESOURCE : Pair A découvre Pair B et établit le stream
|
||||
|
||||
participant "App Pair A (oc-api)" as AppA
|
||||
participant "NATS A" as NATSA
|
||||
participant "Node A" as NodeA
|
||||
participant "StreamService A" as StreamA
|
||||
participant "Node B" as NodeB
|
||||
participant "StreamService B" as StreamB
|
||||
participant "DB Pair A (oc-lib)" as DBA
|
||||
|
||||
note over AppA: Pair B vient d'être découvert\n(via indexeur ou manuel)
|
||||
|
||||
AppA -> NATSA: Publish(CREATE_RESOURCE, {\n FromApp:"oc-api",\n Datatype:PEER,\n Payload: Peer B {StreamAddress_B, Relation:PARTNER}\n})
|
||||
|
||||
NATSA -> NodeA: ListenNATS callback → CREATE_RESOURCE
|
||||
|
||||
NodeA -> NodeA: resp.FromApp == "oc-discovery" ? → Non, continuer
|
||||
NodeA -> NodeA: json.Unmarshal(payload) → peer.Peer B
|
||||
NodeA -> NodeA: pp.AddrInfoFromString(B.StreamAddress)
|
||||
note over NodeA: ad_B = {ID: PeerID_B, Addrs: [...]}
|
||||
|
||||
NodeA -> StreamA: Mu.Lock()
|
||||
|
||||
alt peer B.Relation == PARTNER
|
||||
NodeA -> StreamA: ConnectToPartner(B.StreamAddress)
|
||||
StreamA -> StreamA: AddrInfoFromString(B.StreamAddress) → ad_B
|
||||
StreamA -> NodeB: Connect (libp2p)
|
||||
StreamA -> NodeB: NewStream /opencloud/resource/heartbeat/partner/1.0
|
||||
StreamA -> NodeB: json.Encode(Heartbeat{Name_A, DID_A, PeerID_A})
|
||||
|
||||
NodeB -> StreamB: HandlePartnerHeartbeat(stream)
|
||||
StreamB -> StreamB: CheckHeartbeat → bandwidth challenge
|
||||
StreamB -> StreamA: Echo(payload)
|
||||
StreamB -> StreamB: streams[ProtocolHeartbeatPartner][PeerID_A] = {DID_A, Expiry=now+10s}
|
||||
|
||||
StreamA -> StreamA: streams[ProtocolHeartbeatPartner][PeerID_B] = {DID_B, Expiry=now+10s}
|
||||
note over StreamA,StreamB: Stream partner long-lived établi\ndans les deux sens
|
||||
else peer B.Relation != PARTNER (révocation / blacklist)
|
||||
note over NodeA: Supprimer tous les streams vers Pair B
|
||||
loop Pour chaque protocole dans Streams
|
||||
NodeA -> StreamA: streams[proto][PeerID_B].Stream.Close()
|
||||
NodeA -> StreamA: delete(streams[proto], PeerID_B)
|
||||
end
|
||||
end
|
||||
|
||||
NodeA -> StreamA: Mu.Unlock()
|
||||
NodeA -> DBA: (pas de write direct ici — géré par l'app source)
|
||||
|
||||
@enduml
|
||||
38
docs/diagrams/08_nats_create_update_peer.puml
Normal file
@@ -0,0 +1,38 @@
|
||||
@startuml
|
||||
title NATS — CREATE_RESOURCE : Peer A creates/updates Peer B (on-demand connection)
|
||||
|
||||
participant "App Peer A (oc-api)" as AppA
|
||||
participant "NATS A" as NATSA
|
||||
participant "Node A" as NodeA
|
||||
participant "StreamService A" as StreamA
|
||||
participant "Node B" as NodeB
|
||||
participant "DB Peer A (oc-lib)" as DBA
|
||||
|
||||
note over AppA: Peer B has been discovered\n(via an indexer or manually)
|
||||
|
||||
AppA -> NATSA: Publish(CREATE_RESOURCE, {\n FromApp:"oc-api",\n Datatype:PEER,\n Payload: Peer B {StreamAddress_B, Relation:PARTNER}\n})
|
||||
|
||||
NATSA -> NodeA: ListenNATS callback → CREATE_RESOURCE
|
||||
|
||||
NodeA -> NodeA: json.Unmarshal(payload) → peer.Peer B
|
||||
NodeA -> NodeA: if peer == self ? → skip
|
||||
|
||||
alt peer B.Relation == PARTNER
|
||||
NodeA -> StreamA: ToPartnerPublishEvent(ctx, PB_CREATE, PEER, payload)
|
||||
note over StreamA: No permanent heartbeat.\nOn-demand connection: open a stream,\nsend the event, close or let it expire.
|
||||
StreamA -> StreamA: PublishCommon(PEER, user, B.PeerID,\n ProtocolUpdateResource, selfPeerJSON)
|
||||
StreamA -> NodeB: TempStream /opencloud/resource/update/1.0\n(short TTL, closed after sending)
|
||||
StreamA -> NodeB: stream.Encode(Event{from, datatype, payload})
|
||||
NodeB --> StreamA: (application-level processing)
|
||||
|
||||
else peer B.Relation != PARTNER (revocation / blacklist)
|
||||
note over NodeA: Close every existing stream to Peer B
|
||||
loop For every active stream to PeerID_B
|
||||
NodeA -> StreamA: streams[proto][PeerID_B].Stream.Close()
|
||||
NodeA -> StreamA: delete(streams[proto], PeerID_B)
|
||||
end
|
||||
end
|
||||
|
||||
NodeA -> DBA: (no direct write here: only the source app manages the DB)
|
||||
|
||||
@enduml
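The on-demand pattern in the note above (no permanent heartbeat: open a stream, send one event, close) maps to a few lines of go-libp2p. The sketch below assumes the /opencloud/resource/update/1.0 protocol ID from the diagram and a generic event value; timeouts and error handling are simplified.

package stream

import (
    "context"
    "encoding/json"
    "time"

    "github.com/libp2p/go-libp2p/core/host"
    "github.com/libp2p/go-libp2p/core/peer"
    "github.com/libp2p/go-libp2p/core/protocol"
)

// sendOnDemand opens a short-lived stream to a partner, encodes one event
// and closes immediately, matching the on-demand connection in the diagram.
func sendOnDemand(ctx context.Context, h host.Host, to peer.ID, evt any) error {
    ctx, cancel := context.WithTimeout(ctx, 10*time.Second)
    defer cancel()

    s, err := h.NewStream(ctx, to, protocol.ID("/opencloud/resource/update/1.0"))
    if err != nil {
        return err
    }
    defer s.Close()
    return json.NewEncoder(s).Encode(evt)
}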
|
||||
@@ -1,66 +0,0 @@
|
||||
sequenceDiagram
|
||||
title NATS — PROPALGATION_EVENT : Pair A propage vers Pair B
|
||||
|
||||
participant AppA as App Pair A
|
||||
participant NATSA as NATS A
|
||||
participant NodeA as Node A
|
||||
participant StreamA as StreamService A
|
||||
participant NodeB as Node B
|
||||
participant NATSB as NATS B
|
||||
participant DBB as DB Pair B (oc-lib)
|
||||
|
||||
AppA->>NATSA: Publish(PROPALGATION_EVENT, {Action, DataType, Payload})
|
||||
NATSA->>NodeA: ListenNATS callback → PROPALGATION_EVENT
|
||||
NodeA->>NodeA: resp.FromApp != "oc-discovery" ? → continuer
|
||||
NodeA->>NodeA: json.Unmarshal → PropalgationMessage{Action, DataType, Payload}
|
||||
|
||||
alt Action == PB_DELETE
|
||||
NodeA->>StreamA: ToPartnerPublishEvent(PB_DELETE, dt, user, payload)
|
||||
StreamA->>StreamA: searchPeer(PARTNER) → [Pair B, ...]
|
||||
StreamA->>NodeB: write(PeerID_B, addr_B, dt, user, payload, ProtocolDeleteResource)
|
||||
Note over NodeB: /opencloud/resource/delete/1.0
|
||||
|
||||
NodeB->>NodeB: handleEventFromPartner(evt, ProtocolDeleteResource)
|
||||
NodeB->>NATSB: SetNATSPub(REMOVE_RESOURCE, {DataType, resource JSON})
|
||||
NATSB->>DBB: Supprimer ressource dans DB B
|
||||
|
||||
else Action == PB_UPDATE (via ProtocolUpdateResource)
|
||||
NodeA->>StreamA: ToPartnerPublishEvent(PB_UPDATE, dt, user, payload)
|
||||
StreamA->>NodeB: write → /opencloud/resource/update/1.0
|
||||
NodeB->>NATSB: SetNATSPub(CREATE_RESOURCE, {DataType, resource JSON})
|
||||
NATSB->>DBB: Upsert ressource dans DB B
|
||||
|
||||
else Action == PB_CONSIDERS + WORKFLOW_EXECUTION
|
||||
NodeA->>NodeA: Unmarshal → executionConsidersPayload{PeerIDs:[PeerID_B, ...]}
|
||||
loop Pour chaque peer_id cible
|
||||
NodeA->>StreamA: PublishCommon(dt, user, PeerID_B, ProtocolConsidersResource, payload)
|
||||
StreamA->>NodeB: write → /opencloud/resource/considers/1.0
|
||||
NodeB->>NodeB: passConsidering(evt)
|
||||
NodeB->>NATSB: SetNATSPub(PROPALGATION_EVENT, {PB_CONSIDERS, dt, payload})
|
||||
NATSB->>DBB: (traité par oc-workflow sur NATS B)
|
||||
end
|
||||
|
||||
else Action == PB_PLANNER (broadcast)
|
||||
NodeA->>NodeA: Unmarshal → {peer_id: nil, ...payload}
|
||||
loop Pour chaque stream ProtocolSendPlanner ouvert
|
||||
NodeA->>StreamA: PublishCommon(nil, user, pid, ProtocolSendPlanner, payload)
|
||||
StreamA->>NodeB: write → /opencloud/resource/planner/1.0
|
||||
end
|
||||
|
||||
else Action == PB_CLOSE_PLANNER
|
||||
NodeA->>NodeA: Unmarshal → {peer_id: PeerID_B}
|
||||
NodeA->>StreamA: Streams[ProtocolSendPlanner][PeerID_B].Stream.Close()
|
||||
NodeA->>StreamA: delete(Streams[ProtocolSendPlanner], PeerID_B)
|
||||
|
||||
else Action == PB_SEARCH + DataType == PEER
|
||||
NodeA->>NodeA: Unmarshal → {search: "..."}
|
||||
NodeA->>NodeA: GetPeerRecord(ctx, search)
|
||||
Note over NodeA: Résolution via DB A + Indexer + DHT
|
||||
NodeA->>NATSA: SetNATSPub(SEARCH_EVENT, {PEER, PeerRecord JSON})
|
||||
NATSA->>NATSA: (AppA reçoit le résultat)
|
||||
|
||||
else Action == PB_SEARCH + autre DataType
|
||||
NodeA->>NodeA: Unmarshal → {type:"all"|"known"|"partner", search:"..."}
|
||||
NodeA->>NodeA: PubSubService.SearchPublishEvent(ctx, dt, type, user, search)
|
||||
Note over NodeA: Voir diagrammes 10 et 11
|
||||
end
|
||||
@@ -1,50 +1,55 @@
|
||||
@startuml
|
||||
title NATS — PROPALGATION_EVENT : Pair A propage vers Pair B
|
||||
title NATS — PROPALGATION_EVENT : Peer A propagates to Peer B
|
||||
|
||||
participant "App Pair A" as AppA
|
||||
participant "NATS A" as NATSA
|
||||
participant "Node A" as NodeA
|
||||
participant "StreamService A" as StreamA
|
||||
participant "Node B" as NodeB
|
||||
participant "Node Partner B" as PeerB
|
||||
participant "Node C" as PeerC
|
||||
|
||||
participant "NATS B" as NATSB
|
||||
participant "DB Pair B (oc-lib)" as DBB
|
||||
|
||||
note over AppA: only resources we own (DB data with creator_id == self) can be propagated
|
||||
|
||||
AppA -> NATSA: Publish(PROPALGATION_EVENT, {Action, DataType, Payload})
|
||||
NATSA -> NodeA: ListenNATS callback → PROPALGATION_EVENT
|
||||
NodeA -> NodeA: resp.FromApp != "oc-discovery" ? → continuer
|
||||
NodeA -> NodeA: propagation from itself? → no, continue
|
||||
NodeA -> NodeA: json.Unmarshal → PropalgationMessage{Action, DataType, Payload}
|
||||
|
||||
alt Action == PB_DELETE
|
||||
NodeA -> StreamA: ToPartnerPublishEvent(PB_DELETE, dt, user, payload)
|
||||
StreamA -> StreamA: searchPeer(PARTNER) → [Pair B, ...]
|
||||
StreamA -> StreamA: searchPeer(PARTNER) → [Peer Partner B, ...]
|
||||
StreamA -> NodeB: write(PeerID_B, addr_B, dt, user, payload, ProtocolDeleteResource)
|
||||
note over NodeB: /opencloud/resource/delete/1.0
|
||||
|
||||
NodeB -> NodeB: handleEventFromPartner(evt, ProtocolDeleteResource)
|
||||
NodeB -> NATSB: SetNATSPub(REMOVE_RESOURCE, {DataType, resource JSON})
|
||||
NATSB -> DBB: Supprimer ressource dans DB B
|
||||
NATSB -> DBB: Delete resource in DB B
|
||||
|
||||
else Action == PB_UPDATE (via ProtocolUpdateResource)
|
||||
else Action == PB_UPDATE (per ProtocolUpdateResource)
|
||||
NodeA -> StreamA: ToPartnerPublishEvent(PB_UPDATE, dt, user, payload)
|
||||
StreamA -> StreamA: searchPeer(PARTNER) → [Peer Partner B, ...]
|
||||
StreamA -> NodeB: write → /opencloud/resource/update/1.0
|
||||
NodeB -> NATSB: SetNATSPub(CREATE_RESOURCE, {DataType, resource JSON})
|
||||
NATSB -> DBB: Upsert ressource dans DB B
|
||||
|
||||
else Action == PB_CONSIDERS + WORKFLOW_EXECUTION
|
||||
else Action == PB_CREATE (per ProtocolCreateResource)
|
||||
NodeA -> StreamA: ToPartnerPublishEvent(PB_CREATE, dt, user, payload)
|
||||
StreamA -> StreamA: searchPeer(PARTNER) → [Peer Partner B, ...]
|
||||
StreamA -> NodeB: write → /opencloud/resource/create/1.0
|
||||
NodeB -> NATSB: SetNATSPub(CREATE_RESOURCE, {DataType, resource JSON})
|
||||
NATSB -> DBB: Create resource in DB B
|
||||
|
||||
else Action == PB_CONSIDERS (acknowledges a previous action, such as planning or creating a resource)
|
||||
NodeA -> NodeA: Unmarshal → executionConsidersPayload{PeerIDs:[PeerID_B, ...]}
|
||||
loop Pour chaque peer_id cible
|
||||
loop For every peer_id targeted
|
||||
NodeA -> StreamA: PublishCommon(dt, user, PeerID_B, ProtocolConsidersResource, payload)
|
||||
StreamA -> NodeB: write → /opencloud/resource/considers/1.0
|
||||
NodeB -> NodeB: passConsidering(evt)
|
||||
NodeB -> NATSB: SetNATSPub(PROPALGATION_EVENT, {PB_CONSIDERS, dt, payload})
|
||||
NATSB -> DBB: (traité par oc-workflow sur NATS B)
|
||||
end
|
||||
|
||||
else Action == PB_PLANNER (broadcast)
|
||||
NodeA -> NodeA: Unmarshal → {peer_id: nil, ...payload}
|
||||
loop Pour chaque stream ProtocolSendPlanner ouvert
|
||||
NodeA -> StreamA: PublishCommon(nil, user, pid, ProtocolSendPlanner, payload)
|
||||
StreamA -> NodeB: write → /opencloud/resource/planner/1.0
|
||||
NATSB -> DBB: (handled on NATS B by the app that emitted the previous action)
|
||||
end
|
||||
|
||||
else Action == PB_CLOSE_PLANNER
|
||||
@@ -53,16 +58,16 @@ else Action == PB_CLOSE_PLANNER
|
||||
NodeA -> StreamA: delete(Streams[ProtocolSendPlanner], PeerID_B)
|
||||
|
||||
else Action == PB_SEARCH + DataType == PEER
|
||||
NodeA -> NodeA: Unmarshal → {search: "..."}
|
||||
NodeA -> NodeA: read → {search: "..."}
|
||||
NodeA -> NodeA: GetPeerRecord(ctx, search)
|
||||
note over NodeA: Résolution via DB A + Indexer + DHT
|
||||
note over NodeA: Resolved via DB A, Indexer, or DHT
|
||||
NodeA -> NATSA: SetNATSPub(SEARCH_EVENT, {PEER, PeerRecord JSON})
|
||||
NATSA -> NATSA: (AppA reçoit le résultat)
|
||||
NATSA -> NATSA: (AppA retrieves the result)
|
||||
|
||||
else Action == PB_SEARCH + autre DataType
|
||||
NodeA -> NodeA: Unmarshal → {type:"all"|"known"|"partner", search:"..."}
|
||||
else Action == PB_SEARCH + other DataType
|
||||
NodeA -> NodeA: read → {type:"all"|"known"|"partner", search:"..."}
|
||||
NodeA -> NodeA: PubSubService.SearchPublishEvent(ctx, dt, type, user, search)
|
||||
note over NodeA: Voir diagrammes 10 et 11
|
||||
note over NodeA: See the pubsub_search and stream_search diagrams
|
||||
end
|
||||
|
||||
@enduml
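The dispatch above boils down to a switch on the Action carried by the NATS message. A minimal Go sketch, assuming the type and constant names shown in the diagram (`PropalgationMessage`, `PB_DELETE`/`PB_UPDATE`/`PB_CREATE`) and a hypothetical `PartnerPublisher` interface standing in for the StreamService:

```go
package main

import (
	"context"
	"encoding/json"
	"fmt"
)

// Protocol IDs as named in the diagrams.
const (
	ProtocolCreateResource = "/opencloud/resource/create/1.0"
	ProtocolUpdateResource = "/opencloud/resource/update/1.0"
	ProtocolDeleteResource = "/opencloud/resource/delete/1.0"
)

// PartnerPublisher abstracts the StreamService call shown in the diagram.
type PartnerPublisher interface {
	ToPartnerPublishEvent(ctx context.Context, proto string, dataType int, payload json.RawMessage) error
}

// PropalgationMessage mirrors the JSON body published on PROPALGATION_EVENT.
type PropalgationMessage struct {
	Action   string          `json:"action"`
	DataType int             `json:"data_type"`
	Payload  json.RawMessage `json:"payload"`
}

// onPropalgationEvent dispatches a NATS PROPALGATION_EVENT on its Action:
// delete/update/create are forwarded to partner streams, as in the diagram.
func onPropalgationEvent(ctx context.Context, sp PartnerPublisher, fromApp string, raw []byte) error {
	if fromApp == "oc-discovery" {
		return nil // never re-propagate events we emitted ourselves
	}
	var msg PropalgationMessage
	if err := json.Unmarshal(raw, &msg); err != nil {
		return fmt.Errorf("decode propalgation message: %w", err)
	}
	switch msg.Action {
	case "PB_DELETE":
		return sp.ToPartnerPublishEvent(ctx, ProtocolDeleteResource, msg.DataType, msg.Payload)
	case "PB_UPDATE":
		return sp.ToPartnerPublishEvent(ctx, ProtocolUpdateResource, msg.DataType, msg.Payload)
	case "PB_CREATE":
		return sp.ToPartnerPublishEvent(ctx, ProtocolCreateResource, msg.DataType, msg.Payload)
	default:
		return fmt.Errorf("unhandled action %q", msg.Action)
	}
}
```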
|
||||
|
||||
@@ -1,52 +0,0 @@
|
||||
sequenceDiagram
|
||||
title PubSub — Recherche gossip globale (type "all") : Pair A cherche, Pair B répond
|
||||
|
||||
participant AppA as App Pair A
|
||||
participant NATSA as NATS A
|
||||
participant NodeA as Node A
|
||||
participant PubSubA as PubSubService A
|
||||
participant GossipSub as GossipSub libp2p (mesh)
|
||||
participant NodeB as Node B
|
||||
participant PubSubB as PubSubService B
|
||||
participant DBB as DB Pair B (oc-lib)
|
||||
participant StreamB as StreamService B
|
||||
participant StreamA as StreamService A
|
||||
|
||||
AppA->>NATSA: Publish(PROPALGATION_EVENT, {PB_SEARCH, type:"all", search:"gpu"})
|
||||
NATSA->>NodeA: ListenNATS → PB_SEARCH (type "all")
|
||||
|
||||
NodeA->>PubSubA: SearchPublishEvent(ctx, dt, "all", user, "gpu")
|
||||
PubSubA->>PubSubA: publishEvent(PB_SEARCH, user, {search:"gpu"})
|
||||
PubSubA->>PubSubA: GenerateNodeID() → from = DID_A
|
||||
PubSubA->>PubSubA: priv_A.Sign(event body) → sig
|
||||
PubSubA->>PubSubA: Build Event{Type:"search", From:DID_A, Payload:{search:"gpu"}, Sig}
|
||||
|
||||
PubSubA->>GossipSub: topic.Join("search")
|
||||
PubSubA->>GossipSub: topic.Publish(ctx, json(Event))
|
||||
|
||||
GossipSub-->>NodeB: Message propagé (gossip mesh)
|
||||
|
||||
NodeB->>PubSubB: subscribeEvents écoute topic "search#"
|
||||
PubSubB->>PubSubB: json.Unmarshal → Event{From: DID_A}
|
||||
|
||||
PubSubB->>NodeB: GetPeerRecord(ctx, DID_A)
|
||||
Note over NodeB: Résolution Pair A via DB B ou Indexer
|
||||
NodeB-->>PubSubB: Peer A {PublicKey_A, Relation, ...}
|
||||
|
||||
PubSubB->>PubSubB: event.Verify(Peer A) → valide sig_A
|
||||
PubSubB->>PubSubB: handleEventSearch(ctx, evt, PB_SEARCH)
|
||||
|
||||
PubSubB->>StreamB: SendResponse(Peer A, evt)
|
||||
StreamB->>DBB: Search(COMPUTE + STORAGE + ..., filters{creator=self, access=PUBLIC OR partnerships[PeerID_A]}, search="gpu")
|
||||
DBB-->>StreamB: [Resource1, Resource2, ...]
|
||||
|
||||
loop Pour chaque ressource matchée
|
||||
StreamB->>StreamB: write(PeerID_A, addr_A, dt, resource JSON, ProtocolSearchResource)
|
||||
StreamB->>StreamA: NewStream /opencloud/resource/search/1.0
|
||||
StreamB->>StreamA: json.Encode(Event{Type:search, From:DID_B, DataType, Payload:resource})
|
||||
end
|
||||
|
||||
StreamA->>StreamA: readLoop → handleEvent(ProtocolSearchResource, evt)
|
||||
StreamA->>StreamA: retrieveResponse(evt)
|
||||
StreamA->>NATSA: SetNATSPub(SEARCH_EVENT, {DataType, resource JSON})
|
||||
NATSA->>AppA: Résultats de recherche de Pair B
|
||||
@@ -1,54 +1,58 @@
|
||||
@startuml
|
||||
title PubSub — Recherche gossip globale (type "all") : Pair A cherche, Pair B répond
|
||||
title PubSub — Global gossip search (type "all") : Peer A searches, Peer B answers
|
||||
|
||||
participant "App Pair A" as AppA
|
||||
participant "App UI A" as UIA
|
||||
participant "App Peer A" as AppA
|
||||
participant "NATS A" as NATSA
|
||||
participant "Node A" as NodeA
|
||||
participant "StreamService A" as StreamA
|
||||
participant "PubSubService A" as PubSubA
|
||||
participant "GossipSub libp2p (mesh)" as GossipSub
|
||||
participant "Node B" as NodeB
|
||||
participant "PubSubService B" as PubSubB
|
||||
participant "DB Pair B (oc-lib)" as DBB
|
||||
participant "DB Peer B (oc-lib)" as DBB
|
||||
participant "StreamService B" as StreamB
|
||||
participant "StreamService A" as StreamA
|
||||
|
||||
AppA -> NATSA: Publish(PROPALGATION_EVENT, {PB_SEARCH, type:"all", search:"gpu"})
|
||||
UIA -> AppA: websocket subscription, sending {type:"all", search:"search"} in query
|
||||
|
||||
AppA -> NATSA: Publish(PROPALGATION_EVENT, {PB_SEARCH, type:"all", search:"search"})
|
||||
NATSA -> NodeA: ListenNATS → PB_SEARCH (type "all")
|
||||
|
||||
NodeA -> PubSubA: SearchPublishEvent(ctx, dt, "all", user, "gpu")
|
||||
PubSubA -> PubSubA: publishEvent(PB_SEARCH, user, {search:"gpu"})
|
||||
PubSubA -> PubSubA: GenerateNodeID() → from = DID_A
|
||||
NodeA -> PubSubA: SearchPublishEvent(ctx, dt, "all", user, "search")
|
||||
PubSubA -> PubSubA: publishEvent(PB_SEARCH, user, {search:"search"})
|
||||
PubSubA -> PubSubA: priv_A.Sign(event body) → sig
|
||||
PubSubA -> PubSubA: Build Event{Type:"search", From:DID_A, Payload:{search:"gpu"}, Sig}
|
||||
PubSubA -> PubSubA: Build Event{Type:"search", From:DID_A, Payload:{search:"search"}, Sig}
|
||||
|
||||
PubSubA -> GossipSub: topic.Join("search")
|
||||
PubSubA -> GossipSub: topic.Publish(ctx, json(Event))
|
||||
|
||||
GossipSub --> NodeB: Message propagé (gossip mesh)
|
||||
GossipSub --> NodeB: Propagated message (gossip mesh)
|
||||
|
||||
NodeB -> PubSubB: subscribeEvents écoute topic "search#"
|
||||
PubSubB -> PubSubB: json.Unmarshal → Event{From: DID_A}
|
||||
NodeB -> PubSubB: subscribeEvents listens to topic "search#"
|
||||
PubSubB -> PubSubB: read → Event{From: DID_A}
|
||||
|
||||
PubSubB -> NodeB: GetPeerRecord(ctx, DID_A)
|
||||
note over NodeB: Résolution Pair A via DB B ou Indexer
|
||||
note over NodeB: Resolve Peer A via DB B or ask the Indexer
|
||||
NodeB --> PubSubB: Peer A {PublicKey_A, Relation, ...}
|
||||
|
||||
PubSubB -> PubSubB: event.Verify(Peer A) → valide sig_A
|
||||
PubSubB -> PubSubB: event.Verify(Peer A) → validates sig_A
|
||||
PubSubB -> PubSubB: handleEventSearch(ctx, evt, PB_SEARCH)
|
||||
|
||||
PubSubB -> StreamB: SendResponse(Peer A, evt)
|
||||
StreamB -> DBB: Search(COMPUTE + STORAGE + ..., filters{creator=self, access=PUBLIC OR partnerships[PeerID_A]}, search="gpu")
|
||||
StreamB -> DBB: Search(COMPUTE + STORAGE + ..., filters{creator=self, access=PUBLIC OR partnerships[PeerID_A]}, search="search")
|
||||
DBB --> StreamB: [Resource1, Resource2, ...]
|
||||
|
||||
loop Pour chaque ressource matchée
|
||||
loop For each matching resource (only resources we own: creator_id=self_did)
|
||||
StreamB -> StreamB: write(PeerID_A, addr_A, dt, resource JSON, ProtocolSearchResource)
|
||||
StreamB -> StreamA: NewStream /opencloud/resource/search/1.0
|
||||
StreamB -> StreamA: json.Encode(Event{Type:search, From:DID_B, DataType, Payload:resource})
|
||||
StreamB -> StreamA: stream.Encode(Event{Type:search, From:DID_B, DataType, Payload:resource})
|
||||
end
|
||||
|
||||
StreamA -> StreamA: readLoop → handleEvent(ProtocolSearchResource, evt)
|
||||
StreamA -> StreamA: retrieveResponse(evt)
|
||||
StreamA -> NATSA: SetNATSPub(SEARCH_EVENT, {DataType, resource JSON})
|
||||
NATSA -> AppA: Résultats de recherche de Pair B
|
||||
NATSA -> AppA: Search results from Peer B
|
||||
|
||||
AppA -> UIA: emit on websocket
|
||||
|
||||
@enduml
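The publishing side of this diagram (sign the event, then gossip it on the "search" topic) can be sketched in Go against the go-libp2p-pubsub API; the `Event` struct and the DID wiring below are assumptions taken from the diagram, not the project's actual types:

```go
package main

import (
	"context"
	"encoding/json"

	pubsub "github.com/libp2p/go-libp2p-pubsub"
	"github.com/libp2p/go-libp2p/core/crypto"
)

// Event mirrors the gossip message shown in the diagram (field names assumed).
type Event struct {
	Type    string          `json:"type"`
	From    string          `json:"from"` // DID of the emitter
	Payload json.RawMessage `json:"payload"`
	Sig     []byte          `json:"sig"`
}

// publishSearch signs a {search: "..."} payload with the local private key and
// publishes it on the "search" topic, as PubSubService A does in the diagram.
func publishSearch(ctx context.Context, ps *pubsub.PubSub, priv crypto.PrivKey, did, search string) error {
	payload, err := json.Marshal(map[string]string{"search": search})
	if err != nil {
		return err
	}
	evt := Event{Type: "search", From: did, Payload: payload}

	body, err := json.Marshal(evt) // sign the event body (Sig still empty here)
	if err != nil {
		return err
	}
	sig, err := priv.Sign(body)
	if err != nil {
		return err
	}
	evt.Sig = sig

	topic, err := ps.Join("search")
	if err != nil {
		return err
	}
	data, err := json.Marshal(evt)
	if err != nil {
		return err
	}
	return topic.Publish(ctx, data)
}
```

In a long-running service the `*pubsub.Topic` handle returned by `Join` would be kept and reused, since joining the same topic twice returns an error.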
|
||||
|
||||
@@ -1,52 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Stream — Recherche directe (type "known"/"partner") : Pair A → Pair B
|
||||
|
||||
participant AppA as App Pair A
|
||||
participant NATSA as NATS A
|
||||
participant NodeA as Node A
|
||||
participant PubSubA as PubSubService A
|
||||
participant StreamA as StreamService A
|
||||
participant DBA as DB Pair A (oc-lib)
|
||||
participant NodeB as Node B
|
||||
participant StreamB as StreamService B
|
||||
participant DBB as DB Pair B (oc-lib)
|
||||
|
||||
AppA->>NATSA: Publish(PROPALGATION_EVENT, {PB_SEARCH, type:"partner", search:"gpu"})
|
||||
NATSA->>NodeA: ListenNATS → PB_SEARCH (type "partner")
|
||||
NodeA->>PubSubA: SearchPublishEvent(ctx, dt, "partner", user, "gpu")
|
||||
|
||||
PubSubA->>StreamA: SearchPartnersPublishEvent(dt, user, "gpu")
|
||||
StreamA->>DBA: Search(PEER, PARTNER) + PeerIDS config
|
||||
DBA-->>StreamA: [Peer B, ...]
|
||||
|
||||
loop Pour chaque pair partenaire (Pair B)
|
||||
StreamA->>StreamA: json.Marshal({search:"gpu"}) → payload
|
||||
StreamA->>StreamA: write(PeerID_B, addr_B, dt, user, payload, ProtocolSearchResource)
|
||||
StreamA->>NodeB: TempStream /opencloud/resource/search/1.0
|
||||
StreamA->>NodeB: json.Encode(Event{Type:search, From:DID_A, DataType, Payload:{search:"gpu"}})
|
||||
|
||||
NodeB->>StreamB: HandleResponse(stream) → readLoop
|
||||
StreamB->>StreamB: handleEvent(ProtocolSearchResource, evt)
|
||||
StreamB->>StreamB: handleEventFromPartner(evt, ProtocolSearchResource)
|
||||
|
||||
alt evt.DataType == -1 (toutes ressources)
|
||||
StreamB->>DBA: Search(PEER, evt.From=DID_A)
|
||||
Note over StreamB: Résolution locale ou via GetPeerRecord
|
||||
StreamB->>StreamB: SendResponse(Peer A, evt)
|
||||
StreamB->>DBB: Search(ALL_RESOURCES, filter{creator=B + public OR partner A + search:"gpu"})
|
||||
DBB-->>StreamB: [Resource1, Resource2, ...]
|
||||
else evt.DataType spécifié
|
||||
StreamB->>DBB: Search(DataType, filter{creator=B + access + search:"gpu"})
|
||||
DBB-->>StreamB: [Resource1, ...]
|
||||
end
|
||||
|
||||
loop Pour chaque ressource
|
||||
StreamB->>StreamA: write(PeerID_A, addr_A, dt, resource JSON, ProtocolSearchResource)
|
||||
StreamA->>StreamA: readLoop → handleEvent(ProtocolSearchResource, evt)
|
||||
StreamA->>StreamA: retrieveResponse(evt)
|
||||
StreamA->>NATSA: SetNATSPub(SEARCH_EVENT, {DataType, resource JSON})
|
||||
NATSA->>AppA: Résultat de Pair B
|
||||
end
|
||||
end
|
||||
|
||||
Note over NATSA,DBA: Optionnel: App A persiste<br/>les ressources découvertes dans DB A
|
||||
@@ -1,6 +1,7 @@
|
||||
@startuml
|
||||
title Stream — Recherche directe (type "known"/"partner") : Pair A → Pair B
|
||||
title Stream — Direct search (type "known"/"partner") : Peer A → Peer B
|
||||
|
||||
participant "App UI A" as UIA
|
||||
participant "App Pair A" as AppA
|
||||
participant "NATS A" as NATSA
|
||||
participant "Node A" as NodeA
|
||||
@@ -11,6 +12,8 @@ participant "Node B" as NodeB
|
||||
participant "StreamService B" as StreamB
|
||||
participant "DB Pair B (oc-lib)" as DBB
|
||||
|
||||
UIA -> AppA: websocket subscription, sending {type:"all", search:"search"} in query
|
||||
|
||||
AppA -> NATSA: Publish(PROPALGATION_EVENT, {PB_SEARCH, type:"partner", search:"gpu"})
|
||||
NATSA -> NodeA: ListenNATS → PB_SEARCH (type "partner")
|
||||
NodeA -> PubSubA: SearchPublishEvent(ctx, dt, "partner", user, "gpu")
|
||||
@@ -20,10 +23,9 @@ StreamA -> DBA: Search(PEER, PARTNER) + PeerIDS config
|
||||
DBA --> StreamA: [Peer B, ...]
|
||||
|
||||
loop Pour chaque pair partenaire (Pair B)
|
||||
StreamA -> StreamA: json.Marshal({search:"gpu"}) → payload
|
||||
StreamA -> StreamA: write(PeerID_B, addr_B, dt, user, payload, ProtocolSearchResource)
|
||||
StreamA -> NodeB: TempStream /opencloud/resource/search/1.0
|
||||
StreamA -> NodeB: json.Encode(Event{Type:search, From:DID_A, DataType, Payload:{search:"gpu"}})
|
||||
StreamA -> NodeB: stream.Encode(Event{Type:search, From:DID_A, DataType, Payload:{search:"gpu"}})
|
||||
|
||||
NodeB -> StreamB: HandleResponse(stream) → readLoop
|
||||
StreamB -> StreamB: handleEvent(ProtocolSearchResource, evt)
|
||||
@@ -31,11 +33,11 @@ loop Pour chaque pair partenaire (Pair B)
|
||||
|
||||
alt evt.DataType == -1 (toutes ressources)
|
||||
StreamB -> DBA: Search(PEER, evt.From=DID_A)
|
||||
note over StreamB: Résolution locale ou via GetPeerRecord
|
||||
note over StreamB: Local resolution (DB) or GetPeerRecord (via the Indexer)
|
||||
StreamB -> StreamB: SendResponse(Peer A, evt)
|
||||
StreamB -> DBB: Search(ALL_RESOURCES, filter{creator=B + public OR partner A + search:"gpu"})
|
||||
DBB --> StreamB: [Resource1, Resource2, ...]
|
||||
else evt.DataType spécifié
|
||||
else evt.DataType specified
|
||||
StreamB -> DBB: Search(DataType, filter{creator=B + access + search:"gpu"})
|
||||
DBB --> StreamB: [Resource1, ...]
|
||||
end
|
||||
@@ -45,10 +47,8 @@ loop Pour chaque pair partenaire (Pair B)
|
||||
StreamA -> StreamA: readLoop → handleEvent(ProtocolSearchResource, evt)
|
||||
StreamA -> StreamA: retrieveResponse(evt)
|
||||
StreamA -> NATSA: SetNATSPub(SEARCH_EVENT, {DataType, resource JSON})
|
||||
NATSA -> AppA: Résultat de Pair B
|
||||
NATSA -> AppA: Peer B results
|
||||
AppA -> UIA: emit on websocket
|
||||
end
|
||||
end
|
||||
|
||||
note over NATSA,DBA: Optionnel: App A persiste\nles ressources découvertes dans DB A
|
||||
|
||||
@enduml
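For the direct ("known"/"partner") variant, the requesting side simply opens a short-lived stream per partner and JSON-encodes one Event. A hedged sketch, with the event fields assumed from the diagram:

```go
package main

import (
	"context"
	"encoding/json"

	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/peer"
	"github.com/libp2p/go-libp2p/core/protocol"
)

const ProtocolSearchResource = protocol.ID("/opencloud/resource/search/1.0")

// searchEvent mirrors the Event encoded on the temporary stream in the
// diagram (field names assumed).
type searchEvent struct {
	Type     string `json:"type"`
	From     string `json:"from"`
	DataType int    `json:"data_type"`
	Payload  struct {
		Search string `json:"search"`
	} `json:"payload"`
}

// askPartner opens a short-lived stream to one partner and sends the search
// query; the partner answers later on its own streams, as in the diagram.
func askPartner(ctx context.Context, h host.Host, partner peer.ID, fromDID, search string, dataType int) error {
	s, err := h.NewStream(ctx, partner, ProtocolSearchResource)
	if err != nil {
		return err
	}
	defer s.Close()

	evt := searchEvent{Type: "search", From: fromDID, DataType: dataType}
	evt.Payload.Search = search
	return json.NewEncoder(s).Encode(evt)
}
```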
|
||||
|
||||
@@ -1,58 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Stream — Partner Heartbeat et propagation CRUD Pair A ↔ Pair B
|
||||
|
||||
participant DBA as DB Pair A (oc-lib)
|
||||
participant StreamA as StreamService A
|
||||
participant NodeA as Node A
|
||||
participant NodeB as Node B
|
||||
participant StreamB as StreamService B
|
||||
participant NATSB as NATS B
|
||||
participant DBB as DB Pair B (oc-lib)
|
||||
participant NATSA as NATS A
|
||||
|
||||
Note over StreamA: Démarrage → connectToPartners()
|
||||
|
||||
StreamA->>DBA: Search(PEER, PARTNER) + PeerIDS config
|
||||
DBA-->>StreamA: [Peer B, ...]
|
||||
|
||||
StreamA->>NodeB: Connect (libp2p)
|
||||
StreamA->>NodeB: NewStream /opencloud/resource/heartbeat/partner/1.0
|
||||
StreamA->>NodeB: json.Encode(Heartbeat{Name_A, DID_A, PeerID_A, IndexersBinded_A})
|
||||
|
||||
NodeB->>StreamB: HandlePartnerHeartbeat(stream)
|
||||
StreamB->>StreamB: CheckHeartbeat → bandwidth challenge
|
||||
StreamB->>StreamA: Echo(payload)
|
||||
StreamB->>StreamB: streams[ProtocolHeartbeatPartner][PeerID_A] = {DID_A, Expiry=now+10s}
|
||||
|
||||
StreamA->>StreamA: streams[ProtocolHeartbeatPartner][PeerID_B] = {DID_B, Expiry=now+10s}
|
||||
|
||||
Note over StreamA,StreamB: Stream partner long-lived établi<br/>GC toutes les 8s (StreamService A)<br/>GC toutes les 30s (StreamService B)
|
||||
|
||||
Note over NATSA: Pair A reçoit PROPALGATION_EVENT{PB_DELETE, dt:"storage", payload:res}
|
||||
|
||||
NATSA->>NodeA: ListenNATS → ToPartnerPublishEvent(PB_DELETE, dt, user, payload)
|
||||
NodeA->>StreamA: ToPartnerPublishEvent(ctx, PB_DELETE, dt_storage, user, payload)
|
||||
|
||||
alt dt == PEER (mise à jour relation partenaire)
|
||||
StreamA->>StreamA: json.Unmarshal → peer.Peer B updated
|
||||
alt B.Relation == PARTNER
|
||||
StreamA->>NodeB: ConnectToPartner(B.StreamAddress)
|
||||
Note over StreamA,NodeB: Reconnexion heartbeat si relation upgrade
|
||||
else B.Relation != PARTNER
|
||||
loop Tous les protocoles
|
||||
StreamA->>StreamA: delete(streams[proto][PeerID_B])
|
||||
StreamA->>NodeB: (streams fermés)
|
||||
end
|
||||
end
|
||||
else dt != PEER (ressource ordinaire)
|
||||
StreamA->>DBA: Search(PEER, PARTNER) → [Pair B, ...]
|
||||
loop Pour chaque protocole partner (Create/Update/Delete)
|
||||
StreamA->>NodeB: write(PeerID_B, addr_B, dt, user, payload, ProtocolDeleteResource)
|
||||
Note over NodeB: /opencloud/resource/delete/1.0
|
||||
|
||||
NodeB->>StreamB: HandleResponse → readLoop
|
||||
StreamB->>StreamB: handleEventFromPartner(evt, ProtocolDeleteResource)
|
||||
StreamB->>NATSB: SetNATSPub(REMOVE_RESOURCE, {DataType, resource JSON})
|
||||
NATSB->>DBB: Supprimer ressource dans DB B
|
||||
end
|
||||
end
|
||||
@@ -1,49 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Stream — Session Planner : Pair A demande le plan de Pair B
|
||||
|
||||
participant AppA as App Pair A (oc-booking)
|
||||
participant NATSA as NATS A
|
||||
participant NodeA as Node A
|
||||
participant StreamA as StreamService A
|
||||
participant NodeB as Node B
|
||||
participant StreamB as StreamService B
|
||||
participant DBB as DB Pair B (oc-lib)
|
||||
participant NATSB as NATS B
|
||||
|
||||
%% Ouverture session planner
|
||||
AppA->>NATSA: Publish(PROPALGATION_EVENT, {PB_PLANNER, peer_id:PeerID_B, payload:{}})
|
||||
NATSA->>NodeA: ListenNATS → PB_PLANNER
|
||||
|
||||
NodeA->>NodeA: Unmarshal → {peer_id: PeerID_B, payload: {}}
|
||||
NodeA->>StreamA: PublishCommon(nil, user, PeerID_B, ProtocolSendPlanner, {})
|
||||
Note over StreamA: WaitResponse=true, TTL=24h<br/>Stream long-lived vers Pair B
|
||||
StreamA->>NodeB: TempStream /opencloud/resource/planner/1.0
|
||||
StreamA->>NodeB: json.Encode(Event{Type:planner, From:DID_A, Payload:{}})
|
||||
|
||||
NodeB->>StreamB: HandleResponse → readLoop(ProtocolSendPlanner)
|
||||
StreamB->>StreamB: handleEvent(ProtocolSendPlanner, evt)
|
||||
StreamB->>StreamB: sendPlanner(evt)
|
||||
|
||||
alt evt.Payload vide (requête initiale)
|
||||
StreamB->>DBB: planner.GenerateShallow(AdminRequest)
|
||||
DBB-->>StreamB: plan (shallow booking plan de Pair B)
|
||||
StreamB->>StreamA: PublishCommon(nil, user, DID_A, ProtocolSendPlanner, planJSON)
|
||||
StreamA->>NodeA: json.Encode(Event{plan de B})
|
||||
NodeA->>NATSA: (forwardé à AppA via SEARCH_EVENT ou PLANNER event)
|
||||
NATSA->>AppA: Plan de Pair B
|
||||
else evt.Payload non vide (mise à jour planner)
|
||||
StreamB->>StreamB: m["peer_id"] = evt.From (DID_A)
|
||||
StreamB->>NATSB: SetNATSPub(PROPALGATION_EVENT, {PB_PLANNER, peer_id:DID_A, payload:plan})
|
||||
NATSB->>DBB: (oc-booking traite le plan sur NATS B)
|
||||
end
|
||||
|
||||
%% Fermeture session planner
|
||||
AppA->>NATSA: Publish(PROPALGATION_EVENT, {PB_CLOSE_PLANNER, peer_id:PeerID_B})
|
||||
NATSA->>NodeA: ListenNATS → PB_CLOSE_PLANNER
|
||||
|
||||
NodeA->>NodeA: Unmarshal → {peer_id: PeerID_B}
|
||||
NodeA->>StreamA: Mu.Lock()
|
||||
NodeA->>StreamA: Streams[ProtocolSendPlanner][PeerID_B].Stream.Close()
|
||||
NodeA->>StreamA: delete(Streams[ProtocolSendPlanner], PeerID_B)
|
||||
NodeA->>StreamA: Mu.Unlock()
|
||||
Note over StreamA,NodeB: Stream planner fermé — session terminée
|
||||
@@ -1,59 +0,0 @@
|
||||
sequenceDiagram
|
||||
title Native Indexer — Boucles background (offload, DHT refresh, GC streams)
|
||||
|
||||
participant IndexerA as Indexer A (enregistré)
|
||||
participant IndexerB as Indexer B (enregistré)
|
||||
participant Native as Native Indexer
|
||||
participant DHT as DHT Kademlia
|
||||
participant NodeA as Node A (responsible peer)
|
||||
|
||||
Note over Native: runOffloadLoop — toutes les 30s
|
||||
|
||||
loop Toutes les 30s
|
||||
Native->>Native: len(responsiblePeers) > 0 ?
|
||||
Note over Native: responsiblePeers = peers pour lesquels<br/>le native a fait selfDelegate (aucun indexer dispo)
|
||||
alt Des responsible peers existent (ex: Node A)
|
||||
Native->>Native: reachableLiveIndexers()
|
||||
Note over Native: Filtre liveIndexers par TTL<br/>ping PeerIsAlive pour chaque candidat
|
||||
alt Indexers A et B maintenant joignables
|
||||
Native->>Native: responsiblePeers = {} (libère Node A et autres)
|
||||
Note over Native: Node A se reconnectera<br/>au prochain ConnectToNatives
|
||||
else Toujours aucun indexer
|
||||
Note over Native: Node A reste sous la responsabilité du native
|
||||
end
|
||||
end
|
||||
end
|
||||
|
||||
Note over Native: refreshIndexersFromDHT — toutes les 30s
|
||||
|
||||
loop Toutes les 30s
|
||||
Native->>Native: Collecter tous les knownPeerIDs<br/>= {PeerID_A, PeerID_B, ...}
|
||||
loop Pour chaque PeerID connu
|
||||
Native->>Native: liveIndexers[PeerID] encore frais ?
|
||||
alt Entrée manquante ou expirée
|
||||
Native->>DHT: SearchValue(ctx 5s, "/indexer/"+PeerID)
|
||||
DHT-->>Native: channel de bytes
|
||||
loop Pour chaque résultat DHT
|
||||
Native->>Native: Unmarshal → liveIndexerEntry
|
||||
Native->>Native: Garder le meilleur (ExpiresAt le plus récent, valide)
|
||||
end
|
||||
Native->>Native: liveIndexers[PeerID] = best entry
|
||||
Note over Native: "native: refreshed indexer from DHT"
|
||||
end
|
||||
end
|
||||
end
|
||||
|
||||
Note over Native: LongLivedStreamRecordedService GC — toutes les 30s
|
||||
|
||||
loop Toutes les 30s
|
||||
Native->>Native: gc() — lock StreamRecords[Heartbeat]
|
||||
loop Pour chaque StreamRecord (Indexer A, B, ...)
|
||||
Native->>Native: now > rec.Expiry ?<br/>OU timeSince(LastSeen) > 2×TTL restant ?
|
||||
alt Pair périmé (ex: Indexer B disparu)
|
||||
Native->>Native: Supprimer Indexer B de TOUS les maps de protocoles
|
||||
Note over Native: Stream heartbeat fermé<br/>liveIndexers[PeerID_B] expirera naturellement
|
||||
end
|
||||
end
|
||||
end
|
||||
|
||||
Note over IndexerA: Indexer A continue à heartbeater normalement<br/>et reste dans StreamRecords + liveIndexers
|
||||
49
docs/diagrams/15_archi_config_nominale.puml
Normal file
49
docs/diagrams/15_archi_config_nominale.puml
Normal file
@@ -0,0 +1,49 @@
|
||||
@startuml 15_archi_config_nominale
|
||||
skinparam componentStyle rectangle
|
||||
skinparam backgroundColor white
|
||||
skinparam defaultTextAlignment center
|
||||
|
||||
title C1 — Topologie nominale\n2 natifs · 2 indexeurs · 2 nœuds
|
||||
|
||||
package "Couche 1 — Mesh natif" #E8F4FD {
|
||||
component "Native A\n(hub autoritaire)" as NA #AED6F1
|
||||
component "Native B\n(hub autoritaire)" as NB #AED6F1
|
||||
NA <--> NB : heartbeat /opencloud/heartbeat/1.0 (20s)\n+ gossip PubSub oc-indexer-registry
|
||||
}
|
||||
|
||||
package "Couche 2 — Indexeurs" #E9F7EF {
|
||||
component "Indexer A\n(DHT server)" as IA #A9DFBF
|
||||
component "Indexer B\n(DHT server)" as IB #A9DFBF
|
||||
}
|
||||
|
||||
package "Couche 3 — Nœuds" #FEFBD8 {
|
||||
component "Node 1" as N1 #FAF0BE
|
||||
component "Node 2" as N2 #FAF0BE
|
||||
}
|
||||
|
||||
' Enregistrements (one-shot, 60s)
|
||||
IA -[#117A65]--> NA : subscribe signé (60s)\n/opencloud/native/subscribe/1.0
|
||||
IA -[#117A65]--> NB : subscribe signé (60s)
|
||||
IB -[#117A65]--> NA : subscribe signé (60s)
|
||||
IB -[#117A65]--> NB : subscribe signé (60s)
|
||||
|
||||
' Heartbeats indexeurs → natifs (long-lived, 20s)
|
||||
IA -[#27AE60]..> NA : heartbeat (20s)
|
||||
IA -[#27AE60]..> NB : heartbeat (20s)
|
||||
IB -[#27AE60]..> NA : heartbeat (20s)
|
||||
IB -[#27AE60]..> NB : heartbeat (20s)
|
||||
|
||||
' Heartbeats nœuds → indexeurs (long-lived, 20s)
|
||||
N1 -[#E67E22]--> IA : heartbeat long-lived (20s)\n/opencloud/heartbeat/1.0
|
||||
N1 -[#E67E22]--> IB : heartbeat long-lived (20s)
|
||||
N2 -[#E67E22]--> IA : heartbeat long-lived (20s)
|
||||
N2 -[#E67E22]--> IB : heartbeat long-lived (20s)
|
||||
|
||||
note as Legend
|
||||
Légende :
|
||||
──► enregistrement one-shot (signé)
|
||||
···► heartbeat long-lived (20s)
|
||||
──► heartbeat nœud → indexeur (20s)
|
||||
end note
|
||||
|
||||
@enduml
|
||||
38
docs/diagrams/16_archi_config_seed.puml
Normal file
38
docs/diagrams/16_archi_config_seed.puml
Normal file
@@ -0,0 +1,38 @@
|
||||
@startuml 16_archi_config_seed
|
||||
skinparam componentStyle rectangle
|
||||
skinparam backgroundColor white
|
||||
skinparam defaultTextAlignment center
|
||||
|
||||
title C2 — Mode seed (sans natif)\nIndexerAddresses seuls · AdmittedAt = zero
|
||||
|
||||
package "Couche 2 — Indexeurs seeds" #E9F7EF {
|
||||
component "Indexer A\n(seed, AdmittedAt=0)" as IA #A9DFBF
|
||||
component "Indexer B\n(seed, AdmittedAt=0)" as IB #A9DFBF
|
||||
}
|
||||
|
||||
package "Couche 3 — Nœuds" #FEFBD8 {
|
||||
component "Node 1" as N1 #FAF0BE
|
||||
component "Node 2" as N2 #FAF0BE
|
||||
}
|
||||
|
||||
note as NNative #FFDDDD
|
||||
Aucun natif configuré.
|
||||
AdmittedAt = zero → IsStableVoter() = false
|
||||
Phase 2 sans votants : Phase 1 conservée directement.
|
||||
Risque D20 : circularité du trust (seeds se valident mutuellement).
|
||||
end note
|
||||
|
||||
' Heartbeats nœuds → indexeurs seeds
|
||||
N1 -[#E67E22]--> IA : heartbeat long-lived (20s)
|
||||
N1 -[#E67E22]--> IB : heartbeat long-lived (20s)
|
||||
N2 -[#E67E22]--> IA : heartbeat long-lived (20s)
|
||||
N2 -[#E67E22]--> IB : heartbeat long-lived (20s)
|
||||
|
||||
note bottom of IA
|
||||
Après 2s : goroutine async
|
||||
fetchNativeFromIndexers → ?
|
||||
Si natif trouvé → ConnectToNatives (upgrade vers C1)
|
||||
Si non → mode indexeur pur (D20 actif)
|
||||
end note
|
||||
|
||||
@enduml
|
||||
63
docs/diagrams/17_startup_consensus_phase1_phase2.puml
Normal file
63
docs/diagrams/17_startup_consensus_phase1_phase2.puml
Normal file
@@ -0,0 +1,63 @@
|
||||
@startuml 17_startup_consensus_phase1_phase2
|
||||
title Démarrage avec natifs — Phase 1 (admission) + Phase 2 (vivacité)
|
||||
|
||||
participant "Node / Indexer\n(appelant)" as Caller
|
||||
participant "Native A" as NA
|
||||
participant "Native B" as NB
|
||||
participant "Indexer A" as IA
|
||||
participant "Indexer B" as IB
|
||||
|
||||
note over Caller: ConnectToNatives()\nNativeIndexerAddresses configuré
|
||||
|
||||
== Étape 0 : heartbeat vers le mesh natif ==
|
||||
Caller -> NA: SendHeartbeat /opencloud/heartbeat/1.0 (goroutine longue durée)
|
||||
Caller -> NB: SendHeartbeat /opencloud/heartbeat/1.0 (goroutine longue durée)
|
||||
|
||||
== Étape 1 : fetch pool en parallèle ==
|
||||
par Fetch parallèle (timeout 6s)
|
||||
Caller -> NA: GET /opencloud/native/indexers/1.0\nGetIndexersRequest{Count: max, FillRates demandés}
|
||||
NA -> NA: reachableLiveIndexers()\ntri par w(F) = fillRate×(1−fillRate)
|
||||
NA --> Caller: GetIndexersResponse{Indexers:[IA,IB], FillRates:{IA:0.3, IB:0.6}}
|
||||
else
|
||||
Caller -> NB: GET /opencloud/native/indexers/1.0
|
||||
NB -> NB: reachableLiveIndexers()
|
||||
NB --> Caller: GetIndexersResponse{Indexers:[IA,IB], FillRates:{IA:0.3, IB:0.6}}
|
||||
end par
|
||||
|
||||
note over Caller: Fusion + dédup → candidates = [IA, IB]\nisFallback = false
|
||||
|
||||
== Étape 2a : Phase 1 — Admission native (clientSideConsensus) ==
|
||||
par Consensus parallèle (timeout 3s par natif, 4s total)
|
||||
Caller -> NA: /opencloud/native/consensus/1.0\nConsensusRequest{Candidates:[IA,IB]}
|
||||
NA -> NA: croiser avec liveIndexers
|
||||
NA --> Caller: ConsensusResponse{Trusted:[IA,IB], Suggestions:[]}
|
||||
else
|
||||
Caller -> NB: /opencloud/native/consensus/1.0\nConsensusRequest{Candidates:[IA,IB]}
|
||||
NB -> NB: croiser avec liveIndexers
|
||||
NB --> Caller: ConsensusResponse{Trusted:[IA], Suggestions:[IC]}
|
||||
end par
|
||||
|
||||
note over Caller: IA → 2/2 votes → confirmé ✓\nIB → 1/2 vote → refusé ✗\nadmittedAt = time.Now()
|
||||
|
||||
== Étape 2b : Phase 2 — Liveness vote (indexerLivenessVote) ==
|
||||
note over Caller: Cherche votants stables dans StaticIndexerMeta\n(AdmittedAt != zero, age >= MinStableAge=2min)
|
||||
|
||||
alt Votants stables disponibles
|
||||
par Phase 2 parallèle (timeout 3s)
|
||||
Caller -> IA: /opencloud/indexer/consensus/1.0\nIndexerConsensusRequest{Candidates:[IA]}
|
||||
IA -> IA: vérifier StreamRecords[ProtocolHB][candidate]\nLastSeen ≤ 2×60s && LastScore ≥ 30
|
||||
IA --> Caller: IndexerConsensusResponse{Alive:[IA]}
|
||||
end par
|
||||
note over Caller: IA confirmé vivant par quorum > 0.5
|
||||
else Aucun votant stable (premier démarrage)
|
||||
note over Caller: Phase 1 conservée directement\n(aucun votant MinStableAge atteint)
|
||||
end
|
||||
|
||||
== Étape 3 : remplacement StaticIndexers ==
|
||||
Caller -> Caller: replaceStaticIndexers(pool={IA}, admittedAt)\nStaticIndexerMeta[IA].AdmittedAt = time.Now()
|
||||
|
||||
== Étape 4 : heartbeat long-lived vers pool ==
|
||||
Caller -> IA: SendHeartbeat /opencloud/heartbeat/1.0 (goroutine longue durée)
|
||||
note over Caller: Pool actif. NudgeIndexerHeartbeat()
|
||||
|
||||
@enduml
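The pool ordering used by reachableLiveIndexers() is the only formula in this flow, w(F) = fillRate × (1 − fillRate), which favours indexers that are neither empty nor saturated. A small sketch (struct and function names are assumptions):

```go
package main

import "sort"

// indexerCandidate pairs an indexer address with its reported fill rate.
type indexerCandidate struct {
	Addr     string
	FillRate float64
}

// weight implements w(F) = F * (1 - F): mid-fill indexers score highest.
func weight(fillRate float64) float64 {
	return fillRate * (1 - fillRate)
}

// sortByWeight orders candidates by descending w(F).
func sortByWeight(cands []indexerCandidate) {
	sort.Slice(cands, func(i, j int) bool {
		return weight(cands[i].FillRate) > weight(cands[j].FillRate)
	})
}
```

With the fill rates from the diagram (IA: 0.3, IB: 0.6) the weights are 0.21 and 0.24, so a descending sort would rank IB first; the sort direction itself is an assumption here.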
|
||||
51
docs/diagrams/18_startup_seed_discovers_native.puml
Normal file
51
docs/diagrams/18_startup_seed_discovers_native.puml
Normal file
@@ -0,0 +1,51 @@
|
||||
@startuml 18_startup_seed_discovers_native
|
||||
title C2 → C1 — Seed découvre un natif (upgrade async)
|
||||
|
||||
participant "Node / Indexer\\n(seed mode)" as Caller
|
||||
participant "Indexer A\\n(seed)" as IA
|
||||
participant "Indexer B\\n(seed)" as IB
|
||||
participant "Native A\\n(découvert)" as NA
|
||||
|
||||
note over Caller: Démarrage sans NativeIndexerAddresses\\nStaticIndexers = [IA, IB] (AdmittedAt=0)
|
||||
|
||||
== Phase initiale seed ==
|
||||
Caller -> IA: SendHeartbeat /opencloud/heartbeat/1.0 (goroutine longue durée)
|
||||
Caller -> IB: SendHeartbeat /opencloud/heartbeat/1.0 (goroutine longue durée)
|
||||
|
||||
note over Caller: Pool actif en mode seed.\\nIsStableVoter() = false (AdmittedAt=0)\\nPhase 2 sans votants → Phase 1 conservée.
|
||||
|
||||
== Goroutine async après 2s ==
|
||||
note over Caller: time.Sleep(2s)\\nfetchNativeFromIndexers()
|
||||
|
||||
Caller -> IA: GET /opencloud/indexer/natives/1.0
|
||||
IA --> Caller: GetNativesResponse{Natives:[NA]}
|
||||
|
||||
note over Caller: Natif découvert : NA\\nAppel ConnectToNatives([NA])
|
||||
|
||||
== Upgrade vers mode nominal (ConnectToNatives) ==
|
||||
Caller -> NA: SendHeartbeat /opencloud/heartbeat/1.0 (goroutine longue durée)
|
||||
|
||||
par Fetch pool depuis natif (timeout 6s)
|
||||
Caller -> NA: GET /opencloud/native/indexers/1.0\\nGetIndexersRequest{Count: max}
|
||||
NA -> NA: reachableLiveIndexers()\\ntri par w(F) = fillRate×(1−fillRate)
|
||||
NA --> Caller: GetIndexersResponse{Indexers:[IA,IB], FillRates:{IA:0.4, IB:0.6}}
|
||||
end par
|
||||
|
||||
note over Caller: candidates = [IA, IB], isFallback = false
|
||||
|
||||
par Consensus Phase 1 (timeout 3s)
|
||||
Caller -> NA: /opencloud/native/consensus/1.0\\nConsensusRequest{Candidates:[IA,IB]}
|
||||
NA -> NA: croiser avec liveIndexers
|
||||
NA --> Caller: ConsensusResponse{Trusted:[IA,IB], Suggestions:[]}
|
||||
end par
|
||||
|
||||
note over Caller: IA ✓ IB ✓ (1/1 vote)\\nadmittedAt = time.Now()
|
||||
|
||||
note over Caller: Aucun votant stable (AdmittedAt vient d'être posé)\\nPhase 2 sautée → Phase 1 conservée directement
|
||||
|
||||
== Remplacement pool ==
|
||||
Caller -> Caller: replaceStaticIndexers(pool={IA,IB}, admittedAt)\\nStaticIndexerMeta[IA].AdmittedAt = time.Now()\\nStaticIndexerMeta[IB].AdmittedAt = time.Now()
|
||||
|
||||
note over Caller: Pool upgradé dans la map partagée StaticIndexers.\\nLa goroutine heartbeat existante (démarrée en mode seed)\\ndétecte les nouveaux membres sur le prochain tick (20s).\\nAucune nouvelle goroutine créée.\\nIsStableVoter() deviendra true après MinStableAge (2min).\\nD20 (circularité seeds) éliminé.
|
||||
|
||||
@enduml
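The "stable voter" rule that gates Phase 2 in both startup diagrams reduces to two checks on AdmittedAt. A sketch, assuming the names used in the diagrams (StaticIndexerMeta, MinStableAge = 2 min, IsStableVoter):

```go
package main

import "time"

// MinStableAge is the minimum admission age before an indexer may vote in
// Phase 2, as stated in the startup diagrams.
const MinStableAge = 2 * time.Minute

// indexerMeta holds the per-indexer admission timestamp kept in StaticIndexerMeta.
type indexerMeta struct {
	AdmittedAt time.Time
}

// isStableVoter mirrors the IsStableVoter() rule from the diagrams:
// a seed (zero AdmittedAt) never votes; an admitted indexer votes only
// after it has been in the pool for at least MinStableAge.
func (m indexerMeta) isStableVoter(now time.Time) bool {
	if m.AdmittedAt.IsZero() {
		return false
	}
	return now.Sub(m.AdmittedAt) >= MinStableAge
}
```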
|
||||
55
docs/diagrams/19_failure_indexer_crash.puml
Normal file
55
docs/diagrams/19_failure_indexer_crash.puml
Normal file
@@ -0,0 +1,55 @@
|
||||
@startuml failure_indexer_crash
|
||||
title Indexer Failure → replenish from a Native
|
||||
|
||||
participant "Node" as N
|
||||
participant "Indexer A (alive)" as IA
|
||||
participant "Indexer B (crashed)" as IB
|
||||
participant "Native A" as NA
|
||||
participant "Native B" as NB
|
||||
|
||||
note over N: Active pool : Indexers = [IA, IB]\\nLong-lived heartbeats to IA & IB
|
||||
|
||||
== IB Failure ==
|
||||
IB ->x N: heartbeat fails (sendHeartbeat err)
|
||||
note over N: doTick() in SendHeartbeat detects the failure\\n→ delete(Indexers[IB])\\n→ delete(IndexerMeta[IB])\\nThe single heartbeat goroutine keeps running
|
||||
|
||||
N -> N: go replenishIndexersFromNative(need=1)
|
||||
|
||||
note over N: Pool reduced to 1 indexer.\\nReplenish started in a goroutine.
|
||||
|
||||
== Replenish from natives ==
|
||||
par Fetch pool (timeout 6s)
|
||||
N -> NA: GET /opencloud/native/indexers/1.0\\nGetIndexersRequest{Count: max}
|
||||
NA -> NA: reachableLiveIndexers()\\n(IB absent because of an expired heartbeat)
|
||||
NA --> N: GetIndexersResponse{Indexers:[IA,IC], FillRates:{IA:0.4,IC:0.2}}
|
||||
else
|
||||
N -> NB: GET /opencloud/native/indexers/1.0
|
||||
NB --> N: GetIndexersResponse{Indexers:[IA,IC]}
|
||||
end par
|
||||
|
||||
note over N: Merge + dedup → candidates = [IA, IC]\\n(IA already in the pool → IC is the new candidate)
|
||||
|
||||
par Consensus Phase 1 (timeout 4s)
|
||||
N -> NA: /opencloud/native/consensus/1.0\\nConsensusRequest{Candidates:[IA,IC]}
|
||||
NA --> N: ConsensusResponse{Trusted:[IA,IC]}
|
||||
else
|
||||
N -> NB: /opencloud/native/consensus/1.0
|
||||
NB --> N: ConsensusResponse{Trusted:[IA,IC]}
|
||||
end par
|
||||
|
||||
note over N: IC → 2/2 votes → admit\\nadmittedAt = time.Now()
|
||||
|
||||
par Phase 2 — liveness vote (if stable voters exist)
|
||||
N -> IA: /opencloud/indexer/consensus/1.0\\nIndexerConsensusRequest{Candidates:[IC]}
|
||||
IA -> IA: StreamRecords[ProtocolHB][IC]\\nLastSeen ≤ 120s && LastScore ≥ 30
|
||||
IA --> N: IndexerConsensusResponse{Alive:[IC]}
|
||||
end par
|
||||
|
||||
note over N: IC confirmed alive → add to pool
|
||||
|
||||
N -> N: replaceStaticIndexers(pool={IA,IC})
|
||||
N -> IC: SendHeartbeat /opencloud/heartbeat/1.0 (goroutine long-live)
|
||||
|
||||
note over N: Pool restored to 2 indexers.
|
||||
|
||||
@enduml
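The failure handling shown here is deliberately small: evict the dead indexer, compute how many replacements are needed, and kick off the replenish asynchronously so the heartbeat ticker never blocks. A sketch under those assumptions (the pool type and function names are illustrative, not the real ones):

```go
package main

import (
	"context"
	"sync"
)

// pool is a minimal stand-in for the shared StaticIndexers map used by the
// single heartbeat goroutine (names assumed from the diagrams).
type pool struct {
	mu       sync.Mutex
	indexers map[string]struct{} // keyed by indexer address
	target   int                 // desired pool size (2 in the diagrams)
}

// onHeartbeatFailure is what doTick() does when sendHeartbeat returns an
// error: the indexer is evicted and a replenish is started asynchronously,
// while the heartbeat goroutine itself keeps ticking.
func (p *pool) onHeartbeatFailure(ctx context.Context, addr string,
	replenish func(ctx context.Context, need int)) {

	p.mu.Lock()
	delete(p.indexers, addr)
	need := p.target - len(p.indexers)
	p.mu.Unlock()

	if need > 0 {
		go replenish(ctx, need) // e.g. replenishIndexersFromNative(need)
	}
}
```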
|
||||
51
docs/diagrams/20_failure_indexers_native_falback.puml
Normal file
51
docs/diagrams/20_failure_indexers_native_falback.puml
Normal file
@@ -0,0 +1,51 @@
|
||||
@startuml failure_indexers_native_falback
|
||||
title Indexer failures → native IsSelfFallback
|
||||
|
||||
participant "Node" as N
|
||||
participant "Indexer A (crashed)" as IA
|
||||
participant "Indexer B (crashed)" as IB
|
||||
participant "Native A" as NA
|
||||
participant "Native B" as NB
|
||||
|
||||
note over N: Active Pool : Indexers = [IA, IB]
|
||||
|
||||
== Successive Failures on IA & IB ==
|
||||
IA ->x N: heartbeat failure (sendHeartbeat err)
|
||||
IB ->x N: heartbeat failure (sendHeartbeat err)
|
||||
|
||||
note over N: doTick() in SendHeartbeat detects the failures\\n→ delete(StaticIndexers[IA]), delete(StaticIndexers[IB])\\n→ delete(StaticIndexerMeta[IA/IB])\\nThe single heartbeat goroutine keeps running.
|
||||
|
||||
N -> N: go replenishIndexersFromNative(need=2)
|
||||
|
||||
== Replenish attempt — natives switch to self-fallback mode ==
|
||||
par Fetch from natives (timeout 6s)
|
||||
N -> NA: GET /opencloud/native/indexers/1.0
|
||||
NA -> NA: reachableLiveIndexers() → 0 alive indexers\\nFallback : includes itself (IsSelfFallback=true)
|
||||
NA --> N: GetIndexersResponse{Indexers:[NA_addr], IsSelfFallback:true}
|
||||
else
|
||||
N -> NB: GET /opencloud/native/indexers/1.0
|
||||
NB --> N: GetIndexersResponse{Indexers:[NB_addr], IsSelfFallback:true}
|
||||
end par
|
||||
|
||||
note over N: isFallback=true → resolvePool avoids consensus\\nadmittedAt = time.Time{} (zero)\\nStaticIndexers = {NA_addr} (native as fallback)
|
||||
|
||||
N -> NA: SendHeartbeat /opencloud/heartbeat/1.0\\n(native as temporary fallback indexers)
|
||||
|
||||
note over NA: responsiblePeers[N] registered.\\nrunOffloadLoop watches for real indexers.
|
||||
|
||||
== IA recovery → runOffloadLoop on the native side ==
|
||||
IA -> NA: /opencloud/native/subscribe/1.0\\nIndexerRegistration{FillRate: 0}
|
||||
note over NA: liveIndexers[IA] updated.\\nrunOffloadLoop detects an available real indexer\\nand migrates N from the native to IA.
|
||||
|
||||
== Replenish on next heartbeat tick ==
|
||||
N -> NA: GET /opencloud/native/indexers/1.0
|
||||
NA --> N: GetIndexersResponse{Indexers:[IA], IsSelfFallback:false}
|
||||
|
||||
note over N: isFallback=false → Classic Phase 1 + Phase 2
|
||||
|
||||
N -> N: replaceStaticIndexers(pool={IA}, admittedAt)
|
||||
N -> IA: SendHeartbeat /opencloud/heartbeat/1.0
|
||||
|
||||
note over N: Pool restored. The native no longer serves as a fallback indexer.
|
||||
|
||||
@enduml
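On the native side, the self-fallback decision is a one-line special case of the indexer listing. A sketch with assumed struct names (only the field names Indexers, FillRates and IsSelfFallback come from the diagram):

```go
package main

// GetIndexersResponse mirrors the native's reply in the fallback diagram
// (the struct itself is an assumption).
type GetIndexersResponse struct {
	Indexers       []string
	FillRates      map[string]float64
	IsSelfFallback bool
}

// buildIndexersResponse returns the live indexers when some exist, and falls
// back to the native's own address (IsSelfFallback=true) when none do, as in
// the self-fallback scenario above.
func buildIndexersResponse(selfAddr string, live map[string]float64) GetIndexersResponse {
	if len(live) == 0 {
		return GetIndexersResponse{Indexers: []string{selfAddr}, IsSelfFallback: true}
	}
	resp := GetIndexersResponse{FillRates: map[string]float64{}}
	for addr, fr := range live {
		resp.Indexers = append(resp.Indexers, addr)
		resp.FillRates[addr] = fr
	}
	return resp
}
```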
|
||||
46
docs/diagrams/21_failure_native_one_down.puml
Normal file
46
docs/diagrams/21_failure_native_one_down.puml
Normal file
@@ -0,0 +1,46 @@
|
||||
@startuml failure_native_one_down
|
||||
title Native failure, with one still alive
|
||||
|
||||
participant "Indexer A" as IA
|
||||
participant "Indexer B" as IB
|
||||
participant "Native A (crashed)" as NA
|
||||
participant "Native B (alive)" as NB
|
||||
participant "Node" as N
|
||||
|
||||
note over IA, NB: Nominal state : IA and IB heartbeat to NA & NB
|
||||
|
||||
== Native A Failure ==
|
||||
NA ->x IA: stream reset
|
||||
NA ->x IB: stream reset
|
||||
NA ->x N: stream reset (heartbeat Node → NA)
|
||||
|
||||
== Indexers side : replenishNativesFromPeers ==
|
||||
note over IA: SendHeartbeat(NA) detects the reset\\nAfterDelete(NA)\\nStaticNatives = [NB] (still 1)
|
||||
|
||||
IA -> IA: replenishNativesFromPeers()\\nphase 1 : fetchNativeFromNatives
|
||||
|
||||
IA -> NB: GET /opencloud/native/peers/1.0
|
||||
NB --> IA: GetPeersResponse{Peers:[NC]} /' new native if one known '/
|
||||
|
||||
alt NC available
|
||||
IA -> NC: SendHeartbeat /opencloud/heartbeat/1.0\\nSubscribe /opencloud/native/subscribe/1.0
|
||||
note over IA: StaticNatives = [NB, NC]\\nNative Pool restored.
|
||||
else No native peer available
|
||||
IA -> IA: fetchNativeFromIndexers()\\nAsk every indexer for its known natives
|
||||
IB --> IA: GetNativesResponse{Natives:[]} /' IB also only got NB '/
|
||||
note over IA: Unable to find a 2nd native.\\nStaticNatives = [NB] (degraded but alive).
|
||||
end
|
||||
|
||||
== Node side : alive indexers pool ==
|
||||
note over N: Node heartbeats to IA & IB.\\nNA's failure does not affect the indexer pool.\\nFuture consensus can no longer use NA; NB alone votes (1/1 vote = quorum OK).
|
||||
|
||||
N -> NB: /opencloud/native/consensus/1.0\\nConsensusRequest{Candidates:[IA,IB]}
|
||||
NB --> N: ConsensusResponse{Trusted:[IA,IB]}
|
||||
note over N: Consensus 1/1 alive native → admitted.\\nThe consensus floor downgrades automatically (majority of alive natives).
|
||||
|
||||
== NB side : heartbeat to NA fails ==
|
||||
note over NB: EnsureNativePeers / SendHeartbeat to NA\\nfails (sendHeartbeat err)\\n→ delete(StaticNatives[NA])\\nreplenishNativesFromPeers(NA) is triggered
|
||||
|
||||
note over NB: Native mesh downgraded to NB alone.\\nDegraded but functional.
|
||||
|
||||
@enduml
|
||||
60
docs/diagrams/22_failure_both_natives.puml
Normal file
60
docs/diagrams/22_failure_both_natives.puml
Normal file
@@ -0,0 +1,60 @@
|
||||
@startuml 22_failure_both_natives
|
||||
title F4 — Panne des 2 natifs → fallback pool pré-validé
|
||||
|
||||
participant "Node" as N
|
||||
participant "Indexer A\\n(vivant)" as IA
|
||||
participant "Indexer B\\n(vivant)" as IB
|
||||
participant "Native A\\n(crashé)" as NA
|
||||
participant "Native B\\n(crashé)" as NB
|
||||
|
||||
note over N: Pool actif : StaticIndexers = [IA, IB]\\nStaticNatives = [NA, NB]\\nAdmittedAt[IA] et AdmittedAt[IB] posés (stables)
|
||||
|
||||
== Panne simultanée NA et NB ==
|
||||
NA ->x N: stream reset
|
||||
NB ->x N: stream reset
|
||||
|
||||
N -> N: AfterDelete(NA) + AfterDelete(NB)\\nStaticNatives = {} (vide)
|
||||
|
||||
== replenishNativesFromPeers (sans résultat) ==
|
||||
N -> N: fetchNativeFromNatives() → aucun natif vivant
|
||||
N -> IA: GET /opencloud/indexer/natives/1.0
|
||||
IA --> N: GetNativesResponse{Natives:[NA,NB]}
|
||||
note over N: NA et NB connus mais non joignables.\\nAucun nouveau natif trouvé.
|
||||
|
||||
== Fallback : pool d'indexeurs conservé ==
|
||||
note over N: isFallback = true\\nStaticIndexers conservé tel quel [IA, IB]\\n(dernier pool validé avec AdmittedAt != zero)\\nRisque D19 atténué : quorum natif = 0 → fallback accepté
|
||||
|
||||
note over N: Heartbeats IA et IB continuent normalement.\\nPool d'indexeurs opérationnel sans natifs.
|
||||
|
||||
N -> IA: SendHeartbeat /opencloud/heartbeat/1.0 (continue)
|
||||
N -> IB: SendHeartbeat /opencloud/heartbeat/1.0 (continue)
|
||||
|
||||
== retryLostNative (30s ticker) ==
|
||||
loop toutes les 30s
|
||||
N -> N: retryLostNative()\\ntente reconnexion NA et NB
|
||||
N -> NA: dial (échec)
|
||||
N -> NB: dial (échec)
|
||||
note over N: Retry sans résultat.\\nPool indexeurs maintenu en fallback.
|
||||
end
|
||||
|
||||
== Reprise natifs ==
|
||||
NA -> NA: redémarrage
|
||||
NB -> NB: redémarrage
|
||||
|
||||
N -> NA: dial (succès)
|
||||
N -> NA: SendHeartbeat /opencloud/heartbeat/1.0
|
||||
N -> NB: SendHeartbeat /opencloud/heartbeat/1.0
|
||||
note over N: StaticNatives = [NA, NB] restauré\\nisFallback = false
|
||||
|
||||
== Re-consensus pool indexeurs (optionnel) ==
|
||||
par Consensus Phase 1
|
||||
N -> NA: /opencloud/native/consensus/1.0\\nConsensusRequest{Candidates:[IA,IB]}
|
||||
NA --> N: ConsensusResponse{Trusted:[IA,IB]}
|
||||
else
|
||||
N -> NB: /opencloud/native/consensus/1.0
|
||||
NB --> N: ConsensusResponse{Trusted:[IA,IB]}
|
||||
end par
|
||||
|
||||
note over N: Pool [IA,IB] reconfirmé.\\nisFallback = false. AdmittedAt[IA,IB] rafraîchi.
|
||||
|
||||
@enduml
|
||||
63
docs/diagrams/23_failure_native_plus_indexer.puml
Normal file
63
docs/diagrams/23_failure_native_plus_indexer.puml
Normal file
@@ -0,0 +1,63 @@
|
||||
@startuml 23_failure_native_plus_indexer
|
||||
title F5 — Panne combinée : 1 natif + 1 indexeur
|
||||
|
||||
participant "Node" as N
|
||||
participant "Indexer A\\n(vivant)" as IA
|
||||
participant "Indexer B\\n(crashé)" as IB
|
||||
participant "Native A\\n(vivant)" as NA
|
||||
participant "Native B\\n(crashé)" as NB
|
||||
|
||||
note over N: Pool nominal : StaticIndexers=[IA,IB], StaticNatives=[NA,NB]
|
||||
|
||||
== Pannes simultanées NB + IB ==
|
||||
NB ->x N: stream reset
|
||||
IB ->x N: stream reset
|
||||
|
||||
N -> N: AfterDelete(NB) — StaticNatives = [NA]
|
||||
N -> N: AfterDelete(IB) — StaticIndexers = [IA]
|
||||
|
||||
== Replenish natif (1 vivant) ==
|
||||
N -> N: replenishNativesFromPeers()
|
||||
N -> NA: GET /opencloud/native/peers/1.0
|
||||
NA --> N: GetPeersResponse{Peers:[]} /' NB seul pair, disparu '/
|
||||
note over N: Aucun natif alternatif.\\nStaticNatives = [NA] — dégradé.
|
||||
|
||||
== Replenish indexeur depuis NA ==
|
||||
par Fetch pool (timeout 6s)
|
||||
N -> NA: GET /opencloud/native/indexers/1.0
|
||||
NA -> NA: reachableLiveIndexers()\\n(IB absent — heartbeat expiré)
|
||||
NA --> N: GetIndexersResponse{Indexers:[IA,IC], FillRates:{IA:0.5,IC:0.3}}
|
||||
end par
|
||||
|
||||
note over N: candidates = [IA, IC]
|
||||
|
||||
par Consensus Phase 1 — 1 seul natif vivant (timeout 3s)
|
||||
N -> NA: /opencloud/native/consensus/1.0\\nConsensusRequest{Candidates:[IA,IC]}
|
||||
NA --> N: ConsensusResponse{Trusted:[IA,IC]}
|
||||
end par
|
||||
|
||||
note over N: IC → 1/1 vote → admis (quorum sur vivants)\\nadmittedAt = time.Now()
|
||||
|
||||
par Phase 2 liveness vote
|
||||
N -> IA: /opencloud/indexer/consensus/1.0\\nIndexerConsensusRequest{Candidates:[IC]}
|
||||
IA -> IA: StreamRecords[ProtocolHB][IC]\\nLastSeen ≤ 120s && LastScore ≥ 30
|
||||
IA --> N: IndexerConsensusResponse{Alive:[IC]}
|
||||
end par
|
||||
|
||||
N -> N: replaceStaticIndexers(pool={IA,IC})
|
||||
N -> IC: SendHeartbeat /opencloud/heartbeat/1.0
|
||||
|
||||
note over N: Pool restauré à [IA,IC].\\nMode dégradé : 1 natif seulement.\\nretryLostNative(NB) actif (30s ticker).
|
||||
|
||||
== retryLostNative pour NB ==
|
||||
loop toutes les 30s
|
||||
N -> NB: dial (échec)
|
||||
end
|
||||
|
||||
NB -> NB: redémarrage
|
||||
NB -> NA: heartbeat (mesh natif reconstruit)
|
||||
N -> NB: dial (succès)
|
||||
N -> NB: SendHeartbeat /opencloud/heartbeat/1.0
|
||||
note over N: StaticNatives = [NA,NB] restauré.\\nMode nominal retrouvé.
|
||||
|
||||
@enduml
|
||||
45
docs/diagrams/24_failure_retry_lost_native.puml
Normal file
45
docs/diagrams/24_failure_retry_lost_native.puml
Normal file
@@ -0,0 +1,45 @@
|
||||
@startuml 24_failure_retry_lost_native
|
||||
title F6 — retryLostNative : reconnexion natif après panne réseau
|
||||
|
||||
participant "Node / Indexer" as Caller
|
||||
participant "Native A\\n(vivant)" as NA
|
||||
participant "Native B\\n(réseau instable)" as NB
|
||||
|
||||
note over Caller: StaticNatives = [NA, NB]\\nHeartbeats actifs vers NA et NB
|
||||
|
||||
== Panne réseau transitoire vers NB ==
|
||||
NB ->x Caller: stream reset (timeout réseau)
|
||||
|
||||
Caller -> Caller: AfterDelete(NB)\\nStaticNatives = [NA]\\nlostNatives.Store(NB.addr)
|
||||
|
||||
== replenishNativesFromPeers — phase 1 ==
|
||||
Caller -> NA: GET /opencloud/native/peers/1.0
|
||||
NA --> Caller: GetPeersResponse{Peers:[NB]}
|
||||
|
||||
note over Caller: NB connu de NA, tentative de reconnexion directe
|
||||
|
||||
Caller -> NB: dial (échec — réseau toujours coupé)
|
||||
note over Caller: Connexion impossible.\\nPassage en retryLostNative()
|
||||
|
||||
== retryLostNative : ticker 30s ==
|
||||
loop toutes les 30s tant que NB absent
|
||||
Caller -> Caller: retryLostNative()\\nParcourt lostNatives
|
||||
Caller -> NB: StartNativeRegistration (dial + heartbeat + subscribe)
|
||||
NB --> Caller: dial échoue
|
||||
note over Caller: Retry loggé. Prochain essai dans 30s.
|
||||
end
|
||||
|
||||
== Réseau rétabli ==
|
||||
note over NB: Réseau rétabli\\nNB de nouveau joignable
|
||||
|
||||
Caller -> NB: StartNativeRegistration\\ndial (succès)
|
||||
Caller -> NB: SendHeartbeat /opencloud/heartbeat/1.0 (goroutine longue durée)
|
||||
Caller -> NB: /opencloud/native/subscribe/1.0\\nIndexerRegistration{FillRate: fillRateFn()}
|
||||
|
||||
NB --> Caller: subscribe ack
|
||||
|
||||
Caller -> Caller: lostNatives.Delete(NB.addr)\\nStaticNatives = [NA, NB] restauré
|
||||
|
||||
note over Caller: Mode nominal retrouvé.\\nnativeHeartbeatOnce non utilisé (goroutine déjà active pour NA).\\nNouvelle goroutine SendHeartbeat pour NB uniquement.
|
||||
|
||||
@enduml
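The retry loop above is a plain 30s ticker over the lostNatives set. A sketch, with startNativeRegistration standing in for the dial + heartbeat + subscribe step (the helper and its signature are assumptions):

```go
package main

import (
	"context"
	"sync"
	"time"
)

// retryLostNatives re-dials every address stored in lostNatives every 30s
// and removes it once a registration succeeds, as in the diagram above.
func retryLostNatives(ctx context.Context, lost *sync.Map,
	startNativeRegistration func(ctx context.Context, addr string) error) {

	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			lost.Range(func(key, _ any) bool {
				addr := key.(string)
				if err := startNativeRegistration(ctx, addr); err == nil {
					lost.Delete(addr) // native reachable again → back in StaticNatives
				}
				return true // keep iterating over the remaining lost natives
			})
		}
	}
}
```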
|
||||
35
docs/diagrams/25_failure_node_gc.puml
Normal file
35
docs/diagrams/25_failure_node_gc.puml
Normal file
@@ -0,0 +1,35 @@
|
||||
@startuml 25_failure_node_gc
|
||||
title F7 — Crash nœud → GC indexeur + AfterDelete
|
||||
|
||||
participant "Node\n(crashé)" as N
|
||||
participant "Indexer A" as IA
|
||||
participant "Indexer B" as IB
|
||||
|
||||
note over N, IB: État nominal : N heartbeatait vers IA et IB
|
||||
|
||||
== Crash Node ==
|
||||
N ->x IA: stream reset (heartbeat coupé)
|
||||
N ->x IB: stream reset (heartbeat coupé)
|
||||
|
||||
== GC côté Indexer A ==
|
||||
note over IA: HandleHeartbeat : stream reset détecté\nStreamRecords[ProtocolHeartbeat][N].Expiry figé
|
||||
|
||||
loop ticker GC (30s) — StartGC(30*time.Second)
|
||||
IA -> IA: gc()\nnow.After(Expiry) où Expiry = lastHBTime + 2min\n→ si 2min sans heartbeat → éviction
|
||||
IA -> IA: delete(StreamRecords[ProtocolHeartbeat][N])\nAfterDelete(N, name, did) appelé hors lock
|
||||
note over IA: N retiré du registre vivant.\nFillRate recalculé : (n-1) / MaxNodesConn()
|
||||
end
|
||||
|
||||
== Impact fill rate ==
|
||||
note over IA: FillRate diminue.\nProchain BuildHeartbeatResponse\ninclura FillRate mis à jour.\nSi fillRate revient < 80% :\n→ offload.inBatch et alreadyTried réinitialisés.
|
||||
|
||||
== GC côté Indexer B ==
|
||||
note over IB: Même GC effectué.\nN retiré de StreamRecords[ProtocolHeartbeat].
|
||||
|
||||
== Reconnexion éventuelle du nœud ==
|
||||
N -> N: redémarrage
|
||||
N -> IA: SendHeartbeat /opencloud/heartbeat/1.0\nHeartbeat{name, PeerID_N, IndexersBinded, need, record}
|
||||
IA -> IA: HandleHeartbeat → UptimeTracker(FirstSeen=now)\nStreamRecords[ProtocolHeartbeat][N] recréé\nRepublish PeerRecord N dans DHT
|
||||
note over IA: N de retour avec FirstSeen frais.\ndynamicMinScore élevé tant que age < 24h.\n(phase de grâce : 2 ticks avant scoring)
|
||||
|
||||
@enduml
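The indexer-side GC in this diagram is an expiry sweep followed by callbacks issued outside the lock. A sketch with assumed field and function names:

```go
package main

import (
	"sync"
	"time"
)

// streamRecord is a minimal stand-in for the per-peer heartbeat record an
// indexer keeps in StreamRecords[ProtocolHeartbeat] (field names assumed).
type streamRecord struct {
	Expiry time.Time // last heartbeat time + 2 minutes
	Name   string
	DID    string
}

// gc evicts peers whose heartbeat expired and calls afterDelete outside the
// lock, mirroring the 30s StartGC ticker in the diagram. afterDelete is
// where the indexer recomputes its fill rate.
func gc(mu *sync.Mutex, records map[string]*streamRecord,
	afterDelete func(peerID, name, did string)) {

	now := time.Now()
	type evicted struct{ peerID, name, did string }
	var gone []evicted

	mu.Lock()
	for pid, rec := range records {
		if now.After(rec.Expiry) { // 2 minutes without a heartbeat → eviction
			delete(records, pid)
			gone = append(gone, evicted{pid, rec.Name, rec.DID})
		}
	}
	mu.Unlock()

	for _, e := range gone {
		afterDelete(e.peerID, e.name, e.did) // called outside the lock, as in the diagram
	}
}
```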
|
||||
@@ -1,43 +1,88 @@
|
||||
# OC-Discovery — Diagrammes de séquence
|
||||
# OC-Discovery — Diagrammes d'architecture et de séquence
|
||||
|
||||
Tous les fichiers `.mmd` sont au format [Mermaid](https://mermaid.js.org/).
|
||||
Rendu possible via VS Code (extension Mermaid Preview), IntelliJ, ou [mermaid.live](https://mermaid.live).
|
||||
Tous les fichiers sont au format [PlantUML](https://plantuml.com/).
|
||||
Rendu possible via VS Code (extension PlantUML), IntelliJ, ou [plantuml.com/plantuml](https://www.plantuml.com/plantuml/uml/).
|
||||
|
||||
## Vue d'ensemble des diagrammes
|
||||
> **Note :** Les diagrammes 06, 07, 12, 14–24 et plusieurs protocoles ci-dessous
|
||||
> concernaient l'architecture à 3 niveaux (node → indexer → native indexer),
|
||||
> supprimée dans la branche `feature/no_native_consortium`. Ces fichiers sont
|
||||
> conservés à titre historique. Les diagrammes actifs sont indiqués ci-dessous.
|
||||
|
||||
## Diagrammes actifs (architecture 2 niveaux)
|
||||
|
||||
### Séquences principales
|
||||
|
||||
| Fichier | Description |
|
||||
|---------|-------------|
|
||||
| `01_node_init.mmd` | Initialisation complète d'un Node (libp2p host, GossipSub, indexers, StreamService, PubSubService, NATS) |
|
||||
| `02_node_claim.mmd` | Enregistrement du nœud auprès des indexeurs (`claimInfo` + `publishPeerRecord`) |
|
||||
| `03_indexer_heartbeat.mmd` | Protocole heartbeat avec calcul du score qualité (bande passante, uptime, diversité) |
|
||||
| `04_indexer_publish.mmd` | Publication d'un `PeerRecord` vers l'indexeur → DHT |
|
||||
| `05_indexer_get.mmd` | Résolution d'un pair via l'indexeur (`GetPeerRecord` + `handleNodeGet` + DHT) |
|
||||
| `06_native_registration.mmd` | Enregistrement d'un indexeur auprès d'un Native Indexer + gossip PubSub |
|
||||
| `07_native_get_consensus.mmd` | `ConnectToNatives` : pool d'indexeurs + protocole de consensus (vote majoritaire) |
|
||||
| `08_nats_create_resource.mmd` | Handler NATS `CREATE_RESOURCE` : connexion/déconnexion d'un partner |
|
||||
| `09_nats_propagation.mmd` | Handler NATS `PROPALGATION_EVENT` : delete, considers, planner, search |
|
||||
| `10_pubsub_search.mmd` | Recherche gossip globale (type `"all"`) via GossipSub |
|
||||
| `11_stream_search.mmd` | Recherche directe par stream (type `"known"` ou `"partner"`) |
|
||||
| `12_partner_heartbeat.mmd` | Heartbeat partner + propagation CRUD vers les partenaires |
|
||||
| `13_planner_flow.mmd` | Session planner (ouverture, échange, fermeture) |
|
||||
| `14_native_offload_gc.mmd` | Boucles background du Native Indexer (offload, DHT refresh, GC) |
|
||||
| `01_node_init.puml` | Initialisation d'un Node : libp2p host + PSK + ConnectionGater + ConnectToIndexers + SendHeartbeat + DHT proactive |
|
||||
| `02_node_claim.puml` | Enregistrement du nœud : `claimInfo` + `publishPeerRecord` → indexeurs → DHT |
|
||||
| `03_indexer_heartbeat.puml` | Protocole heartbeat bidirectionnel : challenges PeerID + DHT + witness, scoring 7 dimensions, suggestions, SuggestMigrate |
|
||||
| `04_indexer_publish.puml` | Publication d'un `PeerRecord` vers l'indexeur → DHT (PutValue /node, /name, /pid) |
|
||||
| `05_indexer_get.puml` | Résolution d'un pair : `GetPeerRecord` → indexeur → DHT si absent local |
|
||||
| `08_nats_create_resource.puml` | Handler NATS `CREATE_RESOURCE` : propagation partenaires on-demand |
|
||||
| `09_nats_propagation.puml` | Handler NATS `PROPALGATION_EVENT` : delete, considers, planner, search |
|
||||
| `10_pubsub_search.puml` | Recherche gossip globale (GossipSub /opencloud/search/1.0) |
|
||||
| `11_stream_search.puml` | Recherche directe par stream (type `"known"` ou `"partner"`) |
|
||||
| `13_planner_flow.puml` | Session planner (ouverture, échange, fermeture) |
|
||||
|
||||
## Protocoles libp2p utilisés
|
||||
### Résilience et pool management
|
||||
|
||||
| Fichier | Description |
|
||||
|---------|-------------|
|
||||
| `hb_failure_evict.puml` | HeartbeatFailure → evictPeer → TriggerConsensus ou DHT replenish |
|
||||
| `hb_last_indexer.puml` | Protection last-indexer → reconnectToSeeds → retryUntilSeedResponds |
|
||||
| `dht_discovery.puml` | Découverte proactive DHT : Provide/FindProviders, SelectByFillRate, dhtCache |
|
||||
| `connection_gater.puml` | ConnectionGater : DB blacklist → DHT sequential check (transport-error fallthrough) |
|
||||
|
||||
## Diagrammes historiques (architecture 3 niveaux — obsolètes)
|
||||
|
||||
Ces fichiers documentent l'ancienne architecture. Ils ne correspondent plus
|
||||
au code en production.
|
||||
|
||||
| Fichier | Description |
|
||||
|---------|-------------|
|
||||
| `06_native_registration.puml` | Enregistrement d'un indexeur auprès du Native (supprimé) |
|
||||
| `07_native_get_consensus.puml` | `ConnectToNatives` : fetch pool + Phase 1 + Phase 2 (supprimé) |
|
||||
| `12_partner_heartbeat.puml` | Heartbeat partner permanent (supprimé — connexions on-demand) |
|
||||
| `14_native_offload_gc.puml` | Boucles background Native Indexer (supprimé) |
|
||||
| `15_archi_config_nominale.puml` | Topologie nominale avec natifs (obsolète) |
|
||||
| `16_archi_config_seed.puml` | Mode seed sans natif (obsolète) |
|
||||
| `17_startup_consensus_phase1_phase2.puml` | Démarrage avec consensus natifs (supprimé) |
|
||||
| `18_startup_seed_discovers_native.puml` | Upgrade seed → native (supprimé) |
|
||||
| `19_failure_indexer_crash.puml` | F1 — replenish depuis natif (supprimé) |
|
||||
| `20_failure_both_indexers_selfdelegate.puml` | F2 — IsSelfFallback native (supprimé) |
|
||||
| `21_failure_native_one_down.puml` | F3 — panne 1 natif (supprimé) |
|
||||
| `22_failure_both_natives.puml` | F4 — panne 2 natifs (supprimé) |
|
||||
| `23_failure_native_plus_indexer.puml` | F5 — panne combinée natif + indexeur (supprimé) |
|
||||
| `24_failure_retry_lost_native.puml` | F6 — retryLostNative (supprimé) |
|
||||
| `25_failure_node_gc.puml` | F7 — GC nœud côté indexeur (toujours valide) |
|
||||
|
||||
## Protocoles libp2p actifs
|
||||
|
||||
| Protocole | Description |
|
||||
|-----------|-------------|
|
||||
| `/opencloud/heartbeat/1.0` | Heartbeat node → indexeur (long-lived) |
|
||||
| `/opencloud/heartbeat/indexer/1.0` | Heartbeat indexeur → native (long-lived) |
|
||||
| `/opencloud/resource/heartbeat/partner/1.0` | Heartbeat node ↔ partner (long-lived) |
|
||||
| `/opencloud/heartbeat/1.0` | Heartbeat bidirectionnel node→indexeur (long-lived) |
|
||||
| `/opencloud/probe/1.0` | Sonde de bande passante (echo, mesure latence + débit) |
|
||||
| `/opencloud/witness/1.0` | Requête témoin : "quel est ton score de l'indexeur X ?" |
|
||||
| `/opencloud/record/publish/1.0` | Publication `PeerRecord` vers indexeur |
|
||||
| `/opencloud/record/get/1.0` | Requête `GetPeerRecord` vers indexeur |
|
||||
| `/opencloud/native/subscribe/1.0` | Enregistrement indexeur auprès du native |
|
||||
| `/opencloud/native/indexers/1.0` | Requête de pool d'indexeurs au native |
|
||||
| `/opencloud/native/consensus/1.0` | Validation de pool d'indexeurs (consensus) |
|
||||
| `/opencloud/resource/search/1.0` | Recherche de ressources entre peers |
|
||||
| `/opencloud/resource/create/1.0` | Propagation création ressource vers partner |
|
||||
| `/opencloud/resource/update/1.0` | Propagation mise à jour ressource vers partner |
|
||||
| `/opencloud/resource/delete/1.0` | Propagation suppression ressource vers partner |
|
||||
| `/opencloud/resource/create/1.0` | Propagation création ressource → partner |
|
||||
| `/opencloud/resource/update/1.0` | Propagation mise à jour ressource → partner |
|
||||
| `/opencloud/resource/delete/1.0` | Propagation suppression ressource → partner |
|
||||
| `/opencloud/resource/planner/1.0` | Session planner (booking) |
|
||||
| `/opencloud/resource/verify/1.0` | Vérification signature ressource |
|
||||
| `/opencloud/resource/considers/1.0` | Transmission d'un "considers" d'exécution |
|
||||
| `/opencloud/resource/considers/1.0` | Transmission d'un considers d'exécution |
|
||||
|
||||
## Protocoles supprimés (architecture native)
|
||||
|
||||
| Protocole | Raison |
|
||||
|-----------|--------|
|
||||
| `/opencloud/native/subscribe/1.0` | Tier native supprimé |
|
||||
| `/opencloud/native/unsubscribe/1.0` | Tier native supprimé |
|
||||
| `/opencloud/native/indexers/1.0` | Remplacé par DHT FindProviders |
|
||||
| `/opencloud/native/consensus/1.0` | Remplacé par TriggerConsensus léger |
|
||||
| `/opencloud/native/peers/1.0` | Tier native supprimé |
|
||||
| `/opencloud/indexer/natives/1.0` | Tier native supprimé |
|
||||
| `/opencloud/indexer/consensus/1.0` | Remplacé par TriggerConsensus |
|
||||
| `/opencloud/resource/heartbeat/partner/1.0` | Heartbeat partner supprimé — on-demand |
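To make the protocol tables concrete, the active IDs can be grouped as typed constants; only the protocol strings below come from the table, the Go names are assumptions:

```go
package main

import "github.com/libp2p/go-libp2p/core/protocol"

// Active protocol IDs from the table above, declared once so stream handlers
// and NewStream calls share the same definition.
const (
	ProtocolHeartbeat         = protocol.ID("/opencloud/heartbeat/1.0")
	ProtocolProbe             = protocol.ID("/opencloud/probe/1.0")
	ProtocolWitness           = protocol.ID("/opencloud/witness/1.0")
	ProtocolRecordPublish     = protocol.ID("/opencloud/record/publish/1.0")
	ProtocolRecordGet         = protocol.ID("/opencloud/record/get/1.0")
	ProtocolSearchResource    = protocol.ID("/opencloud/resource/search/1.0")
	ProtocolCreateResource    = protocol.ID("/opencloud/resource/create/1.0")
	ProtocolUpdateResource    = protocol.ID("/opencloud/resource/update/1.0")
	ProtocolDeleteResource    = protocol.ID("/opencloud/resource/delete/1.0")
	ProtocolSendPlanner       = protocol.ID("/opencloud/resource/planner/1.0")
	ProtocolVerifyResource    = protocol.ID("/opencloud/resource/verify/1.0")
	ProtocolConsidersResource = protocol.ID("/opencloud/resource/considers/1.0")
)
```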
|
||||
|
||||
69
docs/diagrams/connection_gater.puml
Normal file
69
docs/diagrams/connection_gater.puml
Normal file
@@ -0,0 +1,69 @@
@startuml connection_gater
title ConnectionGater - admission check (InterceptSecured)

participant "Remote Peer\n(inbound)" as Remote
participant "libp2p\nhost A" as Host
participant "OCConnectionGater" as Gater
participant "DB (oc-lib)" as DB
participant "Indexer X\n(reachable)" as IX
participant "Indexer Y\n(unreachable)" as IY

Remote -> Host: inbound connection (post-PSK, post-TLS)
Host -> Gater: InterceptSecured(dir=Inbound, id=RemotePeerID, conn)

alt dir == Outbound
Gater --> Host: true (outbound always allowed)
end

== Step 1: database check ==

Gater -> DB: NewRequestAdmin(PEER).Search(\n Filter: peer_id = RemotePeerID\n)
DB --> Gater: []peer.Peer

alt found AND relation == BLACKLIST
Gater --> Host: false (denied: blacklisted)
Host ->x Remote: connection closed
end

alt found AND relation != BLACKLIST
Gater --> Host: true (known and not blacklisted)
end

== Step 2: DHT check (peer unknown in DB) ==

note over Gater: Unknown peer → verify that it exists\nin the DHT network

Gater -> Gater: getReq = GetValue{PeerID: RemotePeerID}

loop For each indexer (random order, Shuffle)

alt Indexer IY unreachable (transport error)
Gater -> IY: h.Connect(ctxTTL, IY_AddrInfo)
IY -->x Gater: connection failed
note over Gater: reachable=false\n→ try the next one
end

alt Indexer IX reachable
Gater -> IX: h.Connect(ctxTTL, IX_AddrInfo)
IX --> Gater: OK
Gater -> IX: TempStream /opencloud/record/get/1.0
Gater -> IX: stream.Encode(GetValue{PeerID: RemotePeerID})
IX -> IX: Local lookup + DHT if absent
IX --> Gater: GetResponse{Found: true/false, Records}
note over Gater: reachable=true → authoritative answer\n(distributed DHT: one indexer is enough)

alt Found == true
Gater --> Host: true (peer known to the network)
else Found == false
Gater --> Host: false (denied: unknown to the network)
Host ->x Remote: connection closed
end
end
end

alt No indexer reachable
note over Gater: Nascent network or all isolated.\nAllow by default.
Gater --> Host: true
end

@enduml

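As a rough illustration of the gate above, here is a sketch of the `InterceptSecured` hook from go-libp2p's connection-gater interface. The DB and network lookups are placeholder callbacks (`knownInDB`, `knownToNetwork`) standing in for the oc-lib query and the `/opencloud/record/get/1.0` round trip; this is not the actual OCConnectionGater implementation.

```go
package main

import (
	"github.com/libp2p/go-libp2p/core/control"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
	ma "github.com/multiformats/go-multiaddr"
)

// ocGater mirrors the admission logic of the diagram: outbound is always
// allowed, inbound peers must be non-blacklisted in DB or known to the DHT.
type ocGater struct {
	knownInDB      func(peer.ID) (found, blacklisted bool)      // placeholder for the oc-lib query
	knownToNetwork func(peer.ID) (reachableIndexer, found bool) // placeholder for /opencloud/record/get/1.0
}

func (g *ocGater) InterceptSecured(dir network.Direction, p peer.ID, _ network.ConnMultiaddrs) bool {
	if dir == network.DirOutbound {
		return true // outbound always allowed
	}
	if found, blacklisted := g.knownInDB(p); found {
		return !blacklisted
	}
	// Unknown in DB: ask one reachable indexer whether the peer exists in the DHT.
	if reachable, found := g.knownToNetwork(p); reachable {
		return found
	}
	// No indexer reachable (nascent network): allow by default.
	return true
}

// Remaining gater methods, permissive in this sketch.
func (g *ocGater) InterceptPeerDial(peer.ID) bool               { return true }
func (g *ocGater) InterceptAddrDial(peer.ID, ma.Multiaddr) bool { return true }
func (g *ocGater) InterceptAccept(network.ConnMultiaddrs) bool  { return true }
func (g *ocGater) InterceptUpgraded(network.Conn) (bool, control.DisconnectReason) {
	return true, 0
}
```
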
56
docs/diagrams/dht_discovery.puml
Normal file
56
docs/diagrams/dht_discovery.puml
Normal file
@@ -0,0 +1,56 @@
@startuml dht_discovery
title DHT discovery: Provide/FindProviders + SelectByFillRate + indexer dhtCache

participant "Indexer A\n(new)" as IA
participant "DHT Network" as DHT
participant "Node B\n(bootstrap)" as NodeB
participant "Indexer A\n(existing)" as IAexist

== Indexer registration in the DHT ==

note over IA: IndexerService startup\nstartDHTProvide(fillRateFn)

IA -> IA: Waits for a routable address (max 60s)\nnon-loopback available

IA -> DHT: DHT.Bootstrap(ctx)\n→ routing table warmup

loop ticker RecommendedHeartbeatInterval (~20s)
IA -> DHT: DHT.Provide(IndexerCID, true)\n← IndexerCID = CID(sha256("/opencloud/indexers"))
note over DHT: The indexer is announced as a provider.\nTTL managed by libp2p-kad-dht.\nAuto-expires if Provide() stops.
end

== Indexer's passive DHT cache ==

note over IA: startDHTCacheRefresh()\nbackground goroutine

IA -> IA: Initial delay 30s (routing table warmup)

loop ticker 2min
IA -> DHT: DiscoverIndexersFromDHT(h, dht, 30)\n← FindProviders(IndexerCID, max=30)
DHT --> IA: []AddrInfo (up to 30 candidates)
IA -> IA: Filter self\nSelectByFillRate(filtered, nil, 10)\n→ /24 diversity, prior f=0.5 (fill rates unknown)
IA -> IA: dhtCache = selected (max 10)\n→ used for Suggestions in BuildHeartbeatResponse
end

== Node-side discovery at bootstrap ==

NodeB -> NodeB: ConnectToIndexers → seeds added\nSendHeartbeat started

NodeB -> NodeB: proactive goroutine (after 5s warmup)

alt discoveryDHT == nil (pure node, no IndexerService)
NodeB -> DHT: initNodeDHT(h, seeds)\n← DHT client mode, bootstrapped on seeds
end

NodeB -> DHT: DiscoverIndexersFromDHT(h, discoveryDHT, need+extra)
DHT --> NodeB: []AddrInfo candidates

NodeB -> NodeB: Filter self\nSelectByFillRate(candidates, fillRates, need)\n→ weighting w(F) = F×(1-F)\n F=0.2 → w=0.16 (very likely)\n F=0.5 → w=0.25 (max)\n F=0.8 → w=0.16 (unlikely)\n→ /24 diversity filter

loop For each selected candidate
NodeB -> NodeB: Indexers.SetAddr(key, &addrInfo)\nNudgeIt() → immediate heartbeat
end

note over NodeB: Pool enriched beyond the seeds.\nScoring starts at the first heartbeat.\nSeeds remain IsSeed=true (stickiness).

@enduml

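The fill-rate weighting shown in the diagram, w(F) = F×(1-F) with a neutral prior of F = 0.5 when unknown, plus a /24 diversity pass, can be written as a small self-contained routine. The candidate type, helper names, and the way the subnet filter is applied are assumptions for illustration and do not mirror the real SelectByFillRate signature.

```go
package main

import (
	"math/rand"
	"net"
	"strings"
)

type candidate struct {
	Addr string  // e.g. "51.15.23.8:4001"
	Fill float64 // fill rate in [0,1]; <0 means unknown
}

// weight prefers half-full indexers: empty or saturated ones score lower.
func weight(f float64) float64 {
	if f < 0 {
		f = 0.5 // unknown fill rate: neutral prior
	}
	return f * (1 - f) // 0.2→0.16, 0.5→0.25, 0.8→0.16
}

// selectByFillRate draws up to n candidates, weighted by w(F), keeping at
// most one candidate per /24 subnet.
func selectByFillRate(cands []candidate, n int) []candidate {
	pool := append([]candidate(nil), cands...)
	seen := map[string]bool{} // /24 prefixes already used
	var out []candidate
	for len(out) < n && len(pool) > 0 {
		total := 0.0
		for _, c := range pool {
			total += weight(c.Fill)
		}
		r, acc, pick := rand.Float64()*total, 0.0, 0
		for i, c := range pool {
			acc += weight(c.Fill)
			if r <= acc {
				pick = i
				break
			}
		}
		c := pool[pick]
		pool = append(pool[:pick], pool[pick+1:]...)
		if p := subnet24(c.Addr); p != "" && seen[p] {
			continue // /24 diversity: skip a second pick from the same subnet
		} else if p != "" {
			seen[p] = true
		}
		out = append(out, c)
	}
	return out
}

func subnet24(addr string) string {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		host = addr
	}
	parts := strings.Split(host, ".")
	if len(parts) != 4 {
		return "" // non-IPv4: no /24 constraint applied
	}
	return strings.Join(parts[:3], ".")
}
```
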
41
docs/diagrams/hb_failure_evict.puml
Normal file
41
docs/diagrams/hb_failure_evict.puml
Normal file
@@ -0,0 +1,41 @@
@startuml hb_failure_evict
title HeartbeatFailure → evictPeer → TriggerConsensus or DHT replenish

participant "Node A" as NodeA
participant "Indexer X\n(failing)" as IX
participant "Indexer Y\n(voter)" as IY
participant "Indexer Z\n(voter)" as IZ
participant "DHT" as DHT
participant "Indexer NEW\n(candidate)" as INEW

note over NodeA: SendHeartbeat tick - Indexer X in the pool

NodeA -> IX: stream.Encode(Heartbeat{...})
IX -->x NodeA: timeout / transport error

NodeA -> NodeA: HeartbeatFailure(h, proto, dir, addr_X, info_X, isIndexerHB=true, maxPool)

NodeA -> NodeA: evictPeer(dir, addr_X, id_X, proto)\n→ Streams.Delete(proto, &id_X)\n→ DeleteAddr(addr_X)\n→ DeleteScore(addr_X)\n→ voters = remaining AddrInfos

NodeA -> NodeA: poolSize = len(dir.GetAddrs())

alt poolSize == 0
NodeA -> NodeA: reconnectToSeeds()\n→ re-injects IndexerAddresses (IsSeed=true)
alt seeds added
NodeA -> NodeA: need = maxPool\nNudgeIt() → immediate tick
else no seed configured or seeds unreachable
NodeA -> NodeA: go retryUntilSeedResponds()\n(backoff 10s→5min, panic if IndexerAddresses is empty)
end
else poolSize > 0 AND len(voters) > 0
NodeA -> NodeA: go TriggerConsensus(h, voters, need)
NodeA -> IY: stream GET → GetValue{Key: candidate_DID}
IY --> NodeA: GetResponse{Found, Records}
NodeA -> IZ: stream GET → GetValue{Key: candidate_DID}
IZ --> NodeA: GetResponse{Found, Records}
note over NodeA: Quorum check:\nfound=true AND lastSeen ≤ 2×interval\nAND lastScore ≥ 30\n→ majority → INEW admitted
NodeA -> NodeA: Indexers.SetAddr(addr_NEW, &INEW_AddrInfo)\nIndexers.SetScore(addr_NEW, Score{IsSeed:false})\nNudgeIt()
else poolSize > 0 AND len(voters) == 0
NodeA -> DHT: go replenishIndexersFromDHT(h, need)\nDiscoverIndexersFromDHT → SelectByFillRate\n→ add to Indexers Directory
end

@enduml

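The quorum step in the diagram reduces to a simple majority over per-voter checks. The sketch below uses the thresholds stated in the note (freshness within 2× the heartbeat interval, score of at least 30); the `Vote` type and function name are assumed for the example.

```go
package main

import "time"

// Vote is one voter's GetResponse about the candidate indexer (illustrative type).
type Vote struct {
	Found     bool
	LastSeen  time.Time
	LastScore int
}

// admitCandidate applies the per-voter freshness/score checks and requires a
// strict majority of positive votes before the candidate joins the pool.
func admitCandidate(votes []Vote, heartbeatInterval time.Duration, now time.Time) bool {
	if len(votes) == 0 {
		return false
	}
	positive := 0
	for _, v := range votes {
		fresh := now.Sub(v.LastSeen) <= 2*heartbeatInterval
		if v.Found && fresh && v.LastScore >= 30 {
			positive++
		}
	}
	return positive*2 > len(votes) // strict majority
}
```
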
46
docs/diagrams/hb_last_indexer.puml
Normal file
46
docs/diagrams/hb_last_indexer.puml
Normal file
@@ -0,0 +1,46 @@
@startuml hb_last_indexer
title Last-indexer protection → reconnectToSeeds → retryUntilSeedResponds

participant "Node A" as NodeA
participant "Indexer LAST\n(only one left)" as IL
participant "Seed Indexer\n(config)" as SEED
participant "DHT" as DHT

note over NodeA: Pool = 1 indexer (LAST)\nIsSeed=false, low score for a long time

== Score-based eviction attempt ==
NodeA -> NodeA: score < minScore\nAND TotalOnline ≥ 2×interval\nAND !IsSeed\nAND len(pool) > 1 ← FALSE: pool == 1

note over NodeA: Active guard: len(pool) == 1\n→ score-based eviction BLOCKED\nLAST stays in the pool

== Network failure (heartbeat fail) ==
NodeA -> IL: stream.Encode(Heartbeat{...})
IL -->x NodeA: timeout

NodeA -> NodeA: HeartbeatFailure → evictPeer(LAST)\npoolSize = 0

NodeA -> NodeA: reconnectToSeeds()\n→ parses IndexerAddresses (conf)\n→ SetAddr + SetScore(IsSeed=true) for each seed

alt seeds added (IndexerAddresses not empty)
NodeA -> NodeA: NudgeIt() → immediate tick
NodeA -> SEED: Heartbeat{...} (via SendHeartbeat nudge)
SEED --> NodeA: HeartbeatResponse{fillRate, ...}
note over NodeA: Pool restored via seeds.\nProactive DHT discovery resumes.

else IndexerAddresses empty
NodeA -> NodeA: go retryUntilSeedResponds()
note over NodeA: immediate panic:\n"pool is empty and no seed indexers configured"\n→ process stops
end

== retryUntilSeedResponds (if seeds do not respond) ==
loop exponential backoff (10s → 20s → ... → 5min)
NodeA -> NodeA: time.Sleep(backoff)
NodeA -> NodeA: len(Indexers.GetAddrs()) > 0?\n→ yes: return (someone refilled the pool)
NodeA -> NodeA: reconnectToSeeds()
alt pool > 0 after reconnect
NodeA -> NodeA: NudgeIt()\nDHT.Bootstrap(ctx, 15s)
note over NodeA: Exit the loop.\nNormal heartbeat resumes.
end
end

@enduml

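The retry loop above is an exponential backoff capped at 5 minutes. Here is a compact sketch of that behaviour, with the pool and seed operations abstracted behind assumed callbacks; the real retryUntilSeedResponds operates on the node's Indexers directory directly.

```go
package main

import "time"

// retryUntilSeedResponds keeps trying to rebuild the indexer pool from the
// configured seeds, doubling the wait from 10s up to a 5min cap. It panics
// immediately if no seeds are configured, matching the diagram.
func retryUntilSeedResponds(poolSize func() int, seedsConfigured func() bool, reconnectToSeeds func() int) {
	if !seedsConfigured() {
		panic("pool is empty and no seed indexers configured")
	}
	backoff := 10 * time.Second
	for {
		time.Sleep(backoff)
		if poolSize() > 0 {
			return // someone (DHT replenish, consensus) refilled the pool
		}
		if reconnectToSeeds() > 0 {
			return // seeds answered: heartbeat loop resumes via NudgeIt()
		}
		if backoff *= 2; backoff > 5*time.Minute {
			backoff = 5 * time.Minute
		}
	}
}
```
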
6
go.mod
6
go.mod
@@ -3,10 +3,12 @@ module oc-discovery
go 1.25.0

require (
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5
cloud.o-forge.io/core/oc-lib v0.0.0-20260318143822-5976795d4406
github.com/ipfs/go-cid v0.6.0
github.com/libp2p/go-libp2p v0.47.0
github.com/libp2p/go-libp2p-record v0.3.1
github.com/multiformats/go-multiaddr v0.16.1
github.com/multiformats/go-multihash v0.2.3
)

require (
@@ -32,7 +34,6 @@ require (
github.com/hashicorp/golang-lru/v2 v2.0.7 // indirect
github.com/huin/goupnp v1.3.0 // indirect
github.com/ipfs/boxo v0.35.2 // indirect
github.com/ipfs/go-cid v0.6.0 // indirect
github.com/ipfs/go-datastore v0.9.0 // indirect
github.com/ipfs/go-log/v2 v2.9.1 // indirect
github.com/ipld/go-ipld-prime v0.21.0 // indirect
@@ -67,7 +68,6 @@ require (
github.com/multiformats/go-multiaddr-fmt v0.1.0 // indirect
github.com/multiformats/go-multibase v0.2.0 // indirect
github.com/multiformats/go-multicodec v0.10.0 // indirect
github.com/multiformats/go-multihash v0.2.3 // indirect
github.com/multiformats/go-multistream v0.6.1 // indirect
github.com/multiformats/go-varint v0.1.0 // indirect
github.com/pbnjay/memory v0.0.0-20210728143218-7b4eea64cf58 // indirect

20
go.sum
20
go.sum
@@ -1,13 +1,13 @@
cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7 h1:p9uJjMY+QkE4neA+xRmIRtAm9us94EKZqgajDdLOd0Y=
cloud.o-forge.io/core/oc-lib v0.0.0-20260224130821-ce8ef70516f7/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c h1:FTUu9tdEfib6J+fuc7e5wYTe++EIlB70bVNpOeFjnyU=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226084851-959fce48ef6c/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0 h1:lvrRF4ToIMl/5k1q4AiPEy6ycjwRtOaDhWnQ/LrW1ZA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226085754-f4e2d8057df0/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31 h1:hvkvJibS9NmImw73j79Ov5VpIYs4WbP4SYGlK/XO82Q=
cloud.o-forge.io/core/oc-lib v0.0.0-20260226091217-cb3771c17a31/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5 h1:h+Fkyj6cfwAirc0QGCBEkZSSrgcyThXswg7ytOLm948=
cloud.o-forge.io/core/oc-lib v0.0.0-20260302152414-542b0b73aba5/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260304145747-e03a0d3dd0aa h1:1wCpI4dwN1pj6MlpJ7/WifhHVHmCE4RU+9klwqgo/bk=
cloud.o-forge.io/core/oc-lib v0.0.0-20260304145747-e03a0d3dd0aa/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260311072518-933b7147e908 h1:1jz3xI/u2FzCG8phY7ShqADrmCj0mlrdjbdNUosSwgs=
cloud.o-forge.io/core/oc-lib v0.0.0-20260311072518-933b7147e908/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260312073634-2c9c42dd516a h1:oCkb9l/Cvn0x6iicxIydrjfCNU+UHhKuklFgfzDa174=
cloud.o-forge.io/core/oc-lib v0.0.0-20260312073634-2c9c42dd516a/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260312141150-a335c905b3a2 h1:DuB6SDThFVJVQ0iI0pZnBqtCE0uW+SNI7R7ndKixu2k=
cloud.o-forge.io/core/oc-lib v0.0.0-20260312141150-a335c905b3a2/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
cloud.o-forge.io/core/oc-lib v0.0.0-20260318143822-5976795d4406 h1:FN1EtRWn228JprAbnY5K863Fzj+SzMqQtKRtwvECbLw=
cloud.o-forge.io/core/oc-lib v0.0.0-20260318143822-5976795d4406/go.mod h1:+ENuvBfZdESSvecoqGY/wSvRlT3vinEolxKgwbOhUpA=
github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
github.com/Masterminds/semver/v3 v3.4.0 h1:Zog+i5UMtVoCU8oKka5P7i9q9HgrJeGzI9SA1Xbatp0=
github.com/Masterminds/semver/v3 v3.4.0/go.mod h1:4V+yj/TJE1HU9XfppCwVMZq3I84lprf4nC11bSS5beM=

8
main.go
8
main.go
@@ -2,7 +2,6 @@ package main
import (
"context"
"fmt"
"log"
"oc-discovery/conf"
"oc-discovery/daemons/node"
@@ -28,7 +27,7 @@ func main() {
conf.GetConfig().PSKPath = o.GetStringDefault("PSK_PATH", "./psk/psk.key")
conf.GetConfig().NodeEndpointPort = o.GetInt64Default("NODE_ENDPOINT_PORT", 4001)
conf.GetConfig().IndexerAddresses = o.GetStringDefault("INDEXER_ADDRESSES", "")
conf.GetConfig().NativeIndexerAddresses = o.GetStringDefault("NATIVE_INDEXER_ADDRESSES", "")

conf.GetConfig().PeerIDS = o.GetStringDefault("PEER_IDS", "")

@@ -43,12 +42,9 @@ func main() {
syscall.SIGTERM,
)
defer stop()
fmt.Println(conf.GetConfig().NodeMode)
isNode := strings.Contains(conf.GetConfig().NodeMode, "node")
isIndexer := strings.Contains(conf.GetConfig().NodeMode, "indexer")
isNativeIndexer := strings.Contains(conf.GetConfig().NodeMode, "native-indexer")

if n, err := node.InitNode(isNode, isIndexer, isNativeIndexer); err != nil {
if n, err := node.InitNode(isNode, isIndexer); err != nil {
panic(err)
} else {
<-ctx.Done() // the only blocking point

@@ -1,3 +0,0 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIPc7D3Mgb1U2Ipyb/85hA4Ew7dC8zHDEuQYSjqzzRgLK
-----END PRIVATE KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIK2oBaOtGNchE09MBRtPd5oEOUcVUQG2ndym5wKExj7R
-----END PRIVATE KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIE58GDazCyF1jp796ivSmHiCepbkC8TpzliIaQ7eGEpu
-----END PRIVATE KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIAeX4O7ldwehRSnPkbzuE6csyo63vjvqAcNNujENOKUC
-----END PRIVATE KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIEkgqINXDLnxIJZs2LEK9O4vdsqk43dwbULGUE25AWuR
-----END PRIVATE KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PRIVATE KEY-----
MC4CAQAwBQYDK2VwBCIEIBcflxGlZYyUVJoExC94rHZbIyKMwZ+Oh7EDkb0qUlxd
-----END PRIVATE KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAEomuEQGmGsYVw35C6DB5tfY8LI8jm359ceAxRX8eQ0o=
-----END PUBLIC KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAZ2nLJBL8a5opfa8nFeVj0SZToW8pl4+zgcSUkeZFRO4=
-----END PUBLIC KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAIQVeSGwsjPjyepPTnzzYqVxIxviSEjZXU7C7zuNTui4=
-----END PUBLIC KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAG95Ettl3jTi41HM8le1A9WDmOEq0ANEqpLF7zTZrfXA=
-----END PUBLIC KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEA/ymOIb0sJ0qCWrf3mKz7ACCvsMXLog/EK533JfNXZTM=
-----END PUBLIC KEY-----
@@ -1,3 +0,0 @@
-----BEGIN PUBLIC KEY-----
MCowBQYDK2VwAyEAZ4F3KqOp/5QrPdZGqqX6PYYEGd2snX4Q3AUt9XAG3v8=
-----END PUBLIC KEY-----