Skip to main content

DHT

DHT stands for Distributed Hash Table and is essentially a distributed key-value database, where each member of the network can store something, for example, information about themselves.

The implementation of DHT in TON is inherently similar to the implementation of Kademlia, which is used in IPFS. Any network member can run a DHT node, generate keys and store data. To do this, he needs to generate a random ID and inform other nodes about himself.

To determine which node to store the data on, an algorithm is used to determine the "distance" between the node and the key. The algorithm is simple: we take the ID of the node and the ID of the key, we perform the XOR operation. The smaller the value, the closer the node. The task is to store the key on the nodes as close as possible to the key, so that other network participants can, using the same algorithm, find a node that can give data on this key.

Finding a value by key

Let's look at an example with a search for a key, connect to any DHT node and establish a connection via ADNL UDP.

For example, we want to find the address and public key for connecting to the node that hosts foundation.ton site. Let's say we have already obtained the ADNL address of this site by executing the Get method of the DNS contract. The ADNL address in hex representation is 516618cf6cbe9004f6883e742c9a2e3ca53ed02e3e36f4cef62a98ee1e449174. Now our goal is to find the ip, port and public key of the node that has this address.

To do this, we need to get the ID of the DHT key, first we will fill the DHT key schema:

dht.key id:int256 name:bytes idx:int = dht.Key

name is the type of key, for ADNL addresses the word address is used, and, for example, to search for shardchain nodes - nodes. But the key type can be any array of bytes, depending on the value you are looking for.

Filling in this schema, we get:

8fde67f6                                                           -- TL ID dht.key
516618cf6cbe9004f6883e742c9a2e3ca53ed02e3e36f4cef62a98ee1e449174 -- our searched ADNL address
07 61646472657373 -- key type, the word "address" as an TL array of bytes
00000000 -- index 0 because there is only 1 key

Next - get the key ID, sha256 hash from the bytes serialized above. It will be b30af0538916421b46df4ce580bf3a29316831e0c3323a7f156df0236c5b2f75

Now we can start searching. To do this, we need to execute a query that has schema:

dht.findValue key:int256 k:int = dht.ValueResult

key is the id of our DHT key, and k is the "width" of the search, the smaller it is, the more accurate, but fewer potential nodes to query. The maximum k for nodes in a TON is 10, usually 6 is used.

Let's populate this structure, serialize and send the request using the adnl.message.query schema. You can read more about this in another article.

In response, we can get:

  • dht.valueNotFound - if the value is not found.
  • dht.valueFound - if the value is found on this node.
dht.valueNotFound

If we get dht.valueNotFound, the response will contain a list of nodes that are known to the node we requested and are as close as possible to the key we requested from the list of nodes known to it. In this case, we need to connect and add the received nodes to the list known to us. After that, from the list of all nodes known to us, select the closest, accessible and not yet requested, and make the same request to it. And so on until we try all the nodes in the range we have chosen or until we stop receiving new nodes.

Let's analyze the response fields in more detail, the schemas used:

adnl.address.udp ip:int port:int = adnl.Address;
adnl.addressList addrs:(vector adnl.Address) version:int reinit_date:int priority:int expire_at:int = adnl.AddressList;

dht.node id:PublicKey addr_list:adnl.addressList version:int signature:bytes = dht.Node;
dht.nodes nodes:(vector dht.node) = dht.Nodes;

dht.valueNotFound nodes:dht.nodes = dht.ValueResult;

dht.nodes -> nodes - list of DHT nodes (array).

Each node has an id which is its public key, usually pub.ed25519, used as a server key to connect to the node via ADNL. Also, each node has a list of addresses addr_list:adnl.addressList, version and signature.

We need to check the signature of each node, for this we read the value of signature and set the field to zero (we make it an empty bytes array). After - we serialize the TL structure dht.node with the empty signature and check the signature field that was before emptying. We check on the received serialized bytes, using the public key from id field. [Implementation example]

From the list addrs:(vector adnl.Address) we take the address and try to establish an ADNL UDP connection, as the server key we use id, which is the public key.

To find out the "distance" to this node - we need to take key id from the key from the id field and check the distance by the XOR operation from the node's key id and the desired key. If the distance is small enough, we can make the same request to this node. And so on, until we find a value or there are no more new nodes.

dht.valueFound

The response will contain the value itself, the full key information, and optionally a signature (depends on value type).

Let's analyze the response fields in more detail, the schemas used:

adnl.address.udp ip:int port:int = adnl.Address;
adnl.addressList addrs:(vector adnl.Address) version:int reinit_date:int priority:int expire_at:int = adnl.AddressList;

dht.key id:int256 name:bytes idx:int = dht.Key;

dht.updateRule.signature = dht.UpdateRule;
dht.updateRule.anybody = dht.UpdateRule;
dht.updateRule.overlayNodes = dht.UpdateRule;

dht.keyDescription key:dht.key id:PublicKey update_rule:dht.UpdateRule signature:bytes = dht.KeyDescription;

dht.value key:dht.keyDescription value:bytes ttl:int signature:bytes = dht.Value;

dht.valueFound value:dht.Value = dht.ValueResult;

First, let's analyze key:dht.keyDescription, it is a complete description of the key, the key itself and information about who and how can update the value.

  • key:dht.key - the key must match the one from which we took the key ID for the search.
  • id:PublicKey - the public key of the record owner.
  • update_rule:dht.UpdateRule - record update rule.
    • dht.updateRule.signature - only the owner of the private key can update the record, the signature of both the key and the value must be valid
    • dht.updateRule.anybody - everyone can update the record, signature is empty and not checked
    • dht.updateRule.overlayNodes - nodes from the same overlay can update the key, used to find nodes of the same overlay and add yourself
dht.updateRule.signature

After reading the description of the key, we act depending on the updateRule, for the ADNL address lookup case the type is always dht.updateRule.signature. We check the key signature in the same way as last time, make the signature an empty byte array, serialize and check. After - we repeat the same for the value, i.e. for the entire dht.value object (while returning the key signature to its place).

[Implementation example]

dht.updateRule.overlayNodes

Used for keys containing information about other nodes-shards of the workchain in the network, the value always has the TL structure overlay.nodes. The value field must be empty.

overlay.node id:PublicKey overlay:int256 version:int signature:bytes = overlay.Node;
overlay.nodes nodes:(vector overlay.node) = overlay.Nodes;

To check for validity, we must check all nodes and for each check signature against its id by serializing the TL structure:

overlay.node.toSign id:adnl.id.short overlay:int256 version:int = overlay.node.ToSign;

As we can see, id should be replaced with adnl.id.short, which is the key id (hash) of the id field from the original structure. After serialization - we check the signature with the data.

As a result, we get a valid list of nodes that are able to give us information about the workchain shard we need.

dht.updateRule.anybody

There are no signatures, anyone can update.

Using a value

When everything is verified and the ttl:int value has not expired, we can start working with the value itself, i.e. value:bytes. For an ADNL address, there must be an adnl.addressList structure inside. It will contain ip addresses and ports of servers corresponding to the requested ADNL address. In our case, there will most likely be 1 RLDP-HTTP address of the foundation.ton service. We will use the public key id:PublicKey from the DHT key information as the server key.

After the connection is established, we can request the pages of the site using the RLDP protocol. The task from the DHT side at this stage is completed.

Search for nodes that store the state of the blockchain

DHT is also used to find information about the nodes that store the data of workchains and their shards. The process is the same as when searching for any key, the only difference is in the serialization of the key itself and the validation of the response, we will analyze these points in this section.

In order to get data, for example, of the masterchain and its shards, we need to fill in the TL structure:

tonNode.shardPublicOverlayId workchain:int shard:long zero_state_file_hash:int256 = tonNode.ShardPublicOverlayId;

Where workchain in the case of a masterchain will be equal to -1, its shard will be equal to -922337203685477580 (0xFFFFFFFFFFFFFFFF), and zero_state_file_hash is the hash of the zero state of the chain (file_hash), like other data, it can be taken from the global network config, in the "validator" field

"zero_state": {
"workchain": -1,
"shard": -9223372036854775808,
"seqno": 0,
"root_hash": "F6OpKZKqvqeFp6CQmFomXNMfMj2EnaUSOXN+Mh+wVWk=",
"file_hash": "XplPz01CXAps5qeSWUtxcyBfdAo5zVb1N979KLSKD24="
}

After we have filled in tonNode.shardPublicOverlayId, we serialize it and get the key id from it by hashing (as always).

We need to use the resulting key ID as name to fill in the pub.overlay name:bytes = PublicKey structure, wrapping it in TL bytes array. Next, we serialize it, and we get the key ID now from it.

The resulting id will be the key to use in dht.findValue, and the name field's value will be the word nodes. We repeat the process from the previous section, everything is the same as last time, but updateRule will be dht.updateRule.overlayNodes.

After validation - we will get the public keys (id) of the nodes that have information about our workchain and shard. To get the ADNL addresses of the nodes, we need to make IDs from the keys (using the hashing method) and repeat the procedure described above for each of the ADNL addresses, as with the ADNL address of the foundation.ton domain.

As a result, we will get the addresses of the nodes from which, if we want, we can find out the addresses of other nodes of this chain using overlay.getRandomPeers. We will also be able to receive all the information about the blocks from these nodes.

References

Here a link to the original article by Oleg Baranov.