BNB Chain has unveiled a new multi-datastore solution for its BNB Smart Chain (BSC) Geth nodes, aimed at tackling performance inefficiencies caused by the rapid increase in data volume. According to the BNB Chain Blog, the new approach addresses issues related to mixed data storage patterns, decreased querying efficiency, and optimization conflicts within a single key-value database.
Current Challenges
Currently, BSC node data is stored in a single key-value database instance, categorized by different prefixes. This setup has led to several complications:
Inefficient performance due to mixed storage of data with different patterns.
Decreased querying efficiency as the database size grows, particularly during execution processes.
Limited ability to optimize database parameters for different data patterns, as read and write optimizations often conflict.
The existing storage pattern includes a single KV store and two Ancient stores, which handle different types of data access patterns.
Proposed Solution
Multi-Database Approach
The new solution involves segregating the blockchain data into three distinct databases: Block Database, Trie Database, and Snapshot Database, each designed according to specific data schemas and access behaviors.
Block Database: Stores block-related data such as headers, bodies, receipts, difficulties, and historical block data.
Trie Database: Contains all trie nodes of the current state, plus historical state data for nearly 90,000 blocks.
Snapshot Database: Houses snapshot data, transaction indexes, contract codes, and other metadata. This database is read-intensive and frequently accessed during block execution.
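The split described above can be sketched as a small routing table that sends each data category to one of the three databases. This is an illustrative model, not the BSC implementation: the category names, the `ROUTES` table, and the `MultiStore` class are assumptions, and plain dicts stand in for real database instances.

```python
from dataclasses import dataclass, field

# Categories taken from the article's description, mapped to the database
# said to hold them. The identifiers themselves are hypothetical.
ROUTES = {
    "header": "block",
    "body": "block",
    "receipt": "block",
    "difficulty": "block",
    "trie_node": "trie",
    "snapshot": "snapshot",
    "tx_index": "snapshot",
    "contract_code": "snapshot",
    "metadata": "snapshot",
}


def _empty_dbs() -> dict:
    return {"block": {}, "trie": {}, "snapshot": {}}


@dataclass
class MultiStore:
    """Three independent stores; each could be tuned or placed separately."""
    dbs: dict = field(default_factory=_empty_dbs)

    def put(self, category: str, key: bytes, value: bytes) -> None:
        # Key on (category, key) so categories sharing a database never collide.
        self.dbs[ROUTES[category]][(category, key)] = value

    def get(self, category: str, key: bytes):
        return self.dbs[ROUTES[category]].get((category, key))
```

For example, `MultiStore().put("trie_node", b"k", b"v")` writes only to the trie store and leaves the block and snapshot stores untouched.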
Folder Structure
The new folder structure includes the original database within the chaindata/ folder, and introduces new block/ and state/ folders for storing block and trie data, respectively. An ancient folder is also included for storing historical data under each directory.
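As a rough sketch of that layout, the helper below builds the directory paths. It assumes that block/ and state/ sit inside chaindata/ and that each of the three directories has its own ancient/ subfolder; the article does not spell out the nesting, so treat the exact tree as an assumption. The `layout` function and the data directory argument are hypothetical.

```python
from pathlib import PurePosixPath


def layout(datadir: str) -> dict[str, PurePosixPath]:
    """Return the assumed directory tree for a multi-database node."""
    root = PurePosixPath(datadir) / "chaindata"
    dirs = {
        "chaindata": root,        # original database
        "block": root / "block",  # block data (assumed nested under chaindata/)
        "state": root / "state",  # trie data (assumed nested under chaindata/)
    }
    # Each directory gets its own ancient/ folder for historical data.
    for name, path in list(dirs.items()):
        dirs[f"{name}/ancient"] = path / "ancient"
    return dirs


if __name__ == "__main__":
    for name, path in layout("/data/node").items():
        print(f"{name:18} {path}")
```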
Impact and Performance
The multi-database approach is expected to enhance the performance, scalability, and maintainability of BSC nodes. By separating databases based on data schemas and access behaviors, the solution aims to reduce read/write latency and improve overall blockchain performance.
Block Database
The Block Database will store recent blocks in a key-value database before migrating them to the ancient database, reducing disk bandwidth usage. BNB Chain plans to retain only 20-30 recent blocks in the key-value database, down from the previous 90,000, in line with its Proof of Staked Authority (PoSA) consensus mechanism.
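The retention policy can be modeled with a short sketch: new blocks go into a key-value store, and anything older than the retention window is moved, in order, into an append-only ancient store. The `BlockStore` class and the window of 30 (the upper end of the quoted 20-30 range) are illustrative assumptions, not the BSC freezer code.

```python
RECENT_BLOCKS = 30  # assumed window; the article quotes a 20-30 block range


class BlockStore:
    def __init__(self, keep: int = RECENT_BLOCKS):
        self.keep = keep
        self.kv: dict[int, bytes] = {}   # recent blocks, keyed by number
        self.ancient: list[bytes] = []   # append-only; index == block number

    def insert(self, number: int, block: bytes) -> None:
        self.kv[number] = block
        self._freeze(head=number)

    def _freeze(self, head: int) -> None:
        # Blocks numbered <= cutoff fall outside the window and are frozen.
        cutoff = head - self.keep
        while len(self.ancient) <= cutoff and len(self.ancient) in self.kv:
            next_number = len(self.ancient)
            self.ancient.append(self.kv.pop(next_number))

    def get(self, number: int):
        if number < len(self.ancient):
            return self.ancient[number]
        return self.kv.get(number)
```

After inserting blocks 0 through 99 with the default window, the key-value store holds the 30 most recent blocks and the ancient store holds the other 70.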
Trie Database
The Trie Database will handle the rapidly growing trie nodes of Merkle Patricia Tries (MPT). This separation will reduce database compaction costs and improve read/write speed, thereby enhancing block execution and verification performance.
Snapshot Database
By isolating snapshot data in its own database, BNB Chain aims to reduce the depth of the Log-Structured Merge (LSM) tree, improving read/write performance. Snapshot data is accessed frequently during block execution, so those reads benefit directly from the lower latency.
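A back-of-the-envelope sketch shows why a smaller database tends to have a shallower LSM tree. It assumes a simple leveled layout in which each level holds ten times the data of the one above it; the base level size and the data sizes are made-up figures for illustration, not BSC measurements.

```python
MIB = 1024 * 1024


def lsm_levels(data_bytes: int, base_level_bytes: int, ratio: int = 10) -> int:
    """Levels needed when level i holds base_level_bytes * ratio**i bytes."""
    levels = 1
    level_size = base_level_bytes
    capacity = base_level_bytes  # total capacity of all levels so far
    while capacity < data_bytes:
        level_size *= ratio
        capacity += level_size
        levels += 1
    return levels


if __name__ == "__main__":
    base = 256 * MIB                 # hypothetical base level size
    combined = 2 * 1024 * 1024 * MIB  # hypothetical 2 TiB single database
    snapshot_only = 200 * 1024 * MIB  # hypothetical 200 GiB snapshot database
    print(lsm_levels(combined, base), lsm_levels(snapshot_only, base))
```

Under these assumptions the combined store needs one more level than the isolated snapshot store, and a point read that misses the caches may have to consult one more level on disk.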
Testing Results
Tests conducted on an EC2 m6i.4xlarge machine running the BSC Geth client v1.3.10 showed significant performance improvements. The multi-database setup outperformed the single-database model, especially when the databases were distributed across multiple disks.
ETH Adoption
The multi-database solution is also being contributed to the Ethereum Geth client. Discussions with Geth developers are ongoing, and the feature is expected to become part of Ethereum Geth once the pull request is merged.
Looking Forward
As blockchain data continues to grow, BNB Chain emphasizes the importance of building efficient storage models for different data types. Multi-database support allows state data to be stored independently, paving the way for a high-performance state data engine. The initiative aims to make the BSC network more robust and efficient.