代做Assignment 3: B-Tree Layer代做数据库编程

- 首页 >> C/C++编程

Assignment 3: B-Tree Layer

Introduction

In this assignment, you must implement the missing pieces of the B-Tree layer including the logic for insert, delete, and balance operations in a B-tree data structure. This assignment tests your knowledge of the B-Tree data structures and algorithms, level up your system design skills, and get a feel for how things work under the hood in a database. When you conclude this assignment, you will better understand how the B-Tree layer allows its upper layer (VDBE) to execute SQL operations by:

● Creating tables and indexes

● Inserting, searching, and deleting key-data pairs in O(log n) time

● Traversing through key-data pairs in the order of the keys

● Starting transactions

Task 1: Complete B-Tree Balance Function

This task aims to help you understand how a B-Tree-inspired data storage engine works under the hood. B-Trees are designed to keep their height low, even as they grow. A balancing function ensures that the tree remains balanced after inserts and deletes, so search, insert, and delete operations run in O(log n) time. In this part, you have to finish the balance functions in the B-Tree layer. These functions are responsible for balancing the file structure. Without balance, the tree could become skewed (like a linked list), slowing operations and defeating the purpose of using a B-Tree.

What is a B-Tree?

A. B-Tree Definition

B-Tree is a self-balancing search tree in which each node can have more than one element and more than two children. It is a generalized form. of a binary search tree. A B-Tree keeps data sorted and allows searches, insertions, and deletion in logarithmic time:  O(log n).

In a traditional binary search tree, we only have 1 element stored inside a node. This means that each node has at most 2 child pointers (right and left pointers), as shown in Figure 1.

Figure 1

On the other hand, a B-Tree allows more than one element to be inserted inside a single node. This results in nodes having more than two child pointers.

The Figure 2 below shows the structure of a B-Tree. Each internal node with k elements has k+1 child pointers. The elements inside each node are also sorted.

 

Figure 2: Example of B-tree (source)

As mentioned before, the time complexity of searches, insertions, and deletions on a B-Tree is O(log n), where n is the number of elements in the tree. This time is not affected by the maximum number of elements we are allowed to insert in each node, normally denoted as m. This is the same complexity as the operations on a self-balancing binary search tree.

B. Tree in File Structure Design

At first glance, B-Trees do not provide much benefit compared to a binary search tree. However, creating a search tree with multiple elements in 1 node becomes relevant when designing memory-efficient file structures.

In CapybaraDB, all the data in a database is stored within one file. The file is split into multiple sections of equal sizes in bytes called pages. A majority of the pages are linked into search-trees, such that each node in the tree is represented as a page, named node page.

The major difference between B-Trees in definitions and B-Trees in file format is the size limiting factor. In most definitions of the B-Tree, a constant value m denotes the order of a B-Tree. In other words, itt determines how many elements can be placed in a node and when to split the node for balancing purposes. In file format, we are limited by a constant page size (e.g., 1 KB). We determine when to split based on how much free space there is on a page.

TODOs

The codebase (file btree_balance.cc) contains several "TODO" comments to guide you. These markers indicate the locations where you need to insert the implementation logic to balance the B-Tree. Pay close attention to these "TODOs" as they directly correspond to the following key functions:

● B-Tree::Balance():

○ This method is responsible for rebalancing the B-tree after insertions or deletions. The "TODO" within this function will require you to complete some logic for redistributing cells across pages when a node becomes overfull. This may include identifying sibling pages, splitting or merging nodes, updating parent pointers, and ensuring the cursor remains correctly positioned after the rebalancing process. 

● B-Tree::BalanceHelperHandleRoot():

○ This helper method is responsible for handling balance operations specifically on the root page of the B-tree, which requires special treatment. The "TODO" within this function will require you to implement logic that preserves the root’s identity while restructuring its contents. This may include copying data to a child node, clearing the root page, adjusting parent pointers, and determining whether the balancing operation can return early. 

● B-Tree::BalanceHelperFindChildIdx():

○ This helper method is responsible for checking which cell in the parent node points at the current page node. The "TODO" within this function will require you to implement logic that finds the cell index in the node page that points to the current node page.

Task 2: Complete B-Tree Insertion and Deletion Functions

In this task, your goal is to implement the logic behind insertion and deletion which are the key operations that B-Tree needs to support. These functions rely heavily on cursor navigation, page manipulation, overflow handling, and the balance function implemented in Task 1. Your implementation must ensure correctness even in edge cases — such as replacing an existing key or deleting from internal or leaf nodes. This task will deepen your understanding of how B-Trees maintain structure and correctness as data is added and removed.

TODOs

The codebase  (file btree_bt_cursor_public.cc)  contains several "TODO" comments to guide you. These markers indicate the locations where you need to insert the implementation logic. Pay close attention to these "TODOs" as they directly correspond to the following key functions:

● B-Tree::B-TreeInsert():

○ This method is responsible for inserting a key-data pair into the B-tree. The "TODO" within this function will require you to complete logic for locating the correct position for insertion, handling overflow pages when keys or data are too large, replacing existing entries with the same key, and calling the balance function to maintain the tree structure. You will also need to ensure that the cursor correctly reflects the new insertion point.

● B-Tree::B-TreeDelete():

○ This method is responsible for deleting the key-data pair that the cursor is currently pointing to. The "TODO" within this function will require you to implement logic to remove the entry from the page, handle overflow cleanup, and distinguish between deletion in internal vs. leaf nodes. You will also need to invoke the balancing function afterward to ensure the B-tree maintains its structural properties following the deletion.

Submission

For this assignment, you are required to submit your work via Autolab. Please submit a .tar file that contains only the following two files:

● btree_balance.cc

● btree_bt_cursor_public.cc




站长地图