Range partitioning
Range partitioning is a method of dividing a large dataset into smaller, more manageable chunks, based on a range of values.
For example, let's say we have a large dataset of customer orders, and we want to partition it based on the date the order was placed. We could divide the dataset into four partitions: orders placed in January-March, April-June, July-September, and October-December. Each partition would contain only the orders that fall within the specified date range.
This diagram shows how the original dataset of customer orders is divided into four partitions based on the date range of the orders. Each partition contains only the orders that fall within its specified date range, making it easier to manage and analyze the data.
Database management systems divide a table into multiple partitions, each containing a range of data based on a specific column or set of columns. Each partition is treated as a separate table that can be stored on various storage devices or servers.
Consider the "sales" table, which stores information about products sold in a store. "product id," "product name," "product category," "price," and "date sold" are the columns in the table. We can use range partitioning based on the "date sold" column to improve query performance on this table.
In this example, the "sales" table is partitioned into two partitions, "Partition 1" and "Partition 2", based on the date_sold column. Partition 1 contains all rows with a date_sold between January 1st and June 30th, while Partition 2 contains all rows with a date_sold between July 1st and December 31st. When a query is executed on the "sales" table, the database management system can quickly determine which partition(s) contain the relevant data, rather than scanning the entire table. This improves the performance of the query and reduces the amount of data that needs to be scanned.