Database performance optimization has become one of the most important reasons to worry about these days.
The number of users has increased with time and so the data. We try to store every minor detail to provide better machine learning predictions.
I will be discussing all the approaches we have for pagination. Starting with the traditional approach. i.e.
Offset Based Approach
You have a website with e-commerce product listing or some other thing with a list of things that might be too long for a single page. So, you decide to break it into parts of, say, 50 items, and provide a button to go the next “page”.
For this, you will most probably choose to write below query
SELECT * FROM items WHERE [condition] ORDER BY date DESC OFFSET 40 LIMIT 10
This type of query is fine if we have less data or almost no data in our database, but when it comes to a large database having millions of rows this type of query can become a performance breaker.
What is wrong with OFFSET and LIMIT?
Using offset and limits is perfectly fine, until and unless data is less.
The issue arises when your database has more data and you still need to paginate performantly through them all.
A Full table scan (sequential scan) is done when we use offset, if the offset is 40, it will check the first 40 rows and when then the condition is met, it will display the result of the next 10 rows. Scanning the whole table is not an issue, if the database is small but when it comes to millions of rows then it slows down the performance. Let’s say the offset is 85000. It will first scan all the 85000 rows before printing the rows from 85001 to 85010.
This approach has another big flaw: if the results list has changed between calls to the API, the indices would shift and cause an item to be either returned twice or skipped and never returned.
A second approach is time-based paging. So if your API returns result sorted by newest-first, you could pass the
created date of the last item in the list to get the next page of items created before it:
This solves the problem of the offset approach because results are no longer skipped. If you query the first page, and then a new item is deleted, it won’t shift the results on your second page and all is fine. However, this approach has a major flaw: what if there is more than one item that was created at the same time? So if there were 11 items with the
created timestamp of exactly
1505086265160, then when we query the next page, we’ll miss the 11th entry because we’re querying items created before 15050854654160. You’ll miss results.
SELECT * FROM items WHERE id > 10 LIMIT 10
The best way to build API pagination safely is for the API to return a “cursor” with the list of results. This token is usually called
next_cursor This token can be passed with the next API request to request the next page.
Here’s an example from the API:
By returning a “cursor”, the API guarantees that it will return exactly the next entry in the list, regardless of what changes happen to the collection between API calls. Think of the cursor as a permanent marker in the list that says “we left off here”.
Since a cursor is just a marker to say “we left off here”, we can simply make it the value of a column in the table. The column must satisfy the following properties:
- Unique. This value must be unique for all rows in the table.
- Orderable. Every value of this column must be able to be compared to every other value. This is because we need to sort by this column in order to get a list of results to then return in pages.
- Immutable. The value in this column cannot change, otherwise, the pages wouldn’t return the right information.
The main takeaway of this should be to always check how your queries behave and perform whether it has 1000 rows or 100000 rows. Scalability and reliability are of extreme importance and if implemented properly from the beginning could surely avoid many issues in the future.
You have a website with news articles, or a blog, or some other thing with a list of things that might be too long for…
API Paging Built The Right Way | Mixmax Engineering Blog
Mixmax has a very extensive list of APIs that give our users direct access to all of their data. Some APIs, such as…