All about pagination — Database Performance optimization

Database performance optimization has become one of the most important reasons to worry about these days.

The number of users has increased with time and so the data. We try to store every minor detail to provide better machine learning predictions.

I will be discussing all the approaches we have for pagination. Starting with the traditional approach. i.e.

Offset Based Approach

For this, you will most probably choose to write below query

SELECT  * FROM  items WHERE  [condition] ORDER BY  date DESC OFFSET  40  LIMIT 10

This type of query is fine if we have less data or almost no data in our database, but when it comes to a large database having millions of rows this type of query can become a performance breaker.

What is wrong with OFFSET and LIMIT?

The issue arises when your database has more data and you still need to paginate performantly through them all.

A Full table scan (sequential scan) is done when we use offset, if the offset is 40, it will check the first 40 rows and when then the condition is met, it will display the result of the next 10 rows. Scanning the whole table is not an issue, if the database is small but when it comes to millions of rows then it slows down the performance. Let’s say the offset is 85000. It will first scan all the 85000 rows before printing the rows from 85001 to 85010.

This approach has another big flaw: if the results list has changed between calls to the API, the indices would shift and cause an item to be either returned twice or skipped and never returned.

Time-based paging

Example API

curl https://api.mywebsite.com/items?created_before_timestamp=15050888888&limit=10

This solves the problem of the offset approach because results are no longer skipped. If you query the first page, and then a new item is deleted, it won’t shift the results on your second page and all is fine. However, this approach has a major flaw: what if there is more than one item that was created at the same time? So if there were 11 items with the created timestamp of exactly 1505086265160, then when we query the next page, we’ll miss the 11th entry because we’re querying items created before 15050854654160. You’ll miss results.

Cursor-based paging

SELECT  * FROM  items WHERE  id > 10  LIMIT 10

The best way to build API pagination safely is for the API to return a “cursor” with the list of results. This token is usually called next or next_cursor This token can be passed with the next API request to request the next page.

Here’s an example from the API:

curl https://api.mywebsite.com/items?&cursor=10

By returning a “cursor”, the API guarantees that it will return exactly the next entry in the list, regardless of what changes happen to the collection between API calls. Think of the cursor as a permanent marker in the list that says “we left off here”.

Since a cursor is just a marker to say “we left off here”, we can simply make it the value of a column in the table. The column must satisfy the following properties:

  • Unique. This value must be unique for all rows in the table.
  • Orderable. Every value of this column must be able to be compared to every other value. This is because we need to sort by this column in order to get a list of results to then return in pages.
  • Immutable. The value in this column cannot change, otherwise, the pages wouldn’t return the right information.

Conclusion

References

I am Indian by birth, Punjabi by destiny. Humanity is my religion. Love to eat, travel, read books and my million dreams keep me alive.