
[sim2_daily] Implement API Data Pagination



Issue: Implement Pagination for sim2_daily API Endpoint

Summary: This issue addresses the need to implement pagination for the /sim2_daily API endpoint to improve performance when querying the SIM2 daily database, especially for large rectangular regions. The endpoint currently returns data in CSV format and is used to access data from the Meteo-France dataset (https://www.data.gouv.fr/fr/datasets/6569b27598256cc583c917a7/).

Background: The /sim2_daily API allows users to query a rectangular region of the SIM2 daily database based on Lambert II coordinates (LAMBX, LAMBY) and an optional date range. Given the potential size of the data within a given region, returning the entire result set at once can lead to performance issues and client-side memory constraints.

Proposed Solution: Implement pagination to allow clients to retrieve data in smaller, manageable chunks (pages). This will significantly improve performance and scalability.

Detailed Implementation:

  1. API Parameters: Add offset and limit parameters to the API.

    • offset: Integer representing the starting row number (0-indexed).
    • limit: Integer representing the maximum number of rows to return per page.
  2. Data Filtering & Slicing: Modify the API logic to:

    • Apply the existing filtering criteria (LAMBX, LAMBY, DATE).
    • Use the dplyr::slice() function (or equivalent row indexing) to retrieve the correct set of rows based on offset and limit.
  3. Metadata Delivery: Alongside the CSV data, provide the following pagination metadata. Because the API is serialized as CSV, the metadata must be delivered via HTTP Headers, not within the CSV itself.
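Steps 1–2 can be sketched as follows, assuming the SIM2 table has already been loaded into a data frame with the LAMBX, LAMBY and DATE columns described above. The function name query_sim2 and the bounding-box arguments are illustrative, and the optional DATE filter is omitted for brevity:

```r
library(dplyr)

# Illustrative filter-then-slice logic; `offset` is 0-indexed,
# `limit` is the maximum number of rows per page.
query_sim2 <- function(sim2, lambx_min, lambx_max, lamby_min, lamby_max,
                       offset = 0, limit = 100) {
  rows <- seq(from = offset + 1, length.out = limit)  # slice() is 1-indexed
  sim2 %>%
    filter(LAMBX >= lambx_min, LAMBX <= lambx_max,
           LAMBY >= lamby_min, LAMBY <= lamby_max) %>%
    slice(rows)  # out-of-range row numbers are silently dropped by dplyr
}
```

Because dplyr::slice() drops out-of-range positions, an offset past the end of the filtered result naturally yields an empty page instead of an error.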

To dynamically add pagination metadata (e.g., X-Total-Count, X-Page, or even a Link header) when using the CSV serializer in Plumber, the best practice is to set the response headers inside the endpoint function, based on the data and pagination logic. Note that the example below uses page/per_page parameters; these map directly onto the proposed offset/limit via offset = (page - 1) * per_page and limit = per_page.

Here’s how you can do it step-by-step:


🔧 Example with dynamic pagination headers

library(plumber)

# Fallback for the null-coalescing operator (built into base R >= 4.4,
# otherwise available from rlang)
`%||%` <- function(a, b) if (is.null(a)) b else a

#* Paginated CSV endpoint
#* @param page:int Page number (1-indexed)
#* @param per_page:int Maximum number of rows per page
#* @get /paginated-csv
#* @serializer csv
function(req, res, page = 1, per_page = 10) {
  # Simulate data
  full_data <- data.frame(
    id = 1:100,
    name = paste("Item", 1:100)
  )
  
  # Pagination logic
  page <- max(1L, as.integer(page))
  per_page <- max(1L, as.integer(per_page))
  start <- (page - 1L) * per_page + 1L
  end <- min(nrow(full_data), start + per_page - 1L)
  
  # Subset data; an out-of-range page yields an empty result rather than
  # the reversed start:end sequence a naive subset would produce
  if (start > nrow(full_data)) {
    paginated_data <- full_data[0, , drop = FALSE]
  } else {
    paginated_data <- full_data[start:end, , drop = FALSE]
  }
  
  # Set pagination headers (header values must be strings)
  res$setHeader("X-Total-Count", as.character(nrow(full_data)))
  res$setHeader("X-Page", as.character(page))
  res$setHeader("X-Per-Page", as.character(per_page))
  
  # Optional: Add Link header for navigation
  # (a production version would omit rel="next"/"prev" at the boundaries)
  scheme <- req$rook.url_scheme %||% "http"  # Rook stores the scheme under "rook.url_scheme"
  host <- req$HTTP_HOST %||% "localhost"
  path <- req$PATH_INFO
  next_page <- page + 1
  prev_page <- max(1, page - 1)
  
  link_header <- sprintf(
    '<%s://%s%s?page=%d&per_page=%d>; rel="next", <%s://%s%s?page=%d&per_page=%d>; rel="prev"',
    scheme, host, path, next_page, per_page,
    scheme, host, path, prev_page, per_page
  )
  
  res$setHeader("Link", link_header)
  
  return(paginated_data)
}

📥 Response headers (example)

X-Total-Count: 100
X-Page: 2
X-Per-Page: 10
Link: <http://localhost/paginated-csv?page=3&per_page=10>; rel="next", <http://localhost/paginated-csv?page=1&per_page=10>; rel="prev"
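From these three numeric headers a consumer can derive all navigation state. A sketch with the header values hard-coded as they would arrive in the response above (in practice they come from the HTTP response, e.g. via httr::headers()):

```r
# Derive pagination state from the three numeric headers
total    <- 100  # X-Total-Count
page     <- 2    # X-Page
per_page <- 10   # X-Per-Page

total_pages <- ceiling(total / per_page)
has_next    <- page < total_pages
has_prev    <- page > 1
```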

Notes

  • You can use these headers for API consumers to manage pagination.
  • If you're serving this as a downloadable CSV (e.g., with a Content-Disposition: attachment; filename="data.csv" header), you can still set these pagination headers; they are part of the HTTP response, not of the CSV content itself.
  • You can wrap this pattern in a helper function to generalize it across routes.
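Such a helper might look like the sketch below. The name paginate is hypothetical; it relies only on the res$setHeader() method used above, so it can be called from any Plumber endpoint:

```r
# Hypothetical reusable helper: subsets a data frame and writes the
# pagination headers onto the Plumber response object.
paginate <- function(data, res, page, per_page) {
  page <- max(1L, as.integer(page))
  per_page <- max(1L, as.integer(per_page))
  total <- nrow(data)

  res$setHeader("X-Total-Count", as.character(total))
  res$setHeader("X-Page", as.character(page))
  res$setHeader("X-Per-Page", as.character(per_page))

  # Keep only the requested page; out-of-range rows are dropped
  rows <- seq(from = (page - 1L) * per_page + 1L, length.out = per_page)
  data[rows[rows <= total], , drop = FALSE]
}
```

An endpoint body then reduces to a one-liner, e.g. paginate(full_data, res, page, per_page).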


Acceptance Criteria:

  • The API accepts offset and limit parameters.
  • The API returns the correct number of rows for a given offset and limit.
  • The API delivers pagination metadata via HTTP headers as described above.
  • The API correctly handles cases where offset + limit exceeds the total number of matching rows.
  • Performance is demonstrably improved for large datasets when using pagination.
  • The API documentation (including the Plumber header) is updated to reflect the new parameters and HTTP Headers.
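The overflow criterion in particular can be pinned down with a unit test. A sketch using testthat, where paginate_rows() is a stand-in for whatever subsetting logic the endpoint ends up using:

```r
library(testthat)

# Stand-in for the endpoint's row-subsetting logic (0-indexed offset)
paginate_rows <- function(data, offset, limit) {
  rows <- seq(from = offset + 1, length.out = limit)
  data[rows[rows <= nrow(data)], , drop = FALSE]
}

test_that("offset + limit beyond the end returns only the remaining rows", {
  df <- data.frame(x = 1:10)
  expect_equal(nrow(paginate_rows(df, offset = 8, limit = 5)), 2)
  expect_equal(nrow(paginate_rows(df, offset = 20, limit = 5)), 0)
})
```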
Edited by David Dorchies