Async Polling for Grade Syncs

Bulk grade exports are the one ingestion task an LMS almost never answers synchronously. Unlike a lightweight roster lookup, a request for an entire course’s gradebook forces the platform to recompute assignment-group weights, apply late-submission penalties, resolve excused and missing states, and flatten multiple grading periods before a single byte is serialized. Rather than hold an HTTP connection open for that work, Canvas, Moodle, and Blackboard all return a deferred job handle and expect the client to poll for completion. Treating that handle as if it were the data — blocking on a single GET, or hammering the status endpoint in a tight loop — is the most common way an otherwise-correct pipeline corrupts incremental state or earns an account-level throttle. This page specifies the job entity you are actually polling, the endpoints that drive it, and the normalization and compliance rules that turn a finished export into warehouse-ready rows. It is one stage of the broader API Ingestion & Sync Workflows reference.

The deferred-job entity you are polling

A grade sync is not a request/response pair; it is a small state machine whose canonical record is the platform’s job object. On Canvas this object is the Progress resource, returned the moment you kick off a bulk gradebook write or export. Modelling it explicitly — rather than tracking ad-hoc booleans — is what makes the polling loop deterministic and resumable across worker restarts.

The Canvas Progress payload carries the following fields, which map cleanly onto a single staging row:

| Field | Type | Meaning | Notes for ingestion |
| --- | --- | --- | --- |
| `id` | integer | Primary key of the job | Your composite-key root: `(host, progress_id)` |
| `context_id` | integer | Course (or account) the job runs for | Foreign key to your `course` dimension |
| `context_type` | string | `Course`, `Account`, `Group` | Constrains how `context_id` resolves |
| `tag` | string | Job kind, e.g. `submissions_update` / `gradebook_export` | Discriminator for the result fetch |
| `workflow_state` | enum | `queued` \| `running` \| `completed` \| `failed` | The only field the poll loop branches on |
| `completion` | float 0–100 | Percent complete | Drives progress UI; never a completion signal on its own |
| `message` | string \| null | Human-readable error or summary | Capture on `failed` for the audit log |
| `url` | string | Canonical URL of this Progress object | Poll target; do not reconstruct it by hand |
| `results` | object \| null | Job-specific result, populated only on `completed` | Where the file id / attachment lives |
| `updated_at` | ISO 8601 | Last server-side transition | Detects stuck jobs (no movement = staleness) |

Two relationships matter downstream. First, context_id is a foreign key into the same course entity described by the Canvas Gradebook Data Structure — the export you are awaiting will eventually be joined back to assignment groups and submissions on that key. Second, (host, id) forms the idempotency key for the whole sync: persist it before the first poll so a crashed worker resumes the existing job instead of launching a duplicate export that double-counts grades.

The single most important rule of the schema is that completion is not a terminal signal. A job can sit at completion: 100 while workflow_state is still running for several seconds as the result file is finalized. Branch only on workflow_state; treat completion as cosmetic.
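A minimal sketch of that explicit model, assuming the field names in the table above. The class name `ProgressRecord`, the `from_payload` constructor, and the `TERMINAL_STATES` constant are illustrative names, not part of any Canvas client library.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Assumption: only these two workflow_state values end a job.
TERMINAL_STATES = frozenset({"completed", "failed"})


@dataclass(frozen=True)
class ProgressRecord:
    """One staging row per deferred job, keyed on (host, progress_id)."""
    host: str
    progress_id: int
    context_type: str
    context_id: int
    tag: Optional[str]
    workflow_state: str
    completion: Optional[float]
    message: Optional[str]
    url: str
    updated_at: Optional[str]

    @classmethod
    def from_payload(cls, host: str, payload: dict[str, Any]) -> "ProgressRecord":
        # Required keys raise KeyError if absent; optional ones default to None.
        return cls(
            host=host,
            progress_id=int(payload["id"]),
            context_type=payload["context_type"],
            context_id=int(payload["context_id"]),
            tag=payload.get("tag"),
            workflow_state=payload["workflow_state"],
            completion=payload.get("completion"),
            message=payload.get("message"),
            url=payload["url"],
            updated_at=payload.get("updated_at"),
        )

    @property
    def key(self) -> tuple[str, int]:
        """Idempotency key for the whole sync."""
        return (self.host, self.progress_id)

    @property
    def is_terminal(self) -> bool:
        # Deliberately ignores `completion`: 100 percent is not a terminal signal.
        return self.workflow_state in TERMINAL_STATES
```

A record built from a payload with `completion` at 100 and `workflow_state` still `running` reports `is_terminal` as false, which is the behaviour the rule above asks for.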

Job-initiation and status endpoints

The async pattern is two endpoints: one that enqueues and hands back a job, and one that reports on it.

Initiate. A bulk gradebook write returns a Progress object directly:

POST /api/v1/courses/:course_id/submissions/update_grades
Authorization: Bearer <token>
Content-Type: application/json

The 202 Accepted response body is the Progress object above, including its url. For exports rather than writes, the equivalent kickoff is the content-export endpoint (POST /api/v1/courses/:course_id/content_exports), which likewise returns a handle whose progress_url you follow.
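One way to keep a restarted worker from re-issuing the initiation call is to look for a persisted handle first. The sketch below assumes a dict-like `store`, a `kickoff` callable that performs the POST and returns the Progress-shaped body, and a lookup key of `(host, course_id)`; all three are hypothetical, chosen because the Progress id is not known until the POST returns.

```python
from typing import Callable, MutableMapping

# Hypothetical lookup key meaning "one in-flight sync per course on this host".
JobKey = tuple[str, int]


def start_or_resume(
    store: MutableMapping[JobKey, str],
    host: str,
    course_id: int,
    kickoff: Callable[[], dict],
) -> str:
    """Return the Progress URL to poll, reusing a persisted handle if present."""
    key = (host, course_id)
    if key in store:
        # A previous worker already started this job: resume it, never re-POST.
        return store[key]
    progress = kickoff()          # performs the initiation POST
    url = progress["url"]         # use the URL the server returned as-is
    store[key] = url              # persist before the first poll
    return url
```

This narrows the duplicate window but does not close it: a crash between the POST returning and the write to `store` still loses the handle, so the initiation request itself still needs to be idempotent.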

Monitor. Every subsequent check is a single read against the handle:

GET /api/v1/progress/:id
Authorization: Bearer <token>

This endpoint is deliberately cheap — one cost unit on the Canvas rate-limit budget — but it is not free, and a loop with no sleep will still drain your quota. Parse these response headers on every poll:

X-Rate-Limit-Remaining — remaining bucket capacity; throttle yourself well before it reaches zero.
X-Request-Cost — the cost the call just incurred; useful for budgeting concurrent course polls.
Retry-After — present on a 429 or 503; an authoritative instruction that overrides your computed backoff.
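Reading those three headers can be sketched as a small pure function. The name `read_budget` and the `floor` threshold of 100 are illustrative assumptions; pick a floor that matches your own bucket size.

```python
from typing import Mapping, Optional


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a numeric header, returning None when absent or malformed."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def read_budget(headers: Mapping[str, str], floor: float = 100.0) -> dict:
    """Summarize rate-limit headers from one poll response.

    `headers` should be case-insensitive (requests' resp.headers is);
    a plain dict must use the exact header spelling shown here.
    """
    remaining = _to_float(headers.get("X-Rate-Limit-Remaining"))
    return {
        "remaining": remaining,
        "cost": _to_float(headers.get("X-Request-Cost")),
        "retry_after": headers.get("Retry-After"),   # raw string; may be absent
        # Slow down before the bucket is empty, not after.
        "should_slow_down": remaining is not None and remaining < floor,
    }
```

Returning a plain dict keeps the function easy to log; a caller would typically widen its per-course interval whenever `should_slow_down` is true.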

Because grade exports run per course and an institution polls thousands concurrently, polling cadence is governed by the same throttling math covered in Handling Canvas API Rate Limits. The status endpoint is not paginated — it returns one object — but the result it points to often is: a gradebook export can exceed a single page of submissions, so the fetch that follows a completed state must apply the pagination strategies for bulk exports and walk the Link: rel="next" header to completion. When a poll returns 429 or a 5xx, the loop stops being a steady tick and becomes structured retry logic that honours Retry-After, the header that RFC 6585 allows on a 429 and that RFC 9110 defines as either a delay in seconds or an HTTP date.

Moodle and Blackboard expose the same shape under different names, which your normalizer must collapse to one canonical state set:

| Platform | Kick-off | Poll target | Terminal states |
| --- | --- | --- | --- |
| Canvas | `POST .../submissions/update_grades` | `GET /api/v1/progress/:id` | `completed`, `failed` |
| Moodle | `core_grades_*` web-service call (adhoc task) | task status via `tool_task` / report-builder schedule | `done`, `failed` |
| Blackboard | `POST .../courses/:id/gradebook/columns/.../attempts` (async grade) | job/status resource on the REST job id | `Complete`, `Error` |

Calibrating the poll interval

A naive fixed-interval poll is wrong in both directions: too short and you exhaust the request budget before a large export finishes; too long and a 12-second job adds two minutes of needless latency to your nightly run. The correct cadence starts tight and widens as the job ages, because most exports either complete in the first window or settle into a long queue.

Model the delay before the $n$-th status check, counting from $n = 0$, as a capped exponential schedule:

$t_n = \min\!\left(T_{\text{cap}},\; T_{\text{base}} \cdot 2^{\,n}\right)$

then apply full jitter so that thousands of per-course pollers do not synchronize into a request spike:

$t_n' = \operatorname{rand}\!\left(0,\; t_n\right)$

With T_base = 1s and T_cap = 30s, the first few polls land at roughly 1, 2, 4, 8, 16, then 30-second ceilings, each randomized downward. The jitter term is not optional decoration: a deterministic schedule causes a thundering-herd burst against the LMS load balancer at every doubling boundary. The full derivation, and the trade-off between recovery speed and infrastructure friendliness, is worked through in Implementing Exponential Backoff for LMS Syncs. When a status check returns an error rather than a non-terminal state, the same backoff machinery feeds the broader error and retry logic for sync jobs that decides when to give up versus dead-letter the export.
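The schedule can be checked numerically with a few lines. This is a sketch using the T_base and T_cap values quoted above; the function names `ceiling` and `jittered_delay` are illustrative.

```python
import random

T_BASE = 1.0   # seconds
T_CAP = 30.0   # seconds


def ceiling(n: int) -> float:
    """Capped exponential ceiling t_n for the n-th check, with n starting at 0."""
    return min(T_CAP, T_BASE * 2 ** n)


def jittered_delay(n: int, rng: random.Random) -> float:
    """Full jitter: a uniform draw between 0 and the ceiling t_n."""
    return rng.uniform(0.0, ceiling(n))


if __name__ == "__main__":
    rng = random.Random()   # pass a seeded Random for reproducible tests
    for n in range(7):
        print(n, ceiling(n), round(jittered_delay(n, rng), 2))
```

The ceilings for n = 0 through 6 come out as 1, 2, 4, 8, 16, 30, 30. Summed over 40 checks they total 1081 seconds, so a 40-poll budget waits at most about 18 minutes, and roughly half that on average under full jitter.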

Normalization and transformation logic

A finished Progress object is a pointer, not a payload. The transformation stage has two jobs: collapse the vendor’s job vocabulary into one canonical status, and flatten the result file the job produced into the institutional gradebook schema.

Map every platform’s job-state vocabulary, terminal and non-terminal alike, onto a closed enum so downstream consumers never branch on vendor strings:

queued / pending / Created → PENDING
running → RUNNING
completed / done / Complete → SUCCEEDED
failed / Error → FAILED
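The mapping above can be sketched as a lookup table behind a closed enum. The names `JobStatus` and `normalize_status` are illustrative, and lowercasing the vendor string before lookup is an assumption made here so that `Complete` and `complete` resolve identically.

```python
from enum import Enum


class JobStatus(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    SUCCEEDED = "SUCCEEDED"
    FAILED = "FAILED"


# Keys are lowercased vendor states taken from the mapping in the text.
_VENDOR_TO_CANONICAL = {
    "queued": JobStatus.PENDING,
    "pending": JobStatus.PENDING,
    "created": JobStatus.PENDING,
    "running": JobStatus.RUNNING,
    "completed": JobStatus.SUCCEEDED,
    "done": JobStatus.SUCCEEDED,
    "complete": JobStatus.SUCCEEDED,
    "failed": JobStatus.FAILED,
    "error": JobStatus.FAILED,
}


def normalize_status(raw: str) -> JobStatus:
    """Collapse a vendor job state to the canonical enum; reject unknowns."""
    try:
        return _VENDOR_TO_CANONICAL[raw.strip().lower()]
    except KeyError:
        # Fail loudly: silently treating an unknown state as PENDING
        # would make the poll loop spin on a state it does not understand.
        raise ValueError(f"unmapped vendor job state: {raw!r}") from None
```

Raising on an unmapped value is a design choice; it surfaces a new vendor state at the normalization boundary instead of deep inside a downstream consumer.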

On SUCCEEDED, fetch the referenced result, then apply the canonical mapping: coerce the raw score to a typed Decimal (never a float — see the type-coercion warnings in the gradebook normalization guidance), reconstruct each submission’s grading-period and assignment-group keys, and stamp two audit columns — job_id and synced_at — so any row can be traced back to the export that produced it. The composite key for a normalized grade row is (host, course_id, assignment_id, student_token), where the final element is a tokenized identifier rather than the raw user_id.
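The Decimal rule is easiest to see with a concrete case. The helper below is a sketch; `to_score` is an illustrative name, and routing the raw value through `str()` is the same approach the reference implementation on this page uses.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional, Union


def to_score(raw: Union[str, int, float, None]) -> Optional[Decimal]:
    """Coerce a raw score to Decimal, preserving None as None."""
    if raw is None:
        return None          # excused / missing / ungraded: never becomes 0
    try:
        # str() first, so a float like 0.1 becomes Decimal("0.1") and not
        # its long binary expansion.
        return Decimal(str(raw))
    except InvalidOperation:
        raise ValueError(f"score is not numeric: {raw!r}") from None


if __name__ == "__main__":
    print(0.1 + 0.2 == 0.3)                                   # False with floats
    print(to_score(0.1) + to_score(0.2) == Decimal("0.3"))    # True with Decimal
```

The float comparison fails because 0.1 and 0.2 have no exact binary representation, which is the kind of drift that shows up as off-by-a-cent totals once weights are applied.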

Compliance constraints for grade payloads

A grade export is the most PII-dense object an LMS emits: it pairs an identifiable student with their academic performance, which is exactly the linkage FERPA governs. Two field-level rules apply at this stage, before any data crosses into the warehouse.

First, the student identifier must never be persisted in the clear. Replace user_id (and any sis_user_id or login_id the result carries) with a salted SHA-256 token at the normalization boundary, the same tokenization contract used by cross-LMS student ID mapping. Score, assignment, and timestamp fields pass through; name, email, and login fields are dropped unless a specific downstream consumer is authorized for them.
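A sketch of that field-level rule follows. It takes one reading of the paragraph: `user_id` is tokenized, and every other identifying field is dropped outright. The function name `scrub_row` and the exact field names `name` and `email` are assumptions about the result payload, not confirmed keys.

```python
import hashlib

# Assumed identifying fields; extend to match the real export payload.
_PII_FIELDS = frozenset({"user_id", "sis_user_id", "login_id", "name", "email"})


def scrub_row(row: dict, salt: bytes) -> dict:
    """Return a copy of `row` with identifiers tokenized or removed."""
    # Salted SHA-256 of the raw identifier, hex-encoded (64 characters).
    token = hashlib.sha256(salt + str(row["user_id"]).encode()).hexdigest()
    # Score, assignment, and timestamp fields pass through untouched.
    clean = {k: v for k, v in row.items() if k not in _PII_FIELDS}
    clean["student_token"] = token
    return clean
```

Because the token depends only on the salt and the raw id, the same student maps to the same token across syncs, which is what lets normalized rows join on `student_token`.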

Second, the job itself is auditable. The Progress id, the requesting service principal, and the completion timestamp belong in an immutable audit log, because a grade write is a change to an official academic record. Retain the message field on FAILED jobs so a registrar dispute can establish whether a sync ever applied. These requirements flow down from the FERPA compliance boundary defined for the data-architecture domain.
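The audit record described above might be serialized as one JSON line per terminal job. This is a sketch only: `audit_entry` and its key names are hypothetical, `updated_at` on the terminal Progress object is used as a stand-in for the completion timestamp, and immutability has to come from the storage layer (for example an append-only table), not from this function.

```python
import json
from typing import Any


def audit_entry(host: str, job: dict[str, Any], principal: str, status: str) -> str:
    """Build one append-only audit line for a job that reached a terminal state."""
    entry = {
        "host": host,
        "progress_id": job["id"],
        "principal": principal,                 # requesting service principal
        "status": status,                       # canonical status, e.g. FAILED
        "completed_at": job.get("updated_at"),  # assumption: last transition time
        # Keep the vendor message only for failures, per the retention rule.
        "message": job.get("message") if status == "FAILED" else None,
    }
    return json.dumps(entry, sort_keys=True)
```

Sorting keys keeps lines byte-stable for identical input, which makes the log easier to diff or hash.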

Reference Python implementation

The following poller is production-shaped: a single pooled requests.Session, capped exponential backoff with full jitter, authoritative Retry-After handling, branching strictly on workflow_state, and tokenization applied before any record leaves the function. It targets the Canvas Progress object but the control flow is platform-agnostic.

python

import hashlib
import os
import random
import time
from decimal import Decimal
from typing import Any

import requests

CANVAS_HOST = "https://canvas.instructure.com"
T_BASE, T_CAP, MAX_POLLS = 1.0, 30.0, 40
TERMINAL = {"completed": "SUCCEEDED", "failed": "FAILED"}
ID_SALT = os.environ["GRADE_SYNC_SALT"].encode()  # rotated per environment


def _session(token: str) -> requests.Session:
    s = requests.Session()
    s.headers["Authorization"] = f"Bearer {token}"
    # Pool keep-alive sockets so each poll reuses a connection.
    s.mount("https://", requests.adapters.HTTPAdapter(pool_maxsize=20))
    return s


def tokenize(user_id: str | int) -> str:
    """FERPA-safe surrogate: salted SHA-256 of the raw identifier."""
    return hashlib.sha256(ID_SALT + str(user_id).encode()).hexdigest()


def _sleep_for(attempt: int, retry_after: str | None) -> None:
    if retry_after:                      # server instruction wins outright
        try:
            time.sleep(max(0.0, float(retry_after)))
            return
        except ValueError:
            pass   # HTTP-date form is not parsed here: use computed backoff
    ceiling = min(T_CAP, T_BASE * 2 ** attempt)
    time.sleep(random.uniform(0, ceiling))   # full jitter


def poll_progress(session: requests.Session, progress_url: str) -> dict[str, Any]:
    """Drive one Canvas Progress object to a terminal state."""
    for attempt in range(MAX_POLLS):
        resp = session.get(progress_url, timeout=30)

        if resp.status_code in (429, 502, 503, 504):
            _sleep_for(attempt, resp.headers.get("Retry-After"))
            continue
        resp.raise_for_status()

        job = resp.json()
        state = job["workflow_state"]            # NEVER branch on `completion`
        if state in TERMINAL:
            return {"status": TERMINAL[state], "job": job}

        _sleep_for(attempt, None)                # queued / running: back off
    raise TimeoutError(f"Progress {progress_url} did not settle in {MAX_POLLS} polls")


def normalize_grades(course_id: int, rows: list[dict], job_id: int) -> list[dict]:
    """Flatten an export payload into canonical, tokenized grade rows."""
    synced_at = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    out = []
    for r in rows:
        out.append({
            "host": CANVAS_HOST,
            "course_id": course_id,
            "assignment_id": r["assignment_id"],
            "student_token": tokenize(r["user_id"]),   # raw id never persisted
            "score": Decimal(str(r["score"])) if r.get("score") is not None else None,
            "job_id": job_id,        # audit column → traces row to its export
            "synced_at": synced_at,
        })
    return out

The script deliberately omits the result fetch loop after SUCCEEDED; that step walks the paginated submissions list and feeds normalize_grades. Building the synchronous client itself — session configuration, timeouts, and adapter tuning — is covered in Python Requests for LMS APIs.
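For orientation, the omitted fetch loop could look roughly like the sketch below. It assumes the caller already knows the first page URL of the result (how that URL is derived from the finished job is not shown here) and that each page is a JSON array. `iter_rows` is an illustrative name; `session` is expected to behave like a `requests.Session`, whose responses expose parsed Link headers as `resp.links`.

```python
from typing import Any, Iterator, Optional


def iter_rows(session: Any, first_url: str, timeout: float = 30.0) -> Iterator[dict]:
    """Yield every row of a paginated result, following Link rel="next"."""
    url: Optional[str] = first_url
    while url:
        resp = session.get(url, timeout=timeout)
        resp.raise_for_status()
        yield from resp.json()                 # assumption: each page is a list
        # requests parses the Link header into a dict keyed by rel.
        url = resp.links.get("next", {}).get("url")
```

A caller would materialize the rows and hand them to normalize_grades, for example `normalize_grades(course_id, list(iter_rows(session, first_url)), job_id)`. Retry handling for a 429 in the middle of the walk is left out of this sketch.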

Failure modes and edge cases

Stuck-running jobs. A Progress object can wedge in running indefinitely if the upstream task crashes. MAX_POLLS plus a side check on updated_at (no movement for N minutes) catches this; without a ceiling the worker blocks forever.
completion: 100 before completed. As noted in the schema, the percent field can reach 100 several seconds before the result file is final. Branching on it returns an empty or partial result. Always wait for workflow_state.
Token rotation mid-poll. A long export can outlive an access token, turning a poll into a 401. Treat 401 as non-retryable for backoff but trigger a refresh-and-resume against the persisted progress_id, never a fresh POST (which would launch a duplicate export).
Duplicate kick-off after a crash. If the initiation POST succeeds but the worker dies before persisting the handle, a restart re-issues the write and double-applies grades. Persist (host, progress_id) before the first poll and make initiation idempotent.
Excused and null scores. A null score after an export is legitimate (excused, missing, or ungraded) and must not be coerced to 0 — that silently changes a student’s grade. Carry None through to the warehouse and let weighting logic decide.
429 on the status endpoint itself. Cheap as it is, the poll counts against the budget; thousands of concurrent course pollers can exhaust it. Honour Retry-After and widen the per-course interval rather than retrying immediately.
Result pagination truncation. A large gradebook export spans multiple pages; stopping at page one silently drops the tail of the roster. Follow Link: rel="next" until it is absent.
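The updated_at side check from the stuck-running entry in the list above can be sketched as a pure function. The name `is_stale` and the ten-minute default are assumptions; the right idle window depends on how long your largest exports legitimately sit without a transition.

```python
from datetime import datetime, timedelta, timezone


def is_stale(
    updated_at: str,
    now: datetime,
    max_idle: timedelta = timedelta(minutes=10),
) -> bool:
    """True when the job's last server-side transition is older than max_idle.

    `now` must be timezone-aware, e.g. datetime.now(timezone.utc).
    """
    # fromisoformat only accepts a trailing "Z" on newer Python versions,
    # so normalize it to an explicit UTC offset first.
    last = datetime.fromisoformat(updated_at.replace("Z", "+00:00"))
    if last.tzinfo is None:
        last = last.replace(tzinfo=timezone.utc)   # assumption: naive means UTC
    return now - last > max_idle
```

Run on each non-terminal poll, this lets a worker abandon a wedged job long before MAX_POLLS is exhausted.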

Handling Canvas API Rate Limits — header parsing and token-bucket pacing that governs poll cadence.
Implementing Exponential Backoff for LMS Syncs — the jittered retry schedule this loop depends on.
Error & Retry Logic for Sync Jobs — deciding when to dead-letter a failed export versus retry it.
Pagination Strategies for Bulk Exports — walking the result payload a finished job points to.
Python Requests for LMS APIs — session pooling and timeout configuration for the poller.
