Python Requests for LMS APIs: Sessions, Auth, and Resilient Transport

The Python requests library is the transport layer underneath almost every institutional LMS ingestion job: the component that turns an access token and an endpoint path into a stream of gradebook, roster, and engagement records. For EdTech engineers, academic IT teams, and data analysts building synchronization pipelines, the hard part is never the first GET. It is making the thousandth request behave: reusing a connection pool across a fleet of course sections, refreshing a token that rotates mid-batch, decoding a vendor’s pagination dialect, backing off cleanly when the platform meters traffic, and doing all of it without ever spilling a raw student_id into a log line. A request client that looks correct against one sandbox course quietly corrupts data the moment a registrar reopens a grading period or a vendor starts returning a numeric score as a string.

This page treats the HTTP client as a first-class entity in the pipeline rather than a convenience wrapper. It covers the session and credential model you should maintain locally, the exact Canvas, Moodle, and Blackboard endpoints and headers that govern a request, how raw payloads map onto the canonical institutional schema, the FERPA constraints on what a transport layer may log, a production-quality reference client, and the vendor-specific failure modes that only surface under fleet-scale load. It is one building block of the broader API ingestion and sync workflows that move student data from a learning platform into a warehouse.

The Request Session as a First-Class Entity

Most failures in LMS transport trace back to treating each call as an isolated function invocation instead of a stateful actor. A robust client models three related entities — the session, the credential, and the request record — and persists enough of each to reason about retries, budgets, and audit trails.

The session is the long-lived object that owns a TCP connection pool, default headers, TLS verification, and a mounted retry adapter. Creating one requests.Session() per worker and reusing it across endpoints is the single highest-leverage decision in the client: it amortizes the TLS handshake, keeps keep-alive connections warm, and gives you one place to attach authentication and backoff policy.

Field	Owner	Type	Meaning
`session`	worker	`requests.Session`	Connection pool + default headers; one per worker, reused across calls.
`base_url`	config	`str`	Tenant root, e.g. `https://canvas.institution.edu/api/v1`.
`access_token_hash`	credential	`str`	SHA-256 of the bearer token; the safe foreign key tying a request record to the credential that spent it.
`token_expires_at`	credential	`datetime | None`	OAuth expiry; `None` for non-expiring developer keys.
`request_cost`	response	`float | None`	Canvas `X-Request-Cost`; the actual budget charge of a call.
`remaining`	response	`float | None`	Rate-limit headroom left after the call settled.
`observed_at`	client clock	`datetime`	Response timestamp, used to model continuous budget refill and to order the audit ledger.

The relational shape is one credential row keyed by access_token_hash, one mutable budget row per credential, and one immutable request-record row per call capturing (access_token_hash, method, endpoint, status, request_cost, remaining, observed_at). That ledger is what lets you reconstruct why a worker stalled — almost always a burst of expensive nested reads rather than a high request count. Critically, the token itself never appears in any of these structures; only its hash does, which is what keeps the client compatible with the FERPA compliance boundary the rest of the pipeline enforces.
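
As a rough illustration of that shape, the request record and the budget row could be modeled as two small dataclasses. This is a minimal sketch rather than a prescribed schema: the class names RequestRecord and Budget and the apply method are assumptions introduced here, and a real deployment would persist these rows in whatever store the pipeline already uses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)  # frozen: a request record is written once and never updated
class RequestRecord:
    access_token_hash: str
    method: str
    endpoint: str                   # templated path, e.g. "/courses/:id/enrollments"
    status: int
    request_cost: Optional[float]
    remaining: Optional[float]
    observed_at: datetime


@dataclass
class Budget:
    """Mutable per-credential view of the most recent rate-limit headroom."""

    access_token_hash: str
    remaining: Optional[float] = None
    updated_at: Optional[datetime] = None

    def apply(self, record: RequestRecord) -> None:
        # Only move forward in time, so a late-arriving response cannot rewind the budget.
        if self.updated_at is None or record.observed_at >= self.updated_at:
            self.remaining = record.remaining
            self.updated_at = record.observed_at
```

Freezing the record while leaving the budget mutable mirrors the split described above: the ledger is append-only history, and the budget is a single current value derived from it.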

A session is configured once and then left alone:

python

import hashlib
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_session(token: str, base_url: str) -> requests.Session:
    """Return a connection-pooled session with auth and transport-level retries."""
    session = requests.Session()
    session.headers.update(
        {
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
            "User-Agent": "institution-lms-ingest/1.0",
        }
    )
    retry = Retry(
        total=5,
        connect=3,
        read=3,
        status_forcelist=(500, 502, 503, 504),  # not 403/429; those carry Retry-After
        allowed_methods=frozenset({"GET", "POST"}),  # POST suits Moodle's read calls; unsafe for non-idempotent writes
        backoff_factor=1.5,  # urllib3 sleeps 0s, 3s, 6s, 12s, 24s between attempts
        respect_retry_after_header=True,
        raise_on_status=False,
    )
    adapter = HTTPAdapter(max_retries=retry, pool_connections=16, pool_maxsize=16)
    session.mount("https://", adapter)
    session.base_url = base_url  # type: ignore[attr-defined]
    return session


def token_hash(token: str) -> str:
    """Stable, non-reversible credential id for ledgers and logs."""
    return hashlib.sha256(token.encode()).hexdigest()

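Transport-level retries handle the network’s bad days (dropped connections, brief gateway errors), but status_forcelist deliberately excludes 403 and 429. Those are budget signals, not network faults, and they carry vendor instructions (Retry-After, cost headers) that a blind backoff would discard. One nuance: because respect_retry_after_header is enabled, urllib3 will still retry a 429 that arrives with a Retry-After header, sleeping for the interval the server asked for; a 403 is never retried at this layer. Honoring that distinction is the foundation of correct API rate limit handling.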

API Endpoints, Auth Headers, and Request Patterns

The transport layer has to speak three dialects. The endpoint paths differ, the authentication scheme differs, and — most painfully — the pagination mechanics differ across Canvas, Moodle, and Blackboard. A client that hardcodes one vendor’s conventions silently truncates results against another.

Canvas (REST)

Canvas exposes a conventional REST surface under /api/v1/. Reads that matter to ingestion include GET /api/v1/courses/:id/enrollments, GET /api/v1/courses/:id/students/submissions, and GET /api/v1/courses/:id/assignment_groups?include[]=assignments. Authentication is a bearer token in the Authorization header. Every response carries X-Rate-Limit-Remaining and X-Request-Cost, and a tripped budget returns 403 Forbidden (Rate Limit Exceeded) — not the 429 generic middleware expects.
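
The two budget headers can be read into the request_cost and remaining fields of the request record with a few lines of parsing. The sketch below is illustrative only: the helper names read_budget and should_pause are hypothetical, and the floor of 100.0 is an arbitrary assumption rather than a Canvas-documented threshold.

```python
from typing import Mapping, Optional, Tuple


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a header value, returning None when it is absent or malformed."""
    try:
        return float(value) if value is not None else None
    except ValueError:
        return None


def read_budget(headers: Mapping[str, str]) -> Tuple[Optional[float], Optional[float]]:
    """Return (request_cost, remaining) from a Canvas response's headers."""
    cost = _to_float(headers.get("X-Request-Cost"))
    remaining = _to_float(headers.get("X-Rate-Limit-Remaining"))
    return cost, remaining


def should_pause(remaining: Optional[float], floor: float = 100.0) -> bool:
    """True when headroom is known and has dropped below the assumed floor."""
    return remaining is not None and remaining < floor
```

Passing resp.headers from requests works directly because that mapping is case-insensitive; a plain dict, as used in tests, needs the exact header casing shown here.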

Pagination is driven by the Link header, not by a body cursor. Each page returns rel="next", rel="prev", rel="first", and rel="last" URLs, and you follow next until it disappears. requests parses this for you via response.links:

python

def iter_pages(session: requests.Session, url: str, params: dict | None = None):
    """Yield JSON bodies across all Canvas pages by following the Link rel=next header."""
    while url:
        resp = session.get(url, params=params, timeout=30)
        resp.raise_for_status()
        yield resp.json()
        url = resp.links.get("next", {}).get("url")
        params = None  # the next URL already encodes per_page and cursor

Always set per_page=100 on the first call; the Canvas default of 10 turns a 5,000-student roster into 500 round trips. The deeper mechanics of cursor versus offset paging are covered in the pagination strategies for bulk exports guide.

Moodle (web service)

Moodle has no REST resource tree. Every call is a POST (or GET) to a single endpoint, /webservice/rest/server.php, with the operation selected by a wsfunction parameter and the format pinned with moodlewsrestformat=json. Authentication is a wstoken query parameter rather than a header, which means the token is part of the request body or query string — a detail with direct logging consequences. A roster pull is wsfunction=core_enrol_get_enrolled_users&courseid=...; grades come from gradereport_user_get_grade_items. Moodle returns the entire result set in one response with no native paging on most functions, so memory pressure, not round-trip count, is the constraint.
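
A Moodle call can be sketched as a form payload plus a body check. The helper names moodle_form and check_moodle_body and the MoodleError class are hypothetical. The body check rests on the assumption that Moodle reports web-service failures as a JSON object containing an exception key, often with an ordinary 200 status, so raise_for_status alone would not catch them.

```python
from typing import Any, Dict

MOODLE_ENDPOINT = "/webservice/rest/server.php"


class MoodleError(RuntimeError):
    """Raised when Moodle reports a failure inside the response body."""


def moodle_form(token: str, wsfunction: str, **args: Any) -> Dict[str, Any]:
    """Build the form fields for one web-service call.

    Sending these as a POST body instead of query parameters keeps
    wstoken out of the URL, and therefore out of URL-based logs.
    """
    return {
        "wstoken": token,
        "wsfunction": wsfunction,
        "moodlewsrestformat": "json",
        **args,
    }


def check_moodle_body(body: Any) -> Any:
    """Return the decoded body unchanged, or raise if it is an error envelope."""
    if isinstance(body, dict) and "exception" in body:
        # Surface only errorcode; the message field may echo request parameters.
        raise MoodleError(body.get("errorcode", "unknown"))
    return body
```

With a session in hand, the call itself would look like session.post(base_url + MOODLE_ENDPOINT, data=moodle_form(token, "core_enrol_get_enrolled_users", courseid=42), timeout=30), followed by check_moodle_body(resp.json()).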

Blackboard (REST)

Blackboard Learn uses an OAuth 2.0 client-credentials flow: you POST to /learn/api/public/v1/oauth2/token with a base64 Basic credential to mint a short-lived bearer token, then call resources like GET /learn/api/public/v1/courses/:id/gradebook/columns. Paging is body-based: a paging.nextPage relative URL appears in the JSON until the final page. Because the bearer token typically expires in an hour, the session’s Authorization header has to be re-minted mid-job — which is exactly the rotation problem solved in automating Canvas API token refresh in Python, whose pattern generalizes to Blackboard’s client-credentials grant.
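
Body-based paging can be sketched independently of the HTTP layer. In the example below, fetch is a hypothetical callable that takes a path relative to the tenant root and returns the decoded JSON body; the results key for page items and the loop guard are assumptions of this sketch, not part of the description above.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_blackboard(
    fetch: Callable[[str], Dict[str, Any]], first_path: str
) -> Iterator[Dict[str, Any]]:
    """Yield every item across pages by following paging.nextPage in the body."""
    path: Optional[str] = first_path
    seen = set()
    while path:
        if path in seen:  # defensive: stop if the server ever repeats a page
            raise RuntimeError("pagination loop detected")
        seen.add(path)
        body = fetch(path)
        yield from body.get("results", [])
        # nextPage is absent on the final page, which ends the loop.
        path = (body.get("paging") or {}).get("nextPage")
```

Injecting fetch keeps the paging logic testable with canned pages and leaves token handling to whatever wraps the session.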

Concern	Canvas	Moodle	Blackboard
Base path	`/api/v1/...`	`/webservice/rest/server.php`	`/learn/api/public/v1/...`
Auth	`Authorization: Bearer` header	`wstoken` query param	OAuth2 bearer (1h expiry)
Operation selector	URL path	`wsfunction` param	URL path
Pagination	`Link` header `rel=next`	none (full set)	`paging.nextPage` in body
Throttle signal	`403` + `X-Rate-Limit-Remaining`	server-config limits	`429` + `Retry-After`

Normalization and Transformation Logic

Raw LMS payloads never match a warehouse schema, and the transport layer is the wrong place to do heavy reshaping — but it is the right place to apply the thin, deterministic coercions that make every downstream stage simpler. The rule of thumb: the request client normalizes structure and identity; the warehouse normalizes meaning.

Three transformations belong at the request boundary. First, identity tokenization: replace every vendor user_id, login_id, or SIS sis_user_id with its hash before the record leaves the client, so no raw identifier is ever held in memory longer than the parse. Because small numeric ids are easy to enumerate, a keyed hash (HMAC-SHA-256 with a secret key) is a safer choice than a bare digest whenever the id space is guessable. Second, timestamp canonicalization: Canvas emits ISO-8601 with a Z suffix, Moodle emits Unix epoch integers, and Blackboard emits ISO-8601 with offsets; coerce all three to timezone-aware UTC datetime. Third, null discipline: a missing grade is semantically distinct from a zero, so coerce "", None, and absent keys into an explicit sentinel rather than letting a 0.0 leak in.

python

from datetime import datetime, timezone


def canonical_ts(value) -> datetime | None:
    """Coerce Canvas ISO-Z, Moodle epoch ints, and Blackboard ISO-offset to UTC."""
    if value in (None, "", 0):
        return None
    if isinstance(value, (int, float)):  # Moodle epoch seconds
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return datetime.fromisoformat(str(value).replace("Z", "+00:00")).astimezone(timezone.utc)


def flatten_enrollment(raw: dict) -> dict:
    """Map a raw Canvas enrollment payload onto the canonical institutional shape."""
    return {
        "student_key": token_hash(str(raw["user_id"])),
        "course_key": str(raw["course_id"]),
        "role": raw.get("role", "StudentEnrollment"),
        "state": raw.get("enrollment_state"),
        "last_activity_at": canonical_ts(raw.get("last_activity_at")),
        "grade": (raw.get("grades") or {}).get("current_score"),  # None ≠ 0.0
    }

The composite key (student_key, course_key) is what lets records from three different vendors land in one table, and reconstructing grading weights or attendance aggregates from there is the job of the gradebook and attendance normalization layer, not the HTTP client.

Compliance Constraints on the Transport Layer

Because the request client is the first place raw student data materializes, it is also the first place FERPA can be violated. The constraint is concrete and field-level, not aspirational.

The bearer token and wstoken are credentials, not student data, but they are still secrets: they must come from an environment variable or secrets manager, never a literal in source, and they must never be logged. The trap unique to Moodle is that the token lives in the URL query string, so any logger that records full request URLs will write the credential to disk. That includes the DEBUG-level connection logging of urllib3, the library requests delegates to, which records each request path together with its query string. Strip the token before logging.

Student identifiers (user_id, sis_user_id, login_id, email) are protected education-record fields. They may exist transiently in the parse step but must be tokenized before the record is logged, cached, or emitted. Names, grades, and submission contents must never appear in transport logs at all; a request record should carry only access_token_hash, method, the templated endpoint (/courses/:id/enrollments, not the populated path), status, cost, and timing. Add an ingested_at audit column to every emitted row so downstream access can be reconstructed. These rules are the client-side expression of the data-minimization principle described in the U.S. Department of Education’s FERPA guidance.

python

import logging
import re

logger = logging.getLogger("lms.transport")


def safe_log(method: str, url: str, status: int, cost: float | None) -> None:
    """Log a request with the credential stripped and the path de-identified."""
    url = re.sub(r"wstoken=[^&]+", "wstoken=REDACTED", url)        # Moodle secret
    url = re.sub(r"/(users|students)/\d+", r"/\1/:id", url)        # numeric PII ids
    logger.info("lms_request", extra={"method": method, "endpoint": url,
                                       "status": status, "request_cost": cost})
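
The rule that a request record carries the templated endpoint rather than the populated path can be sketched as a small helper. This is one possible approach under stated assumptions: the name template_endpoint is hypothetical, and the pattern only rewrites purely numeric path segments, so non-numeric identifiers (for example SIS-style or underscore-delimited ids) would need additional rules.

```python
import re
from urllib.parse import urlsplit

# A path segment made only of digits, followed by another segment or the end.
_NUMERIC_SEGMENT = re.compile(r"/\d+(?=/|$)")


def template_endpoint(url: str) -> str:
    """Reduce a populated URL to a templated path with no host or query string."""
    path = urlsplit(url).path  # drops scheme, host, and query (including any wstoken)
    return _NUMERIC_SEGMENT.sub("/:id", path)
```

Because the query string is discarded wholesale, this also removes a Moodle wstoken without needing a dedicated pattern for it.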

Reference Implementation

The following client ties the pieces together: a reused session with bounded transport retries, Canvas Link-header pagination, explicit handling of the 403 budget rejection that requests retries must not swallow, FERPA-safe logging, and tokenized output. It is the shape a production ingestion worker should take.

python

import os
import time
import hashlib
import logging
import requests
from datetime import datetime, timezone
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logger = logging.getLogger("lms.transport")


class LMSClient:
    """Connection-pooled Canvas client with rate-aware paging and FERPA-safe output."""

    def __init__(self, base_url: str, token: str) -> None:
        self.base_url = base_url.rstrip("/")
        self._token_hash = hashlib.sha256(token.encode()).hexdigest()
        self.session = requests.Session()
        self.session.headers.update(
            {"Authorization": f"Bearer {token}", "Accept": "application/json"}
        )
        retry = Retry(total=4, backoff_factor=1.5, respect_retry_after_header=True,
                      status_forcelist=(500, 502, 503, 504), raise_on_status=False,
                      allowed_methods=frozenset({"GET"}))
        self.session.mount("https://", HTTPAdapter(max_retries=retry, pool_maxsize=16))

    def _hash(self, raw_id) -> str:
        return hashlib.sha256(str(raw_id).encode()).hexdigest()

    def _get(self, url: str, params: dict | None = None) -> requests.Response:
        resp = self.session.get(url, params=params, timeout=30)
        # Canvas signals an exhausted budget with 403, not 429 — pause and retry once.
        if resp.status_code == 403 and "Rate Limit" in resp.text:
            wait = float(resp.headers.get("Retry-After", 5))
            logger.warning("rate_limited", extra={"token": self._token_hash, "wait": wait})
            time.sleep(wait)
            resp = self.session.get(url, params=params, timeout=30)
        resp.raise_for_status()
        logger.info("lms_request", extra={
            "token": self._token_hash, "endpoint": resp.url.split("?")[0],
            "status": resp.status_code, "cost": resp.headers.get("X-Request-Cost")})
        return resp

    def enrollments(self, course_id: int):
        """Yield tokenized enrollment records across every page for one course."""
        url = f"{self.base_url}/courses/{course_id}/enrollments"
        params: dict | None = {"per_page": 100, "type[]": "StudentEnrollment"}
        while url:
            resp = self._get(url, params)
            for row in resp.json():
                yield {
                    "student_key": self._hash(row["user_id"]),
                    "course_key": str(course_id),
                    "state": row.get("enrollment_state"),
                    "current_score": (row.get("grades") or {}).get("current_score"),
                    "ingested_at": datetime.now(timezone.utc),
                }
            url = resp.links.get("next", {}).get("url")
            params = None  # next URL already carries per_page + cursor


if __name__ == "__main__":
    client = LMSClient(os.environ["CANVAS_BASE_URL"], os.environ["CANVAS_TOKEN"])
    rows = list(client.enrollments(course_id=12345))
    print(f"ingested {len(rows)} enrollments")

The generator design matters: yielding tokenized rows page by page keeps memory flat regardless of cohort size, and because the only identifier that ever leaves the method is a hash, the worker can crash, be inspected, or have its logs shipped to an observability stack without exposing a single education record.

Failure Modes and Edge Cases

Session not reused. Calling requests.get(...) directly, or building a fresh Session per call, opens a new connection (and a new TLS handshake) for every request and bypasses your retry adapter entirely. Under fleet load this both multiplies latency and trips rate budgets faster. Build one session per worker and pass it down.

Retrying the wrong status. Adding 403 or 429 to status_forcelist makes urllib3 retry a budget rejection inside the adapter, on its own exponential schedule whenever the response carries no Retry-After, so the intermediate rejections never reach your code and each blind retry burns the budget further. Keep budget statuses out of the transport retry and handle them explicitly, as the reference _get does. Deeper recovery semantics live in the error and retry logic for sync jobs guide.

Token rotation mid-job. A bulk export of thousands of sections can outlive a one-hour OAuth token; request 5,001 then fails 401 while the first 5,000 succeeded. Detect 401, re-mint the token, update session.headers["Authorization"], and resume from the last cursor rather than restarting — the procedure in automating Canvas API token refresh in Python.
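
The detect-and-re-mint step can be sketched with a small holder object that owns the token and its expiry. Everything named here is an assumption for illustration: TokenManager, the mint callable (which would wrap the vendor's token endpoint and return a token plus its lifetime in seconds), and the 60-second skew are not taken from any vendor SDK.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

MintFn = Callable[[], Tuple[str, float]]  # hypothetical: returns (token, lifetime_seconds)


class TokenManager:
    """Hold a bearer token and re-mint it shortly before it expires."""

    def __init__(
        self,
        mint: MintFn,
        skew_seconds: float = 60.0,
        now: Callable[[], datetime] = lambda: datetime.now(timezone.utc),
    ) -> None:
        self._mint = mint
        self._skew = timedelta(seconds=skew_seconds)
        self._now = now
        self._token: Optional[str] = None
        self._expires_at: Optional[datetime] = None

    def token(self) -> str:
        """Return a usable token, minting a new one when missing or near expiry."""
        if self._token is None or self._now() >= self._expires_at - self._skew:
            self.refresh()
        return self._token

    def refresh(self) -> None:
        """Force a re-mint; call this after a 401 before retrying the same URL."""
        token, lifetime = self._mint()
        self._token = token
        self._expires_at = self._now() + timedelta(seconds=lifetime)
```

In a paging loop, the worker would set session.headers["Authorization"] = f"Bearer {manager.token()}" before each page, and on a 401 call manager.refresh() and retry the current page URL so the job resumes from its last cursor.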

Truncated pagination. Reading response.json() once and stopping — instead of following rel="next" — silently returns only the first 10 or 100 records and looks like a successful run. Always loop until the next link is absent, and assert the row count against an expected roster size in tests.

Missing or null grading fields. Canvas returns grades.current_score as null for an ungraded student and omits the grades object entirely for some enrollment states; row["grades"]["current_score"] raises KeyError or TypeError. Guard with (row.get("grades") or {}).get("current_score") and treat None as “not graded,” never 0.0.

Silent type coercion across API versions. A Canvas minor version can ship a numeric current_score as a string, passing every JSON parse while breaking arithmetic downstream. Validate types at the boundary with a schema check rather than trusting the payload.
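
A boundary check for this case can be very small. The sketch below assumes the Canvas enrollment shape used elsewhere on this page; the names SchemaDrift and checked_score are hypothetical, and a fuller pipeline might use a schema library instead of hand-written checks.

```python
from typing import Any, Optional


class SchemaDrift(TypeError):
    """Raised when a field arrives with a type the pipeline did not expect."""


def checked_score(row: dict) -> Optional[float]:
    """Return current_score as a float, None when ungraded, or raise on type drift."""
    value: Any = (row.get("grades") or {}).get("current_score")
    if value is None:
        return None  # ungraded is not zero
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        # Report only the type name, never the value, to keep grades out of logs.
        raise SchemaDrift(f"current_score is {type(value).__name__}, expected number")
    return float(value)
```

Raising instead of quietly calling float() on a string is a deliberate choice here: it turns a vendor-side change into a visible failure that can be triaged, rather than a coercion nobody reviewed.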

Asynchronous backend jobs. Gradebook recalculations and bulk exports run on the LMS backend and return a job id, not a result. A blocking request loop will time out; submit the job, capture its id, and poll for completion as described in async polling for grade syncs.
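
The submit-then-poll pattern can be sketched with the status call injected as a function. The names and defaults below are assumptions: poll_job is hypothetical, get_status stands in for whatever request fetches the job's status, and the workflow_state values "completed" and "failed" are modeled on Canvas progress objects and will differ on other platforms.

```python
import time
from typing import Any, Callable, Dict


def poll_job(
    get_status: Callable[[], Dict[str, Any]],
    *,
    timeout: float = 600.0,
    initial_delay: float = 2.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the job reports a terminal state or the deadline passes."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        state = status.get("workflow_state")
        if state == "completed":
            return status
        if state == "failed":
            raise RuntimeError("backend job failed")
        if clock() + delay > deadline:
            raise TimeoutError("backend job did not finish before the deadline")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped at max_delay
```

Each status check is itself a request that draws on the rate budget, which is why the delay grows instead of staying at a fixed short interval.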

Long URLs leaking secrets. Turning on DEBUG logging for urllib3 (the layer requests delegates to) or printing resp.url writes Moodle’s wstoken and any populated /users/:id path to disk. Redact before logging, every time.

Handling Canvas API Rate Limits — the cost-bucket model and 403 semantics your request client must respect.
Async Polling for Grade Syncs — non-blocking patterns for long-running backend jobs the transport layer submits.
Pagination Strategies for Bulk Exports — cursor versus offset paging beyond the basic Link-header loop.
Error & Retry Logic for Sync Jobs — idempotent recovery and structured failure handling around these requests.
Automating Canvas API Token Refresh in Python — keeping the session’s bearer header valid across long jobs.
