Canvas Gradebook Data Structure: Schema, Endpoints, and Normalization

The Canvas gradebook is a relational graph rather than a flat tabular export, and it is a foundational source for institutional analytics, automated reporting, and cross-platform EdTech integrations. For engineering teams and academic IT staff, building a reliable pipeline on top of it requires a precise understanding of how Instructure models assessments, submission states, assignment-group weights, and grading periods across its REST API. Unlike a single CSV download, Canvas decouples assignments, assignment groups, students, and submissions and stitches them together through foreign keys and composite identifiers. This page documents the entity model, the exact endpoints and pagination mechanics, the transformation rules that map raw payloads onto a canonical institutional schema, and the FERPA field-level controls that constrain every step, all within the broader LMS Data Architecture & Schema Mapping discipline.

Entity Model and Relational Schema

Canvas organizes gradebook data around four primary entities — courses, assignment groups, assignments, and submissions — plus the grading period and enrollment records that scope them. Treating these as discrete tables with explicit foreign keys is what keeps a downstream warehouse join-safe.

Assignment

An assignment is the unit of assessment. The fields that matter for a gradebook pipeline are:

id (integer) — primary key, stable across a term.
course_id (integer) — foreign key to the parent course.
assignment_group_id (integer) — foreign key to the weighting bucket.
points_possible (float, nullable) — denominator for points and percent grading; null for ungraded surveys.
grading_type (enum) — one of points, percent, letter_grade, gpa_scale, pass_fail, not_graded. This field dictates downstream normalization logic.
grading_period_id (integer, nullable) — present only when MGP (Multiple Grading Periods) is enabled on the account.
omit_from_final_grade (boolean) — when true, the assignment is excluded from weighted totals even if it carries points.

Assignment group

The assignment group is the weighting bucket. Final-grade math is impossible without it.

id (integer) — primary key, referenced by assignment.assignment_group_id.
group_weight (float) — percentage contribution to the final grade when apply_assignment_group_weights is enabled on the course. Values are nominal percentages (e.g. 40.0) that may not sum to 100.
rules (object) — drop rules such as drop_lowest and never_drop, which silently change which submissions count.

Submission

A submission is one student’s attempt at one assignment — the grain of the fact table.

assignment_id + user_id (integers) — the natural composite key; together with course_id they form the deterministic upsert key course_id + assignment_id + user_id.
score (float, nullable) — raw earned points; null means ungraded, which is not the same as zero.
grade (string, nullable) — the display grade, already formatted per grading_type (e.g. "B+", "92%").
workflow_state (enum) — unsubmitted, submitted, pending_review, or graded.
late, missing, excused (booleans) — policy flags that override how score is interpreted. An excused submission must be dropped from the denominator, not scored as zero.
posted_at (timestamp, nullable) — when a grade became visible; null while held behind a manual posting policy, in which case the score must be treated as embargoed.

A single course routinely carries hundreds of assignments across several groups, each with independent weighting and drop rules. Because submissions reference assignments and users by foreign key rather than embedding them, the schema minimizes redundancy but forces explicit relational mapping at ingestion time. This same modelling discipline — resolving identifiers before joining — is what links Canvas data to other platforms through Cross-LMS Student ID Mapping.
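As a rough illustration of the entity model above, the three record types can be sketched as typed Python records with explicit foreign keys. The class names, the resolve_group helper, and the choice of which fields to carry are assumptions made for this sketch, not part of the Canvas API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class AssignmentGroup:
    id: int
    group_weight: float
    rules: dict  # e.g. {"drop_lowest": 1}; drop rules change what counts


@dataclass(frozen=True)
class Assignment:
    id: int
    course_id: int
    assignment_group_id: int          # foreign key to AssignmentGroup.id
    points_possible: Optional[float]  # None for ungraded surveys
    grading_type: str
    omit_from_final_grade: bool = False
    grading_period_id: Optional[int] = None


@dataclass(frozen=True)
class Submission:
    course_id: int
    assignment_id: int      # foreign key to Assignment.id
    user_id: int
    score: Optional[float]  # None means ungraded, not zero
    workflow_state: str
    excused: bool = False
    posted_at: Optional[str] = None

    @property
    def key(self) -> Tuple[int, int, int]:
        """Deterministic upsert key (course_id, assignment_id, user_id)."""
        return (self.course_id, self.assignment_id, self.user_id)


def resolve_group(
    sub: Submission,
    assignments: Dict[int, Assignment],
    groups: Dict[int, AssignmentGroup],
) -> AssignmentGroup:
    """Follow submission -> assignment -> assignment group by foreign key."""
    return groups[assignments[sub.assignment_id].assignment_group_id]
```

Keeping the lookups keyed by id means a missing assignment or group fails loudly with a KeyError at ingestion instead of producing an orphaned fact row.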

API Endpoints and Request Patterns

Canvas does not expose a single monolithic gradebook object. Data is spread across resource-specific endpoints that you must orchestrate yourself:

GET /api/v1/courses/:course_id/assignment_groups?include[]=assignments — groups with their nested assignments and group_weight.
GET /api/v1/courses/:course_id/assignments — full assignment metadata when you need fields the nested form omits.
GET /api/v1/courses/:course_id/students/submissions?student_ids[]=all&per_page=100 — bulk submissions across the roster, the workhorse for gradebook sync.
GET /api/v1/courses/:course_id/grading_periods — grading-period boundaries when MGP is enabled.

All requests carry a bearer token: Authorization: Bearer <token>. Tokens are time-bound, so long batch jobs must refresh proactively — see automating Canvas API token refresh in Python.
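The orchestration described above can be sketched as a small request plan. This is a minimal illustration that only builds URLs, query parameters, and the auth header without sending anything; the host name and the helper names gradebook_requests and auth_headers are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

BASE = "https://canvas.example.edu/api/v1"  # hypothetical institution host


def gradebook_requests(course_id: int) -> List[Tuple[str, Dict[str, object]]]:
    """Return (url, params) pairs for the four gradebook endpoints of one course."""
    root = f"{BASE}/courses/{course_id}"
    return [
        (f"{root}/assignment_groups", {"include[]": "assignments", "per_page": 100}),
        (f"{root}/assignments", {"per_page": 100}),
        (f"{root}/students/submissions", {"student_ids[]": "all", "per_page": 100}),
        (f"{root}/grading_periods", {"per_page": 100}),
    ]


def auth_headers(token: str) -> Dict[str, str]:
    """Bearer-token header carried by every request."""
    return {"Authorization": f"Bearer {token}"}


for url, params in gradebook_requests(101):
    # urlencode percent-encodes the [] in array-style parameter names
    print(f"GET {url}?{urlencode(params)}")
```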

Pagination

Canvas paginates with Link headers as defined in RFC 8288 (which obsoletes RFC 5988), not offset counters. Each response carries rel="current" and rel="first" URLs, plus rel="next", rel="prev", and rel="last" when they apply; you follow rel="next" until it disappears, treating each URL as opaque rather than constructing page numbers yourself. Setting per_page=100 (the practical ceiling for most endpoints) minimizes round-trips. This differs from the cursor envelopes used by the Blackboard REST API architecture, which batch resources inside a JSON wrapper rather than exposing header links. The broader trade-offs are covered under pagination strategies for bulk exports.
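To make the header mechanics concrete, here is a minimal sketch of parsing a Link header by hand. The header value and host are invented for illustration, and the regular expression only handles the simple `<url>; rel="name"` form; an HTTP client library may already expose the same information in parsed form.

```python
import re
from typing import Dict

# Matches one `<url>; rel="name"` entry of a Link header.
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value: str) -> Dict[str, str]:
    """Map each rel name to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value)}


# Hypothetical header from a first page; real page tokens may be opaque strings.
header = (
    '<https://canvas.example.edu/api/v1/courses/1/students/submissions?page=1&per_page=100>; rel="current",'
    '<https://canvas.example.edu/api/v1/courses/1/students/submissions?page=2&per_page=100>; rel="next",'
    '<https://canvas.example.edu/api/v1/courses/1/students/submissions?page=1&per_page=100>; rel="first"'
)

links = parse_link_header(header)
next_url = links.get("next")  # None once the last page has been reached
```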

Rate limits

Canvas enforces a leaky-bucket quota per token. Every response carries X-Rate-Limit-Remaining (a decreasing cost budget). Once that budget is exhausted, Canvas rejects the request with 403 Forbidden and the body 403 Forbidden (Rate Limit Exceeded); note that Canvas uses 403, not the 429 some clients expect. A resilient client reads X-Rate-Limit-Remaining, throttles before the budget hits zero, and applies exponential backoff on rejection. The full strategy lives in handling Canvas API rate limits.
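The three decisions a resilient client makes can be sketched as pure functions. The threshold of 50, the base and cap values, and the use of full-jitter backoff are assumptions chosen for illustration, not values prescribed by Canvas.

```python
import random
from typing import Callable, Mapping


def is_rate_limited(status_code: int, body: str) -> bool:
    """Canvas signals throttling with 403 plus a recognizable body, not 429."""
    return status_code == 403 and "Rate Limit Exceeded" in body


def should_throttle(headers: Mapping[str, str], floor: float = 50.0) -> bool:
    """Slow down pre-emptively when the remaining budget falls below `floor`."""
    remaining = headers.get("X-Rate-Limit-Remaining")
    return remaining is not None and float(remaining) < floor


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))
```

Checking the body as well as the status keeps a genuine permission failure (also a 403) from being retried forever.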

Normalization and Transformation Logic

Raw Canvas payloads rarely contain precomputed final grades for every student, because Canvas recomputes them dynamically from group weights, drop rules, grading-period boundaries, and posting policies. The transformation layer has to reconstruct that hierarchy deterministically.

Type coercion rules

score: keep null as null (a missing grade), never coerce to 0.0 — that conflation is the most common silent error in gradebook pipelines.
grade: parse to a normalized numeric only after branching on grading_type; a letter_grade string must route through the course grading scheme, never float().
timestamps (submitted_at, posted_at, graded_at): parse ISO-8601 UTC into timezone-aware datetime; treat null as a meaningful state, not an error.
booleans (late, missing, excused): default absent flags to False.
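The coercion rules above might look like the following in Python. This is a sketch under stated assumptions: the function names are hypothetical, and the scheme argument stands in for the course grading scheme as a plain mapping from letter to a numeric value, which a real pipeline would load from the course.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO-8601 UTC timestamp; None stays None as a meaningful state."""
    if value is None:
        return None
    # Replace a trailing "Z" so older fromisoformat implementations accept it.
    return datetime.fromisoformat(value.replace("Z", "+00:00")).astimezone(timezone.utc)


def grade_to_percent(
    grade: Optional[str],
    grading_type: str,
    scheme: Optional[Dict[str, float]] = None,
) -> Optional[float]:
    """Branch on grading_type before parsing; never call float() on a letter."""
    if grade is None:
        return None
    if grading_type == "percent":
        return float(grade.rstrip("%"))
    if grading_type in ("letter_grade", "gpa_scale"):
        return (scheme or {}).get(grade)  # None when the scheme has no entry
    return None  # other types are left to the raw score column in this sketch


def coerce_submission(raw: dict) -> dict:
    score = raw.get("score")
    return {
        "score": None if score is None else float(score),  # null is not zero
        "submitted_at": parse_ts(raw.get("submitted_at")),
        "posted_at": parse_ts(raw.get("posted_at")),
        "graded_at": parse_ts(raw.get("graded_at")),
        "late": bool(raw.get("late")),
        "missing": bool(raw.get("missing")),
        "excused": bool(raw.get("excused")),
    }
```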

Composite-key construction

The canonical fact-table key is the tuple (course_id, assignment_id, user_id). Constructing it explicitly at ingestion guarantees idempotent upserts during nightly syncs and prevents duplicate rows when a submission is regraded.
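A minimal sketch of an idempotent upsert on that key, using an in-memory SQLite table as a stand-in for the warehouse. The table and column names are assumptions, the student_key column holds the tokenized form of user_id, and the ON CONFLICT clause assumes SQLite 3.24 or newer.

```python
import sqlite3

DDL = """
CREATE TABLE fact_submission (
    course_id     INTEGER NOT NULL,
    assignment_id INTEGER NOT NULL,
    student_key   TEXT    NOT NULL,
    score         REAL,
    PRIMARY KEY (course_id, assignment_id, student_key)
)
"""

UPSERT = """
INSERT INTO fact_submission (course_id, assignment_id, student_key, score)
VALUES (:course_id, :assignment_id, :student_key, :score)
ON CONFLICT (course_id, assignment_id, student_key)
DO UPDATE SET score = excluded.score
"""


def make_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def upsert(conn: sqlite3.Connection, row: dict) -> None:
    """Insert a fact row, or overwrite the score if the key already exists."""
    conn.execute(UPSERT, row)


conn = make_store()
key = {"course_id": 101, "assignment_id": 11, "student_key": "ab12"}
upsert(conn, {**key, "score": 17.0})
upsert(conn, {**key, "score": 19.0})  # regrade: same key, new score
```

After both calls the table still holds one row for the key, carrying the regraded score.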

Grading-weight reconstruction

When apply_assignment_group_weights is enabled, the final grade is the weight-normalized sum of per-group scores, after drop rules and excused submissions are applied:

$G_{\text{final}} = \frac{\displaystyle\sum_{g \in \text{groups}} w_g \cdot \frac{\sum_{a \in g} \text{score}_a}{\sum_{a \in g} \text{points\_possible}_a}}{\displaystyle\sum_{g \in \text{groups}} w_g}$

Dividing by the realized weight sum $\sum w_g$ matters: groups with no graded assignments yet drop out of both numerator and denominator, which is exactly how Canvas reproduces a “current grade” mid-term. Materializing these intermediate per-group scores in a staging layer — rather than computing them inline — prevents silent drift when an instructor changes a weight or excuses a submission after the fact. The reusable form of this calculation is detailed in weighted grade calculation engines, and the canonical column names that the warehouse expects come from the LMS CSV export format standards.
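The formula can be checked with a small worked example. The sketch below skips excused and ungraded (null-score) submissions and lets empty groups fall out of both sums; drop rules such as drop_lowest are deliberately not implemented, and the function names and input shape are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# One item is (score, points_possible, excused).
Item = Tuple[Optional[float], float, bool]


def group_ratio(items: List[Item]) -> Optional[float]:
    """Earned / possible over graded, non-excused items; None if nothing counts."""
    earned = possible = 0.0
    for score, points_possible, excused in items:
        if excused or score is None:
            continue  # excluded from numerator and denominator alike
        earned += score
        possible += points_possible
    return earned / possible if possible > 0 else None


def final_grade(groups: Dict[str, Tuple[float, List[Item]]]) -> Optional[float]:
    """Weight-normalized sum of per-group ratios, as in the formula above."""
    weighted = weight_sum = 0.0
    for weight, items in groups.values():
        ratio = group_ratio(items)
        if ratio is None:
            continue  # group with no graded work leaves both sums
        weighted += weight * ratio
        weight_sum += weight
    return weighted / weight_sum if weight_sum > 0 else None


example = {
    "homework": (40.0, [(18.0, 20.0, False), (None, 20.0, False), (None, 10.0, True)]),
    "exams": (60.0, [(80.0, 100.0, False)]),
    "projects": (20.0, [(None, 50.0, False)]),  # nothing graded yet
}
grade = final_grade(example)
```

Here homework contributes 18/20 = 0.9 and exams 80/100 = 0.8, while projects drops out, so the result is (40 × 0.9 + 60 × 0.8) / (40 + 60) = 0.84.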

Compliance Constraints

Gradebook rows are education records under FERPA, so the field-level controls are not optional and belong at the ingestion boundary, before any data reaches an analytical store. Align audit and access design with the U.S. Department of Education FERPA guidance.

Tokenize direct identifiers — user_id, sis_user_id, login/SIS handles — into an irreversible surrogate (a salted SHA-256 hash). The hashed key is what joins to other sources; the raw identifier never lands downstream.
Pass through non-identifying measures — score, points_possible, workflow_state, late/missing/excused, and the grading-period boundaries — since they are not identifying once the student key is surrogated.
Embargo score/grade where posted_at IS NULL: an unposted grade is invisible to the student and must not surface in analytics until it posts.
Add audit columns to every materialized row — ingested_at, source_endpoint, pipeline_run_id, and schema_version — so data lineage and transformation logic are reconstructable during an audit.

Transport must be TLS 1.2+ and at-rest storage encrypted; access to the un-tokenized mapping table is restricted by role. The same tokenization boundary is reused across every platform feed, which is why the surrogate key contract is defined once and shared with Cross-LMS Student ID Mapping.
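The field-level controls above can be sketched as two small row transforms applied at the ingestion boundary. The function names, the exact set of identifier fields, and the extra embargoed flag (used to tell an embargo null apart from an ungraded null) are assumptions for this sketch; the surrogate key is taken as already computed.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Assumed names of direct-identifier fields to strip before anything lands downstream.
DIRECT_IDENTIFIERS = {"user_id", "sis_user_id", "login_id"}


def apply_field_controls(row: Dict[str, Any], student_key: str) -> Dict[str, Any]:
    """Strip direct identifiers, attach the surrogate, and embargo unposted grades."""
    out = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    out["student_key"] = student_key
    if out.get("posted_at") is None:
        out["score"] = None
        out["grade"] = None
        out["embargoed"] = True
    else:
        out["embargoed"] = False
    return out


def with_audit(
    row: Dict[str, Any],
    source_endpoint: str,
    pipeline_run_id: str,
    schema_version: str,
) -> Dict[str, Any]:
    """Attach the lineage columns required on every materialized row."""
    return {
        **row,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "source_endpoint": source_endpoint,
        "pipeline_run_id": pipeline_run_id,
        "schema_version": schema_version,
    }
```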

Reference Python Implementation

The following extractor demonstrates the full path for one course: token auth, Link-header pagination, rate-limit awareness, schema flattening, FERPA tokenization, and composite-key construction. Identifiers are hashed with a salted SHA-256 so no raw student id is materialized.

python

import hashlib
import os
import time
from typing import Iterator

import requests

BASE = "https://canvas.instructure.com/api/v1"
SALT = os.environ["STUDENT_ID_SALT"].encode()  # rotate per environment


def tokenize(student_id: int | str) -> str:
    """Irreversible FERPA-safe surrogate for a direct identifier."""
    return hashlib.sha256(SALT + str(student_id).encode()).hexdigest()


def paginate(session: requests.Session, url: str, params: dict | None = None) -> Iterator[dict]:
    """Yield records across Canvas Link-header pages, honoring rate limits."""
    backoff = 1.0
    while url:
        resp = session.get(url, params=params)
        if resp.status_code == 403 and "Rate Limit" in resp.text:
            # budget exhausted: back off exponentially and retry the same URL;
            # params are left intact so a throttled first request is retried in full
            time.sleep(backoff)
            backoff = min(backoff * 2, 60.0)
            continue
        resp.raise_for_status()
        backoff = 1.0  # reset after a successful response
        params = None  # the rel="next" URL already embeds the query string
        if float(resp.headers.get("X-Rate-Limit-Remaining", "999")) < 50:
            time.sleep(1)  # pre-emptive throttle before the bucket drains
        yield from resp.json()
        url = resp.links.get("next", {}).get("url")


def extract_submissions(course_id: int, token: str) -> Iterator[dict]:
    """Flatten Canvas submissions into canonical, tokenized fact rows."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"
    url = f"{BASE}/courses/{course_id}/students/submissions"
    params = {"student_ids[]": "all", "per_page": 100}
    for sub in paginate(session, url, params):
        posted = sub.get("posted_at")
        yield {
            # composite upsert key: (course_id, assignment_id, user_id)
            "course_id": course_id,
            "assignment_id": sub["assignment_id"],
            "student_key": tokenize(sub["user_id"]),
            # null score stays null: never coerce a missing grade to zero
            "score": sub.get("score"),
            "workflow_state": sub.get("workflow_state"),
            # bool() maps both an absent flag and an explicit null to False
            "late": bool(sub.get("late")),
            "missing": bool(sub.get("missing")),
            "excused": bool(sub.get("excused")),
            # embargo unposted grades behind the FERPA posting boundary
            "grade_visible": posted is not None,
            "posted_at": posted,
        }

Pushing these rows into a DataFrame for analytical work — including the letter_grade branching and grading-period filters — is covered step by step in How to Parse Canvas Gradebook JSON with Pandas. For the full endpoint contract, cross-check field availability against the official Instructure Canvas API documentation.

Failure Modes and Edge Cases

Excused submissions scored as zero. excused: true arrives with score: null. Defaulting that null to 0.0 deflates the student’s grade. Drop excused submissions from both numerator and denominator of the group calculation.
Null grading periods. When MGP is disabled, grading_period_id is absent, and a filter that assumes it exists raises a KeyError or silently empties the result. Branch on the presence of grading periods before filtering by them.
Unposted grades leaking. A score with posted_at: null is held behind a manual posting policy. Surfacing it violates the embargo; gate every score on posted_at.
Mid-term weight changes. Editing group_weight or a drop_lowest rule retroactively changes every historical final grade. Snapshot the group configuration per pipeline_run_id so recomputed grades remain explainable.
403 rate-limit, not 429. Canvas signals throttling with 403 Forbidden (Rate Limit Exceeded). A client that only retries on 429 will treat it as an auth failure and abort the run — see error retry logic for sync jobs.
Token expiry mid-extraction. A long roster export can outlive a short-lived token, yielding a 401 partway through pagination. Refresh proactively rather than reactively to avoid a truncated, half-ingested course.
Truncated pagination. Stopping when a page returns fewer than per_page rows is wrong — Canvas can return short pages mid-stream. Only stop when the rel="next" link is absent.
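For the null-grading-period case in particular, a defensive filter can branch on whether a period was requested at all. This is a minimal sketch with a hypothetical function name; it assumes submission rows are plain dictionaries and uses .get() so an absent grading_period_id never raises.

```python
from typing import Dict, Iterable, Iterator, Optional


def filter_by_grading_period(
    subs: Iterable[Dict], period_id: Optional[int]
) -> Iterator[Dict]:
    """Yield every row when no period is requested, else only matching rows."""
    for sub in subs:
        if period_id is None or sub.get("grading_period_id") == period_id:
            yield sub


rows = [
    {"assignment_id": 1, "grading_period_id": 3},
    {"assignment_id": 2, "grading_period_id": 4},
    {"assignment_id": 3},  # course without grading periods: key absent
]
```

With period_id=None all three rows pass through; with period_id=3 only the first does, and the row that lacks the key is skipped rather than raising a KeyError.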

LMS Data Architecture & Schema Mapping — the parent reference covering ingestion, staging, normalization, and the compliance boundary across platforms.
Cross-LMS Student ID Mapping — the shared surrogate-key contract that joins Canvas rows to other source systems.
Blackboard REST API Architecture — how a cursor-envelope API differs from Canvas’s header-link pagination.
Moodle Course & User Schema — the relational, direct-database equivalent for federating grade data across platforms.
How to Parse Canvas Gradebook JSON with Pandas — turning these extracted rows into analytical DataFrames.
