Logging Failed Grade Syncs with Structured JSON

When a grade sync job hits an HTTP error, a malformed payload, or a transient network failure, an unstructured print() or free-text log line is an operational dead end: you cannot filter it, correlate it to a batch, or feed it to an alerting rule. This page walks through building a self-contained Python logger that emits one machine-readable JSON event per failed grade submission, enforces a FERPA compliance boundary on every field, and classifies each error so the surrounding retry layer can decide whether to back off or dead-letter the job. The goal is telemetry an incident responder can trace from a failed assignment back to a specific course — without ever reconstructing a student’s identity from the log stream.

Prerequisites

This procedure targets a Python 3.10+ sync worker that already issues authenticated requests to an LMS REST API. Before running it, confirm:

Python 3.10 or newer — the script uses datetime.now(timezone.utc) and X | Y union type hints.
requests>=2.31 (or any HTTP client that raises exceptions carrying a .response) — error classification reads error.response.status_code.
No extra logging dependency — the implementation uses only the standard-library logging, json, and hashlib modules so it drops into any worker.
A per-environment salt stored as a secret (env var or secrets manager), never committed — it seeds the deterministic PII hash.
Upstream failure context available at the call site — each failed submission must expose its course_id, the assignment’s external ID, and the sync_batch_id that grouped the run. If you do not yet have a stable cross-system key for students, wire in Cross-LMS Student ID Mapping first so hashed identifiers stay consistent across vendors.
A log destination that ingests newline-delimited JSON (OpenSearch, Datadog, Splunk, or a Cloud Logging agent reading stdout).

Step-by-step implementation

1. Define a deterministic PII hash

python

import hashlib

PII_SALT = os.environ["GRADE_SYNC_LOG_SALT"]  # rotate per environment

def hash_pii(value: str) -> str:
    return hashlib.sha256(f"{PII_SALT}{value}".encode()).hexdigest()[:16]

Why: a salted SHA-256 lets two log events for the same student collate during triage while remaining one-way, so the log store never holds a recoverable identifier — the same placeholder-hash pattern used across the schema-mapping tokenization rules.

2. Sanitize the payload before it reaches the buffer

python

def sanitize_payload(payload: dict | None) -> dict:
    if not payload:
        return {}
    safe = {}
    for key, value in payload.items():
        if key in ("student_email", "student_name", "sis_user_id"):
            safe[key] = hash_pii(str(value))
        elif key == "assignment_description":
            safe[key] = "[REDACTED]"
        else:
            safe[key] = value
    return safe

Why: direct identifiers are hashed and free-text fields are redacted, while operational keys (course_id, assignment_ext_id, sync_batch_id) pass through untouched because they are routing keys, not protected records.

3. Serialize log records to JSON in a custom formatter

python

class FERPACompliantFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "sync_batch_id": getattr(record, "sync_batch_id", None),
            "course_id": getattr(record, "course_id", None),
            "assignment_ext_id": getattr(record, "assignment_ext_id", None),
            "error_type": getattr(record, "error_type", None),
            "retryable": getattr(record, "retryable", False),
            "payload_context": sanitize_payload(getattr(record, "payload", None)),
        }
        return json.dumps({k: v for k, v in entry.items() if v is not None})

Why: overriding format means every call site automatically gets a uniform JSON shape; dropping None keys keeps log volume — and storage cost — down during a multi-thousand-row term close.

4. Classify the error into a retryable signal

python

RETRYABLE_STATUS = {429, 502, 503, 504}

def is_retryable(error: Exception) -> bool:
    if isinstance(error, (ConnectionError, TimeoutError)):
        return True
    response = getattr(error, "response", None)
    return getattr(response, "status_code", None) in RETRYABLE_STATUS

Why: the retryable boolean is the contract the retry engine reads — transient 429/5xx and network faults route to backoff, while a 400 validation error or 401 is escalated immediately instead of being replayed forever. Pair this with Canvas API rate-limit handling so a 429 honours Retry-After rather than hammering the endpoint.

Complete runnable script

The following is self-contained: drop it into a module, set GRADE_SYNC_LOG_SALT, and call log_sync_failure from your except block.

python

import os
import json
import logging
import hashlib
from datetime import datetime, timezone

# --- configuration -----------------------------------------------------------
PII_SALT = os.environ.get("GRADE_SYNC_LOG_SALT", "dev_only_salt_replace_me")
RETRYABLE_STATUS = {429, 502, 503, 504}
PII_FIELDS = ("student_email", "student_name", "sis_user_id")


# --- sanitization ------------------------------------------------------------
def hash_pii(value: str) -> str:
    """One-way, salted hash so events collate without exposing identifiers."""
    return hashlib.sha256(f"{PII_SALT}{value}".encode()).hexdigest()[:16]


def sanitize_payload(payload: dict | None) -> dict:
    """Hash direct identifiers, redact free text, pass operational keys through."""
    if not payload:
        return {}
    safe = {}
    for key, value in payload.items():
        if key in PII_FIELDS:
            safe[key] = hash_pii(str(value))
        elif key == "assignment_description":
            safe[key] = "[REDACTED]"
        else:
            safe[key] = value
    return safe


# --- formatter ---------------------------------------------------------------
class FERPACompliantFormatter(logging.Formatter):
    """Serialize records to FERPA-safe newline-delimited JSON."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            "module": record.module,
            "line": record.lineno,
            "sync_batch_id": getattr(record, "sync_batch_id", None),
            "course_id": getattr(record, "course_id", None),
            "assignment_ext_id": getattr(record, "assignment_ext_id", None),
            "error_type": getattr(record, "error_type", None),
            "retryable": getattr(record, "retryable", False),
            "payload_context": sanitize_payload(getattr(record, "payload", None)),
        }
        return json.dumps({k: v for k, v in entry.items() if v is not None})


def get_logger(name: str = "lms_grade_sync") -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:          # idempotent: avoid duplicate handlers on reimport
        logger.setLevel(logging.INFO)
        handler = logging.StreamHandler()
        handler.setFormatter(FERPACompliantFormatter())
        logger.addHandler(handler)
        logger.propagate = False
    return logger


# --- classification + entry point --------------------------------------------
def is_retryable(error: Exception) -> bool:
    if isinstance(error, (ConnectionError, TimeoutError)):
        return True
    response = getattr(error, "response", None)
    return getattr(response, "status_code", None) in RETRYABLE_STATUS


def log_sync_failure(
    error: Exception,
    course_id: str,
    assignment_ext_id: str,
    sync_batch_id: str,
    payload: dict | None = None,
) -> bool:
    """Emit one structured failure event; return the retryable verdict."""
    retryable = is_retryable(error)
    get_logger().error(
        "Grade sync failed for assignment",
        extra={
            "sync_batch_id": sync_batch_id,
            "course_id": course_id,
            "assignment_ext_id": assignment_ext_id,
            "error_type": type(error).__name__,
            "retryable": retryable,
            "payload": payload,
        },
        exc_info=False,  # stack traces shipped separately to avoid log bloat
    )
    return retryable


# --- usage -------------------------------------------------------------------
if __name__ == "__main__":
    class _FakeResp:
        status_code = 429

    err = RuntimeError("Too Many Requests")
    err.response = _FakeResp()
    verdict = log_sync_failure(
        error=err,
        course_id="course_8842",
        assignment_ext_id="canvas:assignment:551203",
        sync_batch_id="batch_2026_06_27_T0200",
        payload={"student_email": "jdoe@example.edu", "score": 91.5,
                 "assignment_description": "Midterm reflection essay"},
    )
    print("retryable:", verdict)

Verification and output validation

Run the script directly (python grade_sync_logger.py). You should see exactly one JSON line on stderr followed by retryable: True. Validate three properties:

It is valid JSON. Pipe the line through python -m json.tool — a parse error means a non-serializable value leaked into the payload (often a datetime or Decimal).
No raw PII survived. student_email must appear as a 16-character hex hash, never the literal address, and assignment_description must read [REDACTED]. The numeric score and the operational keys pass through unchanged.
The classification is correct. With a 429 the event carries "retryable": true; swap the fake status to 400 and confirm the same line flips to "retryable": false.

A quick assertion harness makes this a CI contract rather than a manual check:

python

import json
rec = json.loads(emitted_line)
assert set(rec) >= {"timestamp", "course_id", "error_type", "retryable"}
assert rec["payload_context"]["student_email"] != "jdoe@example.edu"
assert rec["payload_context"]["assignment_description"] == "[REDACTED]"
assert rec["retryable"] is True

Treat the log shape as a schema contract: a JSON-schema check in CI catches a renamed or dropped field before it silently breaks a downstream alert rule.

Troubleshooting

Duplicate JSON lines per failure. Re-importing the module re-runs addHandler, stacking handlers. The if not logger.handlers guard in get_logger makes initialization idempotent; also set logger.propagate = False so the root logger does not re-emit the record in plain text.
KeyError: 'GRADE_SYNC_LOG_SALT'. The salt env var is unset in the worker’s runtime. Use os.environ.get(...) with a non-production default for local runs, but fail loudly in production so unsalted hashes never ship.
TypeError: Object of type Decimal is not JSON serializable. A vendor payload field (e.g. a Decimal score or a datetime) reached json.dumps. Add a default=str argument, or coerce known numeric fields in sanitize_payload.
Raw student email still appears in logs. The PII key is named differently in this vendor’s payload (e.g. login_id or primary_email). Audit the real payload shape against your Canvas Gradebook Data Structure mapping and extend PII_FIELDS — never rely on a fixed list across LMSs.
Every 429 is marked non-retryable. The HTTP client raised a bare exception without attaching .response (some libraries do this on connection-level errors). Catch the client’s specific rate-limit exception type, or inspect the status before the exception is re-raised.
null scores logged as 0. A None score after a failed sync is legitimate (excused, missing, ungraded) and must not be coerced — that silently changes a grade. Carry None through to the log and let weighted grade calculation decide its meaning.

For the FERPA basis behind the hashing and redaction rules above, see the U.S. Department of Education’s FERPA guidance.

Error Retry Logic for Sync Jobs — the parent stage that consumes the error_type and retryable fields this logger emits.
Handling Canvas API Rate Limits — turning a logged 429 into Retry-After-aware backoff.
Async Polling for Grade Syncs — where long-running export failures originate and get logged.
Cross-LMS Student ID Mapping — keeping hashed identifiers consistent across Canvas, Moodle, and Blackboard.

Part of: Error Retry Logic for Sync Jobs