Documentation
A calibration-free, client-side proctoring engine built on MediaPipe. Headless and framework-agnostic; an optional vanilla UI and React hook ship alongside.
Install
Install the package and its MediaPipe peer dependency:
npm i proctor-vision @mediapipe/tasks-vision
Runs only in the browser, over HTTPS or localhost (camera requirement). Ships ESM + CJS + types.
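Because the engine needs a camera and a secure origin, it can help to check both before starting it. The following is a minimal sketch, not part of proctor-vision: `cameraPrecheck` and `CameraEnv` are hypothetical names, and the function only inspects the standard `isSecureContext` flag and `navigator.mediaDevices.getUserMedia`.

```typescript
// Hypothetical helper, not an SDK export: checks the browser preconditions
// (secure origin + camera API) before the engine is started.
interface CameraEnv {
  isSecureContext?: boolean;
  navigator?: { mediaDevices?: { getUserMedia?: unknown } };
}

function cameraPrecheck(env: CameraEnv): string | null {
  // Browsers only expose the camera on HTTPS pages or on localhost.
  if (!env.isSecureContext) {
    return "Camera access needs HTTPS or localhost.";
  }
  // Older or locked-down browsers may not provide getUserMedia at all.
  if (typeof env.navigator?.mediaDevices?.getUserMedia !== "function") {
    return "This browser does not expose navigator.mediaDevices.getUserMedia.";
  }
  return null; // null means no problem was found
}
```

In a page you would pass `window` and show the returned message, if any, instead of starting the engine.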
Quick start
import { createProctor } from "proctor-vision";
const proctor = createProctor({
  features: {
    faceDetection: { enabled: true, sensitivity: 0.5 },
    eyeGaze: { enabled: true, sensitivity: 0.5, prolongedMs: 5000 },
    headMovement: { enabled: true, sensitivity: 0.5, prolongedMs: 5000 },
    multiplePerson: { enabled: true, sensitivity: 0.6 },
    device: { enabled: true, sensitivity: 0.7, watch: ["cell phone", "book"] },
  },
});
proctor.on("violation", e => console.log(e.type, e.message));
proctor.on("prolonged", e => showBanner(e.message)); // fires at prolongedMs
proctor.on("violationEnd", e => save(e)); // { type, durationMs, prolonged }
await proctor.start(videoElement); // an HTMLVideoElement or MediaStream
// ... later
const report = proctor.getReport();
proctor.stop();
Detectors & sensitivity
Every detector shares one shape: { enabled, sensitivity } plus optional extras. sensitivity is a single 0–1 dial — higher = more sensitive (flags more readily). The SDK maps it to the right internal threshold, so you tune one number per detector.
| Detector | sensitivity controls | Extras | Default |
|---|---|---|---|
| faceDetection | how eagerly a face is detected | message | 0.5 |
| eyeGaze | how small a glance off-screen counts | prolongedMs, message | 0.5 |
| headMovement | how small a head turn counts | prolongedMs, message | 0.5 |
| multiplePerson | how partial a 2nd person counts | message | 0.6 |
| device | how partial a device counts | watch[], prolongedMs, message | 0.7 |
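The mapping from the 0–1 dial to an internal threshold is not documented here. One plausible shape, shown purely as an illustration, is a clamped linear interpolation between a lenient and a strict threshold. The function name and the 30°/10° head-turn limits below are assumptions made up for this sketch, not SDK values.

```typescript
// Illustrative only: the real mapping inside proctor-vision may differ.
// Maps a 0-1 sensitivity dial onto a threshold range, where a higher
// sensitivity moves the threshold toward the strict end.
function sensitivityToThreshold(
  sensitivity: number,
  lenient: number,
  strict: number,
): number {
  const s = Math.min(1, Math.max(0, sensitivity)); // clamp the dial to [0, 1]
  return lenient + (strict - lenient) * s;
}

// Assumed example range: flag head turns beyond 30 degrees at sensitivity 0
// and beyond 10 degrees at sensitivity 1.
const yawLimitDeg = sensitivityToThreshold(0.5, 30, 10); // 20
```

The point of a single dial is that callers never handle the underlying units (degrees, confidence scores, pixel ratios); only the direction "higher flags more readily" has to stay consistent across detectors.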
All default to enabled: true. Turn one off with enabled: false. Change any dial live:
proctor.configure({ features: { device: { sensitivity: 0.9 } } });
Globals: autoBaseline (true), smoothing (0.25), headSmoothing (0.4), objectDetectIntervalMs (400), maxFaces (3), debug (false), modelBaseUrl (self-host models).
Events
| Event | Payload | When |
|---|---|---|
| violation | { type, direction?, message, startedAt } | an episode begins (after debounce) |
| prolonged | { type, message, thresholdMs } | episode reaches prolongedMs |
| violationEnd | { type, durationMs, prolonged } | episode ends |
| state | ProctorState | every processed frame |
| started / stopped / error | — / — / Error | lifecycle |
type is one of eyeGaze | headMovement | multiplePerson | device | noFace.
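If you want running totals during a session rather than waiting for the final report, the violationEnd payload carries enough to build them. The sketch below assumes the payload shape from the table above; `EpisodeTally`, `ViolationEnd`, and `ViolationType` are hypothetical names local to this example, not SDK exports.

```typescript
// Payload shape copied from the events table; type names are local to this sketch.
type ViolationType = "eyeGaze" | "headMovement" | "multiplePerson" | "device" | "noFace";

interface ViolationEnd {
  type: ViolationType;
  durationMs: number;
  prolonged: boolean;
}

// Hypothetical helper (not an SDK class): keeps running totals per violation type.
class EpisodeTally {
  private counts: Partial<Record<ViolationType, number>> = {};
  private totalMs: Partial<Record<ViolationType, number>> = {};
  prolongedEpisodes = 0;

  // Call once per finished episode.
  record(e: ViolationEnd): void {
    this.counts[e.type] = (this.counts[e.type] ?? 0) + 1;
    this.totalMs[e.type] = (this.totalMs[e.type] ?? 0) + e.durationMs;
    if (e.prolonged) this.prolongedEpisodes += 1;
  }

  // Number of finished episodes of the given type.
  count(type: ViolationType): number {
    return this.counts[type] ?? 0;
  }

  // Total time spent in episodes of the given type, in seconds.
  seconds(type: ViolationType): number {
    return (this.totalMs[type] ?? 0) / 1000;
  }
}
```

Wiring it up would look like `proctor.on("violationEnd", e => tally.record(e))`, with `tally` being an `EpisodeTally` instance you create alongside the engine.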
Live state
The state event fires each frame with a snapshot you can render into a HUD:
proctor.on("state", s => {
  s.faces; // number of faces
  s.gazeDirection; // "CENTER" | "LEFT" | "DOWN-RIGHT" | ...
  s.head; // { yawDeg, pitchDeg }
  s.devices; // [{ label, score }]
  s.baselineReady; // has the neutral gaze been learned yet
  s.active; // which detectors are currently firing
});
Set debug: true to also receive landmarks and deviceBoxes for a mesh overlay (use drawDebugOverlay from proctor-vision/ui).
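As a rough illustration of rendering that snapshot, the sketch below turns one state object into a few lines of HUD text. `HudState` and `formatHud` are hypothetical names for this example; the field types are inferred from the comments above, and treating `active` as an array of detector names is an assumption.

```typescript
// Field types inferred from the snapshot listing; `active` as string[] is an assumption.
interface HudState {
  faces: number;
  gazeDirection: string;
  head: { yawDeg: number; pitchDeg: number };
  devices: { label: string; score: number }[];
  baselineReady: boolean;
  active: string[];
}

// Hypothetical helper: formats one snapshot as multi-line HUD text.
function formatHud(s: HudState): string {
  const devices = s.devices.length
    ? s.devices.map(d => `${d.label} ${(d.score * 100).toFixed(0)}%`).join(", ")
    : "none";
  return [
    `faces: ${s.faces}`,
    // Gaze is only meaningful once the neutral baseline has been learned.
    `gaze: ${s.baselineReady ? s.gazeDirection : "learning baseline"}`,
    `head: yaw ${s.head.yawDeg.toFixed(1)} deg, pitch ${s.head.pitchDeg.toFixed(1)} deg`,
    `devices: ${devices}`,
    `active: ${s.active.length ? s.active.join(", ") : "none"}`,
  ].join("\n");
}
```

Since the state event fires on every processed frame, a real HUD would likely write this string into a single element's `textContent` rather than rebuild DOM nodes each time.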
Session report
getReport() returns proctoring evidence: a summary plus every episode with durations and timestamps. It is meant to be handed to a human reviewer, not treated as an automatic verdict.
{
  durationSeconds, mode, config,
  summary: { countsByType, totalSecondsByType, prolongedEpisodes, totalViolations },
  episodes: [{ type, direction, durationMs, prolonged, startedAt, endedAt }]
}
React / Next.js
Create the engine in an effect, point it at a video ref, clean up on unmount. The demo ships a ready-made useProctorVision hook.
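useEffect(() => {
  // Create the engine inside the effect so it is built once per mount, not on every render.
  const proctor = createProctor(config);
  proctor.on("state", setState);
  proctor.start(videoRef.current!);
  // Stop the engine when the component unmounts.
  return () => proctor.stop();
}, []);
Best-use guidance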
For the most reliable gaze/head detection, guide candidates on their setup:
| Factor | Recommendation |
|---|---|
| Distance | ~50–60 cm (20–24 in); acceptable 40–75 cm |
| Framing | Face ≈ ⅓ of the frame, whole head + shoulders, centered |
| Camera height | At eye level (±15°) |
| Lighting | Even, front-facing; avoid backlight (window behind) |
| Glasses | ✅ Fully supported — just avoid strong lens glare |
Glasses: work out of the box — MediaPipe is trained on faces with glasses, head-pose is geometric (lens-independent), and the auto-baseline learns the wearer’s neutral as-is. The only degrader is strong glare on the lenses washing out the iris for gaze; head-pose, person and device detection are unaffected regardless.
Privacy & limits
- All video stays in the browser — the SDK never uploads frames. Only the events/report you choose to persist leave the device.
- A fully-hidden phone can’t be seen by any webcam detector — the sustained eyes/head-down signal is the behavioral backstop.
- Treat output as evidence for a human, not an automatic pass/fail.
- GPU is used when available; otherwise it falls back to CPU (slower FPS).