Documentation

A calibration-free, client-side proctoring engine built on MediaPipe. Headless and framework-agnostic; an optional vanilla UI and React hook ship alongside.

Install

Install the package and its MediaPipe peer dependency:

npm i proctor-vision @mediapipe/tasks-vision

Runs in the browser only, and only over HTTPS or on localhost, because camera access (getUserMedia) requires a secure context. Ships ESM, CJS, and TypeScript types.
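You can check these prerequisites before calling start() and show candidates a readable message instead of a raw camera error. The sketch below is a minimal example. cameraPrerequisiteError is a hypothetical helper and is not part of proctor-vision. It relies only on the standard isSecureContext and navigator.mediaDevices.getUserMedia web APIs, and it takes the window-like object as a parameter so it can also run outside a browser.

```typescript
// Hypothetical helper, not part of proctor-vision: explains why the camera
// cannot be used, or returns null if the basic prerequisites are met.
interface CameraEnv {
  isSecureContext: boolean;
  navigator: { mediaDevices?: { getUserMedia?: unknown } };
}

function cameraPrerequisiteError(env: CameraEnv): string | null {
  // getUserMedia is only exposed in secure contexts (HTTPS or localhost).
  if (!env.isSecureContext) {
    return "Open this page over HTTPS (or on localhost) to enable the camera.";
  }
  // Older or locked-down browsers may not expose the API at all.
  if (typeof env.navigator.mediaDevices?.getUserMedia !== "function") {
    return "This browser does not support camera access.";
  }
  return null;
}

// In the browser: const problem = cameraPrerequisiteError(window);
```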

Quick start

import { createProctor } from "proctor-vision";

const proctor = createProctor({
  features: {
    faceDetection:  { enabled: true, sensitivity: 0.5 },
    eyeGaze:        { enabled: true, sensitivity: 0.5, prolongedMs: 5000 },
    headMovement:   { enabled: true, sensitivity: 0.5, prolongedMs: 5000 },
    multiplePerson: { enabled: true, sensitivity: 0.6 },
    device:         { enabled: true, sensitivity: 0.7, watch: ["cell phone", "book"] },
  },
});

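// showBanner() and save() stand in for your own app code;
// videoElement is your <video> element (a MediaStream also works).
proctor.on("violation",    e => console.log(e.type, e.message));
proctor.on("prolonged",    e => showBanner(e.message));   // fires once an episode lasts prolongedMs
proctor.on("violationEnd", e => save(e));                 // { type, durationMs, prolonged }

await proctor.start(videoElement);   // an HTMLVideoElement or MediaStream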
// ... later
const report = proctor.getReport();
proctor.stop();

Detectors & sensitivity

Every detector shares one shape: { enabled, sensitivity } plus optional extras. The sensitivity field is a single 0–1 dial: higher values are more sensitive and flag more readily. The SDK maps it to the appropriate internal threshold, so you tune one number per detector.
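To make the idea of one dial per detector concrete, here is a hypothetical linear mapping from sensitivity to a threshold. proctor-vision's actual internal curves are not documented here, and the 25° and 5° gaze-angle endpoints are invented for the example. The only point it illustrates is that a higher dial yields a stricter threshold.

```typescript
// Illustrative only: proctor-vision's real internal mapping is not documented.
// Maps a 0–1 sensitivity dial onto a detector threshold by linear interpolation.
// `loose` is the threshold at sensitivity 0 (flags least readily),
// `strict` is the threshold at sensitivity 1 (flags most readily).
function sensitivityToThreshold(sensitivity: number, loose: number, strict: number): number {
  const s = Math.min(1, Math.max(0, sensitivity)); // clamp out-of-range input
  return loose + (strict - loose) * s;
}

// Hypothetical gaze example: degrees off-center a glance must exceed to count.
const gazeDegAtDefault = sensitivityToThreshold(0.5, 25, 5); // 15
const gazeDegAtMax = sensitivityToThreshold(1, 25, 5);       // 5
```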

Detector	What sensitivity controls	Extras	Default sensitivity
faceDetection	how eagerly a face is detected	message	0.5
eyeGaze	how small an off-screen glance counts	prolongedMs, message	0.5
headMovement	how small a head turn counts	prolongedMs, message	0.5
multiplePerson	how little of a second person must be visible to count	message	0.6
device	how little of a device must be visible to count	watch[], prolongedMs, message	0.7

All detectors default to enabled: true; turn one off with enabled: false. Any dial can be changed at runtime:

proctor.configure({ features: { device: { sensitivity: 0.9 } } });
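The configure() call passes only device.sensitivity, which implies that fields you omit keep their current values. The sketch below shows that partial-merge behavior, assuming a per-detector shallow merge. The types and mergeFeatures are hypothetical illustrations, not the SDK's implementation.

```typescript
// Hypothetical types and merge helper; proctor-vision's own implementation may differ.
type DetectorName = "faceDetection" | "eyeGaze" | "headMovement" | "multiplePerson" | "device";

interface DetectorConfig {
  enabled: boolean;
  sensitivity: number;
  prolongedMs?: number;
  watch?: string[];
  message?: string;
}

type Features = Record<DetectorName, DetectorConfig>;
type FeaturesPatch = Partial<Record<DetectorName, Partial<DetectorConfig>>>;

// Returns a new features object; each patched detector keeps every field
// the patch does not mention, and unpatched detectors are reused as-is.
function mergeFeatures(current: Features, patch: FeaturesPatch): Features {
  const next = { ...current };
  for (const name of Object.keys(patch) as DetectorName[]) {
    next[name] = { ...current[name], ...patch[name] };
  }
  return next;
}
```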

Global options (defaults in parentheses): autoBaseline (true), smoothing (0.25), headSmoothing (0.4), objectDetectIntervalMs (400), maxFaces (3), debug (false), and modelBaseUrl (set it to self-host the model files).
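A concrete model makes the smoothing and interval globals easier to reason about. The sketch below assumes two things about the internals: smoothing and headSmoothing act as exponential-moving-average weights, and objectDetectIntervalMs throttles object detection to one run per interval. ema and makeIntervalGate are hypothetical helpers.

```typescript
// Illustrative only. Assumes smoothing / headSmoothing act as exponential
// moving average (EMA) weights; the SDK's real filter is not documented here.
// With this formula, a higher alpha follows new frames faster and a lower
// alpha suppresses more jitter.
function ema(previous: number | undefined, sample: number, alpha: number): number {
  return previous === undefined ? sample : previous + alpha * (sample - previous);
}

// Illustrative throttle for an expensive step such as object detection:
// allow it at most once per intervalMs (cf. objectDetectIntervalMs = 400).
function makeIntervalGate(intervalMs: number): (nowMs: number) => boolean {
  let lastRun = -Infinity;
  return (nowMs) => {
    if (nowMs - lastRun < intervalMs) return false;
    lastRun = nowMs;
    return true;
  };
}
```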

Events

Event	Payload	When
violation	{ type, direction?, message, startedAt }	an episode begins (after debounce)
prolonged	{ type, message, thresholdMs }	episode reaches prolongedMs
violationEnd	{ type, durationMs, prolonged }	episode ends
state	ProctorState	every processed frame
started / stopped / error	— / — / Error	lifecycle

type is one of eyeGaze | headMovement | multiplePerson | device | noFace.
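The three episode events form a small per-detector state machine. The sketch below models that lifecycle under three assumptions: a fixed debounce applies before violation fires, durationMs is measured from the first frame the condition appeared, and prolonged fires once elapsed time reaches prolongedMs. EpisodeTracker, debounceMs, and these timing rules are hypothetical. The real events also carry type and message.

```typescript
// Hypothetical per-detector tracker illustrating the documented sequence:
// violation (after debounce) -> prolonged (at prolongedMs) -> violationEnd.
type TrackerEvent =
  | { kind: "violation"; startedAt: number }
  | { kind: "prolonged"; thresholdMs: number }
  | { kind: "violationEnd"; durationMs: number; prolonged: boolean };

class EpisodeTracker {
  private readonly debounceMs: number;
  private readonly prolongedMs: number;
  private onsetAt: number | null = null; // first frame the condition was true
  private reported = false;              // has "violation" been emitted?
  private prolonged = false;             // has "prolonged" been emitted?

  constructor(debounceMs: number, prolongedMs: number) {
    this.debounceMs = debounceMs;
    this.prolongedMs = prolongedMs;
  }

  // Call once per processed frame with whether the detector currently fires.
  update(active: boolean, nowMs: number): TrackerEvent[] {
    const events: TrackerEvent[] = [];
    if (active) {
      if (this.onsetAt === null) this.onsetAt = nowMs;
      const elapsed = nowMs - this.onsetAt;
      if (!this.reported && elapsed >= this.debounceMs) {
        this.reported = true;
        events.push({ kind: "violation", startedAt: this.onsetAt });
      }
      if (this.reported && !this.prolonged && elapsed >= this.prolongedMs) {
        this.prolonged = true;
        events.push({ kind: "prolonged", thresholdMs: this.prolongedMs });
      }
    } else if (this.onsetAt !== null) {
      // Condition cleared: only episodes that survived the debounce are reported.
      if (this.reported) {
        events.push({ kind: "violationEnd", durationMs: nowMs - this.onsetAt, prolonged: this.prolonged });
      }
      this.onsetAt = null;
      this.reported = false;
      this.prolonged = false;
    }
    return events;
  }
}
```

With this design, brief blips shorter than the debounce produce no events at all, which is what "after debounce" in the events table suggests.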

Live state

The state event fires each frame with a snapshot you can render into a HUD:

proctor.on("state", s => {
  s.faces;            // number of faces
  s.gazeDirection;    // "CENTER" | "LEFT" | "DOWN-RIGHT" | ...
  s.head;             // { yawDeg, pitchDeg }
  s.devices;          // [{ label, score }]
  s.baselineReady;    // has the neutral gaze been learned yet
  s.active;           // which detectors are currently firing
});
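Here is a sketch that turns one snapshot into a single HUD line. HudState is a hypothetical subset typed from the fields listed above, and the real ProctorState may differ. formatHud is an illustrative helper.

```typescript
// Hypothetical subset of the state snapshot; the real ProctorState may
// contain more fields or type these differently.
interface HudState {
  faces: number;
  gazeDirection: string;
  head: { yawDeg: number; pitchDeg: number };
  devices: { label: string; score: number }[];
  baselineReady: boolean;
}

// Formats one snapshot as a single line of text for a HUD.
function formatHud(s: HudState): string {
  const head = `yaw ${s.head.yawDeg.toFixed(0)}° pitch ${s.head.pitchDeg.toFixed(0)}°`;
  const devices = s.devices.length
    ? s.devices.map(d => `${d.label} ${(d.score * 100).toFixed(0)}%`).join(", ")
    : "none";
  const baseline = s.baselineReady ? "" : " (learning baseline)";
  return `faces ${s.faces} | gaze ${s.gazeDirection}${baseline} | ${head} | devices ${devices}`;
}
```

In the browser you would call it from the listener, e.g. proctor.on("state", s => { hud.textContent = formatHud(s); }), where hud is your own element.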

Set debug: true to also receive landmarks and deviceBoxes for a mesh overlay (use drawDebugOverlay from proctor-vision/ui).

Session report

getReport() returns proctoring evidence: a summary plus every episode, with durations and timestamps. It is meant to flag a session for human review, not to deliver an automatic verdict.

{
  durationSeconds, mode, config,
  summary: { countsByType, totalSecondsByType, prolongedEpisodes, totalViolations },
  episodes: [{ type, direction, durationMs, prolonged, startedAt, endedAt }]
}
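The summary is derived from the episodes, so you can recompute it, for example to sanity-check a stored report before review. The sketch below assumes three things: totalSecondsByType sums durationMs / 1000 per type, prolongedEpisodes counts prolonged episodes, and totalViolations counts all episodes. summarize and its types are hypothetical.

```typescript
// Hypothetical recomputation of a report summary from its episodes.
type ViolationType = "eyeGaze" | "headMovement" | "multiplePerson" | "device" | "noFace";

interface EpisodeLike {
  type: ViolationType;
  durationMs: number;
  prolonged: boolean;
}

interface SummaryLike {
  countsByType: Partial<Record<ViolationType, number>>;
  totalSecondsByType: Partial<Record<ViolationType, number>>;
  prolongedEpisodes: number;
  totalViolations: number;
}

function summarize(episodes: EpisodeLike[]): SummaryLike {
  const summary: SummaryLike = { countsByType: {}, totalSecondsByType: {}, prolongedEpisodes: 0, totalViolations: 0 };
  for (const ep of episodes) {
    summary.countsByType[ep.type] = (summary.countsByType[ep.type] ?? 0) + 1;
    summary.totalSecondsByType[ep.type] = (summary.totalSecondsByType[ep.type] ?? 0) + ep.durationMs / 1000;
    if (ep.prolonged) summary.prolongedEpisodes += 1;
    summary.totalViolations += 1;
  }
  return summary;
}
```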

React / Next.js

Create the engine in an effect, point it at a video ref, clean up on unmount. The demo ships a ready-made useProctorVision hook.

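import { useEffect, useRef, useState } from "react";
import { createProctor } from "proctor-vision";

// Inside your component; attach videoRef to a <video> element:
const videoRef = useRef<HTMLVideoElement>(null);
const [state, setState] = useState<unknown>(null);   // latest snapshot for your HUD
useEffect(() => {
  const proctor = createProctor(config);   // created in the effect: one engine per mount
  proctor.on("state", setState);
  proctor.start(videoRef.current!).catch(console.error);
  return () => proctor.stop();             // stop the engine on unmount
}, []);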

Best-use guidance

For the most reliable gaze/head detection, guide candidates on their setup:

Factor	Recommendation
Distance	~50–60 cm (20–24 in); acceptable 40–75 cm
Framing	Face ≈ ⅓ of the frame, whole head + shoulders, centered
Camera height	At eye level (±15°)
Lighting	Even, front-facing; avoid backlight (window behind)
Glasses	✅ Fully supported — just avoid strong lens glare

Glasses work out of the box. MediaPipe is trained on faces with glasses, head pose is computed geometrically (independent of the lenses), and the auto-baseline learns the wearer’s neutral gaze as-is. The only thing that degrades results is strong glare washing out the iris, which affects gaze only; head-pose, person, and device detection are unaffected.
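The distance and framing rows of the setup table can be turned into a pre-exam setup check. The sketch below assumes you have a face bounding box in normalized [0, 1] frame coordinates (for example, derived from landmarks). It also reads "face ≈ ⅓ of the frame" as face width relative to frame width. framingHint and its tolerances are hypothetical.

```typescript
// Hypothetical setup check; not part of proctor-vision. Coordinates are
// assumed normalized to the frame: (0, 0) top-left, (1, 1) bottom-right.
interface NormBox { x: number; y: number; width: number; height: number }

function framingHint(box: NormBox): string | null {
  // Target: face about one third of the frame width (roughly 50–60 cm away).
  if (box.width < 0.2) return "Move a little closer to the camera.";
  if (box.width > 0.5) return "Move a little farther from the camera.";
  // Target: face roughly centered horizontally and vertically.
  const cx = box.x + box.width / 2;
  const cy = box.y + box.height / 2;
  if (Math.abs(cx - 0.5) > 0.15 || Math.abs(cy - 0.5) > 0.2) return "Center your face in the frame.";
  return null; // framing looks fine
}
```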

Privacy & limits

All video stays in the browser: the SDK never uploads frames. Only the events and reports you choose to persist leave the device.
A fully hidden phone can’t be seen by any webcam-based detector; sustained eyes-down or head-down behavior is the backstop signal.
Treat the output as evidence for a human reviewer, not as an automatic pass/fail.
Inference runs on the GPU when available and otherwise falls back to the CPU (lower frame rate).
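A hidden phone shows up only as behavior, so a reviewer may want the sustained look-down episodes pulled out of a report. The sketch below assumes episode direction values use the same strings as state.gazeDirection (e.g. "DOWN", "DOWN-LEFT"). It treats an episode as sustained if it is prolonged or at least minMs long. sustainedLookDowns is a hypothetical helper, and its output is meant for a human, not an automatic verdict.

```typescript
// Hypothetical review helper: surfaces the eyes/head-down backstop signal.
interface ReviewEpisode {
  type: string;
  direction?: string;
  durationMs: number;
  prolonged: boolean;
}

function sustainedLookDowns(episodes: ReviewEpisode[], minMs = 5000): ReviewEpisode[] {
  return episodes.filter(ep =>
    (ep.type === "eyeGaze" || ep.type === "headMovement") &&
    (ep.direction ?? "").startsWith("DOWN") &&     // "DOWN", "DOWN-LEFT", "DOWN-RIGHT"
    (ep.prolonged || ep.durationMs >= minMs)       // sustained, not a quick glance
  );
}
```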
