Ground truth alignment synchronizes AI-generated transcription tokens with human-edited text. Use it when you have a corrected transcript and want to keep the original timing information while using the accurate text.
After alignment, tokens become GroundedToken objects:
type GroundedToken = Token & {
    isUnknown?: boolean; // true if this token wasn't in the AI transcription
};

type GroundedSegment = Omit<Segment, 'tokens'> & {
    tokens: GroundedToken[];
};
Tokens with isUnknown: true are words from the ground truth that weren’t in the original AI transcription. Their timestamps are interpolated.
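One way to review which words came only from the ground truth is to filter on isUnknown. The sketch below is illustrative. It redeclares minimal Token and Segment shapes locally, assumed from the examples on this page (the types exported by paragrafs may carry more fields). unknownTokens is a hypothetical helper, not a library function, and the sample data is hand-written rather than real alignment output.

```typescript
// Minimal local shapes, assumed from the examples on this page.
type Token = { start: number; end: number; text: string };
type Segment = Token & { tokens: Token[] };

type GroundedToken = Token & { isUnknown?: boolean };
type GroundedSegment = Omit<Segment, 'tokens'> & { tokens: GroundedToken[] };

// Hypothetical helper: collect the tokens that came only from the ground truth.
function unknownTokens(segment: GroundedSegment): GroundedToken[] {
    return segment.tokens.filter((token) => token.isUnknown === true);
}

// Hand-written sample data, not real library output.
const sample: GroundedSegment = {
    start: 0,
    end: 3,
    text: 'There are too',
    tokens: [
        { start: 0, end: 1, text: 'There' },
        { start: 1, end: 2, text: 'are', isUnknown: true },
        { start: 2, end: 3, text: 'too' },
    ],
};

const flagged = unknownTokens(sample);
console.log(flagged.map((token) => token.text));
```

Because the timestamps on these tokens are interpolated, a list like this can help decide which words are worth spot-checking against the audio.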
import { splitSegment } from 'paragrafs';

const segment = {
    start: 0,
    end: 10,
    text: 'This is a long segment',
    tokens: [/* ... */],
};

const [first, second] = splitSegment(segment, 5);
// Splits at 5 seconds into two segments
When to use updateSegmentWithGroundTruth vs applyGroundTruthToSegment
Use updateSegmentWithGroundTruth when you need to see which words were missing or incorrect (for debugging or analysis). Use applyGroundTruthToSegment for production output where you only want accurate, timestamped tokens.
Handling large transcripts
Process segments in batches to avoid memory issues with very large transcripts. The alignment algorithm is efficient, but processing thousands of segments at once can be memory-intensive.
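A minimal sketch of batching follows. It is library-agnostic: processInBatches is a hypothetical helper, and the processSegment callback stands in for whatever alignment call you use. Each finished batch is handed to a callback so results do not have to accumulate in memory.

```typescript
// Hypothetical helper: run `processSegment` over `segments` in fixed-size batches,
// passing each finished batch to `onBatch` (for example, to write it to disk).
function processInBatches<T, R>(
    segments: readonly T[],
    batchSize: number,
    processSegment: (segment: T) => R,
    onBatch: (results: R[], batchIndex: number) => void,
): void {
    if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
        throw new RangeError('batchSize must be a positive integer');
    }
    let batchIndex = 0;
    for (let start = 0; start < segments.length; start += batchSize) {
        const batch = segments.slice(start, start + batchSize);
        onBatch(batch.map((segment) => processSegment(segment)), batchIndex);
        batchIndex += 1;
    }
}

// Usage with stand-in data; in practice processSegment would wrap the alignment call.
const batchSizes: number[] = [];
processInBatches(
    ['a', 'b', 'c', 'd', 'e'],
    2,
    (text) => text.toUpperCase(),
    (results) => {
        batchSizes.push(results.length);
    },
);
console.log(batchSizes);
```

The right batch size depends on segment length and available memory, so treat the value above as a placeholder to tune.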
Preserving original transcriptions
Always keep a copy of the original AI transcription before applying ground truth. This allows you to re-process with different ground truth versions if needed.
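One way to do this is to snapshot the input before aligning. In the sketch below, alignKeepingOriginal is a hypothetical wrapper and the align callback stands in for the real alignment call. Whether the library mutates its input is not stated here, so the copy is a precaution rather than a documented requirement.

```typescript
// Minimal local shape, assumed from the examples on this page.
type Token = { start: number; end: number; text: string };

// Hypothetical wrapper: hand the aligner a copy so the original stays untouched.
// The spread is a shallow copy; objects with nested arrays would need a deep copy.
function alignKeepingOriginal<R>(
    original: Token,
    groundTruth: string,
    align: (token: Token, groundTruth: string) => R,
): { original: Token; aligned: R } {
    const snapshot: Token = { ...original };
    return { original: snapshot, aligned: align({ ...original }, groundTruth) };
}

const raw: Token = { start: 0, end: 15, text: 'Their our too many errors' };

// Stand-in aligner that deliberately mutates its input, to show the snapshot survives.
const result = alignKeepingOriginal(raw, 'There are too many errors', (token, truth) => {
    token.text = truth;
    return token;
});

console.log(result.original.text);
console.log(result.aligned.text);
```

Keeping the snapshot alongside each aligned result makes it straightforward to re-run alignment later against a revised ground truth.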
import {
    estimateSegmentFromToken,
    applyGroundTruthToSegment,
    formatSegmentsToTimestampedTranscript,
    markAndCombineSegments,
} from 'paragrafs';

// Raw AI transcription with errors
const rawToken = {
    start: 0,
    end: 15,
    text: 'Their our too many errors in this transcripshun',
};

// Human-corrected version
const groundTruth = 'There are too many errors in this transcription';

// Process
const segment = estimateSegmentFromToken(rawToken);
const aligned = applyGroundTruthToSegment(segment, groundTruth);

// Format for output
const options = {
    fillers: [],
    gapThreshold: 2,
    maxSecondsPerSegment: 20,
    minWordsPerSegment: 3,
};
const marked = markAndCombineSegments([aligned], options);
const transcript = formatSegmentsToTimestampedTranscript(marked, 10);

console.log(transcript);
// Output shows corrected text with preserved timestamps