Get started with the core functionality of Paragrafs:
    import {
        estimateSegmentFromToken,
        markAndCombineSegments,
        mapSegmentsIntoFormattedSegments,
    } from 'paragrafs';

    // Example token from transcription
    const token = {
        start: 0,
        end: 5,
        text: 'This is a sample text. It should be properly segmented.',
    };

    // Estimate segment with word-level tokens
    const segment = estimateSegmentFromToken(token);

    // Combine and format segments
    const formattedSegments = mapSegmentsIntoFormattedSegments([segment]);

    console.log(formattedSegments[0].text);
    // Output: "This is a sample text. It should be properly segmented."
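To give a feel for what "estimating word-level tokens" could involve, here is a minimal sketch that spreads a token's duration evenly across its words. The helper name `splitTokenEvenly` is hypothetical and the even split is an assumption made for illustration; the heuristic Paragrafs actually uses may differ.

```javascript
// Hypothetical helper, NOT part of the paragrafs API.
// Assumption: every word gets an equal share of the token's duration.
function splitTokenEvenly(token) {
    // Split on whitespace and drop empty strings (e.g. for blank text).
    const words = token.text.trim().split(/\s+/).filter(Boolean);
    const step = (token.end - token.start) / words.length;

    return {
        ...token,
        // One word-level token per word, laid out back to back.
        tokens: words.map((text, i) => ({
            text,
            start: token.start + i * step,
            end: token.start + (i + 1) * step,
        })),
    };
}

const estimated = splitTokenEvenly({
    start: 0,
    end: 5,
    text: 'This is a sample text. It should be properly segmented.',
});
console.log(estimated.tokens.length); // 10 words, each 0.5 seconds long
```

An even split ignores word length and pauses, so it is only a rough approximation when no word-level timestamps are available.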
Synchronize AI-generated tokens with human-edited text using ground-truth alignment:
    import { updateSegmentWithGroundTruth } from 'paragrafs';

    const rawSegment = {
        start: 0,
        end: 10,
        text: 'The Buick crown flock jumps right over the crazy dog.',
        tokens: [
            /* AI-generated word timestamps */
        ],
    };

    const aligned = updateSegmentWithGroundTruth(
        rawSegment,
        'The quick brown fox jumps right over the lazy dog.',
    );

    console.log(aligned.tokens);
    // Each token now matches the ground-truth words exactly,
    // with missing words interpolated where needed.
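The ground-truth alignment uses longest common subsequence (LCS) matching to pair AI-generated tokens with the words of the human-edited text. Matched words keep their original timing information, and the token text is replaced with the ground-truth wording.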
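The idea can be sketched in a few dozen lines. This is an illustrative approximation, not the library's implementation: the names `normalize`, `matchIndices`, and `alignToGroundTruth` are hypothetical, words are compared case-insensitively with punctuation stripped (an assumption), and unmatched ground-truth words are assumed to share the gap between their matched neighbours evenly.

```javascript
// Hypothetical helpers, NOT part of the paragrafs API.

// Compare words case-insensitively, ignoring punctuation (an assumption).
const normalize = (word) => word.toLowerCase().replace(/[^\p{L}\p{N}]/gu, '');

// Return [tokenIndex, truthIndex] pairs along a longest common subsequence.
function matchIndices(a, b) {
    // table[i][j] = LCS length of a[i..] and b[j..]
    const table = Array.from({ length: a.length + 1 }, () => new Array(b.length + 1).fill(0));
    for (let i = a.length - 1; i >= 0; i--) {
        for (let j = b.length - 1; j >= 0; j--) {
            table[i][j] = a[i] === b[j]
                ? table[i + 1][j + 1] + 1
                : Math.max(table[i + 1][j], table[i][j + 1]);
        }
    }
    const pairs = [];
    let i = 0;
    let j = 0;
    while (i < a.length && j < b.length) {
        if (a[i] === b[j]) {
            pairs.push([i, j]);
            i++;
            j++;
        } else if (table[i + 1][j] >= table[i][j + 1]) {
            i++; // skip an AI token that has no counterpart
        } else {
            j++; // skip a ground-truth word that has no counterpart
        }
    }
    return pairs;
}

function alignToGroundTruth(segment, groundTruth) {
    const truthWords = groundTruth.trim().split(/\s+/).filter(Boolean);
    const pairs = matchIndices(
        segment.tokens.map((t) => normalize(t.text)),
        truthWords.map(normalize),
    );
    const result = truthWords.map((text) => ({ text, start: null, end: null }));

    // Matched words inherit the timing of the AI-generated token.
    for (const [tokenIndex, truthIndex] of pairs) {
        result[truthIndex].start = segment.tokens[tokenIndex].start;
        result[truthIndex].end = segment.tokens[tokenIndex].end;
    }

    // Each run of unmatched words shares the gap between its neighbours evenly.
    let k = 0;
    while (k < result.length) {
        if (result[k].start !== null) {
            k++;
            continue;
        }
        let runEnd = k;
        while (runEnd < result.length && result[runEnd].start === null) runEnd++;
        const left = k > 0 ? result[k - 1].end : segment.start;
        const right = runEnd < result.length ? result[runEnd].start : segment.end;
        const step = (right - left) / (runEnd - k);
        for (let m = k; m < runEnd; m++) {
            result[m].start = left + (m - k) * step;
            result[m].end = left + (m - k + 1) * step;
        }
        k = runEnd;
    }
    return result;
}

// Usage with made-up timestamps: one second per AI-generated word.
const aiText = 'The Buick crown flock jumps right over the crazy dog.';
const demoSegment = {
    start: 0,
    end: 10,
    tokens: aiText.split(' ').map((text, i) => ({ text, start: i, end: i + 1 })),
};
console.log(alignToGroundTruth(demoSegment, 'The quick brown fox jumps right over the lazy dog.'));
```

In this example "The", "jumps", "right", "over", "the", and "dog." are matched and keep their timestamps, while "quick", "brown", "fox", and "lazy" receive interpolated timings from the gaps left by the misrecognized words.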