Ground Truth Alignment

Ground truth alignment functions use a Longest Common Subsequence (LCS) algorithm to synchronize AI-generated tokens with human-edited transcripts, so that corrected text keeps accurate timing. This section also covers mergeSegments and splitSegment, which combine and divide segments.
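The core matching step can be illustrated with a textbook dynamic-programming LCS over word lists. The sketch below is not the paragrafs source: the `lcsWordMatches` and `normalizeWord` helpers, the ASCII-only case and punctuation folding, and the tie-breaking when several longest subsequences exist are all assumptions. Each returned pair `[aiIndex, truthIndex]` marks a word present in both sequences, which is the kind of anchor whose original timestamps can be kept.

```typescript
// Hypothetical normalization (an assumption, not documented paragrafs behavior):
// lowercase and keep only ASCII letters, digits, and apostrophes.
function normalizeWord(word: string): string {
  return word.toLowerCase().replace(/[^a-z0-9']/g, "");
}

// Hypothetical helper: dynamic-programming LCS over two word arrays.
// Returns [aiIndex, truthIndex] pairs for words shared by both sequences, in order.
function lcsWordMatches(aiWords: string[], truthWords: string[]): Array<[number, number]> {
  const a = aiWords.map(normalizeWord);
  const b = truthWords.map(normalizeWord);
  const n = a.length;
  const m = b.length;

  // dp[i][j] holds the LCS length of the suffixes a[i..] and b[j..].
  const dp: number[][] = [];
  for (let i = 0; i <= n; i++) {
    const row: number[] = [];
    for (let j = 0; j <= m; j++) row.push(0);
    dp.push(row);
  }
  for (let i = n - 1; i >= 0; i--) {
    for (let j = m - 1; j >= 0; j--) {
      dp[i][j] = a[i] === b[j] ? dp[i + 1][j + 1] + 1 : Math.max(dp[i + 1][j], dp[i][j + 1]);
    }
  }

  // Walk the table forward to recover one longest set of matches (the anchors).
  const matches: Array<[number, number]> = [];
  let i = 0;
  let j = 0;
  while (i < n && j < m) {
    if (a[i] === b[j]) {
      matches.push([i, j]);
      i++;
      j++;
    } else if (dp[i + 1][j] >= dp[i][j + 1]) {
      i++; // skipping this AI word does not shorten the LCS
    } else {
      j++; // skipping this ground truth word does not shorten the LCS
    }
  }
  return matches;
}

const aiWords = ["The", "Buick", "crown", "flock", "jumps"];
const truthWords = "The quick brown fox jumps".split(" ");
console.log(lcsWordMatches(aiWords, truthWords)); // [ [ 0, 0 ], [ 4, 4 ] ]
```

For the Buick example used on this page, only `The` and `jumps` are anchors; the three words between them need interpolated timing.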

updateSegmentWithGroundTruth

Aligns AI-generated tokens to human-edited ground truth text for a segment. It uses a Longest Common Subsequence (LCS) to find anchor words that appear in both the tokenized output and the ground truth, and keeps their timestamps. For ground truth words with no match, it creates new tokens with interpolated timestamps.
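The interpolation step can be pictured as spreading unmatched ground truth words across the gap between two anchors. The sketch below assumes an even split by word count, which reproduces the timings in the example further down, but the library's actual weighting is not documented here; the `interpolateGap` helper and the `GroundedToken` shape are hypothetical.

```typescript
// Hypothetical token shape, inferred from the examples on this page.
interface GroundedToken {
  start: number;
  end: number;
  text: string;
  isUnknown?: boolean;
}

// Hypothetical helper: give each word an equal share of the gap between
// two anchors (gapStart..gapEnd) and flag it as unknown. Even spacing is an assumption.
function interpolateGap(words: string[], gapStart: number, gapEnd: number): GroundedToken[] {
  if (words.length === 0) return [];
  const step = (gapEnd - gapStart) / words.length;
  return words.map((text, k) => ({
    start: gapStart + k * step,
    // Pin the last word to gapEnd so floating-point error cannot leave a sliver.
    end: k === words.length - 1 ? gapEnd : gapStart + (k + 1) * step,
    text: text,
    isUnknown: true,
  }));
}

// "The" ends at 1 and "jumps" starts at 7, so three words share the 6-second gap.
const gapTokens = interpolateGap(["quick", "brown", "fox"], 1, 7);
console.log(gapTokens.map((t) => t.text + " " + t.start + "-" + t.end)); // [ 'quick 1-3', 'brown 3-5', 'fox 5-7' ]
```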

function updateSegmentWithGroundTruth(
  segment: Segment,
  groundTruth: string
): GroundedSegment

Parameters

segment

Segment

required

A Segment object containing the AI-generated text and tokens to align

groundTruth

string

required

The ground truth text to apply to the segment’s text and its tokens

Returns

groundedSegment

GroundedSegment

A new GroundedSegment whose tokens match the ground truth text. Tokens for words that could not be matched have interpolated timestamps and are flagged with isUnknown: true.
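Because unmatched words carry `isUnknown: true`, you can collect them for human review or render them differently. A minimal sketch, assuming a GroundedSegment has a `tokens` array whose entries may carry an optional `isUnknown` flag, as the example output suggests; the `unknownWords` helper and the `GroundedSegmentLike` type are hypothetical, not part of paragrafs.

```typescript
// Assumed shapes, inferred from the example output on this page; not the library's exported types.
interface GroundedToken {
  start: number;
  end: number;
  text: string;
  isUnknown?: boolean;
}

interface GroundedSegmentLike {
  start: number;
  end: number;
  text: string;
  tokens: GroundedToken[];
}

// Hypothetical helper: list the words whose timing was interpolated rather than matched.
function unknownWords(segment: GroundedSegmentLike): string[] {
  return segment.tokens.filter((t) => t.isUnknown === true).map((t) => t.text);
}

const aligned: GroundedSegmentLike = {
  start: 0,
  end: 10,
  text: "The quick brown fox jumps",
  tokens: [
    { start: 0, end: 1, text: "The" },
    { start: 1, end: 3, text: "quick", isUnknown: true },
    { start: 3, end: 5, text: "brown", isUnknown: true },
    { start: 5, end: 7, text: "fox", isUnknown: true },
    { start: 7, end: 10, text: "jumps" },
  ],
};

console.log(unknownWords(aligned)); // [ 'quick', 'brown', 'fox' ]
```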

Example

import { updateSegmentWithGroundTruth } from 'paragrafs';

const segment = {
  start: 0,
  end: 10,
  text: 'The Buick crown flock jumps',
  tokens: [
    { start: 0, end: 1, text: 'The' },
    { start: 1, end: 3, text: 'Buick' },
    { start: 3, end: 5, text: 'crown' },
    { start: 5, end: 7, text: 'flock' },
    { start: 7, end: 10, text: 'jumps' }
  ]
};

// Correct the AI mistakes
const aligned = updateSegmentWithGroundTruth(
  segment,
  'The quick brown fox jumps'
);

console.log(aligned.tokens);
// [
//   { start: 0, end: 1, text: 'The' },                     // matched
//   { start: 1, end: 3, text: 'quick', isUnknown: true },  // interpolated
//   { start: 3, end: 5, text: 'brown', isUnknown: true },  // interpolated
//   { start: 5, end: 7, text: 'fox', isUnknown: true },    // interpolated
//   { start: 7, end: 10, text: 'jumps' }                   // matched
// ]

applyGroundTruthToSegment

Returns a segment whose text and tokens are replaced by the ground truth. This is a convenience wrapper around updateSegmentWithGroundTruth that filters out tokens flagged isUnknown.

function applyGroundTruthToSegment(
  segment: Segment,
  groundTruth: string
): Segment

Parameters

segment

Segment

required

The segment whose text and tokens will be replaced with the ground truth

groundTruth

string

required

The human-verified transcription of the segment

Returns

segment

Segment

A segment with the ground truth applied to the segment text and its tokens (unknown tokens filtered out)
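Since applyGroundTruthToSegment is described as updateSegmentWithGroundTruth followed by filtering, that filtering step can be sketched as below. This is an illustration under assumptions: `dropUnknownTokens` and `GroundedSegmentLike` are hypothetical names, and whether the library keeps the original start and end or strips the flag exactly this way is not documented here.

```typescript
// Assumed shapes, inferred from the examples on this page.
interface Token {
  start: number;
  end: number;
  text: string;
}

interface GroundedToken extends Token {
  isUnknown?: boolean;
}

interface Segment {
  start: number;
  end: number;
  text: string;
  tokens: Token[];
}

interface GroundedSegmentLike {
  start: number;
  end: number;
  text: string;
  tokens: GroundedToken[];
}

// Hypothetical helper: keep the ground truth text, drop tokens flagged isUnknown,
// and copy the remaining tokens without the flag so the result is a plain Segment.
function dropUnknownTokens(grounded: GroundedSegmentLike): Segment {
  const tokens: Token[] = grounded.tokens
    .filter((t) => !t.isUnknown)
    .map((t) => ({ start: t.start, end: t.end, text: t.text }));
  return { start: grounded.start, end: grounded.end, text: grounded.text, tokens: tokens };
}

// A hand-written grounded segment shaped like the updateSegmentWithGroundTruth example output.
const grounded: GroundedSegmentLike = {
  start: 0,
  end: 10,
  text: "The quick brown fox jumps",
  tokens: [
    { start: 0, end: 1, text: "The" },
    { start: 1, end: 3, text: "quick", isUnknown: true },
    { start: 3, end: 5, text: "brown", isUnknown: true },
    { start: 5, end: 7, text: "fox", isUnknown: true },
    { start: 7, end: 10, text: "jumps" },
  ],
};

const plain = dropUnknownTokens(grounded);
console.log(plain.text); // The quick brown fox jumps
console.log(plain.tokens.map((t) => t.text)); // [ 'The', 'jumps' ]
```

Note that the text and the token list can disagree after filtering: the text holds every corrected word, while only anchored words keep tokens.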

Example

import { applyGroundTruthToSegment } from 'paragrafs';

const segment = {
  start: 0,
  end: 10,
  text: 'The Buick crown flock',
  tokens: [
    { start: 0, end: 2.5, text: 'The' },
    { start: 2.5, end: 5, text: 'Buick' },
    { start: 5, end: 7.5, text: 'crown' },
    { start: 7.5, end: 10, text: 'flock' }
  ]
};

const corrected = applyGroundTruthToSegment(
  segment,
  'The quick brown fox'
);

console.log(corrected.text);
// 'The quick brown fox'

console.log(corrected.tokens);
// Only matched tokens are included (interpolated tokens are filtered out)

mergeSegments

Merges multiple segments into a single segment. Useful for combining sequential segments into one continuous block.

function mergeSegments(
  segments: Segment[],
  delimiter?: string
): Segment

Parameters

segments

Segment[]

required

Array of segments to merge into one

delimiter

string

default: " "

Optional string used to join segment texts (defaults to a single space)

Returns

merged

Segment

A single segment containing all tokens in order, starting at the first segment's start and ending at the last segment's end
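The merge rules above (texts joined by the delimiter, tokens concatenated, timing from the first segment's start to the last segment's end) can be sketched as follows. `mergeSegmentsSketch` is a hypothetical stand-in, not the library source; it assumes the input is non-empty and already in time order, and the library's handling of an empty array is not documented here.

```typescript
// Assumed shapes, inferred from the examples on this page.
interface Token {
  start: number;
  end: number;
  text: string;
}

interface Segment {
  start: number;
  end: number;
  text: string;
  tokens: Token[];
}

// Hypothetical re-implementation for illustration only.
function mergeSegmentsSketch(segments: Segment[], delimiter: string = " "): Segment {
  if (segments.length === 0) {
    throw new Error("mergeSegmentsSketch: expected at least one segment");
  }
  return {
    start: segments[0].start,
    end: segments[segments.length - 1].end,
    text: segments.map((s) => s.text).join(delimiter),
    // Concatenate token arrays in input order.
    tokens: segments.reduce<Token[]>((acc, s) => acc.concat(s.tokens), []),
  };
}

const parts: Segment[] = [
  {
    start: 0,
    end: 5,
    text: "Hello world",
    tokens: [
      { start: 0, end: 2, text: "Hello" },
      { start: 2, end: 5, text: "world" },
    ],
  },
  {
    start: 5,
    end: 10,
    text: "How are you",
    tokens: [
      { start: 5, end: 6, text: "How" },
      { start: 6, end: 8, text: "are" },
      { start: 8, end: 10, text: "you" },
    ],
  },
];

const merged = mergeSegmentsSketch(parts);
console.log(merged.text); // Hello world How are you
console.log(merged.start, merged.end, merged.tokens.length); // 0 10 5
```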

Example

import { mergeSegments } from 'paragrafs';

const segments = [
  {
    start: 0,
    end: 5,
    text: 'Hello world',
    tokens: [
      { start: 0, end: 2, text: 'Hello' },
      { start: 2, end: 5, text: 'world' }
    ]
  },
  {
    start: 5,
    end: 10,
    text: 'How are you',
    tokens: [
      { start: 5, end: 6, text: 'How' },
      { start: 6, end: 8, text: 'are' },
      { start: 8, end: 10, text: 'you' }
    ]
  }
];

const merged = mergeSegments(segments);
console.log(merged);
// {
//   start: 0,
//   end: 10,
//   text: 'Hello world How are you',
//   tokens: [/* all 5 tokens */]
// }

// Custom delimiter
const mergedNewline = mergeSegments(segments, '\n');
console.log(mergedNewline.text);
// 'Hello world\nHow are you'

splitSegment

Splits a segment at a specific time into exactly two segments. It performs the opposite operation to mergeSegments.

function splitSegment(
  segment: Segment,
  splitTime: number
): Segment[]

Parameters

segment

Segment

required

The segment to split

splitTime

number

required

The time (in seconds) at which to split the segment. Tokens with start < splitTime go to the first segment, others to the second.

Returns

segments

Segment[]

An array containing exactly two segments
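The documented splitting rule can be sketched as below. `splitSegmentSketch` is hypothetical and returns a tuple rather than the documented `Segment[]`; it also assumes the first half ends exactly at splitTime, the second half starts there, and each half's text is its token texts joined with single spaces. Those choices reproduce the example that follows, but the library may derive text and boundaries differently, for instance when the original text contains punctuation the tokens do not.

```typescript
// Assumed shapes, inferred from the examples on this page.
interface Token {
  start: number;
  end: number;
  text: string;
}

interface Segment {
  start: number;
  end: number;
  text: string;
  tokens: Token[];
}

// Hypothetical re-implementation for illustration only.
function splitSegmentSketch(segment: Segment, splitTime: number): [Segment, Segment] {
  const before: Token[] = [];
  const after: Token[] = [];
  for (const token of segment.tokens) {
    // Documented rule: tokens that start before splitTime go to the first segment.
    if (token.start < splitTime) {
      before.push(token);
    } else {
      after.push(token);
    }
  }
  // Assumption: rebuild each half's text from its tokens.
  const joinText = (tokens: Token[]): string => tokens.map((t) => t.text).join(" ");
  return [
    { start: segment.start, end: splitTime, text: joinText(before), tokens: before },
    { start: splitTime, end: segment.end, text: joinText(after), tokens: after },
  ];
}

const quickFox: Segment = {
  start: 0,
  end: 10,
  text: "The quick brown fox",
  tokens: [
    { start: 0, end: 2, text: "The" },
    { start: 2, end: 4, text: "quick" },
    { start: 4, end: 7, text: "brown" },
    { start: 7, end: 10, text: "fox" },
  ],
};

const [firstHalf, secondHalf] = splitSegmentSketch(quickFox, 4);
console.log(firstHalf.text, firstHalf.end); // The quick 4
console.log(secondHalf.text, secondHalf.start); // brown fox 4
```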

Example

import { splitSegment } from 'paragrafs';

const segment = {
  start: 0,
  end: 10,
  text: 'The quick brown fox',
  tokens: [
    { start: 0, end: 2, text: 'The' },
    { start: 2, end: 4, text: 'quick' },
    { start: 4, end: 7, text: 'brown' },
    { start: 7, end: 10, text: 'fox' }
  ]
};

const [first, second] = splitSegment(segment, 4);

console.log(first);
// {
//   start: 0,
//   end: 4,
//   text: 'The quick',
//   tokens: [
//     { start: 0, end: 2, text: 'The' },
//     { start: 2, end: 4, text: 'quick' }
//   ]
// }

console.log(second);
// {
//   start: 4,
//   end: 10,
//   text: 'brown fox',
//   tokens: [
//     { start: 4, end: 7, text: 'brown' },
//     { start: 7, end: 10, text: 'fox' }
//   ]
// }
