
Overview

The transcript dataset provides access to high-accuracy transcripts from regular earnings calls and unique events. Each transcript is associated with an event.
Transcripts are only available for events conducted in English.

How it works

Each transcript record in the API response includes a URL pointing to a JSON file hosted on our CDN. Transcripts are released in two steps:
  1. The raw transcript (type ID = 15) is published shortly after the event concludes and includes paragraph breaks.
  2. The edited transcript (type ID = 22) follows once speaker identification is complete, adding speaker names, roles, and company affiliations.
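An event can end up with both a raw and an edited transcript, so a client may want to keep only the most complete version per event. The sketch below prefers type ID 22 over 15. The record keys `event_id`, `type_id`, and `url` are hypothetical stand-ins for whatever field names the API response actually uses.

```python
from typing import Any

# Type IDs from the two-step release process.
RAW_TRANSCRIPT = 15     # published shortly after the event
EDITED_TRANSCRIPT = 22  # adds speaker names, roles, and companies

# Higher value means preferred when both versions exist for an event.
_PREFERENCE = {RAW_TRANSCRIPT: 0, EDITED_TRANSCRIPT: 1}


def best_transcript_per_event(records: list[dict[str, Any]]) -> dict[int, dict[str, Any]]:
    """Return one record per event, preferring the edited transcript (22) over the raw one (15).

    Assumes hypothetical "event_id", "type_id", and "url" keys on each record.
    """
    best: dict[int, dict[str, Any]] = {}
    for record in records:
        type_id = record["type_id"]
        if type_id not in _PREFERENCE:
            continue  # skip transcript types this sketch does not handle
        event_id = record["event_id"]
        current = best.get(event_id)
        if current is None or _PREFERENCE[type_id] > _PREFERENCE[current["type_id"]]:
            best[event_id] = record
    return best
```

Records can arrive in any order; an edited transcript replaces a raw one for the same event, never the other way round.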

Chapters

Transcripts can be divided into structured segments using chapters. Use the chapters endpoint to retrieve hierarchical sections with titles and timestamps for a given transcript.
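With hierarchical chapters, a common task is finding which section a moment in the call belongs to, such as a paragraph's start time. This is a minimal sketch that assumes each chapter object has hypothetical `title`, `start`, `end` (seconds), and optional `children` keys; the real chapters response may be shaped differently. It also treats each chapter as the half-open interval [start, end).

```python
from typing import Any, Optional


def chapter_path(chapters: list[dict[str, Any]], t: float) -> list[str]:
    """Return chapter titles from the top level down to the deepest chapter covering time t.

    Assumes hypothetical "title", "start", "end", and optional "children" keys.
    """
    path: list[str] = []
    level = chapters
    while level:
        # Find the chapter at this level whose [start, end) interval contains t.
        found: Optional[dict[str, Any]] = next(
            (c for c in level if c["start"] <= t < c["end"]), None
        )
        if found is None:
            break
        path.append(found["title"])
        level = found.get("children", [])  # descend into sub-chapters, if any
    return path
```

Passing a paragraph's `start` value as `t` labels that paragraph with its section path; an empty list means no chapter covers that time.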

Speaker identification

The edited transcript (type ID = 22) maps each paragraph to a named speaker with their role and company. The JSON structure is the same as the raw transcript, with an added speaker_mapping array at the top level. Older transcripts will not be retroactively updated with speaker data, so they remain available only as raw transcripts. If your application filters by type ID, include both values (typeIds=15,22) so that events without an edited transcript still return their raw version.
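The `typeIds=15,22` filter can be assembled with Python's standard library, as in this small sketch. The base URL is a hypothetical placeholder because the endpoint path is not given on this page; only the `typeIds` parameter and its comma-separated format come from the text above.

```python
from urllib.parse import urlencode


def transcript_query(base_url: str, **filters: str) -> str:
    """Build a query URL requesting both raw (15) and edited (22) transcripts.

    base_url and any extra filter names are hypothetical placeholders.
    """
    params = {"typeIds": "15,22", **filters}
    # safe="," keeps the comma-separated list as-is instead of encoding it as %2C.
    return f"{base_url}?{urlencode(params, safe=',')}"
```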
It’s not always possible to identify who is speaking. In such cases, the name, role, and company fields may be null. This can happen when:
  • The speaker is not clearly identified in the audio.
  • The speaker is not part of the event’s official roster.
  • We are unable to verify the speaker’s identity or role.
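One way to handle unidentified speakers is to fall back to the paragraph's numeric speaker index when name, role, and company are all null. This sketch assumes each speaker_mapping entry carries a hypothetical `speaker` key matching the paragraph's zero-based speaker index; the entry layout is not shown on this page, so check a real edited transcript before relying on it.

```python
from typing import Any


def speaker_label(transcript_doc: dict[str, Any], speaker_index: int) -> str:
    """Return a display label for a paragraph's speaker index.

    Assumes each speaker_mapping entry has a hypothetical "speaker" key plus
    name/role/company fields that may be null. Raw transcripts (type 15) have
    no speaker_mapping, so they always get the fallback label.
    """
    for entry in transcript_doc.get("speaker_mapping", []):
        if entry.get("speaker") != speaker_index:
            continue
        # Keep only the fields that were actually identified (non-null, non-empty).
        known = [entry.get(key) for key in ("name", "role", "company") if entry.get(key)]
        if known:
            return ", ".join(known)
        break
    # Fall back to the anonymous index when the speaker could not be identified.
    return f"Speaker {speaker_index}"
```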
During high-activity periods like earnings season, some events may be prioritized for speaker attribution based on client interest and market relevance. Speaker data may appear on certain transcripts sooner than others.

Data structure

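Transcripts are structured as a hierarchy: the top-level transcript object contains the full text, the number of speakers, and an array of paragraphs; each paragraph contains sentences, and each sentence contains words. Paragraphs, sentences, and words include start and end timestamps in seconds. Each paragraph also carries a zero-based speaker index, and each word carries a confidence score alongside its raw and punctuated forms.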
{
  "version": "1.0",
  "event_id": 123456,
  "company_id": 123,
  "transcript": {
    "text": "This is the full transcript text",
    "number_of_speakers": 3,
    "paragraphs": [
      {
        "text": "This is the paragraph text",
        "start": 0,
        "end": 10,
        "speaker": 0,
        "sentences": [
          {
            "text": "This is the sentence text",
            "start": 0,
            "end": 5,
            "words": [
              {
                "word": "This",
                "punctuated_word": "This",
                "start": 0,
                "end": 5,
                "confidence": 0.9
              }
            ]
          }
        ]
      }
    ]
  }
}
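Once the file is parsed (for example with `json.load`), the hierarchy can be walked with plain loops. The sketch below uses only the fields shown in the sample above: it sums paragraph durations per speaker index and collects words whose confidence falls below a threshold. The 0.5 default is an arbitrary illustration, not a recommended value.

```python
from collections import defaultdict
from typing import Any


def talk_time_by_speaker(doc: dict[str, Any]) -> dict[int, float]:
    """Sum paragraph durations (end - start, in seconds) per zero-based speaker index."""
    totals: dict[int, float] = defaultdict(float)
    for paragraph in doc["transcript"]["paragraphs"]:
        totals[paragraph["speaker"]] += paragraph["end"] - paragraph["start"]
    return dict(totals)


def low_confidence_words(doc: dict[str, Any], threshold: float = 0.5) -> list[tuple[float, str]]:
    """Walk paragraphs -> sentences -> words and return (start, word) pairs below the threshold."""
    hits: list[tuple[float, str]] = []
    for paragraph in doc["transcript"]["paragraphs"]:
        for sentence in paragraph["sentences"]:
            for word in sentence["words"]:
                if word["confidence"] < threshold:
                    hits.append((word["start"], word["punctuated_word"]))
    return hits
```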

How to access this data

REST API

Query transcripts using company and event filters for full control.
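Each transcript record returned by the API points to a JSON file on the CDN, which has to be downloaded separately. A minimal standard-library sketch follows; it assumes the URL can be fetched with a plain GET and no extra headers, which may not hold for your account. The `opener` parameter exists only so the function can be tested without network access.

```python
import json
import urllib.request
from typing import Any, Callable


def fetch_transcript(
    url: str,
    opener: Callable[..., Any] = urllib.request.urlopen,
    timeout: float = 30.0,
) -> dict[str, Any]:
    """Download and parse the transcript JSON referenced by a record's CDN URL.

    Sends no authentication headers; that is an assumption of this sketch.
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```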

Webhooks

Subscribe to webhooks for real-time updates.

Snowflake

Query the transcripts view directly using SQL.