Overview
The transcript dataset provides access to high-accuracy transcripts from regular earnings calls and unique events. Each transcript is associated with an event. Transcripts are only available for events conducted in English.
How it works
Each transcript record in the API response includes a URL pointing to a JSON file hosted on our CDN. Transcripts are released in two steps:
- Raw transcript (type ID = 15) is published shortly after the event concludes, with paragraph breaks.
- Edited transcript (type ID = 22) follows once speaker identification has been completed, adding speaker names, roles, and company affiliations.
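The two release steps above can be told apart by their type IDs. A minimal sketch, assuming a record shape with `typeId` and `url` keys (illustrative, not the exact schema):

```python
# Distinguish the two transcript release steps by type ID.
# The record shape (keys "typeId" and "url") is an assumption for illustration.
RAW_TRANSCRIPT = 15      # published shortly after the event, paragraph breaks only
EDITED_TRANSCRIPT = 22   # adds speaker names, roles, and company affiliations

def classify(record: dict) -> str:
    """Return which release step a transcript record belongs to."""
    type_id = record.get("typeId")
    if type_id == RAW_TRANSCRIPT:
        return "raw"
    if type_id == EDITED_TRANSCRIPT:
        return "edited"
    return "unknown"

print(classify({"typeId": 15, "url": "https://cdn.example.com/t.json"}))  # raw
```

A consumer that wants speaker data can poll for a second record with type ID 22 after receiving the raw transcript.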
Chapters
Transcripts can be divided into structured segments using chapters. Use the chapters endpoint to retrieve hierarchical sections with titles and timestamps for a given transcript.
Speaker identification
The edited transcript (type ID = 22) maps each paragraph to a named speaker with their role and company. The JSON structure is the same as the raw transcript, with an added speaker_mapping array at the top level. Older transcripts will not be retroactively updated with speaker data.
If your application filters by typeId, include both: typeIds=15,22.
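For instance, a request URL carrying both type IDs might be built like this. The base URL and the `eventId` parameter are hypothetical; the `typeIds` parameter is as documented above:

```python
from urllib.parse import urlencode

# Base URL and "eventId" are placeholders; "typeIds" matches the docs above.
base = "https://api.example.com/v1/transcripts"
params = {"typeIds": "15,22", "eventId": 12345}
url = f"{base}?{urlencode(params)}"
print(url)  # the comma in "15,22" is percent-encoded as %2C
```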
Are speaker fields nullable?
It’s not always possible to identify who is speaking. In such cases, the
name, role, and company fields may be null. This can happen when:
- The speaker is not clearly identified in the audio.
- The speaker is not part of the event’s official roster.
- We are unable to verify the speaker’s identity or role.
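Consumers should therefore treat all three fields as nullable. A minimal sketch of defensive handling, assuming illustrative field names on a speaker_mapping entry:

```python
def speaker_label(entry: dict) -> str:
    """Format a speaker_mapping entry, tolerating null name/role/company.

    Field names are assumptions for illustration; check the actual payload.
    """
    name = entry.get("name") or "Unknown speaker"
    # Keep only the attributes that are present.
    suffix = ", ".join(p for p in (entry.get("role"), entry.get("company")) if p)
    return f"{name} ({suffix})" if suffix else name

print(speaker_label({"name": None, "role": None, "company": None}))
print(speaker_label({"name": "Jane Doe", "role": "CFO", "company": "Acme"}))
```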
When is speaker identification made available?
During high-activity periods such as earnings season, some events may be prioritized for speaker attribution based on client interest and market relevance, so speaker data may appear on certain transcripts sooner than on others.
Data structure
Transcripts are structured as a hierarchy: the top-level transcript object contains the full text and an array of paragraphs, each paragraph contains sentences, and each sentence contains words. Every level includes start/end timestamps in seconds and a zero-based speaker index.
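The hierarchy above can be walked level by level. A sketch under assumed key names (`paragraphs`, `sentences`, `words`, `startTime`, `endTime`, `speaker`); verify them against the actual JSON payload:

```python
# Walk the paragraph -> sentence -> word hierarchy described above.
# All key names here are assumptions for illustration.
transcript = {
    "paragraphs": [
        {
            "speaker": 0,
            "startTime": 0.0,
            "endTime": 1.0,
            "sentences": [
                {
                    "startTime": 0.0,
                    "endTime": 1.0,
                    "words": [
                        {"text": "Good", "startTime": 0.0, "endTime": 0.4},
                        {"text": "morning.", "startTime": 0.5, "endTime": 1.0},
                    ],
                }
            ],
        }
    ]
}

for para in transcript["paragraphs"]:
    # Reassemble paragraph text from the word level.
    text = " ".join(w["text"] for s in para["sentences"] for w in s["words"])
    print(f"[{para['startTime']:.1f}s] speaker {para['speaker']}: {text}")
```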
How to access this data
REST API
Query transcripts using company and event filters for full control.
Webhooks
Subscribe to webhooks for real-time updates.
Snowflake
Query the transcripts view directly using SQL.

