Releases: hczhu/TickerTick-API
TickerTick stock news dataset 2025-05-26
Use the following link to download the dataset: Dataset Download Link
The dataset contains close to 16 million news stories. The dataset file stores one news story per line as a JSON object, in reverse chronological order (newest first). An example news story, prettified into multi-line JSON, is shown below:

    {
      "title": "Tech giants Nvidia, OpenAI and others join forces for massive UAE Stargate AI data center",
      "url": "https://qz.com/american-tech-partners-with-uae-for-new-ai-data-center-1851781991",
      "unix_timestamp": 1747936200,
      "id": "-3744939139222479336",
      "tickers_direct": [
        ".openai",
        "orcl",
        "nvda"
      ],
      "tickers_indirect": [
        "csco"
      ],
      "description": "A group of global tech giants gathered in Abu Dhabi to pose for a photo as anAI supergroup, including OpenAI's Sam Altman, Oracle's (ORCL) Larry Ellison, Nvidia's (NVDA) Jensen Huang, and Chuck Robbins of Cisco (CSCO), along with their new UAE partners. Read more..."
    }

The fields of the JSON blob are explained below. Most of the fields have the same semantics as the corresponding fields in the response of the TickerTick API.

| Field name | Meaning | Optional field? (If yes, this field can be missing) |
|---|---|---|
| title | The title of this news story | No |
| url | The original URL for the full news story | No |
| unix_timestamp | The UNIX timestamp when the news was reported | No |
| id | A unique string ID of this news story | No |
| description | A short description of this news story | Yes |
| tickers_direct | The tickers that the news story is directly about, e.g., the name of the company for the ticker is mentioned | Yes |
| tickers_indirect | The tickers that the news story is indirectly about, e.g., the CEO or a product of the company for this ticker is mentioned | Yes |
Note that many well-known pre-IPO startups (e.g., ByteDance, the parent company of TikTok) are assigned made-up tickers such as .openai and .databricks (note the leading dot).
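
As a minimal sketch of how one line of the dataset file could be parsed according to the field table above, the Python snippet below maps a JSON line onto a small record type. The names `Story`, `parse_story`, and `reported_at` are hypothetical and not part of the dataset or the TickerTick API. Two assumptions are made here: optional fields that are missing default to `None` or an empty list, and `unix_timestamp` is in seconds and is interpreted as UTC.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Story:
    """One news story; a hypothetical container mirroring the field table."""
    title: str
    url: str
    unix_timestamp: int
    id: str
    # Optional fields may be missing from a record, so they get defaults.
    description: Optional[str] = None
    tickers_direct: List[str] = field(default_factory=list)
    tickers_indirect: List[str] = field(default_factory=list)

    @property
    def reported_at(self) -> datetime:
        # Assumption: the timestamp is in seconds; it is read as UTC.
        return datetime.fromtimestamp(self.unix_timestamp, tz=timezone.utc)


def parse_story(line: str) -> Story:
    """Parse a single line of the dataset file into a Story."""
    record = json.loads(line)
    return Story(
        title=record["title"],
        url=record["url"],
        unix_timestamp=int(record["unix_timestamp"]),
        id=record["id"],
        description=record.get("description"),
        tickers_direct=record.get("tickers_direct", []),
        tickers_indirect=record.get("tickers_indirect", []),
    )
```

Note that `id` is kept as a string, as in the example record, because it can be negative and too large for some numeric types in other tools.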
TickerTick stock news dataset 2023-11-23
Use the following link to download the dataset: Dataset Download Link
The dataset contains close to 8 million news stories. The dataset file stores one news story per line as a JSON object, in reverse chronological order (newest first). An example news story, prettified into multi-line JSON, is shown below:

    {
      "title": "Europe gives Meta, TikTok six days to share information on response to Israel-Hamas conflict",
      "url": "https://www.cnbc.com/2023/10/19/israel-hamas-eu-gives-meta-tiktok-six-days-to-provide-information.html",
      "unix_timestamp": 1697727889,
      "id": "3341850707742811898",
      "tickers_direct": [
        "meta",
        "fb"
      ],
      "tickers_indirect": [
        ".bytedance"
      ],
      "description": "The EU said it would like Meta and TikTok to hand over information on how they're tackling misinformation about the Israel-Hamas war."
    }

The fields of the JSON blob are explained below. Most of the fields have the same semantics as the corresponding fields in the response of the TickerTick API.

| Field name | Meaning | Optional field? (If yes, this field can be missing) |
|---|---|---|
| title | The title of this news story | No |
| url | The original URL for the full news story | No |
| unix_timestamp | The UNIX timestamp when the news was reported | No |
| id | A unique string ID of this news story | No |
| description | A short description of this news story | Yes |
| tickers_direct | The tickers that the news story is directly about, e.g., the name of the company for the ticker is mentioned | Yes |
| tickers_indirect | The tickers that the news story is indirectly about, e.g., the CEO or a product of the company for this ticker is mentioned | Yes |
Note that many well-known pre-IPO startups (e.g., ByteDance, the parent company of TikTok) are assigned made-up tickers such as .bytedance and .databricks (note the leading dot).
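
Because the file holds one JSON object per line, it can be scanned line by line without loading millions of stories into memory. The sketch below filters stories by ticker. The function names `stories_about` and `is_private_ticker` are hypothetical. It assumes that tickers are stored in lowercase, as in the example record, and that a leading dot marks a made-up ticker for a pre-IPO company, as the examples above suggest.

```python
import json
from typing import Iterable, Iterator


def is_private_ticker(ticker: str) -> bool:
    """Assumption: made-up tickers for pre-IPO companies start with a dot."""
    return ticker.startswith(".")


def stories_about(
    lines: Iterable[str], ticker: str, include_indirect: bool = False
) -> Iterator[dict]:
    """Yield the stories whose ticker lists contain `ticker`.

    `lines` can be any iterable of JSON lines, e.g. an open file object.
    """
    wanted = ticker.lower()  # assumption: tickers are stored in lowercase
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        story = json.loads(line)
        # Both ticker fields are optional, so fall back to empty lists.
        tickers = list(story.get("tickers_direct", []))
        if include_indirect:
            tickers += story.get("tickers_indirect", [])
        if wanted in tickers:
            yield story
```

For example, with the downloaded file opened as `f` (the local file name depends on where the download was saved), `stories_about(f, ".bytedance", include_indirect=True)` would yield the story shown above, since `.bytedance` appears in its `tickers_indirect` list. Since the file is in reverse chronological order, the matching stories are yielded newest first.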