Merged
53 changes: 28 additions & 25 deletions docs/references/http_api_reference.md
@@ -419,17 +419,11 @@ Creates a dataset.
- `"embedding_model"`: `string`
- `"permission"`: `string`
- `"chunk_method"`: `string`
- "parser_config": `object`
- "parse_type": `int`
- "pipeline_id": `string`

Note: Choose exactly one ingestion mode when creating a dataset.
- Chunking method: provide `"chunk_method"` (optionally with `"parser_config"`).
- Ingestion pipeline: provide both `"parse_type"` and `"pipeline_id"` and do not provide `"chunk_method"`.
- `"parser_config"`: `object`
- `"parse_type"`: `int`
- `"pipeline_id"`: `string`

These options are mutually exclusive. If all three of `chunk_method`, `parse_type`, and `pipeline_id` are omitted, the system defaults to `chunk_method = "naive"`.

##### Request example
##### A basic request example

```bash
curl --request POST \
@@ -441,9 +435,11 @@ curl --request POST \
}'
```

##### Request example (ingestion pipeline)
##### A request example specifying ingestion pipeline

Use this form when specifying an ingestion pipeline (do not include `chunk_method`).
:::caution WARNING
You must *not* include `"chunk_method"` or `"parser_config"` when specifying an ingestion pipeline.
:::

```bash
curl --request POST \
@@ -452,15 +448,11 @@ curl --request POST \
--header 'Authorization: Bearer <YOUR_API_KEY>' \
--data '{
"name": "test-sdk",
"parse_type": <NUMBER_OF_FORMATS_IN_PARSE>,
"parse_type": <NUMBER_OF_PARSERS_IN_YOUR_PARSER_COMPONENT>,
"pipeline_id": "<PIPELINE_ID_32_HEX>"
}'
```

Notes:
- `parse_type` is an integer. Replace `<NUMBER_OF_FORMATS_IN_PARSE>` with your pipeline's parse-type value.
- `pipeline_id` must be a 32-character lowercase hexadecimal string.

##### Request parameters

- `"name"`: (*Body parameter*), `string`, *Required*
@@ -488,7 +480,8 @@ Notes:
- `"team"`: All team members can manage the dataset.

- `"chunk_method"`: (*Body parameter*), `enum<string>`
The chunking method of the dataset to create. Available options:
The default chunk method of the dataset to create. Mutually exclusive with `"parse_type"` and `"pipeline_id"`. If you set `"chunk_method"`, do not include `"parse_type"` or `"pipeline_id"`.
Available options:
- `"naive"`: General (default)
- `"book"`: Book
- `"email"`: Email
@@ -501,7 +494,6 @@ Notes:
- `"qa"`: Q&A
- `"table"`: Table
- `"tag"`: Tag
- Mutually exclusive with `parse_type` and `pipeline_id`. If you set `chunk_method`, do not include `parse_type` or `pipeline_id`.

- `"parser_config"`: (*Body parameter*), `object`
The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
@@ -538,14 +530,25 @@ Notes:
- Defaults to: `{"use_raptor": false}`.
- If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.

- "parse_type": (*Body parameter*), `int`
The ingestion pipeline parse type identifier. Required if and only if you are using an ingestion pipeline (together with `"pipeline_id"`). Must not be provided when `"chunk_method"` is set.
- `"parse_type"`: (*Body parameter*), `int`
The ingestion pipeline parse type identifier, i.e., the number of parsers in your **Parser** component.
- Required (along with `"pipeline_id"`) if specifying an ingestion pipeline.
- Must not be included when `"chunk_method"` is specified.

- `"pipeline_id"`: (*Body parameter*), `string`
The ingestion pipeline ID. Can be found in the corresponding URL in the RAGFlow UI.
- Required (along with `"parse_type"`) if specifying an ingestion pipeline.
- Must be a 32-character lowercase hexadecimal string, e.g., `"d0bebe30ae2211f0970942010a8e0005"`.
- Must not be included when `"chunk_method"` is specified.
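
The `"pipeline_id"` format rule above (32 lowercase hexadecimal characters) can be checked on the client before sending a request. The following is a minimal sketch, not part of RAGFlow: `is_valid_pipeline_id` is a hypothetical helper name, and it assumes Bash rather than plain POSIX `sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (an assumption for illustration, not a RAGFlow command).
# Succeeds only if the argument is exactly 32 lowercase hexadecimal characters.
is_valid_pipeline_id() {
  # The character class is spelled out so the match does not depend on locale.
  [[ "$1" =~ ^[0123456789abcdef]{32}$ ]]
}

# The sample ID from the parameter description passes the check.
is_valid_pipeline_id "d0bebe30ae2211f0970942010a8e0005" && echo "valid"      # prints "valid"
# Uppercase hex digits are rejected.
is_valid_pipeline_id "D0BEBE30AE2211F0970942010A8E0005" || echo "invalid"    # prints "invalid"
```

This check only covers the format. Whether the ID refers to an existing pipeline can only be confirmed by the server.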

- "pipeline_id": (*Body parameter*), `string`
The ingestion pipeline ID. Required if and only if you are using an ingestion pipeline (together with `"parse_type"`).
- Must not be provided when `"chunk_method"` is set.
:::caution WARNING
You can choose either of the following ingestion options when creating a dataset, but *not* both:

Note: If none of `chunk_method`, `parse_type`, and `pipeline_id` are provided, the system will default to `chunk_method = "naive"`.
- Use a built-in chunk method -- specify `"chunk_method"` (optionally with `"parser_config"`).
- Use an ingestion pipeline -- specify both `"parse_type"` and `"pipeline_id"`.

If none of `"chunk_method"`, `"parse_type"`, or `"pipeline_id"` are provided, the system defaults to `chunk_method = "naive"`.
:::
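
The mutual-exclusivity rule in the warning above can be sketched as a small client-side check. This is an illustration under stated assumptions, not RAGFlow's own validation logic: `ingestion_mode` is a hypothetical function name, an empty string stands for an omitted field, and the labels it prints (`chunk_method`, `pipeline`, `default`) are invented for this example.

```bash
#!/usr/bin/env bash
# Hypothetical client-side check (an assumption for illustration).
# Arguments: chunk_method, parse_type, pipeline_id; pass "" for an omitted field.
# Prints the ingestion option the request would select, or returns non-zero
# when the combination is not allowed.
ingestion_mode() {
  local chunk_method="$1" parse_type="$2" pipeline_id="$3"
  if [[ -n "$chunk_method" ]]; then
    # A built-in chunk method excludes both pipeline fields.
    [[ -z "$parse_type" && -z "$pipeline_id" ]] || return 1
    echo "chunk_method"
  elif [[ -n "$parse_type" && -n "$pipeline_id" ]]; then
    # An ingestion pipeline needs both fields together.
    echo "pipeline"
  elif [[ -z "$parse_type" && -z "$pipeline_id" ]]; then
    # Nothing specified: the system defaults to chunk_method = "naive".
    echo "default"
  else
    # Only one of parse_type / pipeline_id was given.
    return 1
  fi
}

ingestion_mode "naive" "" ""                               # prints "chunk_method"
ingestion_mode "" "2" "d0bebe30ae2211f0970942010a8e0005"   # prints "pipeline"
ingestion_mode "naive" "2" "" || echo "rejected"           # prints "rejected"
```

The sketch does not inspect `"parser_config"`, which also must be left out when an ingestion pipeline is specified.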

#### Response
