Workflows
Workflows dashboard
To view the workflows dashboard, on the sidebar, click Workflows.
A workflow in the Unstructured Platform is a defined sequence of processes that automates data handling from source to destination. It lets you configure how and when data is ingested, processed, and stored.
Workflows give you a systematic, repeatable way to manage data flows within the platform, so that data is processed consistently, efficiently, and according to your specific processing requirements.
Create a workflow
You must first have an existing source connector and destination connector to add to the workflow.
If you do not yet have a connector for your source (input) location or your destination (output) location, create the missing source connector or destination connector first, and then return here.
To see your existing connectors, on the sidebar, click Sources or Destinations.
To create a workflow:
- On the sidebar, click Workflows.
- Click New Workflow.
- Enter a unique Name for this workflow.
- In the Connectors section, in the Sources dropdown list, select your source location.
- In the Destination dropdown list, select your destination location.
  You can select multiple source and destination locations. Files will be ingested from all of the selected source locations, and the processed data will be delivered to all of the selected destination locations.
- In the Workflow Settings section, choose one of these predefined workflow settings groups:
  - Basic is a good choice if you have text-only documents that have no images or tables in them.
  - Advanced is a good choice if you have complex documents that have images or tables or both in them.
  Learn about the predefined settings for Basic and Advanced.
  If neither the Basic nor Advanced predefined settings meet your needs, click Custom to define different settings. If Custom is not available, click Request Access, and wait for Unstructured to enable it. Learn how to define Custom workflow settings.
- If you want to run this workflow on a regular basis, in the Schedule section, select one of the time periods in the Schedule Type list:
  - Monthly: This workflow runs automatically once each month. Choose the day of the month and the time on that day to run this workflow.
  - Daily: This workflow runs automatically once a day on one or more days of each week. Choose the days of the week and the time on each of those days to run this workflow.
  - Hourly: This workflow runs automatically once each hour. Choose the minute of the hour to run this workflow.
  - Frequently: This workflow runs automatically at a fixed interval measured in minutes. Choose the number of minutes to wait after each run before running this workflow again.
- Click Save.
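To make the four schedule types concrete, here is a minimal Python sketch that computes when a workflow would next run under each one. The Schedule class, its field names, and next_run are hypothetical illustrations, not part of the Unstructured Platform. The sketch also assumes that a Monthly day that does not exist in a given month (for example, day 31 in April) causes that month to be skipped, which may not match how the platform actually behaves, and it ignores time zones.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Schedule:
    """Hypothetical model of the Schedule section (field names are illustrative)."""
    schedule_type: str                          # "monthly", "daily", "hourly", or "frequently"
    day_of_month: int = 1                       # Monthly: day of the month (1-31)
    weekdays: frozenset = frozenset(range(7))   # Daily: Monday=0 ... Sunday=6
    hour: int = 0                               # Monthly and Daily: hour of the run
    minute: int = 0                             # Monthly, Daily, and Hourly: minute of the run
    every_minutes: int = 60                     # Frequently: minutes between runs


def next_run(s: Schedule, after: datetime) -> datetime:
    """Return the next run time strictly after `after`.

    For "frequently", `after` is treated as the time of the previous run.
    """
    if s.schedule_type == "frequently":
        return after + timedelta(minutes=s.every_minutes)

    if s.schedule_type == "hourly":
        candidate = after.replace(minute=s.minute, second=0, microsecond=0)
        if candidate <= after:
            candidate += timedelta(hours=1)
        return candidate

    if s.schedule_type == "daily":
        if not s.weekdays:
            raise ValueError("daily schedules need at least one weekday")
        candidate = after.replace(hour=s.hour, minute=s.minute, second=0, microsecond=0)
        # Step forward one day at a time until the slot is in the future
        # and falls on one of the selected weekdays.
        while candidate <= after or candidate.weekday() not in s.weekdays:
            candidate += timedelta(days=1)
        return candidate

    if s.schedule_type == "monthly":
        if not 1 <= s.day_of_month <= 31:
            raise ValueError("day_of_month must be between 1 and 31")
        year, month = after.year, after.month
        while True:
            try:
                candidate = after.replace(year=year, month=month, day=s.day_of_month,
                                          hour=s.hour, minute=s.minute,
                                          second=0, microsecond=0)
            except ValueError:
                candidate = None  # Assumption: skip months that lack this day
            if candidate is not None and candidate > after:
                return candidate
            year, month = (year + 1, 1) if month == 12 else (year, month + 1)

    raise ValueError(f"unknown schedule type: {s.schedule_type!r}")
```

For example, an Hourly schedule set to minute 15, evaluated at 10:20, next runs at 11:15 on the same day.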
Basic workflow settings
To apply the following predefined basic workflow settings, in the Workflow Settings section of a workflow, click Basic.
To learn more about these settings, see the descriptions for Custom workflow settings.
- Transform section:
  - Strategy: Fast
  - Image summarization: None
  - Table summarization: None
  - Connector Settings:
    - Include Page Breaks: No (unchecked)
    - Infer Table Structure: Yes (checked)
  - Elements to Exclude: None (nothing selected)
- Chunk section:
  - Chunker Type: Basic
  - Include Original Elements: No (unchecked)
  - Max Characters: 2048
  - New After N Characters: 1500
  - Overlap: 160
  - Overlap All: No (unchecked)
- Embed section:
  - Vendor: OpenAI
  - Embedding Model: text-embedding-3-small (1536 dimensions)
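The Chunk section values above interact in a specific way. The following Python sketch is a simplified illustration of basic chunking, modeled on the behavior documented for the open-source unstructured library rather than on the platform's own implementation: elements are combined in order until adding the next one would exceed Max Characters (a hard limit), a new chunk starts once the current one reaches New After N Characters (a soft limit), and only an element too large for a single chunk is split, with Overlap characters repeated between its pieces. The function name chunk_basic, the blank-line separator between elements, and splitting at raw character positions (rather than at word boundaries) are assumptions of this sketch. Overlap All, which would extend the overlap to every chunk, is not modeled.

```python
def chunk_basic(elements, max_characters=2048, new_after_n_chars=1500, overlap=160):
    """Simplified, hypothetical illustration of basic chunking over element texts.

    Defaults mirror the Basic preset's Max Characters, New After N Characters,
    and Overlap values.
    """
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be smaller than max_characters")
    chunks = []
    current = ""
    for text in elements:
        if len(text) > max_characters:
            # Oversized element: close the current chunk, then split the element
            # into windows of max_characters that share `overlap` characters.
            if current:
                chunks.append(current)
                current = ""
            step = max_characters - overlap
            for start in range(0, len(text), step):
                chunks.append(text[start:start + max_characters])
                if start + max_characters >= len(text):
                    break
            continue
        candidate = f"{current}\n\n{text}" if current else text
        # Start a new chunk if adding this element would break the hard limit,
        # or if the current chunk has already reached the soft limit.
        if current and (len(candidate) > max_characters or len(current) >= new_after_n_chars):
            chunks.append(current)
            candidate = text
        current = candidate
    if current:
        chunks.append(current)
    return chunks
```

With these defaults, two 1,000-character paragraphs fit together in one 2,002-character chunk, while a single 5,000-character element is split into three windows of 2,048, 2,048, and 1,224 characters.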
Advanced workflow settings
To apply the following predefined advanced workflow settings, in the Workflow Settings section of a workflow, click Advanced.
To learn more about these settings, see the descriptions for Custom workflow settings.
- Transform section:
  - Strategy: Hi Res
  - Image summarization: Claude 3.5 Sonnet
  - Table summarization: GPT-4o
  - Connector Settings:
    - Include Page Breaks: No (unchecked)
    - Infer Table Structure: No (unchecked)
  - Elements to Exclude: None (nothing selected)
- Chunk section:
  - Chunker Type: Chunk By Title
  - Combine Text Under N Characters: 0
  - Include Original Elements: No (unchecked)
  - Max Characters: 2048
  - Multipage Sections: Yes (checked)
  - New After N Characters: 1500
  - Overlap: 160
  - Overlap All: No (unchecked)
- Embed section:
  - Vendor: OpenAI
  - Embedding Model: text-embedding-3-large (3072 dimensions)
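Chunk By Title differs from Basic chunking mainly in that a chunk never spans a section boundary. The sketch below is a simplified, hypothetical illustration of that rule, not the platform's implementation: a new section starts at each Title element and, when Multipage Sections is unchecked, at every page change; within a section, elements are packed using the same Max Characters, New After N Characters, and Overlap limits. The Element class and the chunk_by_title and split_text functions are illustrative names. Combine Text Under N Characters is not modeled; the Advanced preset sets it to 0, which this sketch treats as never merging small sections.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for a partitioned document element."""
    category: str          # e.g. "Title" or "NarrativeText"
    text: str
    page_number: int = 1


def split_text(text, max_characters, overlap):
    """Split one oversized text into windows of at most max_characters,
    with `overlap` characters shared between consecutive windows."""
    step = max_characters - overlap
    windows = []
    for start in range(0, len(text), step):
        windows.append(text[start:start + max_characters])
        if start + max_characters >= len(text):
            break
    return windows


def chunk_by_title(elements, max_characters=2048, new_after_n_chars=1500,
                   overlap=160, multipage_sections=True):
    """Simplified illustration: chunks never cross a section boundary."""
    if not 0 <= overlap < max_characters:
        raise ValueError("overlap must be smaller than max_characters")
    chunks, current, page = [], "", None

    def flush():
        nonlocal current
        if current:
            chunks.append(current)
        current = ""

    for el in elements:
        new_page = page is not None and el.page_number != page
        if el.category == "Title" or (new_page and not multipage_sections):
            flush()  # Section boundary: close the current chunk.
        page = el.page_number
        if len(el.text) > max_characters:
            flush()
            chunks.extend(split_text(el.text, max_characters, overlap))
            continue
        candidate = f"{current}\n\n{el.text}" if current else el.text
        # Same hard and soft limits as basic chunking, applied within a section.
        if current and (len(candidate) > max_characters or len(current) >= new_after_n_chars):
            flush()
            candidate = el.text
        current = candidate
    flush()
    return chunks
```

With Multipage Sections checked, as in the Advanced preset, a section that continues onto the next page stays in one chunk; unchecking it would start a new chunk at the page break.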
Custom workflow settings
To define custom workflow settings, in the Workflow Settings section of a workflow, click Custom. If Custom is not available, click Request Access, and wait for Unstructured to enable it.
The following workflow settings can be customized:
Edit, delete, or run a workflow
For each workflow on the Workflows list page, the following actions are available when you click the ellipsis (the three dots) next to the workflow's name:
- Edit: Changes the existing configuration of your workflow. This can include changing the source, destination, scheduling, and chunking strategies, among other settings.
- Delete: Removes the workflow from the platform. Use this action cautiously, as it will permanently delete the workflow and its configurations.
- Run: Manually runs the workflow outside of its scheduled runs. This is particularly useful for testing or ad-hoc data processing needs.