CrewAI
CrewAI is a popular framework for building AI agents and multi-agent workflows.
This article provides a hands-on, step-by-step walkthrough that uses CrewAI open source, along with the Unstructured Workflow Endpoint MCP Server and Python, to build a multi-agent workflow. This multi-agent workflow uses the MCP server to call various functions within the Unstructured Workflow Endpoint to build Unstructured ETL+ workflows from connector creation all the way through to workflow job completion. This walkthrough uses an Amazon S3 bucket as both the workflow’s source and destination. However, you can modify this multi-agent workflow later to use a different S3 bucket or even different sources and destinations, to have a collection of AI agents quickly build multiple Unstructured ETL+ workflows on your behalf with varying configurations.
Requirements
To complete this walkthrough, you must first have:
- An Unstructured account and an Unstructured API key for that account.
- An Anthropic account and an Anthropic API key for that account.
- A properly configured Amazon S3 bucket with bucket access credentials.
- A Firecrawl account and a Firecrawl API key for that account.
- Python and the CrewAI open-source toolchain installed on your local development machine.
The following sections describe how to get these requirements.
Unstructured account and API key
Before you begin, you must have an Unstructured account and an Unstructured API key, as follows:
If you signed up for Unstructured through the For Enterprise page, or if you are using a self-hosted deployment of Unstructured, the following information about signing up, signing in, and getting your Unstructured API key might apply differently to you. For details, contact Unstructured Sales at sales@unstructured.io.
- Sign in to your Unstructured account:
- If you do not already have an Unstructured account, go to https://unstructured.io/contact and fill out the online form to indicate your interest.
- If you already have an Unstructured account, go to https://platform.unstructured.io and sign in by using the email address, Google account, or GitHub account that is associated with your Unstructured account. The Unstructured user interface (UI) then appears, and you can start using it right away.
- Get your Unstructured API key:
a. In the Unstructured UI, click API Keys on the sidebar.
b. Click Generate API Key.
c. Follow the on-screen instructions to finish generating the key.
d. Click the Copy icon next to your new key to add the key to your system’s clipboard. If you lose this key, simply return and click the Copy icon again.
Anthropic account and API key
This walkthrough uses a Claude 3 Opus model from Anthropic. So, before you begin, you must also have an Anthropic account and an Anthropic API key for that account. Sign in to or create your Anthropic account. After you sign in to your Anthropic account, get your Anthropic API key.
Amazon S3 bucket and access credentials
This walkthrough uses an Amazon S3 bucket as both the workflow’s source and destination. So, before you begin, you must also have a properly configured Amazon S3 bucket, with bucket access credentials consisting of an access key ID and a secret access key for the AWS IAM user that has access to the bucket. Follow the S3 connector instructions to create and configure the bucket and get the bucket access credentials if you do not already have this all set up. (In these instructions, do not follow the directions to use the Unstructured UI to create the S3 source connector. CrewAI will do this for you later automatically.)
This walkthrough expects two folders to exist within the bucket, as follows:
- An input folder, which contains the files to process. This input folder must contain at least one file to process. If you do not have any files available to upload into the input folder, you can get some from the example-docs folder in the Unstructured open source library repository on GitHub.
- An output folder, which will contain the processed files' data from Unstructured after CrewAI runs the workflow. This output folder should be empty for now.
Firecrawl account and API key
This walkthrough uses Firecrawl to monitor job statuses. So, before you begin, you must also have a Firecrawl account and a Firecrawl API key for that account. Sign in to or create your Firecrawl account. After you sign in to your Firecrawl account, get your Firecrawl API key.
Python and CrewAI open-source toolchain and project setup
Before you can start coding on your local machine, you must install Python, and you should also install a Python package and project manager to manage your project’s code dependencies.
This walkthrough uses the popular Python package and project manager uv (although uv is not required to use CrewAI or the Unstructured Workflow Endpoint MCP Server).
Install uv
To install uv, run one of the following commands, depending on your operating system: on macOS or Linux, use curl (or wget) with sh; on Windows, use PowerShell with irm to download the install script and run it with iex.
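A sketch of the standard uv installer one-liners, as published in the uv documentation (verify them there before piping a downloaded script into your shell):

```shell
# macOS or Linux, using curl:
curl -LsSf https://astral.sh/uv/install.sh | sh

# macOS or Linux, using wget instead:
wget -qO- https://astral.sh/uv/install.sh | sh

# Windows, in PowerShell:
#   powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```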
If you need to install uv by using other approaches such as PyPI, Homebrew, or WinGet, see Installing uv.
Install CrewAI open source
Use uv to install the CrewAI open-source toolchain, by running the following command:
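Assuming uv is on your PATH, this matches CrewAI's documented install command, which installs the CrewAI command-line tooling as a uv tool:

```shell
uv tool install crewai
```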
Install Python
CrewAI open source works only with Python 3.10, 3.11, and 3.12.
uv will detect and use Python if you already have it installed.
To view a list of installed Python versions, run the following command:
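uv's built-in Python management can list the interpreter versions it can see:

```shell
uv python list
```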
If, however, you do not already have Python installed, you can install a version of Python for use with uv by running the following command. For example, this command installs Python 3.12 for use with uv:
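Assuming you want Python 3.12 (any of 3.10 through 3.12 works with CrewAI), the corresponding uv command is:

```shell
uv python install 3.12
```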
Build and run the CrewAI project
You are now ready to start coding.
Create the project directory
Switch to the directory on your local development machine where you want to create the project directory for this walkthrough.
This example creates a project directory named crewai_unstructured_demo within your current working directory and then switches to this new project directory:
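For example, from a shell on macOS or Linux, using the walkthrough's suggested directory name:

```shell
# Create the project directory and switch into it.
mkdir crewai_unstructured_demo
cd crewai_unstructured_demo
```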
Initialize the project
From within the new project directory, use uv to initialize the project by running the following command:
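uv's project initializer creates, among other files, a pyproject.toml and a starter main.py in the current directory (the main.py file is replaced with the crew's code later in this walkthrough):

```shell
uv init
```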
Create a venv virtual environment
To isolate and manage your project's code dependencies, you should create a virtual environment. This walkthrough uses the popular Python virtual environment manager venv (although venv is not required to use CrewAI or the Unstructured Workflow Endpoint MCP Server).
From the root of your project directory, use uv to create a virtual environment with venv by running the following command:
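uv creates the environment in a .venv directory at the project root:

```shell
uv venv
```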
Activate the virtual environment
To activate the venv virtual environment, run one of the following commands from the root of your project directory:
- For bash or zsh, run source .venv/bin/activate
- For fish, run source .venv/bin/activate.fish
- For csh or tcsh, run source .venv/bin/activate.csh
- For pwsh, run .venv/bin/Activate.ps1
- For cmd.exe, run .venv\Scripts\activate.bat
- For PowerShell, run .venv\Scripts\Activate.ps1
If you need to deactivate the virtual environment at any time, run the following command:
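venv's standard deactivation command, available in any shell where the environment is active, is:

```shell
deactivate
```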
Get the Unstructured Workflow Endpoint MCP Server's source code
The Unstructured Workflow Endpoint MCP Server is a Python package that provides an MCP server for the Unstructured Workflow Endpoint. To get the MCP server’s source code, run the following command from the root of your project directory:
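A sketch of the clone step. The repository URL shown here is an assumption based on the Unstructured GitHub organization; confirm the actual URL in the MCP server's documentation before running it:

```shell
# Clone the MCP server's source code into the current project directory.
git clone https://github.com/Unstructured-IO/UNS-MCP.git
```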
Install the Unstructured Workflow Endpoint MCP Server's code dependencies
From the root of your project directory, switch to the cloned Unstructured Workflow Endpoint MCP Server's source directory, and then use uv to install the MCP server's code dependencies, by running the following commands:
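Assuming the clone created a directory named UNS-MCP (adjust if your directory name differs), the idiomatic uv commands are:

```shell
cd UNS-MCP
uv sync
```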
Install the CrewAI project's code dependencies
Switch back to the CrewAI project's root directory, and then use uv to install the CrewAI project's dependencies, by running the following commands:
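A sketch of the dependency install. The exact package list is an assumption: the crew code needs CrewAI itself, the crewai-tools package with its MCP extra for talking to the MCP server, and python-dotenv for reading the .env file:

```shell
cd ..
uv add crewai "crewai-tools[mcp]" python-dotenv
```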
Add the CrewAI project's source code
In the main.py file in the CrewAI project's root directory, replace that file's contents with the following Python code. This code defines a set of CrewAI-compatible agents and tasks that make up a multi-agent crew. The code then uses the crew's agents to run the tasks in order to build an Unstructured ETL+ workflow, from connector creation all the way through to workflow job completion:
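The full listing is long; the following is a minimal sketch of the shape such a crew can take, showing only the first step (creating the source connector). It assumes the crewai, crewai-tools, and python-dotenv packages are installed and the MCP server is running locally; the agent wording, model identifier, and Pydantic model fields are illustrative, not the walkthrough's exact code:

```python
import os

from crewai import Agent, Crew, Task
from crewai_tools import MCPServerAdapter
from dotenv import load_dotenv
from pydantic import BaseModel

# Load the environment variables from the .env file described in the next step.
load_dotenv()

# A Pydantic model that formats a task's output for consistent presentation.
class SourceConnectorResult(BaseModel):
    connector_id: str
    connector_name: str

# Point CrewAI's MCP adapter at the locally running MCP server.
server_params = {"url": "http://127.0.0.1:8080/sse", "transport": "sse"}

with MCPServerAdapter(server_params) as mcp_tools:
    connector_agent = Agent(
        role="Unstructured connector creator",
        goal="Create connectors in the Unstructured Workflow Endpoint.",
        backstory="An expert in Unstructured ETL+ workflows.",
        tools=mcp_tools,
        llm="anthropic/claude-3-opus-20240229",
    )
    create_source_task = Task(
        description=(
            "Create an S3 source connector named "
            f"{os.environ['S3_SOURCE_CONNECTOR_NAME']} "
            f"for the URI {os.environ['S3_SOURCE_URI']}."
        ),
        expected_output="The new source connector's ID and name.",
        output_pydantic=SourceConnectorResult,
        agent=connector_agent,
    )
    crew = Crew(agents=[connector_agent], tasks=[create_source_task])
    print(crew.kickoff())
```

The full code follows the same pattern for the destination connector, the workflow, and the job-status tasks, chaining the tasks in order so that later tasks can use the IDs produced by earlier ones.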
The preceding code does the following:
- Imports the necessary library modules for the rest of the code to use.
- Loads the environment variables that the code relies on from a .env file, which you will create in the next step.
- Defines the Pydantic-formatted models for the expected output of each task. These models format the tasks' output for consistent presentation.
- Defines the agents and tasks for the crew. These agents use their related tasks to create a source connector, a destination connector, and a workflow that uses these connectors, and then runs the newly created workflow as a job and reports the job’s status.
- After the crew is finished, the results of each task are printed.
Create the .env file
Create an .env file in the root of your CrewAI project directory, and then add the following environment variables to the file:
- UNSTRUCTURED_API_KEY - Your Unstructured API key.
- ANTHROPIC_API_KEY - Your Anthropic API key.
- FIRECRAWL_API_KEY - Your Firecrawl API key.
- AWS_KEY - The AWS access key ID for the AWS IAM user that has access to the S3 bucket.
- AWS_SECRET - The IAM user's AWS secret access key.
- S3_SOURCE_CONNECTOR_NAME - Some display name for the S3 source connector.
- S3_SOURCE_URI - The URI of the input folder in the S3 bucket.
- S3_DESTINATION_CONNECTOR_NAME - Some display name for the S3 destination connector.
- S3_DESTINATION_URI - The URI of the output folder in the S3 bucket.
- WORKFLOW_NAME - Some display name for the workflow.
For example, your .env file might look like this:
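A sketch of the file's shape. Every value here is a placeholder (the key values, bucket name, and display names are made up); substitute your own:

```
UNSTRUCTURED_API_KEY=<your-unstructured-api-key>
ANTHROPIC_API_KEY=<your-anthropic-api-key>
FIRECRAWL_API_KEY=<your-firecrawl-api-key>
AWS_KEY=<your-aws-access-key-id>
AWS_SECRET=<your-aws-secret-access-key>
S3_SOURCE_CONNECTOR_NAME=s3-source-connector
S3_SOURCE_URI=s3://my-bucket/input/
S3_DESTINATION_CONNECTOR_NAME=s3-destination-connector
S3_DESTINATION_URI=s3://my-bucket/output/
WORKFLOW_NAME=s3-to-s3-workflow
```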
Run the Unstructured Workflow Endpoint MCP Server
From the root of your project directory, switch to the Unstructured Workflow Endpoint MCP Server's source directory, and then use make to run the MCP server locally, by running the following commands:
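Assuming the cloned server directory is named UNS-MCP (adjust if yours differs), and using the make sse-server target that this walkthrough refers to later:

```shell
cd UNS-MCP
make sse-server
```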
If you do not have make available, see your operating system's documentation for installation instructions.
The MCP server will start running at http://127.0.0.1:8080/sse.
You must leave the MCP server running in your terminal or command prompt window while you run the CrewAI project.
Run the CrewAI project
- In a separate terminal or command prompt window, from the root of your project directory, run the CrewAI project by running the following command:
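With the virtual environment active, the project runs under uv:

```shell
uv run main.py
```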
- The CrewAI project will run and create an Unstructured ETL+ workflow. You can see the crew's progress in the terminal or command prompt window where you ran the uv run main.py command.
- The crew's agents will create a source connector, a destination connector, and a workflow that uses these connectors.
- The crew's agents will then run the workflow as a job and report on the job's status.
- After the job is completed, the crew will report final information about the tasks that were completed, for example:
- You can sign in to your Unstructured account to see the results:
- On the sidebar, click Connectors to see the source and destination connectors.
- On the sidebar, click Workflows to see the workflow.
- On the sidebar, click Jobs to see the job.
- If you do not want to keep the MCP server running, you can stop it by pressing Ctrl+C in the terminal or command prompt window where you ran the make sse-server command.