1
Install and run Docker
If you do not have Docker already installed and running, you can install and run a tool such as Docker Desktop, which is
available for macOS, Windows, and Linux. See the Docker Desktop installation instructions for your operating system.
2
Pull the Unstructured Docker image
If you are an experienced Docker user, you plan to parse only a single type of data, and you want to accelerate the image-building process, you can build your own Docker image instead of pulling the latest prebuilt image.
The following steps are for AMD64-based systems. If you are using an ARM64-based system (such as Apple Silicon), follow the instructions on the Docker CLI tab in this step instead.
- In your Docker Desktop UI’s search box, enter `downloads.unstructured.io/unstructured-io/unstructured:latest`.
- On the Images tab, next to unstructured-io/unstructured, click Pull.
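For readers working from a terminal, the same pull can be done with the Docker CLI. The following is a minimal sketch, not necessarily the exact content of the Docker CLI tab. The variable name `IMAGE` is introduced here for illustration, and the `--platform linux/amd64` variant for ARM64 hosts is an assumption, shown only as a comment.

```shell
# Image reference from the step above.
IMAGE="downloads.unstructured.io/unstructured-io/unstructured:latest"

# Pull the latest prebuilt image.
docker pull "$IMAGE"

# Assumption: on ARM64 hosts, the AMD64 image can be requested explicitly with
#   docker pull --platform linux/amd64 "$IMAGE"
```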
3
Create and run a container from the image
The following steps are for AMD64-based systems. If you are using an ARM64-based system (such as Apple Silicon), follow the instructions on the Docker CLI tab in this step instead.
- In the Docker Desktop UI’s sidebar, click Images.
- Next to unstructured-io/unstructured, click the play (Run) icon.
- Expand Optional settings.
- For Container name, enter some name for your container, such as `unstructured`.
- In the sidebar, click Containers.
- Next to your container, click the play (Start) icon.
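A rough CLI counterpart to these UI steps might look like the following. It is a sketch under stated assumptions: the image has already been pulled, the container is named `unstructured`, and the `-dt` flags (detached, with a pseudo-terminal allocated so that the container's shell keeps running) are a choice made for this example. The variable names are illustrative.

```shell
IMAGE="downloads.unstructured.io/unstructured-io/unstructured:latest"
CONTAINER_NAME="unstructured"   # assumption: any unused container name works

# Create and start a container from the image, in the background.
docker run -dt --name "$CONTAINER_NAME" "$IMAGE"

# List running containers whose name matches, to confirm that it started.
docker ps --filter "name=$CONTAINER_NAME"
```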
4
Interact with the Unstructured open source library by running code inside the container
- In the Docker Desktop UI, in the lower right corner, click the Terminal button.
- Start a terminal session inside the container by running a shell in it from the Docker CLI. Wherever `<container-name>` appears in a command, replace it with the name of your container, such as `unstructured`.
- Run Unstructured open source library calls from inside the container. For example, start the Python interpreter, and then run the calls to the library one command at a time. The calls in this example process a PDF file named `layout-parser-paper.pdf` in the `/app/example-docs/pdf` directory and write the processed data as a JSON file named `layout-parser-paper-output.json` in that same directory. After the last call finishes running, exit the Python interpreter, and then print the contents of the JSON file to the terminal.
- To exit the terminal session, run `exit` or press Ctrl+D.
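The terminal steps above can be sketched as follows. This is a non-authoritative example: it assumes the container is named `unstructured` and that Bash and `python3` are available in the image. `partition_pdf` and `elements_to_json` are functions from the Unstructured open source library; the heredoc is only a convenience for running the four Python commands in one go instead of typing them at the `>>>` prompt, and the shell variable names are illustrative.

```shell
CONTAINER_NAME="unstructured"   # assumption: the name chosen when the container was created
OUT_JSON="/app/example-docs/pdf/layout-parser-paper-output.json"

# Interactive route: open a shell in the container, then start python3 there.
#   docker exec -it "$CONTAINER_NAME" bash

# Non-interactive route: feed the same four Python commands to the
# container's interpreter over standard input.
docker exec -i "$CONTAINER_NAME" python3 - <<'PY'
from unstructured.partition.pdf import partition_pdf
from unstructured.staging.base import elements_to_json
elements = partition_pdf(filename="/app/example-docs/pdf/layout-parser-paper.pdf")
elements_to_json(elements=elements, filename="/app/example-docs/pdf/layout-parser-paper-output.json")
PY

# Print the resulting JSON file to the terminal.
docker exec "$CONTAINER_NAME" cat "$OUT_JSON"
```

In an interactive session, the same four Python lines are typed one at a time, followed by `exit()` to leave the interpreter and `cat` to print the file.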
5
Interact with the Unstructured open source library by running code outside the container
You can also interact with the Unstructured open source library by running code that is on the
same machine as the running container but not within the container itself. To do this, you can
use the Docker CLI to create a container that mounts the local directory containing the
code into the container itself, and then run that code from the container.
- Create the container with the Docker CLI, using the command for your system's architecture (AMD64, or ARM64 such as Apple Silicon). In the command, replace the following placeholders with the appropriate values:
  - Replace `<host-path>` with the path to the directory containing your code, for example `/Users/<username>/my_example_code/`.
  - Replace `<container-path>` with the path to some directory within the container to mount `<host-path>` into, for example `/app/my_example_code/`. If `<container-path>` does not already exist, it will be created at the same time that the container is created.
  - Replace `<container-name>` with some name for your container, such as `unstructured_mount`.
- Start a terminal session inside the container, replacing `<container-name>` with the name of your container, such as `unstructured_mount`.
- Add `<container-path>` to the `PYTHONPATH` environment variable within the container, replacing `<container-path>` with the path to the target directory within the container.
- Run Unstructured open source library calls, referencing your code from `<container-path>`. For example, if you have a file named `main.py` in `<host-path>` that contains the four Python commands from the previous step, you can run that file with the Python interpreter from `<container-path>`, and then print the contents of the JSON file to the terminal.
- To exit the terminal session, run `exit` or press Ctrl+D.
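Taken together, the steps above might be scripted from the host as follows. This is a sketch under several assumptions: the bind mount is expressed with the `-v` flag, Bash and `python3` are available in the image, `main.py` is a hypothetical file in the host directory, and the ARM64 variant differs only by a `--platform linux/amd64` flag. The shell variable names are illustrative, and the host path keeps its `<username>` placeholder, which you need to replace.

```shell
IMAGE="downloads.unstructured.io/unstructured-io/unstructured:latest"
HOST_PATH="/Users/<username>/my_example_code"   # replace <username> with your own
CONTAINER_PATH="/app/my_example_code"
CONTAINER_NAME="unstructured_mount"

# Create and start a container with HOST_PATH bind-mounted at CONTAINER_PATH.
# Assumption: on ARM64 hosts, add --platform linux/amd64 after "docker run".
docker run -dt --name "$CONTAINER_NAME" -v "$HOST_PATH:$CONTAINER_PATH" "$IMAGE"

# Run the in-container steps through a non-interactive Bash session.
# (Interactively: docker exec -it "$CONTAINER_NAME" bash, then type the same lines.)
docker exec -i -e CONTAINER_PATH="$CONTAINER_PATH" "$CONTAINER_NAME" bash -s <<'SH'
# Append the mounted directory to PYTHONPATH, then confirm the new value.
export PYTHONPATH="${PYTHONPATH:+$PYTHONPATH:}$CONTAINER_PATH"
echo "$PYTHONPATH"
# Run the hypothetical main.py from the mounted directory.
python3 "$CONTAINER_PATH/main.py"
# Print the JSON output written by main.py.
cat /app/example-docs/pdf/layout-parser-paper-output.json
SH
```

Because the directory is bind-mounted, edits made to `main.py` on the host are visible inside the container without rebuilding or recreating it.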
6
Stop running the container
If you do not need to keep running the container, you can stop it as follows:
- In the Docker Desktop UI, in the sidebar, click Containers.
- Next to your container, click the square (Stop) icon.
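From a terminal, the equivalent might be a single `docker stop` call. This sketch assumes the container is named `unstructured`; substitute your own container name, such as `unstructured_mount`.

```shell
CONTAINER_NAME="unstructured"   # assumption: replace with your container's name

# Stop the running container. It is kept, not deleted.
docker stop "$CONTAINER_NAME"

# To use the same container again later:
#   docker start "$CONTAINER_NAME"
```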