| Dataset | Base Model | Notes |
| --- | --- | --- |
| PubLayNet [38] | F/M | Layouts of modern scientific documents |
| PRImA [3] | M | Layouts of scanned modern magazines and scientific reports |
| Newspaper | F | Layouts of scanned US newspapers from the 20th century |
| TableBank | F | Table region on modern scientific and business document |
| HJDataset [31] | F/M | Layouts of history Japanese documents |
```
### Data connector metadata fields
Documents processed through source connectors include additional document metadata. These additional fields appear only
when the source document was processed by a connector.
#### Common data connector metadata fields
* Data Source metadata (on JSON output):
  * url
  * version
  * date created
  * date modified
  * date processed
  * record locator
    * Record locator is specific to each connector
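As a rough illustration, the following sketch reads these common fields from one element of the JSON output and returns `None` when they are absent, which is what you would expect for a document that did not come through a connector. It assumes the fields are serialized under `metadata.data_source` with snake_case keys (`url`, `version`, `date_created`, `date_modified`, `date_processed`, `record_locator`); treat those key names as an assumption to check against your own output. The helper `data_source_fields` and the sample element are hypothetical.

```python
import json
from typing import Any, Optional

# Assumed snake_case spellings of the common fields listed above.
COMMON_FIELDS = (
    "url",
    "version",
    "date_created",
    "date_modified",
    "date_processed",
    "record_locator",
)


def data_source_fields(element: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the common data source fields of one element, or None.

    None means the element carries no data source metadata, which is
    expected when the document was not processed by a source connector.
    """
    data_source = (element.get("metadata") or {}).get("data_source")
    if not data_source:
        return None
    # Keep only the common fields; any that are missing come back as None.
    return {name: data_source.get(name) for name in COMMON_FIELDS}


# Hypothetical element shaped like one entry of the JSON output.
# The record locator keys here are illustrative assumptions.
sample = json.loads("""
{
  "type": "Title",
  "text": "Quarterly report",
  "metadata": {
    "filename": "report.pdf",
    "data_source": {
      "url": "s3://example-bucket/report.pdf",
      "version": "1",
      "date_processed": "2024-01-01T00:00:00",
      "record_locator": {"protocol": "s3", "remote_file_path": "example-bucket/report.pdf"}
    }
  }
}
""")

print(data_source_fields(sample))
print(data_source_fields({"type": "Title", "text": "Local file", "metadata": {}}))
```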
#### Additional metadata fields by connector type (via record locator)
| Source connector | Additional metadata |
| --------------------- | -------------------------------- |
| airtable | base id, table id, view id |
| azure (from fsspec) | protocol, remote file path |
| box (from fsspec) | protocol, remote file path |
| confluence | url, page id |
| discord | channel |
| dropbox (from fsspec) | protocol, remote file path |
| elasticsearch | url, index name, document id |
| fsspec | protocol, remote file path |
| google drive | drive id, file id |
| gcs (from fsspec) | protocol, remote file path |
| jira | base url, issue key |
| onedrive | user pname, server relative path |
| outlook | message id, user email |
| s3 (from fsspec) | protocol, remote file path |
| sharepoint | server path, site url |
| wikipedia | page title, page url |
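To make the table concrete, here is a small sketch that encodes it as a lookup and checks a record locator for the fields listed for its connector type. It assumes each field name in the table maps to a snake_case JSON key (for example, `remote file path` to `remote_file_path`); that mapping and the `missing_locator_keys` helper are hypothetical and not part of any Unstructured library.

```python
from typing import Any

# Record locator fields per connector, mirroring the table above.
# fsspec-based connectors all share the same two fields.
FSSPEC_FIELDS = ("protocol", "remote file path")
RECORD_LOCATOR_FIELDS: dict[str, tuple[str, ...]] = {
    "airtable": ("base id", "table id", "view id"),
    "azure": FSSPEC_FIELDS,
    "box": FSSPEC_FIELDS,
    "confluence": ("url", "page id"),
    "discord": ("channel",),
    "dropbox": FSSPEC_FIELDS,
    "elasticsearch": ("url", "index name", "document id"),
    "fsspec": FSSPEC_FIELDS,
    "google drive": ("drive id", "file id"),
    "gcs": FSSPEC_FIELDS,
    "jira": ("base url", "issue key"),
    "onedrive": ("user pname", "server relative path"),
    "outlook": ("message id", "user email"),
    "s3": FSSPEC_FIELDS,
    "sharepoint": ("server path", "site url"),
    "wikipedia": ("page title", "page url"),
}


def to_key(field: str) -> str:
    """Assumed mapping from a table name to a JSON key, e.g. 'page id' -> 'page_id'."""
    return field.replace(" ", "_")


def missing_locator_keys(connector: str, record_locator: dict[str, Any]) -> list[str]:
    """List the expected record locator keys that are absent for this connector type."""
    expected = RECORD_LOCATOR_FIELDS[connector.lower()]
    return [to_key(f) for f in expected if to_key(f) not in record_locator]


print(missing_locator_keys("s3", {"protocol": "s3", "remote_file_path": "bucket/a.pdf"}))  # → []
print(missing_locator_keys("jira", {"base_url": "https://example.atlassian.net"}))  # → ['issue_key']
```

A check like this is only as reliable as the assumed key mapping, so compare it with a real connector output before relying on it.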
# Examples
Source: https://docs.unstructured.io/api-reference/partition/examples
This page provides examples of accessing the Unstructured Partition Endpoint through different methods.
To use these examples, you'll first need to set an environment variable named `UNSTRUCTURED_API_KEY`,
representing your Unstructured API key. [Get your API key](/api-reference/partition/overview).
For the POST and Unstructured JavaScript/TypeScript SDK examples, you'll also need to set an environment variable named `UNSTRUCTURED_API_URL` to the
value `https://api.unstructuredapp.io/general/v0/general`.
For the Unstructured Python SDK, you do not need to set an environment variable named `UNSTRUCTURED_API_URL`, because the Python SDK uses the API URL
`https://api.unstructuredapp.io/general/v0/general` by default. (The Unstructured JavaScript/TypeScript SDK does not have this feature yet; you must always specify the API URL.)
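For scripts that call the endpoint directly, a minimal sketch of this setup might read both variables and fall back to the default URL only when `UNSTRUCTURED_API_URL` is unset, mirroring the Python SDK default described above. The helper name `load_api_config` is hypothetical.

```python
import os
from typing import Mapping, Optional

# Default Partition Endpoint URL, as stated above for the Python SDK.
DEFAULT_API_URL = "https://api.unstructuredapp.io/general/v0/general"


def load_api_config(env: Optional[Mapping[str, str]] = None) -> tuple[str, str]:
    """Return (api_key, api_url) read from environment variables.

    UNSTRUCTURED_API_KEY is required. UNSTRUCTURED_API_URL falls back to
    DEFAULT_API_URL when it is unset or empty.
    """
    env = os.environ if env is None else env
    api_key = env.get("UNSTRUCTURED_API_KEY")
    if not api_key:
        raise RuntimeError("Set the UNSTRUCTURED_API_KEY environment variable first.")
    api_url = env.get("UNSTRUCTURED_API_URL") or DEFAULT_API_URL
    return api_key, api_url


# Passing an explicit mapping keeps the example independent of your shell.
print(load_api_config({"UNSTRUCTURED_API_KEY": "demo-key"}))
```

Calling `load_api_config()` with no argument reads the real process environment instead.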
### Changing partition strategy for a PDF
Here's how you can change the partition strategy for a PDF file and select an alternative model to use with the Unstructured API.
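As a sketch of what such a request involves, the helper below assembles the headers and form fields: `strategy` selects the partition strategy and `hi_res_model_name` selects the model. Those two parameter names and the `unstructured-api-key` header reflect our understanding of the Partition Endpoint; the helper itself, the strategy list, and the `yolox` model name are assumptions to verify against the API reference.

```python
from typing import Optional

# Assumed set of accepted `strategy` values; check the API reference for the current list.
VALID_STRATEGIES = {"auto", "fast", "hi_res", "ocr_only"}


def partition_request_parts(
    api_key: str, strategy: str = "hi_res", model_name: Optional[str] = None
) -> tuple[dict[str, str], dict[str, str]]:
    """Build (headers, form_fields) for a PDF partition request.

    A model name is accepted only with strategy='hi_res', on the assumption
    that the model choice applies to that strategy alone.
    """
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy!r}")
    if model_name is not None and strategy != "hi_res":
        raise ValueError("a model name only applies to strategy='hi_res'")
    headers = {"accept": "application/json", "unstructured-api-key": api_key}
    fields = {"strategy": strategy}
    if model_name is not None:
        fields["hi_res_model_name"] = model_name
    return headers, fields


headers, fields = partition_request_parts("demo-key", "hi_res", "yolox")
print(fields)  # → {'strategy': 'hi_res', 'hi_res_model_name': 'yolox'}
```

With the third-party `requests` package, these parts could then be sent as `requests.post(api_url, headers=headers, data=fields, files={"files": open("example.pdf", "rb")})`, where `example.pdf` is a placeholder and `files` is the assumed name of the upload field.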