If you’re new to Unstructured, read this note first.
Before you can create a destination connector, you must first sign in to your Unstructured account:
After you sign in, the Unstructured user interface (UI) appears, which you use to create your destination connector.
After you create the destination connector, add it along with a source connector to a workflow. Then run the workflow as a job. To learn how, try out the hands-on UI quickstart or watch the 4-minute video tutorial.
You can also create destination connectors with the Unstructured API. Learn how.
If you need help, reach out to the community on Slack, or contact us directly.
You are now ready to start creating a destination connector! Keep reading to learn how.
Send processed data from Unstructured to IBM watsonx.data.
The requirements are as follows.
An IBM Cloud account. Create an IBM Cloud account if you do not already have one.
An API key for the IBM Cloud account. If you do not have one already, create one as follows:
An IBM Cloud Object Storage (COS) instance in the account, and a bucket within that instance. If you do not have them already, create them as follows:
The name, region, and public endpoint for the target bucket within the target Cloud Object Storage (COS) instance. To get these:
On the sidebar, click the Resource list icon. If the sidebar is not visible, click the Navigation Menu icon to the far left of the top navigation bar.
In the list of resources, expand Storage, and then click the target COS instance.
On the Buckets tab, click the target bucket.
On the Configuration tab, note the following:
us-east. This is the bucket’s region.
s3.us-east.cloud-object-storage.appdomain.cloud. (Ignore the values of Private and Direct.) This is the bucket’s public endpoint.
An HMAC access key ID and secret access key for the target Cloud Object Storage (COS) instance. If you do not have them already, get or create them as follows:
On the sidebar, click the Resource list icon. If the sidebar is not visible, click the Navigation Menu icon to the far left of the top navigation bar.
In the list of resources, expand Storage, and then click the target COS instance.
On the Service credentials tab, if there is a credential that you want to use in the list, expand the credential, and copy the following values to a secure location:
access_key_id under cos_hmac_keys, which represents the HMAC access key ID.
secret_access_key under cos_hmac_keys, which represents the HMAC secret access key.
After you have copied the preceding values, you have completed this procedure.
If there is not a credential that you want to use, or there are no credentials at all, click New Credential.
Enter some Name for the credential.
For Role, select at least Writer, leave Select Service ID set to Auto Generated, switch on Include HMAC Credential, and then click Add.
In the list of credentials, expand the credential, and copy the following values to a secure location:
access_key_id under cos_hmac_keys, which represents the HMAC access key ID.
secret_access_key under cos_hmac_keys, which represents the HMAC secret access key.
An IBM watsonx.data data store instance in the IBM Cloud account. If you do not have one already, create one as follows:
An Apache Iceberg-based catalog within the watsonx.data data store instance. If you do not have one already, create one as follows:
On the sidebar, click the Resource list icon. If the sidebar is not visible, click the Navigation Menu icon to the far left of the top navigation bar.
In the list of resources, expand Databases, and then click the target watsonx.data data store instance.
Click Open web console.
If prompted, log in to the web console.
On the sidebar, click Infrastructure manager. If the sidebar is not visible, click the Global navigation icon to the far left of the top navigation bar.
Click Add component.
Under Storage, click IBM Cloud Object Storage, and then click Next.
Complete the on-screen instructions to finish creating the Iceberg catalog. This includes providing the following settings:
https://.
Next to Connection status, click Test connection to test the connection. Do not proceed until Successful is shown. If the connection is not successful, check the values you entered for the target bucket name, region, endpoint, access key, and secret access key, and try again.
Check the box labelled Associate Catalog.
Check the box labelled Activate now.
Under Associated catalog, for Catalog type, select Apache Iceberg.
Enter some Catalog name.
Click Associate.
On the sidebar, click Infrastructure manager. Make sure the catalog is associated with the appropriate engines. If it is not, rest your mouse on an unassociated target engine, click the Manage associations icon, check the box next to the target catalog’s name, and then click Save and restart engine.
To create an engine if one is not already shown, click Add component, and follow the on-screen instructions to add an appropriate engine from the list of available Engines (for example, an IBM Presto engine).
The catalog name and metastore REST endpoint for the target Iceberg catalog. To get these:
A namespace (also known as a schema) and a table in the target catalog. If you do not have these already, create them as follows:
On the sidebar, click the Resource list icon. If the sidebar is not visible, click the Navigation Menu icon to the far left of the top navigation bar.
In the list of resources, expand Databases, and then click the target watsonx.data data store instance.
Click Open web console.
If prompted, log in to the web console.
On the sidebar, click Data manager. If the sidebar is not visible, click the Global navigation icon to the far left of the top navigation bar.
On the Browse data tab, under Catalogs associated, click the target catalog.
Click the ellipsis, and then click Create schema.
Enter some Name for the schema, and then click Create.
On the sidebar, click Query workspace.
In the SQL editor, enter and run a table creation statement such as the following one that uses Presto SQL syntax, replacing <catalog-name> with the name of the target catalog and <schema-name> with the name of the target schema:
Incoming elements that do not have matching column names will be dropped upon record insertion. For example, if the incoming data has an element named sent_from and there is no column named sent_from in the table, the sent_from element will be dropped upon record insertion. You should modify the preceding sample table creation statement to add columns for any additional elements that you want to be included upon record insertion.
To increase query performance, Iceberg uses hidden partitioning to group similar rows together when writing. You can also explicitly define partitions as part of the preceding CREATE TABLE statement.
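As a rough illustration of the column-matching rule described above, the following Python sketch renders a Presto-style CREATE TABLE statement from a mapping of column names to types. The helper name render_create_table, the table name elements, and every column except record_id and sent_from are hypothetical and chosen only for illustration; adjust them to the elements you actually want to keep.

```python
def render_create_table(catalog: str, schema: str, table: str, columns: dict[str, str]) -> str:
    """Render a Presto-style CREATE TABLE statement from a name -> type mapping."""
    if not columns:
        raise ValueError("at least one column is required")
    column_lines = ",\n".join(
        f"    {name} {sql_type}" for name, sql_type in columns.items()
    )
    return f"CREATE TABLE {catalog}.{schema}.{table} (\n{column_lines}\n)"


# Hypothetical column set: only record_id and sent_from are named in this
# document; the others are placeholders for whatever elements you expect.
EXAMPLE_COLUMNS = {
    "record_id": "varchar",
    "element_id": "varchar",
    "text": "varchar",
    "sent_from": "varchar",  # without this column, sent_from would be dropped
}

statement = render_create_table(
    "<catalog-name>", "<schema-name>", "elements", EXAMPLE_COLUMNS
)
print(statement)
```

Paste the printed statement into the SQL editor after replacing the two placeholders. Keeping the columns in one mapping makes it easy to add a column for each incoming element that should survive insertion.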
The name of the target namespace (also known as a schema) within the target catalog, and the name of the target table within that schema. To get these:
The name of the column in the target table that uniquely identifies each of the records in the table.
To improve performance, the target table should be set to regularly remove old metadata files. To do this, run the following Python script. (You cannot use the preceding CREATE TABLE statement, or other SQL statements such as ALTER TABLE, to set this behavior.) To get the values for the specified environment variables, see the preceding instructions.
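A minimal sketch of such a script follows, assuming the PyIceberg library and a REST catalog that accepts a bearer token. The two table properties are standard Iceberg settings for metadata cleanup. Everything else is an assumption: the environment variable names are hypothetical, the retention count of 10 is arbitrary, and watsonx.data may require additional catalog properties that are not shown here.

```python
import os


def metadata_cleanup_properties(max_previous_versions: int = 10) -> dict[str, str]:
    """Iceberg table properties that delete old metadata files after each commit."""
    if max_previous_versions < 1:
        raise ValueError("max_previous_versions must be at least 1")
    return {
        "write.metadata.delete-after-commit.enabled": "true",
        "write.metadata.previous-versions-max": str(max_previous_versions),
    }


def apply_metadata_cleanup(max_previous_versions: int = 10) -> None:
    """Set the cleanup properties on the target table through a REST catalog."""
    # Imported lazily so the helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    # Hypothetical environment variable names; map them to the catalog name,
    # REST endpoint, token, and COS values gathered in the preceding steps.
    catalog = load_catalog(
        os.environ["ICEBERG_CATALOG_NAME"],
        **{
            "type": "rest",
            "uri": os.environ["ICEBERG_REST_URI"],
            "token": os.environ["ICEBERG_BEARER_TOKEN"],
            "s3.endpoint": os.environ["COS_ENDPOINT_URL"],
            "s3.region": os.environ["COS_REGION"],
            "s3.access-key-id": os.environ["COS_ACCESS_KEY_ID"],
            "s3.secret-access-key": os.environ["COS_SECRET_ACCESS_KEY"],
        },
    )
    table = catalog.load_table(
        (os.environ["ICEBERG_NAMESPACE"], os.environ["ICEBERG_TABLE"])
    )
    # The transaction commits the property change when the block exits.
    with table.transaction() as transaction:
        transaction.set_properties(metadata_cleanup_properties(max_previous_versions))


# Show the properties that apply_metadata_cleanup() would set.
print(metadata_cleanup_properties())
```

Call apply_metadata_cleanup() once the environment variables are set. Separating the property mapping from the catalog call lets you inspect the settings without connecting to the catalog.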
To create the destination connector:
Fill in the following fields:
https:// in this value.
https:// in this value.
15. The default is 10. If specified, it must be a number between 2 and 100, inclusive.
150. The default is 50. If specified, it must be a number between 2 and 500, inclusive.
record_id.
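The two numeric settings above each have a default and an inclusive range. The sketch below shows one way to check values before entering them, using a hypothetical helper named resolve_setting. The setting labels passed to it (max_connection_attempts and batch_size) are assumptions, because the field names are not shown in this text; only the defaults and ranges come from the descriptions above.

```python
from typing import Optional


def resolve_setting(name: str, value: Optional[int], default: int, low: int, high: int) -> int:
    """Return the default when no value is given, else enforce low <= value <= high."""
    if value is None:
        return default
    if not low <= value <= high:
        raise ValueError(
            f"{name} must be between {low} and {high}, inclusive; got {value}"
        )
    return value


# Assumed labels; the defaults and ranges are the ones stated above.
attempts = resolve_setting("max_connection_attempts", 15, default=10, low=2, high=100)
batch = resolve_setting("batch_size", None, default=50, low=2, high=500)
print(attempts, batch)
```

Passing None models leaving a field blank, in which case the documented default applies.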