Extract the Base64-encoded representation of specific elements, such as images and tables, in the document.
For each extracted element, decode its Base64-encoded representation back into the element's original visual representation,
and then show it.
You will need a document that is one of the document types supported by the extract_image_block_types argument.
See the extract_image_block_types entry in API Parameters.
This example uses a PDF file with embedded images and tables.
For the Unstructured Ingest Python library, you can use the standard Python
json.load function to load the contents of a JSON file that the Ingest Python
library outputs after the processing is complete. The result is a Python list
of dictionaries, one dictionary per element.
Python
import json, base64, io
from PIL import Image

def get_image_block_types(input_json_file_path: str):
    with open(input_json_file_path, 'r') as file:
        file_elements = json.load(file)

    for element in file_elements:
        if "image_base64" in element["metadata"]:
            # Decode the Base64-encoded representation of the
            # processed "Image" or "Table" element into its original
            # visual representation, and then show it.
            image_data = base64.b64decode(element["metadata"]["image_base64"])
            image = Image.open(io.BytesIO(image_data))
            image.show()

if __name__ == "__main__":
    # Source: https://github.com/Unstructured-IO/unstructured-ingest/blob/main/example-docs/pdf/embedded-images-tables.pdf
    # Specify where to get the local file, relative to this .py file.
    get_image_block_types(
        input_json_file_path="local-ingest-output/embedded-images-tables.json"
    )
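The example above opens each image in a viewer, which may not be practical on a headless machine. As a minimal sketch of an alternative, the following variant writes each decoded element to a file instead. The function name save_image_blocks and the output file naming scheme are hypothetical, chosen for illustration only. The sketch also assumes that each element may carry a "type" field and an "image_mime_type" metadata field; if either is missing, it falls back to a generic name and a .bin extension.

```python
import base64
import json
import mimetypes
from pathlib import Path


def save_image_blocks(input_json_file_path: str, output_dir: str) -> list:
    """Write each Base64-encoded element to its own file and return the paths.

    Hypothetical helper for illustration; not part of the Ingest library.
    """
    with open(input_json_file_path, "r") as file:
        file_elements = json.load(file)

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved_paths = []

    for index, element in enumerate(file_elements):
        metadata = element.get("metadata", {})
        if "image_base64" not in metadata:
            continue

        # Assumption: "image_mime_type" may accompany "image_base64".
        # Fall back to ".bin" when it is missing or not recognized.
        mime_type = metadata.get("image_mime_type", "")
        extension = mimetypes.guess_extension(mime_type) or ".bin"

        # Assumption: elements carry a "type" field such as "Image" or "Table".
        element_type = str(element.get("type", "element")).lower()

        out_path = out_dir / f"{index:04d}-{element_type}{extension}"
        out_path.write_bytes(base64.b64decode(metadata["image_base64"]))
        saved_paths.append(out_path)

    return saved_paths
```

Calling save_image_blocks("local-ingest-output/embedded-images-tables.json", "decoded-elements") would then leave one file per extracted element in the decoded-elements directory, which you can open with any image viewer.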