> ## Documentation Index
> Fetch the complete documentation index at: https://docs.unstructured.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Generate a JSON schema for a file

<Note>
  The following information applies to the legacy Unstructured Partition Endpoint.

  Unstructured recommends that you use the
  [on-demand jobs](/api-reference/workflow/overview#run-an-on-demand-job) functionality in the
  [Unstructured API](/api-reference/overview) instead. Unstructured's on-demand jobs provide
  many benefits over the legacy Unstructured Partition Endpoint, including support for:

  * Production-level usage.
  * Multiple local input files in batches.
  * The latest and highest-performing models.
  * Post-transform enrichments.
  * All of Unstructured's chunking strategies.
  * The generation of vector embeddings.

  The Unstructured API also provides support for processing files and data in remote locations.
</Note>

## Task

You want to generate a schema for a JSON file that Unstructured produces, so that you can validate, test,
and document related JSON files across your systems.

## Approach

Use a Python package such as [genson](https://pypi.org/project/genson/) to generate schemas for your
JSON files.

<Info>The `genson` package is not owned or supported by Unstructured. For questions and
requests, see the [Issues](https://github.com/wolverdude/genson/issues) tab of the
`genson` repository in GitHub.</Info>

## Generate a schema from the terminal

<Steps>
  <Step title="Install genson">
    Use [pip](https://pip.pypa.io/en/stable/installation/) to install the [genson](https://pypi.org/project/genson/) package.

    ```bash  theme={null}
    pip install genson
    ```
  </Step>

  <Step title="Install jq">
    By default, `genson` generates the JSON schema as a single string without any line breaks or indented whitespace.

    To pretty-print the schema that `genson` produces, install the [jq](https://jqlang.github.io/jq/) utility.

    <Info>The `jq` utility is not owned or supported by Unstructured. For questions and
    requests, see the [Issues](https://github.com/jqlang/jq/issues) tab of the
    `jq` repository in GitHub.</Info>
  </Step>

  <Step title="Generate the schema">
    1. Run the `genson` command, specifying the path to the input (source) JSON file, and the path to
       the output (target) JSON schema file to be generated. Use `jq` to pretty-print the schema's content
       into the file to be generated.

       ```bash  theme={null}
       genson "/path/to/input/file.json" | jq '.' > "/path/to/output/schema.json"
       ```

    2. You can find the generated JSON schema file in the output path that you specified.
  </Step>
</Steps>

## Generate a schema from Python code

<Steps>
  <Step title="Install dependencies">
    In your Python project, install the [genson](https://pypi.org/project/genson/) package.

    ```bash  theme={null}
    pip install genson
    ```
  </Step>

  <Step title="Add and run the schema generation code">
    1. Set the following local environment variables:

       * Set `LOCAL_FILE_INPUT_PATH` to the local path to the input (source) JSON file.
       * Set `LOCAL_FILE_OUTPUT_PATH` to the local path to the output (target) JSON schema file to be generated.

    2. Add the following Python code file to your project:

       ```python  theme={null}
       import os, json
       from genson import SchemaBuilder

       def json_schema_from_file(
           input_file_path: str,
           output_schema_path: str
       ) -> None:
           try:
               with open(input_file_path, "r") as file:
                   json_data = json.load(file)

               builder = SchemaBuilder()
               builder.add_object(json_data)

               schema = builder.to_schema()

               try:
                   with open(output_schema_path, "w") as schema_file:
                       json.dump(schema, schema_file, indent=2)
               except IOError as e:
                   raise IOError(f"Error writing to output file: {e}")
               
               print(f"JSON schema successfully generated and saved to '{output_schema_path}'.")
           except FileNotFoundError:
               print(f"Error: Input file '{input_file_path}' not found.")
           except IOError as e:
               print(f"I/O error occurred: {e}")
           except Exception as e:
               print(f"An unexpected error occurred: {e}")

       if __name__ == "__main__":
           json_schema_from_file(
               input_file_path=os.getenv("LOCAL_FILE_INPUT_PATH"),
               output_schema_path=os.getenv("LOCAL_FILE_OUTPUT_PATH")
           )
       ```

    3. Run the Python code file.

    4. Check the path specified by `LOCAL_FILE_OUTPUT_PATH` for the generated JSON schema file.
  </Step>
</Steps>
