jq is a lightweight and flexible command-line JSON processor. You can use jq on a local development machine to slice, filter, map, and transform the JSON data that Unstructured outputs, in much the same way that tools such as sed, awk, and grep let you work with text.
To get jq, see the Download jq page.
jq is not owned or supported by Unstructured. For questions about jq and feature requests for future versions of jq, see the Issues tab of the jq repository in GitHub.
The following command examples use jq with the spring-weather.html.json file in the example-docs directory within the Unstructured-IO/unstructured repository in GitHub.
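The filters in these examples treat the file as a JSON array of element objects, each with type, text, and metadata fields. As a rough sketch of that shape, the following snippet builds a hypothetical three-element array (the values are made up for illustration and are not taken from the real file) and lists the distinct element types it contains.

```shell
# Hypothetical miniature of the element array; the values are made up for illustration.
sample='[
  {"type": "Title", "text": "Example heading", "metadata": {"link_urls": ["https://example.com/a"]}},
  {"type": "NarrativeText", "text": "Example paragraph.", "metadata": {}},
  {"type": "Title", "text": "Another heading", "metadata": {}}
]'

# Collect every element's type, then keep each distinct value once.
# unique also sorts the result; -c prints compact, single-line JSON.
types=$(printf '%s' "$sample" | jq -c 'map(.type) | unique')
echo "$types"
```

Running the same filter against a real output file can be a quick way to see which type values are available to select on.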
Find the element with a type of Address, and print the value of that element’s text field.
jq '.[]
| select(.type == "Address")
| .text' spring-weather.html.json
The output is:
"Silver Spring, MD 20910"
Find all elements with a type of Title, and print the text field of each found element as a string in a JSON array.
jq '[
.[]
| select(.type == "Title")
| .text]' spring-weather.html.json
The output is:
[
"News Around NOAA",
"National Program",
"Are You Weather-Ready for the Spring?",
"Weather.gov >",
"News Around NOAA > Are You Weather-Ready for the Spring?",
"US Dept of Commerce",
"National Oceanic and Atmospheric Administration",
"National Weather Service",
"News Around NOAA",
"1325 East West Highway",
"Comments? Questions? Please Contact Us.",
"Disclaimer",
"Information Quality",
"Help",
"Glossary",
"Privacy Policy",
"Freedom of Information Act (FOIA)",
"About Us",
"Career Opportunities"
]
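Two variations on this filter may be useful. The sketch below runs them against a small hypothetical sample rather than the real file: the first uses map, which jq defines as shorthand for the bracketed form, and the second drops the array and uses the -r (--raw-output) option so that each string is printed on its own line without JSON quotes.

```shell
# Hypothetical sample data; the real file contains many more elements.
sample='[
  {"type": "Title", "text": "Help"},
  {"type": "NarrativeText", "text": "Some body text."},
  {"type": "Title", "text": "Glossary"}
]'

# map(f) is shorthand for [.[] | f], so this builds the same array as the bracketed form.
mapped=$(printf '%s' "$sample" | jq -c 'map(select(.type == "Title") | .text)')
echo "$mapped"

# Without the surrounding brackets, -r prints each string on its own line, unquoted.
lines=$(printf '%s' "$sample" | jq -r '.[] | select(.type == "Title") | .text')
echo "$lines"
```

The raw form is convenient when the results are piped into line-oriented tools such as grep or sort.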
Find all elements with a type of Title. Of these, find the ones that have a text field that contains the phrase Contact Us, and print the contents of each found element’s metadata.link_urls field.
jq '.[]
| select(.type == "Title")
| select(.text
| contains("Contact Us"))
| .metadata.link_urls' spring-weather.html.json
The output is:
[
"https://www.weather.gov/news/contact"
]
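When applied to strings, contains performs a case-sensitive substring check. The following sketch, which uses two hypothetical elements, compares it with test and the "i" flag, which matches a regular expression without regard to case.

```shell
# Hypothetical sample: the same phrase in two different capitalizations.
sample='[
  {"type": "Title", "text": "Please Contact Us."},
  {"type": "Title", "text": "please contact us"}
]'

# contains is case-sensitive, so only the first element matches.
exact=$(printf '%s' "$sample" | jq -c '[.[] | select(.text | contains("Contact Us")) | .text]')
echo "$exact"

# test with the "i" flag ignores case, so both elements match.
loose=$(printf '%s' "$sample" | jq -c '[.[] | select(.text | test("contact us"; "i")) | .text]')
echo "$loose"
```

Because test treats its first argument as a regular expression, any regex metacharacters in the phrase would need escaping, whereas contains compares the text literally.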
Find all elements with a type of ListItem. Of these, find the ones that have a text field that contains the phrase Weather Safety. The command uses test with the "i" flag, so the phrase is matched without regard to case.
For each item in metadata.link_texts, print the item’s value as the key, followed by the matching item in metadata.link_urls as the value. Trim any leading and trailing whitespace from all values. Wrap the output in a JSON array.
jq '[
.[]
| select(.type == "ListItem")
| select(.text | test("Weather Safety"; "i"))
| [.metadata.link_texts, .metadata.link_urls]
| transpose[]
| {
(.[0] | gsub("^\\s+|\\s+$"; "")) : (.[1] | gsub("^\\s+|\\s+$"; ""))
}
]' spring-weather.html.json
The output is:
[
{
"Weather Safety": "http://www.weather.gov/safetycampaign"
},
{
"Air Quality": "https://www.weather.gov/safety/airquality"
},
{
"Beach Hazards": "https://www.weather.gov/safety/beachhazards"
},
{
"Cold": "https://www.weather.gov/safety/cold"
},
{
"Cold Water": "https://www.weather.gov/safety/coldwater"
},
{
"Drought": "https://www.weather.gov/safety/drought"
},
{
"Floods": "https://www.weather.gov/safety/flood"
},
{
"Fog": "https://www.weather.gov/safety/fog"
},
{
"Heat": "https://www.weather.gov/safety/heat"
},
{
"Hurricanes": "https://www.weather.gov/safety/hurricane"
},
{
"Lightning Safety": "https://www.weather.gov/safety/lightning"
},
{
"Rip Currents": "https://www.weather.gov/safety/ripcurrent"
},
{
"Safe Boating": "https://www.weather.gov/safety/safeboating"
},
{
"Space Weather": "https://www.weather.gov/safety/space"
},
{
"Sun (Ultraviolet Radiation)": "https://www.weather.gov/safety/heat-uv"
},
{
"Thunderstorms & Tornadoes": "https://www.weather.gov/safety/thunderstorm"
},
{
"Tornado": "https://www.weather.gov/safety/tornado"
},
{
"Tsunami": "https://www.weather.gov/safety/tsunami"
},
{
"Wildfire": "https://www.weather.gov/safety/wildfire"
},
{
"Wind": "https://www.weather.gov/safety/wind"
},
{
"Winter": "https://www.weather.gov/safety/winter"
}
]
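The pairing step in this command relies on transpose, which turns a pair of parallel arrays into an array of two-item rows. The sketch below isolates that step on a single hypothetical element (made-up link texts and URLs) and then builds one object per row, trimming whitespace from the key only to keep the example short.

```shell
# Hypothetical element with parallel link_texts and link_urls arrays.
element='{"metadata": {
  "link_texts": [" First link ", "Second link"],
  "link_urls": ["https://example.com/1", "https://example.com/2"]
}}'

# Step 1: put the two arrays side by side, then transpose them into [text, url] rows.
rows=$(printf '%s' "$element" | jq -c '[.metadata.link_texts, .metadata.link_urls] | transpose')
echo "$rows"

# Step 2: emit one object per row, using the trimmed text as the key and the URL as the value.
objects=$(printf '%s' "$element" | jq -c '[
  [.metadata.link_texts, .metadata.link_urls]
  | transpose[]
  | {(.[0] | gsub("^\\s+|\\s+$"; "")): .[1]}
]')
echo "$objects"
```

In an object constructor, a key that is computed from an expression must be wrapped in parentheses, which is why the trimmed key appears inside parentheses.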
Additional resources