{"id":49,"date":"2020-04-27T14:47:17","date_gmt":"2020-04-27T13:47:17","guid":{"rendered":"https:\/\/blog.vaglid.net\/?p=49"},"modified":"2020-11-22T00:36:16","modified_gmt":"2020-11-21T23:36:16","slug":"why-you-have-to-start-loving-json-and-stop-using-xml","status":"publish","type":"post","link":"https:\/\/blog.vaglid.net\/index.php\/2020\/04\/27\/why-you-have-to-start-loving-json-and-stop-using-xml\/","title":{"rendered":"Why you have to start loving JSON and stop using XML"},"content":{"rendered":"\r\n<p>I recently had to onboard XML data into Splunk. To say the least, this is not something that works straight out of the box.<\/p>\r\n\r\n\r\n\r\n<p>This time around I decided to try to work smarter, so I started looking into tools to convert the XML data into something acceptable for Splunk &#8211; enter JSON.<\/p>\r\n\r\n\r\n\r\n<p>The reason for going with JSON is that Splunk is able to ingest proper JSON with little configuration. Every event carries its own field names, so no field extraction magic is required either.<\/p>\r\n\r\n\r\n\r\n<p>I scoured the net, and ended up using the now deprecated XML2JSON from <a href=\"https:\/\/github.com\/hay\/xml2json\">GitHub.<\/a><br>This Python script simply takes XML as input and spits out JSON, no questions asked.<br>For this project I specified <code>--strip_namespace --strip_newlines --pretty --strip_text<\/code>.<br><code>xml2json.py -t xml2json -o ${OUTPUTFILE}.json ${INPUTFILE}.xml<\/code><\/p>\r\n\r\n\r\n\r\n<p>The original XML contained several levels, so I had to use the marvellous tool <a href=\"https:\/\/stedolan.github.io\/jq\/\"><code>jq<\/code> <\/a>to select only the data I was interested in, some levels down in the tree.<br><code>jq .message.body.bodyContent.meeting ${INPUTFILE}.json &gt; ${OUTPUTFILE}.json<\/code><\/p>\r\n\r\n\r\n\r\n<p>On the forwarder (heavy in my case), I had to specify a scripted input to actually fetch the data from the API, and also a monitor input to read the 
resulting JSON:<\/p>\r\n\r\n\r\n\r\n<p>inputs.conf<br><code>[script:\/\/\/path\/to\/script\/script.sh]<br>index=someindex <br>sourcetype=vendor:product:script<br>disabled=false<br>start_by_shell=false<br><br> [monitor:\/\/\/path\/to\/resulting\/logs\/*.json]<br>index=someindex<br>sourcetype=vendor:product:json<br>disabled=false<br>ignoreOlderThan = 14d<\/code><\/p>\r\n\r\n\r\n\r\n<p>props.conf on the forwarder:<br><code>[vendor:product:json]<br>TRUNCATE = 0<br>CHARSET = UTF-8<br>KV_MODE=none<br>INDEXED_EXTRACTIONS=JSON<br>SHOULD_LINEMERGE=false<br>DATETIME_CONFIG=CURRENT<\/code><br><br>DATETIME_CONFIG=CURRENT is used because these events did not contain any timestamps.<\/p>\r\n\r\n\r\n\r\n<p>props.conf on the search heads:<br><code>[vendor:product:json]<br>KV_MODE=none<\/code><br>KV_MODE=none tells the search heads that they do not need to extract the fields, as this is already done at index time.<\/p>\r\n\r\n\r\n\r\n<p>Please note that this configuration results in Splunk performing the extractions at index time and not at search time. <br>This could result in better search performance if you search using the <code>key::value<\/code> syntax, or use <code>tstats<\/code> to search indexed data.<br><br>This WILL also consume more storage on the indexers.<br>Final note &#8211; indexed extractions are usually not recommended except in specific cases.<\/p>\r\n","protected":false},"excerpt":{"rendered":"<p>I recently had to onboard XML data into Splunk. To say the least, this is not something that is done straight out of the box. 
This time around I [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"_price":"","_stock":"","_tribe_ticket_header":"","_tribe_default_ticket_provider":"","_ticket_start_date":"","_ticket_end_date":"","_tribe_ticket_show_description":"","_tribe_ticket_show_not_going":false,"_tribe_ticket_use_global_stock":"","_tribe_ticket_global_stock_level":"","_global_stock_mode":"","_global_stock_cap":"","_tribe_rsvp_for_event":"","_tribe_ticket_going_count":"","_tribe_ticket_not_going_count":"","_tribe_tickets_list":"[]","_tribe_ticket_has_attendee_info_fields":false,"_EventAllDay":false,"_EventTimezone":"","_EventStartDate":"","_EventEndDate":"","_EventStartDateUTC":"","_EventEndDateUTC":"","_EventShowMap":false,"_EventShowMapLink":false,"_EventURL":"","_EventCost":"","_EventCostDescription":"","_EventCurrencySymbol":"","_EventCurrencyCode":"","_EventCurrencyPosition":"","_EventDateTimeSeparator":"","_EventTimeRangeSeparator":"","_EventOrganizerID":[],"_EventVenueID":[],"_OrganizerEmail":"","_OrganizerPhone":"","_OrganizerWebsite":"","_VenueAddress":"","_VenueCity":"","_VenueCountry":"","_VenueProvince":"","_VenueState":"","_VenueZip":"","_VenuePhone":"","_VenueURL":"","_VenueStateProvince":"","_VenueLat":"","_VenueLng":"","_VenueShowMap":false,"_VenueShowMapLink":false,"footnotes":""},"categories":[1],"tags":[13],"class_list":["post-49","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-splunk"],"_links":{"self":[{"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/posts\/49","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable"
:true,"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/comments?post=49"}],"version-history":[{"count":5,"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/posts\/49\/revisions"}],"predecessor-version":[{"id":59,"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/posts\/49\/revisions\/59"}],"wp:attachment":[{"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/media?parent=49"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/categories?post=49"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.vaglid.net\/index.php\/wp-json\/wp\/v2\/tags?post=49"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}