As a critical element of nearly all modern applications, JSON has also become the new data interchange format for our own pipelines. Database providers have heeded the call to integrate JSON functionality into their offerings. For instance, Redshift, Amazon’s Data Warehouse, is quite fast in spite of its bare-bones JSON functionality (though in its documentation, Amazon recommends “…using JSON sparingly”). We typically store raw JSON in staging tables as data progresses through our pipeline, from applications to warehouses.
But data gurus need to ask themselves whether they should use the JSON access provided in the DB or pivot the data out of the JSON document and into fields. A data professional can be helpful in evaluating the usage. For example, if the usage is such that you are extracting an extra column for a record at the end of a drill down, modern databases like Redshift are a perfect solution as they can accomplish this task very quickly.
Where we encounter a performance trade-off is in reporting on all of our customers and displaying a JSON element in the structured report. A data professional would need to evaluate the frequency of this data usage to determine if the overhead of extracting the JSON outweighs the cost of pivoting the data.
A simple data evaluation will reveal that performance lags and trajectory widens as more data is poured in. As a simple test, I accessed a cluster of three nodes in AWS Redshift which were comprised of dc2.large. The data held over 90k customers with a unique customer ID, company affiliation, and a JSON field. The JSON document in the record is a single key deep and contained over 1.3 million possible data fields. Before starting, I pivoted all the keys from the customer table to a separate table keyed by the customer ID. I then picked three key names from the JSON documents to report on—and didn’t report on keys that contained empty values. This resulted in a report of just under 160 thousand rows. For the tests, I disabled the results cache in Redshift. The straight join method took 12.5 seconds. The select json_extract_path_tex() method finished in 15.35 seconds. Tables were not give a sortkey and used default diststyle of even.
While JSON parsing is fast when building a large list of results and every second counts, kaving the data stored in a table you can join to will always provide faster results.
Necessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.
|cookielawinfo-checbox-analytics||11 months||This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".|
|cookielawinfo-checbox-functional||11 months||The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".|
|cookielawinfo-checbox-others||11 months||This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.|
|cookielawinfo-checkbox-necessary||11 months||This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".|
|cookielawinfo-checkbox-performance||11 months||This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".|
Functional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.
Performance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.
Analytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.
Advertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.
Other uncategorized cookies are those that are being analyzed and have not been classified into a category as yet.