Ad Hoc Queries with Big Data or Small Data?
Do you think you’re working with “Big Data”, or is it “Small Data”? If you’re asking ad hoc questions of your data, you’ll probably need something that supports “query-response” performance, in other words “near real-time”. We’re not talking about batch analytics, but about more interactive, iterative analytics. Think NoSQL, or “near real-time Hadoop” with technologies like Impala. Here’s my view of Big versus Small, with ad hoc analytics in either case.
|Ad Hoc Analytics||Small Data||Big Data|
|Data Volume||Megabytes – Gigabytes||Terabytes (1-100TB)|
|Data Velocity||Update in near real-time (seconds)||Update in real-time (milliseconds)|
|Data Variety||1-6 structured data sources||6+ structured AND 6+ unstructured data sources|
|Data Models||Aggregations with tens of tables||Aggregations with hundreds to thousands of tables|
|Business Functions||One line of business (e.g. sales)||Several lines of business, up to a full 360-degree view|
|Business Intelligence||Queries are simple and concern basic transactional summaries and reports. Response times are in seconds across a handful of business analysts. Example: retrieve a customer’s profile and summarize their overall standing based on current market values for all assets. This is representative of the work performed when a business asks the question “What is my customer worth today?” The transaction is read-only. Questions vary based on what the business analyst needs to know interactively.||Queries can be as complex as with batch analytics, but generally are still read-only and processed against aggregates. Queries span business functions. Response times are in seconds across large numbers of business analysts. Example: retrieve a customer profile and summarize activities across all customer touch points, calculating “Life-Time-Value” based on past and current activities. This is representative of the work performed when a business asks the question “Who are my most profitable customers?” Questions vary based on what the business analyst needs to know interactively.|
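To make the Small Data example a bit more concrete, here is a minimal sketch of the “What is my customer worth today?” question as a read-only, ad hoc query. It uses Python’s built-in sqlite3 module with an in-memory database purely for illustration. The table names (customers, holdings, market_prices), the sample rows, and the function names are all hypothetical, not taken from any particular product.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE holdings (customer_id INTEGER, asset TEXT, quantity REAL);
        CREATE TABLE market_prices (asset TEXT PRIMARY KEY, price REAL);

        INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
        INSERT INTO holdings VALUES (1, 'ACME', 10), (1, 'GLOBEX', 5), (2, 'ACME', 2);
        INSERT INTO market_prices VALUES ('ACME', 20.0), ('GLOBEX', 100.0);
        """
    )
    return conn


def customer_worth(conn, customer_id):
    """Read-only query: value all of a customer's assets at current prices.

    Returns a (name, worth) tuple, or None if the customer does not exist.
    """
    return conn.execute(
        """
        SELECT c.name, COALESCE(SUM(h.quantity * p.price), 0.0)
        FROM customers c
        LEFT JOIN holdings h ON h.customer_id = c.id
        LEFT JOIN market_prices p ON p.asset = h.asset
        WHERE c.id = ?
        GROUP BY c.id, c.name
        """,
        (customer_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_db()
    print(customer_worth(conn, 1))
```

The query only reads, joins a handful of tables, and returns a single summarized row, which is the shape of work described in the Small Data column. In the Big Data case the same kind of question would typically run against pre-built aggregates spanning many more sources.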
Want my view on Batch Analytics? Look here.
Want my view on Real-time analytics? Look here.
Here are a few products in this space: