Ad Hoc Queries with Big Data or Small Data?

Do you think you’re working with “Big Data”, or is it “Small Data”? If you’re asking ad hoc questions of your data, you’ll probably need something that supports “query-response” performance, in other words “near real-time”. We’re not talking about batch analytics, but about more interactive, iterative analytics. Think NoSQL, or “near real-time Hadoop” with technologies like Impala. Here’s my view of Big versus Small, with ad hoc analytics in either case.

Ad Hoc Analytics	Small Data	Big Data
Data Volume	Megabytes – Gigabytes	Terabytes (1-100TB)
Data Velocity	Update in near real-time (seconds)	Update in real-time (milliseconds)
Data Variety	1-6 structured data sources	6+ structured AND 6+ unstructured data sources
Data Models	Aggregations with tens of tables	Aggregations with hundreds to thousands of tables
Business Functions	One line of business (e.g. sales)	Several lines of business, up to a 360-degree view
Business Intelligence	Queries are simple, covering basic transactional summaries and reports. Response times are in seconds across a handful of business analysts. Example: retrieve a customer’s profile and summarize their overall standing based on current market values for all assets. This is representative of the work performed when a business asks the question “What is my customer worth today?” The transaction is read-only. Questions vary based on what the business analyst needs to know interactively.	Queries can be as complex as with batch analytics, but are generally still read-only and processed against aggregates. Queries span business functions. Response times are in seconds across large numbers of business analysts. Example: retrieve a customer profile and summarize activities across all customer touch points, calculating “Life-Time-Value” based on past and current activities. This is representative of the work performed when a business asks the question “Who are my most profitable customers?” Questions vary based on what the business analyst needs to know interactively.
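To make the two example questions in the table concrete, here is a minimal sketch of both as read-only ad hoc queries. It uses Python's built-in sqlite3 module purely for illustration. The table names (holdings, market_prices, touchpoints), their columns, and the sample rows are all hypothetical, and summing revenue minus cost across touch points is only a crude stand-in for a real “Life-Time-Value” calculation.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE holdings (customer_id INTEGER, asset TEXT, quantity REAL);
        CREATE TABLE market_prices (asset TEXT PRIMARY KEY, price REAL);
        CREATE TABLE touchpoints (
            customer_id INTEGER, channel TEXT, revenue REAL, cost REAL
        );
        INSERT INTO holdings VALUES (1, 'ACME', 10), (1, 'BOND', 5), (2, 'ACME', 3);
        INSERT INTO market_prices VALUES ('ACME', 20.0), ('BOND', 100.0);
        INSERT INTO touchpoints VALUES
            (1, 'web', 50.0, 10.0), (1, 'store', 30.0, 5.0),
            (2, 'web', 200.0, 20.0), (2, 'call_center', 0.0, 15.0);
        """
    )
    return conn


def customer_worth(conn: sqlite3.Connection, customer_id: int) -> float:
    """Small Data question: "What is my customer worth today?"

    Values every asset the customer holds at its current market price.
    """
    row = conn.execute(
        """
        SELECT COALESCE(SUM(h.quantity * p.price), 0.0)
        FROM holdings AS h
        JOIN market_prices AS p ON p.asset = h.asset
        WHERE h.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


def most_profitable_customers(conn: sqlite3.Connection, limit: int = 10):
    """Big Data question: "Who are my most profitable customers?"

    Aggregates net value (revenue minus cost) over all touch points,
    an assumed simplification of lifetime value.
    """
    return conn.execute(
        """
        SELECT customer_id, SUM(revenue - cost) AS net_value
        FROM touchpoints
        GROUP BY customer_id
        ORDER BY net_value DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(customer_worth(conn, 1))
    print(most_profitable_customers(conn))
```

Both queries are read-only aggregations, which matches the table. What changes between the Small and Big columns is the number of sources joined, the data volume, and how many analysts expect an answer in seconds.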

Want my view on Batch Analytics? Look here.

Want my view on Real-time analytics? Look here.

Here are a few products in this space:

Jim Kaskade

Jim Kaskade is a serial entrepreneur and enterprise software executive with over 38 years of experience. He was the CEO of Conversica, a PE-backed leader in AI Automation solutions that help clients grow revenue. He successfully exited the PE-backed SaaS company Janrain, in the digital identity security space. Prior to identity, he led a digital application business of over 7,000 people ($1B). Prior to that, he led a big data and analytics business of over 1,000 people ($250M). He was the CEO of a Big Data Cloud company ($50M); was an EIR at PARC (the Bell Labs of Silicon Valley), which resulted in a spinout of the AML AI company Quantiply; led two separate private cloud software startups; founded one of the most advanced digital video SaaS companies, delivering online and wireless solutions to over 10,000 enterprises; and was involved with three semiconductor startups (two of which he founded, one of which he sold). He started his career engineering massively parallel processing datacenter applications. Jim holds an Electrical and Computer Science Engineering degree from the University of California, Santa Barbara, with an emphasis in semiconductor design and computer science, and an MBA from the University of San Diego with an emphasis in entrepreneurship and finance.
