Ad Hoc Queries with Big Data or Small Data?


Do you think you’re working with “Big Data”, or is it “Small Data”? If you’re asking ad hoc questions of your data, you’ll probably need something that supports “query-response” performance, in other words “near real-time”. We’re not talking about batch analytics, but about more interactive, iterative analytics. Think NoSQL, or “near real-time Hadoop” with technologies like Impala. Here’s my view of Big versus Small, with ad hoc analytics in either case.

| Ad Hoc Analytics | Small Data | Big Data |
| --- | --- | --- |
| Data Volume | Megabytes to Gigabytes | Terabytes (1-100TB) |
| Data Velocity | Update in near real-time (seconds) | Update in real-time (milliseconds) |
| Data Variety | 1-6 structured data sources | 6+ structured AND 6+ unstructured data sources |
| Data Models | Aggregations with tens of tables | Aggregations with up to 100s to 1000s of tables |
| Business Functions | One line of business (e.g. sales) | Several lines of business, up to a 360 view |

Business Intelligence with Small Data:

Queries are simple, covering basic transactional summaries and reports. Response times are in seconds across a handful of business analysts.

Example: retrieve a customer’s profile and summarize their overall standing based on current market values for all assets.

This is representative of the work performed when a business asks the question “What is my customer worth today?”

The transaction is read-only. Questions vary based on what the business analyst needs to know interactively.

Business Intelligence with Big Data:

Queries can be as complex as with batch analytics, but generally are still read-only and processed against aggregates. Queries span business functions. Response times are in seconds across large numbers of business analysts.

Example: retrieve a customer profile and summarize activities across all customer touch points, calculating “Life-Time-Value” based on past and current activities.

This is representative of the work performed when a business asks the question “Who are my most profitable customers?”

Questions vary based on what the business analyst needs to know interactively.
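To make the two example questions a bit more concrete, here is a minimal sketch in Python using the standard-library sqlite3 module with an in-memory database. Everything in it is an assumption for illustration: the table names (holdings, prices, touchpoints), the columns, the sample rows, and the function names are hypothetical, and "Life-Time-Value" is reduced to a plain sum of revenue across touch points. A real deployment would run similar read-only aggregates on whatever engine you use for interactive queries.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE holdings (customer_id INTEGER, asset TEXT, quantity REAL);
        CREATE TABLE prices (asset TEXT PRIMARY KEY, market_value REAL);
        CREATE TABLE touchpoints (customer_id INTEGER, channel TEXT, revenue REAL);

        INSERT INTO holdings VALUES (1, 'AAA', 10), (1, 'BBB', 5), (2, 'AAA', 2);
        INSERT INTO prices VALUES ('AAA', 20.0), ('BBB', 8.0);
        INSERT INTO touchpoints VALUES
            (1, 'web', 100.0), (1, 'store', 50.0),
            (2, 'web', 400.0), (2, 'call_center', 25.0);
        """
    )
    return conn


def customer_worth(conn, customer_id):
    """'What is my customer worth today?': holdings valued at current prices."""
    row = conn.execute(
        """
        SELECT COALESCE(SUM(h.quantity * p.market_value), 0)
        FROM holdings AS h
        JOIN prices AS p ON p.asset = h.asset
        WHERE h.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return row[0]


def most_profitable_customers(conn, limit=10):
    """'Who are my most profitable customers?': revenue summed over all touch points.

    Simplification: this stands in for a real lifetime-value model.
    """
    return conn.execute(
        """
        SELECT customer_id, SUM(revenue) AS total_revenue
        FROM touchpoints
        GROUP BY customer_id
        ORDER BY total_revenue DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print("Worth of customer 1:", customer_worth(conn, 1))
    print("Most profitable customers:", most_profitable_customers(conn))
```

Both queries are read-only and parameterized, which matches the interactive pattern described above: the analyst changes the question (a different customer, a different limit) and expects an answer in seconds rather than waiting for a batch job.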

Want my view on Batch Analytics? Look here.

Want my view on Real-time analytics? Look here.


Jim Kaskade

Jim Kaskade is a serial entrepreneur and enterprise software executive with over 35 years of experience. He recently successfully exited a PE-backed SaaS company, Janrain, in the digital identity security space. He started his career engineering massively parallel processing datacenter applications. Prior to identity, he led a digital application business of over 7,000 people ($1B). Prior to that he led a big data and analytics business of over 1,000 people ($250M). He was the CEO of a Big Data Cloud company ($50M); was an EIR at PARC (the Bell Labs of Silicon Valley), which resulted in a spinout of an AML AI company; led two separate private cloud software startups; founded one of the most advanced digital video SaaS companies, delivering online and wireless solutions to over 10,000 enterprises; and was involved with three semiconductor startups (two of which he founded, one of which he sold). Jim has an Electrical and Computer Science Engineering degree from the University of California, Santa Barbara, with an emphasis in semiconductor design and computer science, and an MBA from the University of San Diego with an emphasis in entrepreneurship and finance.
