Batch with Big Data versus Small Data

How do you know whether you are dealing with Big Data or Small Data? I’m constantly asked for my definition of “Big Data”. Well, here it is for batch analytics, the workload now addressed by technologies such as Hadoop.

Batch Analytics

Batch Analytics	Small Data	Big Data
Data Volume	Gigabytes	Terabytes – Petabytes
Data Velocity	Updated periodically with non-real-time intervals	Updated both in real-time and through bulk timed intervals
Data Variety	1-6 structured sources	6+ structured AND 6+ unstructured sources
Data Models	Store data without cleaning, transforming, or normalizing.	Store data without cleaning, transforming, or normalizing. Then apply schemas based on application needs.
Business Functions	One line of business (e.g. sales)	Several lines of business, up to a 360-degree view
Business Intelligence	Queries are complex, requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints. However, they are applied to a simpler data structure. Response times are in minutes to hours, and queries are issued by one or maybe two experts. Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, by year.	Queries are complex, requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints. Queries span business functions. Response times are in minutes to hours, and queries are issued by a small group of experts. Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, by year; then determine which customers purchased the higher-profit parts, by geography, by year; determine the profile of those high-profit customers; and find out which products purchased by high-profit customers were NOT purchased by other similar customers, in order to cross-sell / up-sell.
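
To make the Small Data example query concrete, here is a minimal sketch in Python of "profit on a given line of parts, broken out by supplier, by geography, by year". The records, field names (part_line, supplier, region, year, revenue, cost), and the function name are all hypothetical and chosen only for illustration. At Big Data scale the same grouping would more likely run as a Hadoop job over many sources than as an in-memory loop.

```python
from collections import defaultdict

# Hypothetical sales records; every field name and value here is illustrative.
sales = [
    {"part_line": "brakes", "supplier": "Acme", "region": "EMEA", "year": 2013,
     "revenue": 1200.0, "cost": 800.0},
    {"part_line": "brakes", "supplier": "Acme", "region": "EMEA", "year": 2013,
     "revenue": 500.0, "cost": 350.0},
    {"part_line": "brakes", "supplier": "Bolt", "region": "APAC", "year": 2014,
     "revenue": 900.0, "cost": 700.0},
    {"part_line": "filters", "supplier": "Acme", "region": "EMEA", "year": 2013,
     "revenue": 300.0, "cost": 100.0},
]


def profit_by_supplier_region_year(records, part_line):
    """Sum profit (revenue - cost) for one line of parts,
    grouped by (supplier, region, year)."""
    totals = defaultdict(float)
    for record in records:
        # Selectivity constraint: keep only the requested line of parts.
        if record["part_line"] != part_line:
            continue
        key = (record["supplier"], record["region"], record["year"])
        totals[key] += record["revenue"] - record["cost"]
    return dict(totals)


brake_profit = profit_by_supplier_region_year(sales, "brakes")
for (supplier, region, year), profit in sorted(brake_profit.items()):
    print(supplier, region, year, profit)
```

The Big Data version of the example would chain further steps onto this result, such as joining the high-profit parts back to customer purchases, which is where queries start to span business functions.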

Want to see my view on Ad Hoc and Interactive Analytics? See “Ad Hoc Queries with Big Data or Small Data?”

Want to see my view on Real-Time Analytics? See “Real-time Big Data or Small Data?”


Jim Kaskade

Jim Kaskade is a serial entrepreneur and enterprise software executive with over 38 years of experience. He was the CEO of Conversica, a PE-backed leader in AI Automation solutions that help clients grow revenue. He successfully exited the PE-backed SaaS company Janrain, in the digital identity security space. Prior to identity, he led a digital application business of over 7,000 people ($1B). Prior to that, he led a big data and analytics business of over 1,000 people ($250M). He was the CEO of a Big Data Cloud company ($50M); was an EIR at PARC (the Bell Labs of Silicon Valley), which resulted in a spinout of the AML AI company Quantiply; led two separate private cloud software startups; founded one of the most advanced digital video SaaS companies, delivering online and wireless solutions to over 10,000 enterprises; and was involved with three semiconductor startups (two of which he founded, one of which he sold). He started his career engineering massively parallel processing datacenter applications. Jim holds an Electrical and Computer Science Engineering degree from the University of California, Santa Barbara, with an emphasis in semiconductor design and computer science, and an MBA from the University of San Diego with an emphasis in entrepreneurship and finance.

