Batch with Big Data versus Small Data
How do you know whether you are dealing with Big Data or Small Data? I’m constantly asked for my definition of “Big Data”. Well, here it is…for batch analytics, now addressed by technologies such as Hadoop.
|Batch Analytics||Small Data||Big Data|
|Data Volume||Gigabytes||Terabytes – Petabytes|
|Data Velocity||Updated periodically with non-real-time intervals||Updated both in real-time and through bulk timed intervals|
|Data Variety||1-6 structured sources||6+ structured AND 6+ unstructured sources|
|Data Models||Store data without cleaning, transforming, or normalizing.||Store data without cleaning, transforming, and normalizing. Then apply schemas based on application needs.|
|Business Functions||One line of business (e.g. sales)||Several lines of business – to – 360 view|
|Business Intelligence||Queries are complex requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints. However, they are applied to a simpler data structure.Response times are in minutes to hours, issued by one or maybe two experts.Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, by year.|
|Queries are complex requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints. Queries span across business functions.Response times are in minutes to hours, issued by a small group of experts. |
Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, by year; and then determining which customers purchased the higher profit parts, by geography, by year; determining the profile of those high-profit customers; finding out what products purchased by high-profit customers were NOT purchased by other similar customers in order to cross-sell / up-sell.
Want to see my view on Ad Hoc and Interactive Analytics? Go here.
Want to see my view on Real-Time Analytics? Go here.
Here are a few other products in this space: