Real-time Big Data or Small Data?
Have you heard of products like IBM’s InfoSphere Streams, Tibco’s Event Processing product, or Oracle’s CEP product? These are all good examples of commercially available stream processing technologies that help you process events in real time.
I’ve been asked what I consider “Big Data” versus “Small Data” in this domain. Here’s my view.
Real-Time Analytics | Small Data | Big Data |
Data Volume | None | None |
Data Velocity | 100K events / day (<<1K events / second) | Billion+ events / day (>>1K events / second) |
Data Variety | 1-6 structured sources AND a single destination (an output file, a SQL database, or a BI tool) | 6+ structured and 6+ unstructured sources AND many destinations (a custom application, a BI tool, several SQL databases, NoSQL databases, Hadoop) |
Data Models | Used for “transport” mainly. Little to no ETL, in-stream analytics, or complex event processing performed. | Transport is the foundation. However, distributed ETL, linearly scalable in-memory and in-stream analytics are applied, and complex event processing is the norm. |
Business Functions | One line of business (e.g. financial trading) | Several lines of business, up to a 360-degree view |
Business Intelligence | No queries are performed against the data in motion. This is simply a mechanism for transporting transactions or events from the source to a database. Transport times are < 1 second. Example: connect to desktop trading applications and transport trade events to an Oracle database. | ETL, sophisticated algorithms, complex business logic, and even queries can be applied to the stream of events while they are in motion. Analytics span all data sources and, thus, all business functions. Transport and analytics occur in < 1 second. Example: connect to desktop trading applications, market data feeds, and social media, and provide instantaneous trending reports. Allow traders to subscribe to information pertinent to their trades and have analytics applied in real time for personalized reporting. |
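To make two rows of the table a little more concrete, here is a minimal Python sketch. It is only an illustration, not how any of the products named in this post work. The first function turns the daily volumes from the Data Velocity row into average events per second. The `SlidingWindowCounter` class is a hypothetical name of my own: it shows one simple way a "trending" query could run against events while they are in motion, under the assumption that events arrive in timestamp order and that a single process can keep the window in memory.

```python
from collections import Counter, deque

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_second(events_per_day):
    """Average event rate implied by a daily event count."""
    return events_per_day / SECONDS_PER_DAY


class SlidingWindowCounter:
    """Counts events per key over the most recent `window_seconds`.

    Hypothetical helper for illustration only. Assumes timestamps
    (in seconds) arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()    # (timestamp, key) pairs, oldest first
        self.counts = Counter()  # key -> number of events still in the window

    def add(self, timestamp, key):
        """Record one event, then evict anything older than the window."""
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._expire(timestamp)

    def _expire(self, now):
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n):
        """Return the n most frequent keys currently in the window."""
        return self.counts.most_common(n)


if __name__ == "__main__":
    # Data Velocity row: small data vs. big data, as average rates.
    print(f"{events_per_second(100_000):.2f} events / second")
    print(f"{events_per_second(1_000_000_000):.0f} events / second")

    # Data in motion: which symbol is trending over the last 60 seconds?
    window = SlidingWindowCounter(window_seconds=60)
    trades = [(0, "IBM"), (10, "ORCL"), (20, "IBM"), (30, "IBM"), (70, "ORCL")]
    for ts, symbol in trades:
        window.add(ts, symbol)
    print(window.top(1))
```

The trade tuples and ticker symbols are made-up sample data. A real deployment at the "Big Data" end of the table would also have to partition this state across machines and handle out-of-order events, which this sketch deliberately ignores.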
Want to see my view of Batch Analytics? Go Here.
Want to see my view of Ad Hoc Analytics? Go Here.
Here are a few other products in this space:
- Infochimps Cloud::Streams
- S4
- AccelOps
- Storm
- HStreaming
- Streambase
- SQLStream
- OpenCQ
- NiagaraCQ
- TelegraphCQ
- Rapide
- Gemfire
- DistCEP
- CEDR
- Cayuga
- Raced
- Sase+
- Amit
- TESLA/T-Rex
- Progress Apama
- Aleri/Coral8
Missing from the CEP list above are some important players, in no particular order:
Drools, Esper, WSO2, Tibco