Real-time Big Data or Small Data?


Have you heard of products like IBM’s InfoSphere Streams, Tibco’s Event Processing product, or Oracle’s CEP product? These are all good examples of commercially available stream processing technologies that help you process events in real time.

I’ve been asked what I consider “Big Data” versus “Small Data” in this domain. Here’s my view.

| Real-Time Analytics | Small Data | Big Data |
| --- | --- | --- |
| Data Volume | None | None |
| Data Velocity | 100K events / day (<<1K events / second) | Billion+ events / day (>>1K events / second) |
| Data Variety | 1-6 structured sources AND 1 single destination (an output file, a SQL database, a BI tool) | 6+ structured and 6+ unstructured sources AND many destinations (a custom application, a BI tool, several SQL databases, NoSQL databases, Hadoop) |
| Data Models | Used for “transport” mainly. Little to no ETL, in-stream analytics, or complex event processing performed. | Transport is the foundation. However, distributed ETL, linearly scalable in-memory and in-stream analytics are applied, and complex event processing is the norm. |
| Business Functions | One line of business (e.g. financial trading) | Several lines of business – to – 360 view |
| Business Intelligence | No queries are performed against the data in motion. This is simply a mechanism for transporting transaction or event from the source to a database. Transport times are <1 second. Example: connect to desktop trading applications and transport trade events to an Oracle database. | ETL, sophisticated algorithms, complex business logic, and even queries can be applied to the stream of events as they are in motion. Analytics span across all data sources and, thus, all business functions. Transport and analytics occur in < 1 second. Example: connect to desktop trading applications, market data feeds, social media, and provide instantaneous trending reports. Allow traders to subscribe to information pertinent to their trades and have analytics applied in real-time for personalized reporting. |
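
To make the Data Velocity row concrete, here is a minimal sketch of the arithmetic behind the per-day and per-second figures. It assumes events arrive evenly across a 24-hour day, which real feeds rarely do, so peak rates will be higher than these averages. The helper names and the hard 1K events / second cutoff are illustrative assumptions; the table itself says "much less than" and "much greater than", leaving a gray zone in between.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def events_per_second(events_per_day: float) -> float:
    """Average rate, assuming events are spread evenly over the day."""
    return events_per_day / SECONDS_PER_DAY


def velocity_tier(events_per_day: float, threshold_per_second: float = 1_000.0) -> str:
    """Hypothetical helper: label a feed using a 1K events/second dividing line."""
    rate = events_per_second(events_per_day)
    return "big data" if rate > threshold_per_second else "small data"


small_rate = events_per_second(100_000)        # roughly 1.16 events/second
big_rate = events_per_second(1_000_000_000)    # roughly 11,574 events/second

print(f"100K events/day is about {small_rate:.2f} events/second: {velocity_tier(100_000)}")
print(f"1B events/day is about {big_rate:,.0f} events/second: {velocity_tier(1_000_000_000)}")
```

On average, 100K events per day is barely more than one event per second, while a billion events per day is more than ten thousand per second, which is why the two columns sit so far on either side of the 1K mark.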

Want to see my view of Batch Analytics? Go Here.

Want to see my view of Ad Hoc Analytics? Go Here.

Here are a few other products in this space:

 

Jim Kaskade

Jim Kaskade is a serial entrepreneur and enterprise software executive with over 35 years of experience. He recently successfully exited a PE-backed SaaS company, Janrain, in the digital identity security space. He started his career engineering massively parallel processing datacenter applications. Prior to identity, he led a digital application business of over 7,000 people ($1B). Prior to that he led a big data and analytics business of over 1,000 people ($250M). He was the CEO of a Big Data Cloud company ($50M); was an EIR at PARC (the Bell Labs of Silicon Valley), which resulted in a spinout of an AML AI company; led two separate private cloud software startups; founded one of the most advanced digital video SaaS companies, delivering online and wireless solutions to over 10,000 enterprises; and was involved with three semiconductor startups (two of which he founded, one of which he sold). Jim has an Electrical and Computer Science Engineering degree from the University of California, Santa Barbara, with an emphasis in semiconductor design and computer science, and an MBA from the University of San Diego with an emphasis in entrepreneurship and finance.
