
IoT is the new Big Data?


My favorite writer, Gil Press, sums it up in “It’s Official: The Internet Of Things Takes Over Big Data As The Most Hyped Technology,” where he discusses how Gartner released its latest Hype Cycle for Emerging Technologies, with big data moving down into the “trough of disillusionment,” replaced at the top of the hype cycle by the Internet of Things.

The term Internet of Things was coined by the British technologist Kevin Ashton in 1999, to describe a system where the Internet is connected to the physical world via ubiquitous sensors.

Today, the huge amounts of data we are producing and the advances in mobile technologies are bringing the idea of Internet connected devices into our homes and daily lives.

Definition of IoT Has Expanded

The Internet of Things was already a popular concept as far back as a 2004 Scientific American article, which described RFID and sensor technology enabling computers to observe, identify and understand the world, without the limitations of human-entered data.

However, I think people took it beyond the capture of “physical” events/data. I think Kevin Ashton envisioned a network of things that originally was wholly dependent on human beings for information, and then expanded to involve anything that touched a person (physical or not), passing from machine to machine.

Capturing the behavior of people will require a much broader collection of data beyond just sensor technology…beyond the “physical” – whether that is web server clickstream data, e-commerce transaction data, customer service call logs, search logs, video surveillance,  documents, etc. There is much more than “physical” or sensor-only data that involves the customer.

To truly begin understanding the behavior of people, you need to capture data from any touch point, gaining a holistic view of that person. Gaining a 360-degree view of your customers, or a 360-degree view of your business, means leveraging an environment of structured and unstructured data that can be analyzed….M2M (Machine to Machine) and/or IoT (Internet of Things) involving physical devices becomes a subset of the data sources available to such a project.

Is IoT a Subset of Big Data, or Vice Versa?

I was talking to David Parker, the head of Big Data & Analytics at SAP (a peer to CSC’s Big Data & Analytics), about IoT vs. Big Data. Their management has established a new IoT business unit, which, I guaranteed David, would end up addressing the same business use cases as his Big Data team at the end of the day.


Last year Mukul Krishna, from Frost & Sullivan, presented a simple incremental view of how IoT feeds Big Data which then feeds a broader analytic platform. Think of IoT as a bunch of customized data sources (typically machines and sensors) leveraging customized collectors that feed a comprehensive platform (e.g. Hadoop vendors like Cloudera and Hortonworks) which, in turn, allow us to feed downstream analytic, BI, and visualization platforms.

Are Sensors the Core of IoT?

A sensor is technically any device that converts one form of energy into another, usually an electrical form, mainly for measurement, control or monitoring purposes.

Take a typical temperature sensor, like a gas-pressure tube sensor, which expands or contracts to convert temperature into a mechanical motion that can be displayed, recorded or used for control as required. Translation….I just described the thermostat in a refrigerator.

The raw electrical signal from a physical sensor is usually analog, and can be conveniently processed further and displayed on a meter or other suitable indication device, recorded on paper or media such as magnetic tape, or captured by more advanced digital systems as required.

Sensors are typically classified by application, and there are many different types, each with its own inherent advantages and disadvantages for a particular use. Putting it simply, the sensor generates an output that can be conveniently displayed, recorded, or used to control or monitor the application at the point where the sensor is installed.

What’s so special about sensors? You can translate the analog physical world into a digital computer world. We convert the sensor’s analog signals into digital signals so that a computer can read them, and then feed those, along with other digital signals, into a Big Data platform.
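To make the conversion concrete, here is a minimal Python sketch of that analog-to-digital step; the voltage range, bit depth, and record format are invented for illustration, not taken from any particular device or platform.

```python
# Minimal sketch: quantize an analog sensor voltage into a digital value,
# then wrap it in a record a Big Data platform could ingest.
# Range, bit depth, and field names are illustrative assumptions.
import json

def quantize(voltage, v_min=0.0, v_max=5.0, bits=10):
    """Map an analog voltage onto a bits-wide integer, as an ADC would."""
    levels = (1 << bits) - 1                    # 1023 steps for 10 bits
    clamped = max(v_min, min(v_max, voltage))
    return round((clamped - v_min) / (v_max - v_min) * levels)

def to_record(sensor_id, voltage):
    """Serialize the digital value as JSON for downstream ingestion."""
    return json.dumps({"sensor": sensor_id, "raw": quantize(voltage)})

print(to_record("thermostat-01", 2.5))  # mid-scale reading -> raw value 512
```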

“Technologies that operate upon the physical world are a key component of the digital business opportunity,” as Gartner describes it. Many of these “physical sensor technologies” may be new to IT, but they are expected to be high impact and even transformational.

I think IoT requires a lot of talent around the many types of physical sensors and how their signals are ultimately converted into a form that the emerging Big Data platforms can consume and analyze.

IoT needs a Big Data Platform

Getting your plants or your fridge to talk to you through sensors is one thing. Getting your plants to talk to your heating system is quite another. As we map the spread of IoT, things get more complicated, and barriers appear – with the lack of a centralized big data analytics platform likely to halt progress.

Jeff Hagins, Founder and CTO of SmartThings, described the data platform he has been working on, which should help expand IoT and help product designers work out new ways of connecting machines and people.

He believes that the Internet of Things has to be built on a platform that is easy, intelligent and open. I argue that the evolving Big Data platforms introduced by the web-scale giants like Google, Yahoo!, LinkedIn, Twitter, Facebook and the like will become a standard for IoT-based applications….and IoT is just that, a set of specialized sensor connectors or sources coupled with a Big Data platform that enables a new generation of applications.

The blurring of the physical and virtual worlds is the strongest concept in this point. Physical assets become digitized and become equal factors in understanding and managing the business value chain, alongside already-digital entities such as enterprise data warehouses, emerging big data systems and next-generation applications.

What do you think?


Posted in Data.


Avoid the Spiral


I ventured out of the “big company” environment back in 1998. Fifteen years later, I found myself back in the big company environment – a $13B-revenue company – after my startup was acquired.

As part of an executive team of 18 at a publicly-traded company, you might consider the environment a lot different from any of the eight startups I was involved with before. However, the reality is that it is not.

A good company environment is made up of the same factors, regardless of company size.

Even though I find it challenging these days to make time to reflect on the changes I need to make to improve my own leadership….I know that if I don’t, I will fall into a trap…a spiral.

Creating a “good” company environment, or in my case, a good business unit environment, may not be that important when things are going well.

When things are going well, the staff is excited to be working at the company because:

  • Their career path is wide open with lots of interesting jobs that naturally open up.
  • You, your peers, your family, and even your friends all think you are lucky for choosing and being a part of such a success.
  • Your resume is getting stronger by working at a company during its boom period.
  • It’s most likely lucrative, with variable compensation plans paying off, bonuses being given, and equity growing in value.

However, when things are challenging…and your business is struggling…all those reasons become reasons to leave.

The only thing that keeps an employee at a company when things are challenging is that people actually like their job.

Having worked and led staff through the toughest times of company life and death – including things like working for free, working long days and nights, working on weekends – I know what can be asked of a team when times are tough. But no team is going to respond to your requests for sacrifice for long if they are working in a bad company environment.

In bad company environments, good employees disappear. In highly competitive and quick to change technology companies, disappearing talent starts the spiral.

When your company’s most important assets leave (your top performers), the company struggles to hit its numbers; it tries to backfill its core talent but can’t recruit it fast enough; it misses its milestones; declines in value; loses more of its key employees.

Spirals are extremely difficult to reverse.

So, yes, creating a “good” company environment isn’t that important when things are going well….but it sure as hell IS important when things go wrong.

…and things always go wrong.

I personally come to work every day because of the people first…..then the adrenalin fix I get from the business sector I’m in….and, finally, the technologies and products we can produce to disrupt the market….in that order.

Staying away from the Spiral

In great organizations, people focus on their work and they have almost a tribal confidence that they can get their job done. Good things happen for both the company and them personally.

You come to work each day knowing that your work can make a difference for the organization as well as yourself….motivating you and fulfilling your needs to support the sacrifice – the long hours, the missed kid birthday parties, and the canceled date nights with your spouse.

In poor organizations, people spend much of their time fighting organizational boundaries and broken processes. They are not clear on what their jobs are, so there is no way to know if they are getting the job done properly or not.

In some cases, because of pure will, your star performers will work ridiculous hours and deliver on their promises…but they will have little idea what it actually means for the company or for their careers.

To make it worse, when your star performers voice how screwed up the company situation is, management still denies the problem, defends the status quo, and, frankly, ignores the fact that they are dealing with people…not just quarterly goals, revenue targets, and operating income…again, only something you see in a poor company environment.

So how does one create a “good company environment”? For me, it’s as simple as breathing air….it comes down to “telling it like it is” with: 1) transparency, and 2) strong communication….and I don’t mean detailing quarterly results in internal WebEx all-hands.

I like to personally “go out on a limb” by exposing the truths….by being personally vulnerable….ultimately, leading the team in a way that establishes a level of trust. I do this with a healthy dose of transparency and communication.

And in order to get the level of trust I need, I can’t overemphasize how much transparency and communication you have to provide….as a leader, you’ll feel uncomfortable when you are approaching the level that truly earns trust.

I have many techniques that I use to empower not only my senior team but my entire organization with the proper communication patterns found in any good company environment….and needed to weather personal and professional storms.

Here are a few examples as I reflect over the past 15 years:

  • 1998: 50% marketshare erosion in one year due to changes in the customer ecosystem
  • 1999: A dysfunctional leadership team that can’t run the business
  • 2000: Dot com bust leading to a 2x increase in sales cycles
  • 2001: 9/11 requiring a 50% staff reduction
  • 2002: Your lead customer cancels their largest product line, on which you have bet the whole company
  • 2003: You enter into a patent war with your largest customer
  • 2004: Your leading acquisition transaction falls through with only months of capital left
  • 2005: Your co-founder and CTO loses his one- and three-year-old sons in a car accident
  • 2006: A disruptive player enters the market, fundamentally changing the landscape
  • 2007: Your services go offline, crippling your top ten customers and impacting their businesses significantly
  • 2008: An acquisition candidate does an “end-around” going after your key engineering talent
  • 2009: Your co-founder and CTO gets “cold feet” and can’t commit prior to securing your next critical round of financing
  • 2010: The board of directors pulls funding right after you hired the “A-Team” and began to ramp sales
  • 2011: Your product launch is significantly delayed (12 months) due to a fatal flaw in the technology
  • 2012: You realize that your original product that was 3 years in the making has to be thrown away
  • 2013: Investors back out with only 4 weeks of cash flow left

During this time, my father had heart surgery, my mother was diagnosed with leukemia, my wife had postpartum depression, my oldest son was diagnosed with dyslexia, I was diagnosed with a blockage in my left coronary artery*, we had to sell our perfect home in order to keep the startup funds coming…and the personal list goes on.

My question for you:

Do you have the type of company environment where you have the support needed to weather any business or personal storm?

*False positive, thank god.

Posted in Leadership.


Leadership Means Sacrificing


What do Special Forces, Army Rangers, Navy SEALs, and Marines all have in common?

Teams like these go through what is considered by some to be the toughest military training in the world.

They also encounter obstacles that develop and test their stamina, leadership and ability to work as a team like no other.

I was talking recently to a colleague of mine about some of our own leadership at work. Emotions were strong. Deep sighs punctuated every other sentence.

We’re going through a business transformation, and as with most company turn-arounds, there is a strong conflict between the “old” and the “new”. This means old vs. new target markets, old vs. new business processes,  old vs. new people…and, at the core of most issues, old vs. new culture.

This colleague is part of the “new team”, chartered to help create change.

“I struggle with some of the leadership,” he said, which reflected a general theme throughout the conversation.

This reminded me of the book, Fearless, the story of Adam Brown, a Navy SEAL, who sacrificed his life during the hunt for Osama Bin Laden.

Strange to think about the military when talking about business, since these two worlds couldn’t be further apart….or are they?

What Kind of Leadership Would You Prefer?

When Navy SEAL Adam Brown woke up on March 17, 2010, he didn’t know that he would die that night in the Hindu Kush Mountains of Afghanistan.

Who risks their lives for others so that they may survive? Heroes like Adam Brown do. You’ll find that military personnel are trained to risk their lives for others so that they may survive.

Would you want to be a part of a team with people who are willing to sacrifice themselves so that others like you may gain? Who wouldn’t?

In business, unfortunately, we give bonuses to employees who are willing to sacrifice others so that the business may gain.

I don’t know about you, but most people that I know want to work for an organization in which you have ABSOLUTE CONFIDENCE that others in the organization would sacrifice…so that YOU can gain…not them, not the business.

And guess what? The leadership and the business end up gaining in the end….because they have a workforce that doesn’t waste its time always looking over its shoulder, wondering what is going to happen next.

A Winning Culture

In my work to create high-performing teams, I look for the type of business colleagues who are more like Adam Brown…the ones who sacrifice for the good of the team, not themselves. We want people who value this. This isn’t negotiable.

I want the team to know that I will GO OUT OF MY WAY to improve their well-being….that I care more about their success than my own. It’s not bullshit. Just ask anyone who has been part of a high-performing team….and you’ll probably hear the same.

“I care more about their success than my own”

Why? Because their success is our success. It’s that simple.

A winning culture is one where you have a team of people who are interested in improving each other…sacrificing their own interests in order to help the other.

In the end, you are NEVER looking over your shoulder…you are NEVER wasting energy trying to understand the mission. You’re focused, and you execute.

That’s a winning culture…a winning team…that’s leadership.

My colleague and I regained our enthusiasm as we reflected on our similar views. His last words are still echoing in my head…

“One team, one fight. Unity is what brings the necessary efficiencies to fight effectively. Lack of unity creates unnecessary distractions from the objective at hand.”

Feb 2015 Update: See Simon Sinek’s video on “Why good leaders make you feel safe.” Not directly related, but similar use of military to emphasize a point.

Posted in Leadership.


Big Data Top Ten



What do you get when you combine Big Data technologies….like Pig and Hive? A flying pig?

No, you get a “Logical Data Warehouse”.

My general prediction is that Cloudera and Hortonworks are both aggressively moving to fulfill a vision that looks a lot like Gartner’s “Logical Data Warehouse”….namely, “the next-generation data warehouse that improves agility, enables innovation and responds more efficiently to changing business requirements.”

In 2012, Infochimps (now CSC) leveraged its early use of stream processing, NoSQLs, and Hadoop to create a design pattern which combined real-time, ad-hoc, and batch analytics. This concept of combining the best-in-breed Big Data technologies will continue to advance across the industry until the entire legacy (and proprietary) data infrastructure stack will be replaced with a new (and open) one.

As this is happening, I predict that the following 10 Big Data events will occur in 2014.

1. Consolidation of NoSQLs begins

A few projects have strong commercialization companies backing them. These are companies who have reached “critical mass”, including Datastax with Cassandra, 10gen with MongoDB, and Couchbase with CouchDB.  Leading open source projects, like these, will pull further and further away from the pack of 150+ other NoSQLs, who are either fighting for the same value propositions (with a lot less traction) or solving small niche use-cases (and markets).

2. The Hadoop Clone wars end

The industry will begin standardizing on two distributions. Everyone else will become less relevant (it’s Intel vs. AMD – let’s not forget the other x86 vendors like IBM, UMC, NEC, NexGen, National, Cyrix, IDT, Rise, and Transmeta). If you are a Hadoop vendor, you’re either the Intel or the AMD. Otherwise, you had better be acquired or get out of the business by the end of 2014.

3. Open source business model is acknowledged by Wall Street

Because the open source, scale-out, commodity approach to Big Data is fundamental to the new breed of Big Data technologies, open source now becomes a clear antithesis of the proprietary, scale-up, our-hardware-only, take-it-or-leave-it solutions. Unfortunately, the promises of international expansion, improved traction from sales force expansion, new products and alliances, will all fall on deaf ears of Wall Street analysts. Time to short the platform RDBMS and Enterprise Data Warehouse stocks.

4. Big Data and Cloud really means private cloud

Many claimed that 2013 was the “year of Big Data in the Cloud”. However, what really happened is that the Global 2000 immediately began their bare-metal projects under tight control. Now that those projects are underway, 2014 will exhibit the next phase of Big Data on virtualized platforms. Open source projects like Serengeti for vSphere; Savanna for OpenStack; Ironfan for AWS, OpenStack, and VMware combined; or venture-backed proprietary solutions like BlueData will enable virtualized Big Data private clouds.

5. 2014 starts the era of analytic applications

Enterprises become savvy to the new reference architecture of combined legacy and new generation IT data infrastructure. Now it’s time to develop a new generation of applications that take advantage of both to solve business problems. System Integrators will shift resources, hire data scientists, and guide enterprises in their development of data-driven applications. This, of course, realizes the concepts like the 360 degree view, Internet of things, and marketing to one.

6. Search-based business intelligence tools will become the norm with Big Data

Having a “Google-like” interface that allows users to explore structured and unstructured data with little formal training is where the new generation is going. Just look at Splunk for searching machine data. Imagine a marketer being able to simply “Google search” for insights on their customers.
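The “Google-like” idea boils down to keyword lookup over an inverted index rather than formal queries. A toy sketch, with invented documents (no real BI tool’s internals are implied):

```python
# Toy inverted index: keyword lookup over records with no query language,
# the essence of "Google-like" exploration. Documents are invented.
from collections import defaultdict

def build_index(docs):
    """Map each lowercase token to the set of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

docs = {
    1: "customer churned after price increase",
    2: "customer renewed contract",
}
index = build_index(docs)
print(sorted(index["customer"]))  # -> [1, 2]
```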

7. Real-time in-memory analytics, complex event processing, and ETL combine

The days of ETL in its pure form are numbered. It’s either ‘E’, then ‘L’, then ‘T’ with Hadoop, or it’s EAL (extract, apply analytics, and load) with new real-time stream-processing frameworks. Now that high-speed social data streams are the norm, so are processing frameworks that combine streaming data with micro-batch and batch data, performing complex processing on that data and feeding applications with sub-second response times.
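A minimal sketch of that EAL pattern, with invented event data and no particular stream-processing framework assumed: analytics are applied per micro-batch before the load step.

```python
# Sketch of "extract, apply analytics, load" (EAL): analytics run on each
# micro-batch of the stream before loading, rather than as a separate
# downstream transform. Names and data are illustrative.
from statistics import mean

def extract(stream, batch_size=3):
    """Yield micro-batches from an event stream."""
    batch = []
    for event in stream:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch          # final partial batch

def apply_analytics(batch):
    """Enrich the batch with an aggregate before it is loaded."""
    return {"events": batch, "avg": mean(e["value"] for e in batch)}

def load(result, sink):
    sink.append(result)      # stand-in for a write to the platform

sink = []
events = [{"value": v} for v in (10.0, 20.0, 30.0, 40.0)]
for batch in extract(events):
    load(apply_analytics(batch), sink)

print([r["avg"] for r in sink])  # -> [20.0, 40.0]
```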

8. Prescriptive analytics become more mainstream

After descriptive and predictive, comes prescriptive. Prescriptive analytics automatically synthesizes big data, multiple disciplines of mathematical sciences and computational sciences, and business rules, to make predictions and then suggests decision options to take advantage of the predictions. We will begin seeing powerful use-cases of this in 2014. Business users want to be recommended specific courses of action and to be shown the likely outcome of each decision.
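A toy illustration of the prescriptive step: given predicted success probabilities and payoffs per decision option (all numbers invented), score each option and recommend one.

```python
# Toy prescriptive step: rank invented decision options by expected value
# (predicted success probability x payoff) and recommend the best one.
def prescribe(options):
    """Score each option and return the recommendation plus all scores."""
    scored = {name: p * payoff for name, (p, payoff) in options.items()}
    best = max(scored, key=scored.get)
    return best, scored

options = {                           # (predicted prob, payoff) - invented
    "discount_10pct": (0.60, 100.0),
    "free_shipping":  (0.80,  80.0),
    "do_nothing":     (1.00,  40.0),
}
best, scored = prescribe(options)
print(best)  # -> free_shipping
```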

9. MDM will provide the dimensions for big data facts

With Big Data, master data management will now cover both internal data that the organization has been managing over years (like customer, product and supplier data) as well as Big Data that is flowing into the organization from external sources (like social media, third party data, web-log data) and from internal data sources (such as unstructured content in documents and email). MDM will support polyglot persistence.
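The dimensions-for-facts idea can be sketched in a few lines: an MDM-managed customer dimension enriches a stream of big data fact records. All table contents are invented.

```python
# Sketch: an MDM-managed customer dimension enriches big data fact records
# (e.g. web-log clicks). All ids and attributes are invented.
customers = {                        # dimension, governed by MDM
    "c1": {"name": "Acme", "segment": "enterprise"},
    "c2": {"name": "Bravo", "segment": "smb"},
}

clicks = [                           # high-volume facts
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c2", "page": "/docs"},
    {"customer_id": "c1", "page": "/signup"},
]

def enrich(facts, dimension):
    """Attach master-data attributes to each fact record."""
    return [{**f, **dimension.get(f["customer_id"], {})} for f in facts]

enriched = enrich(clicks, customers)
print(enriched[0]["segment"])  # -> enterprise
```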

10. Security in Big Data won’t be a big issue

Peter Sondergaard, Gartner’s senior vice president of research, will say that when it comes to big data and security, “You should anticipate events and headlines that continuously raise public awareness and create fear.” I’m not dismissing the fact that with MORE data come more responsibilities, and perhaps liabilities, for those that harbor the data. However, in terms of the infrastructure security itself, I believe 2014 will end with a clear understanding of how to apply familiar best practices to your new Big Data platform, including Kerberos, LDAP integration, Active Directory integration, encryption, and overall policy administration.

Posted in Data.


SAP & Big Data


SAP customers are confused about the positioning between SAP Sybase IQ and SAP Hana as it applies to data warehousing. Go figure, so is SAP. You want to learn about their data warehousing offering, and all you hear is “Hana this” and “Hana that”.

It reminds me of the time after I left Teradata when the BI appliances came on the scene. First Netezza, then Greenplum, then Vertica and Aster Data, then ParAccel. Everyone was confused about what the BI appliance was in relation to the EDW. Do I need an EDW, a BI appliance, an EDW + BI appliance?

With SAP, Sybase IQ is supposed to be the data warehouse and HANA is the BI or analytic appliance that sits off to its side. OK. SAP has a few customers on Sybase IQ, but are they the larger well-known brands? Let’s face it….since its acquisition of Sybase in 2010, SAP has struggled with positioning it against incumbents like Teradata, IBM, and even Oracle.

SAP Roadmap


SAP’s move from exploiting its leadership position in enterprise ERP to exploring the new BI appliance and Big Data markets has been impressive, IMHO. The acquisition of EDW and RDBMS company Sybase in 2010, after the earlier acquisition of BI leader Business Objects in 2007, was necessary to stay relevant in the race to provide an end-to-end data infrastructure story. This was, however, a period of “catch-up” or “late entry” to the race.

Its true exploration began with SAP HANA and now its strategic partnership with Hadoop commercialization company Hortonworks. The ability to rise ahead of the data warehouse and database management system leaders will require defining a new Gartner quadrant – the Big Data quadrant.

SAP Product Positioning

Let’s look back in time at SAP’s early positioning. We have the core ERP business, the new “business warehouse” business, and the soon-to-be-launched HANA business. The SAP data warehouse equation is essentially = Business Objects + Sybase IQ + HANA. Positioning HANA, as with most data warehouse vendors’ appliances, is a struggle, since it can be positioned as a data mart within larger footprints, or as THE EDW database altogether in smaller accounts. One would think that with proper guidelines, this positioning would be straightforward. But beyond database size and query complexity, there is the very challenging variable of customer organizational requirements and politics that plays into platform choice. As shown above, you can tell that SAP struggled with simplifying its message for its sales teams early on.

SAP Hana – More than a BI Appliance

SAP released its first version of their in-memory platform, SAP HANA 1.0 SP02, to the market on June 21st 2011. It was (and is) based on an acquired technology from Transact In Memory, a company that had developed a memory-centric relational database positioned for “real-time acquisition and analysis of update-intensive stream workloads such as sensor data streams in manufacturing, intelligence and defense; market data streams in financial services; call detail record streams in Telco; and item-level RFID tracking.” Sound familiar to our Big Data use-cases today?

As with most BI appliances back then, customers spent about $150k for a basic 1TB configuration (SAP partnered with Dell) for the hardware alone – add software and installation services and we were looking at $300K, minimally, as the entry point. SAP started off with either a BI appliance (HANA 1.0) or a BW data warehouse appliance (HANA 1.0 SP03), both using the SAP IMDB database technology (the SAP HANA database) as their underlying RDBMS.

BI Appliances come with analytics, of course


When SAP first started marketing their HANA analytics, you were promised a suite of sophisticated analytics as part of their Predictive Analysis Library (PAL), which can be called directly in an “L wrapper” within a SQLScript. The inputs and outputs are all tables. PAL includes a set of well-known predictive analysis algorithms across several data mining categories:

  • Cluster analysis (K-means)
  • Classification analysis (C4.5 Decision Tree, K-nearest Neighbor, Multiple Linear Regression, ABC Classification)
  • Association analysis (Apriori)
  • Time Series (Moving Average)
  • Other (Weighted Score Table Calculation)
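PAL itself is invoked from SQLScript against HANA tables; as a language-neutral stand-in, here is the K-means idea from the cluster analysis category above, sketched on toy one-dimensional data.

```python
# Stand-in for PAL's cluster analysis: plain K-means on toy 1-D data.
# (PAL itself is called via SQLScript; this only illustrates the idea.)
def kmeans_1d(points, centers, iters=10):
    """Assign points to the nearest center, then recenter; repeat."""
    for _ in range(iters):
        clusters = {c: [] for c in centers}
        for p in points:
            nearest = min(centers, key=lambda c: abs(c - p))
            clusters[nearest].append(p)
        centers = [sum(ps) / len(ps) if ps else c
                   for c, ps in clusters.items()]
    return sorted(centers)

print(kmeans_1d([1.0, 2.0, 3.0, 9.0, 10.0, 11.0], centers=[0.0, 6.0]))
# -> [2.0, 10.0]
```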

HANA’s main use case started with a focus around its installed base with a real-time in-memory data mart for analyzing data from SAP ERP systems. For example, profitability analysis (CO-PA) is one of the most commonly used capabilities within SAP ERP. The CO-PA Accelerator allows significantly faster processing of complex allocations and basically instantaneous ad hoc profitability queries. It belongs to accelerator-type usage scenarios in which SAP HANA becomes a secondary database for SAP products such as SAP ERP. This means SAP ERP data is replicated from SAP ERP into SAP HANA in real time for secondary storage.

BI Appliances are only as good as the application suite

Other use-cases for Hana include:

  • Profitability reporting and forecasting,
  • Retail merchandizing and supply-chain optimization,
  • Security and fraud detection,
  • Energy use monitoring and optimization, and,
  • Telecommunications network monitoring and optimization.

Applications developed on the platform include:

  • SAP COPA Accelerator
  • SAP Smart Meter Analytics
  • SAP Business Objects Strategic Workforce Planning
  • SAP SCM Sales and Operations Planning
  • SAP SCM Demand Signal Management

Most opportunities were initially “accelerators”, leveraging HANA’s in-memory performance improvements.

Aggregate real-time data sources

There are two main mechanisms that HANA supports for near-real-time data loads. First is the Sybase Replication Server (SRS), which works with SAP or non-SAP source systems running on Microsoft, IBM or Oracle databases. This was expected to be the most common mechanism for SAP data sources. There used to be some license challenges around replicating data out of Microsoft and Oracle databases, depending on how you license the database layer of SAP. I’ve been out of touch on whether these have been fully addressed.

SAP has a second choice of replication mechanism called System Landscape Transformation (SLT). SLT is also near-real-time and works from a trigger from within the SAP Business Suite products. This is both database-independent and pretty clever, because it allows for application-layer transformations and therefore greater flexibility than the SRS model. Note that SLT may only work with SAP source systems.

High-performance in-memory performance

HANA stores information in electronic memory, which is 50x faster (depending on how you calculate) than disk. HANA stores a copy on magnetic disk, in case of power failure or the like. In addition, most SAP systems have the database on one system and a calculation engine on another, and they pass information between them. With HANA, this all happens within the same machine.

Why Hadoop?

SAP HANA is not a platform for loading, processing, and analyzing huge volumes – petabytes or more – of unstructured data, commonly referred to as big data. Therefore, HANA is not suited for social networking and social media data analytics. For such use cases, enterprises are better off looking to open-source big data approaches such as Apache Hadoop, or even MPP-based next-generation data warehousing appliances like Pivotal Greenplum or similar.

SAP’s partnership with Hortonworks enables the ability to migrate data between HANA and Hadoop platforms. The basic idea is to treat Hadoop systems as an inexpensive repository of tier 2 and tier 3 data that can be, in turn, processed and analyzed at high speeds on the HANA platform. This is a typical design pattern between Hadoop and any BI appliance (SMP or MPP).
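That tiering pattern can be sketched simply: recent (“hot”) records go to the fast in-memory tier, older ones to the cheap bulk tier. The age threshold and record shape are invented for illustration.

```python
# Sketch of the tiering pattern: recent ("hot") records go to the fast
# in-memory tier, older ones to the cheap bulk tier. The day threshold
# and the list "stores" are invented stand-ins for HANA and Hadoop.
def tier(records, hot_after_day):
    """Split records into hot and cold tiers by age."""
    hot, cold = [], []
    for r in records:
        (hot if r["day"] >= hot_after_day else cold).append(r)
    return hot, cold

records = [{"day": d} for d in (1, 30, 90, 365)]
hot, cold = tier(records, hot_after_day=90)
print(len(hot), len(cold))  # -> 2 2
```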


SAP “Big Data White Space”?

Where do SAP customers need support? Where is the “Big Data white space”? SAP seems to think that persuading customers to run core ERP applications on HANA is all that matters. Are customers responding? Answer: not really.

Customers are saying they’re not planning to use it, with most of them citing high costs and a lack of clear benefit (aka a use case) behind their decision. Even analysts are advising against it: Forrester Research said the HANA strategy is “understandable but not appealing”.

“If it’s about speeding up reporting of what’s just happened, I’ve got you, that’s all cool, but it’s not helping me process more widgets faster,” said one SAP customer.

SAP is betting its future on HANA plus SaaS. What is working in SAP’s favor for the moment, however, is the high level of commitment among existing (European) customers to on-premise software.

This is where the “white space” comes in. Bundling a core suite of well-designed business discovery services around the SAP solution set will allow customers to feel they are being listened to first and sold technology second.

Understanding how to increase REVENUE with new greenfield applications around unstructured data that leverage the structured data from ERP systems is a powerful opportunity. This means architecting a balance of the historic “what happened,” the real-time “what is currently happening,” and a combined “what will happen IF” into a single data symphony. HANA can then be leveraged for ad hoc analytics on the combined historic and real-time data for business analysts to explore, rather than serving as just a report accelerator.

This will require:

  • Sophisticated business consulting services: to support uncovering the true revenue upside
  • Advanced data science services: to support building a new suite of algorithms on a combined real-time and historic analytics framework
  • Platform architecture services: to support the combination of open source ecosystem technologies with SAP legacy infrastructure

This isn’t rocket science. It just takes focused tactical execution, leading with business cases first. The SAP-enabled Big Data system can then be further optimized with cloud delivery as a cost reducer and time-to-value enhancer, along with a further focus on application development. Therefore, other white space includes:

  • Cloud delivery
  • Big Data application development

SAP must keep its traditional customers and SI partners (like CSC) engaged with “add-ons” to its core business applications with incentives for investing in HANA, while at the same time evolving its offerings for line of business buyers.

Some think that SAP can change the game by reaching/selling to marketers with new analytics offerings (e.g. see SAP & KXEN), enhanced mobile capabilities, ecosystem of start-ups, and a potential to incorporate its social/collaboration and e-commerce capabilities into one integrated offering for digital marketers and merchandisers.

Is there a path to a stronger CRM vision for marketers? SAP won’t be able to define one without credible SI partners who have experience with new media, digital agencies, and the specialty service providers who are defining the next wave of content- and data-driven campaigns and customer experiences.

Do you agree?

Posted in Data.


Infochimps, a CSC Company = Big Data Made Better


What’s a $15B powerhouse in information technology (IT) and professional services doing with an open source based Big Data startup?

It starts with “Generation-OS”. We’re not talking about Gen-Y or Gen-Z. We’re talking Generation ‘Open Source’.

Massive disruption is occurring in information technology as businesses are building upon and around recent advances in analytics, cloud computing and storage, and an omni-channel experience across all connected devices. However, traditional paradigms in software development are not supporting the accelerating rate of change in mobile, web, and social experiences. This is where open source is fueling the most disruptive period in information technology since the move from the mainframe to client-server – Generation Open Source.

Infochimps = Open Standards based Big Data

Infochimps delivers Big Data systems with unprecedented speed, scale and flexibility to enterprise companies. (And when we say “enterprise companies,” we mean the Global 2000 – a market in which CSC has proven their success.) By joining forces with CSC, we together will deliver one of the most powerful analytic platforms to the enterprise in an unprecedented amount of time.

At the core of Infochimps’ DNA is our unique, open source-based Big Data and cloud expertise. Infochimps was founded by data science, cloud computing, and open source experts, who have built the three critical analytic services required by virtually all next-generation enterprise applications: real-time data processing and analytics, batch analytics, and ad hoc analytics – all for actionable insights, and all powered by open standards.

CSC = IT Delivery and Professional Services

When CSC begins to insert the Infochimps DNA into its global staff of 90,000 employees, focused on bringing Big Data to a broad enterprise customer base, powerful things are bound to happen. Infochimps Inc., with offices in both Austin, TX and Silicon Valley, becomes a wholly-owned subsidiary, reporting into CSC’s Big Data and Analytics business unit.

The Infochimps’ Big Data team and culture will remain intact, as CSC leverages our bold, nimble approach as a force multiplier in driving new client experiences and thought leadership. Infochimps will remain under its existing leadership, with a focus on continuous and collaborative innovation across CSC offerings.

I regularly coach F2K executives on the important topic of “splicing Big Data DNA” into their organizations. We now have the opportunity to practice what we’ve been preaching, by splicing the Infochimps DNA into the CSC organization, acting as a change agent, and ultimately accelerating CSC’s development of its data services platform.

Infochimps + CSC = Big Data Made Better

I often laugh when we’re knocking on the doors of Fortune 100 CEOs.

“There’s a ‘monkey company’ at the door.”

The Big Data industry seems to be built on animal-based brands like the Hadoop Elephant. So to keep running with the animal theme, I’ve been asking C-levels the following question when they inquire about how to create their own Big Data expertise internally:

“If you want to create a creature that can breathe underwater and fly, would it be more feasible to insert the genes for gills into a seagull, or splice the genes for wings into a herring?”

In other words, do you insert Big Data DNA into the business-savvy side with simplified Big Data tools, or insert business DNA into your Big Data-savvy IT organization? In the case of CSC and Infochimps, I doubt that Mike Lawrie, CSC CEO, wants to be associated with either a seagull or a herring, but I do know he and his senior team are executing on a key strategy to become the thought leader in next-generation technology, starting with Big Data and cloud.

Regardless of your preference for animals (chimpanzees, elephants, birds, or fish), the CSC and Infochimps combination speaks very well to CSC’s strategy for future growth with Big Data, cloud, and open source. Infochimps can now leverage CSC’s enterprise client base, industrialized sales and marketing, solutions development and production resources to scale our value proposition in the marketplace.

“Infochimps, a CSC company, is at the door.”

 Jim Kaskade
Infochimps, a CSC Company

Posted in Cloud, Data.


Real-time Big Data or Small Data?


Have you heard of products like IBM’s InfoSphere Streams, Tibco’s event processing product, or Oracle’s CEP product? All are good examples of commercially available stream processing technologies that help you process events in real-time.

I’ve been asked what I consider as “Big Data” versus “Small Data” in this domain. Here’s my view.

Real-Time Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: None (the data is in motion, not at rest)
  • Big Data: None (the data is in motion, not at rest)

Data Velocity
  • Small Data: 100K events / day (<<1K events / second)
  • Big Data: Billion+ events / day (>>1K events / second)

Data Variety
  • Small Data: 1-6 structured sources AND a single destination (an output file, a SQL database, a BI tool)
  • Big Data: 6+ structured and 6+ unstructured sources AND many destinations (a custom application, a BI tool, several SQL databases, NoSQL databases, Hadoop)

Data Models
  • Small Data: Used mainly for transport; little to no ETL, in-stream analytics, or complex event processing is performed.
  • Big Data: Transport is the foundation, but distributed ETL and linearly scalable in-memory and in-stream analytics are applied, and complex event processing is the norm.

Business Functions
  • Small Data: One line of business (e.g. financial trading)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: No queries are performed against the data in motion; this is simply a mechanism for transporting a transaction or event from the source to a database. Transport times are <1 second. Example: connect to desktop trading applications and transport trade events to an Oracle database.
  • Big Data: ETL, sophisticated algorithms, complex business logic, and even queries can be applied to the stream of events while in motion. Analytics span all data sources and, thus, all business functions. Transport and analytics occur in <1 second. Example: connect to desktop trading applications, market data feeds, and social media, and provide instantaneous trending reports. Allow traders to subscribe to information pertinent to their trades, with analytics applied in real-time for personalized reporting.
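As a concrete taste of “in-stream analytics,” here is a minimal sliding-window counter, computed as events arrive rather than after they land in a database; the event shape and the window size are made up for illustration:

```python
from collections import deque

# Minimal in-stream analytics sketch: a running count over a sliding time
# window, maintained as events flow through. Window size is illustrative.
WINDOW_SECONDS = 60

class SlidingWindowCounter:
    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, symbol) pairs in arrival order

    def on_event(self, ts, symbol):
        self.events.append((ts, symbol))
        # Evict events older than the window as the stream advances.
        while self.events and ts - self.events[0][0] > self.window:
            self.events.popleft()
        return len(self.events)  # count of events currently in the window

counter = SlidingWindowCounter(WINDOW_SECONDS)
counts = [counter.on_event(ts, "AAPL") for ts in (0, 10, 30, 75, 200)]
print(counts)
```

Real CEP engines add windowed joins, pattern matching, and distributed execution on top of this basic idea, but the shape – per-event updates against bounded state – is the same.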

Want to see my view of Batch Analytics? Go Here.

Want to see my view of Ad Hoc Analytics? Go Here.

Here are a few other products in this space:


Posted in Data.


Ad Hoc Queries with Big Data or Small Data?


Do you think that you’re working with “Big Data”? or is it “Small Data”? If you’re asking ad hoc questions of your data, you’ll probably need something that supports “query-response” performance or, in other words, “near real-time”. We’re not talking about batch analytics, but more interactive / iterative analytics. Think NoSQL, or “near real-time Hadoop” with technologies like Impala. Here’s my view of Big versus Small with ad hoc analytics in either case.

Ad Hoc Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: Megabytes to gigabytes
  • Big Data: Terabytes (1-100TB)

Data Velocity
  • Small Data: Updated in near real-time (seconds)
  • Big Data: Updated in real-time (milliseconds)

Data Variety
  • Small Data: 1-6 structured data sources
  • Big Data: 6+ structured AND 6+ unstructured data sources

Data Models
  • Small Data: Aggregations with tens of tables
  • Big Data: Aggregations with hundreds to thousands of tables

Business Functions
  • Small Data: One line of business (e.g. sales)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: Queries are simple, covering basic transactional summaries/reports. Response times are in seconds across a handful of business analysts. Example: retrieve a customer’s profile and summarize their overall standing based on current market values for all assets. This is representative of the work performed when a business asks, “What is my customer worth today?” The transaction is read-only, and questions vary based on what the business analyst needs to know interactively.
  • Big Data: Queries can be as complex as with batch analytics, but are generally still read-only and processed against aggregates. Queries span business functions. Response times are in seconds across large numbers of business analysts. Example: retrieve a customer profile and summarize activities across all customer touch points, calculating “Life-Time-Value” based on past and current activities. This is representative of the work performed when a business asks, “Who are my most profitable customers?” Questions vary based on what the business analyst needs to know interactively.
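The “Life-Time-Value” example can be sketched in a few lines; the touch-point records and the simple revenue-sum definition of LTV below are illustrative assumptions:

```python
# Sketch of the ad hoc "Life-Time-Value" question: summarize each customer
# across all touch points. Records and the LTV definition are made up.
touchpoints = [
    {"customer": "c1", "channel": "ecommerce", "revenue": 120.0},
    {"customer": "c1", "channel": "call_center", "revenue": 0.0},
    {"customer": "c2", "channel": "ecommerce", "revenue": 80.0},
    {"customer": "c1", "channel": "ecommerce", "revenue": 60.0},
]

def lifetime_value(events):
    """Aggregate revenue per customer across all touch points."""
    ltv = {}
    for e in events:
        ltv[e["customer"]] = ltv.get(e["customer"], 0.0) + e["revenue"]
    return ltv

ltv = lifetime_value(touchpoints)
# Rank most profitable customers first: "Who are my most profitable customers?"
ranked = sorted(ltv.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

At Big Data scale the same read-only aggregation runs against pre-built aggregates spanning many sources, with interactive response times for a large analyst population.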

Want my view on Batch Analytics? Look here.

Want my view on Real-time analytics? Look here.

Here are a few products in this space:

Posted in Data.


Batch with Big Data versus Small Data


How do you know whether you are dealing with Big Data or Small Data? I’m constantly asked for my definition of “Big Data”. Well, here it is…for batch analytics, now addressed by technologies such as Hadoop.

Batch Analytics

Batch Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: Gigabytes
  • Big Data: Terabytes to petabytes

Data Velocity
  • Small Data: Updated periodically at non-real-time intervals
  • Big Data: Updated both in real-time and through bulk timed intervals

Data Variety
  • Small Data: 1-6 structured sources
  • Big Data: 6+ structured AND 6+ unstructured sources

Data Models
  • Small Data: Store data without cleaning, transforming, or normalizing.
  • Big Data: Store data without cleaning, transforming, or normalizing, then apply schemas based on application needs.

Business Functions
  • Small Data: One line of business (e.g. sales)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: Queries are complex, requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints, but they are applied to a simpler data structure. Response times are in minutes to hours, issued by one or maybe two experts. Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, by year.
  • Big Data: Queries are complex, requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints, and they span business functions. Response times are in minutes to hours, issued by a small group of experts. Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, by year; then determine which customers purchased the higher-profit parts, by geography, by year; profile those high-profit customers; and find out which products purchased by high-profit customers were NOT purchased by other similar customers, in order to cross-sell / up-sell.
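The profit-by-supplier-geography-year example can be sketched as a plain group-by; the sales records below are made up for illustration:

```python
from collections import defaultdict

# Sketch of the batch query described above: profit on a line of parts,
# broken out by supplier, geography, and year. Records are illustrative.
sales = [
    {"supplier": "Acme", "geo": "US", "year": 2013, "profit": 10.0},
    {"supplier": "Acme", "geo": "US", "year": 2013, "profit": 5.0},
    {"supplier": "Acme", "geo": "EU", "year": 2012, "profit": 7.0},
    {"supplier": "Bolt", "geo": "US", "year": 2013, "profit": 3.0},
]

# Group-by (supplier, geo, year) and sum profit per group.
profit = defaultdict(float)
for row in sales:
    profit[(row["supplier"], row["geo"], row["year"])] += row["profit"]

for key in sorted(profit):
    print(key, profit[key])
```

In a Hadoop-style system the same logic runs as a distributed map (emit key/value per record) and reduce (sum per key) over terabytes rather than an in-memory list.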

Want to see my view on Ad Hoc and Interactive Analytics? Go here.

Want to see my view on Real-Time Analytics? Go here.

Here are a few other products in this space:

ICS Hadoop
Posted in Data.


Splice data scientist DNA into your existing team

As organizations continue to grapple with big data demands, they may find that business managers who understand data meet their “data scientist” needs better than hard-core data technologists.

There’s little doubt that data-derived insight will be a key differentiator in business success, and even less doubt that those who produce such insight are going to be in very high demand. Harvard Business Review called “data scientist” the “sexiest” job of the 21st century, and McKinsey predicts a shortfall of about 140,000 by 2018. Yet most companies are still clueless as to how they’re going to meet this shortfall.

Unfortunately, the job description for a data scientist has become quite lofty. Unless your company is Google-level cool, you’re going to struggle to hire your big data dream team (well, at least right now), and few firms out there could recruit them for you. Ultimately, most organizations will need to enlist the support of existing staff to achieve their data-driven goals, and train them to become data scientists. To accomplish this, you must determine the basic elements of data scientist “DNA” and strategically splice it into the right people.


Image credit: Thinkstock

Posted in Data.
