

Real-time Big Data or Small Data?


Have you heard of products like IBM’s InfoSphere Streams, Tibco’s event processing product, or Oracle’s CEP product? They are all good examples of commercially available stream processing technologies that help you process events in real time.

I’ve been asked what I consider “Big Data” versus “Small Data” in this domain. Here’s my view.

Real-Time Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: None
  • Big Data: None

Data Velocity
  • Small Data: 100K events/day (<<1K events/second)
  • Big Data: 1 billion+ events/day (>>1K events/second)

Data Variety
  • Small Data: 1-6 unstructured sources AND a single destination (an output file, a SQL database, or a BI tool)
  • Big Data: 6+ structured and 6+ unstructured sources AND many destinations (a custom application, a BI tool, several SQL databases, NoSQL databases, Hadoop)

Data Models
  • Small Data: Used mainly for “transport.” Little to no ETL, in-stream analytics, or complex event processing is performed.
  • Big Data: Transport is the foundation. However, distributed ETL and linearly scalable in-memory and in-stream analytics are applied, and complex event processing is the norm.

Business Functions
  • Small Data: One line of business (e.g., financial trading)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: No queries are performed against the data in motion. This is simply a mechanism for transporting a transaction or event from the source to a database. Transport times are <1 second. Example: connect to desktop trading applications and transport trade events to an Oracle database.
  • Big Data: ETL, sophisticated algorithms, complex business logic, and even queries can be applied to the stream of events while it is in motion. Analytics span all data sources and, thus, all business functions. Transport and analytics occur in <1 second. Example: connect to desktop trading applications, market data feeds, and social media, and provide instantaneous trending reports. Allow traders to subscribe to information pertinent to their trades and have analytics applied in real time for personalized reporting.
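What sets the “Big Data” column apart is that analytics run on events while they are still in motion. The minimal Python sketch below illustrates this, assuming events arrive in timestamp order. It uses a sliding-window counter that keeps “trending” counts per key in memory. It also shows the events/day to events/second arithmetic behind the velocity row: 1 billion events/day averages about 11,574 events/second. The class and function names are hypothetical, and this is not how InfoSphere Streams, Tibco, or Oracle CEP are implemented.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_per_second(events_per_day: float) -> float:
    """Average per-second rate implied by a daily event volume."""
    return events_per_day / SECONDS_PER_DAY


class SlidingWindowTrends:
    """Counts events per key over the last `window_seconds` of event time.

    Hypothetical helper: counts are updated as each event arrives and
    events older than the window are evicted, so a "what is trending?"
    query is answered from memory rather than from a database.
    Assumes events arrive in timestamp order.
    """

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, key)
        self.counts: Counter = Counter()

    def add(self, timestamp: float, key: str) -> None:
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(now=timestamp)

    def _evict(self, now: float) -> None:
        # Drop events at or before the window's lower edge.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def top(self, n: int) -> List[Tuple[str, int]]:
        """The n most frequent keys currently inside the window."""
        return self.counts.most_common(n)


# Usage: trade events keyed by made-up instrument symbols, 60-second window.
trends = SlidingWindowTrends(window_seconds=60)
for ts, symbol in [(0, "ACME"), (10, "GLOBEX"), (20, "ACME"), (75, "INITECH")]:
    trends.add(ts, symbol)
print(trends.top(2))  # the events at t=0 and t=10 have aged out of the window
print(round(events_per_second(1_000_000_000)))
```

The deque keeps eviction cheap: each event is appended once and removed once, so the per-event cost stays constant no matter how long the stream runs.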

Want to see my view of Batch Analytics? Go Here.

Want to see my view of Ad Hoc Analytics? Go Here.

Here are a few other products in this space:

 

Posted in Data.



Ad Hoc Queries with Big Data or Small Data?


Do you think you’re working with “Big Data” or “Small Data”? If you’re asking ad hoc questions of your data, you’ll probably need something that supports “query-response” performance or, in other words, “near real-time.” We’re not talking about batch analytics, but about more interactive/iterative analytics. Think NoSQL, or “near real-time Hadoop” with technologies like Impala. Here’s my view of Big versus Small for ad hoc analytics.

Ad Hoc Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: Megabytes to gigabytes
  • Big Data: Terabytes (1-100 TB)

Data Velocity
  • Small Data: Updated in near real time (seconds)
  • Big Data: Updated in real time (milliseconds)

Data Variety
  • Small Data: 1-6 unstructured data sources
  • Big Data: 6+ structured AND 6+ unstructured data sources

Data Models
  • Small Data: Aggregations across tens of tables
  • Big Data: Aggregations across up to hundreds or thousands of tables

Business Functions
  • Small Data: One line of business (e.g., sales)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: Queries are simple, covering basic transactional summaries/reports. Response times are in seconds, for a handful of business analysts. Example: retrieve a customer’s profile and summarize their overall standing based on current market values for all assets. This is representative of the work performed when a business asks, “What is my customer worth today?” The transaction is read-only. Questions vary based on what the business analyst needs to know interactively.
  • Big Data: Queries can be as complex as with batch analytics, but they are generally still read-only and processed against aggregates. Queries span business functions. Response times are in seconds, for large numbers of business analysts. Example: retrieve a customer profile and summarize activities across all customer touch points, calculating “Lifetime Value” based on past and current activities. This is representative of the work performed when a business asks, “Who are my most profitable customers?” Questions vary based on what the business analyst needs to know interactively.
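The small-data example (“What is my customer worth today?”) comes down to a read-only aggregation. The minimal Python sketch below assumes a hypothetical two-table schema (holdings and current prices). It uses an in-memory SQLite database as a stand-in for whatever interactive store you actually query.

```python
import sqlite3

# Hypothetical schema: what each customer holds, and the latest market price per asset.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE holdings (customer_id TEXT, asset TEXT, quantity REAL);
    CREATE TABLE prices (asset TEXT PRIMARY KEY, price REAL);
    """
)
conn.executemany(
    "INSERT INTO holdings VALUES (?, ?, ?)",
    [("c1", "BOND_A", 10), ("c1", "STOCK_B", 5), ("c2", "STOCK_B", 2)],
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?)",
    [("BOND_A", 100.0), ("STOCK_B", 40.0)],
)


def customer_worth(conn: sqlite3.Connection, customer_id: str) -> float:
    """Read-only ad hoc query: value a customer's holdings at current prices."""
    (total,) = conn.execute(
        """
        SELECT COALESCE(SUM(h.quantity * p.price), 0.0)
        FROM holdings AS h
        JOIN prices AS p ON p.asset = h.asset
        WHERE h.customer_id = ?
        """,
        (customer_id,),
    ).fetchone()
    return total


print(customer_worth(conn, "c1"))
```

The big-data version has the same read-only shape. The difference is that its joins span many more tables across business functions, and many analysts issue queries concurrently.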

Want my view on Batch Analytics? Look here.

Want my view on Real-time analytics? Look here.

Here are a few products in this space:

Posted in Data.



Big Data versus Small Data


How do you know whether you are dealing with Big Data or Small Data? I’m constantly asked for my definition of “Big Data.” Well, here it is for batch analytics, the space now addressed by technologies such as Hadoop.

Batch Analytics

Batch Analytics: Small Data vs. Big Data

Data Volume
  • Small Data: Gigabytes
  • Big Data: Terabytes to petabytes

Data Velocity
  • Small Data: Updated periodically, at non-real-time intervals
  • Big Data: Updated both in real time and through bulk timed intervals

Data Variety
  • Small Data: 1-6 unstructured sources
  • Big Data: 6+ structured AND 6+ unstructured sources

Data Models
  • Small Data: Store data without cleaning, transforming, or normalizing.
  • Big Data: Store data without cleaning, transforming, or normalizing; then apply schemas based on application needs.

Business Functions
  • Small Data: One line of business (e.g., sales)
  • Big Data: Several lines of business, up to a 360-degree view

Business Intelligence
  • Small Data: Queries are complex, requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints. However, they are applied to a simpler data structure. Response times are in minutes to hours, and queries are issued by one or maybe two experts. Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, and by year.
  • Big Data: Queries are complex, requiring many concurrent data modifications, a rich breadth of operators, and many selectivity constraints. Queries span business functions. Response times are in minutes to hours, and queries are issued by a small group of experts. Example: determine how much profit is made on a given line of parts, broken out by supplier, by geography, and by year; then determine which customers purchased the higher-profit parts, by geography and by year; determine the profile of those high-profit customers; and find out which products purchased by high-profit customers were NOT purchased by other similar customers, in order to cross-sell/up-sell.
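The first step of the batch example (profit on a line of parts by supplier, geography, and year) maps naturally onto MapReduce. Below is a single-process Python sketch of the map and reduce steps, assuming hypothetical raw sale records with revenue and cost fields. A dict stands in for Hadoop’s shuffle/sort phase. The records stay as raw dicts and are only interpreted at query time, loosely in the spirit of “apply schemas based on application needs.”

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, Tuple

Key = Tuple[str, str, int]  # (supplier, region, year)

# Hypothetical raw sale records, stored as-is.
sales = [
    {"part_line": "brakes", "supplier": "S1", "region": "EMEA", "year": 2012, "revenue": 500.0, "cost": 300.0},
    {"part_line": "brakes", "supplier": "S1", "region": "EMEA", "year": 2012, "revenue": 250.0, "cost": 200.0},
    {"part_line": "brakes", "supplier": "S2", "region": "APAC", "year": 2013, "revenue": 400.0, "cost": 100.0},
    {"part_line": "tires", "supplier": "S1", "region": "EMEA", "year": 2012, "revenue": 900.0, "cost": 100.0},
]


def map_sale(record: dict, part_line: str) -> Iterator[Tuple[Key, float]]:
    """Map step: emit ((supplier, region, year), profit) for the chosen part line."""
    if record.get("part_line") == part_line:
        key = (record["supplier"], record["region"], record["year"])
        yield key, record["revenue"] - record["cost"]


def reduce_profit(pairs: Iterable[Tuple[Key, float]]) -> Dict[Key, float]:
    """Reduce step: sum profit per key (the dict plays the role of shuffle/sort)."""
    totals: Dict[Key, float] = defaultdict(float)
    for key, profit in pairs:
        totals[key] += profit
    return dict(totals)


def profit_by_supplier_region_year(records: Iterable[dict], part_line: str) -> Dict[Key, float]:
    mapped = chain.from_iterable(map_sale(r, part_line) for r in records)
    return reduce_profit(mapped)


print(profit_by_supplier_region_year(sales, "brakes"))
```

On a real cluster, the map and reduce functions would run in parallel across many nodes. The later steps of the big-data example (profiling high-profit customers, finding cross-sell gaps) would be chained as additional jobs.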

Want to see my view on Ad Hoc and Interactive Analytics? Go here.

Want to see my view on Real-Time Analytics? Go here.

Here are a few other products in this space:

ICS Hadoop

Cloudera

MapR

Hortonworks

EMC

 

Posted in Data.



Splice data scientist DNA into your existing team

As organizations continue to grapple with big data demands, they may find that business managers who understand data can meet their “data scientist” needs better than hard-core data technologists can.

There’s little doubt that data-derived insight will be a key differentiator in business success, and even less doubt that those who produce such insight are going to be in very high demand. Harvard Business Review called “data scientist” the “sexiest” job of the 21st century, and McKinsey predicts a shortfall of about 140,000 by 2018. Yet most companies are still clueless about how they’re going to meet this shortfall.

Unfortunately, the job description for a data scientist has become quite lofty. Unless your company is Google-level cool, you’re going to struggle to hire your big data dream team (well, at least right now), and few firms out there could recruit them for you. Ultimately, most organizations will need to enlist the support of existing staff to achieve their data-driven goals, and train them to become data scientists. To accomplish this, you must determine the basic elements of data scientist “DNA” and strategically splice it into the right people.

READ MORE>>

Image credit: Thinkstock

Posted in Data.



Why the Pivotal Initiative’s Fate will Mirror VMware’s


An Enterprise PaaS must be truly agnostic to the underlying elastic infrastructure and fully support open standards. So the big question is: will the Pivotal Initiative be able to break away from its EMC and VMware roots and the associated ties to vSphere?

Let’s itemize just a few of the major components of the stack, from top to bottom:

  • Pivotal Labs: Besides being the source of Paul Maritz’s new company name, this is an agile software development consulting firm focused on Ruby on Rails, pair programming, test-driven development, and behavior-driven development. It is known for Pivotal Tracker, a project management and collaboration software package.
  • OpenChorus: real-time social collaboration on predictive analytics projects, allowing businesses to iterate faster and more effectively.
  • Cetas: An end-to-end analytics platform, spanning data ingestion, data source connectors, data processing and analytics, visualization, and recommendations.
  • vFabric SpringSource: An Eclipse-based application development framework for building Java-based enterprise applications.
  • vFabric Data Director: Database provisioning, high availability, backup, and cloning. It includes the ability to provision Hadoop on vSphere using the open source project Serengeti (powered by the open source orchestration project Ironfan).
  • vFabric GemFire: An in-memory stream processing technology that combines stream data processing capabilities with traditional database management. It supports “Continuous Querying,” which eliminates the need for application polling and supports the rich semantics of event-driven architectures.
  • vFabric RabbitMQ: An enterprise messaging middleware implementation of AMQP that supports a full range of Internet protocols for lightweight messaging (including HTTP, HTTPS, and STOMP), enabling you to connect nearly any type of application, component, or service.
  • Greenplum: An ad hoc query and analytics database. The Greenplum database is based on PostgreSQL. It functions primarily as a data mart / analytic appliance and uses a shared-nothing, massively parallel processing (MPP) architecture. It has a parallel query optimizer that converts SQL and MapReduce into a physical execution plan.
  • Pivotal HD (Hadoop Distribution): The distribution competes with Cloudera’s. EMC (now Pivotal) created its own distribution so it could improve query response time (though this occurred before it was aware of the introduction of Impala). Many believe that Pivotal HD was created solely to boost struggling sales of its Greenplum software and appliances.
  • Cloud Foundry: Open source Platform as a Service (PaaS) software written in Ruby.
  • Bosh: An open source tool chain for release engineering, deployment, and lifecycle management of large-scale distributed services. It was initially developed to manage the Cloud Foundry PaaS, but because Cloud Foundry is itself a large-scale distributed application, Bosh turned into a general-purpose orchestration tool chain that can handle any application. Bosh currently supports four IaaS providers: OpenStack, AWS, vSphere, and vCloud.
  • IaaS (OpenStack, AWS, vSphere & vCloud): Support starts with the vCloud and vCenter APIs and extends to OpenStack and AWS through later additions (via the Bosh orchestration layer).
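The “Continuous Querying” idea in the GemFire entry works like this: rather than clients polling the store, they register a standing query once, and the store pushes matching updates to them. The toy Python sketch below shows only that pattern. ContinuousQueryRegion and register_query are hypothetical names, not GemFire’s API, and a real system would also handle distribution and delivery guarantees.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Predicate = Callable[[Any], bool]
Listener = Callable[[str, Any], None]


@dataclass
class ContinuousQueryRegion:
    """Toy key/value store that pushes matching updates to subscribers.

    Hypothetical illustration of the continuous-query pattern; this is
    NOT GemFire's API.
    """

    data: Dict[str, Any] = field(default_factory=dict)
    queries: List[Tuple[Predicate, Listener]] = field(default_factory=list)

    def register_query(self, predicate: Predicate, listener: Listener) -> None:
        """Register a standing query once; the listener fires on every matching put."""
        self.queries.append((predicate, listener))

    def put(self, key: str, value: Any) -> None:
        self.data[key] = value
        # Evaluate every standing query against the new value and push
        # matches immediately, so clients never need to poll the store.
        for predicate, listener in self.queries:
            if predicate(value):
                listener(key, value)


# Usage: alert on large trades as they are written.
region = ContinuousQueryRegion()
large_trades: List[str] = []
region.register_query(
    lambda trade: trade["qty"] > 1_000,
    lambda key, trade: large_trades.append(key),
)
region.put("t1", {"qty": 50})
region.put("t2", {"qty": 5_000})
print(large_trades)
```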

So when you look at this sample of technologies (and I’m sure I’m leaving many off the list), you might see through the EMC/VMware veil to a collection of open source projects. We’ll see how Paul Maritz pulls this all together; it is clearly a powerful set of teams and technology.

So why do I refer to VMware’s “fate”? Well, it’s no secret that VMware’s business has begun to plateau under pressure from open source projects like OpenStack. Did Paul get out just in the “nick of time”? Can he create a long-term sustainable business on open source?

Posted in Cloud Computing, Data.



Big Data and Banking – More than Hadoop

Fraud is definitely top of mind for all banks. Steve Rosenbush at the Wall Street Journal recently wrote about Visa’s new Big Data analytic engine, which has changed the way the company combats fraud. Visa estimates that its new Big Data fraud platform has identified $2 billion in potential annual incremental fraud savings. With Big Data, its new analytic engine can study as many as 500 aspects of a transaction at once. That’s a sharp improvement over the company’s previous analytic engine, which could study only 40 aspects at once. And instead of using just one analytic model, Visa now operates 16 models, covering different segments of its market, such as geographic regions.

Do you think Visa, or any bank for that matter, uses just batch analytics to provide fraud detection? Hadoop can play a significant role in building models. However, only a real-time solution will allow you to take those models and apply them in a timeframe that can make an impact.
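The split described above, building models in batch and applying them in real time, can be sketched in a few lines of Python. The sketch assumes per-segment logistic-regression models whose weights were trained offline (for example, on Hadoop) and are simply looked up and applied as each transaction arrives. The two regions, features, weights, and 0.5 threshold are invented for illustration and say nothing about Visa’s actual models.

```python
import math
from typing import Dict

# Hypothetical per-segment models, assumed to be trained offline in batch
# and loaded into the real-time path. Each is a logistic regression:
# a bias plus one weight per transaction feature.
MODELS: Dict[str, Dict[str, float]] = {
    "north_america": {"bias": -4.0, "amount_zscore": 1.2, "foreign_merchant": 2.0},
    "europe": {"bias": -3.5, "amount_zscore": 0.9, "foreign_merchant": 1.5},
}


def fraud_score(region: str, features: Dict[str, float]) -> float:
    """Score one transaction in-line with its segment's model (0..1)."""
    model = MODELS[region]
    z = model["bias"] + sum(
        weight * features.get(name, 0.0)
        for name, weight in model.items()
        if name != "bias"
    )
    return 1.0 / (1.0 + math.exp(-z))


def should_hold(region: str, features: Dict[str, float], threshold: float = 0.5) -> bool:
    """Decide while the transaction is still in flight."""
    return fraud_score(region, features) >= threshold


print(should_hold("north_america", {"amount_zscore": 3.0, "foreign_merchant": 1.0}))
```

The batch side only has to refresh the weights periodically. The real-time side does a dictionary lookup and a handful of multiplications per transaction, which is what makes scoring fast enough to act on.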

The banking industry is based on data – the products and services in banking have no physical presence – and as a consequence, banks have to contend with ever-increasing volumes (and velocity, and variety) of data. Beyond the basic transactional data concerning debits/credits and payments, banks now:

  • Gather data from many external sources (including news) to gain insight into their risk position;
  • Chart their brand’s reputation in social media and other online forums.

This data is both structured and unstructured, as well as very time-critical. And, of course, in all cases financial data is highly sensitive and often subject to extensive regulation. By applying advanced analytics, the bank can turn this volume, velocity, and variety of data into actionable, real-time and secure intelligence with applications including:

  • Customer experience
  • Risk Management
  • Operations Optimization

It’s important to note that applying new technologies like Hadoop is only a start (it addresses 20% of the solution). Turning your insights into real-time actions will require additional Big Data technologies that help you “operationalize” the output of your batch analytics.

Customer Experience

Banks are trying to focus more on the specific needs of their customers and less on the products they offer. They need to:

  • Engage customers in interactive/personalized conversations (real-time)
  • Provide a consistent, cross-channel experience including real-time touch points like web and mobile
  • Act at critical moments in the customer sales cycle (in the moment)
  • Market and sell based on customer real-time activities

Noticing a general theme here? Big Data can assist banks with this transformation: reducing the cost of customer acquisition, increasing retention, increasing customer acceptance of marketing offers, increasing sales through targeted marketing activities, and increasing brand loyalty and trust. Big Data presents a phenomenal opportunity. However, the definition of Big Data HAS to be broader than Hadoop.

Big Data promises the following technology solutions to help with this transformation:

  • Single View of Customer (all detailed data in one location)
  • Targeted Marketing with micro-segmentation (sophisticated analytics on ALL of the data)
  • Multichannel Customer Experience (operationalizing back out to all the customer touch points)

Risk Management

Risk management is also critically important to the bank. Risk management needs to be pervasive within the organizational culture and operating model of the bank in order to make risk-aware business decisions, allocate capital appropriately, and reduce the cost of compliance. Ultimately, this means making data analytics as accessible as it is at Yahoo! If the bank could provide a “data playground” where all data sources were readily available, with tools that were easy to use…well, let’s just say that new risk management products would be popping up left and right.

Big Data promises a way of providing the organization integrated risk management solutions, covering:

 

  • Financial Risk (Risk Architecture, Data Architecture, Risk Analytics, Performance & reporting)
  • Operational Risk & Compliance
  • Financial Crimes (AML, Fraud, Case Management)
  • IT Risk (Security, Business Continuity and Resilience)

The key is to focus on one use-case first, and expand from there. But no matter which risk use-case you attack first, you will need batch, ad hoc, and real-time analytics.

Operations Optimization

Large banks often become unwieldy organizations through many acquisitions. Increasing flexibility and streamlining operations is therefore even more important in today’s more competitive banking industry. A bank that can increase its flexibility and streamline operations by transforming its core functions will be able to drive higher growth and profits, develop more modular back-office systems, and respond quickly to changing business needs in a highly flexible environment.

This means that banks need new core infrastructure solutions. One example might be using Big Data to reduce loan origination times by standardizing loan processes across all entities. Streamlining and automating these business processes will result in higher loan profitability while complying with new government mandates.

Operational leverage improves when banks can deliver global, regional and local transaction and payment services efficiently and also when they use transaction insights to deliver the right services at the right price to the right clients.

Many banks are seeking to innovate in the areas of processing, data management and supply chain optimization. For example, in the past, when new payment business needs would arise, the bank would often build a payments solution from scratch to address it, leading to a fragmented and complex payments infrastructure. With Big Data technologies, the bank can develop an enterprise payments hub solution that gives a better understanding of product and payments platform utilization and improved efficiency.

Are you a bank interested in new Big Data technologies like Hadoop, NoSQL datastores, and real-time stream processing? Interested in one integrated platform for all three?

 

Posted in Data.


Expansion Stage Companies

The Webster definition of “expansion stage” is:

“Financing provided by a venture capital firm to a company whose service or product is commercially available. Though the company’s revenues may look strong and show significant growth, the company may not be profitable. Typically, a company that receives an expansion stage investment has been in business three years or longer.”

This definition was “ok”, but I was looking for something with more depth. That’s when I read an old blog post from Scott Maxwell at OpenView Partners where he states that “expansion stage” begins with:

  1. Whole Product: You have a core product vision and “whole product” offering with enough functionality and enough of a competitive differentiation that your target market customers are purchasing/using your “whole product” with a high enough win/conversion rate.
  2. Referenceable Customers: You have a set of happy (or at least satisfied) customers (that are willing to be used as references, used for case studies, and/or say good things about your company online and offline) and your customers and target market are generally happy with your product and go to market approach.
  3. G2M that works: You have a core Go-To-Market Strategy and are executing it in a way that gets solid economic results (sometimes we call this “sales economics”, “sales and marketing economics”, “distribution economics”, or “funnel economics”). Ideally, the management team has gone down the learning curve far enough that the benefits of growing the resources outweighs the more difficult continuous improvement of a larger set of resources (that you will have as you go through the expansion stage).
  4. Foundation: You have adequate organizational and operational methodologies and people to support additional resources and additional business.

I like Scott’s view of “expansion stage.” Breaking free of the early-stage startup phase and entering the beginning of this one is a high-tech startup CEO’s focus and goal, so it is worth defining this phase for your organization and managing the change required to transition into it successfully. Below, I reflect on this using my own experience of Infochimps’ transformation over the past two quarters.

Customers Buying “Whole Product”

If you are a high-tech CEO reading this, you might ask, “What is a ‘Whole Product’?” Let’s be clear: you are never finished developing, improving, and adding to your product/service.

However, you know you are still in the early stage if you are delivering on only part of your promise, with plans to “add XYZ when we’re further along,” and the missing part is still required for your customer to really receive the value you are promising.

For example, had Infochimps delivered only “Hadoop as a Service” as our cloud service, we would have offered our customers HUGE potential. However, we would have fallen short on our promise of solving our customers’ business problems. ALL of our customers require MORE than Hadoop to solve their problems. Therefore, we needed to make sure that our cloud services provided real-time analytics, ad hoc analytics, and batch analytics; this was required to provide a “whole product” to our customers.

On the flip side, we have SO many ideas on how to improve the developer’s experience including a rich number of data flow and analytic libraries developed specifically around customer use-cases, customer GUIs that give our customers operational views into how our cloud service is operating, etc. However, all these ideas fall into the category of improving on the “whole product” we have today. In other words, our customers are receiving the value associated with solving their business problem, even if there are still some “rough edges” to iron out.

Scott also mentions another important element of “whole product”: experiencing a high enough “win rate.” This is a requirement I have always struggled with throughout my 15 years as a startup CEO.

What is a high enough win rate? I can personally translate this into two things, which the team at Infochimps has spent several months perfecting:

  • Having a sales process that is well defined. Although you may not understand every phase of the sales cycle well enough to predict the odds of closing perfectly (e.g., we have a “measurable phase” called “Relationship Building,” which we associate with 50% odds of closing), you continue to measure and adjust your sales process with the goal of statistically closing what you predict (e.g., at least half of the customers who reach the “Relationship Building” phase of the sales cycle). What is important is that the entire sales team (sales operations, inside sales, direct sales, systems engineers) and even marketing: a) are aligned on what is being measured, b) know exactly what criteria the organization uses to define each measurable phase, and c) are constantly improving the process.
  • Knowing how/when to say “No” to customers. In some cases, this means you don’t take on new customer prospects unless you have a high degree of certainty that you can make THEM successful. This statement is loaded because it involves understanding your customer’s business, their targeted use-case, and your ability to provide a solution that successfully solves that use-case. It also has a lot to do with knowing your target market (and knowing what markets you want to avoid). In the case of Infochimps, we would like to work exclusively with Fortune 1000 companies, scrutinizing smaller company prospects.

At Infochimps, we use a number of well-defined and measurable phases to help us understand how to qualify and obtain customers with a high win rate. Here’s how we define our MPs (measurable phases):

  1. MP1 (measurable phase 1 = Potential Opportunity): We have performed “business discovery” and understand the customer’s use-case, which equates to a “big data problem”; we know their goals (success criteria); we know their deployment requirements (e.g., public, virtual private, or private cloud); we have confirmed that they have budget and can spend it; there is a clear champion; and ultimately there is a compelling and/or impending event, all driving the need for our cloud services. Inside sales focuses on potential opportunities.
  2. MP2 (Confirmed Opportunity): Our direct sales force has a detailed dialogue with the champion to confirm all criteria in MP1, but digs further into the use-case and the potential impact on our customer’s business (how do they win? does it increase revenue? to what extent? how far does the needle move?). At the end of this phase we have determined that a significant level of our investment is justified (e.g. system engineering is brought in). We then begin to build what is called a “Client Specification” which details the opportunity.
  3. MP3 (Relationship): Creating a relationship requires understanding the tasks/timelines involved, knowing the people who are involved in authorizing the spend (giving the thumbs up), getting the customer to speak to their success criteria succinctly, reaching a point where the customer believes that our solution solves their problem (technical and business validation begins), and frankly getting to know the customer (creating a level of trust). The output of this phase is a “Proposal” to the customer.
  4. MP4 (Negotiation): This phase occurs when you have submitted your proposal to solving their problem and it includes the economics involved. You enter into a level of “technical” and “business” due diligence that can result in a “bake-off” where you may lose their business, or you potentially “fire your customer prospect” because they don’t fit your target market model, or you win their business and begin to move forward, formalizing the partnership. Our team then puts together what’s called a “Mutual Action Plan” (MAP) which outlines the steps to a formal partnership.
  5. MP5 (Procurement): This is the phase where business terms are drafted into a legal contract. At Infochimps, we have a standard “Cloud Services Agreement”. However, many of our Fortune 1000 customers arm-wrestle on certain terms and many provide their own “paper”. Note: don’t think that because you are in this “legal” or “procurement” period that customers will stay quiet on business terms. I find that customers still like to negotiate during this period (especially if timing gets close to the end of your quarter….which they always leverage). We’ve also seen deals fall apart during this phase.
  6. MP6 (Customer Win): Of course, execution of the contract is not the “end” of the sales cycle for Infochimps. In fact, our sales directors are incentivized based on a smooth process of “pre-sales” to “post-sales” so that our customers don’t feel an abrupt transition. We know that the hard work starts at this point, making our customers successful.
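Once each measurable phase has odds attached, two numbers fall out: a weighted pipeline forecast, and an observed close rate to compare against the prediction. A minimal Python sketch follows. Only the 50% for the relationship phase comes from the text above. The other probabilities and the deal values are hypothetical placeholders, to be replaced with your own measured rates.

```python
from typing import Dict, Iterable, Tuple

# Close probabilities per measurable phase. Only MP3's 50% comes from
# the text; the rest are hypothetical placeholders.
PHASE_ODDS: Dict[str, float] = {
    "MP1": 0.10,  # Potential Opportunity (hypothetical)
    "MP2": 0.25,  # Confirmed Opportunity (hypothetical)
    "MP3": 0.50,  # Relationship (stated in the text)
    "MP4": 0.70,  # Negotiation (hypothetical)
    "MP5": 0.90,  # Procurement (hypothetical)
    "MP6": 1.00,  # Customer Win
}


def weighted_pipeline(deals: Iterable[Tuple[str, float]]) -> float:
    """Sum of each deal's value times the close odds of its current phase."""
    return sum(value * PHASE_ODDS[phase] for phase, value in deals)


def observed_close_rate(outcomes: Iterable[bool]) -> float:
    """Fraction of deals that closed after reaching a given phase."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Usage: compare what MP3 predicts (0.50) with what actually happened.
pipeline = [("MP3", 240_000.0), ("MP5", 100_000.0)]
print(weighted_pipeline(pipeline))
print(observed_close_rate([True, False, True, True]))
```

If the observed rate for a phase keeps drifting away from its assigned odds, either the odds or the phase’s entry criteria need adjusting. This is the “measure and adjust” loop described above.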

Our sales team reviews our customer prospects every week and shares them with the entire company every two weeks! We don’t expect to have perfect probabilities associated with each of the phases above, but we do know what the goals are, how to measure them, and that we need to constantly improve in order to achieve and exceed our goals.

Referenceable Customers

If you don’t have this as a company S.M.A.R.T. goal, you are failing to meet one of the most important requirements for becoming an expansion stage company.

This is MORE than just saying, “Oh yeah! You can call any of our customers and they will vouch for us.” For us at Infochimps, we have a specific goal that all in the company understand.

Here is an example that shows how important this objective is, from the senior team on down. Remember “Delivering Happiness” by Tony Hsieh? Our entire executive team visited Zappos and met with Tony Hsieh and Fred Mossler, with the opportunity to spend personal one-on-one time with them. The topic? How should Infochimps create a corporate culture of its own that makes customers the #1 focus throughout the organization? What are the important ways to establish a customer-centric culture and make sure it is sustained with the addition of each new team member?

We too go out of our way to establish a personal connection with our customers. We will go the “extra mile” for our customers….maybe not quite as extreme as ordering pizza for them, but close. We’re also doing so in a way that scales economically. Unfortunately, our business can become very “professional services” heavy with each customer “touch” potentially leading to a “sucking sound” (customers love the fact that we are the ‘experts’ in Big Data….which can quickly result in them leaning on us to do a lot of customer work…and a business that is far from profitable).

So what processes have we established over the past two quarters which addresses this important characteristic of an “expansion stage company”? Some key milestones for us include:

  • Establishing clear success criteria with our customers based on a clearly stated use-case.
  • Clearly defining what is expected of our customers so that they become accountable to the success of the project.
  • Involving our customers throughout the process with distinct checkpoints where we ask consistent questions – starting on day 1 (as part of a kickoff meeting), and at various phases of being deployed on our cloud (we have three phases today), and then on a regularly scheduled basis after “going live”. This is led by both our “expert services” and “customer service” departments.
  • Onsite visits by our System Engineering and Product Management teams to assess “what we need to do better” which plays nicely into our cloud service roadmap.
  • Executive sponsorship, involving our VP of Sales talking to our customer champion, as well as CEO-to-CEO dialogue. Yep, I personally take the time to have calls with my peers within every customer account. In some cases where the “CEO” is clearly not going to take my call (a top-ten bank globally), I go as high as I can within the organization to communicate our commitment and establish a direct connection. This may not fit your model (e.g., direct to consumer), but I find that it remains an invaluable opportunity even for expansion stage companies.

By the way, all our customers are “referenceable”…even those who were and are now not currently using our cloud service  ;-)

Go-To-Market That Works

There are three important aspects here:

  • Having a Go-To-Market strategy (yep, you actually need one of these to measure your results against)
  • Executing it in a way that gets solid economic results (your G2M needs to produce a profitable business)
  • Reaching the point where the benefits of growing your resources outweigh the more difficult continuous improvement of a larger set of resources (“just add water”)

These may seem like simple ideas, but I generally find that only 10% of my peers really understand the mechanics of these three things. Let’s take a brief look at each.

Having a G2M

One of the FIRST things we discussed as a team was G2M. I technically spent an entire quarter defining it, testing it with customers, and going back to defining it. This was a very iterative process that involved talking to real customer prospects, as well as ecosystem companies we felt were necessary in helping us execute on it. I’ll mention a few components involved:

  • Outline your ecosystem with your company in the center.
  • Understand your target market – early adopters versus early majority
  • Profile decision makers within your target market(s)
  • Understand the sales process involved
  • Define the Why, How, What of your product
  • Create a plan around lead-gen & qualification
  • Establish your Direct vs. Indirect plan
  • Define your sales process for both
  • Create competitive positioning
  • Distill your market messaging
  • Set your revenue goals
  • Define the resources required and when

Then we knew we had to measure how well our work was paying off. For us this meant having marketing and sales tied to mutual objectives and measuring the lead process from “marketing campaign” all the way to “customer win”. It also meant A LOT of work with Salesforce.com. Wow, you’d think SFDC would be easier…but if you feel that you’re creating everything from scratch in this tool, you are not alone. Key metrics we measure include:

  • Customer Acquisition Cost (CAC) ratio: Bruce Cleveland from Interwest as well as the Bessemer folks talk about this metric quite a bit. Our CAC ratio is 48%. Costs include campaign costs per win + allocated marketing salaries + direct sales commissions/salaries + inside sales commissions/salaries + pre-sales SE costs.
  • Sales cycle (from lead generation to close). Ours is 4 months.
  • Duration within the lead/sales lifecycle (all measurable phases). Email me for this.
  • Conversion rates throughout each phase. Email me for this.
  • Time from contract close to deployment (realizing revenue/value). This is 30 days.
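The CAC bullet lists what goes into acquisition cost but not what the 48% ratio is divided by, and investors define the ratio in different ways. The minimal Python sketch below assumes, for illustration only, that the ratio is fully loaded acquisition cost per win divided by first-year contract value. The class, the function, and all figures are hypothetical, not Infochimps data.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionCosts:
    """Per-win acquisition costs, using the components listed above (hypothetical values)."""
    campaign_cost_per_win: float
    allocated_marketing_salaries: float
    direct_sales_comp: float      # direct sales commissions + salaries
    inside_sales_comp: float      # inside sales commissions + salaries
    presales_se_cost: float

    def total(self) -> float:
        return (self.campaign_cost_per_win
                + self.allocated_marketing_salaries
                + self.direct_sales_comp
                + self.inside_sales_comp
                + self.presales_se_cost)


def cac_ratio(costs: AcquisitionCosts, first_year_contract_value: float) -> float:
    """ASSUMED definition: acquisition cost per win / first-year contract value."""
    if first_year_contract_value <= 0:
        raise ValueError("first-year contract value must be positive")
    return costs.total() / first_year_contract_value


# Hypothetical figures, for illustration only.
costs = AcquisitionCosts(20_000, 30_000, 50_000, 15_000, 25_000)
print(f"CAC ratio: {cac_ratio(costs, 300_000):.0%}")   # prints "CAC ratio: 47%"
```

Whatever denominator you choose, the useful habit is the same: define it once and track it consistently from quarter to quarter.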

Other “dashboard” items that assist us in measuring the business include:

  • Monthly Recurring Revenue: I look not only at what is committed under contract (CMRR) but also at the annual recurring revenue (ARR), i.e. the first 12 months of recurring revenue (CMRR x 12). Our average is around $20K MRR, or $240K ARR.
  • Total annual contract value (TACV): This includes both recurring and one-time fees (expert or professional services) within the first 12 months of the contract. I measure from the start of recurring revenue (when deployed) to 12 months out. This is currently around $300K.
  • Pipeline (Pipe): The all-odds pipeline includes everything. Within it, there’s what is forecast in the current quarter, plus the upside/backfill (which could come in this quarter or next), and everything else (next quarter and beyond). I measure MRR plus one-time fees, which together make up Total Contract Value. This is currently around $18M.
  • Churn: How many customers do you lose after 12 months? We require a 12-month upfront commitment, so this milestone is important. Churn is 20% for us (due to our early focus on startups…need I say more?).
  • Customer Lifetime Value (CLTV): For our target customers, we’re seeing a minimum three-year lifetime on projects, at an average of $780K.
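The dashboard averages fit together arithmetically. The Python sketch below reconciles them under one assumption that the post does not state: that the $780K lifetime value is one-time fees plus three years of flat ARR, with no expansion and no churn. The function names are illustrative.

```python
def annual_recurring_revenue(mrr: float) -> float:
    """ARR as defined above: the first 12 months of committed MRR (CMRR x 12)."""
    return mrr * 12


def total_annual_contract_value(mrr: float, one_time_fees: float) -> float:
    """TACV: first 12 months of recurring revenue plus one-time (expert services) fees."""
    return annual_recurring_revenue(mrr) + one_time_fees


def customer_lifetime_value(mrr: float, one_time_fees: float, years: int) -> float:
    """Simplified CLTV (assumption): one-time fees plus `years` of flat ARR."""
    return one_time_fees + annual_recurring_revenue(mrr) * years


mrr = 20_000
one_time = 300_000 - annual_recurring_revenue(mrr)   # implied by the ~$300K TACV: 60000
print(annual_recurring_revenue(mrr))                  # 240000
print(total_annual_contract_value(mrr, one_time))     # 300000
print(customer_lifetime_value(mrr, one_time, 3))      # 780000
```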

Economic Results

We compare EVERY prospective customer deployment on our cloud against our “target profile”, or what we call a “Standard Reference Platform (SRP)” customer. Infochimps has a “model customer” for which we define everything from revenue to profitability; it’s a full “margin model”. For all the supported cloud configurations for our customers, we look at variable costs (all cost of goods, customer support, delivery services), which determine gross profit, and then at allocated indirect costs (allocated sales and marketing, R&D, etc.), which determine profit (for a complete P&L view). My guidance to my peers is to make sure your gross profits (gross margins) are 70-90% and your profit margins are 25-35%. This is a healthy operating model ;-) .
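The margin model reduces to two ratios checked against the target bands. Below is a minimal Python sketch that assumes revenue and costs are annual per-customer figures. The numbers are hypothetical, and only the 70-90% and 25-35% bands come from the post.

```python
from dataclasses import dataclass


@dataclass
class MarginModel:
    revenue: float
    variable_costs: float   # cost of goods, customer support, delivery services
    indirect_costs: float   # allocated sales and marketing, R&D, etc.

    @property
    def gross_margin(self) -> float:
        return (self.revenue - self.variable_costs) / self.revenue

    @property
    def profit_margin(self) -> float:
        return (self.revenue - self.variable_costs - self.indirect_costs) / self.revenue

    def is_healthy(self) -> bool:
        """Target bands from the text: 70-90% gross margin, 25-35% profit margin."""
        return 0.70 <= self.gross_margin <= 0.90 and 0.25 <= self.profit_margin <= 0.35


# Hypothetical per-customer annual figures, for illustration only.
customer = MarginModel(revenue=300_000, variable_costs=60_000, indirect_costs=150_000)
print(f"gross {customer.gross_margin:.0%}, profit {customer.profit_margin:.0%}, "
      f"healthy={customer.is_healthy()}")   # gross 80%, profit 30%, healthy=True
```

Running every supported cloud configuration through the same check is what turns the “model customer” into a comparison you can apply to each new deal.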

Just Add Water

We all know that if your operational model requires many “human” moving parts, adding more customers will raise the cost of doing business non-linearly. In this case, “just adding water” doesn’t lead to scalable growth, but rather to a business that will implode. Over the past six months the Infochimps team has focused on two things to make sure we avoid this outcome and can scale with lots of operational leverage:

  • Hardening our existing cloud services so that they support the largest deployments (making sure that as our size of customers, our size of problems, and the size of our cloud deployments all grow, we don’t experience a non-linear amount of effort in supporting them)
  • Automation of our cloud deployment and management such that we have superior operational leverage. Where we need a tenth of an operations engineer, our customers would need 10 people to do the same work…and by achieving this, we’ll always be 24 months ahead of our customers.

This concept also means that when we add another “sales team” (a combination of direct sales, inside sales, pre-sales SE, and post-sales expert services), the number of customers, the resulting revenue, and ultimately the net profits all scale well. I’m proud to say that over the past six months we have surpassed most companies our size with a process and “equation” that supports this. A message to my peers: it comes down to really understanding your margin model, and making sure the entire organization also understands its contribution to improving it.
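To make the “just add water” argument concrete, the hypothetical Python model below compares two operations. In one, cost grows linearly with the number of customers (automated). In the other, effort grows faster than the customer count (many human moving parts). The exponent, team cost, and revenue figures are invented assumptions, chosen only to show the shape of the two curves.

```python
def ops_cost(customers: int, cost_per_customer: float, overhead_exponent: float) -> float:
    """Operating cost for a customer base.

    overhead_exponent = 1.0 models an automated, linearly scaling operation;
    values > 1.0 model an operation whose effort grows faster than the
    number of customers (a hypothetical assumption).
    """
    return cost_per_customer * customers ** overhead_exponent


def profit(sales_teams: int, customers_per_team: int, revenue_per_customer: float,
           team_cost: float, cost_per_customer: float, overhead_exponent: float) -> float:
    """Annual profit when each sales team lands a fixed number of customers."""
    customers = sales_teams * customers_per_team
    revenue = customers * revenue_per_customer
    return (revenue - sales_teams * team_cost
            - ops_cost(customers, cost_per_customer, overhead_exponent))


# Hypothetical parameters: each team lands 5 customers at $300K/yr and costs $600K/yr.
for teams in (1, 2, 4, 8):
    automated = profit(teams, 5, 300_000, 600_000, 60_000, 1.0)
    manual = profit(teams, 5, 300_000, 600_000, 60_000, 1.5)
    print(f"{teams} teams: automated ${automated:,.0f}  manual ${manual:,.0f}")
```

Under these assumptions each added team contributes the same profit in the linear case. In the superlinear case, profit is already negative by the second team, which is the “implode” scenario described above.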

Foundation For Growth

This is the most subjective characteristic, and yet the one that could have the greatest impact on your organization. The good news is that with the right leadership you can make changes that ensure your ability to scale your business…and this, indeed, is all about “scaling” with “people processes”.

Remember, this means having adequate organizational and operational methodologies to scale your business. The areas within this category that we’ve focused on at Infochimps include:

  • Establishing a vision that all understand and support
  • Creating an ROI-focused organization (people understand that everything affects the P&L)
  • Innovating by focusing on sustainable competitive advantage
  • Being nimble and comfortable with change
  • Hiring people with passion & commitment
  • Fostering a level of communication that is “straight but sensitive”
  • Creating a business that is centered around the customer (solving real problems)
  • Creating a corporate culture which is about the “we” not “I”

I’ll add that we also deploy many typical processes, like an agile engineering process and a lean but effective product realization process. But let me focus on the more “fluffy” side for a moment. I’ll give you an example (which my executive team didn’t exactly appreciate at first).

As a CEO, I know my job is to set the vision/direction of the company; make sure that we have the right people/resources; make sure we’re executing well; and ultimately remove obstacles. However, what I believe many of my peers discount is that if you establish the proper executive team communication practices, and push those down into the organization, your company can withstand the challenges associated with scaling to any size.

I can’t tell you how many times an executive has told me they operate well under stress and then fallen significantly short…..and it always comes down to communication at its core. One example of how we address “communication” issues is at our executive meetings. Our weekly executive meetings have a seemingly standard agenda….except for one major difference. Here’s our agenda:

  1. Good news check-in
  2. Discussion around “Real Issues” & top priorities
  3. Customer and employee hassles
  4. Review of overall quarterly status
  5. Commitments/cascading messages
  6. Wrap – one sentence close

Notice the discussion around “real issues”? Here’s where our staff meetings stray from most. Our definition of a “real issue”:

  • A topic that would make your stomach churn if brought up as a team
  • Something that you are uncomfortable talking about (especially as a team)
  • Event(s) which are affecting the group (staff, company) negatively

Why does our executive meeting need to address “real-issues”?

  • Teams (companies) fail based on process (team dynamics) not content (what is actually being talked about)
  • Every team “hits a wall”. Great teams work through the “real issues”
  • Every “real issue” that has the potential of “blowing the team apart” is exactly what makes it stronger
  • Reality always wins. It’s our job to get in touch with it.
  • There are no secrets in teams; only dysfunctional teams think there are.

Our executive team actually works through the issues that help us make the changes we need to grow. That’s what every expansion stage company needs, and what most early-stage companies lack. As an executive team we constantly assess and work to improve our ability to operate (see Patrick Lencioni’s The Five Dysfunctions of a Team).

Curious about my management style? Have other ideas about “expansion stage” companies? I’m always available for beers after work.

Posted in Leadership.



Customized, Intelligent, Vertical Applications – the future of Big Data?

 

The Ideal Big Data Application Development Environment

Let’s assume that your entire organization had access to the following building blocks:

  • Data: All sources of data from the enterprise (at rest and in motion)
  • Analytics: Queries, Algorithms, Machine Learning Models
  • Application Business Logic: Domain specific use-cases / business problems
  • Actionable Insights: Knowledge of how to apply analytics against data through the use of application business logic to produce a positive impact to the business
  • Infrastructure Configuration: Highly scalable, distributed, enterprise-class infrastructure capable of combining data and analytics with app logic to produce actionable insights

Imagine if your entire organization were empowered to produce data-driven applications tailored specifically to your vertical use cases?

Data-Driven Vertical Apps

You are a regional bank that is under heavier regulation, focused on risk management, and expanding your mobile offerings. You are seeking ways to get ahead of your competition through the use of Big Data by optimizing financial decisions and yields.

What if there was an easy and automated way to define new data sources, create new algorithms, apply these to gain better insight into your risk position, and ultimately operationalize all this by improving your ability to reject and accept loans?

You are a retailer who is being affected by the economic downturn, demographic shifts, and new competition from online sources. You are seeking ways of leveraging the fact that your customers are empowered by mobile and social by transforming the shopping experience through the use of Big Data.

What if there was an easy and automated way to capture all customer touch points, create new segmentation and customer experience analytics, apply these to create a customized cross-channel solution which integrates online shopping with social media, personalized promotions, and relevant content?

You are a fixed-line operator, wireless network provider, or fixed broadband provider in the middle of a convergence of both services and networks, and feeling price pressure on existing services. You are seeking ways to leverage cloud and Big Data to create smarter networks (autonomous and self-analyzing), smarter operations (improving the efficiency and capacity of day-to-day operations), and ways to leverage subscriber demographic data to create new data products and services for partners.

What if there was an easy and automated way to start by consuming additional data from across the organization, and then deploy segmentation analytics to better target customers and increase ARPU?

It Starts With The “Infrastructure Recipe”

Ok. You are a member of the application development team. All you have to do is create a data-driven application “deploy package”. It’s your recipe specifying which data sources, analytics, and application logic to insert into this magical cloud service, which then produces your industry- and use-case-specific application. You don’t need to be an analytics expert, a DBA, or an ETL expert, let alone a Big Data technologist. All you need is a clear understanding of your business problem, and you can assemble the parts through a simple-to-use “recipe” that is abstracted from the details of the infrastructure used to execute it.
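The post does not specify what a deploy package looks like. Here is one possible shape, as a hypothetical Python sketch. It declares the recipe’s ingredients (data sources, analytics, business logic) and says nothing about the infrastructure that runs them. Every class name, field, and location string is a placeholder, not a real product format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSource:
    name: str
    location: str   # placeholder locator, e.g. a database URL or stream topic (hypothetical)
    mode: str       # "batch" (data at rest) or "stream" (data in motion)


@dataclass
class DeployPackage:
    """Hypothetical recipe: declares WHAT to run, not HOW or WHERE to run it."""
    application: str
    sources: List[DataSource] = field(default_factory=list)
    analytics: List[str] = field(default_factory=list)   # names of algorithms to apply
    business_logic: str = ""                              # module holding the app logic

    def validate(self) -> None:
        """Basic sanity checks before submitting the recipe."""
        if not self.sources:
            raise ValueError("a deploy package needs at least one data source")
        for src in self.sources:
            if src.mode not in ("batch", "stream"):
                raise ValueError(f"unknown mode {src.mode!r} for source {src.name}")


recipe = DeployPackage(
    application="loan-risk-scoring",
    sources=[DataSource("loan_history", "postgres://warehouse/loans", "batch"),
             DataSource("applications", "kafka://apps-topic", "stream")],
    analytics=["default_risk_model"],
    business_logic="loan_decisions",
)
recipe.validate()
```

Keeping the recipe purely declarative is what would let the service decide on its own how to run batch and streaming sources.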

Any Data Source

Imagine an environment where your enterprise data is at your fingertips. No heavy ETL tools, no database exports, no Hadoop Flume or Sqoop jobs. Access to data is as simple as defining “nouns” in a sentence. Where your data lives is not a worry. You are equipped with the magic ability to simply define what the data source is and where it lives, and access to it is automated. Nor do you care whether the data is a large historic volume living in a relational database or real-time streaming event data.
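One way to read “defining nouns” is as a registry. Each source is declared once by name and location, and consumers iterate over records without knowing whether the data is at rest or in motion. The Python sketch below is a hypothetical, single-process illustration: a CSV string stands in for a relational table, and a `queue.Queue` stands in for a live event stream. The helper names are invented for this example.

```python
import csv
import io
import queue
from typing import Callable, Dict, Iterator, Optional

Record = Dict[str, str]

# Registry of declared sources: a name mapped to a zero-argument opener.
SOURCES: Dict[str, Callable[[], Iterator[Record]]] = {}


def define_source(name: str, opener: Callable[[], Iterator[Record]]) -> None:
    """Declare what a source is and where it lives; nothing is read yet."""
    SOURCES[name] = opener


def records(name: str) -> Iterator[Record]:
    """Uniform access: callers never see whether the data is at rest or in motion."""
    return SOURCES[name]()


def read_table(handle: io.TextIOBase) -> Iterator[Record]:
    """Data at rest: rows from a CSV export (standing in for a relational table)."""
    yield from csv.DictReader(handle)


def read_stream(events: "queue.Queue[Optional[Record]]") -> Iterator[Record]:
    """Data in motion: consume events until a None sentinel arrives."""
    while (event := events.get()) is not None:
        yield event


define_source("trades_history",
              lambda: read_table(io.StringIO("symbol,qty\nIBM,100\nORCL,50\n")))

live: "queue.Queue[Optional[Record]]" = queue.Queue()
live.put({"symbol": "IBM", "qty": "10"})
live.put(None)                      # sentinel: end of this demo stream
define_source("trades_live", lambda: read_stream(live))

for name in ("trades_history", "trades_live"):
    print(name, [r["symbol"] for r in records(name)])
```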

Analytics Made Easy

Imagine a world where you can pick from literally thousands of algorithms and apply them to any of the above data sources, individually or in combination. You create one algorithm and can apply it to years of historic data and/or a stream of live real-time data. Also, imagine a world where formatting your data so that your algorithms can consume it is seamless. Lastly, your algorithms execute on infrastructure in a parallel, distributed, highly scalable way. Getting excited yet?
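Writing an algorithm once and running it over both historic and live data is easiest when the algorithm consumes one record at a time and keeps mergeable state. The Python sketch below uses a running mean/variance as a stand-in “algorithm”; this is our choice for illustration, not one named in the post. Welford’s update handles each event. The Chan et al. pairwise merge combines partial results, which is what allows partitions to be processed in parallel.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Mergeable count/mean/variance (Welford updates, Chan et al. merge)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0   # sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        """Combine partial results, e.g. from partitions processed in parallel."""
        if other.n == 0:
            return RunningStats(self.n, self.mean, self.m2)
        if self.n == 0:
            return RunningStats(other.n, other.mean, other.m2)
        n = self.n + other.n
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0


def score(values: Iterable[float]) -> Iterator[RunningStats]:
    """One algorithm for any input: yields a snapshot after each value."""
    stats = RunningStats()
    for x in values:
        stats.update(x)
        yield RunningStats(stats.n, stats.mean, stats.m2)


historic = [10.0, 12.0, 11.0, 13.0]
*_, batch_result = score(historic)    # batch: consume everything, keep the last snapshot


def live_feed() -> Iterator[float]:   # stands in for a real-time event stream
    yield from (14.0, 9.0)


for snapshot in score(live_feed()):   # streaming: act on every update
    print(f"n={snapshot.n} mean={snapshot.mean:.2f}")

# Results computed separately (e.g. on different nodes) can be merged.
combined = batch_result.merge(list(score(live_feed()))[-1])
```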

Focus on Applications With Actionable Insights

Now let’s embody this combination of analytics and data in a way that can actually be consumed and acted upon. Imagine a world where you can produce your insights and report on them with your BI tool of choice. That’s kind of exciting.

But what’s even more exciting is the ability to deploy your insights operationally through an application which leverages your domain expertise and understanding of the business logic associated with the targeted use-case you are solving against. Translation – you can code up a Java, Python, PHP, or Ruby application which is light, simple, and easy to build/maintain. Why? Because the underlying logic normally embedded in ETL tools, separate analytics software tools, MapReduce code, NoSQL queries, and stream processing logic is pushed up into the hands of the application developer. Drooling yet? Wait, it gets better.
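Taking the regional-bank scenario from earlier in this post, the “light” application code might amount to little more than a decision rule. In this hypothetical Python sketch, `risk_score` stands in for an insight served by the analytics platform. `toy_risk_model`, the 0.2 threshold, and the applicant fields are invented for illustration.

```python
from typing import Callable, Dict

Applicant = Dict[str, float]


def decide_loan(applicant: Applicant,
                risk_score: Callable[[Applicant], float],
                max_risk: float = 0.2) -> str:
    """Application-level business logic only; the heavy analytics live behind risk_score."""
    score = risk_score(applicant)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"risk score out of range: {score}")
    return "accept" if score <= max_risk else "reject"


def toy_risk_model(applicant: Applicant) -> float:
    """Stand-in for a platform-served model (hypothetical): capped debt-to-income ratio."""
    ratio = applicant["debt"] / max(applicant["income"], 1.0)
    return min(ratio, 1.0)


print(decide_loan({"income": 100_000, "debt": 15_000}, toy_risk_model))   # accept
print(decide_loan({"income": 50_000, "debt": 40_000}, toy_risk_model))    # reject
```

Swapping `toy_risk_model` for a better model would change the decisions without touching the application code, which is the separation the paragraph describes.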

Big Data, Cloud, and The Enterprise

Let’s take this entire programming paradigm and automate it within an elastic cloud service purpose-built for the organization, with the ability to submit your application “deploy packages” to be processed instantly without having to understand the compute infrastructure and, better yet, without having to understand the underlying data analytic services required to process your various data sources in real-time, near real-time, or batch mode.

Ok…if we had such an environment, we’d all be producing a ton of next-generation applications…..data-driven, highly intelligent, and specific to our industry and use-cases.

I’m ready…are you?

Posted in Cloud Computing, Data.


Infochimps Leading New Category of Intelligent Applications

Back in September, I briefly mentioned a new generation of data-driven applications, which I also refer to as intelligent applications (a new cloud category and era being led by Infochimps). We recently made a formal announcement of our new Enterprise Cloud for Big Data, powering intelligent application development.

Now Forrester is the first analyst group to reinforce this new category – a category that is more than Business Intelligence, and more than just predictive analytics.

“Forrester uses the term ‘smart computing’ to define apps that, for instance, provide direct access to data for decision-making. It also includes data analytics and business intelligence in the category.”

Intelligent applications will fuel a component of the software market to the tune of $41 billion in 2013 (out of a total software market of $542B), increasing to $48 billion in 2014. There is a turning point beginning this year, where application development will begin to incorporate real-time stream processing and analytics along with ad-hoc query and batch analytics to create more sophisticated, interactive, intelligent web and mobile applications.
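A quick proportion check on the Forrester figures quoted above (only the three dollar amounts come from the text; the rest is arithmetic):

```python
intelligent_2013 = 41e9        # intelligent applications, 2013
intelligent_2014 = 48e9        # intelligent applications, 2014
total_software_2013 = 542e9    # total software market, 2013

share_2013 = intelligent_2013 / total_software_2013
growth = intelligent_2014 / intelligent_2013 - 1

print(f"share of 2013 software market: {share_2013:.1%}")   # 7.6%
print(f"2013 -> 2014 growth: {growth:.1%}")                 # 17.1%
```

In other words, a category of under 8% of the software market is forecast to grow by roughly 17% in a single year.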

With the proper cloud infrastructure, companies can accelerate their development of smart computing apps, creating new SaaS products for their customers (B2C), and for other businesses (B2B).

One of the most interesting perspectives from Forrester’s report has to do with the leading intelligent application categories (below):

  • Enterprise Vertical Applications: $51B
  • Enterprise Process Applications: $118B
  • Information Management Applications: $28B
  • Desktop Applications: $32B

The big (data) question: how will these categories be affected by the combination of CLOUD and BIG DATA technologies moving forward?

Also, many debate whether a few companies will corner the market with intelligent vertical SaaS applications, or whether enterprises will complement SaaS applications with a large number of custom-developed applications that leverage their internal domain expertise.

I suspect that because Big Data is still so new, enterprises will need to create value from their data by first launching new internally-generated data-driven applications.

If you were one of many application developers within an enterprise, did you resist new technologies like JavaScript and HTML5 in the past? Why would creating data-driven or intelligent applications with Big Data technologies be any different? Justin LaFayette puts it well when he says that the future of Big Data is apps, not infrastructure:

“However the largest wave of Big Data value creation is still to come and it will focus on exploiting the infrastructure to create new applications that analytically optimize business processes.”

Rishidot Research’s Krishnan Subramanian believes the same. See his recent blog on Big Data At The Core of Platform Services or his presentation on “Intelligent Platforms“.

What do you think?

Related posts:

Big Data Predictions for 2013

Era of Analytic Applications – Part 1

Era of Analytic Applications – Part 2

Big Data’s Fourth Dimension – Time

Enterprise Big Data Cloud

New Cloud Ecosystem

The Data Era – Moving from 1.0 to 2.0

 

Posted in Cloud Computing, Data.



Big Data Predictions for 2013

Data-Driven Applications – the Big Data theme for 2013

My prediction for 2013 is that competitive advantage will come from enterprises using sophisticated Big Data analytics to create a new breed of applications: Data-Driven Applications, also referred to as Intelligent Applications.

“It’s more than just insights from MapReduce,” a CIO from a Fortune 100 company told me. “It’s about using data to make our customer touch points more engaging, more interactive, more data-driven.”

So when you hear about “Big Data solutions”, translate that into an emerging category and era of “Intelligent Applications”. At the end of the day, it’s not about people poring over petabytes of data. It’s about how one turns the data into revenue (or profits).

This means that you MUST:

  1. Start with the business problem first (preferably one with revenue upside versus cost savings)
  2. Determine which data elements you can leverage AFTER #1
  3. Define a three-tier data analytic architecture (as shown above)

Which Big Data market segments will grow the fastest in 2013?

Morgan Stanley named the top ten as follows:

  1. Healthcare
  2. Entertainment
  3. Com/Media
  4. Manufacturing
  5. Financial
  6. Business Services
  7. Transportation
  8. Web Tech
  9. Distribution
  10. Engineering

Many have predicted which industry is the most attractive (see McKinsey Quarterly for another take). I personally like Ad-Tech and Financial Services as verticals….followed by Information Management, Health (if you can partner to speed up sales cycles), and Communications.

But what about market segments by technology?

I predict, as shown above, that Data Analytics as a Service (also referred to as Big Data as a Service, or BDaaS) will have the highest growth (obviously building from a small revenue base given its level of maturity). Business Intelligence as a Service is the next high-growth segment, given the need for easier ways to present and visualize data, followed by Logging as a Service.

But don’t take my word for this….my data comes from prominent research organizations. I’m just compiling and presenting their data in a slightly new way.

What challenges will end-user organizations struggle with the most in 2013?

End-users will continue to struggle with making sense of the many technologies available. Is it EMC Greenplum connected to EMC Hadoop? Is it Cloudera Impala + Hadoop? Is it AsterData + Hortonworks? Is it MapR HBase + HDFS? I think one thing is certain….you have lots of options.

The biggest problem will be whether they are actually satisfying the needs of the business problem. Here are my leading predictions for end-user organizations:

  1. End users just want to solve problems, but will continue to fight IT over who owns the platform powering their much-needed data-driven applications
  2. Ultimately, end-users will be forced to chase “shiny objects” because IT groups will persuade them to wait for the “technology bake-offs” around the Big Data platform soon to be launched (24 months from now)
  3. In the end, many organizations will fail at creating value from Big Data due to a lack of focus on business problems, time-to-market, and in some cases the wrong technology choice

What are some of the key technologies that will dominate the Big Data market in 2013?

So many equate Big Data with Hadoop. But as you begin to see with announcements like Impala from Cloudera, it’s more than just Hadoop. It’s about servicing all the application response time requirements. It’s about volume, velocity, and variety but also time-to-value with your data analytics.

My prediction for 2013 is that you will need the following technology components:

  • Real-time stream processing
  • Ad-hoc analytics (see NoSQL and NewSQL data stores)
  • Batch Analytics

Not one, but all three!
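The three components can be illustrated with a toy, single-process Python sketch in which every incoming event is handled by all three tiers at once. The class and method names are hypothetical. In a real deployment each tier would be a separate distributed system, and this is not any vendor’s API.

```python
from collections import Counter
from typing import Dict, List, Tuple

Event = Tuple[str, float]   # (key, value), e.g. (symbol, trade amount)


class ThreeTierPipeline:
    """In-process stand-ins for the three tiers (hypothetical, not a product API)."""

    def __init__(self, alert_threshold: float) -> None:
        self.alert_threshold = alert_threshold
        self.alerts: List[Event] = []        # real-time tier output
        self.latest: Dict[str, float] = {}   # ad-hoc tier: queryable current state
        self.log: List[Event] = []           # batch tier: append-only history

    def ingest(self, event: Event) -> None:
        key, value = event
        # 1. Real-time stream processing: act on each event as it arrives.
        if value >= self.alert_threshold:
            self.alerts.append(event)
        # 2. Ad-hoc analytics: keep state that can be queried at any time.
        self.latest[key] = value
        # 3. Batch analytics: retain the full history for later bulk jobs.
        self.log.append(event)

    def query(self, key: str) -> float:
        """Ad-hoc query: the latest value seen for a key."""
        return self.latest[key]

    def batch_totals(self) -> Counter:
        """Batch job over the entire history: total value per key."""
        totals: Counter = Counter()
        for key, value in self.log:
            totals[key] += value
        return totals


pipe = ThreeTierPipeline(alert_threshold=1_000)
for e in [("IBM", 200.0), ("ORCL", 1_500.0), ("IBM", 300.0)]:
    pipe.ingest(e)
print(pipe.alerts)            # [('ORCL', 1500.0)]
print(pipe.query("IBM"))      # 300.0
print(dict(pipe.batch_totals()))
```

Each tier answers a different response-time requirement from the same events, which is why dropping any one of them leaves a gap.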

What steps can customers take to maximize competitive advantage with Big Data in 2013?

Competitive advantage is ALL about time-to-market. I have no doubt that every Global 2000 company will launch their Big Data initiatives in 2013. The question is when they will turn those initiatives into additional revenue…how long will it take from the time that they hire Accenture, CSC, Capgemini, IBM or the like to implement their Big Data strategies, to launching an intelligent application?

My prediction for 2013:

Cloud will become a large part of big data deployment – established by a new cloud ecosystem.

This will be driven by the need for time-to-market and, ultimately, competitive advantage. Cloud usually lags any disruption made behind the firewall….by at least 12 months. In the case of Big Data, the launch of Apache Hadoop 1.0 in December 2011 basically makes 2013 the year for Cloud-based Big Data.

That being said, large volumes of data, privacy and public cloud are not usually mentioned in the same paragraph by IT in a Global 2000 enterprise. That’s why we’re going to see elastic big data clouds behind the firewall and within trusted third party data center providers.

Related posts:

Era of Analytic Applications – Part 1

Era of Analytic Applications – Part 2

Big Data’s Fourth Dimension – Time

Enterprise Big Data Cloud

New Cloud Ecosystem

The Data Era – Moving from 1.0 to 2.0

 

Posted in Cloud Computing, Data.



