What’s Big Data?

Big Data is a phrase heard a lot these days, yet many businesses remain unsure what to do with Big Data or how to prepare for it. Even so, Gartner predicts that Big Data will affect every industry – banking, healthcare, and transportation, just to name a few.

Let’s put a definition to the phrase “Big Data”. Gartner defines Big Data as high-volume, high-velocity, and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. As the word “Big” suggests, the term deserves a rather broad definition. For a shorter version, just remember the three Vs: Volume, Velocity, and Variety.

  • Volume – increasing amount of data
  • Variety – the increasing range of data types and sources
  • Velocity – the increasing speed of data

 

Big Data started to creep onto the stage in the 1990s in a fairly significant way – in the form of ERP and CRM applications. Businesses noticed the improved automation and began spending more on office computers and applications to help improve business efficiency. The data collected at this stage was called “Structured Data,” meaning it had a high degree of organization. As data storage became cheaper, organizations began collecting Semi-Structured Data and eventually Unstructured Data. Unstructured Data was previously considered beyond the reach of analysis and is now seen as a new frontier. Some organizations write off Unstructured Data as junk (and most CIOs and executives would agree that the majority of it is junk).

Now for the question of the day: Is it worth pursuing Unstructured Data if 99% is junk and 1% is pure gold? Ten years ago, the answer would have been NO! But today’s industries are open to exploring Unstructured Data in ways that were simply not possible before, because storage has become so inexpensive. Technological and manufacturing improvements have made storage significantly more affordable over the years – 1 GB that cost $200,000 in 1980 now costs under $0.10.

Figure 1. Storage Cost Decline 1985-2015

Cost of Storage

Source: mkomo.com, A History of Storage Cost, 3/9/2014.
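To make the 99% junk, 1% gold question concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the two prices quoted above and the hypothetical 1% figure from the question. The function names are made up for illustration, and $0.10 is treated as an upper bound because the text says "under $0.10".

```python
# Prices quoted in the text above (US dollars per gigabyte).
COST_PER_GB_1980 = 200_000.00
COST_PER_GB_NOW = 0.10  # upper bound: the text says "under $0.10"

# The text's hypothetical: only 1% of Unstructured Data is "pure gold".
USEFUL_FRACTION = 0.01


def decline_factor(old_cost, new_cost):
    """Return how many times cheaper new_cost is than old_cost."""
    return old_cost / new_cost


def cost_per_useful_gb(cost_per_gb, useful_fraction):
    """Cost of storing enough raw data to end up with 1 GB of useful data."""
    return cost_per_gb / useful_fraction


factor = decline_factor(COST_PER_GB_1980, COST_PER_GB_NOW)
then = cost_per_useful_gb(COST_PER_GB_1980, USEFUL_FRACTION)
now = cost_per_useful_gb(COST_PER_GB_NOW, USEFUL_FRACTION)

print(f"Price drop per GB: about {factor:,.0f}x")
print(f"1 useful GB at 1980 prices: ${then:,.0f}")
print(f"1 useful GB at today's prices: ${now:,.2f}")
```

Under these assumptions, keeping 100 GB of raw data to find 1 GB of gold goes from roughly $20 million to about $10, which is why the answer to the question has changed.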

As storage prices have fallen, the amount of Unstructured Data being stored has moved in the opposite direction. Figure 2 illustrates the changes in the type of data being archived:

Figure 2. Type of Data Being Stored 2008-2015

Structured vs Unstructured

Source: analysis-bisolutions.blogspot.com, Total Archived Capacity 2008-2015 (Petabytes), 2/19/2015

Why is Big Data Being Talked About?

What are the possibilities of Big Data? Netflix uses big data as variables in its algorithm to suggest movies and TV shows to its users. Amazon uses big data to suggest products to its users and to remind them of products they have previously viewed. Facebook has been used for marketing and as a tool for polling leaders in presidential elections. In sum, the first frontier that big data has breached is marketing and e-commerce – but future frontiers lie ahead in healthcare, manufacturing, and education, just to list a few.

How Much Data is Out There?

 

Let’s take a look at the uptick of Big Data…

Statistics from 2011 highlight the sheer number of video playbacks:

  • 1 trillion: The number of video playbacks on YouTube
  • 140: The number of YouTube video playbacks per person on Earth
  • 48: Hours of video uploaded to YouTube every minute
  • 5: Percentage of the U.S. internet audience that viewed video online
  • 4 billion: Number of videos viewed online per month (October 2011)
  • 43: Percentage share of all worldwide video views delivered by Google sites, including YouTube
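A quick way to get a feel for these numbers is to run some simple arithmetic on them. The short Python sketch below takes three of the YouTube figures from the list as given and derives two others: the world population they imply, and how much video is uploaded in a single day. The variable names are illustrative only, and a year is assumed to be 365 days.

```python
# YouTube figures for 2011, taken from the list above.
PLAYBACKS_2011 = 1_000_000_000_000   # 1 trillion playbacks
PLAYBACKS_PER_PERSON = 140           # playbacks per person on Earth
UPLOAD_HOURS_PER_MINUTE = 48         # hours of video uploaded every minute

# Population implied by dividing total playbacks by playbacks per person.
implied_population = PLAYBACKS_2011 / PLAYBACKS_PER_PERSON

# Hours of video uploaded in one day (60 minutes x 24 hours).
upload_hours_per_day = UPLOAD_HOURS_PER_MINUTE * 60 * 24

# The same amount expressed in years of continuous viewing (365-day year).
upload_years_per_day = upload_hours_per_day / (24 * 365)

print(f"Implied world population: {implied_population / 1e9:.2f} billion")
print(f"Video uploaded per day: {upload_hours_per_day:,} hours")
print(f"That is about {upload_years_per_day:.1f} years of footage every day")
```

The implied population lands a little above 7 billion, so the first two bullets are consistent with each other.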

Americans consumed an estimated 3.6 zettabytes of information in 2008 (a zettabyte is equal to 1 billion terabytes)…and the figure continues to grow. Cisco estimates that in 2016, 130 exabytes will travel through the internet (an exabyte is equal to 1 million terabytes).
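For readers who want to check the unit conversions, here is a small Python sketch. It assumes decimal (SI) prefixes, where each step up is a factor of 1,000, which matches the "1 billion terabytes" and "1 million terabytes" figures above. The `convert` helper is a made-up name for illustration.

```python
# Bytes per unit, assuming decimal (SI) prefixes.
BYTES_PER_UNIT = {
    "TB": 10**12,  # terabyte
    "PB": 10**15,  # petabyte
    "EB": 10**18,  # exabyte
    "ZB": 10**21,  # zettabyte
}


def convert(value, from_unit, to_unit):
    """Convert a quantity of data between the units listed above."""
    return value * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


# Check the two definitions given in the text.
print(f"1 ZB = {convert(1, 'ZB', 'TB'):,.0f} TB")
print(f"1 EB = {convert(1, 'EB', 'TB'):,.0f} TB")

# Express the two quoted estimates in terabytes.
print(f"3.6 ZB = {convert(3.6, 'ZB', 'TB'):,.0f} TB")
print(f"130 EB = {convert(130, 'EB', 'TB'):,.0f} TB")
```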

Figure 3. 2016 Every Minute of the Day

Every Minute of the Day

Source: aci.info, Data Never Sleeps, 7/12/2014

 

Summary

Big Data is continuing to make strides across organizations on a global scale, with growth that is particularly visible in e-commerce and marketing. The drastic reduction in storage pricing has opened up the new frontier of Unstructured Data and, with it, even more possibilities for exploration. Big data will continue to have a large (and growing) impact on industries – and examples of its use continue to build the case for paying more attention to it.