Moore’s Law:
I hope it’s safe to say that Moore’s Law has become as significant for data storage as Einstein’s mass-energy equivalence is for physics. As a quick recap: Gordon Moore, co-founder of Intel, observed and predicted that the number of transistors per square inch on integrated circuits would double at a regular interval, popularly quoted as every 18 months, while the cost of the ICs decreased. In this post, we will examine how Moore’s Law has played out in Data Warehousing and Business Intelligence over the last few years.
Traditionally, Enterprise Data Warehouse (EDW) applications have required more data storage and computing power than most other applications in an organization. So, let’s look at how the cost of data storage has changed over the years:
- From 1993 to 2013, the cost of a terabyte of disk storage dropped from $2 million to less than $100. (This is the cost of the drives alone. Storage-management hardware and software are separate.)
- Dynamic RAM (DRAM), the volatile memory of the computer, has seen a similar drop in price ($250 for every 4 megabytes of 8-bit memory in 1993 versus $450 for every 32 gigabytes of 64-bit memory today). Today’s servers can be fitted with 64,000 times as much memory for only twice the cost of 20 years ago.
- Cost per gigabyte is only part of the story. There has been a similarly dramatic increase in density (from 1 MB to 8 GB per chip), allowing for smaller installations that take up less space and use less energy.
- A proprietary Unix server with four Intel 486 chips, 384 MB of RAM, and 32 GB of storage cost $650,000 in 1993, which is more than $1 million in today’s dollars. Today’s mid-range laptops of the same capacity cost between $2,000 and $4,000.
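To put the figures above next to Moore’s Law, here is a rough back-of-the-envelope sketch in Python. It assumes each price decline was a steady exponential over the 20 years, which is a simplification, and it treats the "less than $100" disk figure as exactly $100. The function names are made up for this illustration.

```python
import math

YEARS = 20  # 1993 to 2013, as quoted in the list above


def improvement_factor(old_value: float, new_value: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_value / new_value


def halving_time_months(factor: float, years: float) -> float:
    """Months per price halving, assuming a steady exponential decline."""
    return years * 12 / math.log2(factor)


def moore_factor(years: float, doubling_months: float = 18) -> float:
    """Improvement predicted by a doubling every `doubling_months` months."""
    return 2 ** (years * 12 / doubling_months)


# Disk: $2 million per TB down to (at most) $100 per TB.
disk_factor = improvement_factor(2_000_000, 100)

# DRAM: compare cost per megabyte, $250 per 4 MB versus $450 per 32 GB.
dram_cost_1993 = 250 / 4
dram_cost_2013 = 450 / (32 * 1024)
dram_factor = improvement_factor(dram_cost_1993, dram_cost_2013)

print(f"Disk cost fell {disk_factor:,.0f}x, "
      f"halving every {halving_time_months(disk_factor, YEARS):.1f} months")
print(f"DRAM cost fell {dram_factor:,.0f}x, "
      f"halving every {halving_time_months(dram_factor, YEARS):.1f} months")
print(f"An 18-month doubling over {YEARS} years predicts "
      f"{moore_factor(YEARS):,.0f}x")
```

Under these assumptions, disk prices halved roughly every 17 months and DRAM prices roughly every 20 months, both in the neighborhood of the popular 18-month figure.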
[Figure: pictorial representation of these factors]
As storage costs have fallen over the years, many IT firms have started offering storage and computing power as a service (cloud computing) for Data Warehousing and Business Intelligence.
Cloud computing for DW/BI:
Cloud computing has emerged as one of the hot topics of the last few years, promising affordable, “pay as you go” computing infrastructure that minimizes both the upfront investment in infrastructure and the lead time required to deploy compute resources for new projects. With so many big players in IT providing cloud services, the time to provision even moderately complex environments can be reduced to under an hour, with entry-level costs of less than one dollar per hour. However, cloud-based environments for big data analytics, or more specifically, data warehousing analytics for structured data, are not appropriate for all use cases.
Here are some of the criteria that can be used to determine whether a DW/BI solution is a good fit for cloud computing:
- Total volume of data
- Volume of data to be loaded daily
- Sensitivity of data/Regulatory and compliance requirements
- Scope of Analytics (e.g. mart or full-scale EDW)
- Primary environment use (e.g. dev/test/production)
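As a rough illustration, the five criteria above could be captured in a small checklist like the Python sketch below. Everything in it is hypothetical: the `Workload` fields, the `cloud_concerns` function, and especially the thresholds (50 TB total, 500 GB per day) are placeholders chosen for the example, not recommendations. It also assumes that larger volumes, regulated data, full-scale EDW scope, and production use are the factors that call for a closer look.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    total_volume_tb: float   # total volume of data
    daily_load_gb: float     # volume of data to be loaded daily
    regulated_data: bool     # sensitivity / regulatory and compliance needs
    scope: str               # "mart" or "edw"
    environment: str         # "dev", "test", or "production"


def cloud_concerns(w: Workload,
                   max_total_tb: float = 50.0,    # hypothetical threshold
                   max_daily_gb: float = 500.0    # hypothetical threshold
                   ) -> List[str]:
    """Return the criteria that deserve a closer look before moving to cloud."""
    concerns = []
    if w.total_volume_tb > max_total_tb:
        concerns.append("total data volume is large")
    if w.daily_load_gb > max_daily_gb:
        concerns.append("daily load volume is large")
    if w.regulated_data:
        concerns.append("data is sensitive or regulated")
    if w.scope == "edw":
        concerns.append("scope is a full-scale EDW")
    if w.environment == "production":
        concerns.append("primary use is production")
    return concerns


dev_mart = Workload(total_volume_tb=2, daily_load_gb=20,
                    regulated_data=False, scope="mart", environment="dev")
print(cloud_concerns(dev_mart))  # an empty list means no flagged concerns
```

A small development data mart raises no flags here, while a large, regulated production EDW would raise all five. In practice each organization would set its own thresholds and weigh the criteria differently.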
Assuming a business fits the characteristics mentioned above, let us look at some of the major players offering cloud-based services for DW/BI. These services not only provide storage and computing power but have also changed the way organizations work with their data and analytics.
Some cloud services for DW/BI:
Amazon Redshift:
No discussion of cloud data warehousing is complete without Amazon Redshift. The service was one of the early entrants in this area and still offers strong capabilities at highly competitive pricing, starting as low as 85 cents per hour.
What makes Redshift a good choice:
- Fast - Optimized for DW and scalable
- Cheap - No upfront costs, pay-for-use
- Simple - Get started in minutes and auto-backed up
- Secure - Encrypted, isolated network with CloudTrail audit logging
- Compatible - With many SQL databases
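To get a feel for what pay-for-use pricing means in practice, here is a small Python sketch based on the 85 cents per hour entry price quoted above. It assumes billing is purely hourly per node and ignores storage, data transfer, and any discounted pricing, so treat the results as illustrative only. The function name and the usage patterns are made up for this example.

```python
HOURLY_RATE_USD = 0.85  # entry price quoted in this post; actual pricing varies


def pay_per_use_cost(hours_per_day: float, days: int,
                     nodes: int = 1,
                     rate: float = HOURLY_RATE_USD) -> float:
    """Estimated cost in USD, assuming simple hourly billing per node."""
    return round(hours_per_day * days * nodes * rate, 2)


# One node running around the clock for a 30-day month.
always_on = pay_per_use_cost(hours_per_day=24, days=30)

# One node used only 8 hours a day on 22 working days.
business_hours = pay_per_use_cost(hours_per_day=8, days=22)

print(f"Always on:      ${always_on:,.2f} per month")
print(f"Business hours: ${business_hours:,.2f} per month")
```

Even the always-on case comes to a few hundred dollars a month under these assumptions, which is a very different scale from the $650,000 server of 1993.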
IBM DB2 with BLU Acceleration:
IBM DB2 with BLU Acceleration is a next-generation database technology for in-memory computing. Combining innovations from IBM’s research and development labs, BLU Acceleration is designed to deliver fast insight from both real-time operational data and historical data.
Salient features:
- Fast - claims 35x faster analytics with in-memory processing
- Simple - Transactions and analytics together
- Agile - Available on-premises and in the cloud, with extensive SQL compatibility
Future of DW/BI:
With 80% of data in unstructured formats, extracting, transforming, loading, and analyzing data from unstructured sources will be the next big thing for Data Warehousing and Business Intelligence applications. As we have discussed in previous blog posts, DW 2.0 is already making its way toward this goal, and these applications may soon be able to process petabytes of data in the cloud to provide business insights.
References: