Data gravity has a significant impact on the entire IT infrastructure and should be taken into account when developing data storage management strategies. It is critical to ensure that no single data set has an outsized influence on the rest of the IT and application ecosystem.
Data, like physical capital and intellectual property, is now a critical asset for businesses in every vertical. With ever-increasing amounts of structured and unstructured data, data growth will continue at previously unheard-of rates in the coming years.
Meanwhile, data sprawl — the increasing dispersion of business data across data centres and geographies — complicates the challenges of managing data growth, movement, and activation.
Enterprises must develop a strategy for managing large amounts of data efficiently across cloud, edge, and endpoint environments. And it is critical now more than ever to develop a calculated plan when designing large-scale data storage infrastructure.
Enterprises should seek better economics, less friction, and a simpler experience as they work to overcome the cost and complexity of storing, moving, and activating data at scale. Doing so demands a new approach to data.
The concept of data gravity is critical to bear in mind during these endeavours.
According to a new Seagate-sponsored report from IDC, as storage for massive data sets grows, so will its gravitational pull on other elements of the IT universe.
Data gravity is a function of the volume and level of activation of the data. A suitable analogy is found in elementary physics: a body with a greater mass has a greater gravitational effect on the bodies in its vicinity. According to the IDC report, “workloads with the highest data volumes have the greatest mass within their ‘universe,’ attracting applications, services, and other infrastructure resources into their orbit.”
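As a rough illustration of this analogy, a dataset's "gravity" can be sketched as volume scaled by its level of activation, with the pull between two datasets following the Newtonian form. This is a toy model for intuition only; the scoring functions and units are assumptions, not a formula from the IDC report.

```python
def gravity_score(volume_gb: float, requests_per_sec: float) -> float:
    """Toy data-gravity score: volume (the 'mass') scaled by activation level."""
    return volume_gb * requests_per_sec

def attraction(score_a: float, score_b: float, latency_ms: float) -> float:
    """Pull between two datasets, by analogy with Newton's F = m1*m2 / d^2,
    using network latency as the 'distance' between them."""
    return (score_a * score_b) / (latency_ms ** 2)

# A large, active warehouse exerts far more pull than a small staging set:
warehouse = gravity_score(volume_gb=1_000_000, requests_per_sec=50)  # 1 PB
staging = gravity_score(volume_gb=1, requests_per_sec=5)             # 1 GB
pull = attraction(warehouse, staging, latency_ms=20.0)
```

The greater the larger dataset's score, the stronger the incentive to bring the smaller dataset, and its applications, into the larger one's orbit.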
A large and active dataset will invariably have an effect on the location and treatment of smaller datasets that require interaction. Thus, data gravity reflects the dynamics of the data lifecycle and must be used to inform IT architecture decisions.
Consider two datasets, one 1 petabyte in size and the other 1 gigabyte. When integrating the two, it is far more efficient to move the smaller dataset to the location of the larger one. As a result, the storage system holding the 1-petabyte set now also stores the 1-gigabyte set. Because large datasets tend to ‘attract’ smaller ones in this way, large databases accrete data, increasing their overall data gravity.
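The petabyte-versus-gigabyte example can be made concrete with a back-of-the-envelope transfer calculation. This is a minimal sketch under assumed conditions (a dedicated 10 Gbps link, no protocol overhead), not a sizing tool:

```python
def cheaper_to_move(size_a_gb: float, size_b_gb: float) -> str:
    """When integrating two datasets, transfer cost scales with bytes moved,
    so relocate the smaller set. Returns 'a' or 'b'."""
    return "a" if size_a_gb <= size_b_gb else "b"

def transfer_hours(size_gb: float, link_gbps: float = 10.0) -> float:
    """Rough wall-clock hours to ship size_gb over a link, ignoring overhead."""
    return (size_gb * 8) / (link_gbps * 3600)

# 1 PB vs 1 GB: moving the petabyte takes roughly 222 hours at 10 Gbps,
# while the gigabyte moves in under a second.
move = cheaper_to_move(1_000_000, 1)  # the 1 GB set ('b') should move
```

The asymmetry is the point: once a dataset is large enough, moving it becomes impractical, and everything else moves toward it instead.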
Additionally, managing, analysing, and activating data requires applications and services, whether provided by a private or public cloud vendor or an on-premises data management team. Applications both gather and generate data, and substantial processing must be performed on that data. Naturally, the larger a dataset becomes, the harder it is to use unless it sits close to the applications that need it. As a result, applications are frequently relocated close to data sets. Data gravity is therefore a property that spans the entire IT infrastructure, from on-premises data centres to public clouds and edge computing.
However, the IDC report notes that such massive data sets can resemble black holes. “Storing data, applications, and services in a single location is inefficient unless IT environments are designed to enable the migration and management of stored data, as well as the applications and services that rely on it, regardless of operational location.”
Because data gravity can have a significant impact on an entire IT infrastructure, it should be a primary design consideration when developing data management strategies. According to IDC, a critical objective when designing a data ecosystem is to “ensure that no single data set exerts uncontrollable force on the rest of the IT and application ecosystem.”
Ensuring applications have access to data, regardless of location
IT architecture strategies should prioritise mass storage and data movement. This begins with data location optimization. A data-centred architecture locates applications, services, and user interaction closer to the data itself, rather than relying on time-consuming and frequently expensive long-distance transfers of large amounts of data to and from centralised service providers.
According to IDC, one way to mitigate data gravity’s impact is to ensure that stored data is colocated adjacent to applications regardless of their location.
This model is possible through the use of colocated data centres that house multiple private and public cloud service providers.
A data-centric architecture’s fundamental goal is data accessibility. Accessibility can have a positive impact on future business innovation by enhancing the ability to generate metadata and new datasets, facilitating search and discovery, and empowering data scientists to deploy data for machine learning and artificial intelligence.
Placing data at the centre of the IT architecture also improves application performance optimization. Overall data reliability and durability are further significant benefits: reliability refers to the ability to access data when needed, while durability refers to the ability to preserve data over an extended period of time.
Put data at the centre of IT strategy
Taken together, these factors have a significant impact on enterprise data management planning — from defining an overall IT strategy to developing a business initiative. Planning the necessary workloads and jobs entails taking data gravity into account.
Key questions to ask include:
- What is the volume of data being generated or consumed?
- How is data distributed across the data centre, private clouds, public clouds, edge devices, and remote and branch offices?
- What is the velocity of the data being transmitted across the entire IT ecosystem?
Addressing these issues will improve the data infrastructure’s efficiency and may prevent costly data pipeline issues in the future.
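As a sketch, the three questions above could be answered from a simple inventory of where data lives and how it flows. The locations and figures below are hypothetical examples, not data from the report:

```python
# Hypothetical inventory records: (location, size_gb, gb_transferred_per_day)
inventory = [
    ("on-prem-dc", 400_000, 2_000),
    ("public-cloud", 250_000, 5_500),
    ("edge", 12_000, 900),
]

# Q1: total volume of data being generated or consumed
total_volume_gb = sum(size for _, size, _ in inventory)

# Q2: distribution of that data across operational locations
by_location = {loc: size for loc, size, _ in inventory}

# Q3: velocity of data moving across the IT ecosystem
daily_velocity_gb = sum(vel for _, _, vel in inventory)
```

Even a coarse snapshot like this reveals where mass is accumulating and which links carry the most traffic, before those pressures surface as pipeline problems.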
“Do not let a single workload or operational location dictate the movement of storage or data resources,” IDC advises in its report. Data infrastructure must therefore be designed so that no single large workload exerts a disproportionate gravitational pull on storage resources.
This requires constant awareness of which datasets are being pulled where, the most efficient way to move the data, and what optimises the performance of those workloads. This can also include automating data movement in order to reduce storage costs or relocating underperforming datasets that are no longer required.
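One way to automate this kind of data movement is a simple age-based tiering policy: datasets that have gone untouched for long enough become candidates for migration to cheaper storage. The policy below is a hypothetical illustration; the threshold, tier names, and `Dataset` shape are assumptions, not taken from the IDC report:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Dataset:
    name: str
    size_gb: float
    last_accessed: datetime
    tier: str = "hot"  # "hot" = fast, expensive storage

def plan_cold_moves(datasets, cold_after_days=90, now=None):
    """Return names of hot datasets untouched for cold_after_days,
    as candidates for migration to cheaper storage."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=cold_after_days)
    return [d.name for d in datasets
            if d.tier == "hot" and d.last_accessed < cutoff]
```

In practice such a policy would also weigh transfer cost and retrieval latency, but even this skeleton captures the core idea: track which datasets are pulled where, and relocate the ones whose gravity no longer justifies premium placement.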
Putting these concepts into practice requires adaptive data architecture, infrastructure, and management processes. An organisation's data gravity considerations may be clear today, but they may look quite different in five years.
“Not every enterprise manages multiple massive data sets,” IDC notes in the report. “However, many do. And, given the speed with which businesses are digitising and the importance placed on enterprise data and data collection, many organisations will soon find themselves managing massive data sets.”
Each data management system should evolve in response to changing data requirements. Data management, as well as the data architecture that supports it, must be adaptable to changing business requirements.