Moving Data to the Cloud

Posted 1 month ago by Delv

Introducing a cloud environment into an organization can be an overwhelming process. There are many aspects to consider, including how to incorporate the new cloud environment into your network infrastructure, how to authenticate users across environments, and how to share files between on-premises and cloud systems. But one of the most common and foundational questions is: How will we move our data to the cloud? And, relatedly: what might our future-state cloud data environment look like? This article addresses those core questions, looking at:

  1. Cloud migration
  2. Data management when moving to the cloud
  3. Cloud data access

1. Cloud Migration

Key considerations when planning a cloud data migration are:

  1. Where will your data be stored in the cloud, and what database options are there?  
  2. What cloud architecture will you need, in terms of data lakes, data warehouses, etc.?
  3. How will you move your data from on-premises to the cloud, and how will you transform it once it is there?
  4. What security considerations do you need to take into account? 

1.a. Cloud Data Storage.  There are two primary types of cloud data storage: 

Cloud Storage can be thought of as simply a folder structure, not unlike a typical network file share structure (\\share\folder\file).  Cloud storage is often referred to as a ‘bucket’ (e.g. an Amazon S3 bucket).   
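As a small illustration of how a familiar file-share path maps onto a bucket's flat namespace (the path and names here are hypothetical):

```python
def share_path_to_object_key(path: str) -> tuple[str, str]:
    """Map a network share path like \\\\share\\folder\\file to the
    (bucket, key) pair used by an object store's flat namespace."""
    parts = [p for p in path.replace("\\", "/").split("/") if p]
    bucket, key = parts[0], "/".join(parts[1:])
    return bucket, key

print(share_path_to_object_key(r"\\share\folder\file"))
# -> ('share', 'folder/file')
```

The "folders" inside a bucket are really just key prefixes; the store itself is flat.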

Database Storage has multiple options: Relational (e.g. Amazon RDS), Columnar (e.g. Amazon Redshift), or Key/Value (e.g. Amazon DynamoDB).
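A minimal sketch contrasting the relational and key/value models, using an in-memory SQLite database and a plain dict as stand-ins for their cloud counterparts (the data is illustrative):

```python
import sqlite3

# Relational: fixed schema, queried with SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
db.execute("INSERT INTO customers VALUES (1, 'Acme', 'EMEA')")
row = db.execute("SELECT name FROM customers WHERE region = 'EMEA'").fetchone()
print(row[0])  # Acme

# Key/value: schema-free, fetched directly by key.
kv = {"customer:1": {"name": "Acme", "region": "EMEA"}}
print(kv["customer:1"]["name"])  # Acme
```

The relational model pays a schema-design cost up front in exchange for flexible querying; the key/value model is fast and flexible to write but can only be read efficiently by key.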

1.b. Cloud Data Structure.  Data in the cloud is typically saved in the following structure:

1.c. On-Premises and Cloud Data Movement.  Data movement (‘pipelines’) may occur through multiple processes across and within environments:
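As one common pattern (an assumption here, since tooling varies widely), a pipeline extracts records from the on-premises source, transforms them in flight, and loads them into the cloud target:

```python
def extract(source: list[dict]) -> list[dict]:
    """Pull raw records from the on-premises source system."""
    return source

def transform(records: list[dict]) -> list[dict]:
    """Clean and reshape records in flight (here: normalise names)."""
    return [{**r, "name": r["name"].strip().title()} for r in records]

def load(records: list[dict], target: list[dict]) -> None:
    """Write the transformed records into the cloud target."""
    target.extend(records)

cloud_table: list[dict] = []
load(transform(extract([{"id": 1, "name": "  acme corp "}])), cloud_table)
print(cloud_table)
# [{'id': 1, 'name': 'Acme Corp'}]
```

Whether transformation happens before the load (ETL) or after it, inside the cloud environment (ELT), is itself a key architectural decision.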

1.d. Cloud Data Security.  Security of the data is a paramount consideration for the overall strategy and architecture of any cloud migration.  The following states of data should be considered when moving to the cloud:

Data at rest, i.e. data stored on disk, which should be protected with storage- or database-level encryption.

Data in transit, i.e. data moving within or between environments, which should be protected with in-transit encryption such as TLS.

Data in use, i.e. data being actively queried or processed, which should be protected with controls such as masking and access restrictions.

Combining these methods provides optimal data security.
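A minimal mapping of each state of data to example controls (the specific controls listed here are illustrative assumptions, not a prescription):

```python
# Illustrative mapping of data states to example security controls.
DATA_STATE_CONTROLS = {
    "at rest":    ["server-side encryption (e.g. AES-256)", "managed key service (KMS)"],
    "in transit": ["TLS for all connections", "VPN or private links between environments"],
    "in use":     ["PII masking in the query layer", "role-based access control"],
}

for state, controls in DATA_STATE_CONTROLS.items():
    print(f"data {state}: {', '.join(controls)}")
```

A security review of the migration plan can walk this table: for each pipeline and storage location, which control covers each state?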

2. Data Management when Moving to the Cloud

Data Catalog and Metadata.  What data is available in your cloud environment, and who owns it?  A data catalog is key to understanding what data is available, what its source is, where it resides, and who owns it.  The catalog should include technical information such as formats, as well as information such as classification, ownership, and any jurisdictional aspects, especially those that might drive regulatory compliance.  Because not all data assets have equal value, a data catalog strategy should be developed for your cloud migration that determines catalog granularity based on business value.  Business definitions (a data dictionary) are another key piece of metadata.
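A catalog entry might, for instance, carry fields like the following (all field names and values are illustrative):

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    owner: str               # accountable data steward
    source_system: str       # provenance
    location: str            # where the asset lives in the cloud
    format: str              # technical metadata, e.g. 'parquet'
    classification: str      # e.g. 'public', 'confidential'
    jurisdiction: str        # drives regulatory compliance
    business_definition: str = ""        # data-dictionary entry
    tags: list[str] = field(default_factory=list)

entry = CatalogEntry(
    name="sales_orders", owner="jane.doe", source_system="erp",
    location="s3://datalake/curated/sales_orders", format="parquet",
    classification="confidential", jurisdiction="EU",
)
print(entry.classification)  # confidential
```

The point of the sketch is that technical and business metadata live on the same record, so a single lookup answers both "what is this?" and "may I use it?".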

The design should include automatic capture of metadata where possible, including aspects such as ownership (e.g. data stewards), provenance (e.g. source system), rights-of-use (e.g. data use information, contractual or other obligations), and timeliness.  Metadata should also include metrics and SLAs to support automated data quality efforts, including attribute-based data quality statistics collected during the data movement process.  It should also include lifecycle information to support automated adherence to retention and archiving policies.  
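Attribute-based quality statistics gathered during movement might look like this sketch (the choice of metric, completeness, is an assumption):

```python
def column_quality(records: list[dict], column: str) -> dict:
    """Completeness statistics for one attribute, collected as data moves."""
    total = len(records)
    nulls = sum(1 for r in records if r.get(column) in (None, ""))
    return {
        "column": column,
        "total": total,
        "nulls": nulls,
        "completeness": (total - nulls) / total if total else 0.0,
    }

rows = [{"email": "a@x.com"}, {"email": None}, {"email": "b@x.com"}, {"email": ""}]
print(column_quality(rows, "email"))
# {'column': 'email', 'total': 4, 'nulls': 2, 'completeness': 0.5}
```

Stats like these can be written into the catalog alongside each load, giving SLAs something concrete to measure against.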

Any related data can also be associated in the metadata to aid discovery and avoid duplication.  Any information about taxonomies (hierarchical classifications) or associated ontologies (e.g. geographic, sector, departmental, product, lifecycle) should also be added. 

3. Cloud Data Access  

How is data accessed in the cloud? Is the data secure?  From a control perspective, who has accessed it?

Querying Data.  Across the environment, access to data should be managed through an interface structure to provide for consistent access, security, and logging.  This might be an API that provides a layer between the user and data that incorporates both in-transit and in-use encryption, also taking into account PII (personally identifiable information) considerations and any rights-of-use limitations. 
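A toy sketch of such an interface layer, masking PII fields and recording every access (the field names and log shape are assumptions; a real implementation would sit behind an API gateway):

```python
import datetime

ACCESS_LOG: list[dict] = []          # stand-in for a centralized log sink
PII_FIELDS = {"email", "phone"}      # fields masked for non-privileged users

def query(user: str, purpose: str, record: dict, can_see_pii: bool = False) -> dict:
    """Single entry point for data access: logs every call, masks PII."""
    ACCESS_LOG.append({
        "user": user,
        "purpose": purpose,
        "when": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    if can_see_pii:
        return record
    return {k: ("***" if k in PII_FIELDS else v) for k, v in record.items()}

row = {"id": 7, "email": "jane@example.com", "region": "EMEA"}
print(query("analyst1", "quarterly report", row))
# {'id': 7, 'email': '***', 'region': 'EMEA'}
```

Because all reads pass through one function, the log is complete by construction, which is exactly what the audit question below depends on.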

Logging Data Access.  Centralized logging is key to determining who has accessed data, when and where the access occurred and, ideally, for what purpose.  From an audit and control perspective, this can be difficult when data is spread across many locations or logging is incomplete.

Conclusion

Prior to moving to the cloud, developing a robust strategy and plan is critical to ensuring a fit-for-purpose solution.  Defining and prioritizing requirements for your cloud migration based on the above guidelines can assist in determining a strategic direction and the overall architecture.  And let’s not forget fundamentals such as testing: the overall plan needs to include a test plan to validate the implementation of each requirement — be it technology or data — and ensure a successful solution.
