What Is Data Warehousing in ecommerce?

The primary concept of data warehousing is that data stored for business analysis can be accessed most effectively when it is separated from the data in the operational systems. A data warehouse is a collection of computer-based information that is critical to the successful execution of enterprise initiatives. It is more than an archive for corporate data and more than a new way of accessing corporate data. A data warehouse is a subject-oriented repository designed with enterprise-wide access in mind. It provides tools to satisfy the information needs of employees at all organizational levels, not just for complex data queries, but as a general facility for getting quick, accurate and often insightful information. A data warehouse is designed so that its users can recognize the information they want and access that information using simple tools.
One of the principal reasons for developing a data warehouse is to integrate operational data from various sources into a single and consistent architecture that supports analysis and decision-making within the enterprise. Operational systems create, update and delete production data that feed the data warehouse. A data warehouse is analogous to a physical warehouse. Operational systems create data ‘parts’ that are loaded into the warehouse. Some of those parts are summarised into information ‘components’ and are stored in the warehouse. Data warehouse users make requests and are delivered information ‘products’ that are created from the components and parts stored in the warehouse. A data warehouse is typically a blending of technologies, including relational and multidimensional databases, client/server architecture, extraction/transformation programs, graphical user interfaces, and more.
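The warehouse analogy above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems (`web_orders`, `store_sales`), their field names, and the helper functions are all hypothetical and not taken from any particular product.

```python
from collections import defaultdict

# Hypothetical operational records from two source systems
# that use different field names for the same facts.
web_orders = [
    {"order_id": "W1", "sku": "A100", "amount_usd": 25.0, "day": "2024-01-05"},
    {"order_id": "W2", "sku": "B200", "amount_usd": 40.0, "day": "2024-01-06"},
]
store_sales = [
    {"receipt": "S1", "item": "A100", "total": 15.0, "date": "2024-01-05"},
]

def load_parts(web, store):
    """Integrate both sources into one consistent record layout (the 'parts')."""
    parts = []
    for row in web:
        parts.append({"source": "web", "sku": row["sku"],
                      "amount": row["amount_usd"], "date": row["day"]})
    for row in store:
        parts.append({"source": "store", "sku": row["item"],
                      "amount": row["total"], "date": row["date"]})
    return parts

def summarise(parts):
    """Summarise parts into per-SKU revenue totals (the 'components')."""
    totals = defaultdict(float)
    for part in parts:
        totals[part["sku"]] += part["amount"]
    return dict(totals)

def revenue_report(components, skus):
    """Answer a user request from the stored components (a 'product')."""
    return {sku: components.get(sku, 0.0) for sku in skus}

parts = load_parts(web_orders, store_sales)
components = summarise(parts)
report = revenue_report(components, ["A100", "B200"])
print(report)
```

The operational lists are never modified here; the warehouse side works on its own integrated copy, which is the separation the opening paragraph describes.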
Definitions

Corporate Data Warehouse in E-Commerce

Data Warehouse:
The term Data Warehouse was coined by Bill Inmon in 1990, which he defined in the following way: “A warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data in support of management’s decision making process”. He defined the terms in the sentence as follows:
Subject Oriented:
Data that gives information about a particular subject instead of about a company’s ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is never removed.
This enables management to gain a consistent picture of the business. This definition remains reasonably accurate almost ten years later. However, a single-subject data warehouse is typically referred to as a data mart, while data warehouses are generally enterprise in scope. Also, data warehouses can be volatile. Due to the large amount of storage required for a data warehouse (multi-terabyte data warehouses are not uncommon), only a certain number of periods of history are kept in the warehouse. For instance, if three years of data are decided on and loaded into the warehouse, every month the oldest month will be “rolled off” the database, and the newest month added. Ralph Kimball provided a much simpler definition of a data warehouse. A data warehouse is “a copy of transaction data specifically structured for query and analysis”.
This definition provides less insight and depth than Mr. Inmon’s, but is no less accurate. Data warehousing is essentially what you need to do in order to create a data warehouse, and what you do with it. It is the process of creating, populating, and then querying a data warehouse, and it can involve a number of discrete technologies.
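The monthly roll-off mentioned earlier (keep a fixed number of periods, drop the oldest when a new one arrives) might look like the following sketch. The function name `load_month`, the `"YYYY-MM"` key format, and the dictionary used as a stand-in for the warehouse are assumptions made for illustration only.

```python
def load_month(warehouse, month, rows, periods_to_keep=36):
    """Add the newest month and roll off months that fall outside the window.

    `warehouse` maps "YYYY-MM" strings to that month's rows.
    `periods_to_keep` is assumed to be at least 1 (36 months = three years).
    """
    warehouse[month] = rows
    # "YYYY-MM" strings sort chronologically, so the oldest months come first.
    for old_month in sorted(warehouse)[:-periods_to_keep]:
        del warehouse[old_month]
    return warehouse

# Usage with a short three-month window so the roll-off is easy to see.
warehouse = {}
for month in ["2024-01", "2024-02", "2024-03", "2024-04"]:
    load_month(warehouse, month, [f"rows for {month}"], periods_to_keep=3)

print(sorted(warehouse))
```

After the fourth load, the first month is no longer in the warehouse, which is why a warehouse managed this way is not strictly non-volatile.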
ADVANTAGES OF DATA WAREHOUSE
Implementing a Data Warehouse provides significant benefits, many tangible, some intangible.
  • More cost-effective decision making - A Data Warehouse allows reduction of the staff and computer resources required to support queries and reports against operational and production databases. This typically offers significant savings. Having a Data Warehouse also eliminates the resource drain on production systems when executing long-running, complex queries and reports.
  • Better enterprise intelligence - Increased quality and flexibility of enterprise analysis arises from the multi-tiered data structures of a Data Warehouse, which support data ranging from the detailed transactional level to high-level summary information. Data accuracy and reliability result from ensuring that a Data Warehouse contains only ‘trusted’ data.
  • Enhanced customer service - An enterprise can maintain better customer relationships by correlating all customer data via a single Data Warehouse architecture.
  • Business reengineering - Allowing unlimited analysis of enterprise information often provides insights into enterprise processes that may yield breakthrough ideas for reengineering those processes. Just defining the requirements for a Data Warehouse results in better enterprise goals and measures. Knowing what information is important to an enterprise will provide direction and priority for reengineering efforts.
  • Information systems reengineering - A Data Warehouse that is based upon enterprise-wide data requirements provides a cost-effective means of establishing both data standardization and operational system interoperability. Data Warehouse development can be an effective first step in reengineering the enterprise’s legacy systems.
Types of Data warehouses:
The term data warehouse is currently being used to describe a number of different facilities each with diverse characteristics.
Physical data warehouse: This is an actual, physical database into which all the corporate data for the data warehouse are gathered, along with schemas (information about data) and the processing logic used to organize, package and pre-process the data for end user access.
Logical data warehouse: This contains all the metadata, business rules and processing logic required to scrub, organize, package, and pre-process the data. In addition, it contains the information required to find and access the actual data, wherever it resides.
Data library: This is a subset of the enterprise wide data warehouse. Typically, it performs the role of departmental, regional, or functional data warehouse. As part of the data warehouse process, the organization builds a series of data libraries over time and eventually links them via an enterprise wide logical data warehouse.
Decision support systems (DSSs): These systems are not data warehouses but applications that make use of the data warehouse. They are also called executive information systems (EIS).
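To make the difference between a physical and a logical data warehouse more concrete, here is a minimal sketch of the logical variant: a catalog that stores only metadata (where a data set lives and which scrubbing rule applies) and resolves requests against data held elsewhere. The store names, table names, and the `fetch` helper are hypothetical.

```python
# Hypothetical physical stores; in practice these would be separate databases.
physical_stores = {
    "sales_db": {"orders": [{"customer": " alice ", "total": 30.0},
                            {"customer": "BOB", "total": 12.5}]},
    "crm_db": {"contacts": [{"name": "alice", "region": "EU"}]},
}

# The logical warehouse holds no business data, only metadata:
# where each data set resides and which scrubbing rule to apply on access.
catalog = {
    "customer_orders": {
        "store": "sales_db",
        "table": "orders",
        "scrub": lambda row: {**row, "customer": row["customer"].strip().lower()},
    },
    "customer_regions": {
        "store": "crm_db",
        "table": "contacts",
        "scrub": lambda row: row,  # already clean, pass through unchanged
    },
}

def fetch(logical_name):
    """Locate the data via the catalog, then scrub it on the way out."""
    entry = catalog[logical_name]
    rows = physical_stores[entry["store"]][entry["table"]]
    return [entry["scrub"](row) for row in rows]

orders = fetch("customer_orders")
print([order["customer"] for order in orders])
```

A physical data warehouse would instead copy the scrubbed rows into its own database; here the data stays where it is and only the lookup and the rules are centralized.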
Aspects of Data Warehouse Architecture
This is a list of the aspects of architecture that data warehouse decision makers will have to deal with themselves. There are many other architecture issues that affect the data warehouse, e.g., network topology, but decisions on these have to be made with all of an organization’s systems in mind (and with people other than the data warehouse team being the main decision makers).
Data consistency architecture
This is the choice of what data sources, dimensions, business rules, semantics, and metrics an organization chooses to put into common usage. It is also the equally important choice of what data sources, dimensions, business rules, semantics, and metrics an organization chooses not to put into common usage. This is by far the hardest aspect of architecture to implement and maintain because it involves organizational politics. However, determining this architecture has more to do with determining the place of the data warehouse in your business than any other architectural decision. In my opinion, the decisions involved in determining this architecture should drive all other architectural decisions.

Reporting data store and staging data store architecture
The main reasons we store data in a data warehousing system are so that it can be:
1) reported against,
2) cleaned up, and (sometimes)
3) transported
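As a rough illustration of the split between a staging store (where data is cleaned up) and a reporting store (where it is reported against), the sketch below trims identifiers, rejects unparseable amounts, and drops duplicates before anything reaches the reporting side. The field names and the cleaning rules are assumptions chosen for the example.

```python
def clean(staging_rows):
    """Return (clean_rows, rejected_rows) from raw staging data."""
    seen_ids = set()
    clean_rows, rejected = [], []
    for row in staging_rows:
        order_id = str(row.get("order_id", "")).strip()
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            rejected.append(row)  # missing or unparseable amount
            continue
        if not order_id or order_id in seen_ids:
            rejected.append(row)  # blank or duplicate identifier
            continue
        seen_ids.add(order_id)
        clean_rows.append({"order_id": order_id, "amount": amount})
    return clean_rows, rejected

staging = [
    {"order_id": " 1001 ", "amount": "19.99"},
    {"order_id": "1001", "amount": "19.99"},  # duplicate once trimmed
    {"order_id": "1002", "amount": "n/a"},    # unparseable amount
    {"order_id": "1003", "amount": 5},
]

reporting, rejected = clean(staging)
print(reporting)
print(len(rejected), "rows rejected")
```

Keeping the rejected rows rather than silently discarding them makes it possible to trace problems back to the source system.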
Data modeling architecture
This is the choice of whether you wish to use denormalized, normalized, object-oriented, proprietary multidimensional, etc. data models. As you may guess, it makes perfect sense for an organization to use a variety of models.
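The contrast between a normalized and a denormalized model can be shown with Python's built-in `sqlite3` module. The tables and sample rows below are invented for illustration; the point is only that the same question can be answered from either model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: product attributes are stored once and referenced by key.
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, product_id INTEGER, amount REAL);
    INSERT INTO product VALUES (1, 'Mug', 'Kitchen'), (2, 'Lamp', 'Home');
    INSERT INTO sale VALUES (1, 1, 8.0), (2, 1, 8.0), (3, 2, 30.0);

    -- Denormalized: the same facts flattened into one wide table for reporting.
    CREATE TABLE sale_flat AS
        SELECT s.sale_id, p.name, p.category, s.amount
        FROM sale s JOIN product p ON p.product_id = s.product_id;
""")

# Revenue per category from the normalized model (requires a join).
normalized = conn.execute("""
    SELECT p.category, SUM(s.amount)
    FROM sale s JOIN product p ON p.product_id = s.product_id
    GROUP BY p.category ORDER BY p.category
""").fetchall()

# The same question against the denormalized table (no join needed).
denormalized = conn.execute("""
    SELECT category, SUM(amount) FROM sale_flat
    GROUP BY category ORDER BY category
""").fetchall()

print(normalized)
print(denormalized)
```

The flat table repeats the product name and category on every row, trading storage and update effort for simpler queries, which is one reason an organization may keep both kinds of model side by side.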
Tool architecture
This is your choice of the tools you are going to use for reporting and for what I call infrastructure.
Processing tiers architecture
This is your choice of what physical platforms will do what pieces of the concurrent processing that takes place when using a data warehouse. This can range from an architecture as simple as host-based reporting to one as complicated.
Security architecture
If you need to restrict access down to the row or field level, you will probably need some means other than your organization’s usual security mechanisms to accomplish this. Note that while security may not be technically difficult to implement, it can cause political consternation.
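One way to picture row- and field-level restriction is a per-role policy that filters both which rows and which columns are returned. This is a simplified sketch, not a substitute for a database's own access controls; the roles, regions, and field names are hypothetical.

```python
# Hypothetical policy: which regions (rows) and columns (fields) each role may see.
POLICIES = {
    "eu_analyst": {"regions": {"EU"}, "fields": {"order_id", "region", "amount"}},
    "auditor": {"regions": {"EU", "US"}, "fields": {"order_id", "region"}},
}

def secure_view(rows, role):
    """Return only the rows and fields that the given role is allowed to see."""
    policy = POLICIES[role]
    return [
        {key: value for key, value in row.items() if key in policy["fields"]}
        for row in rows
        if row["region"] in policy["regions"]
    ]

orders = [
    {"order_id": 1, "region": "EU", "amount": 20.0, "email": "a@example.com"},
    {"order_id": 2, "region": "US", "amount": 35.0, "email": "b@example.com"},
]

eu_view = secure_view(orders, "eu_analyst")
audit_view = secure_view(orders, "auditor")
print(eu_view)
print(audit_view)
```

Deciding what goes into a table like `POLICIES` is usually the hard part, since it requires agreement on who may see what.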
In the long run, decisions on data consistency architecture will probably have much more influence on the return on investment in the data warehouse than any other architectural decisions. To get the most return from a data warehouse (or any other system), business practices have to change in conjunction with or as a result of the system implementation. Conscious determination of data consistency architecture is almost always a prerequisite to using a data warehouse to effect business practice change.
