Loading a large data source into an Analytic Services database can take hours. You can speed up the data loading process by minimizing the time Analytic Services spends reading and parsing the data source and the time it spends reading from and writing to the database.
This chapter contains the following sections:
This section is not relevant to aggregate storage databases.
To optimize data load performance, you must think in terms of database structure. Analytic Services loads data block by block. For each unique combination of sparse dimension members, one data block contains the data for all the dense dimension combinations, assuming there is at least one cell containing data. For faster access to block locations, Analytic Services uses an index. Each entry in the index corresponds to one data block. For further explanation of how sparse and dense dimensions affect database structure, see Sparse and Dense Dimensions, Selection of Sparse and Dense Dimensions, and Dense and Sparse Selection Scenarios.
When Analytic Services loads a data source, it processes the data in three main stages: reading a portion of the data source, preparing the data (codifying and organizing it for writing to data blocks), and writing the data blocks to the database.
This process is repeated until all data is loaded. By using one or more processing threads in each stage, Analytic Services can perform some processes in parallel. For a description of the parallel data load process, see Managing Parallel Data Load Processing.
All examples in this chapter assume that you are familiar with the discussions about data sources in Data Sources.
This section is not relevant to aggregate storage databases.
The most effective strategy to improve performance is to minimize the number of disk I/Os that Analytic Services must perform while reading or writing to the database. Because Analytic Services loads data block by block, organizing the source data to correspond to the physical block organization reduces the number of physical disk I/Os that Analytic Services must perform.
Arrange the data source so that records with the same unique combination of sparse dimensions are grouped together. This arrangement corresponds to blocks in the database.
The examples in this chapter illustrate various ways you can organize the data following this strategy. These examples use a subset of the Sample Basic database, as shown in Table 80.
| Sparse, Nonattribute Dimensions | Dense Dimensions |
|---|---|
| Scenario | Year |
| Product | Measures |
| Market | |
Note: Because you do not load data into attribute dimensions, they are not relevant to this discussion even though they are sparse.
First, consider the data shown in Figure 238. Because it is not grouped by sparse-dimension member combinations, this data has not been sorted for optimization. As Analytic Services reads each record, it must deal with different members of the sparse dimensions.
Figure 238: Non-Optimized Sequence of Source Data
    Jan  Actual  Cola          Ohio     Sales  25
         Budget  "Root Beer"   Florida  Sales  28
         Actual  "Root Beer"   Ohio     Sales  18
         Budget  Cola          Florida  Sales  30
This data loads slowly because Analytic Services accesses four different blocks instead of one.
From the same Sample Basic database, Figure 239 shows records that are grouped by a single unique sparse-dimension member combination: Actual->Cola->Ohio. Analytic Services accesses only one block to load these four records.
Figure 239: Optimally-Organized Source Data
    Actual  Cola  Ohio  Jan  Sales   25
    Actual  Cola  Ohio  Jan  Margin  18
    Actual  Cola  Ohio  Jan  COGS    20
    Actual  Cola  Ohio  Jan  Profit   5
You can use a data source that loads more than one cell per record. Make sure that records are grouped together by unique sparse-dimension member combinations, and make sure that the dimension for which you provide multiple values on each record is a dense dimension.
Figure 240 uses a header record to identify the members of the Measures dimension, which is a dense dimension. The data is sorted first by members of the dense dimension Year and grouped hierarchically by members of the other dimensions. Multiple values for the Measures dimension are provided on each record.
Figure 240: Source Data Sorted and Grouped by Dense Dimensions
                                       Sales  Margin  COGS  Profit
    Jan  Actual  Cola          Ohio       25      18    20       5
    Jan  Actual  Cola          Florida    30      19    20      10
    Jan  Actual  "Root Beer"   Ohio       18      12    10       8
    Jan  Actual  "Root Beer"   Florida    28      18    20       8
The heading and first data line in this example provide the same data shown in four lines in Figure 239.
For detailed information, including examples, about arranging data in source files before loading, see Data Sources That Do Not Need a Rules File.
Make the data source as small as possible. The fewer fields that Analytic Services reads in the data source, the less time is needed to read and load the data.
Group the data into ranges. Eliminating redundancy in the data source reduces the number of fields that Analytic Services must read before loading data values.
Figure 241 shows a file that is not organized in ranges. It includes unneeded repetition of fields. All values are Profit values. Profit needs to be included only at the beginning of the group of data applicable to it. This example contains 33 fields that Analytic Services must read in order to load the data values properly.
Figure 241: Data Source Without Ranges
Profit Jan "New York" Cola 4 Jan "New York" "Diet Cola" 3 Jan Ohio Cola 8 Jan Ohio "Diet Cola" 7 Feb "New York" Cola 6 Feb "New York" "Diet Cola" 8 Feb Ohio Cola 7 Feb Ohio "Diet Cola" 9
Figure 242 shows the same file optimized by grouping members in ranges. By eliminating redundancy, this example contains only 23 fields that Analytic Services must read in order to load the data values properly.
Figure 242: Data Source Organized in Ranges
Profit Jan "New York" Cola 4 "Diet Cola" 3 Ohio Cola 8 "Diet Cola" 7 Feb "New York" Cola 6 "Diet Cola" 8 Ohio Cola 7 "Diet Cola" 9
Analytic Services assigns the first value, 4, to Jan->New York->Cola; it assigns the next value, 3, to Jan->New York->Diet Cola and so on.
Although sorted efficiently, the data in Figure 240 still shows a lot of repetition that can slow down the load process. You can further optimize this data by grouping the data into ranges. The optimized data source shown in Figure 243 eliminates the redundant fields, thereby reducing processing time.
Figure 243: Source Data Sorted and Grouped in Ranges
                                       Sales  Margin  COGS  Profit
    Jan  Actual  Cola          Ohio       25      18    20       5
                               Florida    30      19    20      10
                 "Root Beer"   Ohio       18      12    10       8
                               Florida    28      18    20       8
For information about and examples of organizing source data into ranges, see Formatting Ranges of Member Fields.
Making fields in a data source smaller enables Analytic Services to read and load the data in less time.
Make the fields in the data source as small as possible by performing the following tasks:
This section is not relevant to aggregate storage databases.
The index is organized in the same order as the sparse dimensions in the outline. To further optimize the data source, after grouping records by sparse-dimension member combinations, arrange those combinations in the same order as the sparse dimensions appear in the outline.
Analytic Services pages portions of the index in and out of memory as requested by the data load or other operations. Arranging the source data to match the order of entries in the index speeds up the data load because it requires less paging of the index. Less paging results in fewer I/O operations.
Analytic Services uses the index cache size to determine how much of the index can be paged into memory. Adjusting the size of the index cache may also improve data load performance.
Note: If the index cache size is large enough to hold the entire index in memory, positioning data in the same order as the outline does not affect the speed of data loads.
For detailed information about setting the index cache size, see Sizing the Index Cache.
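As a rough illustration, the index cache for the database being loaded can be enlarged with a MaxL statement along the following lines. This is only a sketch: Sample.Basic and the size value are placeholders, and appropriate sizes for your index are discussed in Sizing the Index Cache and the Technical Reference.

```
/* Sketch only: Sample.Basic and the size (in bytes) are placeholders. */
alter database Sample.Basic set index_cache_size 10485760;   /* 10 MB */
```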
Loading the data source from the Analytic Server computer is faster than loading from a client computer. To load a data source from the server, move the data source to the server computer and then start the load.
Loading data from the server improves performance because the data does not have to be transported over the network from the client computer to the server computer.
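For example, if you start the load with MaxL, a statement along these lines reads a data source that has already been copied to the Analytic Server computer instead of shipping it from the client. The file names are placeholders, and the precise import grammar is documented in the Technical Reference.

```
/* Sketch only: assumes a text data source already copied to the server;
   file names are placeholders. */
import database Sample.Basic data
    from server text data_file 'monthly_sales.txt'
    on error write to 'monthly_sales.err';
```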
The methods described earlier in this chapter give you the most substantial data load performance enhancements. If you have not done so, you also need to carefully evaluate your processor speed and memory requirements and upgrade your computers to meet these requirements.
Another method to speed up data loads is to work with the Analytic Services parallel data load feature to optimize use of processor resources. The parallel data load feature recognizes opportunities to process data load tasks at the same time. Although some opportunities present themselves on single-processor computers, many more opportunities are available on multiple-processor computers.
To enable you to fine-tune processor use for specific application and database situations, Analytic Services provides three essbase.cfg settings: DLTHREADSPREPARE, DLTHREADSWRITE, and DLSINGLETHREADPERSTAGE.
When Analytic Services loads a data source, it works with a portion of data at a time, in stages. Analytic Services looks at each stage as a task and uses separate processing threads in memory to perform each task.
One form of parallel processing occurs when one thread takes advantage of processor resources that are left idle during the wait time of another thread. For example, while a thread performs I/O processing, it must wait for the slower hardware to perform its task. While this thread waits, another thread can use the idle processor resource. Processing staged tasks in parallel can improve processor efficiency by minimizing idle time.
When computers have multiple processors, Analytic Services can perform an additional form of parallel processing. When a data load stage completes its work on a portion of data, it can pass the work to the next stage and start work immediately on another portion of data. Processing threads perform their tasks simultaneously on the different processors, providing even faster throughput.
Even though Analytic Services uses parallel processing to optimize processor resources across the data load stages, there are still times when processor resources can be idle. To take advantage of these idle times, Analytic Services can further divide up record processing in the preparation and write stages. To tailor parallel processing to your situation, you can use the DLTHREADSPREPARE and DLTHREADSWRITE essbase.cfg settings to tell Analytic Services to use additional threads during these stages.
As shown in Table 81, Analytic Services provides three essbase.cfg settings that enable you to manage parallel data load processing.
You can specify setting values that apply to all applications on a given Analytic Server or you can specify settings multiple times with different values for different applications and databases.
| Setting | Description |
|---|---|
| DLTHREADSPREPARE | Specifies how many threads Analytic Services may use during the data load stage that codifies and organizes the data in preparation for writing to blocks in memory. |
| DLTHREADSWRITE | Specifies how many threads Analytic Services may use during the data load stage that writes data to the disk. High values may require allocation of additional cache. For an explanation of the relationship between DLTHREADSWRITE and data cache size, see Implications in Sizing the Data Cache. Note: For aggregate storage databases, Analytic Server uses one thread with aggregate storage cache; the DLTHREADSWRITE setting is ignored. |
| DLSINGLETHREADPERSTAGE | Specifies that Analytic Services use a single thread per stage, ignoring the values in the DLTHREADSPREPARE and DLTHREADSWRITE settings. |
The data load process uses the thread values specified in the DLTHREADSPREPARE and DLTHREADSWRITE settings only when the DLSINGLETHREADPERSTAGE setting is set to FALSE for the specific application and database being loaded.
See the Technical Reference for details about these settings and their parameters.
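For example, essbase.cfg entries along the following lines would allow one application and database to use multiple threads in the preparation and write stages. The application name, database name, and thread counts shown here are placeholders for illustration; confirm the exact syntax and valid values in the Technical Reference before using them.

```
DLSINGLETHREADPERSTAGE Sample Basic FALSE
DLTHREADSPREPARE       Sample Basic 3
DLTHREADSWRITE         Sample Basic 4
```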
For block storage databases, Analytic Server allocates the data cache memory area to hold uncompressed data blocks. Each thread specified by the DLTHREADSWRITE setting uses an area in the data cache equal to the size of an expanded block.
Depending on the size of the block, the number of threads, and how much data cache is used by other concurrent operations during a data load, the load may require more data cache than is available. In such circumstances, decrease the number of threads or increase the size of the data cache.
To change the data cache size, see Changing the Data Cache Size.
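To make the arithmetic concrete: if an expanded block occupies roughly 100 KB and DLTHREADSWRITE is set to 4, the write threads alone consume about 400 KB of data cache in addition to whatever other operations use. A hedged MaxL sketch for enlarging the data cache follows; Sample.Basic and the size are placeholders, and Changing the Data Cache Size describes the supported methods in detail.

```
/* Sketch only: Sample.Basic and the size (in bytes) are placeholders. */
alter database Sample.Basic set data_cache_size 20971520;   /* 20 MB */
```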
While processing data loads, you can view processor activity. Different operating systems provide different tools for viewing processor activity. For example, the Task Manager in Windows/NT and Windows/2000 enables you to view processor and memory usage and processes. Among the tools available on UNIX are top and vmstat. You can also use third-party tools to view and analyze system utilization.
To assess system usage during data load processing, run test data loads while viewing processor activity with one of these tools, and then adjust the essbase.cfg settings that are described in Setting Parallel Data Load Settings.