Friday 5 September 2008

Data Integrator - Transform Overview

Operation Codes:

Operation codes describe the status of each row in the data sets that flow into and out of the objects in a data flow.
  • NORMAL: Creates a new row in the target. All rows in a data set are flagged as NORMAL when they are extracted from a source table or file.
  • INSERT: Rows can be flagged as INSERT by the Table_Comparison transform to indicate that a change occurred in a data set as compared with an earlier image of the same data set. The Map_Operation transform can also produce rows flagged as INSERT. Only the History_Preserving and Key_Generation transforms can accept data sets with rows flagged as INSERT as input.
  • DELETE: Rows can be flagged as DELETE by the Map_Operation and Table_Comparison transforms. Only the History_Preserving transform, with the "Preserve delete row(s) as update row(s)" option selected, can accept data sets with rows flagged as DELETE.
  • UPDATE: Overwrites an existing row in the target table. Rows can be flagged as UPDATE by the Table_Comparison transform to indicate that a change occurred in a data set as compared with an earlier image of the same data set. The Map_Operation transform can also produce rows flagged as UPDATE. Only the History_Preserving and Key_Generation transforms can accept data sets with rows flagged as UPDATE as input.
Summary: operation codes relate to transforms in two ways: a transform either produces them or accepts them. They are mainly used to act on the differences between two data sources, i.e. for changed data capture (CDC). CDC can be handled in two ways: history-preserving, handled by the History_Preserving transform, or Map_CDC_Operation; operation codes can also be remapped freely with Map_Operation. A small sketch of this flow follows.
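To make the codes concrete, here is a minimal Python sketch of rows travelling through a flow as (operation code, row) pairs and being applied to a keyed target. This is only an illustration under my own assumptions, not the Data Integrator API; apply_to_target and the sample rows are invented for the example.

    # Illustration only -- not the Data Integrator API. Rows carry an
    # operation code that tells the loading step what to do with them.
    def apply_to_target(flagged_rows, target, key):
        """Apply (op_code, row) pairs to a target keyed by `key`."""
        for op, row in flagged_rows:
            if op in ("NORMAL", "INSERT"):
                target[row[key]] = row        # create a new row in the target
            elif op == "UPDATE":
                target[row[key]] = row        # overwrite the existing row
            elif op == "DELETE":
                target.pop(row[key], None)    # remove the row
        return target

    target = {1: {"id": 1, "city": "Lyon"}}
    flagged = [("UPDATE", {"id": 1, "city": "Paris"}),
               ("INSERT", {"id": 2, "city": "Rome"}),
               ("DELETE", {"id": 3})]
    print(apply_to_target(flagged, target, "id"))
    # {1: {'id': 1, 'city': 'Paris'}, 2: {'id': 2, 'city': 'Rome'}}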

Descriptions of transforms:

Transforms for branching and merging structures:
  • Case: Simplifies branch logic in data flows by consolidating case or decision-making logic in one transform. Paths are defined in an expression table (see the sketch after this group).
  • Data_Transfer: Allows a data flow to split its processing into two sub data flows and push down resource-consuming operations to the database server.
  • Merge: Unifies rows from two or more sources into a single target.
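A rough Python sketch of what Case-style routing followed by a Merge-style union amounts to; case_split, merge and the branch conditions are my own invented names, not anything from the product.

    # Illustration only -- not the Data Integrator API.
    def case_split(rows, branch_conditions):
        """Route each row to the first branch whose condition is True."""
        branches = {name: [] for name in branch_conditions}
        for row in rows:
            for name, cond in branch_conditions.items():
                if cond(row):
                    branches[name].append(row)
                    break
        return branches

    def merge(*row_sets):
        """Union rows from several branches into one data set (no de-duplication)."""
        return [row for rows in row_sets for row in rows]

    rows = [{"region": "EU", "amount": 10}, {"region": "US", "amount": 25}]
    branches = case_split(rows, {"eu": lambda r: r["region"] == "EU",
                                 "rest": lambda r: True})
    print(merge(branches["eu"], branches["rest"]))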
Transforms that generate a new column:
  • Date_Generation: Generates a column filled with date values based on the start and
    end dates and increment that you provide.
  • Effective_Date: Generates an additional “effective to” column based on the primary
    key’s “effective date.” ???
  • Key_Generation: Generates new keys for source data, starting from a value based on existing keys in the table you specify (see the sketch after this list).
  • Row_Generation: Generates a column filled with integer values starting at zero and incrementing by one up to the end value you specify.
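As a rough illustration (my own sketch, not the product's API), key and date generation boil down to something like this; generate_keys and generate_dates are invented names.

    # Illustration only -- not the Data Integrator API.
    from datetime import date, timedelta

    def generate_keys(rows, existing_keys, key_col="gen_key"):
        """Assign surrogate keys, continuing from the highest existing key."""
        next_key = max(existing_keys, default=0) + 1
        for row in rows:
            row[key_col] = next_key
            next_key += 1
        return rows

    def generate_dates(start, end, increment_days=1):
        """Yield one row per date from start to end with the given increment."""
        current = start
        while current <= end:
            yield {"gen_date": current}
            current += timedelta(days=increment_days)

    print(generate_keys([{"name": "a"}, {"name": "b"}], existing_keys=[100, 101]))
    print(list(generate_dates(date(2008, 9, 1), date(2008, 9, 3))))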
Transforms that reshape the current schema:
  • Hierarchy_Flattening: Flattens hierarchical data into relational tables so that it can
    participate in a star schema. Hierarchy flattening can be both vertical and horizontal. ???
  • Pivot (Columns to Rows): Rotates the values in specified columns to rows (see the sketch after this group).
  • Reverse Pivot (Rows to Columns): Rotates the values in specified rows to columns.
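A small Python sketch of the Pivot idea (columns to rows); Reverse Pivot is simply the inverse operation. The pivot function and column names here are invented for illustration and do not reflect the transform's actual options.

    # Illustration only -- not the Data Integrator API.
    def pivot(rows, pivot_cols, header="field", value="value"):
        """Rotate the named columns into (header, value) rows, one per column."""
        out = []
        for row in rows:
            fixed = {k: v for k, v in row.items() if k not in pivot_cols}
            for col in pivot_cols:
                out.append({**fixed, header: col, value: row[col]})
        return out

    sales = [{"store": "S1", "q1": 100, "q2": 120}]
    print(pivot(sales, ["q1", "q2"]))
    # [{'store': 'S1', 'field': 'q1', 'value': 100},
    #  {'store': 'S1', 'field': 'q2', 'value': 120}]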
Transforms that operate on the target data:
  • History_Preserving: Converts rows flagged as UPDATE to UPDATE plus INSERT, so
    that the original values are preserved in the target. You specify in which column to look for updated data. If it receives a DELETE row, this transform can generate an UPDATE row to close the record in the target table. (Another way of handling CDC updates; see the sketch after this group.)
  • Map_CDC_Operation: While commonly used to support relational or mainframe changed data capture, this transform supports any data stream if its input requirements are met. Sorts input data, maps output data, and resolves before- and after-images for UPDATE rows.
  • Map_Operation: Allows conversions between operation codes (map UPDATE to
    INSERT) on data sets to produce the desired output.
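To show what these steps do to operation codes, here is a simplified Python sketch, again my own illustration rather than the product's exact behaviour: history_preserving rewrites UPDATE into UPDATE plus INSERT and turns DELETE into a closing UPDATE, while map_operation remaps codes freely.

    # Illustration only -- simplified; the real transform closes the existing
    # target row by key rather than rewriting the incoming row.
    def history_preserving(flagged_rows):
        """Rewrite op codes so that history is preserved in the target."""
        out = []
        for op, row in flagged_rows:
            if op == "UPDATE":
                out.append(("UPDATE", {**row, "is_current": "N"}))  # close the old record
                out.append(("INSERT", {**row, "is_current": "Y"}))  # insert the new version
            elif op == "DELETE":
                out.append(("UPDATE", {**row, "is_current": "N"}))  # close instead of deleting
            else:
                out.append((op, row))
        return out

    def map_operation(flagged_rows, mapping):
        """Remap op codes, e.g. {'UPDATE': 'INSERT'}; mapping a code to None discards the row."""
        return [(mapping.get(op, op), row) for op, row in flagged_rows
                if mapping.get(op, op) is not None]

    rows = [("UPDATE", {"id": 1, "city": "Paris"}), ("DELETE", {"id": 2})]
    print(history_preserving(rows))
    print(map_operation(rows, {"UPDATE": "INSERT", "DELETE": None}))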
Transforms that filter and compare source data:
  • Query: Retrieves a data set that satisfies conditions that you specify. A query transform is similar to a SQL SELECT statement.
  • Table_Comparison: Compares two data sets and produces the difference between
    them as a data set with rows flagged as INSERT, UPDATE or DELETE (see the sketch below).
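The comparison logic can be sketched in a few lines of Python. This is just an illustration under my own assumptions (for instance, delete detection is an option in the real transform), and table_comparison is an invented function, not the product API.

    # Illustration only -- not the Data Integrator API.
    def table_comparison(source_rows, target_rows, key):
        """Flag differences between source and target as INSERT, UPDATE or DELETE."""
        target_by_key = {row[key]: row for row in target_rows}
        flagged = []
        for row in source_rows:
            old = target_by_key.pop(row[key], None)
            if old is None:
                flagged.append(("INSERT", row))      # new in the source
            elif old != row:
                flagged.append(("UPDATE", row))      # changed since the last load
        flagged += [("DELETE", row) for row in target_by_key.values()]  # gone from the source
        return flagged

    source = [{"id": 1, "city": "Paris"}, {"id": 3, "city": "Rome"}]
    target = [{"id": 1, "city": "Lyon"}, {"id": 2, "city": "Oslo"}]
    print(table_comparison(source, target, "id"))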
Others:
  • SQL: Performs the indicated SQL query operation when it cannot be performed by other built-in transforms.
  • Validation: Ensures that the data at any stage in the data flow meets your criteria. You can filter out or replace data that fails your criteria (see the sketch at the end of this list).
  • XML_Pipeline: Processes large XML inputs in small batches.
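For the Validation transform, the pass/fail idea can be sketched like this; the validate function and the rules are invented for illustration, and the real transform can also substitute values rather than only filtering.

    # Illustration only -- not the Data Integrator API.
    def validate(rows, rules):
        """Split rows into (passed, failed); a row fails if any rule rejects a column."""
        passed, failed = [], []
        for row in rows:
            ok = all(check(row.get(col)) for col, check in rules.items())
            (passed if ok else failed).append(row)
        return passed, failed

    rules = {"amount": lambda v: v is not None and v >= 0,
             "country": lambda v: v in {"DE", "FR", "IT"}}
    good, bad = validate([{"amount": 10, "country": "DE"},
                          {"amount": -5, "country": "XX"}], rules)
    print(good, bad)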
