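Solid fundamentals, a conscientious approach to work, good at summarizing, has ideas, can handle pressure, and has at least one specialty: that is enough.
A very incisive summary. :)
If you are interested, see the link: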
http://rdc.taobao.com/blog/dba/html/203_taobao_jobs_2009.html
Wednesday, 10 September 2008
Data Quality: A survival guide for marketing
http://www.businessobjects.com/product/im/data_quality.asp
Typical Data Problems in a Marketing Campaign
DUPLICATE ACCOUNT RECORDS
- 8:1 duplication ratio between customer records.
- Potential reasons: mergers and acquisitions (M&A);
- Account managers;
- Poor visibility and linkage across systems.
- Message delivery is impacted (crucial information: address, email, or phone number fields). [Y: report related]
- Segmenting prospects into the correct categories or demographics is impacted (title, salutation, job code, or ethnicity). [Y: dimension dependent]
- Identifying similar or related records across systems (Social Security number, account number, log-in ID, or account name). [Y: PK-related identifier]
- Updates not made on time
- System migrations
- Fraud
- The scope and depth
- The cause of the defects
- Many data quality problems are process-based.
- Reporting of the findings.
- The best place for direct marketers to cleanse their data is as close to the point of creation as possible.
- Transactional updates: often at the point of creation, e.g. a web site. Use real-time processing that responds in milliseconds and can service multiple transactional applications.
- Operational feeds: upstream, before the data enters your system. Use batch-oriented data quality functions: a predefined cleansing job, scheduled to run automatically on a specific data flow.
- Purchased data: if you’re buying it, demand that it is clean. Validate the purchased data and match it against your current data set.
- Legacy migration: data is in the enterprise, but not in your system yet.
- Regular maintenance: as your data ages, you need to cleanse it.
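The duplicate-record and purchased-data points above can be illustrated with a small Python sketch. It is not how the Business Objects product works; it only shows the general idea of normalizing contact fields, grouping records that share a key, and checking purchased records against the current data set. The field names (`name`, `email`, `phone`), the helper names, and the sample records are all assumptions made up for illustration.

```python
import re
from collections import defaultdict


def normalize_email(email):
    """Lowercase and trim so trivially different spellings compare equal."""
    return email.strip().lower()


def normalize_phone(phone):
    """Keep digits only, dropping spaces, dashes, and parentheses."""
    return re.sub(r"\D", "", phone)


def match_key(record):
    """Build a comparison key from contact fields (an illustrative choice)."""
    return (
        normalize_email(record.get("email", "")),
        normalize_phone(record.get("phone", "")),
    )


def find_duplicates(records):
    """Group records by match key; return only groups with 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec)
    return [group for group in groups.values() if len(group) > 1]


def split_purchased(purchased, current):
    """Split purchased records into (already known, genuinely new)."""
    known_keys = {match_key(rec) for rec in current}
    known = [r for r in purchased if match_key(r) in known_keys]
    new = [r for r in purchased if match_key(r) not in known_keys]
    return known, new


# Hypothetical sample data.
current = [
    {"name": "Ann Lee", "email": "Ann.Lee@example.com", "phone": "(555) 010-2000"},
    {"name": "A. Lee", "email": " ann.lee@example.com", "phone": "555-010-2000"},
    {"name": "Bob Ray", "email": "bob@example.com", "phone": "555 010 3000"},
]
purchased = [
    {"name": "Bob Ray", "email": "BOB@example.com", "phone": "5550103000"},
    {"name": "Cy Dunn", "email": "cy@example.com", "phone": "555-010-4000"},
]

for group in find_duplicates(current):
    print("possible duplicates:", [r["name"] for r in group])

known, new = split_purchased(purchased, current)
print("already in current data:", [r["name"] for r in known])
print("new records:", [r["name"] for r in new])
```

Exact-key matching like this only catches records that differ in formatting. Real cleansing tools typically add fuzzy matching on names and addresses, which this sketch deliberately leaves out.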
DQ 9 functions
Data Quality Architecture
- Data Quality Repository
- Configuration rules: transforms, blueprints, substitution files etc.
- Data examples: sample data used with blueprints
- Runtime metadata: log files etc.
- Data Quality Server: host Data Quality engine.
- Metadata Repository: a relational database used by the Data Quality Server and Project Portal. This database contains statistics and samples generated by running projects.
- Data Quality Project Architect: a graphical user interface used to create projects.
- Data Quality Command Line: a command line utility used to run projects.
- Integration SDK: an Integration API written in C++ and Java that allows you to create socket connections for direct interaction with the Data Quality Server, without need for the web services.
- Data Quality Documentation
- Web Tools
- Data Quality Web Service: the mechanism used to send and receive data between your application and the Data Quality Server.
- Data Quality Web Service Samples
- Data Quality Project Portal: a web-based tool that is designed to provide you with one environment for manageable administration of your processes.
- Web services
- Integration SDK
- Java API
- C++ API
- Socket