Wednesday 10 September 2008

Taobao hiring requirements

Solid fundamentals, a conscientious attitude, good at summarizing, has ideas, can handle pressure, and has at least one specialty. That's all it takes.
A very incisive summary. :)
If you're interested, see the link:
http://rdc.taobao.com/blog/dba/html/203_taobao_jobs_2009.html

Data Quality: A survival guide for marketing

http://www.businessobjects.com/product/im/data_quality.asp

Typical Data Problems in a Marketing Campaign

DUPLICATE ACCOUNT RECORDS
  • An 8:1 duplication ratio across customer records (see the matching sketch below).
  • Potential causes: mergers and acquisitions (M&A);
  • Account managers;
  • Poor visibility and linkage across systems.
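A minimal sketch of the kind of matching that surfaces these duplicates: build a crude normalized key from each record and compare the record count to the number of distinct keys. The Customer fields and the key choice are my own illustration, not anything from the guide, and real matching engines use far more sophisticated fuzzy rules.

```java
import java.util.*;

public class DuplicateScan {
    // Hypothetical customer record; the fields are illustrative, not a product schema.
    record Customer(String id, String name, String email, String postalCode) {}

    // Crude match key: name stripped of case and punctuation, plus postal code.
    static String matchKey(Customer c) {
        String name = c.name().toLowerCase().replaceAll("[^a-z0-9]", "");
        return name + "|" + c.postalCode();
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
            new Customer("1", "Acme Corp", "info@acme.com", "94107"),
            new Customer("2", "ACME Corp.", "sales@acme.com", "94107"),
            new Customer("3", "Globex", "hq@globex.com", "10001"));

        // Group records that share a match key; any group larger than one is a duplicate cluster.
        Map<String, List<Customer>> clusters = new HashMap<>();
        for (Customer c : customers) {
            clusters.computeIfAbsent(matchKey(c), k -> new ArrayList<>()).add(c);
        }

        System.out.printf("records=%d, distinct entities=%d, ratio=%.1f:1%n",
            customers.size(), clusters.size(), (double) customers.size() / clusters.size());
    }
}
```

Here the two Acme rows collapse onto one key, so the sketch reports a 1.5:1 ratio for three records.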
INCOMPLETE DATA (blank data fields)
  • Delivering the message is impacted (crucial information: address, email, or phone number fields; see the completeness check below). [Y: report related]
  • Segmenting prospects into the correct categories or demographics is impacted (title, salutation, job code, or ethnicity). [Y: dimension depends]
  • Identifying similar or related records across systems is impacted (social security number, account number, login ID, or account name). [Y: PK-related identifier]
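A small sketch of the completeness check implied by the first bullet, assuming prospects arrive as flat key/value records; the list of crucial fields is illustrative and would differ per campaign.

```java
import java.util.*;

public class CompletenessCheck {
    // Illustrative set of crucial delivery fields; substitute your own campaign's fields.
    static final List<String> CRUCIAL = List.of("address", "email", "phone");

    // Return the crucial fields that are missing or blank in one prospect record.
    static List<String> missingCrucialFields(Map<String, String> record) {
        List<String> missing = new ArrayList<>();
        for (String field : CRUCIAL) {
            String value = record.get(field);
            if (value == null || value.isBlank()) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        Map<String, String> prospect = new HashMap<>();
        prospect.put("name", "Jane Doe");
        prospect.put("email", "jane@example.com");
        prospect.put("phone", "");   // blank field

        // A record missing address and phone cannot be reliably contacted.
        System.out.println("Missing: " + missingCrucialFields(prospect)); // [address, phone]
    }
}
```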
THE WRONG DATA
  • Updates not made on time
  • System migrations
  • Fraud
HOW DO I FIND MY DATA QUALITY PROBLEMS?
  • Determine the scope and depth of the problems (see the profiling sketch after this list).
  • Determine the cause of the defects.
  • Many data quality problems are process-based.
  • Report the findings.
  • The best place for direct marketers to cleanse their data is as close to the point of creation as possible.
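To get at scope and depth, a first pass is usually simple column profiling: count how often each field is blank. A rough sketch, assuming rows arrive as flat maps; real profiling also checks formats, value ranges, and referential integrity.

```java
import java.util.*;

public class BlankFieldProfile {
    public static void main(String[] args) {
        // Illustrative data set: each row is a flat map of field name to value.
        List<Map<String, String>> rows = List.of(
            Map.of("name", "Acme", "email", "info@acme.com", "phone", ""),
            Map.of("name", "Globex", "email", "", "phone", ""),
            Map.of("name", "Initech", "email", "hq@initech.com", "phone", "555-0100"));

        // Count blank values per field to measure the scope and depth of the defects.
        Map<String, Integer> blanks = new TreeMap<>();
        for (Map<String, String> row : rows) {
            for (Map.Entry<String, String> e : row.entrySet()) {
                if (e.getValue() == null || e.getValue().isBlank()) {
                    blanks.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }

        blanks.forEach((field, count) ->
            System.out.printf("%-6s blank in %d of %d rows (%.0f%%)%n",
                field, count, rows.size(), 100.0 * count / rows.size()));
    }
}
```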
5 places to cleanse data
  • Transactional updates – often at the point of creation, e.g. a web site. Use real-time processing that responds in milliseconds and can service multiple transactional applications (see the validation sketch after this list).
  • Operational feeds – upstream, before the data enters your system. Use batch-oriented data quality functions: a predefined cleansing job scheduled to run automatically on a specific data flow.
  • Purchased data – if you're buying it, demand that it is clean. Validate purchased data and match it against your current data set.
  • Legacy migration – data is in the enterprise, but not in your system yet.
  • Regular maintenance – as your data ages, you need to cleanse it.
A typical lead-generation process can be used to map each of the five cleansing opportunities and to show how they relate to each other.
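The transactional-update point above comes down to rejecting obviously bad input synchronously, before it ever reaches the customer database. A minimal sketch, assuming a web registration form where only the email field is checked; the regular expression is deliberately loose and purely illustrative.

```java
import java.util.regex.Pattern;

public class PointOfCreationCheck {
    // Very loose email shape check; a real-time cleansing service would also standardize
    // addresses and run matching, which is out of scope for this sketch.
    private static final Pattern EMAIL = Pattern.compile("^[^@\\s]+@[^@\\s]+\\.[^@\\s]+$");

    // Reject obviously bad input at the point of creation.
    static boolean acceptRegistration(String email) {
        return email != null && EMAIL.matcher(email.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println(acceptRegistration("jane@example.com")); // true
        System.out.println(acceptRegistration("jane@localhost"));   // false: no dot in the domain
        System.out.println(acceptRegistration("not-an-email"));     // false: no @ at all
    }
}
```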



DQ 9 functions


Data Quality Architecture

  • Data Quality Repository
    • Configuration rules: transforms, blueprints, substitution files etc.
    • Data examples: sample data used with blueprints
    • Runtime metadata: log files etc.
  • Data Quality Server: hosts the Data Quality engine.
  • Metadata Repository: a relational database used by the Data Quality Server and Project Portal. This database contains statistics and samples generated by running projects.
Client Tools
  • Data Quality Project Architect: a graphical user interface used to create projects.
  • Data Quality Command Line: a command line utility used to run projects.
  • Integration SDK: an integration API written in C++ and Java that lets you create socket connections for direct interaction with the Data Quality Server, without the need for the web services.
  • Data Quality Documentation
  • Web Tools
    • Data Quality Web Service: the mechanism used to send and receive data between your application and the Data Quality Server (see the HTTP sketch after this list).
    • Data Quality Web Service Samples
    • Data Quality Project Portal: a web-based tool that is designed to provide you with one environment for manageable administration of your processes.
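For the web service route, the call is just an HTTP request from your application to the Data Quality Server. The endpoint URL, operation name, and XML payload below are placeholders made up for illustration; the real service is described by its own WSDL, which these notes do not reproduce.

```java
import java.net.URI;
import java.net.http.*;

public class WebServiceCallSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder endpoint and payload; not the product's actual contract.
        String endpoint = "http://dq-server.example.com/dataquality/service";
        String body = "<cleanseRequest><project>address_cleanse</project>"
                    + "<record><name>ACME Corp.</name><zip>94107</zip></record></cleanseRequest>";

        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
            .header("Content-Type", "text/xml")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        // Send the record for cleansing and print whatever the server returns.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```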
Data Quality Integration
  • Web services
  • Integration SDK
    • JAVA API
    • C++ API
  • Socket (see the plain-socket sketch below)
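At the transport level, the socket option is just a TCP connection that the Integration SDK normally wraps for you. The host, port, and line-oriented request format in this sketch are assumptions for illustration only; the real wire protocol is product-specific and handled by the SDK.

```java
import java.io.*;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SocketIntegrationSketch {
    public static void main(String[] args) throws IOException {
        // Assumed host, port, and message format; the actual Data Quality Server
        // defines its own protocol, which the Integration SDK hides from you.
        String host = "dq-server.example.com";
        int port = 9090;

        try (Socket socket = new Socket(host, port);
             BufferedWriter out = new BufferedWriter(
                 new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8));
             BufferedReader in = new BufferedReader(
                 new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {

            // Send one record to a hypothetical cleansing project and read the cleansed result.
            out.write("CLEANSE|project=address_cleanse|name=ACME Corp.|zip=94107");
            out.newLine();
            out.flush();

            System.out.println("Server replied: " + in.readLine());
        }
    }
}
```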