Setting up Completeness Reports

 

There are 3 tables which are used to run a completeness report. The first is ETL_QA_SQL. This table creates our measures. There has to be a source measure and a target measure. So for CUSTOMER_DIM there 2 records in this table.

 

ETL_QA_SQL

 

 

QA_CATEGORY

QA_STAT_NAME

SQL_TO_RUN

CUSTOMER_DIM

CUSTOMER_DIM_S

select count(*) from SOURCE.CUSTOMERS;

CUSTOMER_DIM

CUSTOMER_DIM_T

select count(*) from CUSTOMER_DIM Where ActiveFlag = 1;

INVOICE_FACT_AmountPaid

INVOICE_FACT_AmountPaid_S

select sum(AMT_PD) from SOURCE.INVOICES;

INVOICE_FACT_AmountPaid

INVOICE_FACT_AmountPaid_T

select sum(AmountPaid) from INVOICE_FACT;

INVOICE_FACT_CustomerId

INVOICE_FACT_CustomerId_S

select count(*) from INVOICE_FACT Where CustomerId <> 0 And LastUpdate = current_date;

INVOICE_FACT_CustomerId

INVOICE_FACT_CustomerId_T

select count(*) from INVOICE_FACT Where LastUpdate = current_date;

 

Now when you calculate these measures the results are put in the ETL_QA_RESULT. Here is a sample.

 

ETL_QA_RESULT

 

 

 

ETL_QA_RESULT_ID

CREATE_TS

QA_STAT

SQL_TO_RUN

CUSTOMER_DIM_S

2007-03-05 10:55

975

select count(*) from SOURCE.CUSTOMERS;

CUSTOMER_DIM_T

2007-03-05 10:56

1000

select count(*) from CUSTOMER_DIM Where ActiveFlag = 1;

INVOICE_FACT_AmountPaid_S

2007-03-05 11:05

1000

select sum(AMT_PD) from SOURCE.INVOICES;

INVOICE_FACT_AmountPaid_T

2007-03-05 11:06

1000

select sum(AmountPaid) from INVOICE_FACT;

INVOICE_FACT_CustomerId_S

2007-03-05 10:35

960

select count(*) from INVOICE_FACT Where CustomerId <> 0 And LastUpdate = current_date;

INVOICE_FACT_CustomerId_T

2007-03-05 10:36

1000

select count(*) from INVOICE_FACT Where LastUpdate = current_date;

 

Now all we need to do is setup the report. The report has to know the source measure and the target measure to figure out the completeness. This is controlled through the ETL_QA_RESULT_MAP table. Here is how it would look.

 

ETL_QA_RESULT_MAP

 

 

TABLE_NAME

ETL_QA_RESULT_ID_S

ETL_QA_RESULT_ID_T

CUSTOMER_DIM

CUSTOMER_DIM_S

CUSTOMER_DIM_T

INVOICE_FACT

INVOICE_FACT_AmountPaid_S

INVOICE_FACT_AmountPaid_T

INVOICE_FACT

INVOICE_FACT_CustomerId_S

INVOICE_FACT_CustomerId_T

 


 

The report is created from joining the last 2 tables together where you only use the last run of results. Here is the SQL.

 

/* -----------------------------------------------------------------

   EtlResultCompleteness.sql

   Decription: Reports completeness of dimensions and fact tables.

   Written by: Kim Duke

   Notes:

   Need to setup tables

   1. ETL_QA_SQL

   2. ETL_QA_RESULT_MAP

   Then run sequence SeqQaCompleteness

   ----------------------------------------------------------------- */

Select

   ETL_QA_RESULT_MAP.TABLE_NAME AS QA_CATEGORY,

   convert(decimal(18,0),AllRuns_T.QA_STAT) AS TARGET_COUNT,

   convert(decimal(18,0),AllRuns_S.QA_STAT) AS SOURCE_COUNT,

   /* Oracle ROUND((100 * (AllRuns_T.QA_STAT / AllRuns_S.QA_STAT)), 2) AS COMPLETENESS */

   convert(decimal(18,2),ROUND((100 * (convert(decimal(18,2),AllRuns_T.QA_STAT) / convert(decimal(18,2),AllRuns_S.QA_STAT))), 2)) AS COMPLETENESS

From

   ETL_QA_RESULT_MAP,

   ETL_QA_RESULT AllRuns_S,

   (

      Select

         ETL_QA_RESULT.ETL_QA_RESULT_ID,

         Max(ETL_QA_RESULT.CREATE_TS) AS MAX_CREATE_TS

      From

         ETL_QA_RESULT 

      Group By

         ETL_QA_RESULT.ETL_QA_RESULT_ID

   ) LastRun_S,

   ETL_QA_RESULT AllRuns_T,

   (

      Select

         ETL_QA_RESULT.ETL_QA_RESULT_ID,

         Max(ETL_QA_RESULT.CREATE_TS) AS MAX_CREATE_TS

      From

         ETL_QA_RESULT 

      Group By

         ETL_QA_RESULT.ETL_QA_RESULT_ID

   ) LastRun_T

Where

   AllRuns_S.ETL_QA_RESULT_ID = LastRun_S.ETL_QA_RESULT_ID

   AND AllRuns_S.CREATE_TS = LastRun_S.MAX_CREATE_TS

   AND AllRuns_S.ETL_QA_RESULT_ID = ETL_QA_RESULT_MAP.ETL_QA_RESULT_ID_S

   AND AllRuns_T.ETL_QA_RESULT_ID = LastRun_T.ETL_QA_RESULT_ID

   AND AllRuns_T.CREATE_TS = LastRun_T.MAX_CREATE_TS

   AND AllRuns_T.ETL_QA_RESULT_ID = ETL_QA_RESULT_MAP.ETL_QA_RESULT_ID_T

Order By

   ETL_QA_RESULT_MAP.TABLE_NAME

;

 

Completeness Report

 

 

 

QA_CATEGORY

TARGET_COUNT

SOURCE_COUNT

COMPLETENESS

CUSTOMER_DIM

975

1000

97.50%

INVOICE_FACT_AmountPaid

1000

1000

100.00%

INVOICE_FACT_CustomerId

960

1000

96.00%

 

This is a simple report. The html page is a quick and dirty way to report straight out of DataStage. It is generated from a DataStage job I wrote. Any SQL statement can be sent in this format to any user or group of users. This whole process is automatically ran after our daily loads are done. This is what we call a sanity check. All the real SQL has only the records loaded today. So the WHERE clause has LastUpdate = current_date. Some dimensions calculate the whole table because it does not take that long.