The agile manifesto (www.agilemanifesto.org) suggests that working software should be delivered in short delivery cycles. To realize this, one of the practices that teams need is test automation.
Business intelligence projects have the testing challenge of validating data. BI projects bring together data from a variety of source systems to support decision-making and performance management. That means, data has to be extracted, transformed, and quality assured prior to performing analysis and presenting the results. Therefore, for business intelligence projects to be agile, they need automated testing for data quality, data transformation rules, and the data presentation.
Data Quality Assurance Components
Based upon my experience, I can name at least nine data components that need testing and validation in a business intelligence project. (See Figure 1)
Figure 1 Business Intelligence & Analytic Data Components
These components need to be evaluated from a technical point view as well as business point of view. Table 1 describe my view on the data components that need to be tested.
|No.||Component||Technical Assurance||Business Assurance Check|
|1||The transaction data in information and operational systems is transferred to or accessed from a data-warehousing environment.||The data from the source system matches the data loaded.||The data is sourced from the expected source and operational systems, and it is consistent with expectations.|
|2||The transaction data is transformed to include new or adjusted business rules, calculations, and aggregations.||The technical transformation of the data meets the agreed specified business rules.||The resulting data is plausible and logical for the intended business usage, i.e., the business rules meet the expectations.|
|3||The master data is referenced and consistent between systems.||The master data is consistent between source system and the data warehouse.||The master data for customers, products, etc matches the official reference source and is complete according to expectations.|
|4||The attribute definitions are aligned to the transaction data.||The attirbute data is consistent between systems and aligned to the correct transaction data.||The data includes the attributes that should be used for querying, referening, etc. The data is defined as expected and aligned to the right transaction data.|
|5||The hierarchies describe how the transaction data should be aggregated.||The transaction data is aggregated according to the hierarchies.||The hierarchies define the correct structure and aggregation levels.|
|Â 6||Within an application, the data is loaded into the correct table, field, measure in the application.||Compare data loaded from the data warehouse with data displayed in the application.||Compare data with existing reports, examples, etc. May need to validate that the examples are correct.|
|7||The data is transformedÂ within an application.||The calculations and business rules are implemented according to the specification.||The resulting data is plausible and logical for the intended business usage, i.e., the business rules meet the business expectations.|
|8||Within an application, the data flows between the different application components (e.g., workbooks, reports, etc.).||The data fields or measures are consistently referenced in differnt parts of the application.||Compare data from applications component to component; for example from screen to screen, from screen to workbook, etc.|
|9||The data includes the required content.||The tables, fields, and measures are correct, and the names and content are consistent with organizational data definitions.||Compare data with previous reports, examples, etc. (May need to validate that the examples and previous reports from the existing systems are correct.)|
Table 1 Business Intelligence & Analytic Data Components
One approach for performing the test automation is to use different software tools to collaborate with the business users to extract their requirements and to design tests that can be automated.
At Agile Testing Days 2012, Mike Scott and Tom Roden reported they were able to use examples from the business, define a shared IT and business language, and using FitNesse, an open source test automation tool, to automate 100 % of the regression tests. Witin six months, they reduced regression-testing effort from 100 person days at its peak for 151 tests to 23 person days of effort for executing 311 tests. In another six months, they were able to reduce the regression testing effort to zero for 601 tests.
Two data warehousing professionals in different situations also described the need for using design patterns and automation on the Extract-Transform-Load (ETL) processes.
- A Consultant described how he follows simple design patterns, and uses Jailer, an open source data-generation tool, a self-designed front end to automate testing of ETL processes.
- A Data Warehousing Manager described how he used an object-oriented architecture and automation to deliver fast, efficient, and quality assured data extracts. The architecture includes a flat file creator and technical, business logic, and output layers. With an automated process, the team could focus on discussing and understanding the business logic and rules. The format specifications and business logic rules were entered in customization tables and the automation produced the data extracts in a standardized format. According to the data warehouse manager, â€œwe did not have to program any longer; we were able to concentrate purely on implementing the business logic/ rules.
Open Source Test Software
Open source packages offer a reasonable starting place for designing business intelligence test automation. A small selection of such testing software is described in Table 2.
|Jenkins||Jenkins Continuous Integration (CI) is an open source application that monitors the execution of repeated jobs. It is currently used to focus on continuously building and testing software projects and monitoring externally run jobs. It can be used for build management, deployment automation, test automation, and release management|
|CruiseControl||Cruise Control is an open source tool and framework for continuous integration and creating a custom continuous build process. It includes plug-in for source control, build technologies, and notification schemes.|
|Jailer||Jailer is an open source tool that can be used to extract data and maintain its referential integrity. It can be used for database sub-setting and scheme and data browsing. It generates DBUnit datasets; hierarchy structured XML, and topologically stored SQL-DML. It is written in Java, platform independent, and database agnostic. DBUnit is a Junit extension that can be used to export and import databases data to and from XML datasets.|
|FitNesse||FitNesse is an open source integrated standalone wiki and acceptance testing framework. The wiki pages run as tests so the specifications can be tested against the application.|
|Selenium||Selenium is a set of tools to automate web browsers that can be controlled by different programming languages and testing frameworks… It can be used for automating web applications for testing and for web-based administration tasks.|
|Ataccama DQ Analyzer||Ataccama DQ Analyzer is an open source data profiling tools with an advanced data profiling and analysis capabilities.|
|Talend Data Profiler||Talend Data Profiler is an open source data proflier to analyze data from variety of sources, customize and define your indicators, and produce reports.|
|CloverETL Data Profiler||CloverETL Data Profiler is an open source tool that uses statistics to identify data inconsistencies, including duplicate or missing data in the source.|
|Apache JMeter™||The Apache JMeter™ is an open source desktop application to test performance of static and dynamic resources (Files, Web dynamic languages – PHP, Java, ASP.NET, etc. -, Java Objects, Data Bases and Queries, FTP Servers and more). It can simulate a heavy load on a server, group of servers, network or object to test its strength or to analyze overall performance under different load types. In addition, it can make a graphical analysis of performance or test server/script/object behavior under heavy concurrent load.|
TableÂ 2 Selected Test Automation Software
Automation is a must
In summary, test automation is necessary for agile business intelligence projects. The team should establish design standards so that patterns of development can be used to facilitate automation of testing as well as standard data warehousing processes, e.g., ETL Table 3 offers an example for how automated testing might be integrated into a business intelligence project.
|Activity||Description||People and software|
|Prepare user story||Specify the user stories that are independent, negotiable, valuable, estimate able, small, and testable.||
|Specify tests & examples||Clarify what is meant by the story including provide examples of the expected results (usually in excel files or other sample reports.) Discuss and define the business logic and business rules.||
|Automate Tests||Automate the test. Automating the test and development occur in parallel.||
|Develop||Develop in patterns to facilitate automation and using a Test Driven Development (TDD) process. Automating the test and development occur in parallel.||
|Session-based Testing||Perform testing||
|Customer Verification||Verify the final results||
Table 3 Business Intelligence Projects and Automated Testing