Data-Driven Testing: Complete Guide with Python
January 10, 2024 · ADC Team · 6 min read


Data-Driven Testing (DDT) consists of separating test data from test code, allowing the same scenario to be executed with multiple datasets. It is one of the most powerful practices for increasing coverage without duplicating code.

Key takeaways
  • pytest.mark.parametrize: the simplest DDT approach in Python — one decorator automatically generates a separate test for each data tuple
  • Data sources ranked by use case: CSV (large volumes, non-dev teams) → JSON (hierarchical data, Git-friendly) → Excel/openpyxl (business-maintained data) → database (live environment state)
  • Test isolation is mandatory: each test must run independently — never depend on data created by another test in the suite
  • DDT is a coverage multiplier: one test definition × N datasets = N scenarios, with zero code duplication

What is Data-Driven Testing?

DDT allows running the same test with different combinations of inputs and expected results. Data is stored separately (CSV, JSON, Excel, database) and injected dynamically into the tests. The result: maximum coverage with minimum code.

1. DDT with pytest.mark.parametrize

The most straightforward method in Python: the @pytest.mark.parametrize decorator runs the same test function once per dataset, automatically.

Example: Testing a discount calculation function with different scenarios (0%, 10%, 25%) in a single test definition. pytest automatically generates a separate test for each tuple of parameters.
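A minimal sketch of that scenario (apply_discount is a stand-in function invented for this example):

```python
import pytest


def apply_discount(price, rate):
    """Return the price after applying a percentage discount (0.10 = 10%)."""
    return round(price * (1 - rate), 2)


# One decorator, three datasets: pytest generates a separate test per tuple.
@pytest.mark.parametrize(
    "price, rate, expected",
    [
        (100.0, 0.00, 100.0),  # 0% — no discount
        (100.0, 0.10, 90.0),   # 10% off
        (80.0, 0.25, 60.0),    # 25% off
    ],
)
def test_apply_discount(price, rate, expected):
    assert apply_discount(price, rate) == expected
```

Running pytest reports three tests, one per tuple, each passing or failing independently.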

2. Data from a CSV File

For large volumes of data, or data maintained by non-developers, CSV is ideal. Either the standard csv library or pandas can load the rows and feed them to parametrize.
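One way to wire a CSV source into parametrize, using only the standard csv module (an inline string stands in here for a real file such as tests/data/discounts.csv, so the sketch is self-contained):

```python
import csv
import io

import pytest

# In a real project this content lives in a versioned .csv file.
CSV_DATA = """price,rate,expected
100.0,0.10,90.0
80.0,0.25,60.0
"""


def load_cases():
    """Parse the CSV and convert each row into a tuple of floats."""
    reader = csv.DictReader(io.StringIO(CSV_DATA))
    return [
        (float(r["price"]), float(r["rate"]), float(r["expected"]))
        for r in reader
    ]


# The loader runs once at collection time; pytest then fans out one test per row.
@pytest.mark.parametrize("price, rate, expected", load_cases())
def test_discount_from_csv(price, rate, expected):
    assert round(price * (1 - rate), 2) == expected
```

Non-developers can add rows to the CSV without ever touching the test code.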

3. Data from JSON

JSON is preferable when data has a hierarchical structure (nested objects, arrays). It is human-readable and easy to version with Git.
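A sketch of the same idea with nested JSON data (the user-validation logic and field names are invented for illustration):

```python
import json

import pytest

# Hierarchical test data; in practice this would be a versioned .json file.
JSON_DATA = """
{
  "cases": [
    {"user": {"name": "alice", "age": 30}, "expected_valid": true},
    {"user": {"name": "", "age": -1}, "expected_valid": false}
  ]
}
"""


def is_valid_user(user):
    """Stand-in validation rule: non-empty name and non-negative age."""
    return bool(user["name"]) and user["age"] >= 0


cases = json.loads(JSON_DATA)["cases"]


# ids= gives each generated test a readable name in the pytest report.
@pytest.mark.parametrize(
    "case", cases, ids=[c["user"]["name"] or "empty-name" for c in cases]
)
def test_user_validation(case):
    assert is_valid_user(case["user"]) == case["expected_valid"]
```

Nested objects map directly onto Python dicts, so no flattening step is needed.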

4. Data from Excel with openpyxl

For business teams that maintain their test data in Excel, openpyxl allows reading sheets directly. Be careful to lock the file structure to avoid breaking changes.
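A possible shape for the openpyxl approach. To keep the sketch runnable, a tiny workbook is built in memory; in a real suite you would ship a file like tests/data/discounts.xlsx and lock its header row:

```python
import io

import openpyxl  # third-party: pip install openpyxl
import pytest


def build_demo_workbook():
    """Build an in-memory workbook standing in for a business-maintained file."""
    wb = openpyxl.Workbook()
    ws = wb.active
    ws.append(["price", "rate", "expected"])  # locked header row
    ws.append([100.0, 0.10, 90.0])
    ws.append([80.0, 0.25, 60.0])
    buf = io.BytesIO()
    wb.save(buf)
    buf.seek(0)
    return buf


def load_cases(source):
    """Read data rows from the first sheet, skipping the header."""
    ws = openpyxl.load_workbook(source, data_only=True).active
    return list(ws.iter_rows(min_row=2, values_only=True))


@pytest.mark.parametrize("price, rate, expected", load_cases(build_demo_workbook()))
def test_discount_from_excel(price, rate, expected):
    assert round(price * (1 - rate), 2) == expected
```

data_only=True makes openpyxl return computed cell values rather than formulas, which is usually what test data needs.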

5. Data from a Database

For dynamic data or data linked to the test environment, a direct database connection (SQLite, PostgreSQL via psycopg2) is the best option. The data always reflects the real state of the system.
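A sketch using the standard library's sqlite3, where an in-memory database stands in for the real environment database (the table name and schema are invented for the example):

```python
import sqlite3

import pytest


def fetch_cases():
    """Pull test cases from a database.

    Here an in-memory SQLite database is created and seeded on the spot;
    in a real suite this would be a connection to the environment database.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE discount_cases (price REAL, rate REAL, expected REAL)")
    conn.executemany(
        "INSERT INTO discount_cases VALUES (?, ?, ?)",
        [(100.0, 0.10, 90.0), (80.0, 0.25, 60.0)],
    )
    rows = conn.execute("SELECT price, rate, expected FROM discount_cases").fetchall()
    conn.close()
    return rows


@pytest.mark.parametrize("price, rate, expected", fetch_cases())
def test_discount_from_db(price, rate, expected):
    assert round(price * (1 - rate), 2) == expected
```

Swapping sqlite3 for psycopg2 changes only the connection line; the parametrize wiring stays identical.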

6. Test Data Management Strategy

  • Isolation: Each test must be able to run independently — do not depend on data created by another test
  • Cleanup: Use pytest fixtures (setup/teardown) to prepare and clean data
  • Realistic data: Use generators (Faker) to create data close to production
  • Version the data: Store CSV/JSON in Git alongside test code
  • Edge cases: Always include edge cases (null values, empty strings, negative integers)
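The isolation and cleanup points above can be sketched with a pytest fixture: each test gets its own fresh database, and teardown runs after every test (the users table is invented for the example):

```python
import sqlite3

import pytest


def make_user_db():
    """Setup helper: create an isolated, in-memory database with an empty table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    return conn


@pytest.fixture
def user_db():
    conn = make_user_db()   # setup: runs before each test that uses the fixture
    yield conn
    conn.close()            # teardown: runs after the test, even if it failed


def test_insert_user(user_db):
    user_db.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))
    assert user_db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 1


def test_table_starts_empty(user_db):
    # Independent of the test above: this test receives its own fresh database.
    assert user_db.execute("SELECT COUNT(*) FROM users").fetchone()[0] == 0
```

Because each test receives a brand-new connection, the two tests pass in any order, which is exactly the isolation guarantee the list above demands.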

Conclusion

Data-Driven Testing is a coverage multiplier. By separating data and test logic, you reduce duplication, ease maintenance and make your test suite accessible to non-developers. Start simple with parametrize, then evolve towards external sources depending on the complexity of your project.

Deepen your knowledge with our courses

Our QA Automation Python course covers DDT in depth: pytest, advanced fixtures, CI/CD integration.

View the QA Automation course
