
IBM Data Science - Best Practices

Sizing

The following table lists some of the most common drivers of complexity and size for ML solutions and projects. Not all of them are directly related to the ML model itself, but they still heavily influence the sizing.

| Factor | Type | Description |
|--------|------|-------------|
| Business Problem Complexity | Business | The number of use cases and the use cases themselves typically define the business problem and therefore also its complexity. |
| Legal, Ethical & Regulatory Requirements | Business | This covers topics like GDPR or CCPA, but also the “Rome Call for AI Ethics”. (Further reading material on ethics: Everyday Ethics for AI) |
| Non-Functional Requirements (NFRs) | Technical | NFRs cover topics like Performance (including Capacity), Volumetrics, Scalability, Security, Regulatory Compliance, Usability, Maintainability (including Flexibility), Availability (including Recoverability and Reliability), Manageability and Environmental (including Safety). |
| Stakeholder Complexity | Business | This refers to the stakeholders of the project. It is driven mainly by the number of stakeholders and their relationships to the project and to each other. |
| User Complexity | Business | The users should be included in the stakeholders; however, depending on their skills, disparity and needs, they often drive complexity through e.g. their interface requirements, explainability needs and availability demands. |
| Data Sourcing Complexity | Technical | The complexity of data sources is mainly driven by the number of data sources, their availability, accessibility and disparity. |
| Data Complexity | Technical | The data complexity is mainly driven by the size of the data (e.g. row size, number of total rows, total storage size) and the complexity of the data model, including its consistency within the data. In particular, the uniqueness of (data) keys, or its absence, can drive the complexity. |
| Hypothesis Complexity | Technical | The hypothesis is usually directly derived from the use case. This covers the complexity of the hypothesis itself, but also the complexity of testing / verifying the hypothesis. |
| Interface Complexity | Technical | The interface complexity is driven by the number of interfaces, their type disparity and their individual complexity. It covers both the interfaces with data sources (see above) and the interfaces during inference. |
| Solution Complexity | Technical | The complexity of the solution is mainly driven by the use cases and the functional and non-functional requirements (see above), but also by what is technically possible. This defines how complex the required architecture will need to be. For AI systems in particular, the solution complexity is driven by e.g. the deployment type (cloud vs. edge) and the training mode (one-time vs. batch vs. continuous). |
| Model Complexity | Technical | The model complexity is driven by the use case and its solution outline. This is often reflected in the number of models needed, their size and their individual complexity. |
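
As a minimal illustration of how these factors might be pulled together into a rough, relative sizing estimate, the sketch below rates each factor as low, medium or high and sums the weighted ratings. The scoring scheme (ratings, weights, and the idea of a single score) is an assumption for illustration only and is not prescribed by this guide.

```python
# Illustrative sketch only (not prescribed by this guide): rate each sizing
# factor from the table above as low / medium / high and sum the weights to
# get a rough, relative indicator for comparing projects. The example ratings
# below are placeholders, not recommendations.

RATING_WEIGHTS = {"low": 1, "medium": 2, "high": 3}

SIZING_FACTORS = {
    "Business Problem Complexity": "high",
    "Legal, Ethical & Regulatory Requirements": "medium",
    "Non-Functional Requirements (NFRs)": "medium",
    "Stakeholder Complexity": "low",
    "User Complexity": "medium",
    "Data Sourcing Complexity": "high",
    "Data Complexity": "high",
    "Hypothesis Complexity": "medium",
    "Interface Complexity": "low",
    "Solution Complexity": "high",
    "Model Complexity": "medium",
}


def rough_sizing_score(factors: dict) -> int:
    """Sum the weighted ratings to obtain a rough, relative sizing score."""
    return sum(RATING_WEIGHTS[rating] for rating in factors.values())


if __name__ == "__main__":
    max_score = len(SIZING_FACTORS) * RATING_WEIGHTS["high"]
    print(f"Rough sizing score: {rough_sizing_score(SIZING_FACTORS)} of {max_score}")
```

Such a score is only useful for comparing candidate projects against each other; the absolute number carries no meaning on its own.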