Bias Detection When Developing AI Algorithms

As artificial intelligence has become more widespread in recent years, topics such as bias in AI algorithms are more important than ever before. An AI system will only be as unbiased as the people who develop the technology, as the implicit biases that people hold against other members of society will inevitably seep into the work that they create.

Moreover, as many development teams who work on AI algorithms are not themselves diverse, it is very difficult to create AI models that represent large populations with both accuracy and efficiency. While identifying any biases you may possess is the first step in tackling bias in AI algorithms, there are other steps that can be taken to ensure that all technology created is functioning from an unbiased perspective at all times.

As the development of AI happens over the course of many stages, it is imperative that unconscious biases are addressed at every stage of development. Four common examples of these stages include data collection, data processing, data analysis, and modeling.

AI Data Collection

Data collection is the first stage of development in which bias may be detected, and the most influential stage overall. The primary reason for this is that data sets are assembled from human inputs and are therefore subject to human biases, prejudices, and preconceived notions. It is imperative that diversity is introduced to the data set at this stage so that the finished AI model has a wide range of references and information to draw from. A common bias found in the data collection stage is selection bias, in which data is gathered in such a way that the sample does not accurately represent the population it is meant to describe.

To give an example, many research studies use college-aged students as the focus group for whatever hypothesis they are testing. However, students are not representative of the population of an entire country, whether in terms of age, physical health, or a myriad of other factors. As such, a study that drew responses strictly from college-aged students would be biased towards the views of a younger demographic. Another source of bias in the data collection stage is the “framing effect”, in which survey questions are constructed with a particular slant.
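One simple way to surface selection bias of this kind is to compare the composition of the collected sample against the population it is supposed to represent. The sketch below does this with a chi-square goodness-of-fit test; the age brackets, proportions, and counts are purely hypothetical illustrations, not real census data.

```python
# Minimal sketch: checking whether a sample's age distribution matches the
# population it is meant to represent. All numbers here are hypothetical.
from scipy.stats import chisquare

# Hypothetical population proportions per age bracket.
population_props = {"18-24": 0.12, "25-44": 0.34, "45-64": 0.33, "65+": 0.21}

# Hypothetical counts observed in the collected data set (skewed young).
sample_counts = {"18-24": 640, "25-44": 210, "45-64": 100, "65+": 50}

total = sum(sample_counts.values())
observed = [sample_counts[b] for b in population_props]
expected = [population_props[b] * total for b in population_props]

# A tiny p-value suggests the sample does not reflect the population,
# i.e. a likely case of selection bias.
stat, p_value = chisquare(f_obs=observed, f_exp=expected)
print(f"chi-square = {stat:.1f}, p = {p_value:.3g}")
```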

To give another example, option 1 may be framed as a straight $100 gain, while option 2 may be framed as a $200 gain paired with a $100 loss. Although both options net the same $100, the framing effect posits that some people will find the second option more appealing simply because it is framed around a larger amount of money, reflecting the assumption that more is always better, irrespective of any evidence to the contrary. Another example of bias that may appear in the data collection stage is systematic bias, in which error is both consistent and repeatable. While this form of bias can be extremely difficult to detect, it is most often the result of faulty measurement equipment and can be combated by maintaining a good understanding of the hardware being used to gather data for the AI model at hand.
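Because systematic bias is consistent rather than random, it can often be spotted by checking readings against a trusted reference. The sketch below simulates this idea; the sensor values and the +1.8 offset are invented for illustration only.

```python
# Minimal sketch: spotting a systematic (consistent, repeatable) error by
# comparing a sensor's readings against a trusted reference measurement.
import numpy as np

rng = np.random.default_rng(0)
reference = rng.normal(loc=50.0, scale=5.0, size=500)       # ground-truth values
sensor = reference + 1.8 + rng.normal(scale=0.3, size=500)  # simulated +1.8 offset

offset = np.mean(sensor - reference)
spread = np.std(sensor - reference)

# A mean offset much larger than the residual spread points to a consistent
# instrument error rather than random noise.
print(f"mean offset = {offset:.2f}, residual spread = {spread:.2f}")
```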

AI Data Processing

While gathering a wide range of data is the most important step in creating an effective AI model, this data is still raw and needs to be processed so that it can be understood by machine learning algorithms. In the data processing stage, the collected data is extracted and organized as structured, unstructured, or semi-structured data, depending on the model being created. There are three steps within the data processing stage: formatting, cleaning, and sampling. Because data arrives in different forms, such as the Parquet file format or various proprietary formats, formatting makes it easier for machine learning models to work with the data as efficiently as possible. In the cleaning step, unwanted, corrupted, or missing data is removed so that the end result is as polished and refined as possible. Finally, the sampling step allows developers to save time and memory by selecting a smaller sample of the data set, which enables faster prototyping and exploration of solutions.
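A minimal sketch of these three steps is shown below using pandas; the file name, column names, and validity rules are hypothetical and would differ for a real data set.

```python
# Minimal sketch of formatting, cleaning, and sampling with pandas.
import pandas as pd

# Formatting: read the raw data from a columnar format such as Parquet.
df = pd.read_parquet("survey_responses.parquet")  # hypothetical file name

# Cleaning: drop missing rows and obviously invalid values.
df = df.dropna(subset=["age", "response"])        # hypothetical columns
df = df[(df["age"] >= 0) & (df["age"] <= 120)]

# Sampling: work with a smaller random sample for fast prototyping.
prototype_df = df.sample(frac=0.1, random_state=42)
print(len(df), "cleaned rows,", len(prototype_df), "rows in the prototype sample")
```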

In terms of addressing bias, outlier detection is one way of removing potential bias in the data processing stage before it has a chance to take hold. For instance, in a data set where the vast majority of people are healthy 20- to 25-year-old college students, a person who is 90 years old and retired would be unlikely to be representative of the data. To give another example, in a license plate data set containing primarily American license plates, a European license plate would not be very representative of the overall data, despite technically being a license plate as well.
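A common, simple approach to outlier detection is the interquartile-range rule, sketched below on a hypothetical list of ages that mirrors the college-student example above.

```python
# Minimal sketch: flagging age outliers with the interquartile-range rule.
# The ages are hypothetical and skewed towards college-aged respondents.
import numpy as np

ages = np.array([20, 21, 22, 22, 23, 23, 24, 24, 25, 25, 90])

q1, q3 = np.percentile(ages, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = ages[(ages < lower) | (ages > upper)]
print("flagged as outliers:", outliers)  # the 90-year-old respondent
```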

AI Data Analysis

After the data has been collected and processed, it is ready for further analysis. During the third stage of development, the data analysis phase, the data can be fine-tuned further to ensure the best possible end product. For example, statistical techniques such as regression analysis and correlation can be used by developers to identify relationships among the variables in the data set. Alternatively, data visualization techniques can be used to examine the data in a graphical format with the aim of gaining further insight and clarity into the data set itself. Additionally, the data analysis phase is an opportune time to check that the first two stages of development are satisfactory and efficient before moving on to the final stage of development.
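The sketch below illustrates both ideas on simulated data: a correlation matrix to see which variables move together, and a simple least-squares fit to quantify one relationship. The column names and values are hypothetical.

```python
# Minimal sketch: using correlation and a simple linear fit to inspect
# relationships between variables. The data are simulated for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({"hours_active": rng.uniform(0, 10, 200)})
df["purchases"] = 2.0 * df["hours_active"] + rng.normal(scale=1.5, size=200)

# Correlation matrix highlights which variables move together.
print(df.corr())

# A least-squares fit quantifies the relationship between two variables.
slope, intercept = np.polyfit(df["hours_active"], df["purchases"], deg=1)
print(f"purchases ~ {slope:.2f} * hours_active + {intercept:.2f}")
```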

With regard to bias, confirmation bias can have an adverse effect on a data set during the data analysis phase. Because it is very easy for people to seek out information and metrics that affirm their existing viewpoint, it is important to consider perspectives outside of one’s own worldview when interpreting data. Similarly, something as simple as a misleading graph can introduce bias to an AI model. For instance, a data scientist may choose to start the y-axis of a graph at a value like 70 or 80 rather than at 0, which can make the differences shown in the graph appear far more pronounced and noticeable than they actually are.
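The effect of a truncated y-axis is easy to demonstrate: the sketch below plots the same hypothetical values twice, once with the axis starting at 0 and once truncated, so small differences look dramatic in the second panel.

```python
# Minimal sketch: the same data plotted with an honest and a truncated
# y-axis. The group labels and scores are hypothetical.
import matplotlib.pyplot as plt

groups = ["A", "B", "C"]
scores = [82, 84, 86]

fig, (honest, misleading) = plt.subplots(1, 2, figsize=(8, 3))

honest.bar(groups, scores)
honest.set_ylim(0, 100)
honest.set_title("y-axis from 0: modest differences")

misleading.bar(groups, scores)
misleading.set_ylim(80, 87)
misleading.set_title("truncated y-axis: exaggerated differences")

plt.tight_layout()
plt.show()
```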

AI Modeling

Once the first three stages of development have been completed, deploying the model into production is the final step in the process. However, the model must first be tested for accuracy and efficiency before it can be released to the general public. One example of such tests is the unit test, in which the AI program is broken down into blocks and each element is tested separately. Another common test is the regression test, in which the model is re-tested after changes and compared against an existing version to ensure that previously working behavior does not suddenly break or develop other technical issues. Integration tests can also be implemented; these work by observing how multiple components in a program function together. In addition to these tests, AI can also be evaluated with more general metrics such as accuracy, loss, precision, and recall.
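These general metrics are straightforward to compute with scikit-learn, as in the sketch below; the labels and predictions are hypothetical placeholders for a real test set.

```python
# Minimal sketch: computing the general evaluation metrics mentioned above.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]  # hypothetical ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # hypothetical model predictions

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
```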

Forms of bias that may be present in the modeling stage include the bias/variance trade-off and concept drift. A model with high bias will underfit the data and prove inadequate, while a model with high variance will overfit it; a balance between the two is needed for the model to generalize well. Concept drift, by contrast, describes a phenomenon in which the statistical properties of a target variable change gradually over time in unexpected ways. For instance, a model may be developed to predict the behavior of customers frequenting an online clothing store. While the model may start out performing well, it can gradually diminish into ineffectiveness over the course of a calendar year. What has happened in this instance is that customer behavior has changed during the year, and such changes are very difficult to quantify because human behavior is not governed by any fixed metric.
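One practical way to watch for concept drift is to compare the distribution of a feature (or of the model's scores) between an older and a newer time window. The sketch below does this with a two-sample Kolmogorov-Smirnov test on simulated customer-spend data; the windows and values are invented for illustration.

```python
# Minimal sketch: detecting concept drift by comparing two time windows
# with a Kolmogorov-Smirnov test. The data are simulated for illustration.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)
january_spend = rng.normal(loc=60, scale=10, size=1000)   # behavior at launch
december_spend = rng.normal(loc=45, scale=12, size=1000)  # behavior a year later

stat, p_value = ks_2samp(january_spend, december_spend)

# A small p-value means the two windows follow different distributions,
# a signal that the model may need to be retrained on recent data.
print(f"KS statistic = {stat:.3f}, p = {p_value:.3g}")
```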

Regardless of the stage of development or the associated technical jargon, it falls on developers themselves to identify their biases and make sure that they are creating AI models that are representative of the entire population they serve at a given time.