Machine learning is a type of artificial intelligence in which a computer is trained to recognize patterns from examples in a large data set. The examples could include pictures, sounds, faces, or gestures, and a data set might contain thousands of photos, hours of audio, or millions of lines of text. To be equitable, a data set must include a diverse range of people and experiences.
The specific part of the data set used to "teach" the model is called training data. If the training data is missing certain groups, such as people with different accents or genders, the AI develops a "blind spot," or bias.
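One way to spot a possible blind spot before training is to count how often each group appears in the training data. The short Python sketch below is only an illustration: the function name `find_underrepresented`, the accent labels, and the 10% threshold are all assumptions made for this example, not part of any particular tool or standard.

```python
from collections import Counter


def find_underrepresented(labels, expected_groups, min_share=0.1):
    """Return the expected groups whose share of the examples is below min_share.

    labels: one group label per training example (hypothetical data).
    expected_groups: the groups we want the data set to cover.
    min_share: illustrative threshold, an assumption rather than a standard.
    """
    counts = Counter(labels)
    total = len(labels)
    if total == 0:
        # With no examples at all, every expected group is missing.
        return sorted(expected_groups)
    return sorted(g for g in expected_groups if counts[g] / total < min_share)


# Hypothetical accent labels for a small speech data set.
accents = ["US"] * 8 + ["UK"] * 2
gaps = find_underrepresented(accents, ["US", "UK", "Indian", "Nigerian"])
print(gaps)  # ['Indian', 'Nigerian']
```

In this made-up example, two accents never appear in the training data, so a model trained on it would likely struggle with those speakers. A count like this only shows who is missing; it does not fix the gap, which still requires collecting more diverse examples.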
Watch the video below to understand the importance of creating diverse data sets when training machine learning programs.