
How Scale AI taskers are being used to scrape the internet for training data

by Ava Mercer

Tens of thousands of people have been paid by Scale AI, a company in which Mark Zuckerberg’s Meta holds a 49% stake, to help train artificial intelligence systems by searching Instagram accounts, collecting copyrighted work and transcribing pornographic soundtracks, the Guardian reports.

The work is being carried out through a platform called Outlier, which Scale AI uses to recruit people with specialist knowledge in subjects such as medicine, physics and economics. The company presents the role as a way to help improve advanced AI systems, using the slogan: “Become the expert that AI learns from.”

According to the report, the workers involved in this process are often described as “taskers.” Their job involves pulling information from across the internet and turning it into material that can be used to train AI models. The Guardian says this includes combing through Instagram profiles, collecting copyrighted content and working with explicit audio material.

Scale AI’s use of these workers highlights the hidden labor behind AI training, where large numbers of people are used to sort, label and process information that helps machine learning systems improve. The company has sought to attract highly educated workers by emphasizing flexibility and expertise, while positioning the work as part of the development of cutting-edge AI tools.

The report also points to the connection between Scale AI and Meta, noting that the social media company controls 49% of Scale AI. That link gives added significance to the company’s role in the AI supply chain, particularly at a time when major technology firms are competing to build more capable systems.

The work is presented as a chance for knowledgeable people to influence how AI learns, but the Guardian’s findings suggest that the underlying tasks can involve ethically sensitive material and the harvesting of content from personal online spaces.

The company’s Outlier platform is central to that effort. Through it, Scale AI connects workers with assignments designed to refine AI models. The report says this can include material drawn from private or copyrighted sources, raising questions about the boundaries of data collection in AI development.

As AI companies continue to expand their training operations, the role of contractors and gig workers has become increasingly important. This latest report suggests that, behind the public messaging about expertise and innovation, a significant amount of the work may involve repetitive and sometimes disturbing material pulled from the internet.

The Guardian says the findings reveal how desperation and low-paid labor are helping power the AI industry, even as some of the most advanced systems depend on workers scraping together data from personal profiles and protected content.
