Decentralized OORT AI data hits top ranks on Google Kaggle
14 May 2025 - 9:18AM
Cointelegraph


An artificial intelligence training image data set developed by
decentralized AI solution provider OORT saw considerable success on
Google’s platform Kaggle.
OORT’s Diverse Tools Kaggle data
set listing was released in early April; since then, it
has climbed to the first page in multiple categories. Kaggle is a
Google-owned online platform for data science and machine learning
competitions, learning and collaboration.
Ramkumar Subramaniam, core contributor at crypto AI project
OpenLedger, told Cointelegraph that “a front-page Kaggle ranking is
a strong social signal, indicating that the data set is engaging
the right communities of data scientists, machine learning
engineers and practitioners.”
Max Li, founder and CEO of OORT, told Cointelegraph that the
firm “observed promising engagement metrics that validate the early
demand and relevance” of its training data gathered through a
decentralized model. He added:
“The organic interest from the community, including
active usage and contributions, demonstrates how decentralized,
community-driven data pipelines like OORT’s can achieve rapid
distribution and engagement without relying on centralized
intermediaries.”
Li also said that OORT plans to release several more data sets in
the coming months. Among them are an in-car voice commands data set,
one for smart home voice commands and another of deepfake videos
meant to improve AI-powered media verification.
First page in multiple categories
Cointelegraph independently verified that the data set reached the
first page in Kaggle’s General AI, Retail & Shopping, Manufacturing,
and Engineering categories earlier this month. By the time of
publication, it had lost those positions following a possibly
unrelated data set update on May 6 and another on May 14.
OORT’s data set on the first Kaggle page in the Engineering
category. Source: Kaggle
While recognizing the achievement, Subramaniam told
Cointelegraph that “it’s not a definitive indicator of real-world
adoption or enterprise-grade quality.” He said that what sets
OORT’s data set apart “is not just the ranking, but the provenance
and incentive layer behind the data set.” He explained:
“Unlike centralized vendors that may rely on opaque
pipelines, a transparent, token-incentivized system offers
traceability, community curation, and the potential for continuous
improvement, assuming the right governance is in place.”
Lex Sokolin, partner at AI venture capital firm Generative
Ventures, said that while he does not think these results are hard
to replicate, “it does show that crypto projects can use
decentralized incentives to organize economically valuable
activity.”
High-quality AI training data: a scarce commodity
AI research firm Epoch AI estimates that human-generated text data
for AI training will be exhausted in 2028. The pressure is high
enough that investors are now mediating deals that give AI companies
rights to copyrighted materials.
Reports that increasingly scarce AI training data may limit growth
in the space have been circulating for years. While synthetic
(AI-generated) data is increasingly used with at least some degree
of success, human data is still largely viewed as the better
alternative: higher-quality data that leads to better AI models.
When it comes to images for AI training specifically, things are
becoming increasingly complicated, with artists sabotaging training
efforts on purpose. Nightshade, a tool meant to protect images from
being used for AI training without permission, allows users to
“poison” their images and severely degrade model performance.
Model performance per number of poisoned images. Source:
TowardsDataScience
Subramaniam said, “We’re entering an era where high-quality
image data will become increasingly scarce.” He also recognized
that this scarcity is made more dire by the increasing popularity
of image poisoning:
“With the rise of techniques like image cloaking and
adversarial watermarking to poison AI training, open-source
datasets face a dual challenge: quantity and trust.”
In this situation, Subramaniam said that verifiable and
community-sourced incentivized data sets are “more valuable than
ever.” According to him, such projects “can become not just
alternatives, but pillars of AI alignment and provenance in the
data economy.”