Claritypoint AI
View All Result

Numeral raises $35M to automate sales tax with AI

By Chase
September 25, 2025

# Beyond Scaling: Why Data Quality is the New Frontier in AI

For the past several years, a simple mantra has dominated the development of large-scale AI: bigger is better. The scaling laws, empirically demonstrated and widely accepted, showed a clear correlation between model size, dataset size, and performance. This thinking gave us the leap from GPT-2 to GPT-3 and the subsequent explosion of generative AI. The race was on to train ever-larger models on ever-larger swaths of the internet. But as a field, we are now confronting the natural limits of this brute-force approach. The next significant leap forward won’t come from simply adding another trillion parameters; it will come from a fundamental shift in focus—from scale to substance.
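The scaling laws referenced above are often summarized in a parametric form, e.g. the Chinchilla-style fit L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is token count. A minimal sketch of that curve is below; the constants loosely follow the published Chinchilla fit, but treat them as illustrative rather than authoritative:

```python
import math

def chinchilla_loss(n_params: float, n_tokens: float,
                    e: float = 1.69, a: float = 406.4, b: float = 410.7,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss L(N, D) = E + A/N^alpha + B/D^beta.

    E is the irreducible loss; the other two terms shrink as the
    parameter count N and the token count D grow.
    """
    return e + a / n_params**alpha + b / n_tokens**beta

# Doubling an already-huge model buys far less than a 10x jump at
# small scale bought: the curve flattens.
small_gain = chinchilla_loss(1e10, 1e12) - chinchilla_loss(1e11, 1e12)
large_gain = chinchilla_loss(1e12, 1e12) - chinchilla_loss(2e12, 1e12)
```

Evaluating the two gaps makes the diminishing-returns argument concrete: the 10B→100B jump removes roughly an order of magnitude more loss than the 1T→2T jump does, at a fraction of the cost.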

---

### The Diminishing Returns of Brute Force

The “more is more” paradigm is hitting three critical walls: computational cost, data scarcity, and diminishing returns.

First, the cost of training state-of-the-art foundation models has become astronomical, running into the hundreds of millions of dollars for a single training run. This level of investment is sustainable for only a handful of hyperscale tech companies, creating a significant barrier to entry and stifling broader innovation. The associated energy consumption also raises serious environmental and ethical questions that we can no longer ignore.

Second, we are quite literally running out of high-quality data. Foundational models have already been trained on a significant portion of the publicly accessible internet. While there is still more text and imagery to be found, much of it is low-quality, repetitive, or toxic. Feeding a model more of this “data sludge” can actually degrade its performance, introducing noise, bias, and unpredictability. The well of easily accessible, high-quality human-generated data is not infinite, and we are approaching its bottom.


Finally, the performance gains from scaling are no longer as dramatic as they once were. While moving from a 10-billion to a 100-billion parameter model yielded transformative results, the improvements gained by going from 1 trillion to 2 trillion parameters are far less pronounced, especially when measured against the exponential increase in cost and complexity. The curve is flattening.

### The Pivot to Data-Centric AI and Architectural Efficiency

This is where the new frontier emerges. Instead of focusing solely on the model architecture, the most innovative research is now pivoting to a **data-centric** approach. The core idea is simple: a smaller model trained on a pristine, perfectly curated dataset can outperform a much larger model trained on a noisy, unfiltered one.

We’re seeing this play out in several key areas:

1. **Meticulous Data Curation:** The process is shifting from data *hoarding* to data *refining*. This involves sophisticated filtering pipelines to remove duplicates, toxic content, and personally identifiable information. It also means actively selecting for data that exhibits complex reasoning, diverse perspectives, and factual accuracy. The success of models like Microsoft’s Phi series, which achieve remarkable performance with relatively few parameters by training on “textbook-quality” data, is a testament to this approach. They proved that quality, not just quantity, is a primary driver of capability.

2. **The Rise of Synthetic Data:** Perhaps the most exciting development is the use of highly capable models to generate synthetic training data for the next generation. A state-of-the-art model can be prompted to create millions of high-quality, structured examples of reasoning, coding, or instruction-following. This creates a powerful self-improvement loop, or “distillation” process, where the knowledge of a massive, expensive model can be transferred to a smaller, more efficient one. This allows us to create specialized models that are both powerful and economical to run.

3. **Smarter, Not Just Bigger, Architectures:** Alongside the data-centric shift, architectural innovations are enabling greater efficiency. **Mixture-of-Experts (MoE)** models are a prime example. Instead of activating an entire massive network for every single token, an MoE model routes each token to a small subset of “expert” sub-networks. This means that while the model may have a huge number of total parameters, the computational cost for inference is dramatically lower. It’s a move from a monolithic brain to a specialized committee, bringing massive performance at a fraction of the operational cost.
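The filtering pipelines described in point 1 can be sketched with standard-library tools alone. The thresholds below are arbitrary illustrations; production pipelines add fuzzy deduplication (e.g. MinHash), toxicity classifiers, and PII scrubbing on top of this basic shape:

```python
import hashlib
import re

def curate(docs):
    """Toy curation pass: exact dedup plus two cheap quality filters."""
    seen = set()
    kept = []
    for doc in docs:
        # Normalize whitespace and case so trivial variants hash identically.
        norm = re.sub(r"\s+", " ", doc.strip().lower())
        digest = hashlib.sha256(norm.encode()).hexdigest()
        if digest in seen:          # exact duplicate
            continue
        seen.add(digest)
        words = norm.split()
        if len(words) < 5:          # too short to carry signal
            continue
        if len(set(words)) / len(words) < 0.3:  # highly repetitive
            continue
        kept.append(doc)
    return kept

docs = [
    "The quick brown fox jumps over the lazy dog.",
    "The  QUICK brown fox jumps over the lazy dog.",  # dup after normalization
    "spam spam spam spam spam spam",                  # repetitive
    "too short",                                      # below length floor
]
clean = curate(docs)
```

Only the first document survives; the rest are caught by the dedup, repetition, and length checks respectively.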
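The "distillation" loop in point 2 is typically trained by minimizing a KL divergence between temperature-softened teacher and student output distributions (the classic soft-label formulation). A minimal plain-Python sketch, which in real training would be added to the ordinary cross-entropy loss:

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    A higher temperature exposes the teacher's "dark knowledge":
    the relative probabilities it assigns to non-argmax answers.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the student matches the teacher exactly the loss is zero; any mismatch yields a positive penalty, pushing the small model toward the large model's full output distribution rather than just its top answer.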
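The sparse routing in point 3 can be sketched as top-k gating: score every expert with a linear gate, but execute only the k best. This is a toy illustration assuming callable experts and hand-picked gate vectors; real MoE layers add load-balancing losses and per-expert capacity limits:

```python
import math

def route(token_embedding, gate_weights, experts, top_k=2):
    """Sparse MoE routing: score all experts, run only the top-k.

    gate_weights: one weight vector per expert (a linear gate);
    experts: callables standing in for small feed-forward nets.
    Per-token compute stays flat even as total expert (and hence
    parameter) count grows, because only top_k experts execute.
    """
    # Gate score: dot product of the token with each expert's gate vector.
    scores = [sum(w * x for w, x in zip(wv, token_embedding))
              for wv in gate_weights]
    # Select the top-k experts and renormalize their scores via softmax.
    ranked = sorted(range(len(scores)),
                    key=lambda i: scores[i], reverse=True)[:top_k]
    m = max(scores[i] for i in ranked)
    exps = [math.exp(scores[i] - m) for i in ranked]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted combination of just the selected experts' outputs.
    outputs = [experts[i](token_embedding) for i in ranked]
    return [sum(w * o[d] for w, o in zip(weights, outputs))
            for d in range(len(outputs[0]))]

# Four toy experts: expert i simply scales the token by (i + 1).
experts = [lambda x, c=i + 1: [c * v for v in x] for i in range(4)]
gates = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [0.0, 1.0]]
out = route([1.0, 0.0], gates, experts, top_k=2)
```

Here the gate scores are [1, 2, 3, 0], so only experts 2 and 1 run; the fourth expert's parameters exist but cost nothing for this token, which is exactly the "specialized committee" economics described above.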

---

### A More Sustainable and Capable Future

The era of scaling is not over, but its dominance is waning. The future of AI development is more nuanced and, frankly, more interesting. It’s about surgical precision, not blunt force. By focusing on data quality, harnessing the power of synthetic generation, and building more efficient architectures, we are paving the way for a new generation of AI systems. These models will not only be more capable and reliable, but also more accessible, specialized, and sustainable, marking the maturation of our field from an age of explosive growth to one of refined engineering.

This post is based on the original article at https://techcrunch.com/2025/09/18/numeral-raises-35m-to-automate-sales-tax-with-ai/.
