While llms.txt helps AI read the web and APIs help them connect, neither solves the infinite customization found in the economically important tasks in enterprise software. The real solution lies in computer-use agents that operate at the pixel level, learning from human demonstrations to navigate screens directly. This approach bypasses brittle connectors, allowing AI to handle complex workflows while humans remain in the loop for critical verification.

The Screen Is the API

"Why not just use llms.txt to understand the page?"

My friend was watching an AI agent work through a complex enterprise workflow. Clicking through menus, filling forms, handling the kind of nested configuration screens that were the definition of scope creep.

It was a reasonable question. Everyone is excited about llms.txt right now. A simple text file that tells AI systems what your website contains. Finally, the thinking goes, we have a standardized way for machines, or LLMs, to understand the web.

But my friend was confusing two very different problems. Reading is not doing.

The web did not become useful when machines learned to read it. It became useful when machines learned to act on it. Reading alone only gets us so far, and the focus now has to shift to doing.

Reading Isn’t the Hard Part

Let me be clear about what llms.txt actually does. It is a curated map for LLM inference. A structured way for language models to understand what exists on a website and where to find it. 

This is useful for bringing information to an LLM. But it is not a control mechanism. It does not let AI systems actually do anything. The gap between reading and acting is where the real work begins.

The Action Space

When people talk about AI automation, they usually mean APIs. Expose endpoints, let the AI call them, and you have automation. Simple.

Except it is not simple at all.

APIs expose only what developers choose to expose. They represent a curated subset of functionality that someone decided was worth the engineering effort to formalize. And in enterprise software, that subset is usually tiny compared to what users actually need to do.

Then came MCP, the Model Context Protocol. MCP tries to solve the connector problem. Instead of every AI system needing custom integrations with every application, you build one MCP connector and any MCP-compatible AI can use it.

This is an improvement. It solves the M×N problem where M AI systems need to integrate with N applications. But it assumes someone builds the connector in the first place.

Building these connectors is still hard. It requires understanding both the application and MCP. Most enterprise software will never get proper MCP support because the economics, I believe, are hard to justify.

Attempts to automate API-to-MCP conversion have become popular, but they mostly produce brittle, low-level tools. As Han Lee and others point out, REST APIs are designed around nouns (resources with GET/PUT/POST/DELETE), while MCP works best when tools are verbs (deleteRow, createTask). Auto-wrapping one into the other hides that mismatch instead of solving it.
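To make the noun/verb mismatch concrete, here is a minimal Python sketch. Everything in it is hypothetical: the in-memory `TASKS` store, the `rest_call` function, and the tool names stand in for a real REST API and a real MCP server, and no actual MCP SDK is used.

```python
# Hypothetical in-memory "REST API": resources are nouns, methods are generic.
TASKS = {1: {"title": "Draft PO", "status": "open"}}

def rest_call(method, path, body=None):
    """Toy stand-in for a noun-oriented REST endpoint (/tasks/<id>)."""
    task_id = int(path.rsplit("/", 1)[-1])
    if method == "GET":
        return TASKS[task_id]
    if method == "PUT":
        TASKS[task_id] = body
        return body
    raise ValueError(f"unsupported method: {method}")

# Auto-wrapped tool: one tool per endpoint. To "complete a task", the model
# has to know it must GET, change one field, then PUT the whole resource back.
def tool_put_task(task_id, body):
    return rest_call("PUT", f"/tasks/{task_id}", body)

# Hand-designed verb tool: the intent itself is the interface.
def complete_task(task_id):
    task = dict(rest_call("GET", f"/tasks/{task_id}"))
    task["status"] = "done"
    return rest_call("PUT", f"/tasks/{task_id}", task)

result = complete_task(1)
```

The auto-wrapped `tool_put_task` is technically complete, but it pushes the workflow knowledge onto the model. The verb tool carries that knowledge itself, which is the part someone still has to design by hand.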

The M×N×P Problem

There is a deeper issue that neither APIs nor MCP can address. Call it the P variable: interface diversity.

P represents the number of unique ways the same software can be configured. And in enterprise software, P grows to enormous scale.

Consider SAP. A single SAP S/4HANA server contains tens of thousands of customizing tables. Every implementation is different. Every organization has its own approval chains, its own business rules, its own custom ABAP developments.

Here is a concrete example. Take something supposedly simple: a purchase order approval workflow. In a real SAP implementation, this involves parallel approval processes with all-or-nothing requirements. Custom rules like auto-approve if a contract covers the full purchase order amount. Multi-level approval chains where limits are maintained in custom tables. Dynamic role assignment based on cost center responsibility.

None of this is standard.

The approval chain requires both the Department Manager and Finance Department to approve simultaneously. Either rejection kills the whole workflow. 

Then come the rules. If the purchase order references a contract and the totals match, auto-approve. Otherwise, check approval limits in a custom table. If the first approver lacks sufficient authority, cascade to the next level.

And the approvers themselves? Assigned dynamically. Sometimes it is the Manager of Workflow Initiator. Sometimes the Cost Center Responsible. Sometimes specific users pulled from yet another custom table.
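The rules above can be sketched in a few lines of Python. This is an illustration only, not SAP code: the `PurchaseOrder` fields, the `APPROVAL_LIMITS` table, and the approver names are all made up to mirror the prose.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PurchaseOrder:
    amount: float
    contract_amount: Optional[float] = None  # None if no contract is referenced

# Hypothetical stand-in for a custom limits table: (approver, limit),
# ordered from lowest to highest authority.
APPROVAL_LIMITS = [
    ("team_lead", 10_000),
    ("department_manager", 100_000),
    ("cfo", 1_000_000),
]

def route(po):
    """Return 'auto-approve' or the first approver with enough authority."""
    # Rule 1: a referenced contract whose total matches skips manual approval.
    if po.contract_amount is not None and po.contract_amount == po.amount:
        return "auto-approve"
    # Rule 2: cascade up the chain until an approver's limit covers the amount.
    for approver, limit in APPROVAL_LIMITS:
        if po.amount <= limit:
            return approver
    raise ValueError("no approver has sufficient authority")

def parallel_decision(manager_ok, finance_ok):
    """All-or-nothing: either rejection kills the whole workflow."""
    return manager_ok and finance_ok
```

Even this toy version has three places where an organization can diverge from the default: the contract rule, the limits table, and who counts as an approver. A real implementation adds dynamic role lookup on top of that.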

This is one workflow in one module. 

It takes consultants with deep domain knowledge to implement, because the out-of-the-box logic is too simple for how real organizations actually work.

This is the M×N×P problem. Even if you solved M×N with a perfect connector protocol (like MCP), you would still face the reality that every enterprise implementation is effectively a unique interface.
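A back-of-the-envelope count shows why P dominates. The function below is a deliberately crude model with made-up numbers; treating every distinct configuration as needing its own connector is a simplifying assumption, not a measured result.

```python
def integrations_needed(m, n, p=1, shared_protocol=False):
    """Rough count of integration surfaces to build and maintain.

    m: AI systems, n: applications, p: distinct configurations per application.
    With a shared protocol (MCP-style), each AI system implements the protocol
    once and each application configuration needs one connector: m + n * p.
    Without one, every pairing is custom: m * n * p.
    """
    if shared_protocol:
        return m + n * p
    return m * n * p

# Illustrative numbers only.
print(integrations_needed(5, 20))                             # custom integrations, one config each
print(integrations_needed(5, 20, shared_protocol=True))       # shared protocol, one config each
print(integrations_needed(5, 20, 50, shared_protocol=True))   # shared protocol, 50 configs each
```

Under these assumptions a shared protocol collapses the M×N term to M+N, but the N×P term survives untouched. That residue is the part no connector standard removes.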

Computer-Use as the Universal Layer

There is one interface that is universal: the screen.

Computer-use agents operate at the pixel level. They see what humans see. They click where humans click. They navigate the same menus and fill out the same forms.

This sounds crude compared to elegant API calls. But it has one massive advantage: it works with everything. No connector required. No API exposure decisions. No MCP adoption. If a human can do it, a computer-use agent can learn to do it.

The question is whether computer-use works well enough for production use. And here the research is early but encouraging.

The Demonstration Effect

The SCUBA benchmark tests AI agents on real Salesforce CRM workflows. In zero-shot settings, meaning no task-specific training, open-source models achieved less than 5% success rates. Even strong models that perform well on generic desktop benchmarks failed catastrophically when confronted with actual enterprise software.

But with demonstrations, meaning examples of humans completing the workflows, success rates jumped to 50%. At the same time, the agents' time and cost dropped by 13% and 16%, respectively.

General capability is not enough. You need specific training on specific workflows.

Data Efficiency

In my experience, collecting computer-use trajectories is painful. Domain experts rarely understand what actually challenges a model. The infrastructure stacks on top of brittle web environments. Building those environments is pure tedium. When every example costs this much, data efficiency stops being nice-to-have.

Which is why the PC Agent-E research matters. Trained on just 312 trajectories, the model achieved a 141% improvement over the base model.

312 examples. Not millions. Not even thousands. A few hundred carefully chosen demonstrations of the exact workflows.

The model outperformed Claude 3.7 Sonnet with extended thinking on the WindowsAgentArena benchmark. And it generalized well to different operating systems, suggesting the learned behaviors were not brittle.

The economics of enterprise AI automation are simple: you do not need massive datasets. You need the right datasets from the right workflows.

The Honest Trade-Off

Now for the uncomfortable part. Generalization is necessary but not sufficient for high-stakes operations.

The same research that shows promising results also reveals gaps. Some agents that perform well on generic benchmarks like OSWorld achieve less than 5% success on specialized enterprise environments. Despite advances, today's RL systems struggle to generalize beyond narrow training contexts.

The sim-to-real gap persists. An agent that performs flawlessly in simulation may fail in production due to unmodeled variables. 

For high-volume, repetitive workflows like expense approvals, CRM updates, and standard procurement, trained computer-use agents are approaching production readiness. The error rate is acceptable because any single mistake is recoverable.

For one-off, high-stakes operations like schema migrations, financial reconciliations, and compliance configurations, the calculus is different. A database configuration error can cost millions. A compliance failure can trigger regulatory action.

The honest answer is that computer-use can handle navigation and execution for these tasks, but humans must remain at verification checkpoints. The agent does the clicking. The human confirms the consequences.
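One way to picture a verification checkpoint is a loop that runs low-risk steps automatically and pauses on anything flagged as consequential. The sketch below is a hypothetical illustration: the `Step` type, the `high_stakes` flag, and the `approve` callback are assumptions, not part of any agent framework.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    description: str
    high_stakes: bool = False

def run_with_checkpoints(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Execute steps in order; stop at the first high-stakes step a human rejects."""
    log = []
    for step in steps:
        # The agent handles navigation on its own, but a flagged step
        # only proceeds if the human reviewer approves it.
        if step.high_stakes and not approve(step):
            log.append(f"halted before: {step.description}")
            break
        log.append(f"done: {step.description}")
    return log

plan = [
    Step("open migration screen"),
    Step("fill in target schema"),
    Step("apply schema migration", high_stakes=True),
]

# Here the reviewer rejects the final step, so the agent stops short of it.
log = run_with_checkpoints(plan, approve=lambda step: False)
```

In practice `approve` would block on a person reviewing the prepared screen. Deciding which steps deserve the `high_stakes` flag is the hard design choice, and it is organizational rather than technical.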

This is not a failure of the technology. It is appropriate risk management. And it still represents an enormous productivity gain. Navigating to the right screen, filling in the right fields, and preparing the right configurations is most of the work. Human verification at critical decision points is the remaining essential piece. At least for now.

Down The Middle: Agents and Humans

The path forward is neither pure automation nor pure human control. It is hybrid workflows, where computer-use agents handle the interface complexity while humans handle the judgment calls. Human-in-the-loop is already the norm for production AI agents.

This requires new infrastructure. You need training pipelines for enterprise-specific demonstrations. You need simulation environments that match production configurations. You need checkpoint mechanisms that pause for human review at appropriate moments. Companies like Applied Compute, Theta, Osmosis, and Scale AI are starting to build this infrastructure.

But the hard technical problem, making computers reliably operate arbitrary interfaces, is being solved. The remaining problems are organizational and economic. Those problems have a tendency to get solved when the benefits are large enough.

The best agents still fail on most real enterprise tasks. But a few years ago they could barely hit a single submit button. The screen is the only universal interface. That's where the work should go.
