
The GitHub Clone Farm That Beat VirusTotal

A 10,000-repo Trojan campaign weaponizes GitHub's trust signals, search ranking and commit history — and the next mark is your AI agent.

Ji-ho Choi
Security & Cloud Editor · Jun 19, 2026 · 7 min read

For a decade the supply-chain conversation has fixated on the package registry: typosquatted npm names, malicious PyPI wheels, compromised maintainer accounts pushing a poisoned minor version. A campaign surfaced in mid-June by the independent researcher behind the blog Orchid Files — and corroborated by reporting from Cybernews and TechTimes — points at a different, less-watched seam: the repository itself as a trust object. Not the code you npm install, but the code you find.

The claim is large — roughly 10,000 GitHub repositories quietly distributing Trojans, some for over a year. That headline figure rests on a single researcher's detection tool and should be read with that caveat; the technique, however, is independently established. Cybersecurity firm Hexastrike documented the same payload family across 109 repositories back in April, and Malwarebytes separately reported fake software on GitHub and SourceForge pushing remote-access malware in May, noting that takedowns barely dent it: "users should expect new ones to continue appearing." The thesis worth taking seriously is that this is a genuine tactical shift — an attack engineered against the signals developers and their tooling use to judge trust, not against the dependency graph.

The evasion is the product

What makes this campaign notable is not the malware but the delivery architecture, which is built to defeat three different detection layers at once.

Trust-signal cloning. Each malicious repo is a full clone of a real project — complete commit history, the original contributor list, an intact README. The only injected change is a link to a downloadable ZIP. To a human skimming a repo, every heuristic reads green: a multi-year timeline, contributors whose accounts link back to legitimate long-standing profiles, code that matches the project they were searching for. The social proof is real; it was just stolen.
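
Stolen history does leave one seam: a repository pushed last month can still show commits from years ago. A minimal sketch of that check, assuming you have already collected the repository's creation date and the date of its oldest commit (for example from repository metadata or a local clone). The `RepoFacts` record, the `history_predates_repo` helper and the 30-day slack are hypothetical, and because migrations from other forges look the same, a hit is a prompt to find the canonical upstream rather than a verdict.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class RepoFacts:
    """Hypothetical record; fill it from repository metadata or a local clone."""
    full_name: str
    created_at: datetime        # when the GitHub repository was created
    oldest_commit_at: datetime  # date on the earliest commit in its history
    is_fork: bool               # GitHub forks legitimately carry upstream history


def history_predates_repo(facts: RepoFacts, slack: timedelta = timedelta(days=30)) -> bool:
    """Flag a non-fork repo whose history starts well before the repo existed.

    A project cloned and pushed to a new account keeps its original commit
    dates, so its history can begin years before the repository was created.
    Migrations from other forges look the same, and commit dates can be
    forged, so a hit means "find the canonical upstream", not "malicious".
    """
    if facts.is_fork:
        return False
    return facts.created_at - facts.oldest_commit_at > slack


suspect = RepoFacts(
    full_name="example-user/obscure-cli",  # hypothetical name
    created_at=datetime(2026, 5, 1, tzinfo=timezone.utc),
    oldest_commit_at=datetime(2022, 3, 14, tzinfo=timezone.utc),
    is_fork=False,
)
print(history_predates_repo(suspect))  # True
```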

Delivery/payload separation. Submit the ZIP's download URL to VirusTotal and you get zero detections. Upload the ZIP itself and the Trojan lights up. The attackers deliberately decoupled the link from the binary, so any scanner that vets URLs, including the ones embedded in CI pipelines and chatbots, sees nothing. The archive carries four files: a .cmd launcher (Application.cmd or Launcher.cmd), an executable (loader.exe, luajit.exe, or a randomized name), a randomly named data blob, and lua51.dll.
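
The practical answer to the URL/file split is to pull the archive into a sandbox and work on the bytes. A minimal sketch, assuming the archive is already downloaded: it hashes the bytes (the identifier a file scanner actually works from) and compares the file list with the four-file layout described above. `sha256_of` and `bundle_indicators` are hypothetical helpers, not part of any scanner's API, and the indicator list will go stale as soon as the operators rename files.

```python
import hashlib
import io
import zipfile
from pathlib import PurePosixPath

# Indicators copied from the reported four-file layout. Names are easy to
# change (the executable is sometimes randomized), so this list will go stale.
LAUNCHER_SUFFIXES = {".cmd", ".bat"}
EXECUTABLE_SUFFIXES = {".exe"}
LUA_RUNTIME = "lua51.dll"


def sha256_of(data: bytes) -> str:
    """Hash the archive bytes: the identifier a file scanner works from."""
    return hashlib.sha256(data).hexdigest()


def bundle_indicators(data: bytes) -> list[str]:
    """List reasons an archive resembles the reported launcher bundle.

    Only the archive's directory is read; nothing is extracted or executed.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = [PurePosixPath(n).name.lower() for n in archive.namelist()]
    suffixes = {PurePosixPath(n).suffix for n in names}
    reasons = []
    if suffixes & LAUNCHER_SUFFIXES:
        reasons.append("contains a .cmd/.bat launcher")
    if suffixes & EXECUTABLE_SUFFIXES:
        reasons.append("contains a Windows executable")
    if LUA_RUNTIME in names:
        reasons.append("ships lua51.dll")
    return reasons


# Demo with an in-memory stand-in for the reported layout (placeholder bytes).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as demo:
    for name in ("Launcher.cmd", "luajit.exe", "k3v9qz", "lua51.dll"):
        demo.writestr(name, b"placeholder")
sample = buf.getvalue()
print(sha256_of(sample))  # 64 hex characters
print(bundle_indicators(sample))
# ['contains a .cmd/.bat launcher', 'contains a Windows executable', 'ships lua51.dll']
```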

Anomaly-detection laundering. This is the cleverest piece. Every few hours the repo deletes its latest commit and re-pushes an identical one, always titled Update README.md. The working theory from Orchid Files is that GitHub's automated security tooling flags behavior that newly turns suspicious, so a repo that churns the same benign-looking commit indefinitely looks like routine maintenance rather than an intrusion. A long-running, low-amplitude anomaly beats a short, sharp one. The repos that have survived longest are the ones that never spiked.
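
The delete-and-re-push rhythm is visible from outside if you poll a branch tip over time. Assuming each re-push creates a fresh commit object (new timestamp, therefore new SHA) with unchanged content, the tip's commit SHA keeps changing while its tree SHA and message stay the same; a genuine README edit changes the tree. The sketch counts that pattern in snapshots you collect yourself. `HeadSnapshot`, the threshold of three and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class HeadSnapshot:
    """Hypothetical record from polling one branch tip (e.g. the default branch)."""
    seen_at: datetime
    commit_sha: str
    tree_sha: str
    message: str


def count_identical_repushes(snapshots: list[HeadSnapshot]) -> int:
    """Count tip changes that swap the commit but keep tree and message.

    A real edit changes the tree. Deleting the latest commit and pushing an
    identical one yields a new commit SHA over the same tree and message.
    """
    ordered = sorted(snapshots, key=lambda s: s.seen_at)
    return sum(
        1
        for prev, cur in zip(ordered, ordered[1:])
        if cur.commit_sha != prev.commit_sha
        and cur.tree_sha == prev.tree_sha
        and cur.message == prev.message
    )


def looks_like_laundering(snapshots: list[HeadSnapshot], threshold: int = 3) -> bool:
    """Hypothetical threshold: several identical re-pushes in one observation window."""
    return count_identical_repushes(snapshots) >= threshold


base = datetime(2026, 6, 1)
snaps = [
    HeadSnapshot(base.replace(hour=h), f"commit-{h}", "tree-1", "Update README.md")
    for h in (0, 4, 8, 12)
]
print(count_identical_repushes(snaps))  # 3
print(looks_like_laundering(snaps))     # True
```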

Layered on top is a distribution strategy that inverts the usual instinct. The clones target newly created, low-traffic projects rather than popular ones. That keeps them out of the original community's line of sight while letting them rank near the top of search results for low-competition queries — exactly the long-tail searches a developer runs when hunting for an obscure CLI or framework.

A payload built to outlive takedowns

Hexastrike's April analysis is the most reliable view of what runs after a victim double-clicks the launcher. The .cmd kicks off a LuaJIT interpreter loading an obfuscated Lua script the firm names SmartLoader. Rather than hardcoding a command-and-control IP or domain — the thing defenders block and sinkhole — SmartLoader resolves its live infrastructure through a smart contract on the Polygon blockchain. The operator rotates servers by updating on-chain state; no sample needs rebuilding, no deployed loader needs updating, and there is no DNS record to seize.

```mermaid
flowchart LR
  A[ZIP from README link] --> B[.cmd launcher]
  B --> C[LuaJIT runs SmartLoader]
  C --> D[Read C2 address from Polygon smart contract]
  D --> E[Fingerprint host + screenshot]
  E --> F[Exfiltrate data]
  F --> G[StealC infostealer + 2 scheduled tasks]
```

Once connected, SmartLoader fingerprints the host, grabs a screenshot, exfiltrates what it collects, and persists via two scheduled tasks with separate recovery paths. The same staging repo also carried an encrypted build of StealC, a commercial infostealer sold on criminal markets. Blockchain-resolved C2 is not new in isolation, but pairing it with cloned-repo SEO and anomaly-laundering produces something with no single chokepoint: you can't block the URL (clean on VirusTotal), can't easily flag the repo (looks maintained), and can't kill the C2 (it's on-chain).
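
On a Windows host you suspect was hit, the two scheduled tasks are a concrete place to look. This sketch parses the verbose CSV listing from the built-in `schtasks /query /fo CSV /v` command and flags tasks whose action mentions a file name reported for this bundle. It assumes English-locale column headers (`TaskName`, `Task To Run`), which differ on localized Windows builds, and the parsing function is kept separate from the `schtasks` call so it can be tried against saved output. `suspicious_tasks` and `BUNDLE_NAMES` are hypothetical.

```python
import csv
import io
import subprocess
import sys

# File names reported for this bundle. The executable name is sometimes
# randomized, so a clean result here does not clear the machine.
BUNDLE_NAMES = ("luajit.exe", "loader.exe", "lua51.dll", "launcher.cmd", "application.cmd")


def suspicious_tasks(csv_text: str) -> list[tuple[str, str]]:
    """Return (task name, action) pairs whose action mentions a bundle file.

    Expects `schtasks /query /fo CSV /v` output with English column names.
    """
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row.get("TaskName") or ""
        action = row.get("Task To Run") or ""
        if name == "TaskName":
            continue  # skip header rows that the listing may repeat
        if any(bundle in action.lower() for bundle in BUNDLE_NAMES):
            hits.append((name, action))
    return hits


if __name__ == "__main__" and sys.platform == "win32":
    listing = subprocess.run(
        ["schtasks", "/query", "/fo", "CSV", "/v"],
        capture_output=True, text=True, check=True,
    ).stdout
    for task, action in suspicious_tasks(listing):
        print(f"{task}: {action}")
```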

The real target is your agent, not you

The most forward-looking framing comes from Cybernews, which reads the campaign as aimed at AI agents. That's the part developers should sit with. A careful human might pause before running Launcher.cmd from a ZIP linked in a README. An autonomous coding agent told to "find a library that does X and get it working" will happily search, pick the top-ranked repo, follow the README's setup steps, and execute what it's told to — which is precisely the behavior this campaign is optimized to capture.

Every property of the attack maps onto agent weaknesses:

  • SEO on long-tail queries lands the clone at the top of exactly the searches an agent runs for unusual tooling.
  • Intact history and contributors satisfy the shallow trust checks an agent (or a code-review bot) is likely to perform.
  • README-driven setup is the agent's natural interface; "download this ZIP and run the launcher" is an instruction, not a red flag, to a tool that treats READMEs as gospel.
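
For an agent (or a human) reading setup instructions, two cheap red flags follow from the pattern above: an archive link that does not point at GitHub Releases, and an instruction to run a `.cmd`, `.bat` or `.exe`. A rough sketch of that screen using only regular expressions; `readme_red_flags` and its patterns are hypothetical, it will also flag GitHub's own source-archive links, and a determined attacker can phrase around it, so its output is a reason to stop and check, not a clean bill of health.

```python
import re
from urllib.parse import urlparse

# Hypothetical heuristics; tune them for your own environment.
LINK_RE = re.compile(r"https?://[^\s)\"'<>]+")
ARCHIVE_RE = re.compile(r"\.(?:zip|rar|7z)$", re.IGNORECASE)
BINARY_RE = re.compile(r"\b[\w.-]+\.(?:cmd|bat|exe)\b", re.IGNORECASE)


def is_release_asset(url: str) -> bool:
    """True for github.com/<owner>/<repo>/releases/download/... URLs."""
    parsed = urlparse(url)
    parts = parsed.path.strip("/").split("/")
    return parsed.netloc == "github.com" and parts[2:4] == ["releases", "download"]


def readme_red_flags(readme: str) -> list[str]:
    """List setup instructions that match the campaign's README pattern."""
    flags = []
    for url in LINK_RE.findall(readme):
        url = url.rstrip(".,;:")  # drop sentence punctuation after a bare URL
        if ARCHIVE_RE.search(urlparse(url).path) and not is_release_asset(url):
            flags.append(f"archive link outside GitHub Releases: {url}")
    # dict.fromkeys drops duplicate mentions while keeping first-seen order.
    for name in dict.fromkeys(m.group(0) for m in BINARY_RE.finditer(readme)):
        flags.append(f"mentions a launcher or executable: {name}")
    return flags


readme = (
    "## Setup\n"
    "Download [the latest build](https://files.example.net/tool-v2.zip),\n"
    "unzip it and run Launcher.cmd.\n"
    "Signed builds: https://github.com/example-org/tool/releases/download/v1.0/tool.zip\n"
)
for flag in readme_red_flags(readme):
    print(flag)
# archive link outside GitHub Releases: https://files.example.net/tool-v2.zip
# mentions a launcher or executable: Launcher.cmd
```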

If the agent-targeting read pans out, and the design strongly supports it, this is an early instance of malware tuned for a machine reader rather than a human one. It is worth watching closely as agentic development workflows go mainstream over the next year.

What to actually do

None of this arrives through your package manager, so registry pinning and lockfiles don't help. The mitigations are old-fashioned and behavioral:

  • Never execute binaries linked from a README. A legitimate library does not ask you to download a .cmd and an .exe from a ZIP. Releases belong in GitHub Releases or a package registry, not in a README link.
  • Scan the file, not the URL. The whole campaign hinges on URL scanners returning clean. Pull the artifact into a sandbox and scan the bytes.
  • Treat search rank as zero evidence. Reach packages through the registry or a known-canonical URL, not through a search for the project name — the top result is the attacker's preferred slot.
  • Verify identity, not history. Commit history and contributor lists are copyable. Check that the repo is the canonical one (stars-over-time, the maintainer's own links, the project's documented home), not a fresh account that merely contains a real history.
  • Sandbox your agents. If a coding agent can fetch and run arbitrary code, assume it eventually will. Containerize execution, strip outbound network by default, and require human approval before running downloaded binaries.
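
The last recommendation can also be enforced in the agent harness itself, on top of container and network isolation. A minimal sketch of an approval gate for an agent's shell tool, assuming your harness records which paths the agent downloaded during the session: any command that touches one of those paths, names a Windows binary or script, or cannot be parsed is held for a human. `needs_human_approval` and the suffix list are hypothetical and deliberately conservative; they are not the API of any particular agent framework.

```python
import shlex
from pathlib import PureWindowsPath

# Hypothetical policy: suffixes that always need a human before execution.
RISKY_SUFFIXES = {".exe", ".cmd", ".bat", ".ps1", ".dll"}


def needs_human_approval(command: str, downloaded: set[str]) -> bool:
    """Decide whether an agent-proposed command must wait for a human.

    `downloaded` holds paths the agent fetched this session. Non-POSIX
    splitting keeps Windows backslashes intact; quotes are stripped per token.
    """
    try:
        argv = shlex.split(command, posix=False)
    except ValueError:
        return True  # unparseable (e.g. unbalanced quotes): never auto-approve
    for token in argv:
        token = token.strip("\"'")
        if token in downloaded:
            return True
        if PureWindowsPath(token).suffix.lower() in RISKY_SUFFIXES:
            return True
    return False


print(needs_human_approval("pip install requests", set()))                    # False
print(needs_human_approval("Launcher.cmd", set()))                            # True
print(needs_human_approval("python tool/main.py", {"tool/main.py"}))          # True
```

Suffix matching is a blunt instrument by design: it errs toward interrupting a human, which is the cheaper failure when the alternative is an agent double-clicking a launcher.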

The Orchid Files researcher published an open-source detector, Git Malware Finder, alongside the repo list — a reasonable move given GitHub Support reportedly took roughly six weeks to act on the initial report and a community thread returned only AI-generated non-answers.

The uncomfortable takeaway is structural. GitHub's value rests on the visible signals of legitimacy — stars, history, contributors, search presence — and this campaign turns each of them into an attack primitive while staying under the platform's own anomaly thresholds. The 10,000 figure may be a single researcher's count, but the design is sound enough to assume it's a floor, not a ceiling. The defensive posture that follows is simple to state and hard to enforce: stop trusting how a repository looks, and start trusting only where it provably came from.

Sources & further reading

  1. I found 10k GitHub repositories distributing Trojan malware — orchidfiles.com
  2. GitHub Malicious Repositories: 10,000 Trojan Clones Evade Detection for Over a Year — techtimes.com
Written by Ji-ho Choi · Security & Cloud Editor

Ji-ho covers the increasingly tangled overlap between cloud architecture and security, drawing on a background as a penetration tester to keep his reporting grounded in real-world attack paths. He never lets a vendor claim go unquestioned and insists that every buzzword come with a proof of concept.

Discussion

Kat Sorensen @contrarian_kat · 49 minutes ago

i'm curious to see how github responds to this, given that their search ranking algorithm is basically being gamed by these trojan repositories - will they finally prioritize repo verification over ease of discovery?
