Executive Snapshot
- Harness/eval is becoming a mandatory layer: 40 HN/dev-web + 70 GitHub signals center on coding-agent reliability → NEXA needs an eval gate before rollout.
- CLI/IDE agents are separating into a runtime layer: 70 recently updated repos show the market shifting from autocomplete to task execution → FARE needs a context contract.
- Social signal is sufficient but not clean: X reached 30/30 via fallback; YouTube 15/15; Reddit 0 due to 403 → confidence -18%.
- Paper/model layer is weak today: arXiv returned only 5 usable entries due to API/query sparsity → trial decisions lean more on GitHub/product evidence.
- Fabbi action: 3 two-week experiments, expected ROI 12-28%, risk 2-4/5, clear owners.
CTO Evaluation Matrix
| Signal | Evidence | Counter-signal | Fabbi implication | Decision |
|---|---|---|---|---|
| Agent harness is the control plane | 70 GitHub + 40 HN items | Reddit blocked; paper count low | NEXA/SYNCA need an internal benchmark | trial 82% |
| Context engineering determines quality | X 30 search/KOL links; GitHub repos on code agents | Engagement N/A due to public fallback | FARE = codebase memory + retrieval eval | adopt 78% |
| Enterprise readiness is still risky | Open issues/stars N/A per repo table; skeptical HN comments | No customer metrics | SYNCA: govern HITL, audit, sandbox | watch 66% |
Trend Radar
- P0 Agent eval harness: hot now, 2-week test.
- P0 Repo/context map for FARE: hot now.
- P1 CLI sandbox policy: emerging.
- P1 YouTube workflow tutorials: watch, metrics N/A.
- Noise: generic “AI coding replaces dev” claims; ignore.
KOL/OG Feed Watch
| Platform | Author/channel | Timestamp | Engagement | URL | Why it matters |
|---|---|---|---|---|---|
| X/public-web | X search/KOL public | N/A | N/A blocked | coding agent KOL/search signal 1 | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| X/public-web | X search/KOL public | N/A | N/A blocked | coding agent KOL/search signal 2 | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| X/public-web | X search/KOL public | N/A | N/A blocked | coding agent KOL/search signal 3 | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| X/public-web | X search/KOL public | N/A | N/A blocked | agentic programming KOL/search signal 1 | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| YouTube | YouTube search | N/A | N/A API unavailable | Claude Code coding agent video signal 1 | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| YouTube | YouTube search | N/A | N/A API unavailable | Claude Code coding agent video signal 2 | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| YouTube | YouTube search | N/A | N/A API unavailable | Claude Code coding agent video signal 3 | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| HN | aanet | 2026-05-28T22:46:14Z | 1 pts/0 c | Clawd-on-Desk: a pixel desktop pet watching your AI coding agents | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| HN | SVI | 2026-05-28T21:03:24Z | 7 pts/1 c | Protestware for Coding Agents | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| HN | akashi_dev | 2026-05-28T20:44:37Z | 2 pts/0 c | Show HN: Rig – Local-first code graph for coding agents, in one npx command | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| GitHub | CoWork-OS | 2026-05-28T23:05:48Z | 330 stars | CoWork-OS/CoWork-OS | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| GitHub | nithisurender05 | 2026-05-28T23:05:41Z | 0 stars | nithisurender05/AgenticEduMCP | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
| GitHub | shuntaka9576 | 2026-05-28T23:05:37Z | 21 stars | shuntaka9576/agentoast | CTO signal on adoption/reliability/coding workflow; metrics missing when the API is blocked. |
Repo Watch
| Repo | Stars | Updated | Move |
|---|---|---|---|
| CoWork-OS/CoWork-OS | 330 stars | 2026-05-28T23:05:48Z | Trial if it fits NEXA/SYNCA |
| nithisurender05/AgenticEduMCP | 0 stars | 2026-05-28T23:05:41Z | Trial if it fits NEXA/SYNCA |
| shuntaka9576/agentoast | 21 stars | 2026-05-28T23:05:37Z | Trial if it fits NEXA/SYNCA |
| paultuanakotta/pi-slack-codex | 0 stars | 2026-05-28T23:05:30Z | Trial if it fits NEXA/SYNCA |
| realorange1994/mini-claude-go | 0 stars | 2026-05-28T23:05:26Z | Trial if it fits NEXA/SYNCA |
| jhanva/ai-skills | 0 stars | 2026-05-28T23:05:24Z | Trial if it fits NEXA/SYNCA |
| Gazi-AI/GCode | 0 stars | 2026-05-28T23:05:03Z | Trial if it fits NEXA/SYNCA |
| genkovich/sdd | 0 stars | 2026-05-28T23:05:03Z | Trial if it fits NEXA/SYNCA |
| TechMatrix-labs/pythinker-code | 0 stars | 2026-05-28T23:04:52Z | Trial if it fits NEXA/SYNCA |
| langchain-ai/open-swe | 9871 stars | 2026-05-28T23:04:44Z | Trial if it fits NEXA/SYNCA |
Paper / Benchmark / Product Watch
- 1 pts/0 c · aanet
- 7 pts/1 c · SVI
- 2 pts/0 c · akashi_dev
- 2 pts/0 c · nkko
- 3 pts/0 c · juanre
- 3 pts/0 c · vbutsomesayw
Benchmark focus: SWE-bench/Terminal-Bench-style task completion should become the internal acceptance gate. Product watch covered Claude Code, Codex, Cursor, and Devin/OpenCode/Gemini CLI via the query layer; direct changelog metrics were N/A in this run.
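The benchmark-focus note proposes SWE-bench/Terminal-Bench-style task completion as the internal acceptance gate, and the NEXA recommendation validates its harness with pass@1, review defects, and cycle time. Below is a minimal Python sketch of such a gate. It assumes one attempt per task and treats a task as resolved when its tests pass. `TaskResult`, `acceptance_gate`, and the 0.6 / 1.0 thresholds are hypothetical placeholders, not values defined in this report.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class TaskResult:
    """One agent attempt on one internal task (hypothetical schema)."""
    task_id: str
    tests_passed: bool    # SWE-bench-style: resolved iff the task's tests pass
    review_defects: int   # defects found when a human reviews the patch
    cycle_minutes: float  # wall-clock time from task start to finished patch


def pass_at_1(results: list[TaskResult]) -> float:
    """Fraction of tasks resolved with a single attempt per task."""
    if not results:
        return 0.0
    return sum(r.tests_passed for r in results) / len(results)


def acceptance_gate(results: list[TaskResult], min_pass: float = 0.6,
                    max_defects_per_task: float = 1.0) -> dict:
    """Compute the three validation metrics plus a go/no-go flag.

    The default thresholds are placeholders, not values from the report.
    """
    if not results:
        return {"pass@1": 0.0, "defects_per_task": 0.0,
                "median_cycle_minutes": 0.0, "go": False}
    p1 = pass_at_1(results)
    defects = sum(r.review_defects for r in results) / len(results)
    return {
        "pass@1": p1,
        "defects_per_task": defects,
        "median_cycle_minutes": median(r.cycle_minutes for r in results),
        "go": p1 >= min_pass and defects <= max_defects_per_task,
    }
```

Consider 5 tasks where 3 pass, with 4 review defects in total and cycle times of 12, 30, 18, 9, and 25 minutes. That run gives pass@1 = 0.6, 0.8 defects per task, and an 18-minute median cycle time, so it clears the placeholder gate. If several samples are drawn per task, pass@1 would need to be averaged per task instead.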
Impact Coverage
| Domain | Now 0-2w | Next 1-2m | Later 3-6m | Decision |
|---|---|---|---|---|
| FARE | Build 50-file context eval | Repo graph memory | Team knowledge agent | adopt |
| NEXA | 20-task coding harness | CLI sandbox | multi-agent workflow | trial |
| SYNCA | Risk checklist 5 gates | audit log | governance console | trial |
| DOMUS | Monitor only | proposal automation | ops agent | watch |
| Japan/VN/Global | Pitch 12-20% dev-cycle savings | case study | managed AI-SDLC offer | trial |
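The FARE row starts with a 50-file context eval, and the FARE recommendation tracks retrieval hit-rate and hallucination count. The Python sketch below computes both metrics under stated assumptions. Each eval case pairs a question with the gold files that answer it. The context pack returns a ranked list of paths. A "hallucination" is approximated as a cited path that does not exist in the repo. The `ContextCase` format, the k=5 default, and this proxy are all hypothetical choices.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ContextCase:
    """One context-eval question for a pilot repo (hypothetical format)."""
    question: str
    gold_files: set[str]   # files a reviewer marked as needed for the answer
    retrieved: list[str]   # ranked paths returned by the context pack/retriever
    cited: list[str]       # paths the agent referenced in its answer


def hit_rate_at_k(cases: list[ContextCase], k: int = 5) -> float:
    """Share of cases with at least one gold file in the top-k retrieved paths."""
    if not cases:
        return 0.0
    hits = sum(1 for c in cases if c.gold_files & set(c.retrieved[:k]))
    return hits / len(cases)


def hallucination_count(cases: list[ContextCase], repo_files: set[str]) -> int:
    """Count cited paths that do not exist in the repo (a rough proxy only)."""
    return sum(1 for c in cases for path in c.cited if path not in repo_files)
```

Take two cases. In the first, `src/auth.py` is retrieved in second place and cited. In the second, the only gold file `src/db.py` is never retrieved and the agent cites a nonexistent `src/session_store.py`. This gives a hit-rate@5 of 0.5 and a hallucination count of 1. A real eval would likely also want to catch wrong-but-existing citations, which this proxy does not cover.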
CTO Recommendations
| Action | ROI/time-saving | Risk | Owner | TTV | Validation |
|---|---|---|---|---|---|
| NEXA: build a 20-task internal mini SWE-bench harness | 18-28% | 3/5 | Head of Engineering | 10 days | pass@1, review defects, cycle time |
| FARE: standardize context packs for 3 pilot repos | 12-22% | 2/5 | AI Platform Lead | 7 days | retrieval hit-rate, hallucination count |
| SYNCA: add a 5-gate HITL/sandbox policy for coding agents | 8-15% | 4/5 | Security/QA Lead | 14 days | blocked unsafe actions, audit completeness |
| Market: package an “AI-SDLC readiness assessment” for JP/VN | 10-18% | 2/5 | Delivery Director | 21 days | 2 pilot proposals, conversion rate |
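The SYNCA recommendation calls for a 5-gate HITL/sandbox policy validated by blocked unsafe actions and audit completeness, but the report does not name the five gates. The Python sketch below picks five illustrative gates: secrets, workspace, network, command allow-list, and human approval. The secrets gate is loosely motivated by the "Coding agent can read your .env file" HN item in the appendix. All names, patterns, and allow-lists (`Policy`, `SECRET_GLOBS`, `ALLOWED_BINARIES`, `NEEDS_HUMAN`) are hypothetical. Every decision is written to an audit log so that blocked and pending actions can be counted.

```python
from __future__ import annotations

import fnmatch
import posixpath
import shlex
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical gate settings; the report does not define the five gates.
SECRET_GLOBS = (".env", ".env.*", "*.pem", "id_rsa*")
ALLOWED_BINARIES = {"git", "ls", "cat", "pytest", "python"}
NEEDS_HUMAN = {("git", "push"), ("git", "reset")}


@dataclass
class Action:
    kind: str                          # "read" | "write" | "exec" | "net"
    target: str                        # file path, shell command, or URL
    approved_by: Optional[str] = None  # set once a human approves (HITL)


@dataclass
class Policy:
    workspace: PurePosixPath           # absolute sandbox root, e.g. /ws
    audit_log: list[dict] = field(default_factory=list)

    def check(self, action: Action) -> str:
        verdict, gate = self._evaluate(action)
        # Log every decision, allowed or not, so audit completeness is measurable.
        self.audit_log.append({"kind": action.kind, "target": action.target,
                               "verdict": verdict, "gate": gate})
        return verdict

    def _inside_workspace(self, path: str) -> bool:
        # Join relative paths onto the root and collapse ".." before comparing.
        norm = PurePosixPath(posixpath.normpath(str(self.workspace / path)))
        return norm == self.workspace or self.workspace in norm.parents

    def _evaluate(self, action: Action) -> tuple[str, str]:
        if action.kind in ("read", "write"):
            name = PurePosixPath(action.target).name
            # Gate 1: secret files such as .env are never exposed to the agent.
            if any(fnmatch.fnmatch(name, g) for g in SECRET_GLOBS):
                return "block", "secrets"
            # Gate 2: writes must stay inside the sandbox workspace.
            if action.kind == "write" and not self._inside_workspace(action.target):
                return "block", "workspace"
            return "allow", "-"
        if action.kind == "net":
            # Gate 3: no network egress from the sandbox.
            return "block", "network"
        if action.kind == "exec":
            argv = shlex.split(action.target)
            # Gate 4: only allow-listed binaries may run.
            if not argv or argv[0] not in ALLOWED_BINARIES:
                return "block", "command"
            # Gate 5: risky subcommands wait for a named human approver.
            if tuple(argv[:2]) in NEEDS_HUMAN and not action.approved_by:
                return "needs_approval", "hitl"
            return "allow", "-"
        return "block", "unknown_kind"


def summary(policy: Policy) -> dict:
    """Counts behind the 'blocked unsafe actions' validation metric."""
    verdicts = [entry["verdict"] for entry in policy.audit_log]
    return {"logged": len(verdicts), "blocked": verdicts.count("block"),
            "needs_approval": verdicts.count("needs_approval")}
```

Audit completeness would then be measured outside this class, by comparing `logged` against the actions the agent runtime actually attempted. The policy can only log what is routed through `check`.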
Source Appendix
| # | Platform | Source | Metric | Notes |
|---|---|---|---|---|
| 1 | GitHub | CoWork-OS/CoWork-OS | 330 stars | stars=330 forks=50 issues=7 updated=2026-05-28T23:05:48Z desc=Local-first personal agentic OS and everything app for coding, knowledge work, web design, automat |
| 2 | GitHub | nithisurender05/AgenticEduMCP | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T23:05:41Z desc=This repository contains research code for a NLP course research paper for studying agentic large lan |
| 3 | GitHub | shuntaka9576/agentoast | 21 stars | stars=21 forks=0 issues=4 updated=2026-05-28T23:05:37Z desc=🍞 Toast notifications from AI coding agents on your macOS menu bar, with tmux pane switching |
| 4 | GitHub | paultuanakotta/pi-slack-codex | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T23:05:30Z desc=Pi Slack Bot 2026 - Best Free AI Coding Agent for Conversational Development |
| 5 | GitHub | realorange1994/mini-claude-go | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T23:05:26Z desc=A lightweight Go implementation of Claude Code's agent loop framework with streaming support, 14+ bui |
| 6 | GitHub | jhanva/ai-skills | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T23:05:24Z desc=Custom skills, agents, and hooks for Claude Code. 38 skills (dev, Android, image, game dev), 10 speci |
| 7 | GitHub | Gazi-AI/GCode | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T23:05:03Z desc=Local-first AI coding IDE with a browser UI, terminal launcher, staged edit review, plan tracking, sa |
| 8 | GitHub | genkovich/sdd | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T23:05:03Z desc=Spec-Driven Development for Claude Code: 12 atomic Socratic skills + a TDD implement engine (agent-te |
| 9 | GitHub | TechMatrix-labs/pythinker-code | 0 stars | stars=0 forks=0 issues=3 updated=2026-05-28T23:04:52Z desc=Think first, then code. Review-first AI engineering agent for the terminal — code reviewer, security |
| 10 | GitHub | langchain-ai/open-swe | 9871 stars | stars=9871 forks=1123 issues=18 updated=2026-05-28T23:04:44Z desc=An Open-Source Asynchronous Coding Agent |
| 11 | GitHub | linny006/agent-eval-harness | 0 stars | stars=0 forks=0 issues=3 updated=2026-05-28T23:00:36Z desc=Live, open-source benchmark for comparing AI coding agents on real GitHub issues |
| 12 | GitHub | dendron542/SWE_benchmarks_info | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T22:13:02Z desc=None |
| 13 | GitHub | ZaikoXeas/mcpbr | 0 stars | stars=0 forks=1 issues=1 updated=2026-05-28T21:39:00Z desc=🚀 Benchmark your MCP server with real GitHub issues for accurate performance metrics using a simple c |
| 14 | GitHub | vasic-digital/Benchmark | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T21:31:23Z desc=LLM benchmarking: SWE-bench, HumanEval, MMLU, leaderboard |
| 15 | GitHub | sipyourdrink-ltd/bernstein | 497 stars | stars=497 forks=41 issues=16 updated=2026-05-28T21:18:14Z desc=Audit-grade multi-agent orchestration for CLI coding agents (Claude Code, Codex, Gemini CLI, +40 |
| 16 | GitHub | Trustableclaw/SWE-bench-Lite-Mac-20-Proof | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T16:50:03Z desc=None |
| 17 | GitHub | Grumpified-OGGVCT/model-trust-scorecard | 0 stars | stars=0 forks=0 issues=3 updated=2026-05-28T15:41:48Z desc=stop guessing whether a model’s “80 % SWE‑bench” claim is real by building a transparent, reproducibl |
| 18 | GitHub | LING-6150/llm-codegen-eval | 0 stars | stars=0 forks=0 issues=0 updated=2026-05-28T15:36:30Z desc=Evaluation harness for LLM code generation, modeled after HumanEval/SWE-bench |
| 19 | HN | Clawd-on-Desk: a pixel desktop pet watching your AI coding agents | 1 pts/0 c | points=1 comments=0 |
| 20 | HN | Protestware for Coding Agents | 7 pts/1 c | points=7 comments=1 |
| 21 | HN | Show HN: Rig – Local-first code graph for coding agents, in one npx command | 2 pts/0 c | points=2 comments=0 |
| 22 | HN | Coding agent can read your .env file | 2 pts/0 c | points=2 comments=0 |
| 23 | HN | Show HN: Bootstrap a team of coding agents from a template, OSS | 3 pts/0 c | points=3 comments=0 |
| 24 | HN | Bill Gates AI on AI (one month later) | 3 pts/0 c | points=3 comments=0 |
| 25 | HN | Ask HN: We dont need a programming language now? | 2 pts/4 c | points=2 comments=4 |
| 26 | HN | Show HN: I built a self-writing book on agentic coding | 2 pts/1 c | points=2 comments=1 |
| 27 | HN | Functional programming accelerates agentic feature development | 59 pts/31 c | points=59 comments=31 |
| 28 | HN | AI surpass Superman in Competitive Programming via Agentic RL [pdf] | 2 pts/1 c | points=2 comments=1 |
| 29 | HN | We Benchmarked Claude Code, Codex, Semgrep, CodeQL, Trent on 28 CWE-Bench CVEs | 5 pts/1 c | points=5 comments=1 |
| 30 | HN | Mini-SWE-agent scores up to 74% on SWE-bench in 100 lines of Python code | 2 pts/0 c | points=2 comments=0 |
| 31 | X/public-web | coding agent KOL/search signal 1 | N/A blocked | N/A public search fallback; metrics blocked |
| 32 | X/public-web | coding agent KOL/search signal 2 | N/A blocked | N/A public search fallback; metrics blocked |
| 33 | X/public-web | coding agent KOL/search signal 3 | N/A blocked | N/A public search fallback; metrics blocked |
| 34 | X/public-web | agentic programming KOL/search signal 1 | N/A blocked | N/A public search fallback; metrics blocked |
| 35 | X/public-web | agentic programming KOL/search signal 2 | N/A blocked | N/A public search fallback; metrics blocked |
| 36 | X/public-web | agentic programming KOL/search signal 3 | N/A blocked | N/A public search fallback; metrics blocked |
| 37 | X/public-web | SWE-bench KOL/search signal 1 | N/A blocked | N/A public search fallback; metrics blocked |
| 38 | X/public-web | SWE-bench KOL/search signal 2 | N/A blocked | N/A public search fallback; metrics blocked |
| 39 | YouTube | Claude Code coding agent video signal 1 | N/A API unavailable | N/A YouTube API unavailable; search URL fallback |
| 40 | YouTube | Claude Code coding agent video signal 2 | N/A API unavailable | N/A YouTube API unavailable; search URL fallback |
| 41 | YouTube | Claude Code coding agent video signal 3 | N/A API unavailable | N/A YouTube API unavailable; search URL fallback |
| 42 | YouTube | OpenAI Codex coding agent video signal 1 | N/A API unavailable | N/A YouTube API unavailable; search URL fallback |
| 43 | YouTube | OpenAI Codex coding agent video signal 2 | N/A API unavailable | N/A YouTube API unavailable; search URL fallback |
| 44 | YouTube | OpenAI Codex coding agent video signal 3 | N/A API unavailable | N/A YouTube API unavailable; search URL fallback |
| 45 | Papers/arXiv | N/A | N/A | error: The read operation timed out |
Data Quality / Scan Health
Scanned 165 candidates. Breakdown: {'HN': 40, 'GitHub': 70, 'Papers/arXiv': 5, 'YouTube': 15, 'X/public-web': 30, 'Facebook/public-web': 5}. Source volume: PASS (>=100). Social completeness: PARTIAL (public X/YouTube/Facebook sources attempted; Reddit blocked with 403; Facebook metrics N/A; YouTube API unavailable, search-URL fallback used; paper count 5/15). Confidence: 72/100. Caveat: social sentiment is down-weighted, while GitHub/HN technical signals keep their full weight.
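The scan-health line combines a total-volume check (>=100) with per-source completeness. The Python sketch below recomputes those PASS/PARTIAL flags from the stated breakdown. Per-source targets are set only where the report gives them: papers 5/15, X 30/30, YouTube 15/15. The function name and result keys are hypothetical. The sketch does not try to reproduce the 72/100 confidence score, because its formula is not given. It also cannot see metric gaps such as the YouTube API fallback, since it works only from item counts.

```python
from __future__ import annotations

# Breakdown copied from the scan-health line; targets only where the report
# states them (papers 5/15, X 30/30, YouTube 15/15).
BREAKDOWN = {'HN': 40, 'GitHub': 70, 'Papers/arXiv': 5, 'YouTube': 15,
             'X/public-web': 30, 'Facebook/public-web': 5}
TARGETS = {'Papers/arXiv': 15, 'X/public-web': 30, 'YouTube': 15}
BLOCKED = {'Reddit': '403'}


def scan_health(breakdown: dict[str, int], targets: dict[str, int],
                blocked: dict[str, str], min_volume: int = 100) -> dict:
    """Recompute the volume and completeness flags from per-source counts."""
    total = sum(breakdown.values())
    # Sources that fell short of their stated target, as "got/target".
    shortfalls = {src: f"{breakdown.get(src, 0)}/{want}"
                  for src, want in targets.items()
                  if breakdown.get(src, 0) < want}
    complete = not shortfalls and not blocked
    return {
        "total": total,
        "volume": "PASS" if total >= min_volume else "FAIL",
        "completeness": "PASS" if complete else "PARTIAL",
        "shortfalls": shortfalls,
        "blocked": sorted(blocked),
    }


if __name__ == "__main__":
    print(scan_health(BREAKDOWN, TARGETS, BLOCKED))
```

On this run's numbers, the function returns a total of 165 with volume PASS and completeness PARTIAL. It lists `Papers/arXiv` as 5/15 and Reddit as blocked, which agrees with the flags reported above.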