SKILL.md
---
name: rich-data-research-page
description: Use when the user asks to build a rich data research page or topic dashboard — a single interactive page that compares 5–20 same-kind entities (cancers, countries, companies, products) across multiple metrics with scatter chart, ranking bars, filter, focus, compare, and shareable URL state. Distilled from /cancers-overview and /ai-token-usage-research on 2aran.com.
---
# Rich Data Research Page Skill
Use this skill when the user wants a topic-research deliverable that is **not a Markdown article** but a **single interactive page** showing multi-dimensional data for a set of comparable entities.
Wrong fit: a 5000-word write-up. Right fit: 10 cancers × 6 metrics, 8 cloud providers × pricing tiers, 12 countries × education spending, 15 LLMs × benchmark scores.
## When to use
Trigger when the user says any of:
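- "Build a rich page / data visualization / topic research piece"
- "Turn these X into a page that can be filtered / compared / shared"
- "I want the visualization to look good, backed by all kinds of data, with key-cause analysis…, ideally filterable and shareable"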
Skip if: the user wants a single timeline, a long-form essay, a chat tool, or a personal log.
## Inputs you must collect
1. The entity list — 5 to 20 of the same kind (cancers, countries, companies, products, AI models, etc.).
2. At least three metrics per entity:
- **Primary magnitude** (e.g., annual incidence, revenue, GDP) — used for X axis and main bar
- **Secondary magnitude** (e.g., mortality, cost) — used for bubble radius and overlay bar
- **Rate / share** (e.g., 5-year survival %, margin %) — used for Y axis and tone coloring
3. Optional but high-value:
- Multi-region / multi-version variants (e.g., global vs China, 2022 vs 2023)
- Multi-bucket distribution (e.g., 10 age buckets, 6 revenue source breakdowns)
- Weighted breakdown items with category tags (e.g., risk factors with category: lifestyle/genetic/virus/…)
- Narrative fields (warnings, screening guidance, primary cause)
4. Disclaimers and authoritative sources — at minimum 2 first-party data citations.
If the user provides incomplete data, ask precisely which metric is missing rather than guessing.
## File layout (Next.js App Router)
```
app/(site)/<topic>-overview/
  page.jsx                    # static metadata + dynamic = 'force-static'
  data.js                     # entities array + helpers + sources
  <Topic>OverviewClient.jsx   # 'use client' rich page
```
Wire it into `siteNav` under "实验室" (Lab) with a New tag.
## The 10 patterns
### 1. Entity schema
Every entity is one object with these field groups:
```js
{
  id, nameZh, nameEn, color, category,                 // meta
  incidence, mortality, survival5y,                    // primary / secondary / rate
  genderRatio: { male, female },                       // share split
  ageDistribution: [/* 10 numbers summing ≈ 100 */],   // multi-bucket
  riskFactors: [{ name, weight, category }],           // weighted breakdown
  warnings: [/* strings */], screening: '...',         // narrative
  cn: { incidence, mortality, survival5y },            // optional region variant
}
```
One entity = one object. All views share the same data.
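A small validator can catch schema drift before the page renders. The sketch below is illustrative and not taken from the original pages: `validateEntity` is a hypothetical helper, and the ±1 tolerance on the age-distribution sum is an assumption standing in for "≈ 100".

```js
// Hypothetical helper: returns a list of problems for one entity object.
// An empty list means the entity matches the schema sketched above.
function validateEntity(e) {
  const problems = []
  // Meta fields must be present and non-empty.
  for (const key of ['id', 'nameZh', 'nameEn', 'color', 'category']) {
    if (!e[key]) problems.push(`missing ${key}`)
  }
  // The three core metrics must be finite numbers.
  for (const key of ['incidence', 'mortality', 'survival5y']) {
    if (!Number.isFinite(e[key])) problems.push(`${key} is not a number`)
  }
  // Optional multi-bucket distribution should sum to roughly 100.
  if (Array.isArray(e.ageDistribution)) {
    const total = e.ageDistribution.reduce((sum, v) => sum + v, 0)
    if (Math.abs(total - 100) > 1) problems.push(`ageDistribution sums to ${total}`)
  }
  return problems
}
```

Running this over the whole array in `data.js` during development is one way to turn "ask which metric is missing" into a concrete list.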
### 2. Region / version toggle
Put variant blocks (`cn`, `v2024`, …) on each entity. Add a top-level toggle and a view helper:
```js
export function cancerView(c, region = 'global') {
  // Overlay the region variant on top of the base entity when it exists.
  if (region === 'cn' && c.cn) return { ...c, ...c.cn }
  return c
}
```
Switching region must not touch chart structure — only the swapped fields change.
### 3. Quadrant scatter (the money shot)
Pure SVG, no chart library. X = log10(primary), Y = rate (0–100%, inverted so 100% sits at the top), r = secondary scaled to 6–24px, fill = category color, stroke darker on focus.
Background: two faint rectangles split by the median Y, with corner labels like "common · hard to treat" / "common · manageable" / "rare · hard to treat" / "rare · manageable". This makes the four quadrants legible at a glance.
```jsx
const xPos = (v) => padL + ((Math.log10(v) - xMin) / (xMax - xMin)) * innerW
const yPos = (v) => padT + (1 - v / 100) * innerH
```
Use `viewBox` + `preserveAspectRatio="xMidYMid meet"` — never set width/height in px. Adjust xMin/xMax when region changes so all bubbles spread out.
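The `xMin` / `xMax` inputs to `xPos` and the 6–24 radius range can be derived from the currently visible entities. This is a minimal sketch: `xDomain` and `radius` are hypothetical names, the 0.1 log-unit padding is an assumed value, and square-root scaling (bubble area proportional to the value) is a common choice rather than something the original pages confirm.

```js
// Log-scale X domain from the visible primary values, padded so bubbles
// do not sit on the plot edge. Non-positive values are skipped (log10 undefined).
function xDomain(values, pad = 0.1) {
  const logs = values.filter((v) => v > 0).map((v) => Math.log10(v))
  return { xMin: Math.min(...logs) - pad, xMax: Math.max(...logs) + pad }
}

// Map the secondary metric to a radius between rMin and rMax.
// Square-root scaling keeps bubble AREA proportional to the value (assumption).
function radius(v, maxValue, rMin = 6, rMax = 24) {
  if (!(maxValue > 0)) return rMin
  const t = Math.sqrt(Math.max(v, 0) / maxValue)
  return rMin + t * (rMax - rMin)
}
```

Recomputing `xDomain` from `entities.map((e) => cancerView(e, region).incidence)` whenever the region toggles keeps the bubbles spread across the plot.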
### 4. Tooltip overlay
Don't use SVG `<title>` or `<foreignObject>`. Wrap the SVG in a `relative` container, render an absolutely-positioned HTML div whose `left` / `top` are computed from bubble coords as **percentages of the chart viewBox**. This survives SVG scaling.
Flip right-side tooltips to the left when `leftPct > 65` so they don't overflow.
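The percentage math and the flip rule can be sketched as below. `tooltipPosition` is a hypothetical helper; it assumes the wrapping container has the same aspect ratio as the viewBox (so there is no letterboxing), and the `transform` values are one possible way to anchor the tooltip, not necessarily what the original pages use.

```js
// Convert bubble coordinates (viewBox units) into CSS percentages for an
// absolutely-positioned HTML tooltip inside a `relative` wrapper.
function tooltipPosition(cx, cy, viewW, viewH) {
  const leftPct = (cx * 100) / viewW
  const topPct = (cy * 100) / viewH
  const flip = leftPct > 65 // right-side bubbles open their tooltip leftward
  return {
    left: `${leftPct}%`,
    top: `${topPct}%`,
    // When flipped, shift the tooltip by its own width so it stays in bounds.
    transform: flip ? 'translate(-100%, -50%)' : 'translate(0, -50%)',
    flip,
  }
}
```

The returned `left`, `top`, and `transform` can be spread directly into the tooltip div's `style` prop.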
### 5. Focus dimming
Maintain one `focusIds` array (= [openId] in single mode, = compareIds in compare mode). Non-focused bubbles → `opacity 0.18`. Non-focused ranking rows → `opacity 0.45`. Hover state always wins.
This removes the need for a separate "highlight" mechanism — the eye is led automatically.
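A minimal sketch of the dimming rule, assuming mode values `'single'` and `'compare'` and hypothetical helper names `focusIds` / `itemOpacity`:

```js
// One focus list drives both views: [openId] in single mode, compareIds in compare mode.
function focusIds(mode, openId, compareIds) {
  if (mode === 'compare') return compareIds
  return openId ? [openId] : []
}

// `dimmed` is 0.18 for scatter bubbles and 0.45 for ranking rows.
function itemOpacity(id, focus, hoveredId, dimmed) {
  if (id === hoveredId) return 1         // hover always wins
  if (focus.length === 0) return 1       // nothing focused: show everything at full strength
  return focus.includes(id) ? 1 : dimmed // focused items stay opaque, the rest fade
}
```

Both the scatter and the ranking list call `itemOpacity` with the same `focus` array, so the two views cannot drift apart.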
### 6. Horizontal ranking bars
Below the scatter. All entities, one row each, in selected sort order.
Per row: index + color square + name + gender/category chip + main bar + optional darker overlay (secondary metric) + main number + 2 mini chips (rate + reverse rate).
When sort = incidence, overlay mortality on the same row at `width = mortality/incidence × pct`. The viewer reads both magnitudes simultaneously.
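The overlay width formula can be sketched as follows. `barWidths` is a hypothetical helper; it assumes `pct` is the main bar width as a percentage of the largest incidence in view, and it clamps the ratio at 1 so the overlay never extends past the main bar (the clamp is an added safeguard, not something the text specifies).

```js
// Returns both bar widths as percentages of the row's track.
function barWidths(entity, maxIncidence) {
  // Main bar: this entity's incidence relative to the largest one shown.
  const pct = maxIncidence > 0 ? (entity.incidence / maxIncidence) * 100 : 0
  // Overlay: mortality as a fraction of incidence, drawn inside the main bar.
  const ratio = entity.incidence > 0 ? entity.mortality / entity.incidence : 0
  return { mainPct: pct, overlayPct: Math.min(ratio, 1) * pct }
}
```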
### 7. Compare mode
Top-level toggle: single-entity detail / side-by-side compare. In compare mode, clicking a row or bubble toggles inclusion in `compareIds` (cap 3).
Replace the single detail panel with a horizontal table:
- columns = entities
- rows = metric labels (annual new cases / annual deaths / 5-year survival / lethality / gender / primary cause / screening / top 4 risk factors)
- per cell tone color (good / bad) by threshold
- each column header has × to remove
This is the most-requested feature on data dashboards. Don't ship the page without it.
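The click-to-toggle behaviour with its cap can be sketched as a pure function. `toggleCompare` and `COMPARE_CAP` are hypothetical names, and ignoring clicks once the cap is reached is an assumption; replacing the oldest selection would be an equally plausible policy.

```js
const COMPARE_CAP = 3

// Returns a new compareIds array; never mutates the input (safe for React state).
function toggleCompare(compareIds, id) {
  if (compareIds.includes(id)) return compareIds.filter((x) => x !== id) // remove
  if (compareIds.length >= COMPARE_CAP) return compareIds                // at cap: ignore (assumed)
  return [...compareIds, id]                                             // add
}
```

The same function serves the × button in each column header, since removing is just toggling an id that is already present.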
### 8. Filter strip
- Search input — match nameZh + nameEn + primary cause
- Category pills (multi-select, click to toggle)
- Sort pills (4 options: primary / secondary / rate / inverse rate)
- Gender / camp toggle (3 options: all / A-dominant / B-dominant)
- Reset button only appears when any filter is active
All pills, no `<select>` — consistent rhythm.
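The strip above reduces to one filter-then-sort pass. The sketch below is a rough illustration under stated assumptions: `applyFilters`, `hasActiveFilter`, and `SORTERS` are hypothetical names, `primaryCause` is an assumed field name for the narrative "primary cause", the gender options are keyed `'all'` / `'male'` / `'female'`, and a non-default sort is treated as an active filter.

```js
// Four sort options: primary / secondary / rate / inverse rate.
const SORTERS = {
  incidence: (a, b) => b.incidence - a.incidence,
  mortality: (a, b) => b.mortality - a.mortality,
  survival: (a, b) => b.survival5y - a.survival5y,
  lethality: (a, b) => a.survival5y - b.survival5y, // inverse rate: worst survival first
}

function applyFilters(entities, { q = '', cats = [], gender = 'all', sort = 'incidence' } = {}) {
  const needle = q.trim().toLowerCase()
  return entities
    .filter((e) => {
      // Search hits nameZh + nameEn + primary cause.
      const haystack = [e.nameZh, e.nameEn, e.primaryCause].filter(Boolean).join(' ').toLowerCase()
      if (needle && !haystack.includes(needle)) return false
      // Category pills are multi-select; an empty selection means "all".
      if (cats.length > 0 && !cats.includes(e.category)) return false
      if (gender === 'male' && !(e.genderRatio.male > e.genderRatio.female)) return false
      if (gender === 'female' && !(e.genderRatio.female > e.genderRatio.male)) return false
      return true
    })
    .sort(SORTERS[sort] ?? SORTERS.incidence) // filter() already returned a copy
}

// Drives the visibility of the Reset button.
function hasActiveFilter({ q = '', cats = [], gender = 'all', sort = 'incidence' } = {}) {
  return q.trim() !== '' || cats.length > 0 || gender !== 'all' || sort !== 'incidence'
}
```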
### 9. URL state
On mount: read URL → setState (one-shot). On state change: state → URL via `history.replaceState` (so back button isn't polluted).
Parameters: `region`, `q`, `gender`, `sort`, `cats` (comma), `mode`, `compare` (comma), `open`. Default values are omitted from URL to keep it short.
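The round trip can be sketched with `URLSearchParams`. The default values in `DEFAULTS` and the helper names `stateToQuery` / `queryToState` are assumptions for illustration; only the parameter names come from the list above.

```js
// Assumed defaults; anything equal to its default is left out of the URL.
const DEFAULTS = {
  region: 'global', q: '', gender: 'all', sort: 'incidence',
  cats: [], mode: 'single', compare: [], open: '',
}

function stateToQuery(state) {
  const params = new URLSearchParams()
  for (const [key, def] of Object.entries(DEFAULTS)) {
    const value = state[key] ?? def
    if (Array.isArray(def)) {
      if (value.length > 0) params.set(key, value.join(',')) // comma-joined lists
    } else if (value !== def) {
      params.set(key, value)
    }
  }
  return params.toString()
}

function queryToState(query) {
  const params = new URLSearchParams(query)
  const state = {}
  for (const [key, def] of Object.entries(DEFAULTS)) {
    const raw = params.get(key)
    if (raw === null || raw === '') state[key] = Array.isArray(def) ? [...def] : def
    else state[key] = Array.isArray(def) ? raw.split(',').filter(Boolean) : raw
  }
  return state
}

// In the client component (browser only), the two halves would be wired roughly as:
//   on mount:        setState(queryToState(window.location.search))
//   on state change: const qs = stateToQuery(state)
//                    history.replaceState(null, '', qs ? `?${qs}` : window.location.pathname)
```

Because both directions iterate over the same `DEFAULTS` table, adding a parameter means touching one place, which makes the round-trip check in the reuse checklist easier to pass.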
### 10. Disclaimers + sources, top and bottom
**Top**: one sentence in the header naming the data sources with inline links + one short strong-red disclaimer ("not medical advice" / "not investment advice" / "figures are public estimates").
**Bottom**: a disc list expanding every source, year, scope, methodology caveat, and the limits of each metric (e.g., risk factor weights are illustrative, not PAF). First-party links must be clickable.
For health / money / policy topics, this is non-negotiable.
## Reuse checklist for a new topic
When building `/<topic>-overview` for a new dataset:
1. Pick a topic with 5–20 same-kind entities and 3+ comparable metrics.
2. Source the data — at least 2 first-party links (org statistics, regulator reports, peer-reviewed papers, official company filings).
3. Build `data.js` with the entity schema above. Add a region/version variant if the data has multiple credible cuts.
4. Copy the structure of `/cancers-overview/CancersOverviewClient.jsx` and rename:
- Replace metric field names in the accessor functions (incidence → revenue, survival5y → margin, …).
- Adjust scatter X axis log range to fit the data spread.
- Update tone thresholds (good / bad) per metric.
- Rewrite the four-quadrant corner labels for the new domain.
- Rewrite filter copy and source links.
5. Update siteNav with the new route under "实验室", `tag: 'New'`.
6. Verify URL state round-trips for every filter combination.
7. Verify mobile: SVG should scale; ranking bars should remain readable; compare table should horizontally scroll.
## Output constraints
- Never invent statistics. If a number is uncertain, use the most reputable source available and cite it inline.
- Never imply medical / financial / legal advice. Frame everything as data visualization of public sources.
- Do not introduce a chart library (recharts, nivo, chart.js). The cost in bundle size and styling rigidity is not worth it — pure SVG covers every pattern here.
- Do not split into multiple routes per entity. The strength is one-page comparison.
- Do not skip the compare mode or the share URL — they are the two highest-ROI features.
## Additional patterns (11–16)
### 11. Compound entities — X × Y pairs, not single objects
When each row is a pairing — platform × framework, company × product, country × sector — split into `sideA` / `sideB` fields instead of a single `name`. Card header shows both with a × separator. Search must hit both sides. Sort options may target either side.
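A minimal sketch of the two-sided header and search, using the `sideA` / `sideB` fields named above; `pairLabel` and `matchesPair` are hypothetical helper names.

```js
// Card header text: both sides joined with a × separator.
function pairLabel(pair) {
  return `${pair.sideA} × ${pair.sideB}`
}

// Search must hit either side; an empty query matches everything.
function matchesPair(pair, q) {
  const needle = q.trim().toLowerCase()
  if (!needle) return true
  return [pair.sideA, pair.sideB].some((side) => side.toLowerCase().includes(needle))
}
```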
### 12. Subjective scoring — expose the rubric
When metrics include 0–100 subjective scores (lock-in, backlash, integration quality), three rules:
1. Label the field with a "subjective" chip distinct from measured metrics.
2. State the rubric somewhere visible: "0 = fully portable; 100 = cannot leave platform".
3. Round to nearest 5 or 10 — never show 73.4%. Fake precision is worse than honest estimate.
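Rule 3 is a one-liner in practice. `roundScore` is a hypothetical helper; clamping to the 0–100 range is an added assumption.

```js
// Snap a subjective score to the nearest step (5 by default) within 0–100.
function roundScore(score, step = 5) {
  const clamped = Math.min(100, Math.max(0, score))
  return Math.round(clamped / step) * step
}

roundScore(73.4)     // → 75
roundScore(73.4, 10) // → 70
```

Rounding at the data layer rather than at render time keeps the scatter, the ranking bars, and the compare table showing the same value.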
### 13. Status as first-class filter + default color
When entities sit in different lifecycle states (active / historical / forming / neutral / deprecated), status takes over:
- Top-level filter chip, on par with the gender / camp toggle.
- Default color encoding for bubbles in the scatter (instead of category).
- Its own row in the compare table with tone-colored cells.
Don't stack `color = category` and `color = status` — pick one.
### 14. Per-entity verification badge
When some entity facts come from rumors, leaks, or estimates (vs. confirmed sources), add `verified: boolean`. For `verified=false`:
- Show an "estimate" / "unverified" / "rumor" badge top-right of the card.
- Show the same badge at the top of the detail panel.
- Surface in the compare table column header.
Don't bury this in the footer. The reader needs to know which numbers are solid before they read the numbers.
### 15. Latest-signal line
Every entity carries a `latest_signal`: one sentence with a year/quarter naming the most recent state change ("2025-Q2 Fluid Compute announced", "2024-09 VoidZero founded").
- Detail panel: under the title.
- Compare table: dedicated row.
- Required for event-driven or rapidly evolving topics.
A single global "last updated" date in the footer is not enough when entities evolve at different rates.
### 16. New facts that reshape the thesis: rewrite top-down, don't just append a row
When a new event or fact changes the **core narrative spine** of the research (not a detail correction — a verdict shift), do NOT just append one more entity to the data array.
Test for thesis-level disruption: if the new fact flips ANY of these claims, you're at a thesis-level change, not an entity-level one:
- who moved first
- who is defending against whom
- what type of game this even is
When the test fires:
1. **Rewrite** title + thesis + eventBadge. The old framing was wrong; band-aiding it confuses readers.
2. **Rebuild** the signal timeline. Don't append the new event at the end — re-order so the new pivot reads correctly.
3. **Re-argue** the load-bearing sections (motivation / impact / AI analysis). The causal chain you wrote before assumed the old framing; check each step.
4. **Check if a new entity type appeared.** A new "kind" of entity (e.g. AI-company × runtime is not the same as deployment-platform × framework) means a new `pairType` / `category` field is needed, with its own color and filter chip.
5. **Sync every surface** in one commit: SHARE_COPY (title / lead / full), OG image, page header eyebrow, hero strip, siteNav label, works summary. Mixed-framing pages — half old narrative, half new — are worse than either consistent version.
Distilled from /platform-framework-pairs, which was rewritten from a "duopoly standoff" framing to a "three-pole standoff" framing after surfacing Anthropic × Bun (2025-12-02). Adding Bun wasn't +1; it inverted the question from "which deployment platform owns which framework" to "AI companies are bypassing the deployment-platform layer entirely". The old framing positioned Cloudflare as "hedging against Vercel"; the new one positions it as "responding to Anthropic". Every load-bearing sentence had to be checked.
## Version Log
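- v0.3 (2026-06-05): Added #16: when a new fact reshapes the narrative pivot, rebuild from the core thesis down. Distilled from the hands-on lesson of rewriting /platform-framework-pairs from "duopoly" to "three-pole standoff".
- v0.2 (2026-06-04): Added 5 new patterns (11–15) covering compound entities, transparent subjective scoring, status as a first-class filter, per-entity verification badges, and the latest-signal line. Distilled from building /platform-framework-pairs.
- v0.1 (2026-06-04): Initial version; distilled 10 patterns and a full build checklist from /cancers-overview.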