AI Scoring Methodology


How DeepSeek scores backlinks 1-100, what signals it uses, and what the score actually predicts.

This page documents how TraceLinker's AI scoring works in detail. If you're a senior SEO or you want to defend the scores to a client, this is the canonical reference.

What we score

Every backlink (one row in your discovered_backlinks or monitored_links table) gets:

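Score: 1-100 integer.
Reasoning: one sentence explaining the score.
Toxicity: a separate 1-100 toxicity score, bucketed into a safe/caution/toxic flag.
Toxicity reasons: array of signals fired.
AI actions: 3-5 concrete next steps (or empty if the recommended action is "do nothing").
Outreach draft: a subject and body, generated only when the link scores below 70 and its status is alive.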

These come from a single DeepSeek API call per link, prompted with all signals we've gathered.

Inputs to the model

For each backlink, the model receives:

Signal	Source
Source URL	Input CSV / GSC / monitored row
Source domain	Parsed from URL
Target URL	Input CSV / GSC
Anchor text	Crawled from source page
Surrounding context	200 chars before and after the anchor in the source page
Page title	Crawled from source `<title>`
Page word count	Crawled from source body
Outbound link count	Crawled: total number of links on the page
Source domain TLD	Parsed
Dofollow / nofollow / sponsored / ugc	From `rel` attribute
Position on page	First 25%, middle, last 25%, footer
Image vs text link	Whether anchor is an `<img>`
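Two of these signals need a little computation. The sketch below shows one way to derive the surrounding-context window and the position bucket from a page's extracted text. The function names, the `PagePosition` labels, and the assumption that footer detection happens upstream on the DOM (passed in as `inFooter`) are illustrative, not TraceLinker's crawler code.

```typescript
// Illustrative only: names and the upstream footer detection are assumptions.
type PagePosition = "first_25" | "middle" | "last_25" | "footer";

const CONTEXT_CHARS = 200; // window size from the signal table

/** Up to 200 chars of page text on each side of the first occurrence of the anchor. */
function surroundingContext(
  pageText: string,
  anchorText: string,
): { before: string; after: string } | null {
  const start = pageText.indexOf(anchorText);
  if (start === -1) return null; // anchor not present in the extracted text
  const end = start + anchorText.length;
  return {
    before: pageText.slice(Math.max(0, start - CONTEXT_CHARS), start),
    after: pageText.slice(end, end + CONTEXT_CHARS),
  };
}

/** Bucket the anchor's character offset into first 25%, middle, last 25%, or footer. */
function pagePosition(offset: number, textLength: number, inFooter: boolean): PagePosition {
  if (inFooter) return "footer"; // footer detection assumed to happen on the DOM upstream
  const ratio = textLength > 0 ? offset / textLength : 0;
  if (ratio < 0.25) return "first_25";
  if (ratio >= 0.75) return "last_25";
  return "middle";
}
```

Working on plain text keeps the window size predictable; a production crawler would more likely locate the `<a>` element in the DOM and read text around it.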

Things we do NOT use (and why):

DR/UR or other tool-derived metrics - we want to be tool-independent.
Per-link traffic data - too noisy and often wrong.
Indexation status - already implied by the source being reachable.
Historical age - not in our data unless monitored from day one.

The prompt structure

Without giving away the exact wording, the prompt has four sections:

System instructions - "You are an expert backlink quality analyst. Score this link 1-100 based on SEO value to the target site. Output JSON."
Signal table - the inputs above formatted as a clean table.
Few-shot examples - 3 to 5 worked examples covering high, medium, low score ranges.
Output schema - explicit Zod schema the response must conform to.

We use DeepSeek's deepseek-chat model with temperature: 0.1 for consistency. Same input → essentially same output.
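A minimal sketch of how the four sections might be assembled into a request body for DeepSeek's OpenAI-compatible chat API. The section headings, helper names, and the use of JSON mode (`response_format`) are assumptions; only the system instruction, `deepseek-chat`, and `temperature: 0.1` come from this page.

```typescript
// Illustrative only: section headings and helper names are assumptions.
type Message = { role: "system" | "user"; content: string };
type Signals = Record<string, string>;

interface FewShotExample {
  signals: Signals;
  output: string; // the JSON the model is expected to produce for these signals
}

const SYSTEM_PROMPT =
  "You are an expert backlink quality analyst. Score this link 1-100 " +
  "based on SEO value to the target site. Output JSON.";

/** Render signals as a two-column markdown table. */
function signalTable(signals: Signals): string {
  const rows = Object.entries(signals).map(([name, value]) => `| ${name} | ${value} |`);
  return ["| Signal | Value |", "| --- | --- |", ...rows].join("\n");
}

/** Assemble system instructions, signal table, few-shot examples, and schema, in that order. */
function buildRequest(signals: Signals, examples: FewShotExample[], schemaText: string) {
  const user = [
    "## Signals",
    signalTable(signals),
    "## Examples",
    ...examples.map((ex, i) => `Example ${i + 1}:\n${signalTable(ex.signals)}\nOutput: ${ex.output}`),
    "## Output schema",
    schemaText,
  ].join("\n\n");

  const messages: Message[] = [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: user },
  ];
  return {
    model: "deepseek-chat",
    temperature: 0.1,
    response_format: { type: "json_object" }, // assumption: DeepSeek JSON mode is used
    messages,
  };
}
```

The returned object would be POSTed as the JSON body of a chat-completions request. Low temperature narrows sampling, which is what makes repeated runs on the same input land on nearly the same output.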

Output schema

{
  score: number,                   // integer 1..100
  reasoning: string,               // one sentence, <= 200 chars
  toxicity: {
    score: number,                 // 1..100, higher = more toxic
    is_toxic: boolean,             // true when toxicity.score >= 70
    reasons: string[],             // signals that fired (empty if none)
  },
  ai_actions: string[],            // 3-5 short imperatives, or empty for "do nothing"
  outreach: {                      // non-null only if score < 70 AND status = alive; otherwise null
    subject: string,
    body: string,
  } | null,
}

We validate every response with Zod. Malformed responses retry once; if both fail, the row is marked score = null and you can manually rescore.
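The validate-then-retry-once flow can be sketched without a dependency. This stand-in uses hand-written type guards in place of Zod so it runs as-is; `parseScore`, `scoreWithOneRetry`, and the synchronous `callModel` callback are hypothetical (a real client would await an HTTP call).

```typescript
// Dependency-free stand-in for the Zod validation; names are hypothetical.
interface ScoreResult {
  score: number;
  reasoning: string;
  toxicity: { score: number; is_toxic: boolean; reasons: string[] };
  ai_actions: string[];
  outreach: { subject: string; body: string } | null;
}

const isScore = (v: unknown): v is number =>
  typeof v === "number" && Number.isInteger(v) && v >= 1 && v <= 100;
const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every((s) => typeof s === "string");

/** Parse one raw model response; return null if it is not valid JSON or breaks the schema. */
function parseScore(raw: string): ScoreResult | null {
  let data: any;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;
  const tox = data.toxicity;
  const actions = data.ai_actions;
  const valid =
    isScore(data.score) &&
    typeof data.reasoning === "string" && data.reasoning.length <= 200 &&
    typeof tox === "object" && tox !== null &&
    isScore(tox.score) && tox.is_toxic === (tox.score >= 70) && isStringArray(tox.reasons) &&
    isStringArray(actions) && (actions.length === 0 || (actions.length >= 3 && actions.length <= 5)) &&
    (data.outreach == null ||
      (typeof data.outreach.subject === "string" && typeof data.outreach.body === "string"));
  return valid ? { ...data, outreach: data.outreach ?? null } : null;
}

type ModelCall = () => string; // synchronous for testability; a real client would be async

/** Call the model, retry once on a malformed response, else null (stored as score = null). */
function scoreWithOneRetry(callModel: ModelCall): ScoreResult | null {
  for (let attempt = 1; attempt <= 2; attempt++) {
    const parsed = parseScore(callModel());
    if (parsed !== null) return parsed;
  }
  return null;
}
```

With Zod itself, the equivalent check would typically be a `safeParse` call on the parsed JSON, with the same two-attempt loop around it.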

What the score means

The score is meant to predict "if this link were the only thing changing about your profile, how much would it move rankings?" It is a one-number summary of a multi-dimensional concept, so don't read it too literally.

We loosely calibrate against this rubric:

Score	Description
90-100	Top-tier ranking driver. Authority site, exact topical relevance, contextual placement, natural anchor. Worth defending and reclaiming aggressively.
70-89	Solid contributor. Decent authority, on-topic, in-content placement. Worth monitoring.
50-69	Borderline. Minor authority, off-topic, sidebar/footer placement, or thin surrounding content. Acceptable but not a priority.
30-49	Weak. Low-authority host, generic sidebar/widget link, or directory-style placement. Don't actively pursue.
1-29	Suspicious. Spam-adjacent context, scrape sites, link farms, comment spam, or bizarre TLD patterns. Disavow candidate if also flagged toxic.
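As a reading aid, the rubric bands and the disavow-candidate rule from the 1-29 row can be written as a lookup. The band labels and function names are hypothetical.

```typescript
// Illustrative labels for the rubric bands; names are hypothetical.
type Band = "top" | "solid" | "borderline" | "weak" | "suspicious";

function scoreBand(score: number): Band {
  if (!Number.isInteger(score) || score < 1 || score > 100) {
    throw new RangeError(`score must be an integer from 1 to 100, got ${score}`);
  }
  if (score >= 90) return "top";        // 90-100
  if (score >= 70) return "solid";      // 70-89
  if (score >= 50) return "borderline"; // 50-69
  if (score >= 30) return "weak";       // 30-49
  return "suspicious";                  // 1-29
}

/** 1-29 links are disavow candidates only when also flagged toxic. */
function isDisavowCandidate(score: number, isToxic: boolean): boolean {
  return scoreBand(score) === "suspicious" && isToxic;
}
```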

These ranges are calibrated on internal benchmarks. They don't match Ahrefs DR or Moz DA - intentionally. A site with DR 70 might score 60 here if the placement is in the footer; a DR 30 site might score 85 if the placement is contextual and editorial.

Toxicity scoring

Toxicity is a separate output. It's binary in spirit (toxic / not toxic) but expressed as a score 1-100 to allow nuanced thresholds:

Toxicity score	Flag	Meaning
1-29	`safe`	No negative signals fired.
30-69	`caution`	One or two soft signals (slightly thin context, one anchor over-optimized).
70-100	`toxic`	Multiple signals fired, or one strong signal (PBN footprint, malware host).
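A sketch of the score-to-flag mapping. It assumes a toxicity score of exactly 30 counts as `caution`, and it reuses the `>= 70` threshold that defines `is_toxic` in the output schema; the function names are hypothetical.

```typescript
// Hypothetical mapping; assumes 30 counts as "caution".
type ToxicityFlag = "safe" | "caution" | "toxic";

function toxicityFlag(toxicityScore: number): ToxicityFlag {
  if (toxicityScore >= 70) return "toxic";   // same threshold as is_toxic
  if (toxicityScore >= 30) return "caution"; // one or two soft signals
  return "safe";                             // no negative signals
}

/** Derive both the flag and the schema's is_toxic boolean from one score. */
function classifyToxicity(toxicityScore: number): { flag: ToxicityFlag; is_toxic: boolean } {
  const flag = toxicityFlag(toxicityScore);
  return { flag, is_toxic: flag === "toxic" };
}
```

Deriving both values from one function keeps the flag and `is_toxic` from ever disagreeing.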

The reasons array tells you exactly which signals fired:

pbn_footer_only - link is in footer, page has minimal content.
pbn_recent_registration - whois shows recent registration.
pbn_thin_content - source page word count below threshold.
anchor_overoptimization - exact-match keyword anchor + multiple similar anchors on same domain.
bad_tld - source is on a TLD associated with mass spam, such as .tk or .ml.
bad_context - casino, adult, malware, pharma proximity.
comment_spam - link is in user-generated comment block.
link_farm - source page is mostly outbound links with no real content.
private_blog_network - shared registration/IP/theme footprint with other low-quality sites.
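If you consume the reasons array programmatically, it helps to check it against the documented codes, so a new or misspelled code gets logged instead of silently ignored. The helper names here are hypothetical; the codes are the ones listed above.

```typescript
// The nine reason codes documented above.
const TOXICITY_REASONS = [
  "pbn_footer_only", "pbn_recent_registration", "pbn_thin_content",
  "anchor_overoptimization", "bad_tld", "bad_context",
  "comment_spam", "link_farm", "private_blog_network",
] as const;

type ToxicityReason = (typeof TOXICITY_REASONS)[number];

function isToxicityReason(s: string): s is ToxicityReason {
  return (TOXICITY_REASONS as readonly string[]).includes(s);
}

/** Split a reasons array into documented codes and anything unexpected (worth logging). */
function partitionReasons(reasons: string[]): { known: ToxicityReason[]; unexpected: string[] } {
  const known: ToxicityReason[] = [];
  const unexpected: string[] = [];
  for (const r of reasons) {
    if (isToxicityReason(r)) known.push(r);
    else unexpected.push(r);
  }
  return { known, unexpected };
}
```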

What the AI actions look like

Sample output for a high-score link:

{
  "score": 87,
  "reasoning": "Topically relevant SaaS blog with strong domain authority, dofollow anchor in mid-content, natural placement.",
  "toxicity": { "score": 8, "is_toxic": false, "reasons": [] },
  "ai_actions": [
    "Move to monitoring with daily checks - this is a placement worth defending.",
    "Bookmark the source domain for future outreach on related content.",
    "Check if other pages on this domain link to your site; if not, pitch them similar content."
  ]
}

Sample for a borderline link:

{
  "score": 52,
  "reasoning": "Low-authority directory listing, dofollow but generic anchor, no surrounding context.",
  "toxicity": { "score": 25, "is_toxic": false, "reasons": [] },
  "ai_actions": [
    "Do not pursue - directory links are low-leverage.",
    "If listing is paid or expensive, consider letting it lapse.",
    "Don't add to monitoring - not worth the slot."
  ]
}

Sample for a toxic link:

{
  "score": 12,
  "reasoning": "Scrape site with no real content, anchor over-optimized, suspicious TLD.",
  "toxicity": {
    "score": 85,
    "is_toxic": true,
    "reasons": ["pbn_thin_content", "bad_tld", "anchor_overoptimization"]
  },
  "ai_actions": [
    "Add to disavow file candidates.",
    "Check if other pages on this domain also link to you and disavow at the domain level.",
    "Investigate whether this is part of a coordinated negative-SEO campaign."
  ]
}
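Acting on the third sample's domain-level suggestion, toxic rows can be collapsed into a file in Google's disavow format (one `domain:` entry per line, `#` for comments). This is a hypothetical helper, not TraceLinker's exporter. It uses the full hostname as parsed by `URL`, so you may prefer to reduce it to the registrable domain first.

```typescript
// Hypothetical exporter; not TraceLinker's built-in disavow.txt download.
interface ScoredLink {
  sourceUrl: string;
  toxicity: { is_toxic: boolean; reasons: string[] };
}

/** Collapse toxic links into domain-level entries in Google's disavow file format. */
function buildDisavowFile(links: ScoredLink[]): string {
  const byHost = new Map<string, string[]>(); // hostname -> distinct reasons seen
  for (const link of links) {
    if (!link.toxicity.is_toxic) continue;
    const host = new URL(link.sourceUrl).hostname; // full hostname, lowercased by URL
    const reasons = byHost.get(host) ?? [];
    for (const r of link.toxicity.reasons) {
      if (!reasons.includes(r)) reasons.push(r);
    }
    byHost.set(host, reasons);
  }
  const lines: string[] = [];
  const hosts = Array.from(byHost.keys()).sort();
  for (const host of hosts) {
    lines.push(`# ${byHost.get(host)!.join(", ") || "flagged toxic"}`); // comment line
    lines.push(`domain:${host}`);
  }
  return lines.join("\n");
}
```

Writing the fired reasons as comments keeps the file auditable when you review it before upload.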

Cost

Per link: roughly 1,500 input tokens + 600 output tokens ≈ $0.0003 (DeepSeek pricing at the time of writing).

So a 1,000-link audit costs us about $0.30, and a 10,000-link Agency-tier audit about $3.

Crawling uses Crawl4AI, which is open source and free, so the only marginal cost per link is the DeepSeek call.

This is why we can offer a $9 Pro plan with 5,000 audited links/month - the per-user variable cost is under $2 even at heavy use.
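The cost figures are linear arithmetic. In this sketch `costPerLink` takes token prices as parameters because they change over time; it does not assume DeepSeek's current rates. The function names are hypothetical.

```typescript
/** Per-link cost in USD; prices are USD per million tokens and must be supplied by you. */
function costPerLink(tokensIn: number, tokensOut: number, usdPerMIn: number, usdPerMOut: number): number {
  return (tokensIn * usdPerMIn + tokensOut * usdPerMOut) / 1e6;
}

/** Whole-audit cost: with crawling free, cost scales linearly with link count. */
function auditCost(links: number, perLinkUsd: number): number {
  return links * perLinkUsd;
}

// Usage with the page's figure of $0.0003 per link:
//   auditCost(1000, 0.0003)  -> about $0.30
//   auditCost(10000, 0.0003) -> about $3
//   auditCost(5000, 0.0003)  -> about $1.50, under the $2 per-user ceiling on the Pro plan
```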

Reproducibility

Same input → essentially same output (temperature 0.1). Differences between runs:

Crawl might fetch different content if the source page changed.
The few-shot example pool is pseudo-random per call.
DeepSeek occasionally has small drift (mostly in reasoning wording, rarely in score).

If you re-score a link the next day and the score moves more than 5 points, the source page has probably changed. Moves of up to 5 points without a source change are expected variance: the score is a fuzzy estimate, not a measured quantity.
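The rule of thumb can be encoded as a small check, reading it as: moves of up to 5 points are noise, and larger moves usually mean the page changed. Detecting a source change by comparing whitespace-normalized page text is an assumption made for this sketch (a real pipeline might hash the crawled HTML instead), and the verdict names are hypothetical.

```typescript
// Hypothetical helpers; verdict names and text normalization are assumptions.
type RescoreVerdict = "expected_variance" | "source_changed" | "investigate";

const VARIANCE_POINTS = 5; // run-to-run noise band from this section

/** Collapse whitespace so trivial formatting changes don't count as a source change. */
function normalizeText(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

function interpretRescore(
  previousScore: number,
  newScore: number,
  previousText: string,
  newText: string,
): RescoreVerdict {
  if (Math.abs(newScore - previousScore) <= VARIANCE_POINTS) return "expected_variance";
  const changed = normalizeText(previousText) !== normalizeText(newText);
  return changed ? "source_changed" : "investigate"; // big move on an unchanged page: look closer
}
```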

Known limitations

We are upfront about what the model can and cannot do.

Strengths

Fast - under 2 seconds per link.
Cheap - lets us offer pricing 5-10x below incumbents.
Reasoning is human-readable - you get a sentence, not a black box.
Toxicity catches obvious negative SEO patterns - PBNs, link farms, comment spam.

Weaknesses

No knowledge of your business - the model doesn't know your competitors, your strategy, or your target keywords, so it can't tell that a "DR 35 niche site" might be incredibly valuable to you precisely because of the niche match.
Limited authority signals - we don't have an Ahrefs-style global crawl, so authority is inferred from page-level signals.
Doesn't catch sophisticated PBNs - networks built specifically to evade detection (varied themes, varied registrars, real-looking content) can pass the toxicity classifier. We estimate we catch about 80% of obvious PBN patterns; the rest need human review.
Doesn't account for link equity flow - we don't compute PageRank-style flow because we don't have the global graph.

Where AI scoring should be combined with human judgment

For high-stakes decisions (large disavow files, major link removal claims to Google, expensive paid placements), use the AI score as one input and apply your own judgment. The score is an opinion, not a verdict.

How we improve the model

False positive feedback - when you mark a row "false positive" or "mark safe", the (link, label) goes into a feedback pool we use for prompt tuning.
Quarterly re-calibration - each quarter we benchmark against a curated test set of 1,000 manually scored links to detect drift and adjust prompts.
Major model swaps - if a substantially better/cheaper model ships, we'll evaluate it. Currently DeepSeek is the price/quality winner for our prompt structure.

We don't fine-tune the model itself, only the prompts. Fine-tuning would become worthwhile only if we hit a quality ceiling that prompting can't move.

Self-hosting the AI

Self-hosted TraceLinker (open-source build) lets you:

Use any DeepSeek-compatible API endpoint (e.g. self-hosted DeepSeek behind your firewall).
Swap models entirely - GPT-4, Claude, Gemini all work if their JSON-mode is reliable. Prompt may need tuning for non-DeepSeek models.

See lib/ai.ts in the repo: the constants MODEL_NAME and BASE_URL control the model and the endpoint.
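A sketch of how those two constants might be resolved with environment overrides. The `TRACELINKER_*` variable names and the helper functions are hypothetical, and the defaults assume DeepSeek's public endpoint (`https://api.deepseek.com`); check lib/ai.ts for the actual definitions.

```typescript
// Hypothetical override layer; env var names are not documented TraceLinker settings.
interface AiConfig {
  MODEL_NAME: string;
  BASE_URL: string;
}

const DEFAULTS: AiConfig = {
  MODEL_NAME: "deepseek-chat",
  BASE_URL: "https://api.deepseek.com",
};

/** Prefer environment overrides, fall back to defaults, and strip trailing slashes. */
function resolveAiConfig(env: Record<string, string | undefined>): AiConfig {
  return {
    MODEL_NAME: env.TRACELINKER_MODEL_NAME || DEFAULTS.MODEL_NAME,
    BASE_URL: (env.TRACELINKER_BASE_URL || DEFAULTS.BASE_URL).replace(/\/+$/, ""),
  };
}

/** OpenAI-compatible chat completions URL for the configured endpoint. */
function chatCompletionsUrl(cfg: AiConfig): string {
  return `${cfg.BASE_URL}/chat/completions`;
}
```

In Node you would pass `process.env` to `resolveAiConfig`. Pointing `BASE_URL` at a self-hosted OpenAI-compatible server (often served under a `/v1` prefix) is what lets the same code talk to a model behind your firewall.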

FAQ

Why DeepSeek and not GPT-4? Cost. DeepSeek is roughly 50x cheaper per token at comparable quality for this specific task: on our 1,000-link benchmark set, its score quality was within 2% of GPT-4's.

Does the model see my CSV data? The model sees per-link signals: source URL, target URL, anchor, surrounding context, etc. It does not see your account email, your other links in the audit, or anyone else's data.

Is my audit data used to train the model? No. DeepSeek's API has a no-training mode we use. Audit data is not sent for training to any model provider.

What if DeepSeek goes down? Audits queue and retry. If the outage is multi-hour, you'll see "AI scoring unavailable" banners and can wait for recovery. Critical user-action paths (downloading a disavow.txt, exporting CSV) work without AI.
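Queue-and-retry is commonly implemented with capped exponential backoff. The policy values and function name below are assumptions for illustration, not TraceLinker's actual settings; production code would usually add random jitter so queued audits don't all retry in lockstep.

```typescript
// Hypothetical retry policy; values are placeholders, not TraceLinker's settings.
interface RetryPolicy {
  baseDelayMs: number;
  maxDelayMs: number;
  maxAttempts: number;
}

/**
 * Delay before retry number `attempt` (0-based), doubling each time up to a cap.
 * Returns null once the retry budget is exhausted.
 */
function nextRetryDelayMs(attempt: number, policy: RetryPolicy): number | null {
  if (attempt >= policy.maxAttempts) return null;
  return Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** attempt);
}

const examplePolicy: RetryPolicy = { baseDelayMs: 1000, maxDelayMs: 60000, maxAttempts: 8 };
```

With `examplePolicy`, delays grow 1 s, 2 s, 4 s, and so on until they hit the 60 s cap.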

Run a Backlink Audit - put scoring into action.
Disavow Toxic Links - the toxicity output in action.
Glossary - terms used in this page.
