Duplicate Content Detector
Paste 2–6 blocks of text and get a pairwise Jaccard similarity matrix. Catches near-duplicates, partial rewrites, and keyword-cannibalising pages — all in your browser.
Documents
Paste text blocks
Two to six documents. Each needs at least 5 words. Label each block so the pair matrix stays readable.
Shingle size
N-gram length for comparison. 5 is a common default for SEO work: long enough to catch genuine phrase overlap without being fooled by stopword collisions.
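To make the shingle idea concrete, here is a minimal Python sketch of word-level shingling. The function name `shingles` and the tokenising rule (lowercase, keep letters, digits and apostrophes) are assumptions for illustration, not necessarily what this tool does internally.

```python
from __future__ import annotations

import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Return the set of n-word shingles in text.

    Assumption: words are lowercased and punctuation is ignored.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    # A text with fewer than n words produces no shingles at all.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


sample = "The quick brown fox jumps over the lazy dog"
for s in sorted(shingles(sample)):
    print(" ".join(s))
```

A 9-word sentence yields 5 overlapping 5-word shingles, which is why each block needs at least 5 words before a comparison is possible.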
Similarity matrix
- Page A vs Page B: 26% Jaccard. Some overlap but probably acceptable.
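As a rough illustration of how a pairwise matrix like this can be computed, the sketch below scores every pair of labelled documents with Jaccard similarity (shared shingles divided by all distinct shingles). The helper names and the sample pages are hypothetical, and the tokenising rule is an assumption.

```python
from __future__ import annotations

import re
from itertools import combinations


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    # Assumption: lowercase word tokens, punctuation ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """|A intersect B| / |A union B|, defined here as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_jaccard(docs: dict[str, str], n: int = 5) -> dict[tuple[str, str], float]:
    """Score every unordered pair of labelled documents."""
    sets = {label: shingles(text, n) for label, text in docs.items()}
    return {(x, y): jaccard(sets[x], sets[y]) for x, y in combinations(sets, 2)}


# Hypothetical sample pages.
docs = {
    "Page A": "we offer seo services for small businesses across australia",
    "Page B": "we offer seo services for small businesses across sydney",
    "Page C": "our bakery sells fresh sourdough bread every single morning",
}

for (x, y), score in pairwise_jaccard(docs).items():
    print(f"{x} vs {y}: {score:.0%} Jaccard")
```

In this sample, Page A and Page B differ only in the final word, so they share 4 of 6 distinct shingles (about 67%), while Page C shares none with either.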
How to Fix Duplicate Content
Four options, ranked from best to worst.
301 redirect to one canonical
The cleanest fix. Pick the URL with the most backlinks and authority, then 301 all duplicates to it. You consolidate signals and stop keyword cannibalisation in one move.
Use rel="canonical"
If you must keep both URLs live (e.g. filtered views, pagination), point every variant's canonical tag to the authoritative version. Google treats the tag as a strong hint rather than a directive, so it can still choose a different URL when other signals disagree.
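When auditing variants, it can help to confirm which URL each page actually declares as canonical. The following is a minimal sketch using Python's standard `html.parser`; the class name `CanonicalFinder`, the helper `find_canonical`, and the example.com URL are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> tag seen."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: str | None = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_tokens:
            self.canonical = attr.get("href")


def find_canonical(html: str) -> str | None:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical filtered-view page pointing at its authoritative URL.
page = (
    "<html><head>"
    '<link rel="canonical" href="https://example.com/shoes/">'
    "</head><body>Filtered view</body></html>"
)
print(find_canonical(page))
```

Running this over each variant and comparing the results is a quick way to spot pages that point at themselves or at the wrong target.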
Rewrite to genuine uniqueness
If both pages need to exist AND both need to rank, rewrite the bodies to target different intent. "SEO services" and "SEO services Sydney" should have distinctive content, not swapped-in location words.
Noindex the weakest one
Last resort when you can't redirect, canonical, or rewrite. Keeps the page live for users but out of Google. Lose the rank potential, keep the UX.
Similarity Thresholds & What They Mean
The thresholds we use in SEO audits.
75%+ Near-duplicate
Almost identical. Pick one canonical and 301 the others. Leaving both live wastes crawl budget and confuses Google.
40–75% Substantial
Rewrite shared passages or consolidate pages. Often the result of "localised" landing pages that only swap place names.
15–40% Similar
Probably fine. Same topic, genuinely different angles. Audit shared phrases — maybe tighten if they're boilerplate.
Under 15% Unique
Different pages on a similar theme. Healthy variety. Common shingles are usually high-frequency natural language.
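The four bands above can be expressed as a small lookup. This is a sketch only: the function name `classify` is hypothetical, and because the stated ranges meet at 15%, 40% and 75%, it assumes a score exactly on a boundary falls into the higher band.

```python
def classify(similarity: float) -> str:
    """Map a Jaccard score in the range 0.0 to 1.0 to one of the four bands.

    Assumption: boundary values (0.15, 0.40, 0.75) go to the higher band.
    """
    if similarity >= 0.75:
        return "Near-duplicate"
    if similarity >= 0.40:
        return "Substantial"
    if similarity >= 0.15:
        return "Similar"
    return "Unique"


for score in (0.82, 0.55, 0.26, 0.08):
    print(f"{score:.0%}: {classify(score)}")
```

A 26% score lands in the "Similar" band, which matches the "probably acceptable" reading in the matrix example.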
Jaccard vs Containment
Jaccard treats pages symmetrically. Containment shows what % of the smaller doc is inside the bigger one — catches partial rewrites.
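The difference is easiest to see when one document is an excerpt of a longer one. The sketch below compares the two measures on such a pair; the helper names and sample texts are hypothetical, and the tokenising rule is an assumption.

```python
from __future__ import annotations

import re


def shingles(text: str, n: int = 5) -> set[tuple[str, ...]]:
    # Assumption: lowercase word tokens, punctuation ignored.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Symmetric: shared shingles over all distinct shingles."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def containment(a: set, b: set) -> float:
    """Share of the smaller set's shingles that also appear in the larger set."""
    small, big = (a, b) if len(a) <= len(b) else (b, a)
    if not small:
        return 0.0
    return len(small & big) / len(small)


excerpt = shingles("the quick brown fox jumps over the lazy dog")
full = shingles(
    "the quick brown fox jumps over the lazy dog near the old river bank today"
)

print(f"Jaccard:     {jaccard(excerpt, full):.0%}")
print(f"Containment: {containment(excerpt, full):.0%}")
```

Here the excerpt is wholly inside the longer text, so containment is 100% while Jaccard is only about 45% (5 shared shingles out of 11 distinct). That gap is the signal for a partial copy or a partial rewrite.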
5-word shingle size
A common default for SEO duplicate detection. Short enough that lightly edited passages still match, long enough to avoid false positives on common stopword chains.
Cannibalisation
Two of your own pages targeting the same keyword. Jaccard helps spot it. Fix by merging, redirecting, or sharpening intent.
Scraped content
If your content appears on another domain at 80%+ Jaccard, it's likely scraped. File a DMCA takedown, or make sure your own canonical signals are strong.
Where Duplicate Content Usually Hides
Common sources of accidental duplication — almost all of them fixable.
CMS-generated
Editorial patterns
Technical
External duplication
Duplicate Content FAQ
What counts as duplicate content for SEO?
Google broadly treats substantial blocks of matching or near-matching content as duplicate — both within a site and across domains. "Substantial" is Google's word, not a precise percentage, but a Jaccard similarity over 50% on 5-word shingles is a strong flag worth investigating.
Does duplicate content always hurt rankings?
Not always. Google picks one version to rank and filters out the others, so the cost is lost visibility for the filtered pages, not a manual action. Exact duplicates across domains (scraped content) are higher risk. Internal duplicates are usually just wasted crawl budget.
How do I fix duplicate content?
Best: 301 redirect duplicates to one canonical URL. Second: rel="canonical" pointing to the authoritative version. Third: rewrite the duplicate content to be genuinely unique. Last resort: noindex the weakest page. Worst: leave it. Google will still pick one and ignore the rest, but you lose control of which.
Is my text sent anywhere?
No. All similarity calculations run locally in your browser. Nothing is uploaded or stored.
Want a Full Site Duplicate-Content Audit?
Our Australian SEO team crawls your site, maps duplicate and near-duplicate content, and delivers a prioritised redirect / canonical / rewrite plan.
- Site-wide duplication scan
- Canonical + redirect plan
- No lock-in commitment
No long-term commitment. Cancel anytime. 100% satisfaction guaranteed.
