Combating spam websites from Tor

Published on April 27, 2020 by

Certain actors spam Tor by creating many duplicate websites under different .onion domains and then linking them to each other. The cost of doing this is very low: all that is needed is a new public key pair, because the onion domain is derived from the public key (older v2 addresses are a truncated hash of the key, while v3 addresses encode the key itself). In theory, anyone can create an essentially unlimited number of onion domains.
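To illustrate how cheap a new address is, here is a minimal sketch that derives a v3 onion address from a 32-byte ed25519 public key, following the address format in Tor's rend-spec-v3. The address is the base32 encoding of the public key, a 2-byte checksum (the first two bytes of SHA3-256 over ".onion checksum", the key and the version byte) and the version byte 0x03. This is an illustration, not Tor's own code. The function name `onion_v3_address` is hypothetical, and random bytes stand in for a real ed25519 public key, which an actual service would take from a freshly generated key pair.

```python
import base64
import hashlib
import os

ONION_V3_VERSION = b"\x03"


def onion_v3_address(public_key: bytes) -> str:
    """Derive a v3 .onion address from a 32-byte ed25519 public key.

    Layout per Tor's rend-spec-v3:
      CHECKSUM = SHA3-256(".onion checksum" | PUBKEY | VERSION)[:2]
      ADDRESS  = base32(PUBKEY | CHECKSUM | VERSION) + ".onion"
    """
    if len(public_key) != 32:
        raise ValueError("expected a 32-byte ed25519 public key")
    checksum = hashlib.sha3_256(
        b".onion checksum" + public_key + ONION_V3_VERSION
    ).digest()[:2]
    raw = public_key + checksum + ONION_V3_VERSION  # 35 bytes -> 56 base32 chars
    return base64.b32encode(raw).decode("ascii").lower() + ".onion"


if __name__ == "__main__":
    # Random bytes stand in for real ed25519 public keys (assumption for brevity).
    for _ in range(3):
        print(onion_v3_address(os.urandom(32)))
```

Every new key yields a new, well-formed address, so producing thousands of mirror domains costs little more than a loop around key generation.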

Sadly, bad actors are also using this spam technique to promote websites with child exploitation content. The likely motivation for creating many onion domains for essentially the same website (sometimes with the content varied slightly for each copy) is to increase coverage by Tor search engines.

Since these spam websites provide no value and often host illegal content, we have decided to delete them completely from our search index. We also report child exploitation content to organizations that work with law enforcement. There are also technical reasons to refrain from indexing spam content: our crawlers should be busy indexing actual onion websites, and storage and system resources should not be wasted on content that has no value.

Detecting Spam Onion Domains

The spam websites are typically search engine optimized (SEO); after all, that is why the spam technique is used in the first place. This means that they have descriptive meta tags in their HTML, as well as domain names that may indicate the type of content.

Our algorithms therefore take the following features into account when fingerprinting websites to classify them as spam:

  • HTML tag <title>
  • HTML tag <meta name="description">
  • HTML tag <meta name="keywords">
  • Subdomain name
  • Text of outgoing links (<a> tags)
  • “alt” attribute of <img> tags (the alternative text of images)
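The features listed above lend themselves to a simple near-duplicate check. The following is a hypothetical sketch, not our production algorithm: it uses Python's standard `html.parser` to collect the listed features from a page, reduces them to a set of lowercase word tokens, and compares two pages with Jaccard similarity, so copies whose content is varied slightly still score high. The names `SpamFeatureParser`, `feature_tokens` and `is_near_duplicate`, the tokenization rule and the 0.7 threshold are all illustrative assumptions.

```python
import re
from html.parser import HTMLParser


class SpamFeatureParser(HTMLParser):
    """Collects title, meta description/keywords, link text and img alt text."""

    def __init__(self):
        super().__init__()
        self.features = {"title": [], "description": [], "keywords": [],
                         "link_text": [], "img_alt": []}
        self._in_title = False
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._in_link = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords"):
                self.features[name].append(attrs.get("content") or "")
        elif tag == "img" and attrs.get("alt"):
            self.features["img_alt"].append(attrs["alt"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        # Only text inside <title> or <a> counts; other body text is ignored.
        if self._in_title:
            self.features["title"].append(data)
        elif self._in_link:
            self.features["link_text"].append(data)


def subdomain(host: str) -> str:
    """Return the labels left of '<address>.onion' ('' if there are none)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[:-2])


def feature_tokens(html: str, host: str) -> set[str]:
    """Reduce a page's spam features plus its subdomain to a token set."""
    parser = SpamFeatureParser()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(values) for values in parser.features.values())
    text += " " + subdomain(host)
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens two pages have in common (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(a: set[str], b: set[str], threshold: float = 0.7) -> bool:
    # Hypothetical threshold; a real system would tune it on labeled data.
    return jaccard(a, b) >= threshold
```

Comparing token sets rather than exact hashes is a deliberate choice: a mirror that changes a single word in its title still shares most of its tokens with the original, so it lands in the same cluster instead of producing a completely different fingerprint.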

Statistics

Our algorithms have removed:

  • 209,081 unique onion domains (this number includes subdomains)
  • About 500 GB of archived text pages
  • About 34 million archived text pages and related index files
  • 28% of our overall Tor index
