A few days ago, DarkSearch, a new search engine for Tor, launched. It joins existing search engines such as Ahmia, Torch, Not Evil, and Haystack, so it's time for a feature comparison!
No search engine can cover 100% of the pages, due to the nature of Tor. There is no central .onion repository, so the first challenge is finding the .onion links. Other challenges of running a search engine include data size (and the associated storage and processing power), data formats, and many smaller issues such as crawl depth (i.e., how many sub-pages to follow, and how to behave when a site has infinite sub-pages).
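To make two of these challenges concrete, here is a minimal sketch in Python. It is not how our crawler or any of the search engines above actually work. The names `extract_onion_hosts`, `crawl`, `get_links`, `max_depth`, and `max_pages` are hypothetical, and the link lookup is a plain callback, so the example runs without any network access. The first function pulls .onion hostnames out of arbitrary text (16 base32 characters for v2 addresses, 56 for v3). The second is a breadth-first crawl that caps both depth and total page count, which is one simple way to avoid getting stuck on infinite sub-pages.

```python
import re
from collections import deque

# v2 onion addresses are 16 base32 characters (a-z, 2-7); v3 addresses are 56.
ONION_RE = re.compile(r"\b((?:[a-z2-7]{56}|[a-z2-7]{16})\.onion)\b")


def extract_onion_hosts(text):
    """Return the unique .onion hostnames mentioned in a piece of text."""
    return sorted(set(ONION_RE.findall(text.lower())))


def crawl(start, get_links, max_depth=3, max_pages=100):
    """Breadth-first crawl bounded by depth and by total number of pages.

    get_links(url) is a hypothetical callback that returns the URLs
    linked from url; a real crawler would fetch and parse the page here.
    """
    seen = {start}
    queue = deque([(start, 0)])
    visited = []
    while queue and len(visited) < max_pages:
        url, depth = queue.popleft()
        visited.append(url)
        if depth >= max_depth:
            continue  # page is recorded, but its links are not followed
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return visited
```

With a link callback that always produces one more sub-page, for example `lambda url: [url + "/x"]`, the crawl still terminates: the depth limit stops it a fixed number of levels below the start page, and the page cap bounds the total work on sites that are wide rather than deep.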
The following graph shows our index of Tor (dark blue) and I2P (light blue). As of April 2019, we have 10,197,379 items indexed for Tor and 1,557,915 items for I2P. An item can be any supported file format – including HTML, text, PDF, office documents (Word, Excel, and PowerPoint files), and since yesterday, even eBooks.
We have 2,250,020 .onion addresses in our index, although only a small fraction of them are actually active. For I2P, our index lists 3,565 .i2p domains.