A few days ago DarkSearch, a new search engine for Tor, launched, joining existing search engines such as Ahmia, Torch, Not Evil, and Haystack – it’s time for a feature comparison!
No search engine can cover 100% of Tor pages, due to the nature of Tor. There is no central .onion repository, so the first challenge is to find the .onion links at all. Other challenges when running a search engine include data size (and the associated storage and processing power), data formats, and many smaller issues such as crawl depth (i.e. how many sub-pages to follow, and how to behave when a site has infinitely many sub-pages).
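To illustrate the crawl-depth problem, here is a minimal sketch of a breadth-first crawl that is bounded both by link depth and by a per-host page budget. It is not our production crawler: the function name `crawl`, the `get_links` callback, and the `max_depth` and `max_pages_per_host` parameters are all assumptions made for this example, and fetching over Tor is left out entirely.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


def crawl(start_url, get_links, max_depth=2, max_pages_per_host=100):
    """Breadth-first crawl bounded by link depth and a per-host page budget.

    get_links is a hypothetical callback that returns the outgoing links
    of a URL; a real crawler would fetch and parse the page here.
    """
    seen = {start_url}
    pages_per_host = defaultdict(int)
    queue = deque([(start_url, 0)])
    visited = []

    while queue:
        url, depth = queue.popleft()
        host = urlsplit(url).hostname or ""

        # Per-host budget: this is what stops a site that generates
        # sub-pages forever (calendars, session IDs in URLs, and so on).
        if pages_per_host[host] >= max_pages_per_host:
            continue
        pages_per_host[host] += 1
        visited.append(url)

        # Depth limit: pages at max_depth are indexed but not expanded.
        if depth >= max_depth:
            continue

        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))

    return visited
```

With a link source that always returns one more sub-page, the crawl still terminates once the host budget is spent, whatever the depth limit is. The two limits are complementary: the depth limit keeps the crawl shallow, while the budget caps the cost of any single site.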
The following graph shows our index of Tor (dark blue) and I2P (light blue). As of April 2019, we have 10,197,379 items indexed for Tor and 1,557,915 items for I2P. An item can be any supported file format – including HTML, text, PDF, office documents (Word, Excel, and PowerPoint files), and since yesterday, even eBooks.
We have 2,250,020 .onion addresses in our index, although only a small fraction of them are actually active. For I2P, our index lists 3,565 .i2p domains.
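Since there is no central list, .onion addresses have to be harvested from content that has already been collected. The following is a rough sketch of that idea, not a description of our actual pipeline: it relies only on the fact that v2 addresses are 16 base32 characters and v3 addresses are 56, and the names `ONION_RE` and `extract_onion_hosts` are made up for this example.

```python
import re

# Base32 alphabet used by onion addresses: a-z and 2-7.
# v3 addresses have 56 characters, legacy v2 addresses have 16.
# The lookbehind rejects longer base32 runs that merely end in a valid length.
ONION_RE = re.compile(
    r"(?<![a-z2-7])([a-z2-7]{56}|[a-z2-7]{16})\.onion\b",
    re.IGNORECASE,
)


def extract_onion_hosts(text):
    """Return the set of distinct .onion hosts found in text, lowercased."""
    return {m.group(1).lower() + ".onion" for m in ONION_RE.finditer(text)}
```

Returning a set deduplicates addresses that are mentioned many times. A match only means the string has the right shape; whether the service behind it is still active has to be checked separately.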
We launched a new product: “Identity Portal”! It allows users to find all lines in a text where a search term appears, and to download a list of leaked accounts under a specific domain or email address. This product is exclusively available on request to companies and governments. If you are interested, please contact us!
June 2020: New Phonebook service! 🎉
We just launched a free new service: https://phonebook.cz. It lists all email addresses, subdomains, and URLs for the input domain. Try it out – it’s free! It uses the same dataset as intelx.io, which holds 20 billion records. There is an existing phonebook feature at intelx.io since its
May 2020: New dorks website, Tor, DDoS test and a Europol takedown
Our dataset continues to grow significantly: 17,660,962,195 selectors. In the past few months, we have invested in 200+ TB of enterprise storage, which allows us to scale up data collection even more. As for the public web, we are currently crawling these TLDs: