We want to give some insight into the top-level domains (TLDs) supported by Intelligence X. Even though there are a total of 1,508 available TLDs (full list by IANA), we only support certain TLDs in order to prevent spam and false positives and to maintain a high-quality data set.
Validation of top-level domains appears in these places:
“ZIP” is an example of a TLD that we intentionally do not support. It has only 2 registered domains (according to DomainTools), but supporting it would mean millions (if not billions) of false positives, because .zip is commonly used as the file extension for ZIP archives. On its own, “test.zip” as a filename is indistinguishable from “test.zip” as a domain name.
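The post does not describe how this check is implemented. As a minimal sketch, assuming validation simply compares the last label of a search term against an allowlist of supported TLDs, the logic might look like this. `SUPPORTED_TLDS`, `extract_tld`, and `is_supported_domain` are hypothetical names, and the set below is a small illustrative subset, not the actual list of supported TLDs.

```python
from typing import FrozenSet, Optional

# Hypothetical allowlist: a tiny illustrative subset, NOT the real list of supported TLDs.
SUPPORTED_TLDS: FrozenSet[str] = frozenset({"com", "net", "org", "de", "eu", "io"})


def extract_tld(candidate: str) -> Optional[str]:
    """Return the lowercased last label of a dotted name, or None if it is not dotted."""
    # Normalize: trim whitespace, drop a trailing root dot, compare case-insensitively.
    name = candidate.strip().rstrip(".").lower()
    labels = name.split(".")
    # Require at least two labels and no empty labels (rejects "localhost", ".com", "a..b").
    if len(labels) < 2 or any(not label for label in labels):
        return None
    return labels[-1]


def is_supported_domain(candidate: str, supported: FrozenSet[str] = SUPPORTED_TLDS) -> bool:
    """Accept the candidate only if its TLD is on the allowlist."""
    tld = extract_tld(candidate)
    return tld is not None and tld in supported


if __name__ == "__main__":
    print(is_supported_domain("example.com"))  # True: "com" is on the allowlist
    print(is_supported_domain("test.zip"))     # False: "zip" is deliberately left out
```

Note that the check only ever sees the string, so it has no way to tell whether “test.zip” is a filename or a domain name; leaving “zip” off the allowlist rejects both, which is exactly the trade-off described above.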
If you try to search for a domain with an unsupported TLD, you will get this message:
We last updated the list of supported TLDs 2 years ago, when it included 362 TLDs. Today, we added 29 TLDs that meet our quality criteria. Interestingly, some TLDs that met the criteria 2 years ago no longer do.
Intelligence X operates public web crawlers. At the time of writing, 8 TLDs are being crawled, and more are in preparation.