Top-level domain considerations

Published on September 21, 2020

We want to give some insight into the top-level domains (TLDs) supported by Intelligence X. Even though there are a total of 1,508 available TLDs (full list by IANA), we choose to support only certain TLDs in order to prevent spam, avoid false positives, and maintain a high-quality data set.

Validation of top-level domains appears in these places:

  1. Search term input: Detect when the user enters a valid domain (or an email address or URL)
  2. Indexing of search results: Detect when a valid domain name appears in text
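
As a rough illustration of the first case, the following Python sketch classifies a search term by checking its TLD against an allowlist. It is not the Intelligence X implementation: the `SUPPORTED_TLDS` set, the function names, and the simplified label rules are all assumptions made for this example.

```python
import re
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; the actual supported list is not given here.
SUPPORTED_TLDS = {"com", "net", "org", "io"}

# One DNS label: letters, digits, hyphens; no leading or trailing hyphen; max 63 chars.
LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def is_supported_domain(name: str) -> bool:
    """Return True if name looks like a domain and ends in a supported TLD."""
    labels = name.lower().rstrip(".").split(".")
    if len(labels) < 2:
        return False
    if not all(LABEL.fullmatch(label) for label in labels):
        return False
    return labels[-1] in SUPPORTED_TLDS


def classify_term(term: str) -> str:
    """Classify a search term as 'url', 'email', 'domain', or 'other'."""
    term = term.strip()
    if "://" in term:
        host = urlsplit(term).hostname or ""
        return "url" if is_supported_domain(host) else "other"
    if term.count("@") == 1:
        local, _, domain = term.partition("@")
        return "email" if local and is_supported_domain(domain) else "other"
    return "domain" if is_supported_domain(term) else "other"


for term in ["example.com", "test.zip", "user@example.org", "https://intelx.io/?did=1"]:
    print(term, "->", classify_term(term))
```

With this allowlist, "test.zip" falls through to "other" because "zip" is not in the set, while the other three terms are recognized by type.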

False Positives

“ZIP” is an example of a TLD that we intentionally do not support. It has only 2 domains (according to DomainTools), but supporting it would mean millions (if not billions) of false positives, because “.zip” is so commonly used as the file extension for ZIP files. On its own, “test.zip” as a filename is indistinguishable from “test.zip” as a domain name.
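
The effect on indexing can be sketched in a few lines of Python. This is only an illustration under assumed names: `SUPPORTED_TLDS`, `CANDIDATE`, and `extract_domains` are hypothetical, and the pattern is deliberately simplified.

```python
import re

# Hypothetical allowlist for illustration; "zip" is deliberately absent.
SUPPORTED_TLDS = {"com", "net", "org", "io"}

# Simplified pattern: one or more dot-terminated labels followed by an alphabetic TLD.
CANDIDATE = re.compile(
    r"(?<![a-z0-9.-])"                            # not in the middle of a longer token
    r"(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+"  # labels, each ending in a dot
    r"([a-z]{2,63})"                              # captured TLD candidate
    r"(?![a-z0-9-])",                             # TLD must end here
    re.IGNORECASE,
)


def extract_domains(text: str) -> list[str]:
    """Return domain-like strings in text whose TLD is on the allowlist."""
    found = []
    for match in CANDIDATE.finditer(text):
        if match.group(1).lower() in SUPPORTED_TLDS:
            found.append(match.group(0).lower())
    return found


print(extract_domains("Download test.zip from files.example.com today."))
# prints ['files.example.com']
```

Both "test.zip" and "files.example.com" match the pattern, so the pattern alone cannot tell a filename from a domain. Only the allowlist check keeps the filename out of the index.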

If you try to search for a domain with an unsupported TLD, you will get this message:

Count of supported TLDs

We last updated the list of supported TLDs 2 years ago, when it included 362 TLDs. Today, we added 29 more that meet our quality criteria. Interestingly, some TLDs that met the criteria 2 years ago no longer do.

Crawling

Intelligence X operates public web crawlers. At the time of writing, 8 TLDs are being crawled and more are in preparation.


