Top-level domain considerations

Published on September 21, 2020 by

We want to give some insight into the top-level domains (TLDs) supported by Intelligence X. Even though there are a total of 1,508 available TLDs (full list by IANA), we choose to support only certain TLDs in order to prevent spam, avoid false positives, and maintain a high-quality data set.

Validation of top-level domains appears in these places:

  1. Search term input: Detect when the user enters a valid domain (or an email address or URL containing one)
  2. Indexing of search results: Detect when a valid domain name appears in text
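
As a rough illustration of the first check, the sketch below validates a search term against an allowlist of TLDs. It is not the Intelligence X implementation: the contents of `SUPPORTED_TLDS` and the helper names `host_of`, `tld_of`, and `is_supported_term` are assumptions made up for this example.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; the real list of supported TLDs is not given in this post.
SUPPORTED_TLDS = {"com", "net", "org", "de", "io"}


def host_of(term: str) -> str:
    """Pull the host part out of a search term (URL, email address, or bare domain)."""
    term = term.strip()
    if "://" in term:
        # URL: let the standard library find the hostname.
        return urlsplit(term).hostname or ""
    if "@" in term:
        # Email address: the domain is everything after the last "@".
        return term.rsplit("@", 1)[1]
    return term


def tld_of(host: str) -> str:
    """Return the lowercase label after the last dot, or "" if there is no dot."""
    host = host.strip().rstrip(".").lower()
    if "." not in host:
        return ""
    return host.rsplit(".", 1)[1]


def is_supported_term(term: str) -> bool:
    """True if the term's domain ends in a TLD from the allowlist."""
    return tld_of(host_of(term)) in SUPPORTED_TLDS
```

With this assumed allowlist, `is_supported_term("user@example.org")` and `is_supported_term("https://example.com/a.zip")` are accepted, while `is_supported_term("test.zip")` is rejected because "zip" is not in the list.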

False Positives

“ZIP” is an example of a TLD that we intentionally do not support. It has only 2 domains (according to DomainTools), but supporting it would mean millions (if not billions) of false positives, due to its common use as a file extension for ZIP files. On its own, “test.zip” as a filename is indistinguishable from “test.zip” as a domain name.
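
To make the indexing side of this concrete, here is a minimal sketch of extracting domain names from free text while skipping unsupported TLDs. The regular expression, the `SUPPORTED_TLDS` set, and the function name `extract_domains` are illustrative assumptions, not the actual Intelligence X indexer.

```python
import re

# Hypothetical allowlist; "zip" is deliberately absent.
SUPPORTED_TLDS = {"com", "net", "org"}

# One or more dot-terminated labels followed by an alphabetic final label (the TLD).
CANDIDATE = re.compile(
    r"\b(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+([a-z]{2,63})\b",
    re.IGNORECASE,
)


def extract_domains(text: str) -> list[str]:
    """Return domain-like strings from text whose TLD is in the allowlist."""
    found = []
    for match in CANDIDATE.finditer(text):
        if match.group(1).lower() in SUPPORTED_TLDS:
            found.append(match.group(0).lower())
    return found
```

For the text "Download test.zip from example.com", this sketch would keep "example.com" and drop "test.zip", since nothing in the text itself reveals whether "test.zip" is a file or a domain.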

If you try to search for a domain with an unsupported TLD, you will get this message:

Count of supported TLDs

We last updated the list of supported TLDs 2 years ago, when it included 362 TLDs. Today, we added 29 more that meet our criteria for quality. Interestingly, some TLDs that met the criteria 2 years ago no longer do.

Crawling

Intelligence X operates public web crawlers. At the time of writing, 8 TLDs are being crawled and more are in preparation.

