Open government is the governing doctrine which holds that citizens have the right to access the documents and proceedings of the government to allow for effective public oversight. (Source: Wikipedia)
After the recent firing of Christopher Krebs, the first director of the Cybersecurity and Infrastructure Security Agency (CISA), we decided to leap into action and preserve public data from US government websites.
We created a new category, “Government US”, which includes all Public Web data from these TLDs:
As a result, you can now access historical versions of the CISA website via https://intelx.io/?did=3dab2dd6-724f-4c66-916f-62c586ab7037. New copies are made every few days, so the archive will catch any changes and preserve any content that might be deleted or altered by the current or a future administration. This data is available for free (you don’t even need an account).
To search for historical versions of a particular US government domain, select the “Government US” category in the Advanced menu:
You will then see the website with all crawled URLs visualized as a tree:
At Intelligence X, transparency is paramount. Our users have full access to our data set, and we are transparent about where the data comes from. If you click on a search result, a “Metadata” tab shows you all the details.
At the time of writing, the crawlers have been running for less than 24 hours, yet the dataset is already growing quickly: