We have released our Go library “fileconversion” on GitHub: https://github.com/IntelligenceX/fileconversion
It converts many file formats to plaintext and provides other related functions. It has been tested on more than 184 million files and is used by intelx.io. We are happy to contribute back to the open source community.
February 2021: Launch of the European Internet Archive
The European Internet Archive just launched! 🎉🥳 ➡ https://archive.eu/

225 TLDs added to web crawling
We have added 225 top-level domains (TLDs) to the list of TLDs we crawl. Find the full list and how we categorize them in this blog post. Our dataset
We have added a new category, “Bot Logs”. It contains data collected by, and leaked from, malware such as Azorult. Such data is often sold on marketplaces such as the Genesis Market. We decided to index this data in its own category to make it easier to filter for relevant results. You can find this new category in
We are excited to announce that we have just added 225 top-level domains (TLDs) to the list of TLDs we crawl! Below is the full list. The domain count for each TLD is the number of registered domains as reported by DomainTools. To keep the list manageable, we group multiple TLDs into “buckets”; you can select these buckets in the Advanced