We just added native support to Intelligence X for the following data formats:
Native support means end-to-end support. This ranges from crawling and indexing files from various data sources, to processing them internally and presenting them to the end-user on the frontend intelx.io. Indexing is the process of taking a file, reading it, and extracting any text, thus making it searchable.
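To illustrate what "extracting any text" can look like for a PowerPoint file, here is a minimal sketch using only the Python standard library. It is not the Intelligence X implementation; the function name `extract_pptx_text` is hypothetical. It relies on the fact that a .pptx file is a ZIP container whose slides live under `ppt/slides/` and whose text runs are stored in DrawingML `<a:t>` elements.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET
from typing import List

# DrawingML namespace; text runs in slide XML are <a:t> elements.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

# Matches slide parts only (not their .rels files or other parts).
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml")


def extract_pptx_text(data: bytes) -> List[str]:
    """Hypothetical helper: return one text string per slide.

    Slides are sorted by the number in their part name. This usually,
    but not always, matches the order shown in the presentation.
    """
    slides = []
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        numbered = []
        for name in z.namelist():
            m = SLIDE_RE.fullmatch(name)
            if m:
                numbered.append((int(m.group(1)), name))
        for _, name in sorted(numbered):
            root = ET.fromstring(z.read(name))
            runs = [t.text for t in root.iter(A_NS + "t") if t.text]
            slides.append(" ".join(runs))
    return slides
```

The returned strings could then be passed to whatever search index is in use. A production indexer would also need to handle speaker notes, legacy binary .ppt files, and corrupt or hostile archives, none of which this sketch covers.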
intelx.io shows a text preview in the results and supports an inline view. This means that when you click on a result, the detailed view immediately shows the text of the document, without forcing you to leave the website or download the file locally.
The inline view of office documents is a convenient feature, but it also has an important security aspect: if the end-user downloads and opens unknown office documents (especially from the darknet), there is a risk of malicious embedded VBA macros and other exploits. Reading the extracted text inline avoids opening the original file at all.
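For readers who do need to download a file, the macro risk can be made a little more concrete. The sketch below is a simple heuristic, not something Intelligence X is stated to use: macro-enabled Office Open XML files (for example .docm, .xlsm, .pptm) are ZIP containers that carry their VBA code in a part named `vbaProject.bin`. The function name `has_vba_project` is hypothetical.

```python
import io
import zipfile


def has_vba_project(data: bytes) -> bool:
    """Hypothetical heuristic: True if a ZIP-based office file
    contains a vbaProject.bin part (i.e. embedded VBA macros)."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        # Not a ZIP container: legacy .doc/.xls/.ppt, PDF, etc.
        # This check says nothing about those formats.
        return False
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        return any(
            name.lower().endswith("vbaproject.bin") for name in z.namelist()
        )
```

A `False` result does not mean a file is safe. The check ignores legacy binary formats, PDFs, and exploits that do not rely on macros, so it is only a first hint.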
Intelligence X now natively supports all major office formats: Word, Excel, PowerPoint, and PDF.
Before, a PowerPoint file was displayed in the detailed view as “data salad” 🥗:
Now, you can see the text of the presentation (both in the preview and detailed view):