So I’m running a YaCy node. YaCy is a pretty awesome project aiming to build a search engine indexed “by the people, for the people.”
YaCy provides a Java servent (a combined server and client) that can index both internal resources and external web pages. You get MANY controls over what it indexes, how it crawls, and the resources allocated to it. There are tons of built-in analytics and logging for the stats geek in you.
I really like the idea of indexing all the content you care about and also offering that index to the world at large to search, but I have concerns about the long-term impact of ever more bots crawling the web. I would like to see YaCy figure out a way to minimize its impact at a global level: if every YaCy node independently crawls the same sites, the combined load could easily escalate into a DDoS-level problem. Perhaps they’re already working on this issue.
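To make that concern concrete, here’s a minimal sketch of one way a peer-to-peer crawler network could partition work so that each host is fetched by only one node, with the rest pulling results from the shared index. This is purely my own illustration using rendezvous (highest-random-weight) hashing, not anything YaCy actually does; the peer IDs and hostnames are made up:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/**
 * Illustrative sketch only: each peer hashes (hostname, peerId) and the
 * highest-scoring peer "owns" that host for crawling. Every node can
 * compute the winner locally, with no coordinator and no extra traffic.
 */
public class CrawlAssignment {

    /** Score a (host, peer) pair using the first 8 bytes of a SHA-256 digest. */
    static long score(String host, String peerId) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] d = md.digest((host + "|" + peerId).getBytes(StandardCharsets.UTF_8));
            long s = 0;
            for (int i = 0; i < 8; i++) s = (s << 8) | (d[i] & 0xFF);
            return s;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    /** A node crawls a host only if it wins the rendezvous election for it. */
    static boolean shouldCrawl(String host, String myPeerId, List<String> knownPeers) {
        long mine = score(host, myPeerId);
        for (String peer : knownPeers) {
            if (score(host, peer) > mine) return false; // another peer owns this host
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> peers = List.of("peer-a", "peer-b", "peer-c");
        for (String host : List.of("example.org", "wikipedia.org", "archive.org")) {
            for (String me : peers) {
                if (shouldCrawl(host, me, peers)) {
                    System.out.println(me + " crawls " + host);
                }
            }
        }
    }
}
```

A nice property of rendezvous hashing here is stability: when a peer joins or leaves, only the hosts that peer won get reassigned, so the rest of the network keeps its existing crawl schedule untouched.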