A student recently decided to create an algorithm like Google's to index web pages. After several iterations and months of hard work the code still did not perform as well as Google's, so he decided to apply the fruits of his labor to a more restricted field: he limited his crawler to finding BitTorrent files. He then checked those torrents to eliminate the deadwood and the fakes, and put the ones with sufficient information up on a server. Here is his description of what he did; you can check the result of his work at TorrentFactory.org, which hosts the final product - over 3 million active torrents, sorted by number of seeders and searchable on-site.
I'm a young student, and when I had to do a project I created an algorithm to index web pages like Google's. At first it was very basic: it just grabbed pages, indexed them, and searched through them, with the number of occurrences of each keyword as the only ranking method. A few months later I had finished a more complex algorithm, with a sort of PageRank and so on. But in the end it still wasn't as good as Google, so I had the idea of using this "technology" for other projects and maybe building a company on it. That's what I've started with TorrentFactory.org.
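To give a rough idea of what that first, very basic version did, here is a minimal sketch of occurrence-count ranking over an inverted index. The tokenizer, the page URLs and the data layout are my own simplifications for illustration, not the actual engine:

```python
from collections import Counter, defaultdict
import re

# Inverted index for the occurrence-count-only version:
# each term maps to how often it appears in each crawled page.
index = defaultdict(Counter)          # term -> {page_url: occurrences}

def tokenize(text):
    """Very naive tokenizer (a simplification for this sketch)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def add_page(url, text):
    """Index a crawled page by counting every term it contains."""
    for term in tokenize(text):
        index[term][url] += 1

def search(query):
    """Rank pages purely by total query-term occurrences, best first."""
    scores = Counter()
    for term in tokenize(query):
        for url, count in index[term].items():
            scores[url] += count
    return scores.most_common()       # [(url, score), ...]

# Hypothetical usage:
add_page("http://example.org/a", "open source torrent search torrent")
add_page("http://example.org/b", "torrent indexing engine")
print(search("torrent search"))       # page /a ranks first: more occurrences
```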
Instead of searching through the entire database, I split the content to keep only the torrent files, and save those which have enough information (at least a title). It's a kind of specialization, but I'm preparing for the future with other websites related to Internet content; this engine is the base for my work.
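As a rough illustration of the kind of check involved (my own simplified sketch, not the production code): a .torrent file is bencoded, so a candidate can be kept only when its metadata decodes cleanly and carries a name that can serve as a title:

```python
def bdecode(data, i=0):
    """Decode bencoded bytes starting at offset i; return (value, next_offset)."""
    c = data[i:i + 1]
    if c == b"i":                              # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                              # list: l<items>e
        i, items = i + 1, []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                              # dictionary: d<key><value>...e
        i, d = i + 1, {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            d[key], i = bdecode(data, i)
        return d, i + 1
    colon = data.index(b":", i)                # byte string: <length>:<bytes>
    length = int(data[i:colon])
    return data[colon + 1:colon + 1 + length], colon + 1 + length

def has_enough_info(torrent_bytes):
    """Keep a torrent only if it decodes and its metadata carries a title (name)."""
    try:
        meta, _ = bdecode(torrent_bytes)
    except (ValueError, IndexError):
        return False                           # not valid bencode: discard
    info = meta.get(b"info", {}) if isinstance(meta, dict) else {}
    return bool(info.get(b"name"))

# Hypothetical usage with a tiny hand-made metadata blob:
sample = b"d4:infod4:name8:demo.iso6:lengthi1024eee"
print(has_enough_info(sample))                 # True: it has a title
```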
Moreover, I have 6 servers at home running to download and split the content, plus two web servers. If I had tried to serve search results for the whole web, I would have needed a lot of additional servers and probably wouldn't have gotten any real visitors, so that's why I decided to specialize my first service this way.
By the way, I'm not making any money with this project; it's free and ad-free, because a company (DediServ.eu) believes in the project and is sponsoring it, lending me free servers to run the sites, although the biggest part of the work is still done at my home.
I've spent exactly 1025 hours on the project (I know the exact time because we had a party two days ago when we reached the 1 000 hour mark), and the basic search engine is almost 1 000 000 lines of code.
And here is a message from the site that gives a bit more explanation of how things work.
TorrentFactory.org isn't a regular torrent website like a lot of new torrent sites. I do use a crawler, but I'm not simply indexing torrents from popular sites or whatever; I have developed a complete engine that crawls the entire Internet, exactly like Google and other large search engines do.
Once pages are stored on large-capacity hard drives (I currently have 10 TB of local network storage), my home servers split the data between content pages and torrent files - and that's a huge job. It takes a lot of time and resources because it also tests the torrent content to see whether there are fakes, etc. When there is enough information to save the torrent (a title related to the torrent filename, a category, maybe a description, etc.), the torrent is transferred to another server where the trackers are tested; if all trackers are dead, the torrent will not be added. This server tests each torrent 3 times per day for 4 days and then uploads it to the production environment, the final servers (http://www.torrentfactory.org).
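To illustrate that retry schedule, here is a deliberately simplified sketch: a real check would send a proper BitTorrent announce or scrape request with the info hash, whereas this version only probes the announce URL over HTTP, and the URL in the example is a placeholder:

```python
import time
import urllib.request

def tracker_responds(announce_url, timeout=10):
    """Crude liveness probe: treat any HTTP reply from the tracker as 'alive'.
    (A real check would send a full announce/scrape with the torrent's info hash.)"""
    try:
        with urllib.request.urlopen(announce_url, timeout=timeout):
            return True
    except Exception:
        return False

def keep_torrent(trackers, checks_per_day=3, days=4, interval=8 * 3600):
    """Test the trackers 3 times per day for 4 days; keep the torrent as soon as
    any tracker answers, drop it if every attempt fails."""
    for attempt in range(checks_per_day * days):
        if any(tracker_responds(url) for url in trackers):
            return True                       # at least one live tracker: keep it
        if attempt < checks_per_day * days - 1:
            time.sleep(interval)              # wait ~8 hours before the next round
    return False                              # all trackers dead for 4 days: discard

# Hypothetical usage (placeholder URL, interval shortened so the demo finishes):
print(keep_torrent(["http://tracker.example.org/announce"], interval=1))
```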
By the way, you can follow me on TorrentFactory's Twitter.
The entire project is sponsored by DediServ.eu, who ensure our reliability and protect your privacy while you browse our site, and who also allow us to run a clean, ad-FREE site!