Ah, I've got an idea: how about an "image search" of some kind? It would be useful for finding screenshots. I'm not sure how it would be implemented, though; it could index links from the screenshot threads here on the forums, and collect thumbnails etc. from the mod pages.
In addition to this, when searching mods, there could be an additional "find screenshots" option beside each entry, which would be useful.
Unfortunately, this is a "great idea and I would love to do it" thing, but... it would take quite some time to implement and use up quite a bit of CPU resources. I will keep it in mind.
So Yacoby, as it seems that indexing the downloads section of my forum isn't possible, what is necessary to make it work? And what will optimize it?
(I intend to use my new-found PHP skills to create an integrated download section at GHF at some point in the maybe-not-so-near-future, and I want to ensure that it can be easily and efficiently indexed.)
I have had another look and I may be able to parse it. Maybe. I don't think I can parse descriptions, but I should be able to grab the title and author.
Regarding parsing, the vague good ideas are:
A page with a mod on it should be identifiable via the URL. For example, a TESNexus mod page is:
http://www.tesnexus.com/downloads/file.php?id=number
Which makes it easy for me to give it a higher priority, and not try and parse junk pages, filling up my error log. It also ensures I don't have to parse thousands of pages to get to a single mod.
Every mod page should have exactly the same HTML structure, and the section with the mod details shouldn't differ at all (which is part of the reason I struggle, and have given up trying, to parse some smaller sites).
The mod elements should be identifiable in some way. Wiwiland has every element (name, author, version etc.) in a unique CSS tag, which is ideal, even though they have the author details prefixed with "author". I can strip that off.
Table based structures also aren't a problem.
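To make the points above a bit more concrete, here is a rough Python sketch of how a crawler could use them. It is only an illustration, not the actual indexer code. It assumes that "CSS tag" means a class attribute, and the class names (mod-name, mod-author, mod-version), the function names and the id value used when calling it are all made up for the example; only the URL shape comes from the TESNexus example above.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


def mod_id_from_url(url):
    """Return the numeric id if the URL looks like a mod page, else None."""
    parts = urlparse(url)
    if not parts.path.endswith("/downloads/file.php"):
        return None
    ids = parse_qs(parts.query).get("id", [])
    if len(ids) == 1 and re.fullmatch(r"[0-9]+", ids[0]):
        return int(ids[0])
    return None


class ModFieldParser(HTMLParser):
    """Collect the text of elements carrying one of the wanted classes."""

    FIELDS = ("mod-name", "mod-author", "mod-version")  # hypothetical names

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None  # field whose text is currently being collected
        self._depth = 0       # nesting depth inside that field's element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in self.FIELDS:
            if name in classes:
                self._current = name
                self._depth = 1
                self.fields[name] = ""
                break

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.fields[self._current] = self.fields[self._current].strip()
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.fields[self._current] += data


def strip_label(text, label):
    """Remove a leading label such as 'Author:' from a field value."""
    if text.lower().startswith(label.lower()):
        text = text[len(label):]
    return text.lstrip(" :").strip()


def parse_mod_page(html):
    """Return a dict of the fields found on one mod page."""
    parser = ModFieldParser()
    parser.feed(html)
    parser.close()
    fields = dict(parser.fields)
    if "mod-author" in fields:
        fields["mod-author"] = strip_label(fields["mod-author"], "author")
    return fields
```

The idea is that mod_id_from_url lets the crawler decide from the link alone whether a page is worth fetching, and parse_mod_page only works because each detail sits in its own identifiable element. It doesn't care whether those elements are table cells or divs, which is why a table based layout is fine.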