Online webcomic scraper database
Since the site layouts or URLs of webcomics change regularly, comic scrapers need to be updated constantly. The actual engine (Dosage itself) changes far less often and should, in my opinion, be separated from the data/content.
What I'm suggesting is an online 'service' that provides, on request, all the information necessary for scraping a given webcomic. Depending on the implementation, it could also be usable by other webcomic downloaders or even user-created scripts.
For example, a repository system similar to apt:
- Users can subscribe to one or more repository URLs, from which Dosage retrieves the comic list.
- Users can also define local comic scrapers (these always override the online ones; see the sketch after this list).
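To make the idea more concrete, here is a minimal sketch of how such a repository lookup and local override could work. The repository URL, the JSON format and all function names below are just assumptions for the example, not anything Dosage currently provides:

    # Rough sketch only: the repository URL and JSON layout are hypothetical.
    import json
    import urllib.request

    REPOSITORY_URL = "https://example.org/dosage/scrapers.json"  # hypothetical endpoint

    def fetch_online_scrapers(url=REPOSITORY_URL):
        """Download scraper definitions from the online repository."""
        with urllib.request.urlopen(url) as response:
            # Assumed format: {"comicname": {"url": ..., "image_pattern": ...}, ...}
            return json.load(response)

    def load_local_scrapers(path="local_scrapers.json"):
        """Read user-defined scraper definitions from a local file, if any."""
        try:
            with open(path, encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def merged_scrapers():
        """Combine online and local definitions; local ones always win."""
        scrapers = fetch_online_scrapers()
        scrapers.update(load_local_scrapers())
        return scrapers

Because the local definitions are applied last, a user's own scraper for a comic silently replaces the online one, which is exactly the overriding behaviour described above.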
This would not only 'ensure' up-to-date scrapers for every user, but would also enable administrators to easily remove defunct or prohibited content (e.g. after removal requests from web hosts or authors).
With such a centralized system it would also be possible to add some sort of availability check for known comics (via e.g. RSS or a simple GET request) and to keep track of recently updated sites, preventing unnecessary network activity and thus stress on the hosting sites.
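As an illustration of such an availability check, here is a minimal sketch using only the standard library; the choice of a HEAD request and the Last-Modified header is an assumption, and the central service could just as well poll RSS feeds instead:

    # Sketch of a lightweight availability/freshness check (assumed approach).
    from urllib.request import Request, urlopen
    from urllib.error import HTTPError, URLError

    def check_site(url, timeout=10):
        """Return (available, last_modified) without downloading the page body."""
        request = Request(url, method="HEAD")
        try:
            with urlopen(request, timeout=timeout) as response:
                return True, response.headers.get("Last-Modified")
        except (HTTPError, URLError):
            return False, None

Run periodically by the central service, a change in the Last-Modified value would be enough to mark a comic as recently updated, so individual clients would not have to poll the sites themselves.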
Of course, a community interface would be a very nice addition, because it would enable users to submit their own scrapers (or update existing ones).