Automated / Spider submissions to Drupal

I have a lot of info I'd like to post to one of my Drupal websites (www.whatsupinmelville.co.za), but, like any geek, I'm too lazy to do it manually. So instead of spending the 2 or 3 hours it would probably take by hand, I'll code something to do it in a few minutes.

The fact that coding it will take me more than 2 or 3 hours, as I'm learning Python in the process, doesn't detract from the experience :).

As I'm quite new to Python, I searched the web for existing ways to submit web forms with it.

The first hurdle was accepting cookies so that the script can log in to the Drupal site. A page on aspn.activestate.com gave me more than enough information for that, and also included some valuable info on using urllib and urllib2 to post the necessary data.
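To make the cookie step concrete, here is a minimal sketch, not the attached script. It uses Python 3's urllib.request and http.cookiejar, the present-day counterparts of urllib2 and cookielib. The login path /user/login and the field names (name, pass, form_id, op) are assumptions about a stock Drupal 6 login form, and example.com and the credentials are placeholders.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def build_login_request(base_url, username, password):
    """Build, but do not send, a POST request for the login form.

    The path and field names are assumed defaults for a Drupal 6 site.
    """
    fields = {
        "name": username,
        "pass": password,
        "form_id": "user_login",
        "op": "Log in",
    }
    data = urllib.parse.urlencode(fields).encode("utf-8")
    url = base_url.rstrip("/") + "/user/login"
    # Supplying data makes urllib send this request as a POST.
    return urllib.request.Request(url, data=data)


if __name__ == "__main__":
    opener, jar = make_cookie_opener()
    request = build_login_request("http://example.com", "someuser", "secret")
    print(request.get_method(), request.full_url)
    # opener.open(request) would send it; the session cookie then lands in
    # jar and is resent on every later opener.open() call.
```

Every later request has to go through the same opener, otherwise the session cookie is not sent and Drupal treats the script as an anonymous visitor.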

The second hurdle was actually posting the data. www.voidspace.org.uk mentioned the use of urllib to encode the post data. I constructed the whole Drupal form in Python, only to realize that Drupal validates every submission against a form token, which makes it nigh impossible to post data directly to one of its forms...
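As a rough illustration of the encoding step, the sketch below builds the body of a node form with urllib.parse.urlencode (in Python 2 the same function lived in urllib). The field names title, body and op, the "story" content type, and the form_id pattern are assumptions about a typical Drupal 6 node form, not values taken from the attached script.

```python
import urllib.parse


def encode_node_form(title, body, node_type="story"):
    """Return the URL-encoded POST body for a node form as bytes.

    Field names are assumed; check them against the real form's HTML.
    """
    fields = {
        "title": title,
        "body": body,
        "form_id": node_type + "_node_form",
        "op": "Save",
    }
    # urlencode escapes spaces, ampersands and non-ASCII characters, so the
    # values can contain arbitrary text.
    return urllib.parse.urlencode(fields).encode("utf-8")


if __name__ == "__main__":
    print(encode_node_form("Café & bar", "Open daily from 8 to 5"))
```

Note that this body carries no form_token, which is exactly why an unmodified Drupal rejects it.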

Darn it. I'm lazy: I'm not going to first fetch the form, parse it for form_build_id and form_token (the offending form values), and then submit it. So I hacked Drupal.

In includes/form.inc, on lines 566-567 (Drupal 6.2):
// Setting this error will cause the form to fail validation.
form_set_error('form_token', t('Validation error, please try again. If this error persists, please contact the site administrator.'));

OK, so just don't set the error. Comment out the line containing form_set_error('form_token'), and you can submit data with some spider or other to your heart's content...

It is of course possible to go the circuitous route: first fetch the page on which the form resides, get the token and build id, and then post them along with the form data. Perhaps at some later stage. I think it's easier to just hack Drupal, do your updates, and then unhack it, since that token check is also what keeps other people's scripts from submitting forged forms.
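For the record, the token-fetching part of that circuitous route could look roughly like the sketch below. It only covers pulling form_build_id and form_token out of HTML that has already been fetched, using the standard library's html.parser. The SAMPLE markup is made up for illustration; a real Drupal page will look different and may contain several forms.

```python
from html.parser import HTMLParser

# Made-up markup standing in for a fetched node form page.
SAMPLE = """
<form action="/node/add/story" method="post">
  <input type="text" name="title" value="" />
  <input type="hidden" name="form_build_id" value="form-abc123" />
  <input type="hidden" name="form_token" value="tok456" />
  <input type="hidden" name="form_id" value="story_node_form" />
</form>
"""


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def extract_form_tokens(html):
    """Return the validation-related hidden fields found in html."""
    parser = HiddenInputParser()
    parser.feed(html)
    wanted = ("form_build_id", "form_token", "form_id")
    return {key: parser.hidden[key] for key in wanted if key in parser.hidden}


if __name__ == "__main__":
    print(extract_form_tokens(SAMPLE))
```

The returned values would be merged into the post data before encoding. If the page holds more than one form (a search box, say), this simple parser keeps the last value it sees for each name, so a real script would need to narrow the search to the right form first.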

Attachment: submitspider.py (3.1 KB)