This weekend I wanted to spend some time gathering some
statistical data from the iRacing interface via automated scraping.
Here is the initial report; let's cut the suspense for those
who are reading :)
For those who are not familiar with the iRacing license model, it can
be explained as simply as this:
When you join iRacing, you start with a rookie license. You can
get promoted to the next license level only if you complete a
certain number of races and reach a safety rating of 3/4 or more.
As far as I know, nobody has gathered and shared statistics
on what percentage of users are at each license level so far,
and that's why I wanted to take on the challenge, just to keep my coding/scraping skills fresh as a small weekend hackathon :)
So how did I do it?
Well, when I started on the idea Friday night, I quickly
found out that you can't scrape the site using conventional Python scripting methods (mechanize and BeautifulSoup). And yes, I do enjoy scripting these types of things in Python :)
This is because mechanize does not support JavaScript, and without JavaScript support I can't even log in. After a couple of hours of research, it turned out the best way is to use a package called "Selenium RC" and invoke it via its Python API. Selenium RC is just a simple remote control interface for Firefox (it also supports other browsers for runtime but not debugging). After my short experience, I can say Selenium is really good for these types of web UI automation and testing tasks.
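To give a rough idea of what this kind of script looks like, here is a minimal sketch. It is not the script I actually ran: it uses the newer Selenium WebDriver API rather than Selenium RC, and the URLs, form field names ("username", "password", "login-button") and the "td.license" selector are all made-up placeholders, not the real iRacing page layout.

```python
from collections import Counter


def license_shares(licenses):
    """Return {license level: percent of drivers}, rounded to one decimal."""
    counts = Counter(licenses)
    total = sum(counts.values())
    return {level: round(100.0 * n / total, 1) for level, n in counts.items()}


def scrape_licenses(login_url, stats_url, username, password):
    """Log in through a real browser and collect license labels.

    Every element name and selector below is a hypothetical placeholder.
    """
    # Imported here so license_shares() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        # The browser runs the page's JavaScript, which mechanize cannot do.
        driver.get(login_url)
        driver.find_element(By.NAME, "username").send_keys(username)
        driver.find_element(By.NAME, "password").send_keys(password)
        url_before_login = driver.current_url
        driver.find_element(By.ID, "login-button").click()
        # Wait until the login has navigated away from the form.
        WebDriverWait(driver, 30).until(EC.url_changes(url_before_login))

        driver.get(stats_url)
        cells = driver.find_elements(By.CSS_SELECTOR, "td.license")
        return [cell.text.strip() for cell in cells]
    finally:
        # Always close the browser, even if a step above fails.
        driver.quit()
```

Feeding the scraped labels into license_shares() gives the percentage breakdown per license level, for example license_shares(["Rookie", "Rookie", "D", "C"]).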
I used to use a similar tool back in the day called OpenKapow, but it had a
very extensive ETL-like browser IDE. I personally feel I have more control with Python than with a bulky IDE, and exception handling is also easier this way.
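As an illustration of the exception handling point, here is a small sketch of a retry wrapper in plain Python. The helper name with_retries and its parameters are my own invention for this example, not part of Selenium or any other library.

```python
import time


def with_retries(action, attempts=3, delay_seconds=2.0, retry_on=(Exception,)):
    """Call action() until it succeeds or attempts run out.

    The last error is re-raised so the caller still sees what went wrong.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on as error:
            if attempt == attempts:
                raise
            print(f"attempt {attempt} failed ({error!r}); retrying")
            time.sleep(delay_seconds)
```

Any flaky scraping step can then be wrapped, for example with_retries(lambda: fetch_page(url)), where fetch_page is whatever function does the actual page load.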
+1 for Selenium folks! :)