Using jQuery in Node with jsdom

After watching a ton of Node.js tutorials (and TAing for a JS class), I decided a while ago: “for my next script, I’m totally going to use Node.”

So this last week I finally got the opportunity to write a script. I had been handed a menial job, and writing a script to do it brightened my day.

The first script dealt with an XML API feed. I immediately found xml2js, a nice converter, and set about looping through some API URLs, collecting the data I needed and totaling it up. It was a mess, and looked like this:

[sourcecode language="javascript"]
var https = require("https");
var parseString = require("xml2js").parseString;

// Running totals, declared up front so the callbacks below can update them
var totalEntries = 0;
var something = 0;

https.get("https://someplace/someapi", function(response){
    var body = "";

    // Collect the response body chunk by chunk
    response.on("data", function(chunk){
        body += chunk;
    });

    // Parse the XML once the whole body has arrived
    response.on("end", function(){
        parseString(body, function(err, result){
            if (err) { throw err; }
            totalEntries += result.feed.entry.length;
            for (var i = 0; i < result.feed.entry.length; i++){
                something += parseInt(result.feed.entry[i]['something'][0]['somethingelse'][0].$.thingiwant, 10);
            }
            console.log("Total stuff: " + something);
        });
    });
});
[/sourcecode]

This one got me what I needed easily enough, but it's clearly not the right way to do it. The callbacks run asynchronously, so each request logs a running total and nothing tells you when the last one has finished, blah blah blah, that's not what I'm writing about.

The next one was very similar, but I had to scrape a web page, not just XML data. So I found a nice lib called jsdom, which builds a DOM for me to use jQuery on.

[sourcecode language="javascript"]
var jsdom = require("jsdom");

// url and host are assumed to be defined elsewhere in the script.
// jsdom.env is the callback-style API from older jsdom releases.
jsdom.env(url, function(errors, window){
    var $ = require("jquery")(window);
    var total = 0;

    $(".some_class").each(function(key, value){
        // the link is buried in the onclick, so I'll have to use a regex regardless...
        var link = value.innerHTML.match(/newWindow\('([^']*)'/)[1]; // get first grouping

        jsdom.env(host + link, function(errors, window){
            var $ = require("jquery")(window);

            // use regex to get the xxxxxxx because I'm lazy
            var result = $('head').html().match(/someRegex/g);
            if(result !== null){
                for(var i = 0; i < result.length; i++){
                    var thing = result[i].match(/"([^"]*)"/)[1]; // get first grouping
                    // the match is a string; assuming it holds a number, convert before adding
                    total += parseInt(thing, 10);
                }
            }
        });
    });
});
[/sourcecode]

It was super easy, and super powerful, to use something I'm already so familiar with for a task it's well suited to. The scripts themselves took minutes to write, if you don't count the time I spent finding where to get what I needed.
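
Both scripts add to a total from inside callbacks, so there is no single point where the total is known to be complete. Here is a minimal sketch of one way around that, assuming each request is wrapped in a function that returns a Promise of a number. The names `totalAcross` and `fakeFetchCount` are made up for illustration; `fakeFetchCount` stands in for a real xml2js or jsdom lookup so the sketch runs without a network.

```javascript
// Hypothetical helper: calls fetchCount(url) for every URL and resolves
// with the sum once every request has finished.
// fetchCount is an assumption: any function returning a Promise of a number.
function totalAcross(urls, fetchCount) {
    return Promise.all(urls.map(function (url) {
        return fetchCount(url);
    })).then(function (counts) {
        return counts.reduce(function (sum, n) { return sum + n; }, 0);
    });
}

// Stand-in for a real request: resolves after a short delay with a
// made-up count (the length of the URL string).
function fakeFetchCount(url) {
    return new Promise(function (resolve) {
        setTimeout(function () { resolve(url.length); }, 10);
    });
}

totalAcross(["a", "bb", "ccc"], fakeFetchCount).then(function (total) {
    console.log("Total stuff: " + total); // prints "Total stuff: 6"
});
```

The total is only logged after `Promise.all` has seen every result, so the order in which the requests finish no longer matters.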

1 Comment

  1. JaZahn says:

    Yes. Phantom would have been a better choice.