Adding Caller ID information to Vonage voicemail emails

A few months ago I wrote about a Perl script I use to convert the .wav attachments on my Vonage voicemail emails to .mp3 in order to save mailbox space. One other thing that has always annoyed me about those Vonage emails is that they don’t contain Caller ID information. Vonage already has this data, so why not include it in the email?

Perl to the rescue! I was actually rather surprised to find that there isn’t already a Perl module to look up this data from some web service. The only Caller ID module I could find dealt with converting data strings from modems, so my next task was to find a way to get this data from a website.

I couldn’t find a web service that provides this data, so I was going to have to scrape some HTML from somewhere. After a bit of Googling for free reverse-phone-lookup websites, I found that switchboard.com had the best HTML output (HTML 4.01 Strict, yay!) and the cleanest layout for locating and scraping the data I wanted (results are in handy <div class="listing"> tags; thanks, Switchboard.com!), plus a fairly simple query URL.
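To make the scraping target a bit more concrete, here is a minimal sketch of pulling text out of <div class="listing"> elements with HTML::TreeBuilder's look_down method. The sample markup is made up for illustration (the real Switchboard page is not reproduced here), and the listing_text helper is a hypothetical name. The field class names are the ones the full script further down looks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Invented sample markup for illustration only; the real page differs.
my $html = <<'HTML';
<html><body>
<div class="listing">
  <div class="header">Jane Example</div>
  <span class="address">123 Main St</span>
  <span class="citystatezip">Springfield, IL 62701</span>
</div>
<div class="other">Not a listing</div>
</body></html>
HTML

# Hypothetical helper: returns one array ref of text fields
# per <div class="listing"> found in the given HTML string.
sub listing_text {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @listings;

    for my $div ($tree->look_down(_tag => 'div', class => 'listing')) {
        # Inside each listing, keep only elements whose class names a field.
        my @fields = map { $_->as_trimmed_text }
            $div->look_down(sub {
                my $class = $_[0]->attr('class');
                defined $class
                    && $class =~ /^(?:header|contactinfo|address|citystatezip)$/;
            });
        push @listings, \@fields;
    }

    $tree->delete;    # free the parse tree explicitly
    return \@listings;
}

print join("\n", @$_), "\n" for @{ listing_text($html) };
```

The script below walks the parse tree by hand instead; look_down is simply a shorter way to express the same kind of search.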

So, here is the “callerid.pl” script I came up with. Call it with the phone number you want to look up, and it prints either the available listing(s) (name, address, city/state/zip, formatted phone number) or a message saying that no information is available. I’ll leave incorporating this into the Vonage .wav-to-.mp3 script as an exercise for the reader.

This script requires the LWP (libwww-perl) and HTML::TreeBuilder modules. Neither ships with core Perl, although LWP is very commonly installed; both are available from CPAN.

Update 4/4/2007: I have updated the script to include gd’s fixes from comment #4 below due to changes in Switchboard’s page. Thanks gd!

#!/usr/bin/perl

use warnings;
use strict;

use LWP::UserAgent;
use HTML::TreeBuilder;

my $phone = shift or die "Usage: callerid.pl <phone number>\n";

my $data = &lookup($phone);

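if(defined $data && @$data) {
    # De-duplicate while keeping the fields in the order they were scraped
    # (taking keys from a hash would shuffle name, address, and so on).
    my %seen;
    my @unique = grep { !$seen{$_}++ } @$data;
    print join("\n", @unique), "\n";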
} else {
    print "No Caller ID information available\n";
}

sub lookup {
    my($phone) = @_;

    my $ua = LWP::UserAgent->new;
    # The trailing space makes LWP append its own "libwww-perl/x.y" identifier.
    $ua->agent("MyApp/0.1 ");

    my $url = "http://www-p.switchboard.com/swbd.main/dir/rpresults.htm?SR=&MEM=1&TYPE=BOTH&QV=&PH=$phone&search.x=44&search.y=21&search=Search";
    my $req = HTTP::Request->new(GET => $url);

    my $res = $ua->request($req);

    if ($res->is_success) {
        return &parse($res->content);
    } else {
        return undef;
    }
}

sub parse {
    my($data) = @_;
    my $tree = HTML::TreeBuilder->new();

    $tree->parse($data);
    $tree->eof();

    my $listings = [];
    my $info;

    &find_listings($tree, $listings);

    if(scalar @$listings > 0) {
        $info = &get_info($listings);
    }

    $tree->delete();
    return $info;
}

sub find_listings {
    my($node, $listings, $depth) = @_;
    $depth ||= 0;

    if($depth > 100) {
        die "infinite recursion detected";
    }

    return unless ref($node);
    $node->{'_parent'} = undef;

    my($tag, $class, $content) = @{ $node }{ qw(_tag class _content) };

    if(defined($class) && $class eq 'listing' &&
        defined($tag) && $tag eq 'div') {

        push @$listings, $content;
        return;
    }

    # Nothing to descend into unless the node has a list of children.
    return unless defined($content) && ref($content) eq 'ARRAY';

    foreach my $child (@$content) {
        &find_listings($child, $listings, $depth + 1);
    }
}

sub get_info {
    my($nodes) = @_;

    my $info = [];

    my($data, $tag, $class, $content);

NODE:
    foreach my $node (@$nodes) {

        next NODE unless ref($node) eq 'ARRAY';

        foreach my $item (@$node) {

            ($tag, $class, $content)
                = @{ $item }{ qw(_tag class _content) };

            if(defined($tag) && defined($class) && defined($content) &&
                $tag eq 'span' &&
                $class =~ m/^(contactinfo|address|citystatezip)$/ ) {

                $data = &extract_divtext($content);
                push(@$info, @$data);

            } elsif (defined($tag) && defined($class) && defined($content) &&
                $tag eq 'div' &&
                $class =~ m/^header$/) {

                $data = &extract_divtext($content);
                push(@$info, @$data);
            }

        }

    }

    return $info;
}

sub extract_divtext {
    my($content, $depth) = @_;

    return [] unless ref($content) eq 'ARRAY';

    $depth ||= 0;

    if($depth > 100) {
        die "infinite recursion detected";
    }

    my $data = [];

    foreach my $line(@$content) {

        if(ref($line) eq 'HTML::Element') {

            my $subdata = &extract_divtext($line->{_content}, $depth + 1);
            push @$data, @$subdata;

        } elsif(!ref($line)) {

            push @$data, $line;

        }
    }

    # Strip leading whitespace from each text fragment in place.
    s/^\s+// for @$data;

    return $data;
}
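
As a starting point for the exercise mentioned above, here is a rough sketch of calling callerid.pl from another Perl script and capturing its output. It assumes callerid.pl sits in the current directory and that the calling script already has the caller's number in hand; how that number is obtained from a Vonage email is not covered here. The names normalize_phone and caller_id_lines are hypothetical helpers, not part of the original scripts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed location of the lookup script; adjust to wherever callerid.pl lives.
my @LOOKUP_CMD = ($^X, './callerid.pl');

# Reduce input such as "(555) 123-4567" or "+1 555 123 4567" to ten digits.
# Returns undef if the result is not a 10-digit number.
sub normalize_phone {
    my ($raw) = @_;
    return undef unless defined $raw;
    (my $digits = $raw) =~ s/\D//g;       # drop everything but digits
    $digits =~ s/^1(?=\d{10}$)//;         # drop a leading country code
    return $digits =~ /^\d{10}$/ ? $digits : undef;
}

# Run the lookup command with the normalized number as its only argument
# and return the output lines. An alternative command may be passed in.
sub caller_id_lines {
    my ($raw, @cmd) = @_;
    @cmd = @LOOKUP_CMD unless @cmd;
    my $phone = normalize_phone($raw) or return;

    # List-form pipe open avoids passing the number through a shell.
    open(my $fh, '-|', @cmd, $phone) or die "Cannot run @cmd: $!";
    chomp(my @lines = <$fh>);
    close($fh);
    return @lines;
}

# Example: perl wrapper.pl "(555) 123-4567"
if (@ARGV) {
    print "Caller ID: $_\n" for caller_id_lines($ARGV[0]);
}
```

Normalizing the number first also keeps stray characters out of the query URL, since callerid.pl interpolates its argument into the URL as-is.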