A few months ago I wrote about a Perl script I use to convert the .wav attachments on my Vonage voicemail emails to .mp3, in order to save mailbox space. One other thing that has always annoyed me about those Vonage emails is that they don’t contain Caller ID information. Vonage already has this data, so why not put it on the email?
Perl to the rescue! I was actually rather surprised to find that there isn’t already a Perl module to look up this data from some web service. The only Caller ID module I could find dealt with converting data strings from modems, so my next task was to find a way to get this data from a website.
I couldn’t find a web service which provides this data, so that meant I was gonna have to scrape some HTML from somewhere. After a bit of Googling® for free reverse-phone search websites, I found that switchboard.com had the best HTML output (HTML 4.01 Strict, yay!) and cleanest layout for easily determining and scraping the data I wanted (results are in handy <div class="listing"> tags — thanks Switchboard.com!), plus a query URL that was rather simple.
So, here is the “callerid.pl” script I came up with. Simply call it with the phone number you want to query, and it will return either the listing(s) available (name, address, city/state/zip, formatted phone #) or a message saying there is no info available. I’ll leave it as an exercise for the reader how to incorporate this into the Vonage .wav-to-.mp3 script.
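Because the number given on the command line goes straight into the query URL, it may be worth tidying it up first. The following is only a sketch, assuming ten-digit North American numbers; normalize_phone is a hypothetical helper and not part of callerid.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of callerid.pl): reduce a phone number to
# its bare ten digits, or return undef if it does not look like one.
sub normalize_phone {
    my ($raw) = @_;
    return undef unless defined $raw;
    (my $digits = $raw) =~ s/\D//g;    # drop spaces, dashes, dots, parentheses
    $digits =~ s/^1(?=\d{10}$)//;      # drop a leading "1" country code
    return $digits =~ /^\d{10}$/ ? $digits : undef;
}

for my $input ('(555) 123-4567', '1-555-123-4567', '12345') {
    my $phone = normalize_phone($input);
    print defined $phone
        ? "$input -> $phone\n"
        : "$input -> not a usable number\n";
}
```

The result could then be passed to the lookup in place of the raw argument, with the usage message shown when the helper returns undef.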
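callerid.pl requires two CPAN modules: LWP (libwww-perl), which many Perl installations already include, and HTML::TreeBuilder (from the HTML-Tree distribution), which usually has to be installed separately.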
Update 4/4/2007: I have updated the script to include gd’s fixes from comment #4 below due to changes in Switchboard’s page. Thanks gd!
#!/usr/bin/perl
use warnings;
use strict;
use LWP::UserAgent;
use HTML::TreeBuilder;

my $phone = shift or die "Usage: callerid.pl <phone number>\n";
my $data = lookup($phone);
if (defined $data) {
    # Switchboard repeats the same data in two divs, so drop duplicates
    # while keeping the lines in the order they appeared on the page.
    my %seen;
    my @unique = grep { !$seen{$_}++ } @$data;
    print join("\n", @unique), "\n";
} else {
    print "No Caller ID information available\n";
}

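# Fetch the Switchboard results page for a phone number and parse it.
# Returns an array ref of listing lines, or undef if nothing was found.
sub lookup {
    my ($phone) = @_;
    my $ua = LWP::UserAgent->new;
    # The trailing space makes LWP append its own "libwww-perl/x.y" string.
    $ua->agent("MyApp/0.1 ");
    my $url = "http://www-p.switchboard.com/swbd.main/dir/rpresults.htm?SR=&MEM=1&TYPE=BOTH&QV=&PH=$phone&search.x=44&search.y=21&search=Search";
    my $req = HTTP::Request->new(GET => $url);
    my $res = $ua->request($req);
    if ($res->is_success) {
        return parse($res->content);
    } else {
        return undef;
    }
}
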
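# Build an HTML tree from the page, collect the listing divs, and pull
# the text out of them. Returns undef when no listing is present.
sub parse {
    my ($data) = @_;
    my $tree = HTML::TreeBuilder->new();
    $tree->parse($data);
    $tree->eof();
    my $listings = [];
    my $info;
    find_listings($tree, $listings);
    if (scalar @$listings > 0) {
        $info = get_info($listings);
    }
    $tree->delete();
    return $info;
}
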
# Walk the tree recursively and push the content of every
# <div class="listing"> onto @$listings.
sub find_listings {
    my ($node, $listings, $depth) = @_;
    $depth ||= 0;
    if ($depth > 100) {
        die "infinite recursion detected";
    }
    return unless ref($node);    # plain text nodes have no children
    $node->{'_parent'} = undef;  # break the back-reference to the parent
    my ($tag, $class, $content) = @{ $node }{ qw(_tag class _content) };
    if (defined($class) && $class eq 'listing' &&
        defined($tag) && $tag eq 'div') {
        push @$listings, $content;
        return;
    }
    # Only descend when the node actually has a list of children.
    return unless defined($content) && ref($content) eq 'ARRAY';
    foreach my $child (@$content) {
        find_listings($child, $listings, $depth + 1);
    }
}

# Collect the text of the name, address and phone elements in each listing.
sub get_info {
    my ($nodes) = @_;
    my $info = [];
    my ($data, $tag, $class, $content);
    NODE:
    foreach my $node (@$nodes) {
        next NODE unless ref($node) eq 'ARRAY';
        foreach my $item (@$node) {
            next unless ref($item);    # skip bare text between elements
            ($tag, $class, $content)
                = @{ $item }{ qw(_tag class _content) };
            if (defined($tag) && defined($class) && defined($content) &&
                $tag eq 'span' &&
                $class =~ m/^(contactinfo|address|citystatezip)$/) {
                $data = extract_divtext($content);
                push(@$info, @$data);
            } elsif (defined($tag) && defined($class) && defined($content) &&
                     $tag eq 'div' &&
                     $class =~ m/^header$/) {
                $data = extract_divtext($content);
                push(@$info, @$data);
            }
        }
    }
    return $info;
}

# Flatten an element's content into a list of text strings, recursing
# into nested elements and trimming leading whitespace.
sub extract_divtext {
    my ($content, $depth) = @_;
    return [] unless ref($content) eq 'ARRAY';
    $depth ||= 0;
    if ($depth > 100) {
        die "infinite recursion detected";
    }
    my $data = [];
    foreach my $line (@$content) {
        if (ref($line) eq 'HTML::Element') {
            my $subdata = extract_divtext($line->{_content}, $depth + 1);
            push @$data, @$subdata;
        } elsif (!ref($line)) {
            push @$data, $line;
        }
    }
    @$data = map { s/^\s+//; $_ } @$data;
    return $data;
}
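
As an aside, the tree walk above reaches directly into HTML::Element's internal hash keys (_tag, _content). A shorter variant could lean on the module's documented look_down and as_text methods instead. This is only a sketch: the sample markup is made up for illustration and will not match Switchboard's real page, and listing_lines is a hypothetical name.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Made-up markup for illustration; the real results page will differ.
my $html = <<'HTML';
<html><body>
<div class="listing">
  <div class="header">Jane Example</div>
  <span class="address">123 Main St</span>
  <span class="citystatezip">Springfield, IL 62701</span>
</div>
<div class="ad">Not a listing</div>
</body></html>
HTML

# Hypothetical helper: return an array ref with the text of the
# interesting elements inside every <div class="listing">.
sub listing_lines {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @lines;
    for my $listing ($tree->look_down(_tag => 'div', class => 'listing')) {
        # A qr// value is matched against the attribute's value.
        my @wanted = $listing->look_down(
            class => qr/^(?:header|contactinfo|address|citystatezip)$/
        );
        for my $el (@wanted) {
            my $text = $el->as_text;
            $text =~ s/^\s+|\s+$//g;    # trim both ends
            push @lines, $text if length $text;
        }
    }
    $tree->delete;    # release the tree's memory
    return \@lines;
}

print "$_\n" for @{ listing_lines($html) };
```

Using the public methods keeps the code independent of how HTML::Element stores its nodes, though the class names would still need updating whenever the site changes its markup.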




The PumaCode.org Blog :: converting vonage voicemails to mp3 (or ogg vorbis) | 15-Sep-06 at 10:20 am | Permalink
[...] UPDATE 9/15/2006: I have also written a Perl script to grab Caller ID information from Switchboard.com, which can be used to add this info to your emails as well. [...]
Glenn Howe | 03-Feb-07 at 4:50 pm | Permalink
I love your script. However, it stopped working all of a sudden. I notice that Switchboard has changed something. Just wondering if you were going to update your script… Thanks.
toby | 03-Feb-07 at 8:25 pm | Permalink
Hello Glenn, you’re right, it doesn’t work anymore but I hadn’t even noticed yet. Looks like Switchboard has totally changed their format around plus changed the query string as well. Guess that’s the danger of “scraping” websites.
I’ll have to do a bit more looking around again to see if I can figure something out…
gd | 09-Feb-07 at 11:12 am | Permalink
Hi toby, your post helped me to solve my task regarding vonage emails. thank you.
I'm posting the actual changes to your code here.
1. eliminate duplicates from $data array. switchboard has 2 div with same data.
2. post URL is changed now.
3. some of old divs are now span
I hope this will help some ppl :)
John Tehan | 07-Jul-07 at 2:42 am | Permalink
Could this be reversed so that I could get the phone number if I have name and address?