One Street, Five Databases: Evaluating Business Directory Listings Across Multiple Products
March 23, 2018
Bill Pardue, Arlington Heights Memorial Library
Rather than a product review, this exercise aims to demonstrate a method for understanding the variations in coverage between supposedly similar products, and to analyze the possible impact of these variations on libraries’ business services.
Many libraries provide access to at least one “large” business directory product, which purportedly includes information on tens of millions of U.S. businesses. Common offerings are InfoGroup’s ReferenceUSA or its direct competitor, AtoZdatabases, although there are other contenders. Patrons may use these for generating contact lists, researching potential employers, competitive intelligence, and developing marketing plans. Librarians who manage database subscriptions often examine the differences between competing products, looking for ways to compare interface design, ease of downloading and, perhaps most important, comprehensiveness of data. Any time a discussion of large business directories comes up at the bimonthly meeting of the north/northwest-suburban ELSUM (ELectronic SUbscription Managers) group, stories immediately emerge of how one database is holding on too long to obsolete listings, while another is missing the latest sushi place on the corner.
I was looking for a way to do a manageable comparison of a variety of products that would be limited in scope, yet would give me a quick “lay of the land” with regard to coverage. In 2016, I managed to share such a comparison via the ELSUM and BIG (Business Interest Group) mailing lists, but wanted to revisit it this year, with some additional analysis.
My own library, Arlington Heights Memorial Library, has access to ReferenceUSA and LexisNexis’ Corporate Affiliations (an oft-overlooked large directory within that product). Thanks to the Illinois State Library’s annual Try-it Illinois product trials, I was also able to include listings from AtoZdatabases, Mergent Intellect, and Gale’s Demographics Now (while Demographics Now is not primarily marketed as a “big directory” product, it does have a company search function similar to ReferenceUSA and AtoZdatabases).
SELECTING THE DATA
I wanted to search a specific geography in each product, but did not want to be overwhelmed with thousands of results. I realized that a business might be listed in different ways in different products, and I wanted to be able to “clean up” those listings in Excel without too much difficulty. I eventually settled on the idea of searching for businesses along a single street in Arlington Heights: Vail Avenue north and south of central Arlington Heights. Vail Avenue is primarily a residential street, but it does have a cluster of businesses as it runs through downtown Arlington Heights for about 4/10 of a mile.
My thought was that there would be a few dozen listings there, along with a number of home-based business listings in the more residential areas—enough to do some basic analysis.
With that in mind, I went into each database, searched for all current businesses on Vail Avenue in Arlington Heights, and downloaded the results, retrieving the largest set of fields available for each business. For my analysis, however, I ultimately used only the business name, address, employee count, and location sales. I also selected the most comprehensive current listings available from each product, e.g., both “verified” and “unverified” in ReferenceUSA, or “All Records” in AtoZdatabases. In ReferenceUSA, the status filter made a significant difference: restricting the search to “Verified” listings dropped the total from 113 to 52. In AtoZdatabases, by contrast, switching among “All Records,” “Records with a Deliverable Address and Phone Number,” and “Records With a Phone Number” made much less difference, with results varying only from 40 to 42.
After trimming each database’s results, I added a column to each, indicating the source database (ReferenceUSA, Mergent Intellect, etc.), to help with later analysis. Then I pasted the results from each into a single combined spreadsheet, and the real work began. First, I weeded out listings for ATMs and Redbox. The more tedious job was going through to merge variations on business names, so that every business would be listed with just one name across all databases—there would often be slight variations in spelling, punctuation, abbreviations, LLC designation, etc. Repeatedly scanning a pivot table of business names allowed me to go back to the complete listings and make sure that any given name was listed the same way in all five sets of results. Ultimately, I was able to whittle the listings down to a set of 241 unique business names. Once that was done, I could do additional work with pivot tables to really get an idea of what I was working with.
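I did this merging by hand in Excel. For anyone who would rather script a first pass, here is a minimal Python sketch of the same idea, assuming each database’s names have already been pulled into a list. It builds a comparison key for each name (lowercase, “&” turned into “and”, punctuation and legal-form words such as LLC or Inc. dropped) and tags each key with the databases that list it, which does the job of the extra “source database” column. The suffix list, the function names, and the sample listings are all hypothetical. Automated matching will still miss variants like “Ave” versus “Avenue”, so a manual review like the pivot-table scan described above remains necessary.

```python
import re
from collections import defaultdict

# Legal-form words ignored when comparing names. This list is an assumption;
# extend it to match the variants that actually appear in your downloads.
LEGAL_SUFFIXES = {"llc", "inc", "corp", "co", "ltd", "pc"}


def normalize_name(name: str) -> str:
    """Reduce a business name to a comparison key.

    Lowercases, turns '&' into 'and', drops apostrophes, strips other
    punctuation, and removes legal-form words such as 'LLC' or 'Inc'.
    """
    text = name.lower().replace("&", " and ")
    text = re.sub(r"['\u2019]", "", text)  # "Joe's" and "Joes" compare equal
    words = re.findall(r"[a-z0-9]+", text)
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)


def combine_listings(results: dict[str, list[str]]) -> dict[str, set[str]]:
    """Merge per-database name lists into {normalized name: {sources}}.

    Tagging each name with its source plays the role of the extra
    'source database' column in a combined spreadsheet.
    """
    combined: dict[str, set[str]] = defaultdict(set)
    for source, names in results.items():
        for name in names:
            combined[normalize_name(name)].add(source)
    return dict(combined)


# Hypothetical listings, not taken from the actual downloads.
results = {
    "ReferenceUSA": ["Joe's Sushi, LLC", "Acme Remodeling & Design, Inc."],
    "AtoZdatabases": ["Joes Sushi", "ACME Remodeling and Design"],
}
for key, sources in sorted(combine_listings(results).items()):
    print(key, sorted(sources))
```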
My first analysis was simply to find out how many listings each product found along Vail Avenue. This was already somewhat obvious from the search and download process, but the pivot table allowed me to group findings clearly:
Results ranged from 39 listings in AtoZdatabases to 140 in Mergent Intellect. I chose to use both “verified” and “unverified” listings from ReferenceUSA, which produced 113 listings (ReferenceUSA’s search interface indicates that verified results are those for which the phone number has been verified and the address quality has been checked). Restricting the results to “verified” listings left 52 records.
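The per-database totals came from a pivot table with the source database as the row field and a count of business names as the value. A rough Python equivalent, sketched under the assumption that the combined sheet has been read into (name, source, status) rows, simply counts rows per source and can optionally keep only “verified” rows for a given database. The rows, the status labels, and the function name below are hypothetical.

```python
from collections import Counter

# Hypothetical combined rows: (cleaned business name, source database, status).
# Only ReferenceUSA rows carry a verification status here; the "verified" and
# "unverified" labels are assumed to match what the download contains.
rows = [
    ("joes sushi", "ReferenceUSA", "verified"),
    ("acme remodeling", "ReferenceUSA", "unverified"),
    ("joes sushi", "AtoZdatabases", None),
    ("acme remodeling", "Mergent Intellect", None),
    ("corner salon", "Mergent Intellect", None),
]


def listings_per_source(rows, verified_only_for=frozenset()):
    """Count listings per source database, like a pivot table with Source as
    the row field and a count of names as the value. Sources named in
    verified_only_for keep only rows whose status is 'verified'."""
    counts = Counter()
    for _name, source, status in rows:
        if source in verified_only_for and status != "verified":
            continue
        counts[source] += 1
    return counts


print(dict(listings_per_source(rows)))
print(dict(listings_per_source(rows, verified_only_for={"ReferenceUSA"})))
```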
I then wanted to do some overlap analysis of the records. How many businesses were listed in only one database?
Surprisingly, a very large number of records (124), more than half of all businesses found, existed in only one product:
Next, I examined which databases contained the greatest number of unique listings and how listings were distributed across multiple databases (this required working in MS Access).
For the unique listings, the clear leaders were ReferenceUSA (verified + unverified) and Mergent Intellect:
Finally, the distribution of listings across multiple databases again reflected the low overlap, with very few businesses found in every source:
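The three overlap questions above (how many businesses appear in only one product, which product holds the most of those unique listings, and how many businesses appear in two, three, four, or five products) all come from the same underlying mapping of each business to the set of databases that list it. I answered them with pivot tables and MS Access; the following is only a minimal Python sketch of the same counts, assuming that mapping has already been built. The business names and the function name are hypothetical.

```python
from collections import Counter

# Hypothetical mapping from a cleaned business name to the databases listing it.
sources_by_business = {
    "joes sushi": {"ReferenceUSA", "AtoZdatabases", "Mergent Intellect"},
    "acme remodeling": {"ReferenceUSA"},
    "corner salon": {"Mergent Intellect"},
    "example dental": {"ReferenceUSA", "Demographics Now"},
}


def overlap_summary(sources_by_business):
    """Return (distribution, unique_per_source).

    distribution: {number of databases: number of businesses listed in that many}
    unique_per_source: {database: businesses found in that database alone}
    """
    distribution = Counter(len(s) for s in sources_by_business.values())
    unique_per_source = Counter(
        next(iter(s)) for s in sources_by_business.values() if len(s) == 1
    )
    return distribution, unique_per_source


distribution, unique_per_source = overlap_summary(sources_by_business)
single_source = distribution[1]
print("Listed in only one database:", single_source, "of", len(sources_by_business))
print("Unique listings by database:", dict(unique_per_source))
print("Businesses by number of databases:", dict(sorted(distribution.items())))
```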
ATOZ AND REFERENCEUSA
I was especially interested in a comparison between AtoZdatabases and ReferenceUSA, since they market against each other directly. At first glance, ReferenceUSA has the upper hand, with 113 listings to AtoZdatabases’ 39. After filtering out ReferenceUSA’s “unverified” listings, however, only 52 records remained from that database. I presumed that the two lists would then overlap more completely, but of the 69 businesses between them, only 22 were listed in both, while 47 were unique to one product or the other (30 for ReferenceUSA, 17 for AtoZdatabases).
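The pairwise comparison is plain set arithmetic: the number of businesses in either list is |A ∪ B| = |A| + |B| − |A ∩ B|, here 52 + 39 − 22 = 69. The Python sketch below checks that bookkeeping. The business names are hypothetical placeholders, generated only so that the set sizes match the counts reported above; they are not the actual listings.

```python
def pairwise_overlap(a: set[str], b: set[str]) -> dict[str, int]:
    """Compare two databases' cleaned business lists."""
    return {
        "union": len(a | b),   # businesses in either database
        "both": len(a & b),    # listed in both
        "only_a": len(a - b),  # unique to the first database
        "only_b": len(b - a),  # unique to the second database
    }


# Placeholder names sized to match the reported counts:
# 52 verified ReferenceUSA listings, 39 AtoZdatabases listings, 22 shared.
shared = {f"shared business {i}" for i in range(22)}
refusa_verified = shared | {f"refusa-only business {i}" for i in range(30)}
atoz = shared | {f"atoz-only business {i}" for i in range(17)}

result = pairwise_overlap(refusa_verified, atoz)
print(result)
```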
ODD SALES ESTIMATES FINDINGS
Within each database, there were odd cases of identical sales estimates across several dissimilar businesses. Some of this might be expected for “round” sales figures, such as $250,000 or $1,500,000, but it also happened with strangely precise amounts. ReferenceUSA listed five different construction/remodeling firms at $1,001,000, and AtoZ listed four different businesses with sales estimated at $318,570 (a psychic, a salon, a chiropractor, and a local ballroom). Company Dossier listed seven or eight different businesses at each of the amounts $110,000, $140,000, and $150,000, and Mergent Intellect listed two firms at the astonishingly precise amount of $87,243 (an intellectual property firm and a sushi restaurant).
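Spotting these repeats is a matter of grouping each database’s listings by sales figure and flagging any figure shared by two or more businesses. The minimal Python sketch below does that and can optionally skip “round” figures divisible by a chosen amount. That threshold is a judgment call: with $10,000, a figure like $1,001,000 would still be flagged while $1,500,000 would not, but $110,000 would also be skipped. The database labels, businesses, and figures are hypothetical placeholders, as is the function name.

```python
from collections import defaultdict

# Hypothetical rows: (source database, business, estimated annual sales in dollars).
# "Database A"/"Database B" and every figure are placeholders, not real listings.
rows = [
    ("Database A", "psychic", 123_457),
    ("Database A", "salon", 123_457),
    ("Database A", "bakery", 250_000),
    ("Database A", "florist", 250_000),
    ("Database A", "hardware store", 610_000),
    ("Database B", "law firm", 98_765),
    ("Database B", "sushi restaurant", 98_765),
]


def repeated_sales_figures(rows, min_businesses=2, skip_multiples_of=None):
    """Return {(source, sales): [businesses]} for figures shared by several
    businesses within one database. If skip_multiples_of is given, "round"
    figures divisible by it are ignored, since some repetition is expected there."""
    by_value = defaultdict(set)
    for source, business, sales in rows:
        by_value[(source, sales)].add(business)
    repeated = {}
    for (source, sales), businesses in by_value.items():
        if len(businesses) < min_businesses:
            continue
        if skip_multiples_of and sales % skip_multiples_of == 0:
            continue
        repeated[(source, sales)] = sorted(businesses)
    return repeated


for (source, sales), businesses in repeated_sales_figures(rows, skip_multiples_of=10_000).items():
    print(f"{source}: ${sales:,} shared by {', '.join(businesses)}")
```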
Databases could also have wildly varying estimates of a single business’s sales. One restaurant shows sales of $275,955 in both Mergent Intellect and Demographics Now (which usually match on sales data), while AtoZdatabases lists it at $4,672,360. A Thai restaurant shows four different sales amounts across the five products, ranging from $150,000 to $637,140, while another business ranges from $67,000 to $781,000.
To some extent, this variation in sales amounts may be one of the most frustrating aspects of these products for our patrons, since knowing the sales volume of one’s target market can be so important.
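One way to quantify this disagreement is to compare each business’s highest and lowest estimate. For the restaurant above, the high figure is roughly 17 times the low one ($4,672,360 / $275,955 ≈ 16.9). The Python sketch below computes that high-to-low ratio and the number of distinct figures for every business, then lists the widest disagreements first. It assumes the combined sheet has been read into (business, source, sales) rows; the businesses, databases, figures, and function name are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical rows: (business, source database, estimated annual sales in dollars).
# Names and figures are placeholders, not values from the actual downloads.
rows = [
    ("Business X", "Database A", 100_000),
    ("Business X", "Database B", 100_000),
    ("Business X", "Database C", 450_000),
    ("Business Y", "Database A", 80_000),
    ("Business Y", "Database B", 1_000_000),
]


def sales_spread(rows):
    """Summarize, per business, how far apart the databases' sales estimates are."""
    estimates = defaultdict(list)
    for business, _source, sales in rows:
        estimates[business].append(sales)
    summary = {}
    for business, values in estimates.items():
        low, high = min(values), max(values)
        summary[business] = {
            "sources": len(values),
            "distinct": len(set(values)),
            "low": low,
            "high": high,
            # High-to-low ratio; None if a database reports zero sales.
            "ratio": round(high / low, 1) if low else None,
        }
    return summary


# List the businesses with the widest disagreement first.
spread = sales_spread(rows)
for business, s in sorted(spread.items(), key=lambda kv: kv[1]["ratio"] or 0, reverse=True):
    print(f"{business}: ${s['low']:,} to ${s['high']:,} ({s['ratio']}x, {s['distinct']} distinct figures)")
```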
It is tempting to conclude simply that, say, Mergent Intellect or ReferenceUSA is the “winner” because of its larger retrieval numbers (140 and 113, respectively). However, the results also suggest that no single product provides a definitive snapshot of the current business population for any given geography. The bottom line is that if comprehensive coverage is your goal, you should offer multiple “big directory” products to your users.
Naturally, issues such as interface design and ease of use are also very important in selecting and using business directories, but the underlying data are ultimately the key issue. Unfortunately, short of mounting a significant phone call/door knocking campaign to independently verify the listings from the various databases, there’s a limited amount we can do to really test these products.