Start Looking

Monday, June 26, 2006

Census Name Number Nonsense

Of all the stories about Ancestry.com's indexing triumph, this one stands out.

US genealogy site Ancestry.com has now completed indexing and digitizing the entire US Census from 1790 to 1930 – featuring more than five billion names.
Ancestry.com's team spent 6.6 million hours of labour deciphering handwriting from 13 million original census documents and 21.9 billion keystrokes manually entering information into the database. [Link]
Five billion names from 13 million census pages over 15 censuses? That comes out to about 385 names per page, and would give the United States an average population in those years of 333 million.* This is odd, since our population isn't supposed to reach 300 million until this fall.

Also, they managed to type in 5 billion names with only 21.9 billion keystrokes, meaning that the average American's name was only 4 or 5 letters long. Notwithstanding "Cher," most American names ramble on for at least 6 or 7 characters.
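
For anyone who wants to check the arithmetic, here is a quick Python sketch. It uses only the figures quoted in the story and assumes one census every ten years from 1790 through 1930; the variable names are my own.

```python
# Figures as quoted in the news story
names = 5_000_000_000        # "more than five billion names"
pages = 13_000_000           # "13 million original census documents"
keystrokes = 21_900_000_000  # "21.9 billion keystrokes"

# Assumption: one census every ten years, 1790 through 1930 inclusive
censuses = len(range(1790, 1931, 10))

names_per_page = names / pages
avg_population = names / censuses
keystrokes_per_name = keystrokes / names

print(f"{censuses} censuses")
print(f"{names_per_page:.0f} names per page")
print(f"{avg_population / 1e6:.0f} million people per census")
print(f"{keystrokes_per_name:.2f} keystrokes per name")
```

The last figure works out to a little under four and a half keystrokes per name, which is where the "4 or 5 letters" comes from.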

This is what happens when you read a press release too quickly:
The addition of the complete census collection makes Ancestry.com the most comprehensive genealogical database ever compiled online with more than five billion searchable names.
*The average population would be even higher since only the head-of-household was named in the first six censuses, and the 1890 census was mostly destroyed.

Jason

They were probably fudging the truth by counting up every first, middle and last name in the census, and counting the enumerator on every page as well.

Chris

Or maybe they ran out of census pages and started indexing their own indexes.

Randy Seaver

Chris,

I'll bet it was 5 billion pieces of information - surname, given name, gender, race, age, birthplace, parents' birthplaces, immigration data, languages, occupation, etc. If they indexed all of that, at about a dozen items per person, then that comes out to something like 1.2 billion pieces for a population of 100 million.

Between 1850 and 1880, they collected only half of that for a smaller population, and between 1790 and 1840 it was even less. So 5 billion as a total number of bits of info is certainly possible.

Randy

Andy E. Wold

"... our population isn't supposed to reach 300 million until this fall."

Chris, careful on the article you linked to here. They think that 2006 - 1967 = 29 (instead of 39) :)

Chris

Randy: Ah, but the original press release said "more than five billion searchable names." Methinks the reporter who regurgitated this release mistook the total number of names in all of Ancestry's databases for the number of names in their U.S. Census collection.

Andy: Nice catch! I've switched the link to a version of the story written by someone who can subtract.
