LocalBlox, a data analytics company, describes on its website how it “automatically crawls, discovers, extracts, indexes, maps and augments data” from a variety of sources, including Facebook, LinkedIn, Twitter and Zillow, to build a “360 Degree people view” that is then sold to marketers.
Although the data is scraped from publicly accessible sources, LocalBlox left a 1.2-terabyte file containing the personal data of 48 million individuals in an Amazon storage bucket that was not password-protected and was accessible to anyone.
Data contained in the leak included names, physical addresses, dates of birth, scraped LinkedIn job histories, public Facebook data, and Twitter handles. Somebody with access to this data could theoretically use it to commit fraud or identity theft, or to aid a social engineering scam such as phishing.
The leak was noticed by cybersecurity firm UpGuard, which notified LocalBlox. The storage bucket was secured later that day. UpGuard outlined the breach in a report published Wednesday.
Data security has been in the spotlight since analytics firm Cambridge Analytica obtained data on 87 million Facebook users and their friends and contacts through a third-party app.
The Cambridge Analytica breach led to an avalanche of further privacy scandals at Facebook, and landed CEO Mark Zuckerberg before a congressional panel earlier this month, where the House Energy and Commerce Committee took him to task over his company’s data collection and censorship practices.
But the LocalBlox leak illustrates just how much data can be scraped from Facebook without any kind of user consent or third-party app, even as the social media giant announced earlier this month that it would restrict its search function as a damage-control measure against automated profile scraping.
According to a sample profile from LocalBlox, the company knows a person’s first and last name, online identities, address, birthday, email and phone numbers, salary, housing information, credit rating, skills and interests, among hundreds of other data points.
LocalBlox CTO Ashfaq Rahman told ZDNet that UpGuard “hacked in” to the data file, but would not explain why he restricted access to the bucket after speaking to UpGuard. UpGuard director Chris Vickery denied breaking the law, and said that the data was lying in the open, accessible to anybody who stumbled across it.
LinkedIn, Twitter, and Zillow all forbid scraping from their sites. Facebook is currently auditing data harvesting on its platform in the wake of the Cambridge Analytica scandal, and intends to ban app developers who misuse personally identifiable user data.
While scraping publicly available data is often a legal gray area, the team at UpGuard is concerned that the resulting data haul can be handled so carelessly.
“This exposure was not the result of a clever hack, or well-planned scheme, but of a simple misconfiguration of an enterprise asset— an S3 storage bucket— which left the data open to the entire internet,” read a post on the security firm’s site.
“The profitability gained by data must come with the responsibility of protecting its integrity and privacy.”