In what could be termed a sequel to the Collection #1 data exposure that took place earlier this year, security researchers have discovered an unsecured cloud database containing the personal information of up to 1.2 billion people, including names, email addresses, phone numbers, and social media profile information.
The unsecured cloud database was discovered by security researchers Bob Diachenko and Vinny Troia, who, after analysing it in depth, noted that the database contained up to 4 terabytes of data and 4 billion user accounts belonging to over a billion people from all across the world.
The researchers said that the gigantic Elasticsearch database contained information obtained from two different data enrichment companies, namely People Data Labs and Oxydata, whose business is to collate hundreds of data points into a person's profile before selling such data to buyers.
1.2 billion data records in the unsecured cloud database were obtained from data enrichment companies
The unsecured cloud database was accessible via a web browser at http://184.108.40.206:9200, and the data stored in it was arranged in four separate data indexes, labeled “PDL” and “OXY”, with PDL and OXY appearing as a source alongside each user record.
After comparing the data in the PDL indexes with statistics provided on People Data Labs' website, the researchers concluded that personal records of up to 1.2 billion people and 650 million unique email addresses in the PDL indexes accurately matched records collated by the data enrichment company.
"The data discovered on the open Elasticsearch server was almost a complete match to the data being returned by the People Data Labs API. The only difference being the data returned by the PDL also contained education histories. There was no education information in any of the data downloaded from the server. Everything else was exactly the same, including accounts with multiple email addresses and multiple phone numbers," they noted.
They also found that data in the OXY indexes contained information scraped entirely from LinkedIn and appeared to match data collated by data enrichment company Oxydata, which claims to have 4TB of user data and profiles of 380 million people. Even though neither PDL nor Oxydata owns the database, the person who created the database may have used information from both sources to build more accurate profiles of over a billion people.
Based on available information, the researchers concluded that it is impossible to identify the owner of the database, as its IP address traced back to Google Cloud Services. It is also not possible to state whether the owner of the database was a customer of both data enrichment companies, whether data was stolen from these companies, or whether data legally obtained from both companies was misused.
"One could argue that because PDL’s data was mis-used, it is up to them to notify their customers. One could also argue that the owner of 220.127.116.11 is responsible and liable for any potential damages. But legally, we have no way of knowing who that is without a court order.
"Due to the sheer amount of personal information included, combined with the complexities identifying the data owner, this has the potential to raise questions on the effectiveness of our current privacy and breach notification laws," they concluded.
Need for a more security-conscious mindset among cloud providers & operators to prevent breaches
"This incident is less of a data leak and more of a full on data tsunami. The biggest challenge when these kinds of repositories are found is that it's near impossible to accurately identify who the owner is. It could be a company that is legitimately recording data, or a third party tasked with compiling profiles, a researcher, or a criminal," says Javvad Malik, security awareness advocate at KnowBe4.
"Regardless of who set it up, the fact that it's insecure and publicly accessible means that anyone could have taken the data for any purpose. While it is stated that sensitive data such as passwords weren't included, the sheer volume of aggregated data makes the whole thing sensitive as a whole.
"There is no easy fix to these kinds of issues, and we will likely continue to see such leaks. We need vendors, cloud providers, and system administrators to adopt a more security-conscious mindset so that across the digital realm a secure culture propagates. Making it difficult for any one person to harvest data, aggregate in such large quantities, and leave publicly exposed," he adds.
Sam Curry, chief security officer at Cybereason, says that this data breach is a stark reminder that consumers need to rethink their own security hygiene.
"Today, everyone should assume their private information has been stolen numerous times and will continue to be accessible to a growing number of threat actors. To keep threat actors at bay, please reset passwords regularly and don't use the password 123456 or ABCDEF.
"In this day and age, and with a more complex and diverse attack surface, this is never a good idea. Laziness is no excuse, as hackers prey on this and their biggest asset is patience and time. Please tighten your passwords; and if you are one of the millions of people using 123456 - STOP!"
In January this year, security researcher Troy Hunt discovered a massive unsecured cloud database hosted on cloud service MEGA that contained 1.16 billion unique combinations of email addresses and passwords, including 773 million unique email addresses and over 21 million unique passwords.
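For readers who want to act on Curry's advice about passwords such as 123456 and ABCDEF, the check he describes can be sketched in a few lines of Python. This is only an illustration: the function name `is_weak_password`, the short deny-list, and the 12-character minimum are assumptions made for the example and do not come from the researchers or from any of the companies named above. A real check would compare against a far larger list of passwords known to have appeared in breaches.

```python
# Illustrative deny-list (assumption): a handful of very common passwords,
# including the two named in the article. Real lists are far larger.
COMMON_PASSWORDS = {"123456", "abcdef", "password", "qwerty", "111111"}

# Assumed minimum length for this sketch; not a figure from the article.
MIN_LENGTH = 12


def is_weak_password(password: str) -> bool:
    """Return True if the password is on the deny-list or too short."""
    # Compare case-insensitively so "ABCDEF" and "abcdef" are treated alike.
    normalized = password.strip().lower()
    return normalized in COMMON_PASSWORDS or len(password) < MIN_LENGTH


if __name__ == "__main__":
    for candidate in ["123456", "ABCDEF", "correct horse battery staple"]:
        verdict = "weak" if is_weak_password(candidate) else "acceptable"
        print(f"{candidate!r}: {verdict}")
```

A check like this only catches the most obvious choices; it says nothing about whether a longer password has already been exposed in a breach such as the ones described here.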