PII Leakage - Revealing Secrets

Summary :

PII stands for Personally Identifiable Information. It is data that can be used to identify a person, for instance a full name, social security number, taxpayer identification number, driver’s license number, PAN card number, mobile number, or address. Leaking this kind of data can breach the privacy of anyone on the internet.

Description :

I found this issue on a private HackerOne program, where it leaked customer names and pending invoice amounts. At first I used Google dorks to look for the information and found some PDF reports, but after a closer look I noticed that the information in those PDFs was already public.

The PDF reports had one thing in common: none of them stated whom the report belonged to or whether any amount was pending, even though they concerned bill payments. After enumerating subdomains using Google dorks, I found a domain with an “Invoice No” lookup function, so I entered a random invoice number, but it returned an error.

I noticed that the PDF reports I had found were named with the year followed by a number (e.g. <year>–13659), so I entered the number without the year and got a result: I was able to see customer names and their pending or paid amounts.
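The file-name observation above can be sketched in a few lines of Python. This is only an illustration: the function name `invoice_number_from_filename`, the sample year, and the exact separator between year and number (hyphen or en dash) are assumptions, not details taken from the target.

```python
import re
from typing import Optional

# Assumed naming scheme: four-digit year, a hyphen or en dash, then the
# invoice number, optionally followed by ".pdf".
_REPORT_NAME = re.compile(r"^(\d{4})[-\u2013](\d+)(?:\.pdf)?$", re.IGNORECASE)


def invoice_number_from_filename(name: str) -> Optional[str]:
    """Return the number that follows the year, or None if the name does not match."""
    match = _REPORT_NAME.match(name.strip())
    return match.group(2) if match else None
```

For a hypothetical file named `2020-13659.pdf`, the function returns `13659`, which is the value the invoice lookup accepted in my case.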

How I found this vulnerability ?

1. I used Google dorks to search for invoices, but found only PDF reports that were already public.

[Image: Google Dork]
[Image: Google Dorks]

2. After that, I enumerated the subdomains and found an “Invoice” function on one of them.

[Image: Subdomain]
[Image: Invoice Functionality]

3. I entered a random number but got an error.

[Image: Invalid Invoice]

4. Then I opened the directory of publicly available PDF reports.

[Image: Publicly Available Reports]
[Image: Report Files with IDs]

5. Then I entered an invoice number taken from a file name, “13656”, and got a result.

[Image: Invoice ID]
[Image: Result]

6. Then I entered other invoice IDs to check the results.

[Image: Result]
[Image: Result]

Why it happened ?

In my opinion, the main reason for this kind of data leak is an improper security policy for the data. When this type of data is uploaded to a server with access enabled from anywhere, it is indexed by web search crawlers. Indexing of this data was not prevented and the necessary security policy was not set, which led to this vulnerability.

Dorks Used :

  1. site:target.gov "invoice" - for identifying invoices of users on one domain
  2. site:*.target.gov - for enumerating subdomains
  3. site:*.target.gov "invoice" - for identifying invoices of users on all subdomains
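As a small convenience, the three dorks can be generated for any in-scope domain. This is a minimal sketch; `build_dorks` is a hypothetical helper name, and `target.gov` is the same placeholder used in the list above.

```python
from typing import List


def build_dorks(domain: str, keyword: str = "invoice") -> List[str]:
    """Return the three search queries from the list above for a given domain."""
    return [
        f'site:{domain} "{keyword}"',    # keyword hits on the main domain
        f"site:*.{domain}",              # subdomain enumeration
        f'site:*.{domain} "{keyword}"',  # keyword hits across subdomains
    ]
```

The returned strings are meant to be pasted into the search engine by hand; the sketch does not send any requests.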

Impact :

This kind of leak can breach the privacy of any user of the service. In this case, an attacker could read invoice numbers from the public file names and use them to look up customer names and their pending or paid amounts.

Mitigation :

1. Use a robots.txt file

Example 1 :

User-agent: *
Disallow: /

This entry asks compliant crawlers not to crawl anything on the site. It does not block direct access to the files.

Example 2 (My case) :

User-agent: *
Disallow: /reports/

This entry asks compliant crawlers not to crawl the /reports/ directory.
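To check that a rule behaves as intended before deploying it, the /reports/ example can be tested with Python's standard `urllib.robotparser` module. This is a minimal sketch: the helper name `is_crawl_allowed` and the URLs are hypothetical, and `target.gov` is a placeholder.

```python
from urllib.robotparser import RobotFileParser

# The rule from Example 2, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /reports/
"""


def is_crawl_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent crawl url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With this rule, a URL under /reports/ is reported as disallowed while other paths stay allowed. Keep in mind that robots.txt is advisory: the files remain reachable by direct URL, so it does not replace access control.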

2. Use a robots meta tag

Example 1 :

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

It asks all compliant web crawlers not to index the page or follow its links.

Example 2 :

<META NAME="GOOGLEBOT" CONTENT="NOINDEX, NOFOLLOW">

It applies the same restriction to one specific crawler, here Googlebot.
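A quick way to confirm that a page actually carries these tags is to parse its HTML. The sketch below uses only the standard `html.parser` module; the class `RobotsMetaParser` and the function `robots_directives` are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of robots/googlebot meta tags in a page."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        name = attr_map.get("name", "").lower()
        if name in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            self.directives[name] = {d.strip().lower() for d in content.split(",")}


def robots_directives(html: str) -> dict:
    """Return a mapping such as {"robots": {"noindex", "nofollow"}}."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

Meta tags only work inside HTML pages. Files such as PDFs cannot carry them, so for those the equivalent directive is usually sent as an `X-Robots-Tag` HTTP response header.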

3. Do not use invoice IDs as file names (my case)
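One possible way to follow the last point is to give each report a random, unguessable file name and keep the link to the invoice ID on the server side only. This is a sketch under that assumption; `opaque_report_name` is a hypothetical helper, not something the affected program uses.

```python
import secrets


def opaque_report_name(extension: str = ".pdf") -> str:
    """Return a random file name that reveals nothing about the invoice ID."""
    # 16 random bytes encode to a 22-character URL-safe token.
    return secrets.token_urlsafe(16) + extension


# Hypothetical server-side mapping from invoice ID to the stored file name.
report_files = {"13656": opaque_report_name()}
```

Random names only stop the file listing from leaking valid IDs. The invoice lookup itself would most likely still need authentication so that a guessed number does not return another customer's data.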
