How 896 ID Cards Were Exposed Through a Vietnamese University's Exam Portal

Update (June 23, 2026): Partially remediated

Company Y has added server-side authentication on the XHR API gateway and blocked account-level IDOR. However, partial candidate data remains accessible, bulk ID enumeration still works, and CDN image access has regressed. See revalidation details below.

CVSS 3.1: 9.1 (CRITICAL)

Vulnerability allows unauthenticated extraction of identity and biometric data.

Summary

University X is one of Vietnam’s well-known public universities, with roughly 40,000 students across multiple faculties. Its Testing Center administers standardized language proficiency exams for hundreds of candidates each cycle. To register, candidates must submit deeply personal information: full legal name, date of birth, phone number, email, national ID (CCCD) number, and critically, high-resolution photos of the front and back of their government-issued identity cards. This data, if exposed, represents a complete identity dossier that cannot be “reset” like a password.

The discovery began with a routine Google search. A query for exam-related keywords returned a direct link to the Testing Center’s website, and clicking through revealed a candidate’s complete registration profile including their CCCD number and photos of their physical identity card publicly displayed on an open webpage. No login was required. No special tools. Just a Google search and a click.

Through reverse engineering the platform’s JavaScript framework, I confirmed that this was not an isolated incident. The entire system, built by a third-party vendor called Company Y on their “Connections” SaaS platform, had zero access controls separating public visitors from the database. All 896 candidates’ personal data including CCCD/CMND numbers, ethnicity, place of birth, and front and back photos of their national ID cards could be systematically extracted by anyone with a web browser.

1. Introduction

1.1. Background

University X is one of Vietnam’s well-known public universities, home to roughly 40,000 students across multiple faculties. It is particularly recognized for its foreign language and international studies programs. The university’s Testing Center operates a web system at tec.universityx.vn to administer standardized language proficiency exams such as VSTEP (Vietnamese Standardized Test of English Proficiency) for hundreds of candidates each cycle. The system handles the entire exam lifecycle: registration, room assignment, candidate list publication, and result announcements.

During registration, candidates must submit deeply personal information: full legal name, date of birth, phone number, email address, national ID (CCCD) number, ethnicity, and critically, high-resolution photos of the front and back of their government-issued identity cards. This data, if exposed, represents a complete identity dossier for each individual – one that, unlike a compromised password, can never be changed or revoked.

1.2. Third-Party Platform: Company Y

Like many Vietnamese institutions, University X does not build or maintain its own software. Instead, the entire exam management system including the database, API (Application Programming Interface) layer, file storage, and CDN (Content Delivery Network) runs on a SaaS platform called “Connections,” developed and operated by Company Y (companyy.com).

This means University X handed over complete control of its candidates’ most sensitive data to an external vendor. The university likely trusts that this platform has proper security measures in place. The question this report answers: does it?

The answer is no. The Connections platform provides no access control boundary whatsoever between a random internet visitor and the personal data stored in its database.

1.3. Initial Discovery

It started with a simple Google search. Searching for exam-related keywords returned a direct link to the Testing Center’s website. Clicking through revealed a candidate’s complete registration form: full name, date of birth, CCCD number, and even photos of their physical identity card, all displayed on a public webpage.

No login was required. No special tools. Just a Google search and a click.

This raised the critical question: was this one candidate’s data accidentally exposed, or was every single candidate’s personal information wide open?

The answer, as this report demonstrates, is the latter.

Figure 1: Google search results revealing candidate registration data including full name, date of birth, and CCCD number, publicly indexed and accessible to anyone.

1.4. Scope and Ethics

The research was conducted strictly for security assessment purposes:

  • All data access used publicly available endpoints.
  • No authentication was bypassed - because none existed to bypass.
  • No data was modified, deleted, or exfiltrated to third parties.
  • No brute-force attacks were performed against protected resources.
  • The techniques described replicate what any internet user with basic technical knowledge could perform through a web browser’s developer console.

1.5. Technical Attack Classification

The following MITRE ATT&CK techniques and OWASP categories are relevant to the methods used in this research:

Technique | Framework | Application in Research
IDOR (Insecure Direct Object Reference) | OWASP Top 10 | Database object IDs used directly in unauthenticated API calls
Broken Access Control | OWASP Top 10 | No authentication on any API endpoint
API Abuse | OWASP API Top 10 | Unlimited access to search and data retrieval operations
Search Open Websites/Domains: Search Engines (T1593.002) | MITRE ATT&CK | Google dorking to discover indexed PII pages
Active Scanning (T1595) | MITRE ATT&CK | Probing API endpoints and database schema
Data from Information Repositories (T1213) | MITRE ATT&CK | Extracting data from exposed database APIs
JS API Hijacking | Web Security | Invoking internal framework functions via page.evaluate()
Foreign Key Traversal | Database Security | Following foreign keys to discover hidden tables
CDN URL Harvesting | Web Security | Extracting encoded image URLs from rendered DOM

2. Infrastructure Mapping: University X, Company Y, and Connections

Before diving into the vulnerability, it is important to understand who actually runs this system and how the pieces fit together. The Testing Center’s website is not what it appears to be on the surface.

2.1. Discovering the Vendor Relationship

The first step in the analysis was understanding who actually operates the infrastructure behind tec.universityx.vn. Inspecting the HTML source revealed that the web application loads its core JavaScript framework from external domains:

<script src="https://cdn.companyy.com/js/jquery.main.isj"></script>
<script src="https://cdn.companyy.com/js/include.core.isj"></script>

The domain companyy.com belongs to Company Y, a technology company in Vietnam. Further analysis revealed a sprawling multi-domain infrastructure:

Domain | Role | Relationship to University X
companyy.com | Company domain | Platform vendor
cdn.companyy.com | JavaScript CDN | Delivers JS/CSS framework
xhr.companyy.com | API gateway | Handles all XHR (XMLHttpRequest, the browser's background data exchange API) calls to the database
tts.companyy.vn | Framework server | Hosts application configuration files
connections.vn | Platform brand | “Connections” SaaS platform
connections.universityx.vn | University X-specific API | Dedicated API endpoint for University X
local.universityx.connections.vn | File storage | Stores PDFs and uploaded files
i0.connections.vn | Image CDN (node 0) | Serves ID card photos
i3.connections.vn | Image CDN (node 3) | Serves ID card photos
thuctap.companyy.com | CSS server | Delivers CSS for sub-applications

2.2. “Connections” Platform Architecture

The “Connections” platform appears to be a general-purpose SaaS framework similar to Salesforce or Airtable – providing:

  • A database layer where tables are identified by 32-character hexadecimal hashes
  • A client-side JavaScript framework with Vietnamese API functions (xửLý, CĂN.db, config)
  • A file storage CDN at local.{org}.connections.vn
  • An image CDN at i{N}.connections.vn with encoded URL parameters
  • A multi-tenant architecture where multiple organizations (University X, and potentially others) share the same infrastructure

The critical security implication: University X’s candidate data including ID card photos is stored on and served by Company Y’s shared infrastructure, not on servers controlled by University X.

2.3. Subdomain Structure and Data Flow

User's Browser
    |
    |-- (1) GET tec.universityx.vn/page
    |         |-- Loads HTML skeleton
    |         |-- Loads JS from cdn.companyy.com
    |         |-- Loads CSS from thuctap.companyy.com
    |
    |-- (2) XHR to xhr.companyy.com/xhr/ (or connections.universityx.vn/xhr/)
    |         |-- "doiTuong.tai.{table_hash}" = database query
    |         |-- Returns list of object IDs
    |
    |-- (3) XHR to xhr.companyy.com/xhr/
    |         |-- "CAN.db({table}.{id})" = load object
    |         |-- Returns ALL fields including CCCD, phone number, etc.
    |
    |-- (4) GET local.universityx.connections.vn/upload/{cat}/{date}/{file}
    |         |-- Downloads PDF files (candidate lists)
    |
    |-- (5) GET i0.connections.vn/{encoded_path}?q={encoded_token}
              |-- Downloads ID card photos (front/back)

None of these requests require authentication.

3. Reverse Engineering the Database Structure

The next step was understanding how the database is organized. This involved reading the website’s JavaScript code, which turned out to be written entirely in Vietnamese, and discovering that database table identifiers and record IDs are passed around without any security checks.

3.1. Phase 1: URL Structure Analysis

The initial URL indexed by Google provided the first clue about the database structure:

https://tec.universityx.vn/7fdc5fa41f345xxxx4bba6b0d3e449385/1518250/2368M35018
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^ ^^^^^^^^^^
                           Table hash (Registration)         Obj ID  File number

This URL directly exposes:

  1. Registration table hash: 7fdc5fa41f345xxxx4bba6b0d3e449385
  2. Registration object ID: 1518250
  3. Candidate’s file number: 2368M35018

3.2. Phase 2: Reverse Engineering the JavaScript Framework

The framework source at cdn.companyy.com/js/include.core.isj is heavily minified and uses Vietnamese identifiers. Through runtime analysis (executing functions in the browser console and observing XHR traffic), I identified the core API functions:

Function | Discovered Behavior
xửLý(action, params, opts, cb) | Main dispatcher. Sends XHR to xhr.companyy.com/xhr/. The action string determines the operation.
CĂN.db(key, callback) | Loads a database object by key (format: table_hash.object_id). Caches results in local memory.
config(key) | Retrieves a previously loaded object from local cache. Returns a JavaScript object with field IDs (numbers) as keys.
dữLiệu | Global object containing the page’s data context.

3.3. Phase 3: Discovering Database Schema via IDOR

The key discovery was that the API uses Insecure Direct Object References (IDOR): database table hashes and object IDs are passed directly from client to server with no authorization checks. This allowed me to:

  1. Obtain the Registration table hash (visible in the URL)
  2. Query the Registration table using xửLý("đốiTượng.tải.{hash}", ...) with arbitrary field filters
  3. Load a Registration object and inspect all its fields – discovering field IDs, data types, and foreign key references
  4. Discover the Candidate table hash by following the foreign key in field 1686869 (Candidate ID), which references objects in table 3576ff3533bb4xxxx8e394a0aa83a461f
  5. Load Candidate objects to access the full set of personal data fields

3.4. Phase 4: Field-by-Field Mapping

Each database object is returned as a dictionary keyed by field ID. Most keys are numeric strings, and a few are named Vietnamese identifiers such as mãĐịnhDanh. I mapped them by:

  1. Loading multiple candidate objects
  2. Cross-referencing field values with rendered page content
  3. Identifying field types (string, integer, date, reference, file)

Field Type System

The framework uses single Vietnamese characters to mark field types in stored values:

Symbol | Storage Format | Meaning
(none) | Plain string/number | Direct value (name, phone, CCCD)
{"ậ":["ID"]} | JSON with reference ID | Reference to another object (ethnicity, province)
{"ị":["ID"]} | JSON with file ID | Reference to an uploaded file/image

The "ậ" (reference) type stores a pointer to a lookup object. For example, the ethnicity field 1626773 stores {"ậ":["146992"]}, which resolves to “Kinh” when the page renders. The "ị" (file) type stores a pointer to an uploaded file. For example, field 1658487 stores {"ị":["4296"]}, which resolves to a CCCD photo on the image CDN.
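The type system above can be illustrated with a small parser. This is a minimal sketch, assuming stored values arrive either as plain strings or as JSON text in the two marker formats shown in the table; the function name `classify_field` and the returned labels are illustrative and are not part of the Connections framework.

```python
import json

# Marker characters taken from the field type table above.
REFERENCE_MARKER = "ậ"   # pointer to a lookup object (ethnicity, province)
FILE_MARKER = "ị"        # pointer to an uploaded file or image


def classify_field(raw):
    """Return (kind, ids) for one stored field value.

    kind is "reference", "file", or "plain" (hypothetical labels).
    """
    parsed = raw
    if isinstance(raw, str):
        try:
            parsed = json.loads(raw)
        except ValueError:
            # Not JSON at all (names, phone numbers with leading zeros, ...).
            return ("plain", [raw])
    if isinstance(parsed, dict):
        if REFERENCE_MARKER in parsed:
            return ("reference", list(parsed[REFERENCE_MARKER]))
        if FILE_MARKER in parsed:
            return ("file", list(parsed[FILE_MARKER]))
    return ("plain", [raw])


if __name__ == "__main__":
    sample = json.dumps({REFERENCE_MARKER: ["146992"]}, ensure_ascii=False)
    print(classify_field(sample))
    print(classify_field("Kinh"))
```

Resolving a reference ID to its display value (for example 146992 to “Kinh”) is a separate lookup that this sketch does not attempt.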

Registration Table Field Map (Reverse Engineered)

Field ID | Field Name | Type | Example
1626725 | Exam Session ID | Foreign Key (FK) | 54914
1626730 | File Number | String | 2368M35018
1630529 | Status Code | Integer | 1
1630538 | Active Flag | Boolean | 1
1642331 | SBD (Exam Seat Number) | String | AN1001
1654914 | Registration Timestamp | Timestamp | 1726716820
1686869 | Candidate ID | Foreign Key (FK) | 582319
1704995 | Scores | JSON | [{"nghe":"8.5"}]
mãĐịnhDanh | Identifier Code | String | V2KT2106AN1114
tổngTiền | Exam Fee | Integer | 1800000

Candidate Table Field Map (Reverse Engineered)

Field ID | Field Name | Type | Example / Notes
1686868 | Full Name | String | Nguyễn Thị xxxx Trà
1626768 | Last Name & Middle Name | String | Nguyễn Thị xxxx
1626772 | First Name | String | Trà
1626773 | Ethnicity | Reference | {"ậ":["146992"]} → Kinh
1626783 | Date of Birth | Date | 28/09/2000
1626784 | Gender | Enum | 1=Male, 2=Female
1626788 | Phone Number | String | 037xxxx973
1626793 | Email | String | email@gmail.com
1626818 | Place of Birth | Reference | {"ậ":["147000"]} → Bắc Kạn Province
1626820 | Province/City (Current) | Reference | {"ậ":["146999"]}
1646777 | CCCD/CMND Number | String | 0222xxxx2576
1658487 | CCCD Photo (Front) | File Reference | {"ị":["4296"]}
1658488 | CCCD Photo (Back) | File Reference | {"ị":["243"]}
2102859 | Workplace | String | Free text
2102859 Workplace String Free text

3.5. Phase 5: Foreign Key Traversal Diagram

The complete traversal path from a public URL to ID card photos:

+-------------------------------------------+
|           Public URL                      |
|  tec.universityx.vn/{hash}/{id}/{file}    |
+---------------------+---------------------+
                      |
          exposes table hash + object ID
                      |
                      v
+-------------------------------------------+
|        Registration Object                |
|  Table: 7fdc5fa4...                       |
|  Contains SBD, file number                |
+---------------------+---------------------+
                      |
           foreign key field 1686869
                      |
                      v
+-------------------------------------------+
|        Candidate Object                   |
|  Table: 3576ff35...                       |
|  Contains ALL PII                         |
+----------+----------------+---------------+
           |                |
           v                v
+-------------------+  +----------------------+
| Personal Data     |  | ID Card Photos       |
| CCCD, Phone,      |  | Front + Back         |
| Email, Ethnicity, |  | on i0.connections.vn |
| Place of Birth    |  |                      |
+-------------------+  +----------------------+

4. CDN Discovery: Image and File Infrastructure

Identity card photos are stored on a separate image server, not in the database itself. Understanding how these image URLs are generated was key to proving that photos could be downloaded in bulk by anyone.

4.1. Static File CDN: local.universityx.connections.vn

All uploaded documents (PDFs, candidate lists) are stored on a static file server with a predictable URL structure:

https://local.universityx.connections.vn/upload/{category_id}/{YYYY/MM/DD}/{uuid_filename}

File metadata including all URL components is embedded in the main page’s HTML as a cached JSON blob. The metadata uses minified Vietnamese field names:

JSON Key | Meaning | Example
"i" | File ID | 534167
"ạ" | Category ID | 25
"ô" | Server hostname | local.universityx.connections.vn
"ớ" | Date path | 2024/09/19
"ũ" | Original filename | Danh sach phong thi.pdf
"ợ" | Server filename (UUID) | f2247559bcac…03.pdf

I discovered 32 candidate list PDFs by parsing this metadata and filtering for filenames containing “danh sach” or “phong thi.” One parsing issue had to be fixed first: the date path is stored with JSON-escaped forward slashes (2024\/09\/19), and using the raw value in a URL produced HTTP 404 errors until the path was unescaped.
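The escaping issue is easy to reproduce in isolation. The sketch below assumes the metadata fragment is valid JSON and uses the "ớ" (date path) key from the table above; the function names are illustrative. A JSON parser decodes `\/` to `/` on its own, so the problem only appears when the value is cut out of the raw text (for example with a regular expression) without decoding it.

```python
import json


def date_path_from_json(fragment: str, key: str = "ớ") -> str:
    """Decode the fragment with a JSON parser; escaped slashes are unescaped."""
    return json.loads(fragment)[key]


def unescape_slashes(raw_value: str) -> str:
    """Fallback for values extracted from raw text rather than parsed JSON."""
    return raw_value.replace("\\/", "/")


if __name__ == "__main__":
    # Raw text as it would appear inside the embedded JSON blob.
    fragment = '{"ớ": "2024\\/09\\/19"}'
    print(date_path_from_json(fragment))
    print(unescape_slashes("2024\\/09\\/19"))
```

Parsing the whole blob as JSON is the more robust of the two options, since it also handles other escapes such as `\uXXXX` sequences in filenames.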

4.2. Image CDN: i{N}.connections.vn

The most sensitive discovery was the image CDN infrastructure. ID card photos are served from a load-balanced CDN with nodes i0.connections.vn, i3.connections.vn, etc.

Encoded Image URLs

Unlike the static file CDN (which uses human-readable paths), the image CDN uses encoded/obfuscated paths and query parameters:

# CCCD front photo:
https://i0.connections.vn/kt3PG1hPTgTPG1MJ.u54.L8JKx-X.LDJ9Ei
  tREHqGaAN9ZWP6gM46Wbb?q=wqQ2KLBn6g3dDaf29mOnD7stDafo9aAo...

# CCCD back photo:
https://i0.connections.vn/kt3PG1hPTgTPG1MJ.u54.L8JKx-X.LDJ9Ei
  z9mONGaAN9ZWP6gM46Wbb?q=wqQ2KLBn6g3dDaf29mOnD7stDafo9aAo...

# Portrait photo:
https://i0.connections.vn/kt3PG1hPTgTPG1MJ.u54.L8JKx-X.LDJ9Ei
  tREHdGaAN9ZWP6gM46Wbb?q=wqQ2KLBn6g3dDaf29mOnD7stDafo9aAo...

Key observations about URL structure:

  • The URL path encodes the file reference - different images for the same candidate differ by a few characters in the path
  • The q= query parameter appears to be a session or authentication token, but is identical across all images in a single page load, suggesting it is a page-level token rather than per-image
  • These URLs cannot be guessed - they can only be obtained by rendering a candidate’s detail page in a browser (the JavaScript framework generates them at runtime)
  • However, once the page is rendered, these URLs can be downloaded directly via simple HTTP GET without any additional cookies or headers
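The observation that the q= value is shared across images can be checked mechanically. This is a minimal sketch using placeholder URLs on a reserved example domain, not real CDN URLs; `token_scope` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlsplit


def token_scope(urls):
    """Compare the q= parameter and path across a set of image URLs."""
    tokens = {parse_qs(urlsplit(u).query).get("q", [""])[0] for u in urls}
    paths = {urlsplit(u).path for u in urls}
    return {
        "distinct_tokens": len(tokens),
        "distinct_paths": len(paths),
        # One token shared by several images suggests a page-level token.
        "page_level_token": len(urls) > 1 and len(tokens) == 1,
    }


if __name__ == "__main__":
    # Placeholder URLs; path and token values are made up for illustration.
    urls = [
        "https://img.example.invalid/pathA?q=TOKEN",
        "https://img.example.invalid/pathB?q=TOKEN",
        "https://img.example.invalid/pathC?q=TOKEN",
    ]
    print(token_scope(urls))
```

A token that does not vary per image and is not bound to a session cannot act as a per-resource access check, which matches the behavior described above.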

How Image URLs Are Generated

The JavaScript framework generates image CDN URLs during page rendering through the following process:

  1. Reads the file reference field (e.g., {"ị":["4296"]})
  2. Encodes the file ID, organization context, and a session token into the URL path and query string
  3. Assigns the URL as the CSS background-image of a <div> element

This means image URLs cannot be constructed programmatically from file IDs alone – the JavaScript framework’s encoding algorithm must be executed in a browser environment. My tool solves this by rendering each candidate’s page in a headless browser and extracting background-image URLs from the DOM.

Image Download Verification

Downloaded images were verified as actual ID card photos:

  • File sizes ranged from 44KB to 228KB (consistent with phone camera photos of ID cards)
  • Valid JPEG image files
  • Three images per candidate (typically): portrait photo, CCCD front, CCCD back
  • Images contain readable text including the candidate’s name, CCCD number, date of birth, and address printed on the physical card

Figure 3: File explorer showing the 2,449 downloaded photos (portrait, ID card front, and ID card back) of exam candidates.

5. Complete Attack Chain

Here is the complete chain of steps, from the initial Google discovery to downloading all 896 candidates’ identity card photos. Each step builds on the previous one, and none of them require any form of authentication.

5.1. Overview

The attack chain combines multiple techniques to escalate from a single Google result to full PII extraction including ID card photos:

  1. Google Dorking (Reconnaissance)
  2. Source Code Analysis (Framework Identification)
  3. JavaScript API Reverse Engineering (Schema Discovery)
  4. IDOR Exploitation (Database Traversal)
  5. Foreign Key Traversal (Cross-Table Access)
  6. PDF Metadata Extraction (Seed Data Collection)
  7. Headless Browser API Injection (Bulk Extraction) – A headless browser is a web browser running in the background without a graphical interface, enabling automation (e.g., Playwright, Puppeteer)
  8. DOM Scraping (Reference Resolution + Image Harvesting)
  9. CDN Image Download (ID Card Photo Extraction)

5.2. Step 1: Google Dorking - Initial Discovery

A routine Google search containing a candidate’s name and University X-related keywords returned a direct link to the Testing Center website:

TRUNG TAM KHAO THI TRUONG DAI HOC X
https://tec.universityx.vn/7fdc5fa41f345xxxx4bba6b0d3e449385/1518250/2368M35018
"Phieu dang ky thi nang luc ngoai ngu..."

The rendered page displayed the full exam registration form including the CCCD number, date of birth, ethnicity, contact information, and ID card photos.

5.3. Step 2: Source Code Analysis - Identifying Company Y

Inspecting the page source revealed:

  • JavaScript loaded from cdn.companyy.com (Company Y)
  • CSS loaded from thuctap.companyy.com
  • API calls to xhr.companyy.com and connections.universityx.vn
  • Framework configuration at tts.companyy.vn/nguyendinhhuy
  • Vietnamese function names (xửLý, CĂN.db, config)

5.4. Step 3: Reverse Engineering the JavaScript Framework API

Using the browser’s developer console, I probed the framework’s global scope:

> typeof xuLy       // "function" main API handler
> typeof CAN.db     // "function" database loader
> typeof config     // "function" config retriever
> typeof duLieu     // "object"   page data context
> CAN               // {fn, khoa, lib, js, db, _db}

By intercepting XHR traffic in the Network tab while loading a candidate page, I observed the request/response pattern:

POST https://xhr.companyy.com/xhr/
  Request:  {action: "doiTuong.tai.7fdc5fa4...", d: {thuocTinh: {...}}}
  Response: ["1518250", "1518251", ...]  // List of Object IDs

5.5. Step 4: IDOR + Foreign Key Traversal

With the API functions identified, I exploited IDOR to traverse the database:

// 1. Find Registration by SBD (exam seat number from PDF)
xuLy("doiTuong.tai.7fdc5fa41f345xxxx4bba6b0d3e449385",
  {d: {thuocTinh: {"1642331": "AN1001"}}}, {}, function(ids) {
    console.log(ids);
    // ["1518250"]
  });

// 2. Load Registration object -> discover Candidate table
CAN.db("7fdc5fa41f345xxxx4bba6b0d3e449385.1518250", function() {
  var reg = config("7fdc5fa41f345xxxx4bba6b0d3e449385.1518250");
  console.log(reg["1686869"]);  // "582319" (Candidate ID)
  // Candidate table hash discovered from foreign key relationship
});

// 3. Load Candidate object -> access ALL personal data
CAN.db("3576ff3533bb4xxxx8e394a0aa83a461f.582319", function() {
  var c = config("3576ff3533bb4xxxx8e394a0aa83a461f.582319");
  console.log(c["1646777"]);  // "0222xxxx2576" (CCCD number)
  console.log(c["1626788"]);  // "037xxxx973"   (Phone number)
  console.log(c["1626793"]);  // "email@gmail.com"
  console.log(c["1626773"]);  // {"ậ":["146992"]}  (Ethnicity reference)
  console.log(c["1658487"]);  // {"ị":["4296"]}     (CCCD front photo)
  console.log(c["1658488"]);  // {"ị":["243"]}      (CCCD back photo)
});

5.6. Step 5: Seed Data Extraction from PDFs

To enumerate all candidates, I extracted the list of SBD (exam seat numbers) from 32 publicly accessible PDF files. PDF metadata was embedded in the website’s HTML as a cached JSON object with abbreviated Vietnamese field names. After unescaping JSON-encoded paths, all PDFs were downloadable from the static file CDN at local.universityx.connections.vn. Result: 896 unique SBDs extracted from 32 PDFs.

5.7. Step 6: Automated Bulk Extraction

I developed a Python tool (crawl_xxxx.py) that automates the entire chain using Playwright (headless Chromium browser):

# Each worker runs 2 browser pages:
#   api_page:    stays on /dangkythi for fast JS API calls
#   render_page: navigates to each candidate's detail page

def lookup_sbd(api_page, render_page, sbd):
    # Fast API lookup (~2 seconds)
    reg_ids = _api_search(api_page, REG_TABLE, {F_SBD: sbd})
    reg_data = _api_load_object(api_page, REG_TABLE, reg_ids[0])
    cand_data = _api_load_object(api_page, CAND_TABLE, reg_data[F_CAND_ID])
    # -> CCCD, phone, email, workplace now retrieved

    # Render page for references + images (~20 seconds)
    render_page.goto(f"tec.universityx.vn/{REG_TABLE}/{reg_id}/{file_num}")
    # Smart wait: wait until "Dan toc" text appears
    # -> Extract ethnicity, place of birth from DOM text
    # -> Extract image URLs from CSS background-image attributes
    # -> Download ID card photos from connections.vn CDN

Figure 2: Spreadsheet of extracted candidate data including names, registration numbers, CCCD numbers, dates of birth, emails, phone numbers, ethnicity, and place of birth, demonstrating the scale of the data exposure.

5.8. Step 7: Parallel Execution with Adaptive Throttling

The tool supports 3 parallel workers by default, each with its own Playwright browser instance (for thread safety). If the server starts rejecting requests (3 consecutive failures), the tool automatically falls back to a single worker:

Workers: 3 (Default)
    |
    +-- Worker 1: Browser + 2 pages (api + render)
    +-- Worker 2: Browser + 2 pages (api + render)
    +-- Worker 3: Browser + 2 pages (api + render)
    |
[3 consecutive failures on any worker]
    |
    v
Workers: 1 (Fallback)
    +-- Worker 1: Browser + 2 pages (api + render)

5.9. Step 8: Graceful Error Handling

The tool implements multiple robust recovery mechanisms:

  • Ctrl+C Handling: When interrupted, the tool immediately saves all collected data to CSV before exiting.
  • Network Error Recovery: On timeout, the tool resets the browser session and continues.
  • Periodic Saving: Flushes data to CSV every 10 candidates to minimize data loss.
  • Resume Support: On restart, the tool reads the existing CSV file and skips already-processed SBDs.
  • Reference Caching: Ethnicity and Place of Birth lookups are cached (there are only ~54 ethnic groups and ~63 provinces in Vietnam), so page rendering becomes unnecessary for previously encountered reference IDs.

6. Results: Extracted Data

6.1. Data Volume

Metric | Value
Total candidates (from PDFs) | 896
Candidates with resolved CCCD | 896 (100%)
Candidates with ethnicity info | 818 (91.3%)
Candidates with place of birth | 576 (64.3%)
ID card photos downloaded | 2,449 (portrait + front + back)
Success rate | 100%
Workers used | 3 (parallel)
Processing speed | ~5 candidates/minute (including page rendering)

6.2. Extracted Data Fields per Candidate

Field | Source | Sensitivity Level | Example
SBD (Exam Seat Number) | PDF | Low | AN1001
Full Name | API | Medium | Nguyễn xxx xxxx Trà
Date of Birth | API + PDF | Medium | 28/09/2000
Gender | API + PDF | Low | Female
CCCD/CMND Number | API | Critical | 0222xxxx2576
Phone Number | API | High | 037xxxx973
Email | API | High | email@gmail.com
Workplace | API | Medium | Free text
Ethnicity | Rendered DOM | High | Kinh
Place of Birth | Rendered DOM | Medium | Bắc Kạn Province
ID Card Photo (Front) | CDN | Critical | JPEG, 44–228 KB
ID Card Photo (Back) | CDN | Critical | JPEG, 44–228 KB
Exam Scores | API + PDF | Medium | 8.5/6.0/5.5/7.0
File Number | API | Low | 2368M35018

6.3. Output Directory Structure

output/
  candidates_phase1.csv     # 896 candidates (SBD, name, DOB, gender, scores)
  candidates_phase2.csv     # 896 candidates (+ CCCD, phone, email, ethnicity,
                            #   place of birth, image paths)
  candidates_full.csv       # Final merged complete dataset
  images/
    AN1001_front.jpg        # ID card front photo
    AN1001_back.jpg         # ID card back photo
    AN1001_extra.jpg        # Portrait/additional candidate photo
    AN1002_front.jpg
    ...                     # 2,449 images total
  pdfs/                     # 32 original candidate list PDFs
  file_index.json           # PDF file index metadata

7. Detailed Data Analysis

This section presents statistical analysis of the 896 candidate records extracted from candidates_phase2.csv. All statistics were computed programmatically from the raw data.

7.1. Demographics

  • Total candidates: 896 unique individuals
  • Gender: 661 Female (73.8%), 235 Male (26.2%)
  • Birth-year range: 1969–2005 (37 birth years, inclusive)
  • Median birth year: ≈ 2000 (most common: 2000 with 234 candidates, followed by 1998 with 117 and 1999 with 102)
  • The roughly 3:1 female-to-male ratio and the concentration around 1998–2002 birth years are consistent with the profile of foreign language proficiency exam candidates at a university specializing in foreign languages.

7.2. Data Completeness

Field | Populated | Empty | Fill Rate
SBD (Exam Seat Number) | 896 | 0 | 100.0%
Full Name | 896 | 0 | 100.0%
Date of Birth | 896 | 0 | 100.0%
Gender | 896 | 0 | 100.0%
CCCD/CMND | 896 | 0 | 100.0%
Phone Number | 896 | 0 | 100.0%
Email | 896 | 0 | 100.0%
Ethnicity | 818 | 78 | 91.3%
ID Card Photo (Front) | 820 | 76 | 91.5%
ID Card Photo (Back) | 816 | 80 | 91.1%
Place of Birth/Province | 576 | 320 | 64.3%
Workplace | 135 | 761 | 15.1%

The 7 core fields (exam seat number, name, date of birth, gender, CCCD, phone number, email) all have a 100% fill rate. The low workplace fill rate (15.1%) suggests most candidates are students who left this non-required field blank.
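Fill rates of this kind can be computed with a few lines of standard-library code. The sketch below is an assumption about how such a table might be produced, not the script used for this report; the column names `sbd` and `workplace` and the sample rows are synthetic.

```python
import csv
import io


def fill_rates(rows, fields):
    """Return {field: (populated, empty, fill_rate_percent)} for the given rows."""
    total = len(rows)
    result = {}
    for field in fields:
        # A value counts as populated if it is non-empty after trimming spaces.
        populated = sum(1 for row in rows if (row.get(field) or "").strip())
        rate = round(100 * populated / total, 1) if total else 0.0
        result[field] = (populated, total - populated, rate)
    return result


if __name__ == "__main__":
    # Synthetic sample data; real column names are not known from the report.
    sample_csv = "sbd,workplace\nAN1001,\nAN1002,Acme\nAN1003,  \nAN1004,\n"
    rows = list(csv.DictReader(io.StringIO(sample_csv)))
    print(fill_rates(rows, ["sbd", "workplace"]))
```

Applied to the figures in the table, the same formula gives 135 / 896 ≈ 15.1% for the workplace field.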

7.3. Email Domain Analysis

Domain | Count | %
gmail.com | 766 | 85.5%
s.universityx.edu.vn | 87 | 9.7%
universityx.edu.vn | 22 | 2.5%
yahoo.com / yahoo.com.vn | 4 | 0.4%
gmail.con (typo) | 3 | 0.3%
Other (.edu.vn, .gov.vn) | 14 | 1.6%

Two patterns are noteworthy. The 3 gmail.con typos indicate that this is real user-entered data. The 87 addresses of the form {student_id}@s.universityx.edu.vn use the student ID as the local part, which inadvertently exposes student ID numbers through a secondary channel.

7.4. CCCD/CMND Format Analysis

Vietnam has issued identification documents in three formats, all of which appear in this dataset:

Format | Count | % | Description
12-digit CCCD | 474 | 52.9% | New Citizen Identity Card (post-2021)
10-digit CMND | 271 | 30.2% | Legacy People’s Identity Card (10-digit)
9-digit CMND | 135 | 15.1% | Legacy People’s Identity Card (9-digit)
Other lengths | 16 | 1.8% | Student IDs/Passport numbers/Data entry errors

6 duplicate CCCD numbers were found (each appearing in 2 registrations), indicating candidates who registered for more than one exam. This is consistent with real registration records spanning September 2024 to February 2026.
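The format breakdown and the duplicate check follow directly from string length and a frequency count. This is a minimal sketch under the assumption that classification is by digit count alone, as the table suggests; the function names and the sample values are synthetic and illustrative.

```python
from collections import Counter


def classify_id_number(value: str) -> str:
    """Bucket an ID number by length, mirroring the format table above."""
    digits = value.strip()
    if digits.isdigit():
        if len(digits) == 12:
            return "12-digit CCCD"
        if len(digits) == 10:
            return "10-digit CMND"
        if len(digits) == 9:
            return "9-digit CMND"
    return "Other lengths"


def summarize_formats(values):
    """Return {format: (count, percent)} over all values."""
    counts = Counter(classify_id_number(v) for v in values)
    total = len(values)
    return {label: (n, round(100 * n / total, 1)) for label, n in counts.items()}


def duplicate_numbers(values):
    """Return the ID numbers that occur more than once, sorted."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


if __name__ == "__main__":
    # Synthetic values only; none of these are real ID numbers.
    sample = ["001200000001", "0012000001", "012345678", "AB123", "001200000001"]
    print(summarize_formats(sample))
    print(duplicate_numbers(sample))
```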

7.5. Geographic Distribution

The first 3 digits of a 12-digit CCCD encode the province or city where the holder’s birth was registered. Among the 474 new-format CCCDs:

Code | Province/City | Count | %
001 | Hanoi | 144 | 30.4%
036 | Nam Dinh | 38 | 8.0%
038 | Thanh Hoa | 36 | 7.6%
034 | Thai Binh | 31 | 6.5%
030 | Hai Duong | 25 | 5.3%
024 | Bac Giang | 23 | 4.9%
035 | Ha Nam | 16 | 3.4%
033 | Hung Yen | 16 | 3.4%
027 | Bac Ninh | 15 | 3.2%
037 | Ninh Binh | 15 | 3.2%

The distribution is heavily concentrated in the Red River Delta and neighboring northern provinces, consistent with University X’s location in Hanoi. Holders of Hanoi-coded numbers alone account for 30.4% of new-format CCCD holders.
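The province tally can be sketched as a prefix count over the 12-digit numbers. The code table below contains only the ten codes listed above, and the sample values are synthetic; `province_distribution` is an illustrative name, not a function from the original tooling.

```python
from collections import Counter

# Province codes as listed in the table above (not a complete list).
PROVINCE_CODES = {
    "001": "Hanoi", "036": "Nam Dinh", "038": "Thanh Hoa", "034": "Thai Binh",
    "030": "Hai Duong", "024": "Bac Giang", "035": "Ha Nam", "033": "Hung Yen",
    "027": "Bac Ninh", "037": "Ninh Binh",
}


def province_distribution(id_numbers):
    """Return [(code, name, count, percent)] for 12-digit numbers, most common first."""
    prefixes = [v[:3] for v in id_numbers if len(v) == 12 and v.isdigit()]
    total = len(prefixes)
    return [
        (code, PROVINCE_CODES.get(code, "Unknown"), n, round(100 * n / total, 1))
        for code, n in Counter(prefixes).most_common()
    ]


if __name__ == "__main__":
    # Synthetic values only; the short entry is ignored as non-CCCD.
    sample = ["001000000001", "001000000002", "036000000001", "12345"]
    for row in province_distribution(sample):
        print(row)
```

Percentages are taken over the 12-digit numbers only, which is why the table's denominator is 474 rather than 896.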

7.6. Image Repository Statistics

Image Type | Count | Description
*_front.jpg | 820 | CCCD/CMND front photo
*_back.jpg | 816 | CCCD/CMND back photo
*_extra.jpg | 813 | Candidate portrait photo
Total | 2,449 | 223 MB on disk

The approximately 76–80 candidates missing images most likely registered before the mandatory ID card photo upload requirement was implemented, or uploaded documents in non-standard formats that the DOM scraper could not extract.

7.7. PDF Source Analysis

Time Period | PDF Count | Exam Type
September 2024 | 4 | C1 English, Chinese, Japanese, Korean
March 2025 | 4 | C1 English, Chinese, Japanese, Korean
May 2025 | 1 | University X Test
June 2025 | 6 | NN2 SĐH (Second Foreign Language for Graduate Studies)
October 2025 | 4 | English, Chinese, Japanese, Korean
January 2026 | 9 | Morning/Afternoon sessions (multiple days)
February 2026 | 4 | Morning/Afternoon sessions
Total | 32 | Period: Sep 2024 – Feb 2026

8. Root Cause Analysis

8.1. Architectural Flaws in the Connections Platform

The data exposure stems from fundamental architectural design decisions in Company Y’s Connections platform:

  1. No API Authentication Layer: The XHR API endpoints at xhr.companyy.com and connections.universityx.vn accept requests from any JavaScript execution context. There are no tokens, session cookies, or API key checks.

  2. No Field-Level Access Control: The API returns all fields for any requested object. A user accessing a public page to view the exam schedule receives the exact same data as an administrator viewing CCCD numbers and ID card photos.

  3. IDOR by Design: Database object IDs are used directly in URLs and API calls. Table hashes serve as identifiers but provide no security: they are plainly visible in URLs and easily discoverable through foreign key traversal.

  4. Client-Side-Only Security: All business logic and data filtering occurs in browser-side JavaScript. The server acts as a transparent data store, enforcing no access controls whatsoever.

  5. No CDN Controls: Both the static file CDN and image CDN serve content without authentication. Once a URL is known (or extracted from a rendered page), any HTTP client can download the file.

  6. No Rate Limiting: The API accepts hundreds of sequential queries from a single client without throttling, enabling easy bulk data extraction.
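To make flaw 2 concrete, the sketch below shows what server-side field-level filtering could look like: the server decides which fields leave the database for a given role, and unknown roles receive nothing. This is an illustration only. The role names and the public field names (`exam_date`, `exam_room`) are hypothetical, and nothing here reflects how the Connections platform is actually implemented.

```python
from typing import Any, Mapping

# Hypothetical allow-lists; anything not listed is never returned.
PUBLIC_FIELDS = frozenset({"sbd", "exam_date", "exam_room"})
SENSITIVE_FIELDS = frozenset(
    {"ho_ten", "ngay_sinh", "cccd", "sdt", "email", "img_front", "img_back"}
)
ROLE_FIELDS = {
    "public": PUBLIC_FIELDS,
    "admin": PUBLIC_FIELDS | SENSITIVE_FIELDS,
}

def visible_fields(record: Mapping[str, Any], role: str) -> dict:
    """Return only the fields the given role may see (deny by default)."""
    allowed = ROLE_FIELDS.get(role, frozenset())  # unknown role sees nothing
    return {key: value for key, value in record.items() if key in allowed}

# Synthetic record with placeholder values.
record = {"sbd": "X1", "exam_date": "2026-01-01", "cccd": "000000000000"}
print(visible_fields(record, "public"))
```

The important property is that filtering happens before the response is serialized, so a public page never receives the sensitive fields in the first place.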

8.2. Third-Party Vendor Risk

University X has entrusted sensitive candidate data including photos of government-issued CCCD/CMND cards to Company Y’s Connections platform. This creates a supply chain vulnerability:

  • University X may be entirely unaware that the platform has zero access controls.
  • The same architectural flaws likely affect all organizations using the Connections platform, not just University X.
  • University X is limited in its ability to implement security controls on infrastructure it does not operate.
  • The vendor relationship means that remediating this vulnerability requires Company Y to redesign their platform architecture.

8.3. Why Image CDN Encoding Is Not Security

The image CDN uses encoded URL paths (e.g., kt3PG1hPTgTPG1MJ.u54.L8J...), which superficially appears to provide security. However:

  • The encoding is performed by client-side JavaScript, which users fully control.
  • Encoded URLs are embedded directly in the DOM as inline CSS background-image values, making them trivial to extract.
  • Once extracted, these URLs require no cookies, tokens, or headers to download the images.
  • Any headless browser can render a candidate’s page and automatically harvest all image URLs.
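For contrast, a URL only becomes an access control when the server verifies something the client cannot forge. The sketch below illustrates one common approach, an HMAC-signed path with an expiry time checked server-side. It is a minimal illustration under stated assumptions: the function names, the message layout, and the placeholder key are mine, and this is not a description of Company Y’s CDN.

```python
import hashlib
import hmac
import time

# Placeholder; a real key would live only on the server, never in client code.
SECRET_KEY = b"server-side-secret-never-sent-to-clients"

def sign_path(path: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Produce a signature binding the path to an expiry timestamp."""
    message = f"{path}|{expires_at}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def is_request_allowed(path, expires_at, signature, now=None, key=SECRET_KEY):
    """Server-side check: reject expired links and any altered path or signature."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    expected = sign_path(path, expires_at, key)
    return hmac.compare_digest(expected, signature)

signature = sign_path("/img/example.jpg", expires_at=1000)
print(is_request_allowed("/img/example.jpg", 1000, signature, now=999))
```

Unlike client-side path encoding, the signature cannot be regenerated in the browser because the key never leaves the server, and a harvested link stops working once it expires.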

9. Impact Assessment

The data exposed in this vulnerability is not just abstract “PII.” For 896 real people, it represents everything needed to steal their identity. In Vietnam, CCCD photos are widely used for KYC (Know Your Customer) verification at banks, e-wallets like MoMo and ZaloPay, and telecom providers. A leaked CCCD photo is essentially a master key to someone’s financial life. What follows is an assessment of what this exposure means for the real students and professionals whose data was left wide open.

9.1. Affected Individuals

  • 896 candidate registrations confirmed from exams across 2024–2026 (6 CCCD numbers appear twice, so the number of distinct individuals is slightly lower).
  • The database very likely contains candidates from all historical exam sessions, potentially numbering in the thousands.
  • All candidates have their full PII profiles and ID card photos accessible without authentication.

9.2. Severity: ID Card Photo Exposure

The exposure of CCCD/CMND card photos is far more severe than text-based PII exposure:

  • Biometric Data: Images contain the cardholder’s face, which can be used for facial recognition attacks.
  • Physical Card Replication: High-quality images of both front and back provide all information needed to create counterfeit ID cards.
  • KYC Bypass: Many financial services in Vietnam accept CCCD photos for KYC (Know Your Customer) verification - these images could be used to open fraudulent bank accounts or e-wallets.
  • Irreversibility: Unlike passwords or phone numbers, a CCCD number and its card images cannot be “changed” once exposed - the consequences are permanent.

9.3. Risk Scenarios

  1. Large-Scale Identity Theft: CCCD numbers + Images + Full names + Dates of birth = a complete identity dossier for 896 individuals.
  2. Financial Fraud: ID card photos can bypass KYC systems at banks, e-wallets (MoMo, ZaloPay, VNPay), and cryptocurrency exchanges.
  3. SIM Swap Attacks: Phone numbers + ID card photos enable SIM swap attacks with mobile carriers, leading to account takeovers.
  4. Deepfake Creation: Facial images extracted from CCCDs, combined with names and biographical information, enable AI-generated deepfake content for social engineering.
  5. Targeted Phishing: A complete dossier (Name, Email, Phone, Workplace, Exam history) enables highly sophisticated and convincing spear-phishing campaigns.
  6. Legal Violations & Penalties: Under Vietnam’s Decree 13/2023/ND-CP on Personal Data Protection, the exposure of citizen identity information and biometric images constitutes a serious violation subject to sanctions.

9.4. Severity Rating

| Factor | Assessment |
|---|---|
| Attack Complexity | Low (Browser console + basic Python script) |
| Authentication Required | None |
| Data Sensitivity | Extremely Critical (National ID + photos) |
| Number of Affected Users | 896+ confirmed, potentially thousands |
| Data Reversibility | Irreversible (Cannot change a CCCD) |
| Exploitability | Easy (No specialized tools required) |
| Vendor Scope | All clients sharing the Connections platform |
| Overall Assessment | Critical (CVSS 3.1: 9.1) |
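This report states the CVSS 3.1 score but not the underlying vector. As an illustration of how such a score is derived, the sketch below implements the CVSS 3.1 base-score formula for Scope: Unchanged vectors. The vector AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N is one that evaluates to 9.1; treating it as the vector behind this report’s rating is my assumption.

```python
import math

# Metric weights from the CVSS 3.1 specification (PR values for Scope: Unchanged).
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "CIA": {"H": 0.56, "L": 0.22, "N": 0.0},
}

def roundup(value: float) -> float:
    """Round up to one decimal place, as defined in the CVSS 3.1 specification."""
    scaled = int(round(value * 100000))
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(av, ac, pr, ui, c, i, a) -> float:
    """CVSS 3.1 base score; this sketch handles Scope: Unchanged only."""
    cia = WEIGHTS["CIA"]
    iss = 1 - (1 - cia[c]) * (1 - cia[i]) * (1 - cia[a])
    impact = 6.42 * iss
    exploitability = (
        8.22 * WEIGHTS["AV"][av] * WEIGHTS["AC"][ac]
        * WEIGHTS["PR"][pr] * WEIGHTS["UI"][ui]
    )
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

# Assumed vector: AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N
print(base_score("N", "L", "N", "N", "H", "H", "N"))
```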

10. Technical Challenges and Solutions

10.1. Dynamic Content Rendering

The website renders all data client-side using JavaScript. Standard HTTP requests (such as curl or Python’s requests) only retrieve the empty HTML skeleton.

Solution: Used Playwright with a headless Chromium browser to execute the JavaScript framework, enabling both API calls and DOM extraction.

10.2. Vietnamese Source Code

The framework uses accented Vietnamese identifiers: xửLý, dữLiệu, thuộcTính, đốiTượng. While not intentional obfuscation, this requires Unicode-capable tools and makes pattern-matching significantly harder than standard JavaScript analysis.

10.3. Reference Field Resolution

The Ethnicity and Place of Birth fields store opaque reference IDs (e.g., {"ậ":["146992"]}) instead of text. These references are only resolved during page rendering by the framework’s internal logic; calling the APIs directly (CĂN.db, config, thuộcTính.tải) always returns null for these IDs.

Solution: Render the full page for each candidate and extract the resolved text from the DOM (e.g., “4. Dân tộc: Kinh”). I cached these values to avoid redundant page renders.

10.4. Image URL Encoding

ID card photos are served from the image CDN with encoded URL paths that cannot be constructed from file IDs alone – the framework generates them at runtime.

Solution: Render each candidate’s page, extract URLs from the inline CSS background-image styles of <div> elements pointing to connections.vn, then download images via HTTP GET.

10.5. Parallel Processing and Thread Safety

Playwright’s synchronous API is not thread-safe across shared browser instances.

Solution: Each worker thread launches its own Playwright browser instance, with 2 separate pages per browser (one for API calls, one for rendering). Workers are staggered by 3 seconds to avoid thundering herd effects during session establishment.
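The general pattern (one non-shareable resource per thread, created inside the thread that uses it, with staggered starts) can be sketched without any browser at all. In the example below `FakeBrowser` is a stand-in I introduce for illustration, not a Playwright object, and the function names are hypothetical. The 3 workers and 3-second stagger mirror the values given in this report.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

class FakeBrowser:
    """Stand-in for a resource that must only be used by the thread that made it."""

    def __init__(self):
        self.owner = threading.get_ident()

    def render(self, item):
        # Fail loudly if a different thread ever touches this instance.
        assert threading.get_ident() == self.owner, "instance shared across threads"
        return f"rendered:{item}"

def worker(worker_index, items, stagger_seconds):
    time.sleep(worker_index * stagger_seconds)  # staggered start
    browser = FakeBrowser()  # created inside the thread that uses it
    return [browser.render(item) for item in items]

def run_workers(items, workers=3, stagger_seconds=3.0):
    """Split items round-robin across workers and collect all results."""
    chunks = [items[index::workers] for index in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(worker, index, chunk, stagger_seconds)
            for index, chunk in enumerate(chunks)
        ]
        return [result for future in futures for result in future.result()]
```

Because each worker owns its resource outright, no locking is needed around it; the trade-off is higher memory use, since every worker carries a full instance.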

10.6. Network Resilience

University X’s server frequently experiences very slow page loads (sometimes exceeding 30 seconds).

Solution: Used smart wait algorithms (polling DOM text instead of fixed timeouts), automatic session resets on failure, periodic CSV saving every 10 candidates, and Ctrl+C handling to save all collected data before interruption.
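Polling with a deadline, as opposed to a fixed sleep, can be expressed as a small generic helper. The following is a minimal sketch of that idea rather than the code used here; `wait_until` and its parameters are hypothetical names, and the 30-second default only echoes the load times mentioned above.

```python
import time

def wait_until(predicate, timeout=30.0, interval=0.25,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() until it is truthy or the timeout elapses.

    Returns True as soon as the condition holds, False on timeout.
    The clock and sleep functions are injectable to keep the helper testable.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A fast page returns as soon as the condition holds instead of always paying the worst-case delay, while a slow page still gets the full timeout before the caller triggers a retry or session reset.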

11. Recommendations

11.1. For University X (Immediate Remediation)

  1. Vendor Security Audit: Require Company Y to conduct a comprehensive security assessment of the Connections platform.
  2. Remove ID Card Photos: Delete all stored ID card images from the platform or migrate them to a storage system with strict access controls.
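  3. Add robots.txt / noindex: Serve candidate detail pages with a noindex directive so search engines drop them from results, and request removal of currently cached pages from Google. Note that robots.txt only discourages crawling and is not an access control, and a page blocked by robots.txt cannot have its noindex directive read by the crawler, so neither measure replaces authentication.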
  4. Restrict PDF Access: Place candidate lists behind authentication or redact sensitive columns.
  5. Evaluate Alternative Platforms: Consider migrating to a platform with proper access control capabilities.

11.2. For Company Y / Connections Platform (Urgent Remediation)

  1. Implement API Authentication: All XHR endpoints must require a valid session token with role-based access control.
  2. Add Field-Level Access Control: Sensitive fields (CCCD, Phone, File references) must be restricted to admin-level authenticated sessions only.
  3. Secure Image CDN: Image URLs must require authentication tokens that are validated server-side, not relying solely on path encoding.
  4. Server-Side Rendering for Sensitive Data: Move PII display logic to server-side processing; never send raw sensitive data to public-facing web page contexts.
  5. Rate Limiting and Anomaly Detection: Implement IP-based query rate limits and alerting for bulk download patterns.
  6. Platform-Wide Security Audit: The same vulnerabilities very likely affect all organizations using the Connections platform.
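For recommendation 5, a per-client token bucket is one common way to throttle bulk queries. The sketch below is illustrative only: the class name, the capacity of 20 requests, and the refill rate of one request per second are assumptions I chose for the example, not tuned values or anything the vendor has deployed.

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity`, then `refill_per_second` sustained."""

    def __init__(self, capacity, refill_per_second, now=None):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated_at)
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

buckets = {}

def is_allowed(client_ip, now=None):
    """Look up (or create) the bucket for this client and spend one token."""
    bucket = buckets.get(client_ip)
    if bucket is None:
        bucket = buckets[client_ip] = TokenBucket(20, 1.0, now=now)
    return bucket.allow(now)
```

Rate limiting slows bulk extraction and makes it visible in logs, but it is a complement to authentication, not a substitute: a patient client under the limit can still read anything the API is willing to return.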

11.3. Long-Term Plan

  1. Compliance Review: Assess system compliance with Vietnam’s Decree 13/2023/ND-CP on Personal Data Protection.
  2. Penetration Testing Program: Establish a regular security testing schedule for the system.
  3. Data Minimization: Reconsider whether it is necessary to retain citizen ID card photos after the initial identity verification process is complete.
  4. Incident Response: Notify affected candidates about the data exposure in accordance with legal requirements.

12. Conclusion

What started as a routine Google search ended with a disturbing discovery: an entire university’s exam candidates had their most sensitive personal data – including photos of their government-issued identity cards – exposed to the open internet. 896 people registered for a language proficiency exam, trusting that their information would be handled responsibly. Instead, their complete identity dossiers were accessible to anyone with a web browser.

The root cause is not a single bug or misconfiguration. It is a fundamental architectural failure: the software vendor built a system where the database has no lock on the door. The Connections platform treats every visitor, whether a student checking exam results or a stranger on the internet, as having full access to every record, every field, and every uploaded file. There is no authentication layer, no access control, no distinction between public and private data.

These findings deliver 3 critical lessons:

  1. Third-party vendor risk is real: Outsourcing your software does not outsource your responsibility. Any organization handing sensitive data to a SaaS platform must audit that platform’s security architecture, not just evaluate its features.
  2. Client-side security is not security: If the only thing standing between an attacker and your database is JavaScript running in their own browser, you have no security at all. Access control must be enforced on the server.
  3. Security through obscurity always fails: Encoded URLs, minified code, and hash-based IDs may slow down an attacker by minutes, but they can never substitute for real authentication.

For 896 candidates, the damage is done. Their CCCD numbers, their identity card photos, their personal information – it is data that can never be taken back.

13. Revalidation Update – June 23, 2026

On June 23, 2026, I retested all attack vectors on tec.universityx.vn. Company Y had updated the codebase that same day (version 14484523062026). The results show meaningful progress but incomplete remediation.

What changed

The most important fix is at the API gateway level: the XHR endpoints now enforce server-side authentication. Both connections.universityx.vn/xhr/ and xhr.companyy.com/xhr/ return HTTP 403 with {"error":403,"code":"access_denied"} for unauthenticated POST requests. This is the first time server-side access control has been observed on this platform.
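The behavior observed from the outside is consistent with a gate that runs before any data lookup. The sketch below shows the shape of such a check. I have no visibility into Company Y’s implementation, so the `Authorization: Bearer` header, the token store, and the function name are all assumptions; only the 403 response body is taken from the observed response.

```python
import hmac
import json

def handle_xhr(headers, valid_tokens):
    """Return (status, body); reject before touching any data when unauthenticated."""
    token = headers.get("Authorization", "").removeprefix("Bearer ").strip()
    authenticated = bool(token) and any(
        hmac.compare_digest(token.encode(), known.encode())
        for known in valid_tokens
    )
    if not authenticated:
        body = json.dumps({"error": 403, "code": "access_denied"}, separators=(",", ":"))
        return 403, body
    # Only authenticated requests reach the data layer (omitted in this sketch).
    return 200, json.dumps({"ok": True})

print(handle_xhr({}, {"example-token"}))
```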

Additionally:

  • Account table (taiKhoan) access is blocked. The account-level IDOR documented in this report no longer works; all tested account IDs return null.
  • The b6x cipher has been removed. The monoalphabetic substitution layer that was added as a “fix” after my initial disclosure is gone entirely. Remaining data is returned in plaintext rather than wrapped in a broken cipher.

What remains vulnerable

Unlike the alumni platform documented in Part 2, the exam system still has gaps:

Candidate data partially exposed. At least one candidate record (#1662402) still returns data via the API. The b6x wrapping is gone, but four fields remain in the raw response:

| Field | Value | Status |
|---|---|---|
| Date of birth | 10/12/1998 | STILL EXPOSED |
| Gender | 2 | STILL EXPOSED |
| CCCD front image reference | {"i":["4296"]} | STILL EXPOSED |
| CCCD back image reference | {"i":["243"]} | STILL EXPOSED |
| Full name | | REMOVED |
| Phone number | | REMOVED |
| Email | | REMOVED |
| CCCD number | | REMOVED |

The most sensitive text fields (name, phone, email, CCCD number) have been scrubbed. But date of birth, gender, and both image reference IDs persist.

Bulk ID enumeration still works. The mass-load endpoint returns 50,403 candidate IDs. While most individual records appear empty or deleted, the enumeration itself should not be possible for unauthenticated users.

CDN image access has regressed. The image CDN nodes at i0.connections.vn and i3.connections.vn are responding with HTTP 200 again. These were marked as fixed in the March 10 revalidation, meaning this is a regression, not a lingering issue. If image reference IDs from candidate records can still be resolved to CDN URLs, the ID card photos documented in this report may once again be downloadable.

Scorecard

| Platform | Tests | Fixed | Vulnerable | Score |
|---|---|---|---|---|
| tec.universityx.vn (this report) | 13 | 6 | 5 | 46% fixed |
| connections.universityx.vn (Part 2) | 13 | 11 | 2 | 85% fixed |
| Previous (March 10, both) | 10 | 3 | 7 | 30% fixed |
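The score column is the share of tests marked fixed, rounded to a whole percent. The short sketch below recomputes it from the table; the helper names are hypothetical. It also surfaces one detail in the first row: fixed plus vulnerable accounts for 11 of the 13 tests, and the table does not state how the remaining two were classified.

```python
# (platform, tests, fixed, vulnerable) copied from the scorecard.
SCORECARD = [
    ("tec.universityx.vn (this report)", 13, 6, 5),
    ("connections.universityx.vn (Part 2)", 13, 11, 2),
    ("Previous (March 10, both)", 10, 3, 7),
]

def percent_fixed(tests: int, fixed: int) -> int:
    """Share of tests marked fixed, as a whole percent."""
    return round(100 * fixed / tests)

def unclassified(tests: int, fixed: int, vulnerable: int) -> int:
    """Tests that are counted in neither the fixed nor the vulnerable column."""
    return tests - fixed - vulnerable

for platform, tests, fixed, vulnerable in SCORECARD:
    print(platform, percent_fixed(tests, fixed), unclassified(tests, fixed, vulnerable))
```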

Assessment

The vendor has made real progress. The XHR authentication gate is the correct architectural fix, and the removal of the b6x cipher in favor of actually scrubbing sensitive fields is a better approach than obfuscation. The pattern is moving in the right direction.

But for this platform specifically, the job is not done. The CDN regression is concerning because it reverses a previously confirmed fix. The persistent candidate data, even partial, combined with image reference IDs, means the core finding of this report (ID card photo exposure) may not be fully resolved. And 50,403 enumerable candidate IDs represent a larger dataset than the 896 candidates I documented here, suggesting the exposure may have been broader than initially assessed.

The next step should be verifying whether the exposed image reference IDs can still be resolved to downloadable photos on the CDN. If they can, the most critical finding in this report remains exploitable despite four months of remediation efforts.

Appendix

A. Tools Used

| Tool | Version | Purpose |
|---|---|---|
| Python | 3.12 | Primary scripting runtime |
| requests | Latest | HTTP client for downloading images and PDFs |
| pdfplumber | Latest | Extracting data tables from PDFs |
| Playwright | Latest | Headless browser (JS execution + DOM scraping) |
| Chromium | (Bundled) | Browser engine for rendering pages |

B. Disclosure Timeline

| Date / Time | Event |
|---|---|
| 2026-02-25 13:00 | Vulnerability discovered via Google search indexing |
| 2026-02-25 15:30 | Framework reverse engineering complete; Database schema mapped |
| 2026-02-25 17:00 | API exploitation confirmed; CCCD data accessed |
| 2026-02-25 19:30 | Image CDN analyzed successfully; ID card photos downloaded |
| 2026-02-25 20:00 | Automation tool development started |
| 2026-02-25 21:00 | Phase 1 complete: 896 SBDs extracted from 32 PDFs |
| 2026-02-26 01:30 | Phase 2 started: Parallel API crawl (3 workers) |
| 2026-02-26 04:30 | Phase 2 complete: 896/896 records processed successfully |
| 2026-02-26 09:00 | Data merge complete: 2,449 images, 223 MB total |
| 2026-02-26 13:00 | Technical report finalized |
| 2026-03-24 | Report published |
| 2026-06-13 | Part 2 published |
| 2026-06-23 | Revalidation: 46% fixed on tec.universityx.vn, 85% on connections.universityx.vn |

C. Glossary

| Term | Definition |
|---|---|
| CCCD | Căn cước công dân - Citizen Identity Card (new format) |
| CMND | Chứng minh nhân dân - People’s Identity Card (old format) |
| University X | Pseudonym for the affected university |
| SBD | Số báo danh - Exam Seat Number |
| VSTEP | Vietnamese Standardized Test of English Proficiency |
| IDOR | Insecure Direct Object Reference |
| SaaS | Software as a Service |
| CDN | Content Delivery Network (static files, images) |
| PII | Personally Identifiable Information |
| KYC | Know Your Customer (bank/wallet verification process) |
| Company Y | Pseudonym for the company that provides and operates the Connections platform |

D. Sample Data Records

Below are two representative records extracted from the dataset, with actual personally identifiable information redacted:

// Sample Record 1 (Redacted)
{
  "sbd": "AN1***",
  "ho_ten": "[REDACTED]",
  "ngay_sinh": "28/09/2000",
  "gioi_tinh": "Nu",
  "cccd": "022XXXXXXXXX",
  "sdt": "037XXXXXXX",
  "email": "[redacted]@gmail.com",
  "don_vi": "",
  "dan_toc": "Kinh",
  "noi_sinh": "Tinh Bac Ninh",
  "img_front": "output/images/AN1***_front.jpg",
  "img_back": "output/images/AN1***_back.jpg"
}
// Sample Record 2 (Redacted)
{
  "sbd": "TQ1***",
  "ho_ten": "[REDACTED]",
  "ngay_sinh": "03/07/2000",
  "gioi_tinh": "Nu",
  "cccd": "180XXXXXXXXX",
  "sdt": "036XXXXXXX",
  "email": "180XXXXXXXX@s.universityx.edu.vn",
  "don_vi": "",
  "dan_toc": "Kinh",
  "noi_sinh": "",
  "img_front": "output/images/TQ1***_front.jpg",
  "img_back": "output/images/TQ1***_back.jpg"
}

Notable observations:

  • CCCD numbers are stored as plain text (with a leading apostrophe only for CSV formatting purposes).
  • Image files are standard JPEG photographs of physical identity cards.
  • The don_vi (workplace) field has a very low fill rate (15.1%), suggesting that most candidates are students who have not yet entered the workforce.
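The masking style used in Sample Record 1 (first three characters kept, the rest replaced) can be applied mechanically before any record is shared. The sketch below is a hypothetical helper rather than the exact routine used to prepare the samples above; the field names follow the sample records, and the input values are synthetic.

```python
def mask_digits(value: str, keep: int = 3, mask_char: str = "X") -> str:
    """Keep the first `keep` characters and mask the remainder."""
    return value[:keep] + mask_char * max(0, len(value) - keep)

def redact_record(record: dict) -> dict:
    """Return a copy of the record with direct identifiers masked."""
    redacted = dict(record)
    redacted["ho_ten"] = "[REDACTED]"
    redacted["cccd"] = mask_digits(record.get("cccd", "").lstrip("'"))
    redacted["sdt"] = mask_digits(record.get("sdt", ""))
    _local, separator, domain = record.get("email", "").partition("@")
    redacted["email"] = f"[redacted]@{domain}" if separator else ""
    return redacted

# Synthetic placeholder record; no real personal data.
example = {
    "ho_ten": "Example Name",
    "cccd": "'012345678901",
    "sdt": "0123456789",
    "email": "someone@example.com",
}
print(redact_record(example))
```

Keeping the three-digit prefix preserves the province code for analysis while removing the part of the number that identifies an individual.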

E. Reproduction Steps

  1. Download publicly listed candidate PDFs from the static file CDN and parse the candidate tables to extract exam registration numbers (SBDs).
  2. For each SBD, query the unauthenticated JavaScript API endpoints to retrieve full candidate records including CCCD numbers and personal details.
  3. Resolve image reference fields from API responses into CDN image URLs, then download the associated ID card photos.
  4. Merge PDF-extracted data and API-extracted data using SBD as the primary key to produce a complete dataset.

Note: Detailed reproduction code and tooling have been withheld from this public report to prevent misuse. Full technical details were shared with the affected parties during responsible disclosure.