How 896 ID Cards Were Exposed Through a Vietnamese University's Exam Portal
Update (June 23, 2026): Partially remediated
Company Y has added server-side authentication on the XHR API gateway and blocked account-level IDOR. However, partial candidate data remains accessible, bulk ID enumeration still works, and CDN image access has regressed. See revalidation details below.
CVSS 3.1: 9.1 (CRITICAL)
Vulnerability allows unauthenticated extraction of identity and biometric data.
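The report gives the CVSS 3.1 score but not the vector behind it. As a rough sketch, the base-score arithmetic from the public CVSS 3.1 specification can be reproduced as follows. The vector used here (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N) is an assumption: it is one vector that yields 9.1, not necessarily the one the author scored. The helper names are illustrative, and only Scope: Unchanged is handled.

```python
import math

# CVSS 3.1 metric weights for Scope: Unchanged (from the public specification).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR = {"N": 0.85, "L": 0.62, "H": 0.27}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value: float) -> float:
    """Round up to one decimal place using the integer method from the spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector: str) -> float:
    """Compute the CVSS 3.1 base score for a Scope: Unchanged vector string."""
    m = dict(part.split(":") for part in vector.split("/"))
    if m["S"] != "U":
        raise ValueError("this sketch only handles Scope: Unchanged")
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * PR[m["PR"]] * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))


# Assumed vector, chosen only because it reproduces the published 9.1.
ASSUMED_VECTOR = "AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N"
print(base_score(ASSUMED_VECTOR))
```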
Summary
University X is one of Vietnam’s well-known public universities, with roughly 40,000 students across multiple faculties. Its Testing Center administers standardized language proficiency exams for hundreds of candidates each cycle. To register, candidates must submit deeply personal information: full legal name, date of birth, phone number, email, national ID (CCCD) number, and critically, high-resolution photos of the front and back of their government-issued identity cards. This data, if exposed, represents a complete identity dossier that cannot be “reset” like a password.
The discovery began with a routine Google search. A query for exam-related keywords returned a direct link to the Testing Center’s website, and clicking through revealed a candidate’s complete registration profile, including their CCCD number and photos of their physical identity card, displayed on an open webpage. No login was required. No special tools. Just a Google search and a click.
By reverse engineering the platform’s JavaScript framework, I confirmed that this was not an isolated incident. The entire system, built by a third-party vendor called Company Y on their “Connections” SaaS platform, had no access controls separating public visitors from the database. The personal data of all 896 candidates, including CCCD/CMND numbers, ethnicity, place of birth, and front and back photos of their national ID cards, could be systematically extracted by anyone with a web browser.
1. Introduction
1.1. Background
University X is one of Vietnam’s well-known public universities, home to roughly 40,000 students across multiple faculties. It is particularly recognized for its foreign language and international studies programs. The university’s Testing Center operates a web system at tec.universityx.vn to administer standardized language proficiency exams such as VSTEP (Vietnamese Standardized Test of English Proficiency) for hundreds of candidates each cycle. The system handles the entire exam lifecycle: registration, room assignment, candidate list publication, and result announcements.
During registration, candidates must submit deeply personal information: full legal name, date of birth, phone number, email address, national ID (CCCD) number, ethnicity, and critically, high-resolution photos of the front and back of their government-issued identity cards. This data, if exposed, represents a complete identity dossier for each individual – one that, unlike a compromised password, can never be changed or revoked.
1.2. Third-Party Platform: Company Y
Like many Vietnamese institutions, University X does not build or maintain its own software. Instead, the entire exam management system, including the database, API (Application Programming Interface) layer, file storage, and CDN (Content Delivery Network), runs on a SaaS platform called “Connections,” developed and operated by Company Y (companyy.com).
This means University X handed over complete control of their candidates’ most sensitive data to an external vendor. The university likely trusts that this platform has proper security measures in place. The question this report answers: does it?
The answer is no. The Connections platform provides no access control boundary whatsoever between a random internet visitor and the personal data stored in its database.
1.3. Motivation: Discovery via Google Search
It started with a simple Google search. Searching for exam-related keywords returned a direct link to the Testing Center’s website. Clicking through revealed a candidate’s complete registration form: full name, date of birth, CCCD number, and even photos of their physical identity card – all displayed on a public webpage.
No login was required. No special tools. Just a Google search and a click.
This raised the critical question: was this one candidate’s data accidentally exposed, or was every single candidate’s personal information wide open?
The answer, as this report demonstrates, is the latter.
Figure 1: Google search results revealing candidate registration data including full name, date of birth, and CCCD number – publicly indexed and accessible to anyone.
1.4. Scope and Ethics
The research was conducted strictly for security assessment purposes:
- All data access used publicly available endpoints.
- No authentication was bypassed - because none existed to bypass.
- No data was modified, deleted, or exfiltrated to third parties.
- No brute-force attacks were performed against protected resources.
- The techniques described replicate what any internet user with basic technical knowledge could perform through a web browser’s developer console.
1.5. Technical Attack Classification
The following MITRE ATT&CK techniques and OWASP categories are relevant to the methods used in this research:
| Technique | Framework | Application in Research |
|---|---|---|
| IDOR (Insecure Direct Object Reference) | OWASP Top 10 | Database object IDs used directly in unauthenticated API calls |
| Broken Access Control | OWASP Top 10 | No authentication on any API endpoint |
| API Abuse | OWASP API Top 10 | Unlimited access to search and data retrieval operations |
| Search Open Websites/Domains: Search Engines (T1593.002) | MITRE ATT&CK | Google dorking to discover indexed PII pages |
| Active Scanning (T1595) | MITRE ATT&CK | Probing API endpoints and database schema |
| Data from Info Repos (T1213) | MITRE ATT&CK | Extracting data from exposed database APIs |
| JS API Hijacking | Web Security | Invoking internal framework functions via page.evaluate() |
| Foreign Key Traversal | Database Security | Following foreign keys to discover hidden tables |
| CDN URL Harvesting | Web Security | Extracting encoded image URLs from rendered DOM |
2. Infrastructure Mapping: University X, Company Y, and Connections
Before diving into the vulnerability, it is important to understand who actually runs this system and how the pieces fit together. The Testing Center’s website is not what it appears to be on the surface.
2.1. Discovering the Vendor Relationship
The first step in the analysis was understanding who actually operates the infrastructure behind tec.universityx.vn. Inspecting the HTML source revealed that the web application loads its core JavaScript framework from external domains:
<script src="https://cdn.companyy.com/js/jquery.main.isj"></script>
<script src="https://cdn.companyy.com/js/include.core.isj"></script>
The domain companyy.com belongs to Company Y, a technology company in Vietnam. Further analysis revealed a sprawling multi-domain infrastructure:
| Domain | Role | Relationship to University X |
|---|---|---|
| companyy.com | Company domain | Platform vendor |
| cdn.companyy.com | JavaScript CDN | Delivers JS/CSS framework |
| xhr.companyy.com | API gateway | Handles all XHR (XMLHttpRequest, the browser's background data exchange API) calls to the database |
| tts.companyy.vn | Framework server | Hosts application configuration files |
| connections.vn | Platform brand | “Connections” SaaS platform |
| connections.universityx.vn | University X-specific API | Dedicated API endpoint for University X |
| local.universityx.connections.vn | File storage | Stores PDFs and uploaded files |
| i0.connections.vn | Image CDN (node 0) | Serves ID card photos |
| i3.connections.vn | Image CDN (node 3) | Serves ID card photos |
| thuctap.companyy.com | CSS server | Delivers CSS for sub-applications |
2.2. “Connections” Platform Architecture
The “Connections” platform appears to be a general-purpose SaaS framework similar to Salesforce or Airtable – providing:
- A database layer where tables are identified by 32-character hexadecimal hashes
- A client-side JavaScript framework with Vietnamese API functions (xửLý, CĂN.db, config)
- A file storage CDN at local.{org}.connections.vn
- An image CDN at i{N}.connections.vn with encoded URL parameters
- A multi-tenant architecture where multiple organizations (University X, and potentially others) share the same infrastructure
The critical security implication: University X’s candidate data including ID card photos is stored on and served by Company Y’s shared infrastructure, not on servers controlled by University X.
2.3. Subdomain Structure and Data Flow
User's Browser
|
|-- (1) GET tec.universityx.vn/page
| |-- Loads HTML skeleton
|
|-- Loads JS from cdn.companyy.com
| |-- Loads CSS from thuctap.companyy.com
|
|-- (2) XHR to xhr.companyy.com/xhr/ (or connections.universityx.vn/xhr/)
|
|-- "doiTuong.tai.{table_hash}" = database query
| |-- Returns list of object IDs
|
|-- (3) XHR to xhr.companyy.com/xhr/
| |-- "CAN.db({table}.{id})" = load object
|
|-- Returns ALL fields including CCCD, phone number, etc.
|
|-- (4) GET local.universityx.connections.vn/upload/{cat}/{date}/{file}
|
|-- Downloads PDF files (candidate lists)
|
|-- (5) GET i0.connections.vn/{encoded_path}?q={encoded_token}
|-- Downloads ID card photos (front/back)
None of these requests require authentication.
3. Reverse Engineering the Database Structure
The next step was understanding how the database is organized. This involved reading the website’s JavaScript code, which turned out to be written entirely in Vietnamese, and discovering that database table identifiers and record IDs are passed around without any security checks.
3.1. Phase 1: URL Structure Analysis
The initial URL indexed by Google provided the first clue about the database structure:
https://tec.universityx.vn/7fdc5fa41f345xxxx4bba6b0d3e449385/1518250/2368M35018
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^ ^^^^^^^^^^
Table hash (Registration) Obj ID File number
This URL directly exposes:
- Registration table hash: 7fdc5fa41f345xxxx4bba6b0d3e449385
- Registration object ID: 1518250
- Candidate’s file number: 2368M35018
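The three-part path layout can be made concrete with a small parser. This is only an illustrative sketch: the `RecordRef` type and `parse_record_url` function are names introduced here, and the segment order is taken from the single indexed URL shown above rather than from any vendor documentation.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class RecordRef(NamedTuple):
    table_hash: str   # identifies the database table
    object_id: str    # identifies one row (object) in that table
    file_number: str  # the candidate's file number


def parse_record_url(url: str) -> RecordRef:
    """Split a /{table_hash}/{object_id}/{file_number} path into its parts."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) != 3:
        raise ValueError(f"expected 3 path segments, got {len(segments)}")
    return RecordRef(*segments)


ref = parse_record_url(
    "https://tec.universityx.vn/7fdc5fa41f345xxxx4bba6b0d3e449385/1518250/2368M35018"
)
print(ref.table_hash, ref.object_id, ref.file_number)
```

The point of the sketch is that every identifier needed to address a database object is already present in a publicly indexed URL; nothing in the path acts as a secret.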
3.2. Phase 2: Reverse Engineering the JavaScript Framework
The framework source at cdn.companyy.com/js/include.core.isj is heavily minified and uses Vietnamese identifiers. Through runtime analysis (executing functions in the browser console and observing XHR traffic), I identified the core API functions:
| Function | Discovered Behavior |
|---|---|
| xửLý(action, params, opts, cb) | Main dispatcher. Sends XHR to xhr.companyy.com/xhr/. The action string determines the operation. |
| CĂN.db(key, callback) | Loads a database object by key (format: table_hash.object_id). Caches results in local memory. |
| config(key) | Retrieves a previously loaded object from local cache. Returns a JavaScript object with field IDs (numbers) as keys. |
| dữLiệu | Global object containing the page’s data context. |
3.3. Phase 3: Discovering Database Schema via IDOR
The key discovery was that the API uses Insecure Direct Object References (IDOR): database table hashes and object IDs are passed directly from client to server with no authorization checks. This allowed me to:
- Obtain the Registration table hash (visible in the URL)
- Query the Registration table using xửLý("đốiTượng.tải.{hash}", ...) with arbitrary field filters
- Load a Registration object and inspect all its fields, discovering field IDs, data types, and foreign key references
- Discover the Candidate table hash by following the foreign key in field 1686869 (Candidate ID), which references objects in table 3576ff3533bb4xxxx8e394a0aa83a461f
- Load Candidate objects to access the full set of personal data fields
3.4. Phase 4: Field-by-Field Mapping
Each database object is returned as a dictionary with numeric string keys (field IDs). I mapped them by:
- Loading multiple candidate objects
- Cross-referencing field values with rendered page content
- Identifying field types (string, integer, date, reference, file)
Field Type System
The framework uses single Vietnamese characters to mark field types in stored values:
| Symbol | Storage Format | Meaning |
|---|---|---|
| (none) | Plain string/number | Direct value (name, phone, CCCD) |
| {"ậ":["ID"]} | JSON with reference ID | Reference to another object (ethnicity, province) |
| {"ị":["ID"]} | JSON with file ID | Reference to an uploaded file/image |
The "ậ" (reference) type stores a pointer to a lookup object. For example, the ethnicity field 1626773 stores {"ậ":["146992"]}, which resolves to “Kinh” when the page renders. The "ị" (file) type stores a pointer to an uploaded file. For example, field 1658487 stores {"ị":["4296"]}, which resolves to a CCCD photo on the image CDN.
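The type markers described above can be told apart with a few lines of code. The following is a minimal sketch, assuming stored values arrive either as plain strings or as JSON text shaped like the examples in the table; `classify_field` is a name introduced here and is not part of the Connections framework.

```python
import json
import unicodedata

# Marker characters from the table above, normalized so that composed and
# decomposed Unicode spellings of the same character compare equal.
REFERENCE_KEY = unicodedata.normalize("NFC", "ậ")
FILE_KEY = unicodedata.normalize("NFC", "ị")


def classify_field(raw: str):
    """Return (kind, value), where kind is 'plain', 'reference', or 'file'."""
    try:
        parsed = json.loads(raw)
    except ValueError:
        return ("plain", raw)  # not JSON at all, e.g. a name
    if isinstance(parsed, dict):
        keys = {unicodedata.normalize("NFC", k): v for k, v in parsed.items()}
        if REFERENCE_KEY in keys:
            return ("reference", list(keys[REFERENCE_KEY]))
        if FILE_KEY in keys:
            return ("file", list(keys[FILE_KEY]))
    return ("plain", raw)  # numbers and other JSON scalars stay as stored


print(classify_field('{"ậ":["146992"]}'))  # ethnicity reference
print(classify_field('{"ị":["4296"]}'))    # uploaded file reference
print(classify_field("2368M35018"))        # plain string
```

Note that a reference or file ID is just another object ID, so the type system adds structure but no protection.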
Registration Table Field Map (Reverse Engineered)
| Field ID | Field Name | Type | Example |
|---|---|---|---|
| 1626725 | Exam Session ID | Foreign Key (FK) | 54914 |
| 1626730 | File Number | String | 2368M35018 |
| 1630529 | Status Code | Integer | 1 |
| 1630538 | Active Flag | Boolean | 1 |
| 1642331 | SBD (Exam Seat Number) | String | AN1001 |
| 1654914 | Registration Timestamp | Timestamp | 1726716820 |
| 1686869 | Candidate ID | Foreign Key (FK) | 582319 |
| 1704995 | Scores | JSON | [{“nghe”:”8.5”}] |
| mãĐịnhDanh | Identifier Code | String | V2KT2106AN1114 |
| tổngTiền | Exam Fee | Integer | 1800000 |
Candidate Table Field Map (Reverse Engineered)
| Field ID | Field Name | Type | Example / Notes |
|---|---|---|---|
| 1686868 | Full Name | String | Nguyễn Thị xxxx Trà |
| 1626768 | Last Name & Middle Name | String | Nguyễn Thị xxxx |
| 1626772 | First Name | String | Trà |
| 1626773 | Ethnicity | Reference | {“ậ”:[“146992”]} → Kinh |
| 1626783 | Date of Birth | Date | 28/09/2000 |
| 1626784 | Gender | Enum | 1=Male, 2=Female |
| 1626788 | Phone Number | String | 037xxxx973 |
| 1626793 | Email | String | email@gmail.com |
| 1626818 | Place of Birth | Reference | {“ậ”:[“147000”]} → Bắc Kạn Province |
| 1626820 | Province/City (Current) | Reference | {“ậ”:[“146999”]} |
| 1646777 | CCCD/CMND Number | String | 0222xxxx2576 |
| 1658487 | CCCD Photo (Front) | File Reference | {“ị”:[“4296”]} |
| 1658488 | CCCD Photo (Back) | File Reference | {“ị”:[“243”]} |
| 2102859 | Workplace | String | Free text |
3.5. Phase 5: Foreign Key Traversal Diagram
The complete traversal path from a public URL to ID card photos:
+-------------------------------------------+
| Public URL |
| tec.universityx.vn/{hash}/{id}/{file} |
+---------------------+---------------------+
|
exposes table hash + object ID
|
v
+-------------------------------------------+
| Registration Object |
| Table: 7fdc5fa4... |
| Contains SBD, file number |
+---------------------+---------------------+
|
foreign key field 1686869
|
v
+-------------------------------------------+
| Candidate Object |
| Table: 3576ff35... |
| Contains ALL PII |
+----------+----------------+---------------+
| |
v v
+-------------------+ +----------------------+
| Personal Data | | ID Card Photos |
| CCCD, Phone, | | Front + Back |
| Email, Ethnicity, | | on i0.connections.vn |
| Place of Birth | | |
+-------------------+ +----------------------+
4. CDN Discovery: Image and File Infrastructure
Identity card photos are stored on a separate image server, not in the database itself. Understanding how these image URLs are generated was key to proving that photos could be downloaded in bulk by anyone.
4.1. Static File CDN: local.universityx.connections.vn
All uploaded documents (PDFs, candidate lists) are stored on a static file server with a predictable URL structure:
https://local.universityx.connections.vn/upload/{category_id}/{YYYY/MM/DD}/{uuid_filename}
File metadata including all URL components is embedded in the main page’s HTML as a cached JSON blob. The metadata uses minified Vietnamese field names:
| JSON Key | Meaning | Example |
|---|---|---|
| "i" | File ID | 534167 |
| "ạ" | Category ID | 25 |
| "ô" | Server hostname | local.universityx.connections.vn |
| "ớ" | Date path | 2024/09/19 |
| "ũ" | Original filename | Danh sach phong thi.pdf |
| "ợ" | Server filename (UUID) | f2247559bcac…03.pdf |
I discovered 32 candidate list PDFs by parsing this metadata and filtering for filenames containing “danh sach” or “phong thi.” One implementation detail mattered: the date path is stored with JSON-escaped forward slashes (2024\/09\/19), and using it verbatim produced HTTP 404 errors until the tool unescaped the path.
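The escaping issue is easy to reproduce in isolation. In JSON, `\/` is a legal escape for `/`, so a real JSON parser decodes it automatically, while text-level extraction (for example, a regular expression over the HTML) keeps the backslashes. The snippet below is a small sketch of that difference; the one-field blob is a made-up stand-in for the page's metadata, not the actual payload.

```python
import json
import re

DATE_KEY = "ớ"  # date-path key from the table above

# Stand-in metadata blob with JSON-escaped slashes, as embedded in the HTML.
raw_blob = '{"%s": "2024\\/09\\/19"}' % DATE_KEY

# Text-level extraction keeps the backslashes and yields a broken path.
naive = re.search(r'"(\d{4}[^"]+)"', raw_blob).group(1)

# Parsing as JSON decodes the escape sequence to a plain slash.
parsed = json.loads(raw_blob)[DATE_KEY]

print(naive)   # 2024\/09\/19
print(parsed)  # 2024/09/19
```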
4.2. Image CDN: i{N}.connections.vn
The most sensitive discovery was the image CDN infrastructure. ID card photos are served from a load-balanced CDN with nodes i0.connections.vn, i3.connections.vn, etc.
Encoded Image URLs
Unlike the static file CDN (which uses human-readable paths), the image CDN uses encoded/obfuscated paths and query parameters:
# CCCD front photo:
https://i0.connections.vn/kt3PG1hPTgTPG1MJ.u54.L8JKx-X.LDJ9Ei
tREHqGaAN9ZWP6gM46Wbb?q=wqQ2KLBn6g3dDaf29mOnD7stDafo9aAo...
# CCCD back photo:
https://i0.connections.vn/kt3PG1hPTgTPG1MJ.u54.L8JKx-X.LDJ9Ei
z9mONGaAN9ZWP6gM46Wbb?q=wqQ2KLBn6g3dDaf29mOnD7stDafo9aAo...
# Portrait photo:
https://i0.connections.vn/kt3PG1hPTgTPG1MJ.u54.L8JKx-X.LDJ9Ei
tREHdGaAN9ZWP6gM46Wbb?q=wqQ2KLBn6g3dDaf29mOnD7stDafo9aAo...
Key observations about URL structure:
- The URL path encodes the file reference - different images for the same candidate differ by a few characters in the path
- The q= query parameter appears to be a session or authentication token, but is identical across all images in a single page load, suggesting it is a page-level token rather than per-image
- These URLs cannot be guessed: they can only be obtained by rendering a candidate’s detail page in a browser (the JavaScript framework generates them at runtime)
- However, once the page is rendered, these URLs can be downloaded directly via simple HTTP GET without any additional cookies or headers
How Image URLs Are Generated
The JavaScript framework generates image CDN URLs during page rendering through the following process:
- Reads the file reference field (e.g., {"ị":["4296"]})
- Encodes the file ID, organization context, and a session token into the URL path and query string
- Assigns the URL as a CSS background-image attribute on a <div> element
This means image URLs cannot be constructed programmatically from file IDs alone – the JavaScript framework’s encoding algorithm must be executed in a browser environment. My tool solves this by rendering each candidate’s page in a headless browser and extracting background-image URLs from the DOM.
Image Download Verification
Downloaded images were verified as actual ID card photos:
- File sizes ranged from 44KB to 228KB (consistent with phone camera photos of ID cards)
- Valid JPEG image files
- Three images per candidate (typically): portrait photo, CCCD front, CCCD back
- Images contain readable text including the candidate’s name, CCCD number, date of birth, and address printed on the physical card
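The "valid JPEG" check in the list above can be approximated without an imaging library by looking at the file's marker bytes. This is a simplified sketch, not the author's verification code: `looks_like_jpeg` is a hypothetical helper, and it only checks the start-of-image and end-of-image markers rather than decoding the picture.

```python
def looks_like_jpeg(data: bytes) -> bool:
    """Heuristic: JPEG streams start with FF D8 FF and contain an FF D9 end marker."""
    return (
        len(data) >= 4
        and data[:3] == b"\xff\xd8\xff"
        and data.rfind(b"\xff\xd9") > 2
    )


# Synthetic byte strings, used only to exercise the check.
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16 + b"\xff\xd9"
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

print(looks_like_jpeg(fake_jpeg))  # True
print(looks_like_jpeg(fake_png))   # False
```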
Figure 3: File explorer showing 2,600+ downloaded national ID card photos (portrait, front, and back) of exam candidates.
5. Complete Attack Chain
Here is the complete chain of steps, from the initial Google discovery to downloading all 896 candidates’ identity card photos. Each step builds on the previous one, and none of them require any form of authentication.
5.1. Overview
The attack chain combines multiple techniques to escalate from a single Google result to full PII extraction including ID card photos:
- Google Dorking (Reconnaissance)
- Source Code Analysis (Framework Identification)
- JavaScript API Reverse Engineering (Schema Discovery)
- IDOR Exploitation (Database Traversal)
- Foreign Key Traversal (Cross-Table Access)
- PDF Metadata Extraction (Seed Data Collection)
- Headless Browser API Injection (Bulk Extraction) – A headless browser is a web browser running in the background without a graphical interface, enabling automation (e.g., Playwright, Puppeteer)
- DOM Scraping (Reference Resolution + Image Harvesting)
- CDN Image Download (ID Card Photo Extraction)
5.2. Step 1: Google Dorking - Initial Discovery
A routine Google search containing a candidate’s name and University X-related keywords returned a direct link to the Testing Center website:
TRUNG TAM KHAO THI TRUONG DAI HOC X
https://tec.universityx.vn/7fdc5fa41f345xxxx4bba6b0d3e449385/1518250/2368M35018
"Phieu dang ky thi nang luc ngoai ngu..."
The rendered page displayed the full exam registration form including the CCCD number, date of birth, ethnicity, contact information, and ID card photos.
5.3. Step 2: Source Code Analysis - Identifying Company Y
Inspecting the page source revealed:
- JavaScript loaded from cdn.companyy.com (Company Y)
- CSS loaded from thuctap.companyy.com
- API calls to xhr.companyy.com and connections.universityx.vn
- Framework configuration at tts.companyy.vn/nguyendinhhuy
- Vietnamese function names (xửLý, CĂN.db, config)
5.4. Step 3: Reverse Engineering the JavaScript Framework API
Using the browser’s developer console, I probed the framework’s global scope:
> typeof xuLy // "function" main API handler
> typeof CAN.db // "function" database loader
> typeof config // "function" config retriever
> typeof duLieu // "object" page data context
> CAN // {fn, khoa, lib, js, db, _db}
By intercepting XHR traffic in the Network tab while loading a candidate page, I observed the request/response pattern:
POST https://xhr.companyy.com/xhr/
Request: {action: "doiTuong.tai.7fdc5fa4...", d: {thuocTinh: {...}}}
Response: ["1518250", "1518251", ...] // List of Object IDs
5.5. Step 4: IDOR + Foreign Key Traversal
With the API functions identified, I exploited IDOR to traverse the database:
// 1. Find Registration by SBD (exam seat number from PDF)
xuLy("doiTuong.tai.7fdc5fa41f345xxxx4bba6b0d3e449385",
{d: {thuocTinh: {"1642331": "AN1001"}}}, {}, function(ids) {
console.log(ids);
// ["1518250"]
});
// 2. Load Registration object -> discover Candidate table
CAN.db("7fdc5fa41f345xxxx4bba6b0d3e449385.1518250", function() {
var reg = config("7fdc5fa41f345xxxx4bba6b0d3e449385.1518250");
console.log(reg["1686869"]); // "582319" (Candidate ID)
// Candidate table hash discovered from foreign key relationship
});
// 3. Load Candidate object -> access ALL personal data
CAN.db("3576ff3533bb4xxxx8e394a0aa83a461f.582319", function() {
var c = config("3576ff3533bb4xxxx8e394a0aa83a461f.582319");
console.log(c["1646777"]); // "0222xxxx2576" (CCCD number)
console.log(c["1626788"]); // "098xxxx321" (Phone number)
console.log(c["1626793"]); // "email@example.com"
console.log(c["1626773"]); // {"ậ":["146992"]} (Ethnicity reference)
console.log(c["1658487"]); // {"ị":["4296"]} (CCCD front photo)
console.log(c["1658488"]); // {"ị":["243"]} (CCCD back photo)
});
5.6. Step 5: Seed Data Extraction from PDFs
To enumerate all candidates, I extracted the list of SBD (exam seat numbers) from 32 publicly accessible PDF files. PDF metadata was embedded in the website’s HTML as a cached JSON object with abbreviated Vietnamese field names. After unescaping JSON-encoded paths, all PDFs were downloadable from the static file CDN at local.universityx.connections.vn. Result: 896 unique SBDs extracted from 32 PDFs.
5.7. Step 6: Automated Bulk Extraction
I developed a Python tool (crawl_xxxx.py) that automates the entire chain using Playwright (headless Chromium browser):
# Each worker runs 2 browser pages:
#   api_page:    stays on /dangkythi for fast JS API calls
#   render_page: navigates to each candidate's detail page
def lookup_sbd(api_page, render_page, sbd):
    # Fast API lookup (~2 seconds)
    reg_ids = _api_search(api_page, REG_TABLE, {F_SBD: sbd})
    reg_id = reg_ids[0]
    reg_data = _api_load_object(api_page, REG_TABLE, reg_id)
    cand_data = _api_load_object(api_page, CAND_TABLE, reg_data[F_CAND_ID])
    # -> CCCD, phone, email, workplace now retrieved
    # F_FILE_NUM is assumed to be the constant for field 1626730 (File Number)
    file_num = reg_data[F_FILE_NUM]
    # Render page for references + images (~20 seconds)
    render_page.goto(f"https://tec.universityx.vn/{REG_TABLE}/{reg_id}/{file_num}")
    # Smart wait: wait until "Dan toc" text appears
    # -> Extract ethnicity, place of birth from DOM text
    # -> Extract image URLs from CSS background-image attributes
    # -> Download ID card photos from connections.vn CDN
Figure 2: Spreadsheet of extracted candidate data including names, registration numbers, CCCD numbers, dates of birth, emails, phone numbers, ethnicity, and place of birth demonstrating the scale of the data exposure.
5.8. Step 7: Parallel Execution with Adaptive Throttling
The tool supports 3 parallel workers by default, each with its own Playwright browser instance (for thread safety). If the server starts rejecting requests (3 consecutive failures), the tool automatically falls back to a single worker:
Workers: 3 (Default)
|
+-- Worker 1: Browser + 2 pages (api + render)
+-- Worker 2: Browser + 2 pages (api + render)
+-- Worker 3: Browser + 2 pages (api + render)
|
[3 consecutive failures on any worker]
|
v
Workers: 1 (Fallback)
+-- Worker 1: Browser + 2 pages (api + render)
5.9. Step 8: Graceful Error Handling
The tool implements multiple robust recovery mechanisms:
- Ctrl+C Handling: When interrupted, the tool immediately saves all collected data to CSV before exiting.
- Network Error Recovery: On timeout, the tool resets the browser session and continues.
- Periodic Saving: Flushes data to CSV every 10 candidates to minimize data loss.
- Resume Support: On restart, the tool reads the existing CSV file and skips already-processed SBDs.
- Reference Caching: Ethnicity and Place of Birth lookups are cached (there are only ~54 ethnic groups and ~63 provinces in Vietnam), so page rendering becomes unnecessary for previously encountered reference IDs.
6. Results: Extracted Data
6.1. Data Volume
| Metric | Value |
|---|---|
| Total candidates (from PDFs) | 896 |
| Candidates with resolved CCCD | 896 (100%) |
| Candidates with ethnicity info | 818 (91.3%) |
| Candidates with place of birth | 576 (64.3%) |
| ID card photos downloaded | 2,449 (portrait + front + back) |
| Success rate | 100% |
| Workers used | 3 (parallel) |
| Processing speed | ~5 candidates/minute (including page rendering) |
6.2. Extracted Data Fields per Candidate
| Field | Source | Sensitivity Level | Example |
|---|---|---|---|
| SBD (Exam Seat Number) | PDF | Low | AN1001 |
| Full Name | API | Medium | Nguyễn xxx xxxx Trà |
| Date of Birth | API + PDF | Medium | 28/09/2000 |
| Gender | API + PDF | Low | Female |
| CCCD/CMND Number | API | Critical | 0222xxxx2576 |
| Phone Number | API | High | 037xxxx973 |
| Email | API | High | email@gmail.com |
| Workplace | API | Medium | Free text |
| Ethnicity | Rendered DOM | High | Kinh |
| Place of Birth | Rendered DOM | Medium | Bắc Kạn Province |
| ID Card Photo (Front) | CDN | Critical | JPEG, 44–228 KB |
| ID Card Photo (Back) | CDN | Critical | JPEG, 44–228 KB |
| Exam Scores | API + PDF | Medium | 8.5/6.0/5.5/7.0 |
| File Number | API | Low | 2368M35018 |
6.3. Output Directory Structure
output/
candidates_phase1.csv # 896 candidates (SBD, name, DOB, gender, scores)
candidates_phase2.csv # 896 candidates (+ CCCD, phone, email, ethnicity,
# place of birth, image paths)
candidates_full.csv # Final merged complete dataset
images/
AN1001_front.jpg # ID card front photo
AN1001_back.jpg # ID card back photo
AN1001_extra.jpg # Portrait/additional candidate photo
AN1002_front.jpg
... # 2,449 images total
pdfs/ # 32 original candidate list PDFs
file_index.json # PDF file index metadata
7. Detailed Data Analysis
This section presents statistical analysis of the 896 candidate records extracted from candidates_phase2.csv. All statistics were computed programmatically from the raw data.
7.1. Demographics
- Total candidates: 896 unique individuals
- Gender: 661 Female (73.8%), 235 Male (26.2%)
- Age range (birth years): 1969–2005 (37-year span)
- Median birth year: ≈ 2000 (most common: 2000 with 234 candidates, followed by 1998 with 117 and 1999 with 102)
- The roughly 3:1 female-to-male ratio (661:235, about 2.8:1) and the concentration around 1998–2002 birth years are consistent with the profile of foreign language proficiency exam candidates at a university specializing in foreign languages.
7.2. Data Completeness
| Field | Populated | Empty | Fill Rate |
|---|---|---|---|
| SBD (Exam Seat Number) | 896 | 0 | 100.0% |
| Full Name | 896 | 0 | 100.0% |
| Date of Birth | 896 | 0 | 100.0% |
| Gender | 896 | 0 | 100.0% |
| CCCD/CMND | 896 | 0 | 100.0% |
| Phone Number | 896 | 0 | 100.0% |
| Email | 896 | 0 | 100.0% |
| Ethnicity | 818 | 78 | 91.3% |
| ID Card Photo (Front) | 820 | 76 | 91.5% |
| ID Card Photo (Back) | 816 | 80 | 91.1% |
| Place of Birth/Province | 576 | 320 | 64.3% |
| Workplace | 135 | 761 | 15.1% |
The 7 core PII fields (name, date of birth, gender, CCCD, phone number, email, file number) all have a 100% fill rate. The low workplace fill rate (15.1%) suggests most candidates are students who left this non-required field blank.
7.3. Email Domain Analysis
| Domain | Count | % |
|---|---|---|
| gmail.com | 766 | 85.5% |
| s.universityx.edu.vn | 87 | 9.7% |
| universityx.edu.vn | 22 | 2.5% |
| yahoo.com / yahoo.com.vn | 4 | 0.4% |
| gmail.con (typo) | 3 | 0.3% |
| Other (.edu.vn, .gov.vn) | 14 | 1.6% |
The 3 instances of the gmail.con typo and the 87 students using their student ID as a prefix for University X email ({student_id}@s.universityx.edu.vn) are noteworthy: the typos confirm this is real user-entered data, while the latter pattern inadvertently creates a secondary channel exposing student ID numbers.
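A domain breakdown like the one above can be computed with a simple tally. The sketch below is an assumption about how such a table might be produced, not the author's script; the addresses are synthetic placeholders, and `domain_breakdown` is a name introduced here.

```python
from collections import Counter


def domain_breakdown(emails):
    """Return (domain, count, percent) rows sorted by descending count."""
    domains = Counter(
        e.rsplit("@", 1)[-1].strip().lower() for e in emails if "@" in e
    )
    total = sum(domains.values())
    return [(d, n, round(100 * n / total, 1)) for d, n in domains.most_common()]


# Synthetic sample; no real addresses are used.
sample = ["a@gmail.com", "b@gmail.com", "c@Gmail.com", "d@gmail.con"]
for row in domain_breakdown(sample):
    print(row)
```

Lower-casing the domain before counting keeps `Gmail.com` and `gmail.com` in one bucket, while a typo such as `gmail.con` still surfaces as its own row.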
7.4. CCCD/CMND Format Analysis
Vietnam has issued identification documents in three formats, all of which appear in this dataset:
| Format | Count | % | Description |
|---|---|---|---|
| 12-digit CCCD | 474 | 52.9% | New Citizen Identity Card (post-2021) |
| 10-digit CMND | 271 | 30.2% | Legacy People’s Identity Card (CMND), 10-digit |
| 9-digit CMND | 135 | 15.1% | Legacy People’s Identity Card (CMND), 9-digit |
| Other lengths | 16 | 1.8% | Student IDs/Passport numbers/Data entry errors |
6 duplicate CCCD numbers were found (each appearing in 2 registrations), indicating candidates who registered for exams multiple times. This confirms these are real registration records spanning from September 2024 to February 2026.
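The format buckets and the duplicate check can be expressed as a short length-based classifier. This is a sketch under the assumption that classification was done purely by digit count, as the table suggests; the function names are hypothetical and the numbers below are synthetic.

```python
from collections import Counter


def classify_id_number(value: str) -> str:
    """Bucket an ID number by digit count, mirroring the table's categories."""
    digits = value.strip()
    if digits.isascii() and digits.isdigit():
        if len(digits) == 12:
            return "12-digit CCCD"
        if len(digits) == 10:
            return "10-digit CMND"
        if len(digits) == 9:
            return "9-digit CMND"
    return "Other lengths"


def duplicated_numbers(values):
    """Return ID numbers that appear in more than one registration."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


# Synthetic values only.
sample = ["000000000001", "000000000001", "0000000002", "000000003", "B1234567"]
print(Counter(classify_id_number(v) for v in sample))
print(duplicated_numbers(sample))
```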
7.5. Geographic Distribution
The first 3 digits of a 12-digit CCCD encode the province or city where the holder’s birth was registered. Among the 474 new-format CCCDs:
| Code | Province/City | Count | % |
|---|---|---|---|
| 001 | Hanoi | 144 | 30.4% |
| 036 | Nam Dinh | 38 | 8.0% |
| 038 | Thanh Hoa | 36 | 7.6% |
| 034 | Thai Binh | 31 | 6.5% |
| 030 | Hai Duong | 25 | 5.3% |
| 024 | Bac Giang | 23 | 4.9% |
| 035 | Ha Nam | 16 | 3.4% |
| 033 | Hung Yen | 16 | 3.4% |
| 027 | Bac Ninh | 15 | 3.2% |
| 037 | Ninh Binh | 15 | 3.2% |
The distribution is heavily concentrated in the Red River Delta and neighboring northern provinces, consistent with University X’s location in Hanoi. Hanoi codes alone account for 30.4% of new-format CCCD holders.
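The province lookup behind this table amounts to reading a three-digit prefix. The sketch below uses only the ten codes listed above, so any other prefix returns `None`; `province_of` is a name introduced here, and the sample numbers are synthetic.

```python
from typing import Optional

# Province codes taken from the table above (not a complete list).
PROVINCE_CODES = {
    "001": "Hanoi", "036": "Nam Dinh", "038": "Thanh Hoa", "034": "Thai Binh",
    "030": "Hai Duong", "024": "Bac Giang", "035": "Ha Nam", "033": "Hung Yen",
    "027": "Bac Ninh", "037": "Ninh Binh",
}


def province_of(cccd: str) -> Optional[str]:
    """Map a 12-digit CCCD to a province name via its first three digits."""
    if len(cccd) == 12 and cccd.isascii() and cccd.isdigit():
        return PROVINCE_CODES.get(cccd[:3])
    return None  # legacy 9/10-digit numbers carry no such prefix in this sketch


print(province_of("001" + "0" * 9))  # Hanoi
print(province_of("123456789"))      # None
```

This illustrates why the ID number alone is sensitive: part of a person's origin can be read from it without any other field.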
7.6. Image Repository Statistics
| Image Type | Count | Description |
|---|---|---|
| *_front.jpg | 820 | CCCD/CMND front photo |
| *_back.jpg | 816 | CCCD/CMND back photo |
| *_extra.jpg | 813 | Candidate portrait photo |
| Total | 2,449 | 223 MB on disk |
The approximately 76–80 candidates missing images most likely registered before the mandatory ID card photo upload requirement was implemented, or uploaded documents in non-standard formats that the DOM scraper could not extract.
7.7. PDF Source Analysis
| Time Period | PDF Count | Exam Type |
|---|---|---|
| September 2024 | 4 | C1 English, Chinese, Japanese, Korean |
| March 2025 | 4 | C1 English, Chinese, Japanese, Korean |
| May 2025 | 1 | University X Test |
| June 2025 | 6 | NN2 SĐH (Second Foreign Language for Graduate Studies) |
| October 2025 | 4 | English, Chinese, Japanese, Korean |
| January 2026 | 9 | Morning/Afternoon sessions (multiple days) |
| February 2026 | 4 | Morning/Afternoon sessions |
| Total | 32 | Period: Sep 2024 – Feb 2026 |
8. Root Cause Analysis
8.1. Architectural Flaws in the Connections Platform
The data exposure stems from fundamental architectural design decisions in Company Y’s Connections platform:
- No API Authentication Layer: The XHR API endpoints at xhr.companyy.com and connections.universityx.vn accept requests from any JavaScript execution context. There are no tokens, session cookies, or API key checks.
- No Field-Level Access Control: The API returns all fields for any requested object. A user accessing a public page to view the exam schedule receives the exact same data as an administrator viewing CCCD numbers and ID card photos.
- IDOR by Design: Database object IDs are used directly in URLs and API calls. Table hashes serve as identifiers but provide no security: they are plainly visible in URLs and easily discoverable through foreign key traversal.
- Client-Side-Only Security: All business logic and data filtering occurs in browser-side JavaScript. The server acts as a transparent data store, enforcing no access controls whatsoever.
- No CDN Controls: Both the static file CDN and image CDN serve content without authentication. Once a URL is known (or extracted from a rendered page), any HTTP client can download the file.
- No Rate Limiting: The API accepts hundreds of sequential queries from a single client without throttling, enabling easy bulk data extraction.
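To make the missing field-level control concrete, here is a minimal sketch of what a server-side filter could look like. It is an illustration under assumptions, not Company Y's code: the role name `admin` and the fields `exam_date` and `room` are hypothetical, while the sensitive field names are borrowed from the sample records in Appendix D.

```python
# Fields any visitor may see (hypothetical public schedule fields).
PUBLIC_FIELDS = {"sbd", "exam_date", "room"}

# Fields that should only ever leave the server for an authorized role.
SENSITIVE_FIELDS = {"ho_ten", "ngay_sinh", "cccd", "sdt", "email",
                    "img_front", "img_back"}


def visible_fields(role):
    """Return the allowlist of fields for a role; unknown roles get the public set."""
    if role == "admin":
        return PUBLIC_FIELDS | SENSITIVE_FIELDS
    return PUBLIC_FIELDS


def serialize_record(record, role=None):
    """Build the API response on the server, dropping everything not allowlisted."""
    allowed = visible_fields(role)
    return {key: value for key, value in record.items() if key in allowed}


record = {"sbd": "AN1000", "exam_date": "2026-01-10",
          "ho_ten": "Example Name", "cccd": "001000000000"}
print(serialize_record(record))                # public view: no PII
print(serialize_record(record, role="admin"))  # authorized view
```

The design choice worth noting is the allowlist: a field added to the database later stays hidden until someone deliberately exposes it, which is the opposite of returning every field and filtering in the browser.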
8.2. Third-Party Vendor Risk
University X has entrusted sensitive candidate data including photos of government-issued CCCD/CMND cards to Company Y’s Connections platform. This creates a supply chain vulnerability:
- University X may be entirely unaware that the platform has zero access controls.
- The same architectural flaws likely affect all organizations using the Connections platform, not just University X.
- University X is limited in its ability to implement security controls on infrastructure it does not operate.
- The vendor relationship means that remediating this vulnerability requires Company Y to redesign their platform architecture.
8.3. Why Image CDN Encoding Is Not Security
The image CDN uses encoded URL paths (e.g., kt3PG1hPTgTPG1MJ.u54.L8J...), which superficially appears to provide security. However:
- The encoding is performed by client-side JavaScript, which users fully control.
- Encoded URLs are embedded directly as CSS background-image attributes in the DOM, making them trivial to extract.
- Once extracted, these URLs require no cookies, tokens, or headers to download the images.
- Any headless browser can render a candidate’s page and automatically harvest all image URLs.
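For contrast, a URL that actually carries security is one the server can verify and expire. The sketch below shows one common pattern, an HMAC-signed, short-lived URL bound to a user. It is a hedged illustration rather than a description of the Connections CDN: `SECRET_KEY`, the query parameter names, and both function names are assumptions.

```python
import hashlib
import hmac
import time

# Hypothetical server-side secret; it must never be shipped to the browser.
SECRET_KEY = b"replace-with-a-server-side-secret"


def sign_image_url(path, user_id, ttl_seconds=60, now=None):
    """Issue a URL that is valid only for this user and only until it expires."""
    expires = int(time.time() if now is None else now) + ttl_seconds
    message = f"{path}|{user_id}|{expires}".encode()
    signature = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return f"{path}?uid={user_id}&exp={expires}&sig={signature}"


def verify_image_request(path, user_id, expires, signature, now=None):
    """Recompute the signature on the server and reject expired or altered requests."""
    current = int(time.time() if now is None else now)
    if current > int(expires):
        return False
    message = f"{path}|{user_id}|{int(expires)}".encode()
    expected = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, signature)
```

Unlike path encoding, copying such a URL out of the DOM gains an attacker little: it stops working after the expiry time, and changing the path or user ID invalidates the signature.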
9. Impact Assessment
The data exposed in this vulnerability is not just abstract “PII.” For 896 real people, it represents everything needed to steal their identity. In Vietnam, CCCD photos are widely used for KYC (Know Your Customer) verification at banks, e-wallets like MoMo and ZaloPay, and telecom providers. A leaked CCCD photo is essentially a master key to someone’s financial life. What follows is an assessment of what this exposure means for the real students and professionals whose data was left wide open.
9.1. Affected Individuals
- 896 unique candidates confirmed from exams across 2024–2026.
- The database very likely contains candidates from all historical exam sessions, potentially numbering in the thousands.
- All candidates have their full PII profiles and ID card photos accessible without authentication.
9.2. Severity: ID Card Photo Exposure
The exposure of CCCD/CMND card photos is far more severe than text-based PII exposure:
- Biometric Data: Images contain the cardholder’s face, which can be used for facial recognition attacks.
- Physical Card Replication: High-quality images of both front and back provide all information needed to create counterfeit ID cards.
- KYC Bypass: Many financial services in Vietnam accept CCCD photos for KYC (Know Your Customer) verification - these images could be used to open fraudulent bank accounts or e-wallets.
- Irreversibility: Unlike passwords or phone numbers, a CCCD number and its card images cannot be “changed” once exposed - the consequences are permanent.
9.3. Risk Scenarios
- Large-Scale Identity Theft: CCCD numbers + Images + Full names + Dates of birth = a complete identity dossier for 896 individuals.
- Financial Fraud: ID card photos can bypass KYC systems at banks, e-wallets (MoMo, ZaloPay, VNPay), and cryptocurrency exchanges.
- SIM Swap Attacks: Phone numbers + ID card photos enable SIM swap attacks with mobile carriers, leading to account takeovers.
- Deepfake Creation: Facial images extracted from CCCDs, combined with names and biographical information, enable AI-generated deepfake content for social engineering.
- Targeted Phishing: A complete dossier (Name, Email, Phone, Workplace, Exam history) enables highly sophisticated and convincing spear-phishing campaigns.
- Legal Violations & Penalties: Under Vietnam’s Decree 13/2023/ND-CP on Personal Data Protection, the exposure of citizen identity information and biometric images constitutes a serious violation subject to sanctions.
9.4. Severity Rating
| Factor | Assessment |
|---|---|
| Attack Complexity | Low (Browser console + basic Python script) |
| Authentication Required | None |
| Data Sensitivity | Extremely Critical (National ID + photos) |
| Number of Affected Users | 896+ confirmed, potentially thousands |
| Data Reversibility | Irreversible (Cannot change a CCCD) |
| Exploitability | Easy (No specialized tools required) |
| Vendor Scope | All clients sharing the Connections platform |
| Overall Assessment | Critical (CVSS 3.1: 9.1) |
10. Technical Challenges and Solutions
10.1. Dynamic Content Rendering
The website renders all data client-side using JavaScript. Standard HTTP requests (such as curl or Python’s requests) only retrieve the empty HTML skeleton.
Solution: Used Playwright with a headless Chromium browser to execute the JavaScript framework, enabling both API calls and DOM extraction.
10.2. Vietnamese Source Code
The framework uses accented Vietnamese identifiers: xửLý, dữLiệu, thuộcTính, đốiTượng. While not intentional obfuscation, this requires Unicode-capable tools and makes pattern-matching significantly harder than standard JavaScript analysis.
10.3. Reference Field Resolution
The Ethnicity and Place of Birth fields store opaque reference IDs (e.g., {"ậ":["146992"]}) instead of text. These references are only resolved during page rendering by the framework’s internal logic; calling the APIs directly (CĂN.db, config, thuộcTính.tải) always returns null for these IDs.
Solution: Render the full page for each candidate and extract the resolved text from the DOM (e.g., “4. Dân tộc: Kinh”). I cached these values to avoid redundant page renders.
10.4. Image URL Encoding
ID card photos are served from the image CDN with encoded URL paths that cannot be constructed from file IDs alone – the framework generates them at runtime.
Solution: Render each candidate’s page, extract URLs from CSS background-image attributes on <div> elements pointing to connections.vn, then download images via HTTP GET.
10.5. Parallel Processing and Thread Safety
Playwright’s synchronous API is not thread-safe across shared browser instances.
Solution: Each worker thread launches its own Playwright browser instance, with 2 separate pages per browser (one for API calls, one for rendering). Workers are staggered by 3 seconds to avoid thundering herd effects during session establishment.
10.6. Network Resilience
University X’s server frequently experiences very slow page loads (sometimes exceeding 30 seconds).
Solution: Used smart wait algorithms (polling DOM text instead of fixed timeouts), automatic session resets on failure, periodic CSV saving every 10 candidates, and Ctrl+C handling to save all collected data before interruption.
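The "smart wait" idea is simply polling a condition with a deadline instead of sleeping for a fixed time. A generic sketch of that pattern is shown below; `wait_until` is an illustrative helper name, not part of Playwright or of the withheld tooling, and the predicate is whatever check the caller supplies.

```python
import time


def wait_until(predicate, timeout=30.0, interval=0.25):
    """Poll predicate() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result  # return as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Example with a stand-in condition that becomes true on the third check.
attempts = []

def ready():
    attempts.append(1)
    return "loaded" if len(attempts) >= 3 else None

print(wait_until(ready, timeout=2.0, interval=0.01))
```

Compared with a fixed 30-second sleep, polling returns immediately on fast loads and still fails loudly on slow ones, which is what makes automatic retries and session resets possible.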
11. Recommendations
11.1. For University X (Immediate Remediation)
- Vendor Security Audit: Require Company Y to conduct a comprehensive security assessment of the Connections platform.
- Remove ID Card Photos: Delete all stored ID card images from the platform or migrate them to a storage system with strict access controls.
- Add robots.txt / noindex: Prevent search engines from indexing candidate detail pages; request Google to remove currently cached pages.
- Restrict PDF Access: Place candidate lists behind authentication or redact sensitive columns.
- Evaluate Alternative Platforms: Consider migrating to a platform with proper access control capabilities.
11.2. For Company Y / Connections Platform (Urgent Remediation)
- Implement API Authentication: All XHR endpoints must require a valid session token with role-based access control.
- Add Field-Level Access Control: Sensitive fields (CCCD, Phone, File references) must be restricted to admin-level authenticated sessions only.
- Secure Image CDN: Image URLs must require authentication tokens that are validated server-side, not relying solely on path encoding.
- Server-Side Rendering for Sensitive Data: Move PII display logic to server-side processing; never send raw sensitive data to public-facing web page contexts.
- Rate Limiting and Anomaly Detection: Implement IP-based query rate limits and alerting for bulk download patterns.
- Platform-Wide Security Audit: The same vulnerabilities very likely affect all organizations using the Connections platform.
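As an illustration of the rate-limiting recommendation, the following is a minimal in-memory sliding-window limiter. It is a sketch under assumptions: the class name, the per-client key, and the thresholds are hypothetical, and a production deployment would more likely enforce limits at the gateway with shared state across servers.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within any window_seconds span."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client_id -> timestamps of accepted requests

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[client_id]
        # Forget requests that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: reject, and ideally raise an alert
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=3, window_seconds=10)
print([limiter.allow("198.51.100.7", now=t) for t in (0, 1, 2, 3)])
```

A limiter like this does not replace authentication. It only raises the cost of bulk extraction and gives the operator a signal when one client requests hundreds of records in sequence.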
11.3. Long-Term Plan
- Compliance Review: Assess system compliance with Vietnam’s Decree 13/2023/ND-CP on Personal Data Protection.
- Penetration Testing Program: Establish a regular security testing schedule for the system.
- Data Minimization: Reconsider whether it is necessary to retain citizen ID card photos after the initial identity verification process is complete.
- Incident Response: Notify affected candidates about the data exposure in accordance with legal requirements.
12. Conclusion
What started as a routine Google search ended with a disturbing discovery: an entire university’s exam candidates had their most sensitive personal data – including photos of their government-issued identity cards – exposed to the open internet. 896 people registered for a language proficiency exam, trusting that their information would be handled responsibly. Instead, their complete identity dossiers were accessible to anyone with a web browser.
The root cause is not a single bug or misconfiguration. It is a fundamental architectural failure: the software vendor built a system where the database has no lock on the door. The Connections platform treats every visitor, whether a student checking exam results or a stranger on the internet, as having full access to every record, every field, and every uploaded file. There is no authentication layer, no access control, no distinction between public and private data.
These findings deliver 3 critical lessons:
- Third-party vendor risk is real: Outsourcing your software does not outsource your responsibility. Any organization handing sensitive data to a SaaS platform must audit that platform’s security architecture, not just evaluate its features.
- Client-side security is not security: If the only thing standing between an attacker and your database is JavaScript running in their own browser, you have no security at all. Access control must be enforced on the server.
- Security through obscurity always fails: Encoded URLs, minified code, and hash-based IDs may slow down an attacker by minutes, but they can never substitute for real authentication.
For 896 candidates, the damage is done. Their CCCD numbers, their identity card photos, their personal information – it is data that can never be taken back.
13. Revalidation Update – June 23, 2026
On June 23, 2026, I retested all attack vectors on tec.universityx.vn. Company Y had updated the codebase that same day (version 14484523062026). The results show meaningful progress but incomplete remediation.
What changed
The most important fix is at the API gateway level: the XHR endpoints now enforce server-side authentication. Both connections.universityx.vn/xhr/ and xhr.companyy.com/xhr/ return HTTP 403 with {"error":403,"code":"access_denied"} for unauthenticated POST requests. This is the first time server-side access control has been observed on this platform.
Additionally:
- Account table (taiKhoan) access is blocked. The account-level IDOR documented in this report no longer works; all tested account IDs return null.
- The b6x cipher has been removed. The monoalphabetic substitution layer that was added as a “fix” after my initial disclosure is gone entirely. Remaining data is returned in plaintext rather than wrapped in a broken cipher.
What remains vulnerable
Unlike the alumni platform documented in Part 2, the exam system still has gaps:
Candidate data partially exposed. At least one candidate record (#1662402) still returns data via the API. The b6x wrapping is gone, but four fields remain in the raw response:
| Field | Value | Status |
|---|---|---|
| Date of birth | 10/12/1998 | STILL EXPOSED |
| Gender | 2 | STILL EXPOSED |
| CCCD front image reference | {"i":["4296"]} | STILL EXPOSED |
| CCCD back image reference | {"i":["243"]} | STILL EXPOSED |
| Full name | – | REMOVED |
| Phone number | – | REMOVED |
| Email | – | REMOVED |
| CCCD number | – | REMOVED |
The most sensitive text fields (name, phone, email, CCCD number) have been scrubbed. But date of birth and image reference IDs persist.
Bulk ID enumeration still works. The mass-load endpoint returns 50,403 candidate IDs. While most individual records appear empty or deleted, the enumeration itself should not be possible for unauthenticated users.
CDN image access has regressed. The image CDN nodes at i0.connections.vn and i3.connections.vn are responding with HTTP 200 again. These were marked as fixed in the March 10 revalidation, meaning this is a regression, not a lingering issue. If image reference IDs from candidate records can still be resolved to CDN URLs, the ID card photos documented in this report may once again be downloadable.
Scorecard
| Platform | Tests | Fixed | Vulnerable | Score |
|---|---|---|---|---|
| tec.universityx.vn (this report) | 13 | 6 | 5 | 46% fixed |
| connections.universityx.vn (Part 2) | 13 | 11 | 2 | 85% fixed |
| Previous (March 10, both) | 10 | 3 | 7 | 30% fixed |
Assessment
The vendor has made real progress. The XHR authentication gate is the correct architectural fix, and the removal of the b6x cipher in favor of actually scrubbing sensitive fields is a better approach than obfuscation. The pattern is moving in the right direction.
But for this platform specifically, the job is not done. The CDN regression is concerning because it reverses a previously confirmed fix. The persistent candidate data, even partial, combined with image reference IDs, means the core finding of this report (ID card photo exposure) may not be fully resolved. And 50,403 enumerable candidate IDs represent a larger dataset than the 896 candidates I documented here, suggesting the exposure may have been broader than initially assessed.
The next step should be verifying whether the exposed image reference IDs can still be resolved to downloadable photos on the CDN. If they can, the most critical finding in this report remains exploitable despite four months of remediation efforts.
Appendix
A. Tools Used
| Tool | Version | Purpose |
|---|---|---|
| Python | 3.12 | Primary scripting runtime |
| requests | Latest | HTTP protocol for downloading images and PDFs |
| pdfplumber | Latest | Extracting data tables from PDFs |
| Playwright | Latest | Headless browser (JS execution + DOM scraping) |
| Chromium | (Bundled) | Browser engine for rendering pages |
B. Disclosure Timeline
| Date / Time | Event |
|---|---|
| 2026-02-25 13:00 | Vulnerability discovered via Google search indexing |
| 2026-02-25 15:30 | Framework reverse engineering complete; Database schema mapped |
| 2026-02-25 17:00 | API exploitation confirmed; CCCD data accessed |
| 2026-02-25 19:30 | Image CDN analyzed successfully; ID card photos downloaded |
| 2026-02-25 20:00 | Automation tool development started |
| 2026-02-25 21:00 | Phase 1 complete: 896 SBDs extracted from 32 PDFs |
| 2026-02-26 01:30 | Phase 2 started: Parallel API crawl (3 workers) |
| 2026-02-26 04:30 | Phase 2 complete: 896/896 records processed successfully |
| 2026-02-26 09:00 | Data merge complete: 2,449 images, 223 MB total |
| 2026-02-26 13:00 | Technical report finalized |
| 2026-03-24 | Report published |
| 2026-06-13 | Part 2 published |
| 2026-06-23 | Revalidation: 46% fixed on tec.universityx.vn, 85% on connections.universityx.vn |
C. Glossary
| Term | Definition |
|---|---|
| CCCD | Căn cước công dân - Citizen Identity Card (new format) |
| CMND | Chứng minh nhân dân - People’s Identity Card (old format) |
| University X | Pseudonym for the affected university |
| SBD | Số báo danh - Exam Seat Number |
| VSTEP | Vietnamese Standardized Test of English Proficiency |
| IDOR | Insecure Direct Object Reference |
| SaaS | Software as a Service |
| CDN | Content Delivery Network (static files, images) |
| PII | Personally Identifiable Information |
| KYC | Know Your Customer (bank/wallet verification process) |
| Company Y | Pseudonym for the company that provides and operates the Connections platform |
D. Sample Data Records
Below are two representative records extracted from the dataset, with actual personally identifiable information redacted:
// Sample Record 1 (Redacted)
{
"sbd": "AN1***",
"ho_ten": "[REDACTED]",
"ngay_sinh": "28/09/2000",
"gioi_tinh": "Nu",
"cccd": "022XXXXXXXXX",
"sdt": "037XXXXXXX",
"email": "[redacted]@gmail.com",
"don_vi": "",
"dan_toc": "Kinh",
"noi_sinh": "Tinh Bac Ninh",
"img_front": "output/images/AN1***_front.jpg",
"img_back": "output/images/AN1***_back.jpg"
}
// Sample Record 2 (Redacted)
{
"sbd": "TQ1***",
"ho_ten": "[REDACTED]",
"ngay_sinh": "03/07/2000",
"gioi_tinh": "Nu",
"cccd": "180XXXXXXXXX",
"sdt": "036XXXXXXX",
"email": "180XXXXXXXX@s.universityx.edu.vn",
"don_vi": "",
"dan_toc": "Kinh",
"noi_sinh": "",
"img_front": "output/images/TQ1***_front.jpg",
"img_back": "output/images/TQ1***_back.jpg"
}
Notable observations:
- CCCD numbers are stored as plain text (with a leading apostrophe only for CSV formatting purposes).
- Image files are standard JPEG photographs of physical identity cards.
- The don_vi (workplace) field has a very low fill rate (15.1%), indicating most candidates are students who have not yet entered the workforce.
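The redaction style used in the sample records, keeping a short prefix and masking the rest, can be sketched as follows. This is an assumed reconstruction for illustration and is shown on synthetic values only; `mask_keep_prefix` and `redact_record` are hypothetical names.

```python
def mask_keep_prefix(value, keep=3, fill="X"):
    """Keep the first `keep` characters and replace the remainder with `fill`."""
    return value[:keep] + fill * max(len(value) - keep, 0)


def redact_record(record):
    """Return a copy of a record that is safe to publish as an example."""
    redacted = dict(record)  # never modify the source record in place
    redacted["ho_ten"] = "[REDACTED]"
    redacted["cccd"] = mask_keep_prefix(record["cccd"])
    redacted["sdt"] = mask_keep_prefix(record["sdt"])
    return redacted


# Synthetic placeholder values, not real identifiers.
print(redact_record({"ho_ten": "Example Name",
                     "cccd": "022000000000",
                     "sdt": "0370000000"}))
```

Keeping the 3-digit prefix still reveals the province code, so this level of masking suits illustrative samples but not a full dataset release.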
E. Reproduction Steps
- Step 1: Download publicly listed candidate PDFs from the static file CDN and parse the candidate tables to extract exam registration numbers (SBDs).
- Step 2: For each SBD, query the unauthenticated JavaScript API endpoints to retrieve full candidate records including CCCD numbers and personal details.
- Step 3: Resolve image reference fields from API responses into CDN image URLs, then download the associated ID card photos.
- Step 4: Merge PDF-extracted data and API-extracted data using SBD as the primary key to produce a complete dataset.
Note: Detailed reproduction code and tooling have been withheld from this public report to prevent misuse. Full technical details were shared with the affected parties during responsible disclosure.