Internet Archive Under Fire: 31 Million-Account Breach and Zendesk Token Exposure Amid Sustained DDoS Disruption

1. Executive Summary

In October 2024, the Internet Archive (IA) faced a compound cyber incident: service-disrupting DDoS activity, a compromise of user authentication data (about 31 million records), and a subsequent breach of its third-party support platform (Zendesk). Public reporting indicates the initial compromise exposed usernames, email addresses, and bcrypt password hashes, while the later Zendesk incident stemmed from stolen access tokens that had not been rotated promptly after exposure. IA acknowledged “cyberattacks” affecting availability, confirmed that patron emails and encrypted passwords were exposed, and noted that attackers abused a third-party helpdesk system to send emails to patrons. (blog.archive.org)

2. Contextual Background

2.1 Nature of the threat

This activity is best characterised as a multi-vector campaign against IA combining:

Availability attacks: intermittent DDoS events that repeatedly degraded or halted access to archive.org and related services. (WIRED)
Data breach and web-layer manipulation: a compromise that enabled a JavaScript-based site message/defacement and theft of a user authentication database containing ~31 million unique email addresses alongside usernames and bcrypt password hashes. (BleepingComputer)
Token/third-party platform compromise: reuse of stolen secrets and access tokens (reported as originating from exposed GitLab configuration/secrets) to access Zendesk support records and send emails through legitimate Zendesk infrastructure, so that the messages appeared authentic. (BleepingComputer)

No CVE has been authoritatively tied to these intrusions in reputable public reporting as of 2 November 2024; the observed risk centres on credential exposure, token theft, and operational security gaps (secret management and rotation) rather than a single disclosed software vulnerability. (BleepingComputer)

2.2 Threat-actor attribution

DDoS activity: WIRED reported that a hacktivist group calling itself BlackMeta publicly claimed responsibility for the DDoS attacks during the week the breach became public. (WIRED)
BleepingComputer separately reported the DDoS as claimed by an alleged pro-Palestinian group named SN_BlackMeta, and also highlighted widespread misreporting that incorrectly attributed the data breach to the DDoS actors. (BleepingComputer)

Data breach / token compromise: Public reporting indicates the database theft was conducted by a separate, unidentified actor who sought “credit” for the intrusion and described an initial foothold via an exposed GitLab configuration file on an IA development host. This actor also claimed, without providing proof, to have stolen far more data (“7TB”); that claim should be treated as unconfirmed. (BleepingComputer)

Confidence assessment (Admiralty/NATO style):

DDoS claimed by BlackMeta/SN_BlackMeta: Possible (C/D) — self-claimed responsibility is reported, but independent validation of actor identity/capability is limited in public sources. (WIRED)
Database breach actor distinct from DDoS actors: Likely (B/C) — multiple sources distinguish the events and describe separate threat activity timelines and motives. (BleepingComputer)

2.3 Sector and geographic targeting

IA is a globally relied-upon non-profit digital library and web preservation platform; the impact profile is therefore broad and cross-sector, affecting researchers, journalists, legal professionals, and the general public. IA’s own service updates frame the incident in the context of broader attacks on library institutions. (blog.archive.org)

3. Technical Analysis

3.1 Detailed description of vulnerabilities and/or TTPs (MITRE ATT&CK mapped)

A. DDoS-driven service disruption

Network/Service Flooding: DDoS intermittently took archive.org offline and forced IA into phased service restoration (read-only modes).
- ATT&CK: T1498 (Network Denial of Service)

Sources describe recurring DDoS waves and IA’s need to “scrub”/filter traffic while restoring services. (WIRED)

B. Web-layer manipulation and data theft (31M user database)
Public reporting describes visitors seeing a malicious JavaScript alert/defacement message. Troy Hunt (HIBP) and BleepingComputer reported receiving and validating a database file (“ia_users.sql”) containing email addresses, screen names, bcrypt password hashes, and related metadata; the most recent record timestamps were reported as late September 2024. (BleepingComputer)

ATT&CK: T1565.001 (Data Manipulation: Stored Data Manipulation) — consistent with web defacement/JS injection effects on user experience (note: reporting focuses on the pop-up/alert mechanism rather than a full integrity-impact assessment). (BleepingComputer)
ATT&CK: T1005 (Data from Local System) / T1039 (Data from Network Shared Drive) — used here as analytic mapping for database theft; exact collection path is not fully public. (BleepingComputer)
ATT&CK: T1041 (Exfiltration Over C2 Channel) — analytic mapping; the mechanism of exfiltration is not publicly detailed. (BleepingComputer)

C. Exposed secrets → token reuse → third-party platform compromise (Zendesk)
BleepingComputer reported that the breach chain began with an exposed GitLab configuration file on an IA development server (services-hls.dev.archive.org), enabling access to source code and embedded credentials/tokens, including tokens associated with Zendesk. (BleepingComputer)
The same reporting states IA did not rotate “many of the API keys” promptly after exposure, and that a Zendesk token permitted access to 800K+ support tickets dating back to 2018. (BleepingComputer)
IA’s own service update corroborated that attackers “sent emails to patrons by exploiting a 3rd party helpdesk system.” (blog.archive.org)

ATT&CK mapping:

T1552 (Unsecured Credentials) — exposed tokens/secrets in configuration and/or repository materials (as described). (BleepingComputer)
T1528 (Steal Application Access Token) — consistent with reported token theft and subsequent misuse. (BleepingComputer)
T1078 (Valid Accounts) — operationally consistent with using valid tokens to access Zendesk and send authenticated emails. (BleepingComputer)

3.2 Exploitation status

As of 2 November 2024, these incidents were confirmed publicly via a combination of IA service updates, validation by HIBP/Troy Hunt, and multiple security news outlets. (Have I Been Pwned)
No reputable source in the reviewed set ties the incident to a widely exploited CVE or to inclusion in government “known exploited vulnerabilities” catalogues; the observed exploitation instead aligns to exposed secrets, credential harvesting, and token misuse. (BleepingComputer)

4. Impact Assessment

4.1 Severity and scope

User database exposure (31M accounts): The compromise of emails, usernames, and bcrypt password hashes increases the risk of:

- credential stuffing and account takeover if hashes are cracked offline (especially where passwords were reused),
- targeted phishing against IA users,
- privacy and doxxing concerns for users with sensitive archival/removal requests.

HIBP lists the breach as affecting 31.1 million accounts, dates it to September 2024, and added it to its database on 9 October 2024. (Have I Been Pwned)
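
How much protection bcrypt gives exposed passwords depends largely on the cost factor stored in each hash. The reviewed reporting does not state the cost factor IA used, so the following is only a minimal sketch of how an analyst might read it from a hash string. The function names and the sample hash are hypothetical, and the sample is synthetic filler rather than breach data.

```python
import re
from typing import Optional

# bcrypt modular-crypt layout: $2<variant>$<two-digit cost>$<22-char salt><31-char digest>
BCRYPT_RE = re.compile(r"^\$2[abxy]?\$(\d{2})\$[./A-Za-z0-9]{53}$")


def bcrypt_cost(hash_str: str) -> Optional[int]:
    """Return the cost factor encoded in a bcrypt hash, or None if malformed."""
    match = BCRYPT_RE.match(hash_str)
    return int(match.group(1)) if match else None


def key_expansion_rounds(cost: int) -> int:
    """bcrypt runs 2**cost key-expansion iterations for every password guess."""
    return 2 ** cost


# Synthetic placeholder (assumption): correct shape, meaningless salt and digest.
sample = "$2b$10$" + "A" * 53
```

Each increment of the cost factor doubles the work per guess, so a low cost combined with password reuse is the case that most increases credential-stuffing risk.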

Zendesk/support record exposure (800K+ tickets): Support interactions often contain personal data, potentially including identity documents (as noted by email recipients cited in BleepingComputer’s reporting), and sensitive removal requests. This raises the likelihood of doxxing, harassment, extortion attempts, and reputational damage. (BleepingComputer)

Availability and operational disruption: IA restored core services in stages (read-only modes, phased return of Wayback Machine and Archive-It), confirming material operational impact. (blog.archive.org)

4.2 Victim profile

Individuals: IA account holders; individuals submitting support tickets/removal requests (potentially including ID documents). (BleepingComputer)
Organisations: libraries, academic institutions, journalists, and enterprises that depend on Wayback Machine evidence for research and litigation support. (blog.archive.org)

5. Indicators of Compromise (IOCs)

5.1 IOC Table

Note: As of 2 November 2024, reputable public reporting provides very limited actionable IOCs (e.g., no malicious IPs/domains/hashes published for defenders). The artefacts below are included as investigative pivots referenced in reporting, not as confirmed malicious infrastructure.

Type	Value	Context/Notes	Source
Database artefact (filename)	`ia_users.sql`	Reported name of the stolen authentication database file shared with HIBP/Troy Hunt.	BleepingComputer reporting (BleepingComputer)
Development host (reported exposure point)	`services-hls.dev.archive.org`	Reported location of an exposed GitLab configuration file used to obtain a GitLab auth token (investigative pivot).	BleepingComputer reporting (BleepingComputer)
Technique artefact	JavaScript pop-up/alert on archive.org	Used to announce the breach and present a defacement-style message to visitors.	BleepingComputer / WIRED (BleepingComputer)
Support platform scope indicator	“800K+ support tickets … since 2018”	Reported access level of a Zendesk token; treat as impact indicator rather than IOC.	BleepingComputer reporting (BleepingComputer)

5.2 Detection guidance (practical and platform-oriented)

Because IOCs are sparse, prioritise behavioural and control-plane detections:

GitLab / source control

Alert on secrets committed to repositories and “long-lived” tokens referenced in CI/CD variables or config files.
Detect unusual cloning/export of repositories, mass downloads, or access from atypical geographies/ASNs.
Implement pre-receive hooks and secret scanning (e.g., GitLab secret detection) and treat any hit as an incident until the secret has been rotated and the rotation verified.
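
As a rough illustration of the secret-scanning idea above, the sketch below checks text (for example, a config file or a diff) against two patterns. It is an assumption-laden example, not a substitute for a maintained scanner: the rule names, the `Finding` type, and the generic assignment pattern are hypothetical, and the only vendor-specific detail relied on is the `glpat-` prefix used by GitLab personal access tokens.

```python
import re
from typing import List, NamedTuple


class Finding(NamedTuple):
    line_no: int  # 1-based line number within the scanned text
    rule: str     # name of the rule that matched


# Illustrative rules only (assumption); production scanners ship far larger rule sets.
RULES = {
    # GitLab personal access tokens start with "glpat-".
    "gitlab_pat": re.compile(r"glpat-[0-9A-Za-z_\-]{20,}"),
    # Key names ending in token/secret/password/api_key assigned a long literal value.
    "generic_assignment": re.compile(
        r"(?i)(api[_-]?key|token|secret|password)\w*\s*[:=]\s*['\"]?[^\s'\"]{16,}"
    ),
}


def scan_text(text: str) -> List[Finding]:
    """Return one Finding per (line, rule) pair that matches."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(line_no, rule))
    return findings
```

Run as a pre-receive or CI step, a non-empty result would block the change and open a rotation task. The function deliberately reports line numbers and rule names only, so that secrets are not copied into logs.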

Zendesk

Review Zendesk audit logs for:
- creation/use of API tokens,
- bulk ticket exports/downloads,
- unusual API client identifiers,
- replies to historic tickets that match the reported attacker behaviour (reviving older removal requests). (BleepingComputer)
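
The audit-log review above can be partly automated once events are exported. The sketch below assumes a hypothetical, already-normalised event format (dicts with `actor`, `action`, `ticket_id`, `ticket_created`, `event_time`); it does not reflect Zendesk’s actual log schema or API, and the thresholds are placeholders to tune.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def triage(events, bulk_threshold=100, stale_after=timedelta(days=365)):
    """Flag bulk ticket access per actor and replies to long-dormant tickets.

    `events` is an iterable of dicts in the hypothetical normalised format
    described in the lead-in; field names are assumptions.
    """
    tickets_by_actor = defaultdict(set)
    stale_replies = []
    for ev in events:
        # Count distinct tickets touched by each actor (user or API client).
        tickets_by_actor[ev["actor"]].add(ev["ticket_id"])
        # A reply to a ticket created long ago matches the reported behaviour
        # of reviving older removal requests.
        age = ev["event_time"] - ev["ticket_created"]
        if ev["action"] == "reply" and age > stale_after:
            stale_replies.append(ev)
    bulk_actors = sorted(
        actor for actor, tickets in tickets_by_actor.items()
        if len(tickets) >= bulk_threshold
    )
    return bulk_actors, stale_replies
```

Both outputs are leads for an analyst rather than verdicts: legitimate integrations can also touch many tickets, so known automation accounts would need to be allow-listed.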

Web/edge

Alert on:
- unauthorised changes to JavaScript libraries loaded globally,
- integrity drift for high-risk scripts (SRI mismatches),
- unexpected content injection (CSP violation reports).
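
Integrity drift on a shared script can be checked by recomputing its Subresource Integrity (SRI) value and comparing it with a recorded baseline. The following is a minimal sketch; the function names are hypothetical, and how the script bytes are fetched and where baselines are stored are left as assumptions.

```python
import base64
import hashlib


def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI string of the form '<algorithm>-<base64 digest>'."""
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def has_drifted(content: bytes, expected_sri: str) -> bool:
    """True when the content no longer matches the recorded SRI baseline."""
    algorithm = expected_sri.split("-", 1)[0]
    return sri_value(content, algorithm) != expected_sri
```

A scheduled job could fetch each globally loaded script, call `has_drifted` against the value recorded at deployment time, and alert on any mismatch that does not correspond to an approved release.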

6. Incident Response Guidance

6.1 Containment, eradication, and recovery

Immediate credential rotation at scale: rotate all potentially exposed GitLab tokens, CI variables, database credentials, and third-party API keys; invalidate old tokens and ensure rotation propagates across environments. (This directly addresses the failure mode described in reporting around incomplete key rotation.) (BleepingComputer)
Zendesk containment: revoke tokens, reset admin accounts, enforce SSO/MFA, and restrict API token scope and lifetime.
Rebuild trust boundaries: treat dev hosts and build pipelines as potentially compromised; re-issue signing keys, rebuild artefacts from clean sources, and validate deployment integrity before restoring write-enabled features (consistent with IA’s phased read-only restoration posture). (blog.archive.org)
DDoS resilience: ensure upstream DDoS mitigation, rate limiting, and scrubbing capacity; validate runbooks for sustained intermittent attacks. (WIRED)
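
Verifying that rotation actually completed is the step the reporting suggests was missed. A minimal sketch of that check follows, assuming a hypothetical token inventory (dicts with `name`, `created_at`, `revoked`) assembled from each platform’s own token listing; the field names and any exposure timestamp supplied are placeholders, not facts from the incident.

```python
from datetime import datetime, timezone


def unrotated(inventory, exposure_time):
    """Return names of tokens that predate the exposure and are still active.

    Any token created at or before `exposure_time` and not yet revoked must be
    assumed known to the attacker.
    """
    return [
        token["name"]
        for token in inventory
        if not token["revoked"] and token["created_at"] <= exposure_time
    ]
```

An empty result is the exit criterion for containment. Re-running the check per environment (development, staging, production) helps confirm that rotation propagated everywhere.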

6.2 Forensic artefacts to collect and preserve

Web server logs, WAF/CDN logs, and CSP violation reports around the time of the JavaScript alert/defacement. (BleepingComputer)
GitLab audit logs, runner logs, CI job histories, and access tokens inventory (creation, last-used timestamps). (BleepingComputer)
Database access logs and export histories for the user authentication database. (BleepingComputer)
Zendesk audit logs and export logs; attachment access logs where available (given reports of identity documents in removal requests). (BleepingComputer)
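
Preserving the artefacts above usually includes recording a cryptographic hash of each collected file so that later tampering or corruption can be detected. The sketch below builds a SHA-256 manifest for a directory of exported logs; the function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large log exports do not need to fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def build_manifest(evidence_dir: Path) -> Dict[str, str]:
    """Map each file's path (relative to evidence_dir) to its SHA-256 digest."""
    return {
        path.relative_to(evidence_dir).as_posix(): sha256_file(path)
        for path in sorted(evidence_dir.rglob("*"))
        if path.is_file()
    }
```

The manifest should be stored separately from the evidence itself, and regenerated and compared before any analysis that relies on the files.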

6.3 Lessons learned and preventive recommendations

Eliminate long-lived tokens, enforce automated rotation, and implement continuous secret scanning across repos and configuration management.
Segment development infrastructure and enforce least privilege between source control, database management, and production systems.

7. Threat Intelligence Contextualisation

7.1 Similar incidents and patterns

This incident reflects a recurring pattern seen across non-profit and public-interest organisations: availability pressure (DDoS), public shaming/defacement, and data theft coincide, whether or not the actors are coordinated, and together amplify attention and operational strain. IA itself highlighted contemporaneous cyberattacks affecting library institutions more broadly. (blog.archive.org)

7.2 MITRE ATT&CK mapping table (observed lifecycle)

Tactic	Technique ID	Technique Name	Observed Behaviour
Impact	T1498	Network Denial of Service	Intermittent DDoS disrupted availability; services restored in phases/read-only modes. (WIRED)
Credential Access	T1552	Unsecured Credentials	Exposed GitLab config/secrets reportedly enabled access token compromise. (BleepingComputer)
Credential Access	T1528	Steal Application Access Token	Stolen tokens reportedly used to access Zendesk and potentially other systems. (BleepingComputer)
Initial Access / Persistence (analytic)	T1078	Valid Accounts	Use of legitimate tokens/accounts to send authenticated Zendesk emails and access records. (BleepingComputer)
Collection (analytic)	T1005	Data from Local System	Theft of user authentication database described in reporting (collection path not fully public). (BleepingComputer)
Exfiltration (analytic)	T1041	Exfiltration Over C2 Channel	Database was transferred to third parties (HIBP received the dataset); exact exfil method not disclosed. (BleepingComputer)
Impact (web integrity)	T1565.001	Stored Data Manipulation	JavaScript alert/defacement used to message visitors and publicise breach. (BleepingComputer)

8. Mitigation Recommendations

8.1 Hardening and best practices

Secret hygiene: mandatory secret scanning; block commits containing tokens/keys; rotate secrets on detection.
Token policy: short-lived tokens, scoped permissions, and enforced rotation SLAs; automatic invalidation on suspected exposure. (BleepingComputer)
Third-party governance: treat SaaS platforms (Zendesk) as extensions of your trust boundary; monitor API usage, enforce MFA/SSO, and restrict exports and attachment access. (BleepingComputer)
Web supply chain controls: integrity controls for shared JavaScript libraries (SRI), strict CSP, and monitored deployment pipelines to reduce defacement/injection risk. (BleepingComputer)

8.2 Patch management advice

Not applicable in a traditional CVE-led sense based on publicly available, reputable reporting as of 2 November 2024; prioritisation should instead follow exposed secret remediation and identity/control-plane hardening. (BleepingComputer)

9. Historical Context & Related Vulnerabilities

IA has faced DDoS pressure before: WIRED noted aggressive DDoS attacks in late May 2024. The October 2024 events stand out because they combined DDoS with public-facing defacement/alerts and confirmed exposure of a credential database. (WIRED)

10. Future Outlook

10.1 Emerging trends and likely evolution

Reputational targeting will persist: high-visibility public-interest platforms attract attention-seeking intrusions where “impact” is amplified by downtime and public messaging. (BleepingComputer)
Token theft will remain a primary risk driver: long-lived tokens and incomplete rotation create compounding breach paths from source code → infrastructure credentials → third-party platforms.

10.2 Predicted shifts

Expect follow-on activity focused on:

- credential stuffing against IA users (especially where password reuse is common),
- targeted harassment/phishing against users who filed sensitive takedown/removal requests,
- renewed DDoS pressure during high-visibility moments (service restoration milestones, legal news cycles).

11. Further Reading

Vendor / primary updates
- Internet Archive “Services Update: 2024-10-21” (blog.archive.org)
- Internet Archive “Services Update: 2024-10-17” (blog.archive.org)
Incident reporting and validation