Unpacking the Parler Data Breach

Michael Isbitski
Jan 21, 2021

The storming of Capitol Hill on January 6, 2021 was an unprecedented incident for Americans and non-Americans alike, and the world is still processing what happened. Many suspects have been charged, and the FBI is still investigating. As many of you likely know by now, a number of individuals used the social media platform Parler to discuss and coordinate their activities. Just before midnight PT on January 11, 2021, AWS pulled the plug on hosting the platform, citing Parler’s inability to moderate content effectively. The Parler mobile applications were also revoked from the public app stores, Apple App Store and Google Play, which are the lifeblood and main distribution mechanism of any modern, mainstream service. Parler’s defense is that it doesn’t moderate content as heavily as other social media platforms, favoring freedom of speech.

Digital activists and data archivists led by the individual who identifies by their Twitter handle, crash override (@donk_enby), began to siphon as much data as possible off of the Parler platform prior to the AWS shutdown. They allege that the Parler data was public and also critical for the Capitol Hill riot investigations. Its users will largely argue otherwise, since privacy implications will emerge as the dust settles. Media coverage from other investigative journalists as well as reviews of the code used by archivists make clear that the Parler data was inadequately secured in a number of ways. The code used for coordinated scraping by the ArchiveTeam project can be found at https://github.com/ArchiveTeam/parler-grab. The community has also cataloged other code used for scraping and data parsing at:

https://gist.github.com/brossi/3abb692edf25aaabaef9648dbbd693fd

https://github.com/rljacobson/CapitolResources/

How do we know this was an API attack?

In the case of Parler, there are a few giveaways in some of the scraping code. In particular, we can see API references in the function wget.callbacks.get_urls():

This naming convention follows a common pattern for API endpoints. It appears that the code invokes this particular endpoint to resolve the user identifiers of Parler user profiles and match them to content URLs on separate endpoints, which may also be APIs. Some media reports also indicated that hosts suffixed as *.pw were API endpoints.

Unfortunately with REST, there is a wide gamut of what an API endpoint can look like, since REST is an architectural style, not a standard. What appears to be a standard URL may actually be a web API. Full HTTP requests and responses from a traffic capture would be useful to properly identify the extent of how many APIs were used. Most modern web and mobile application designs make extensive use of server-side APIs to enable functionality and serve data. The volume of APIs often increases exponentially where microservices architecture and cloud-native design principles are practiced.

Why does it qualify as a data breach?

The reasons that the Parler event in January 2021 fits the definition of a data breach are two-fold:

  1. Within the Data Breach Investigations Report, a commonly cited security resource, Verizon differentiates breaches as “an incident that results in the confirmed disclosure—not just potential exposure—of data to an unauthorized party.”
  2. In a similar vein, the European Union’s General Data Protection Regulation, or GDPR, defines a personal data breach as “a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to, personal data transmitted, stored or otherwise processed.”

What should be clear is that a breach is a type of security incident.

Deep dive into Parler security issues

While the shutdown of Parler remains politically charged, the event offers some valuable technical lessons worth reviewing, many of which tie directly into API security and how best to protect sensitive data. For readability, I’ve labeled and mapped the lessons to the five major security issues discovered thus far, many of which map to common API security oversights:

  1. No authentication on Parler’s public API
  2. Parler failed to purge data and used a user-controllable variable to control visibility
  3. Parler used predictable, sequential identifiers for content
  4. Parler failed to scrub content metadata
  5. Multi-factor authentication misconfiguration resulted in an authentication bypass (disputed)

I’m not in the business of quantifying or qualifying risk, so I imply no severity or priority ranking to these oversights. I’d posit that they are all severe and, chained together, disastrous. Arguably, the lack of authentication is the most critical issue here since it enabled anyone to get to any data. I also map each security issue to the OWASP API Security Top 10 where appropriate.

1 – No authentication on Parler’s public API

This vulnerability is one of the less clear-cut security issues in the Parler breach given the mixed information being reported. The general consensus reported by many media outlets and hacktivists was that Parler’s authentication was entirely absent. Broken, weak, or misconfigured authentication maps specifically to OWASP API2:2019 Broken User Authentication. The issue warrants a closer examination, though, since it might be more nuanced. Based on what the hacktivist shared publicly, at least one endpoint provided access to images without requiring authentication.

It’s a mystery to me why Parler endpoints might have had no authentication in place. Organizations sometimes create APIs that are open by design and that allow fully anonymous access. A potential use case might be checking the availability of an item without requiring a login flow that prompts for credentials. Distinctions must also be made when considering terms like external API, open API and public API. These labels don’t necessarily imply that authentication is absent, or that the APIs allow fully anonymous access. There may even be transparent application-level or device-level authentication in place, rather than user-level authentication. In Parler’s case, these APIs likely were not intended to be anonymous, public APIs. The APIs allowed direct access to Parler user profile information and user content, including message posts, images, and videos. It is unlikely that Parler would have intended or configured these APIs and pages to be accessible without authentication.

Ideally, we’d have a session capture with recorded sequences of HTTP requests and responses to analyze what truly occurred. Examining one of the Python scripts used to scrape video, there is no functionality to create a session nor is a variable used to provide authentication headers, session cookies or other common authentication information. cURL is used here to fetch video files in MP4 format, using a file list of known URLs which could be found hosted elsewhere. Pastebin was one such source for the list, which appears to have been taken offline now. This isn’t necessarily evidence that Parler lacked authentication though. It may just be that the expectation was the user would already have an authenticated session in which they’d run the scraping code.

OWASP API1:2019 Broken Object Level Authorization, better known as BOLA and previously as insecure direct object reference (IDOR), is one of the most common API attack patterns we see. Exploited BOLA issues are often referred to as privilege escalation attacks, which may be either horizontal or vertical. In this case, you were able to see other users’ content, which is horizontal privilege escalation. Insufficient authorization paired with a lack of rate limiting allows for enumeration of records, particularly those with sequential identifiers. Typically, one would expect to see something like OpenID Connect (OIDC) for authentication and OAuth 2 for authorization. Based on various media reports, no authenticated session was needed, and you could simply request data uninhibited, enabling the archivists to grab the bulk of Parler user data.

Examining other code snippets in detail, such as the Parler API interface for Python, it does appear that at least some API endpoints and pages required authentication. Other video scraping tools also required authenticated state, such as can be seen with ParlerScraper. This leads me to believe this might have been at least partially a BOLA attack, despite information reported that all data was public.

Parler’s configuration appeared to lack sufficient authorization, or at least authorization was not enforced properly at all the APIs and pages the data archivists used. We’ll need to see if more details come to light from Parler itself or the archivists clarifying how they pulled data. There is a world of difference between having no authentication and having authentication with weak authorization.
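To make the distinction between authentication and authorization concrete, here is a minimal sketch of an object-level authorization check. It is not Parler’s code. The `Post` type, the in-memory `POSTS` store and the `get_post` function are all hypothetical names I’ve made up for illustration, and a real service would resolve the requester from a validated session or token rather than a plain string.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    owner_id: str
    is_public: bool


class Forbidden(Exception):
    """Raised when the caller is not allowed to read the requested object."""


# Hypothetical in-memory store standing in for a real database.
POSTS = {
    "p1": Post("p1", owner_id="alice", is_public=True),
    "p2": Post("p2", owner_id="alice", is_public=False),
}


def get_post(requester_id, post_id):
    """Return a post only if the requester is allowed to read it.

    requester_id is None for an unauthenticated caller.
    """
    post = POSTS.get(post_id)
    if post is None:
        raise KeyError(post_id)
    if post.is_public:
        return post
    # Authentication: a private object needs an identified caller.
    # Authorization: that caller must also own this specific object.
    if requester_id is None or requester_id != post.owner_id:
        raise Forbidden(post_id)
    return post
```

The point of the sketch is that the ownership check runs on the server for every object requested. Knowing or guessing a valid identifier such as "p2" is not enough to read the record.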

2 – Parler failed to purge data and used a user-controllable variable to control visibility

It appears Parler did not delete content when a user requested its deletion, as reported by Ars Technica:

“When users deleted their posts, the site failed to remove the content and instead only added a delete flag to it.”

And also Wired UK:

“It also appears as though deleted posts were retained in Parler’s database and even flagged as deleted, potentially giving law enforcement and researchers access to posts that Parler users wanted to erase from history.”

I couldn’t locate evidence hinting at this deleted flag within the scraping code, such as functionality to account for content state (deleted or not) when fetching message posts, images or video. Presumably, deleted data was accessible via the APIs hinting at a larger authorization problem as described earlier. Parler may have also relied on front-end code, or its web application and mobile apps, to filter what deleted data was visible to users. Based on the limited technical details, this particular issue would best align with OWASP API3:2019 Excessive Data Exposure.
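If an application does keep soft-deleted records, the filtering has to happen server-side before a response is built. The following is a minimal sketch of that idea under my own assumptions. The `StoredPost` fields, the `PUBLIC_FIELDS` allow-list and the `serialize_posts` helper are hypothetical and are not drawn from Parler’s code or from the scraping tools.

```python
from dataclasses import asdict, dataclass


@dataclass
class StoredPost:
    post_id: str
    owner_id: str
    body: str
    deleted: bool    # soft-delete flag set when the user deletes the post
    upload_ip: str   # internal field that should never reach clients


# Fields the API is allowed to return; everything else stays server-side.
PUBLIC_FIELDS = ("post_id", "owner_id", "body")


def serialize_posts(posts):
    """Drop soft-deleted records and non-public fields before responding."""
    visible = []
    for post in posts:
        if post.deleted:
            continue  # never rely on the client to hide deleted content
        record = asdict(post)
        visible.append({name: record[name] for name in PUBLIC_FIELDS})
    return visible
```

Using an allow-list of fields rather than a block-list means a newly added internal column is hidden by default, which is the safer failure mode for Excessive Data Exposure.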

3 – Parler used predictable, sequential identifiers for content

Media reports have stated this repeatedly, and we see evidence of it in the scraping scripts and code. Archivists merely iterated through identifiers alphanumerically to harvest all content off of Parler. Predictable, sequential identifiers enable attackers to discern patterns. They can use this knowledge to brute-force APIs and URLs via scripting, headless browsers and other forms of automation. In the Parler case, archivists were able to capitalize on this vulnerability to perform the large-scale content scraping. This maps to OWASP API4:2019 Lack of Resources & Rate Limiting. There are also intersections with automated threat taxonomies, particularly the OWASP Automated Threats Handbook and OWASP:OAT-011 Scraping.

4 – Parler failed to scrub content metadata

Parler user data contained geolocation information. Most modern operating systems, particularly mobile OSs like Android and iOS will automatically append this location information to images and videos captured on mobile devices equipped with GPS capability. It’s commonly referred to as geotagging and employed by most modern smartphones and tablets. GPS data is accurate within approximately 3 meters or 10 feet, but it doesn’t provide elevation. Regardless, this geolocation metadata provided reasonably accurate physical proximity for those posting to Parler during the Capitol Hill riots. The hacktivist shared one example of the harvested video metadata on their Twitter page.

There are many privacy implications with such metadata being persisted and then harvested. It’s unclear if Parler application teams relied on the client applications to mask this information, or if they intended for the APIs to restrict access to such data. It’s not uncommon for social media platforms to maintain this metadata for cases such as user photo album organization, marketing campaigns or general user analytics. The issue also maps closest to OWASP API3:2019 Excessive Data Exposure.

5 – Misconfiguration of multifactor authentication resulted in an authentication bypass (disputed)

Some reports indicated there was a security misconfiguration as a result of a Twilio integration that was later decommissioned. Allegedly, some of the archivists used this to bypass multifactor authentication (MFA) during account creation and extract data. The issue was later disputed by the hacktivist, and Twilio representatives have also stated it was false. There is evidence to the contrary, though, in some code hosted on GitHub that could have been used to generate fake accounts on Parler. If this code were used in the Parler scraping, it would map to other forms of abuse, specifically OWASP:OAT-019 Account Creation and OWASP:OAT-009 CAPTCHA Defeat. One of the comments on this fake account creation code alludes to the broken Twilio integration that allowed bypass of CAPTCHA and MFA, also sometimes referred to as 2FA:

Unfortunately, there aren’t enough forensic artifacts to fully validate claims on either side. OWASP API2:2019 Broken User Authentication and OWASP API7:2019 Security Misconfiguration would be relevant classifications of the issue if MFA authentication misconfiguration had occurred.

This MFA misconfiguration would further fuel the debate whether the Parler data was truly public. Parler’s authentication and authorization mechanisms were either designed inadequately, or possibly some code or configuration change resulted in the broken authentication and authorization. It would be interesting to see commit history and change history for the application and associated systems, though this level of detail would require information from Parler’s application teams. Typically that information is not publicly available unless code is maintained in a public repo within git-based services such as GitHub or GitLab.

Top recommendations to avoid Parler mistakes

1 – Secure your mobile application code

Protect your mobile application code. Mobile apps are often essential to enabling business or customer functionality. Most organizations consider them to be a critical part of their intellectual property. They are also entry points to your APIs and used as an initial step in attacker reconnaissance. Protecting your mobile applications requires a mixture of client-side code protections, secure coding practices, and secure design choices such as ensuring sensitive functionality and data is kept server-side. In the case of something like deleted content, it should remain inaccessible to unauthorized users and not something you should rely on client code to filter. Always presume your client-side code, as well as any user or app-generated data that originates from it can and will be compromised.

The hacktivist who first posted details on how to access Parler’s APIs and data was spending time reverse engineering the Parler mobile app prior to the platform shutdown. The hacktivist created and posted Parler-tricks publicly, which was Python code that could be used to abuse the Parler iOS mobile app and associated backend APIs. Based on their own retelling, the hacktivist’s intent for this reverse engineering was to support claims surrounding a separate Parler data breach in late 2020. That particular breach was reported by the hacktivist and Anonymous founder, Aubrey Cottle, aka Kirtaner. Regardless, an attacker’s first step in breaching an organization’s APIs is often reverse engineering of front-end code, typically mobile applications and JavaScript-powered web applications.

How Salt Security can help:

The Salt Security platform detects early attacker reconnaissance activity and can also be configured to alert and block accordingly using your organization’s existing proxies and gateways. This includes datacenter and cloud environments. The platform differentiates between expected user traffic flows and API consumption by baselining all requests to and responses from API endpoints. It can detect the early warning signs when an attacker reverse engineers your mobile code and starts probing server-side APIs for weaknesses and vulnerabilities. Probing attempts by a skilled attacker may include many small changes in API requests such as URL parameters, HTTP methods, authentication headers and message body values. Any single attempt is not necessarily a complete attack. The Salt Security platform tracks each attacker’s probing attempts over multiple requests. This increases accuracy of detection, alerting and blocking based on an entire attacker session.

An unfortunate side effect of modern design and development is that there may be more APIs in use in your organization’s applications than you are aware of. The Salt Security platform also discovers APIs you may be blind to, or shadow APIs, as well as undocumented parameters within known APIs, or shadow parameters.

2 – Implement both authentication and authorization

There’s an old saying in security circles of “don’t roll your own crypto.” In other words, use well vetted encryption algorithms and implementations instead of trying to create your own. The same must also be said for authentication and authorization. Any API that isn’t intended to be used openly by the public and anonymously must employ both authentication and authorization. Even anonymous APIs may use machine identifiers for tracking and access control. Internal and private APIs should also be on your radar for strong authentication and authorization, as it is all too common for internal APIs to become external-facing through mobilization and application modernization.

These two security controls are the highest up on the OWASP API Security Top 10 for a reason. Having an intended design for robust authentication and authorization is also only one step. Ensure it is properly implemented initially and also continuously. Misconfigurations bite organizations regularly, and even common protocols like OIDC and OAuth 2 have inherent complexity.

How Salt Security can help:

The Salt Security platform analyzes all API traffic in your environment and details a variety of API authentication and authorization issues including:

  1. Missing, broken or misconfigured authentication such as JWT misconfiguration, token leakage and exposed tokens
  2. Missing, broken or misconfigured authorization such as BOLA, broken function level authorization (BFLA) and unchecked enumeration of objects and functions

You can feed this information to development teams and API teams if a code fix is necessary. Salt provides native integration with Jira and supports custom integrations via webhooks to support your organization’s remediation workflows. Salt also integrates with your existing proxies, web application firewalls and API gateways to mitigate issues in runtime.

3 – Avoid predictable identifiers and sequential identifiers

Don’t use sequential identifiers, ever. Ensure you use algorithms with sufficient entropy, or randomness, for sensitive data such as record identifiers, authentication tokens and session values. Use well-vetted frameworks, libraries and SDKs whose random-value functions provide sufficient entropy. These functions are commonly referred to as pseudo random number generators (PRNGs). For values that must be unguessable, such as authentication tokens and session values, or for any form of encryption, you’ll need cryptographically secure PRNGs. Failing to use algorithms which provide sufficient randomness leaves data exposed to enumeration and brute-force attacks. In encryption use cases, insufficient entropy leaves data vulnerable to cryptographic attacks.
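As one illustration of the advice above, Python’s standard library exposes a cryptographically secure generator through the `secrets` module. This is a minimal sketch. The function names `new_record_id` and `new_session_token` are my own, and the sizes are assumptions you should adapt to your own requirements.

```python
import secrets
import uuid


def new_record_id():
    """Return a random 128-bit identifier as 32 hex characters."""
    return secrets.token_hex(16)


def new_session_token():
    """Return a URL-safe token built from 32 random bytes."""
    return secrets.token_urlsafe(32)


def new_uuid():
    """Return a random (version 4) UUID as a string."""
    return str(uuid.uuid4())
```

Random identifiers make enumeration impractical, but they are not a substitute for authorization. An identifier that leaks through a link, a log or a referrer header still needs an access check behind it.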

How Salt Security can help:

The Salt Security platform detects brute force attacks that include identifier enumerations in API parameters. By monitoring each user individually and continuously over time, the Salt Security platform identifies low and slow attempts of identifier guessing or sequential scanning. These attempts are detected and can be identified and blocked before any record has been breached or stolen. You can feed this information to development teams and API teams if a code fix is necessary. Salt provides native integration with Jira and supports custom integrations via webhooks to support your organization’s remediation workflows.

4 – Use rate limiting as a starting point to mitigate API abusers

Employ rate limits at a minimum, but also seek solutions that can employ more advanced traffic collection and behavioral analysis. The scope of the problem is much larger than what can be enforced within a gateway or perimeter proxy. Rate limits didn’t seem to be at all in place with Parler given the large scale scraping that occurred. Unfortunately, many organizations hit a wall with rate limits since they are difficult to scale without inhibiting functionality. This is particularly true with static rate limiting and public APIs. When you’re dealing with hundreds of thousands or millions of users, it becomes an incredibly complex technical problem to maintain API context of what users and machines are requesting, let alone tracking abusive requesters. Relying on IP address lists of malicious requesters and IP address allow/block lists are not scalable options for most organizations. This can lead to content scraping attacks and data breaches as seen with Parler.
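For readers who haven’t implemented one, here is a minimal sketch of a per-client token bucket, a common way to build a static rate limit. The class names, parameters and the choice to key buckets by a `client_id` string are my own assumptions for illustration. A production deployment would typically enforce this in a gateway and keep the counters in a shared store.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at a steady rate."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.updated
        self.updated = now
        # Add the tokens earned since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimiter:
    """Keep one bucket per client identifier (API key, user ID, and so on)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self._buckets = {}

    def allow(self, client_id):
        bucket = self._buckets.get(client_id)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_per_second, self.clock)
            self._buckets[client_id] = bucket
        return bucket.allow()
```

The sketch also shows why static limits struggle at scale. The limiter holds state for every client it has ever seen, and a scraper that spreads requests across many identities or stays just under the refill rate will still pass.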

How Salt Security can help:

The Salt Security platform detects API abusers that are excessively consuming any API that may be lacking resource limits or rate limits. The platform also enables detection of scrapers, malicious scripts, and low and slow fraud attempts that often evade traditional rate limiting mechanisms. Salt can alert on these types of events and notify your SOC analysts via SIEM integration. It can also automatically block at your organization’s existing proxies, web application firewalls and API gateways.

5 – Protect data served by APIs and protect APIs that serve user data

User privacy is paramount throughout the data lifecycle. You must use the applicable regulation as your guide here, since approaches vary based on what type of data you collect and from whom. For instance, GDPR can apply to an organization anywhere in the world that offers goods or services to, or monitors the behavior of, people located in the EU. I’ve often seen organizations get this wrong, thinking it only applies to EU companies.

Regulations are often not very prescriptive, much to the dismay of security teams and developers. The discussion should start with questioning the data you collect and seeking a solid understanding of what security controls help protect data in transit, in use and at rest. APIs ultimately are used to serve such data, so ensure that security controls you put in place are adequate. TLS 1.2 or later should be standard, and that’s only a transport security control. Sensitive data should also be encrypted at rest using well-vetted algorithms such as AES-CBC with appropriate key sizes. 128 bits is the minimum, but many organizations opt for 256 bits. The Parler breach was also somewhat unusual, since the user data contained geolocation information. This metadata is commonly appended to images and videos that are captured on modern mobile devices. If you are in the business of accepting images or videos from users, you may need to ensure that metadata is scrubbed accordingly in order to protect user privacy.
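To show what scrubbing can look like, here is a minimal sketch that removes APP1 segments from a JPEG using only the Python standard library. Exif data, including GPS tags, and XMP metadata are both stored in APP1 segments. The function name `strip_app1` is my own, and the sketch covers JPEG only. Video containers such as MP4 store metadata differently, and a production service would more likely rely on a maintained imaging or media library.

```python
import struct

SOI = b"\xff\xd8"  # start-of-image marker that opens every JPEG
APP1 = 0xE1        # Exif and XMP metadata, including GPS tags, live in APP1
SOS = 0xDA         # start of scan: compressed image data follows


def strip_app1(jpeg_bytes):
    """Return a copy of a JPEG with its APP1 (Exif/XMP) segments removed."""
    if not jpeg_bytes.startswith(SOI):
        raise ValueError("not a JPEG stream")
    out = bytearray(SOI)
    pos = 2
    while pos < len(jpeg_bytes):
        if pos + 4 > len(jpeg_bytes) or jpeg_bytes[pos] != 0xFF:
            raise ValueError("malformed JPEG segment header")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xFF:
            pos += 1  # optional fill byte before a marker; skip it
            continue
        if marker == SOS:
            # Copy the scan header and all remaining data unchanged.
            out += jpeg_bytes[pos:]
            return bytes(out)
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        end = pos + 2 + length
        if length < 2 or end > len(jpeg_bytes):
            raise ValueError("malformed JPEG segment length")
        if marker != APP1:
            out += jpeg_bytes[pos:end]  # keep every non-metadata segment
        pos = end
    raise ValueError("no image data found")
```

Scrubbing on upload, before the file is stored, means the location data never exists server-side to be exposed later by an API or a misconfiguration.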

How Salt Security can help:

The Salt Security platform discovers and catalogs all your APIs and the traffic they serve, including sensitive data. In some cases, APIs may need to serve sensitive data by design, but it may warrant further review by your organization’s compliance or privacy teams. Salt identifies multiple types of sensitive data including PII, PHI and cardholder data that organizations must protect as mandated by industry security standards and regulations. These standards and regulations include CCPA, GDPR, HIPAA, and PCI-DSS among others.

Depending on the sensitive data in question and where it is exposed, remediating the data exposure might require a code fix or mitigation at your organization’s gateways to mask, tokenize or encrypt data as necessary. You can feed this information to development teams and API teams if a code fix is necessary. Salt provides native integration with Jira and supports custom integrations via webhooks to support your organization’s remediation workflows. Salt also integrates with your existing proxies, web application firewalls and API gateways to mitigate issues in runtime.

6 – Maintain visibility and security monitoring of APIs

Have sufficient API monitoring in place to respond in the event of an attack. The fact that Parler did not stop the mass data extraction prior to AWS shutting down hosting would seem to highlight that they didn’t know what was going on or just didn’t care. The latter seems unlikely to me, though, given that Parler was a business operating a social media platform, not a fly-by-night operation. Many organizations lack adequate security monitoring for all their applications and systems, and this is particularly true for API context. The lack of adequate API inventory and monitoring is where OWASP API9:2019 Improper Assets Management and OWASP API10:2019 Insufficient Logging & Monitoring enter the picture. Knowing what APIs you have and ensuring you have visibility into how they are consumed is critical for any organization that is acquiring, integrating or building APIs.
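Even basic log analysis can surface the kind of bulk extraction described above. The following is a deliberately simple sketch that flags clients requesting an unusually large number of distinct objects. The event shape, the `find_enumerators` name and the threshold are assumptions on my part, and this kind of static threshold is a starting point rather than a replacement for behavioral analysis.

```python
from collections import defaultdict


def find_enumerators(events, max_distinct_ids=100):
    """Return the clients that requested more than max_distinct_ids objects.

    events is an iterable of (client_id, object_id) pairs, for example
    parsed from API access logs.
    """
    seen = defaultdict(set)
    for client_id, object_id in events:
        seen[client_id].add(object_id)
    return {client for client, ids in seen.items() if len(ids) > max_distinct_ids}
```

A normal user re-reads a small set of objects, while a scraper walking an identifier space touches each object once, so the distinct count separates the two better than raw request volume does.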

How Salt Security can help:

The Salt Security platform is designed with discovery and visibility in mind. Salt can collect API traffic from numerous locations in your datacenter and cloud environments to cover all the ways your organization’s APIs communicate. The platform analyzes all API traffic in your environment, catalogs APIs and the sensitive data they serve, and monitors them continuously in runtime to identify security issues or exposures. Salt alerts on anomalous events and can be configured to notify your SOC analysts via SIEM integration to help enable SecOps workflows. The platform also distinguishes between an occasional operational hiccup and a truly malicious event. This capability helps minimize SecOps fatigue from reviewing unnecessary alerts and wasted developer cycles from triaging false positives.

Organizational implications and conclusion

The Parler event will continue to unfold and offer us more to unpack as new information comes to light. I am neither a political reporter nor an investigative journalist. This writeup is a fusion of information based on what others have uncovered thus far, readily available code snippets and my own technical analysis. I can revisit this analysis of the Parler breach as necessary as new data comes to light.

The hacktivist did not appear to have practiced responsible disclosure here, which is a bit troubling for me. That will undoubtedly be spotlighted as Parler and its users pursue their own legal avenues. The timing and sequence of events here may have made responsible disclosure impossible. Emotions and political beliefs may drive choices in this area. Public opinion can also differ from judicial system interpretation. This event may be viewed as data archival to serve the greater public good or data exfiltration with malicious intent. In the security field, when you identify a weakness or vulnerability, it is common practice to first notify the owning party. You also don’t typically fully exploit an issue, let alone extract all data possible to prove it. This action can spell the difference between security researcher and attacker. It also typically has legal implications later.

Most media outlets have deemed the Parler security issues to be the result of poor coding. Personally, I’d like to hear more from Parler’s leadership and its application and engineering teams. Maybe Parler grew too big too fast and overlooked some basic security requirements. In reality, we’ve seen these types of poor design choices and coding patterns amongst major household name brands. Case in point, Facebook had a BOLA vulnerability in the November/December 2020 timeframe, though exploitability and impact were far lower. Amazon Ring also recently had a pair of vulnerabilities in its Neighbors app, specifically using sequential identifiers and relying on client-side code to filter data. Dismissing these types of issues as “poor coding” perpetuates the friction amongst developers, engineers, and security practitioners.

I’ve kept this writeup in the sweet spot of API security since this event can spiral into a host of other subject areas including responsible disclosure, digital activism, privacy impacts, criminal law, and broader issues in social media platforms. Our focus in this recap is to provide valuable API security education.

The harsh reality of modern application development and systems design is that it is incredibly complex. It requires many individuals in both security and non-security roles to work in sync to get a large-scale application working without issue. The problem is worsened when you consider the modern digital supply chain and how organizations outsource elements of IT wholesale or per project. In my experience, no application or system is flawless. Every assessment I performed as a security practitioner would inevitably result in some found vulnerability. Bugs and vulnerabilities are inevitable. However, we can continue to learn from these mistakes and improve the state of application security and API security.