☑️XXE
Introduction
XML External Entity (XXE) Injection vulnerabilities occur when XML data is taken from a user-controlled input without properly sanitizing or safely parsing it, which may allow us to use XML features to perform malicious actions.
XML
Extensible Markup Language (XML) is a common markup language (similar to HTML and SGML) designed for flexible transfer and storage of data and documents in various types of applications. XML is not focused on displaying data but mostly on storing documents' data and representing data structures. XML documents are formed of element trees, where each element is essentially denoted by a tag, and the first element is called the root element, while other elements are child elements.
<?xml version="1.0" encoding="UTF-8"?>
<email>
<date>01-01-2022</date>
<time>10:00 am UTC</time>
<sender>[email protected]</sender>
<recipients>
<to>[email protected]</to>
<cc>
<to>[email protected]</to>
<to>[email protected]</to>
</cc>
</recipients>
<body>
Hello,
Kindly share with me the invoice for the payment made on January 1, 2022.
Regards,
John
</body>
</email>Tag
The keys of an XML document, usually wrapped with (</>) characters.
<date>
Entity
XML variables, usually wrapped with (&/;) characters.
<
Element
The root element or any of its child elements, and its value is stored in between a start-tag and an end-tag.
<date>01-01-2022</date>
Attribute
Optional specifications for any element that are stored in the tags, which may be used by the XML parser.
version="1.0"/encoding="UTF-8"
Declaration
Usually the first line of an XML document, and defines the XML version and encoding to use when parsing it.
<?xml version="1.0" encoding="UTF-8"?>
Furthermore, some characters are used as part of an XML document structure, like <, >, &, or ". So, if we need to use them in an XML document, we should replace them with their corresponding entity references (e.g. <, >, &, "). Finally, we can write comments in XML documents between <!-- and -->, similar to HTML documents.
XML DTD
XML Document Type Definition (DTD) allows the validation of an XML document against a pre-defined document structure. The pre-defined document structure can be defined in the document itself or in an external file. The following is an example DTD for the XML document we saw earlier:
The above DTD can be placed within the XML document itself, right after the XML Declaration in the first line. Otherwise, it can be stored in an external file (e.g. email.dtd), and then referenced within the XML document with the SYSTEM keyword, as follows:
It is also possible to reference through URL:
This is relatively similar to how HTML documents define and reference JavaScript and CSS scripts.
XML Entity
We may also define custom entities (i.e. XML variables) in XML DTDs, to allow refactoring of variables and reduce repetitive data. This can be done with the use of the ENTITY keyword, which is followed by the entity name and its value, as follows:
Once we define an entity, it can be referenced in an XML document between an ampersand & and a semi-colon ; (e.g. &company;). Whenever an entity is referenced, it will be replaced with its value by the XML parser. Most interestingly, however, we can reference External XML Entities with the SYSTEM keyword, which is followed by the external entity's path, as follows:
This works similarly to internal XML entities defined within documents. When we reference an external entity (e.g. &signature;), the parser will replace the entity with its value stored in the external file (e.g. signature.txt). When the XML file is parsed on the server-side, in cases like SOAP (XML) APIs or web forms, then an entity can reference a file stored on the back-end server, which may eventually be disclosed to us when we reference the entity.
Local File Disclosure
When a web application trusts unfiltered XML data from user input, we may be able to reference an external XML DTD document and define new custom XML entities. Suppose we can define new entities and have them displayed on the web page. In that case, we should also be able to define external entities and make them reference a local file, which, when displayed, should show us the content of that file on the back-end server.
The first step in identifying potential XXE vulnerabilities is finding web pages that accept an XML user input. For example, lets say we submit a contact form and on burp we see this:

Note: Some web applications may default to a JSON format in HTTP request, but may still accept other formats, including XML. So, even if a web app sends requests in a JSON format, we can try changing the
Content-Typeheader toapplication/xml, and then convert the JSON data to XML with an online tool. If the web application does accept the request with XML data, then we may also test it against XXE vulnerabilities, which may reveal an unanticipated XXE vulnerability.
As we can see, the form appears to be sending our data in an XML format to the web server, making this a potential XXE testing target. Suppose the web application uses outdated XML libraries, and it does not apply any filters or sanitization on our XML input. In that case, we may be able to exploit this XML form to read local files.
By default, without any modification of the data we get the following message:

We see that the value of the email element is being displayed back to us on the page. To print the content of an external file to the page, we should note which elements are being displayed, such that we know which elements to inject into. In some cases, no elements may be displayed and we'll discuss it later.
For now, we know that whatever value we place in the <email></email> element gets displayed in the HTTP response. So, let us try to define a new entity and then use it as a variable in the email element to see whether it gets replaced with the value we defined.

In contrast, a non-vulnerable web application would display (&company;) as a raw value. This confirms that we are dealing with a web application vulnerable to XXE.
Reading Sensitive Files
Now that we can define new internal XML entities let's see if we can define external XML entities. Doing so is fairly similar to what we did earlier, but we'll just add the SYSTEM keyword and define the external reference path after it:

This enables us to read the content of sensitive files, like configuration files that may contain passwords or other sensitive files like an id_rsa SSH key of a specific user, which may grant us access to the back-end server.
Reading Source Code
Another benefit of local file disclosure is the ability to obtain the source code of the web application. This would allow us to perform a Whitebox Penetration Test to unveil more vulnerabilities in the web application, or at the very least reveal secret configurations like database passwords or API keys.

As we can see, this did not work, as we did not get any content. This happened because the file we are referencing is not in a proper XML format, so it fails to be referenced as an external XML entity. If a file contains some of XML's special characters (e.g. </>/&), it would break the external entity reference and not be used for the reference. Furthermore, we cannot read any binary data, as it would also not conform to the XML format.
Luckily, PHP provides wrapper filters that allow us to base64 encode certain resources 'including files', in which case the final base64 output should not break the XML format. To do so, instead of using file:// as our reference, we will use PHP's php://filter/ wrapper. With this filter, we can specify the convert.base64-encode encoder as our filter, and then add an input resource (e.g. resource=index.php), as follows:

This trick only works with PHP web applications. We will discuss other methods that work with any web app next.
RCE with XXE
In addition to reading local files, we may be able to gain code execution over the remote server. The easiest method would be to look for ssh keys, or attempt to utilize a hash stealing trick in Windows-based web applications, by making a call to our server. If these do not work, we may still be able to execute commands on PHP-based web applications through the PHP://expect filter, though this requires the PHP expect module to be installed and enabled.
If the XXE directly prints its output 'as we saw so far', then we can execute basic commands as expect://id, and the page should print the command output. However, if we did not have access to the output, or needed to execute a more complicated command 'e.g. reverse shell', then the XML syntax may break and the command may not execute.
The most efficient method to turn XXE into RCE is by fetching a web shell from our server and writing it to the web app, and then we can interact with it to execute commands. To do so, we can start by writing a basic PHP web shell and starting a python web server, as follows:
Now, we can use the following XML code to execute a curl command that downloads our web shell into the remote server:
Note: We replaced all spaces in the above XML code with $IFS, to avoid breaking the XML syntax. Furthermore, many other characters like |, >, and { may break the code, so we should avoid using them.
Once we send the request, we should receive a request on our machine for the shell.php file, after which we can interact with the web shell on the remote server for code execution.
Note: Again, the expect module is not enabled/installed by default on modern PHP servers, so this attack may not always work. This is why XXE is usually used to disclose sensitive local files and source code, which may reveal additional vulnerabilities or ways to gain code execution.
Other XXE Attacks
Another common attack often carried out through XXE vulnerabilities is SSRF exploitation, which is used to enumerate locally open ports and access their pages, among other restricted web pages, through the XXE vulnerability.
Finally, one common use of XXE attacks is causing a Denial of Service (DOS) to the hosting web server, with the use the following payload:
This payload defines the a0 entity as DOS, references it in a1 multiple times, references a1 in a2, and so on until the back-end server's memory runs out due to the self-reference loops. However, this attack no longer works with modern web servers (e.g., Apache), as they protect against entity self-reference.
Advacned File Disclosure
Not all XXE vulnerabilities may be straightforward to exploit. Some file formats may not be readable through basic XXE, while in other cases, the web application may not output any input values in some instances, so we may try to force it through errors.
Advanced Exfiltration with CDATA
We saw how we could use PHP filters to encode PHP source files, such that they would not break the XML format when referenced, which (as we saw) prevented us from reading these files. But what about other types of Web Applications? We can utilize another method to extract any kind of data (including binary data) for any web application backend.
To output data that does not conform to the XML format, we can wrap the content of the external file reference with a CDATA tag (e.g. <![CDATA[ FILE_CONTENT ]]>). This way, the XML parser would consider this part raw data, which may contain any type of data, including any special characters.
One easy way to tackle this issue would be to define a begin internal entity with <![CDATA[, an end internal entity with ]]>, and then place our external entity file in between, and it should be considered as a CDATA element, as follows:
After that, if we reference the &joined; entity, it should contain our escaped data. However, this will not work, since XML prevents joining internal and external entities, so we will have to find a better way to do so.
To bypass this limitation, we can utilize XML Parameter Entities, a special type of entity that starts with a % character and can only be used within the DTD. What's unique about parameter entities is that if we reference them from an external source (e.g., our own server), then all of them would be considered as external and can be joined, as follows:
So, let's try to read the submitDetails.php file by first storing the above line in a DTD file (e.g. xxe.dtd), host it on our machine, and then reference it as an external entity on the target web application, as follows:
Now, we can reference our external entity (xxe.dtd) and then print the &joined; entity we defined above, which should contain the content of the submitDetails.php file, as follows:
So the joined comes from an external .dtd file.
Once we write our xxe.dtd file, host it on our machine, and then add the above lines to our HTTP request to the vulnerable web application, we can finally get the content of the submitDetails.php file:

As we can see, we were able to obtain the file's source code without needing to encode it to base64, which saves a lot of time when going through various files to look for secrets and passwords.
This trick can become very handy when the basic XXE method does not work or when dealing with other web development frameworks
Error-Based XXE
Another situation we may find ourselves in is one where the web application might not write any output, so we cannot control any of the XML input entities to write its content. In such cases, we would be blind to the XML output and so would not be able to retrieve the file content using our usual methods.
If the web application displays runtime errors (e.g., PHP errors) and does not have proper exception handling for the XML input, then we can use this flaw to read the output of the XXE exploit. If the web application neither writes XML output nor displays any errors, we would face a completely blind situation which we'll discuss later.
Lets force an error by sending a malformed XML

We see that we did indeed cause the web application to display an error, and it also revealed the web server directory, which we can use to read the source code of other files. Now, we can exploit this flaw to exfiltrate file content. To do so, we will use a similar technique to what we used earlier. First, we will host a DTD file that contains the following payload:
The above payload defines the file parameter entity and then joins it with an entity that does not exist. In our previous exercise, we were joining three strings. In this case, %nonExistingEntity; does not exist, so the web application would throw an error saying that this entity does not exist, along with our joined %file; as part of the error. There are many other variables that can cause an error, like a bad URI or having bad characters in the referenced file.
Now, we can call our external DTD script, and then reference the error entity, as follows:
Once we host our DTD script as we did earlier and send the above payload as our XML data (no need to include any other XML data), we will get the content of the /etc/hosts file as follows:

This method may also be used to read the source code of files. All we have to do is change the file name in our DTD script to point to the file we want to read (e.g. "file:///var/www/html/submitDetails.php"). However, this method is not as reliable as the previous method for reading source files, as it may have length limitations, and certain special characters may still break it.
Blind Data Exfiltration
We will see how we can get the content of files in a completely blind situation, where we neither get the output of any of the XML entities nor do we get any PHP errors displayed.
Out-of-band Data Exfiltration
Previously, we utilized an out-of-band attack since we hosted the DTD file in our machine and made the web application connect to us (hence out-of-band).
Instead of having the web application output our file entity to a specific XML entity, we will make the web application send a web request to our web server with the content of the file we are reading.
To do so, we can first use a parameter entity for the content of the file we are reading while utilizing PHP filter to base64 encode it. Then, we will create another external parameter entity and reference it to our IP, and place the file parameter value as part of the URL being requested over HTTP, as follows:
We can even write a simple PHP script that automatically detects the encoded file content, decodes it, and outputs it to the terminal:
So, we will first write the above PHP code to index.php, and then start a PHP server on port 8000, as follows:
Now, to initiate our attack, we can use a similar payload to the one we used in the error-based attack, and simply add <root>&content;</root>, which is needed to reference our entity and have it send the request to our machine with the file content:

Automated OOB Exfiltration
Although in some instances we may have to use the manual method we learned above, in many other cases, we can automate the process of blind XXE data exfiltration with tools. One such tool is XXEinjector. This tool supports most of the tricks we learned , including basic XXE, CDATA source exfiltration, error-based XXE, and blind OOB XXE.
Once we have the tool, we can copy the HTTP request from Burp and write it to a file for the tool to use. We should not include the full XML data, only the first line, and write XXEINJECT after it as a position locator for the tool:
Now, we can run the tool with the --host/--httpport flags being our IP and port, the --file flag being the file we wrote above, and the --path flag being the file we want to read. We will also select the --oob=http and --phpfilter flags to repeat the OOB attack we did above, as follows:
We see that the tool did not directly print the data. This is because we are base64 encoding the data, so it does not get printed. In any case, all exfiltrated files get stored in the Logs folder under the tool, and we can find our file there:
Prevention
Avoiding Outdated Components
While other input validation web vulnerabilities are usually prevented through secure coding practices (e.g., XSS, IDOR, SQLi, OS Injection), this is not entirely necessary to prevent XXE vulnerabilities. This is because XML input is usually not handled manually by the web developers but by the built-in XML libraries instead. So, if a web application is vulnerable to XXE, this is very likely due to an outdated XML library that parses the XML data.
For example, PHP's libxml_disable_entity_loader function is deprecated since it allows a developer to enable external entities in an unsafe manner, which leads to XXE vulnerabilities. If we visit PHP's documentation for this function, we see the following warning:
Warning
This function has been DEPRECATED as of PHP 8.0.0. Relying on this function is highly discouraged.
Furthermore, even common code editors (e.g., VSCode) will highlight that this specific function is deprecated and will warn us against using it:

Note: You can find a detailed report of all vulnerable XML libraries, with recommendations on updating them and using safe functions, in OWASP's XXE Prevention Cheat Sheet.
In addition to updating the XML libraries, we should also update any components that parse XML input, such as API libraries like SOAP. Furthermore, any document or file processors that may perform XML parsing, like SVG image processors or PDF document processors, may also be vulnerable to XXE vulnerabilities, and we should update them as well.
These issues are not exclusive to XML libraries only, as the same applies to all other web components (e.g., outdated Node Modules). In addition to common package managers (e.g. npm), common code editors will notify web developers of the use of outdated components and suggest other alternatives. In the end, using the latest XML libraries and web development components can greatly help reduce various web vulnerabilities, including XXE.
Using Safe XML Configurations
Other than using the latest XML libraries, certain XML configurations for web applications can help reduce the possibility of XXE exploitation. These include:
Disable referencing custom
Document Type Definitions (DTDs)Disable referencing
External XML EntitiesDisable
Parameter EntityprocessingDisable support for
XIncludePrevent
Entity Reference Loops
Another thing we saw was Error-based XXE exploitation. So, we should always have proper exception handling in our web applications and should always disable displaying runtime errors in web servers.
Such configurations should be another layer of protection if we miss updating some XML libraries and should also prevent XXE exploitation. However, we may still be using vulnerable libraries in such cases and only applying workarounds against exploitation, which is not ideal.
With the various issues and vulnerabilities introduced by XML data, many also recommend using other formats, such as JSON or YAML. This also includes avoiding API standards that rely on XML (e.g., SOAP) and using JSON-based APIs instead (e.g., REST).
Finally, using Web Application Firewalls (WAFs) is another layer of protection against XXE exploitation. However, we should never entirely rely on WAFs and leave the back-end vulnerable, as WAFs can always be bypassed.
Last updated