Introduction
Data transformation is a core task in application integration. MuleSoft's powerful DataWeave language simplifies this process, allowing you to manipulate data in various formats with concise scripts. This post will guide you through a common scenario: converting an alphanumeric string into a well-structured JSON object by extracting specific sequences of numbers and letters.
Understanding the Core DataWeave Script
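The goal is to process a string like "a12b34c56d" and produce a JSON output such as {"12": "abc", "34": null, "56": null}. The letters a, b, c, and d are collected into "abcd", which contains only one complete three-letter group, so the remaining numbers map to null. The initial script achieves this through a series of steps.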
First, it imports essential string functions from the DataWeave core library: https://docs.mulesoft.com/dataweave/2.4/dw-core-functions-string.
Declaring the Number and Character Variables
The script uses two main variables, num and char, to hold the extracted values.
- The num variable: This variable extracts pairs of digits from the input string.
  - The filter function iterates over each character, keeping only those that match the digit pattern [0-9].
  - The scan function then groups these filtered digits into an array of two-digit numbers using the regex [0-9]{2}.
  - Finally, flatten ensures the result is a simple array.
- The char variable: This variable extracts groups of three letters using the same logic but with different patterns. filter keeps only alphabetical characters ([a-zA-Z]). scan groups them into an array of three-letter sequences ([a-zA-Z]{3}).
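The steps above can be sketched as a short DataWeave script. This is a reconstruction from the description rather than the original script: the names inputText, digitsOnly, and digitPairs are hypothetical, the input is hard-coded instead of read from the payload, and the filter overload for strings is assumed to be available (DataWeave 2.4 or later).

```dataweave
%dw 2.0
// Import named in the text. Assumption: filter on strings, scan, matches,
// and flatten are core functions, so the script may also run without it.
import * from dw::core::Strings
output application/json

// Hypothetical inline input; in a Mule flow this would normally be the payload
var inputText = "a12b34c56d"

// Step 1: keep only the characters that are digits
var digitsOnly = inputText filter ($ matches /[0-9]/)

// Step 2: scan returns one inner array per match
var digitPairs = digitsOnly scan /[0-9]{2}/

// Step 3: flatten removes the inner arrays
var num = flatten(digitPairs)

// Same pipeline for letters, written as a single expression
var char = flatten((inputText filter ($ matches /[a-zA-Z]/)) scan /[a-zA-Z]{3}/)
---
{
  digitsOnly: digitsOnly,
  digitPairs: digitPairs,
  num: num,
  char: char
}
```

Under these assumptions, digitsOnly should be "123456", digitPairs should be a nested array with one inner array per pair, num should be ["12", "34", "56"], and char should be ["abc"]. Seeing digitPairs next to num shows why the flatten step is needed.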
Building the Final JSON Output
The script constructs the JSON object by mapping over the num array. For each number in the array, it creates a key-value pair where the number itself is the key, and the corresponding value is the element at the same index in the char array. If there is no corresponding element, the value is null.
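A minimal sketch of the complete initial script might look like the following. It is reconstructed from the description, not copied from the original; inputText is a hypothetical stand-in for the payload, and the dynamic-object syntax shown is one common way to turn an array of key-value pairs into a single object.

```dataweave
%dw 2.0
import * from dw::core::Strings
output application/json

// Hypothetical inline input; in a Mule flow this would normally be the payload
var inputText = "a12b34c56d"

// Two-digit groups taken from the digits of the input
var num = flatten((inputText filter ($ matches /[0-9]/)) scan /[0-9]{2}/)

// Three-letter groups taken from the letters of the input
var char = flatten((inputText filter ($ matches /[a-zA-Z]/)) scan /[a-zA-Z]{3}/)
---
{
  // Each number becomes a key; the value is the letter group at the same index.
  // An index with no matching element in char yields null.
  (num map ((n, index) -> {
    (n): char[index]
  }))
}
```

For the sample input, this should produce the object described earlier: {"12": "abc", "34": null, "56": null}.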
Handling Edge Cases with an Improved Script
A minor issue with the initial script is that it drops any leftover digits or letters that do not fill a complete group. For example, in a string like "01020304056INDAUSENGUSACHNUK", the trailing "6" and "UK" would be lost.
The improved script addresses this by making the scan patterns more flexible. Replacing [0-9]{2} with [0-9]{1,2} allows it to capture both one- and two-digit numbers. Similarly, changing [a-zA-Z]{3} to [a-zA-Z]{1,3} allows it to capture one-, two-, or three-letter groups. Because these quantifiers are greedy, full-size groups are still matched first and only the remainder comes out shorter, so no data is left behind. According to MuleSoft documentation on pattern matching, this quantifier approach is efficient for such use cases: https://docs.mulesoft.com/dataweave/2.4/dataweave-pattern-matching.
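The change can be sketched as follows. As before, this is a reconstruction under assumptions: inputText is a hypothetical hard-coded input, and only the two quantifiers differ from the initial version.

```dataweave
%dw 2.0
import * from dw::core::Strings
output application/json

// Hypothetical inline input; in a Mule flow this would normally be the payload
var inputText = "01020304056INDAUSENGUSACHNUK"

// {1,2} matches two digits where possible and a single digit otherwise
var num = flatten((inputText filter ($ matches /[0-9]/)) scan /[0-9]{1,2}/)

// {1,3} matches three letters where possible and shorter groups otherwise
var char = flatten((inputText filter ($ matches /[a-zA-Z]/)) scan /[a-zA-Z]{1,3}/)
---
{
  // Pair each number with the letter group at the same index
  (num map ((n, index) -> {
    (n): char[index]
  }))
}
```

With this input, the last number group should be "6" and the last letter group "UK", so both end up in the output instead of being dropped.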
Real-World Example
Imagine processing international sport event codes where numbers represent event IDs and letter codes of up to three characters represent countries. The improved script handles this case, including the trailing single digit and the two-letter code.
Input String: "01020304056INDAUSENGUSACHNUK"
DataWeave Output:
{ "01": "IND", "02": "AUS", "03": "ENG", "04": "USA", "05": "CHN", "6": "UK" }
Alternative Implementation
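The example filters individual characters with the matches function before calling scan. When the digits and letters already form contiguous runs, as in the sport event input, you can achieve the same extraction by applying scan directly to the original payload and skipping the filter step, which can lead to more concise code. For interleaved input such as "a12b34c56d", the two approaches group characters differently, because scan only matches adjacent characters.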
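A sketch of this alternative, assuming the same hypothetical inputText variable in place of the payload, might look like this. It uses only scan, flatten, and map, which are assumed here to be core functions, so no import is included.

```dataweave
%dw 2.0
output application/json

// Hypothetical inline input; in a Mule flow this would normally be the payload
var inputText = "01020304056INDAUSENGUSACHNUK"

// scan is applied to the raw input, with no per-character filter step
var num = flatten(inputText scan /[0-9]{1,2}/)
var char = flatten(inputText scan /[a-zA-Z]{1,3}/)
---
{
  // Pair each number with the letter group at the same index
  (num map ((n, index) -> {
    (n): char[index]
  }))
}
```

The trade-off is that this version depends on the layout of the input. For "a12b34c56d", the pattern [a-zA-Z]{1,3} applied directly would match "a", "b", "c", and "d" as separate groups, whereas filtering first joins them into "abcd" before grouping.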
Sources
- https://docs.mulesoft.com/dataweave/2.4/dw-core-functions-string
- https://docs.mulesoft.com/dataweave/2.4/dataweave-pattern-matching
- https://help.mulesoft.com/s/article/How-to-use-DataWeave-scan-function-for-string-matching
Conclusion
With just a few lines of DataWeave code, you can efficiently parse complex alphanumeric strings and convert them into structured JSON. This flexibility is key for handling diverse data formats in your MuleSoft integration projects. By understanding functions like scan, filter, and map, you unlock powerful data transformation capabilities.