Extract Numbers and Words from Strings with DataWeave | MuleSoft Guide

Introduction

Data transformation is a core task in application integration. MuleSoft's powerful DataWeave language simplifies this process, allowing you to manipulate data in various formats with concise scripts. This post will guide you through a common scenario: converting an alphanumeric string into a well-structured JSON object by extracting specific sequences of numbers and letters.


Understanding the Core DataWeave Script

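The goal is to process a string like "a12b34c56d" and produce a JSON object such as {"12": "abc", "34": null, "56": null}. The digits 123456 form three two-digit keys. The letters abcd form only one complete three-letter group ("abc"), and the leftover "d" is dropped, so the second and third keys get null. The initial script does this in a few steps.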

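First, it imports string functions from the DataWeave library: https://docs.mulesoft.com/dataweave/2.4/dw-core-functions-string. Note that scan, matches, filter, flatten, and map all belong to dw::Core, which DataWeave imports automatically. An explicit import is needed only for helpers from other modules, such as dw::core::Strings.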

Declaring the Number and Character Variables

The script uses two main variables, num and char, to hold the extracted values.

  • The num Variable: This variable extracts pairs of digits from the input string.
    • The filter function iterates over each character and keeps only those that match the digit pattern [0-9].
    • The scan function then splits the filtered digits into two-digit groups using the regex [0-9]{2}. The groups are strings, not numbers, and scan returns them as nested arrays (one array per match).
    • Finally, flatten collapses those nested arrays into a simple array of strings.
  • The char Variable: This variable extracts groups of three letters using the same logic with different patterns.
    • filter keeps only alphabetical characters ([a-zA-Z]).
    • scan groups them into three-letter sequences ([a-zA-Z]{3}), and flatten again produces a simple array.
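To make the num and char steps concrete, here is a minimal Python sketch that mirrors them. It is not the DataWeave script itself, which this post does not reproduce. The helper names keep_matching, scan, and flatten are hypothetical stand-ins for DataWeave's filter, scan, and flatten. The sketch assumes the filter predicate tests each character against a one-character regex, as described above.

```python
import re


def keep_matching(text: str, char_pattern: str) -> str:
    # Hypothetical stand-in for DataWeave's filter on a String:
    # keep only the characters that fully match the one-character pattern.
    return "".join(ch for ch in text if re.fullmatch(char_pattern, ch))


def scan(text: str, pattern: str) -> list:
    # Hypothetical stand-in for DataWeave's scan: one inner list per match,
    # holding the full match followed by any capture groups.
    return [[m.group(0), *m.groups()] for m in re.finditer(pattern, text)]


def flatten(nested: list) -> list:
    # Hypothetical stand-in for flatten: remove one level of nesting.
    return [item for inner in nested for item in inner]


payload = "a12b34c56d"
num = flatten(scan(keep_matching(payload, r"[0-9]"), r"[0-9]{2}"))
char = flatten(scan(keep_matching(payload, r"[a-zA-Z]"), r"[a-zA-Z]{3}"))

print(num)   # ['12', '34', '56']
print(char)  # ['abc']
```

Without the flatten step, num would be [['12'], ['34'], ['56']]. This is why the description includes flatten after scan.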
Building the Final JSON Output

The script constructs the JSON object by mapping over the num array. For each element, it creates a key-value pair. The key is the number string itself, and the value is the element at the same index in the char array. Indexing past the end of an array in DataWeave returns null, so any number without a matching letter group gets a null value.
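The pairing step can be sketched in Python as follows. This is again a stand-in for the DataWeave map rather than the original script, and build_output is a hypothetical name. It takes the num and char arrays produced for "a12b34c56d" and uses None, which serializes to JSON null, wherever char runs out.

```python
import json
from typing import Dict, List, Optional


def build_output(num: List[str], char: List[str]) -> Dict[str, Optional[str]]:
    # Use each number string as a key; the value is the letter group at the
    # same index, or None (JSON null) when char has no element there.
    return {n: (char[i] if i < len(char) else None) for i, n in enumerate(num)}


result = build_output(["12", "34", "56"], ["abc"])
print(json.dumps(result))  # {"12": "abc", "34": null, "56": null}
```

There is one difference from DataWeave. A Python dict keeps only the last value for a repeated key, whereas a DataWeave object can contain repeated keys, so inputs with duplicate numbers would behave differently.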

Handling Edge Cases with an Improved Script

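A limitation of the initial script is that it silently drops any leftover digits or letters that do not fill a complete group. For example, in "01020304056INDAUSENGUSACHNUK" the digits split into 01, 02, 03, 04, and 05 with a lone 6 left over. The letters split into IND, AUS, ENG, USA, and CHN with UK left over. Both the 6 and the UK are lost.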

The improved script addresses this by making the scan patterns more flexible. Replacing [0-9]{2} with [0-9]{1,2} captures groups of one or two digits, and changing [a-zA-Z]{3} to [a-zA-Z]{1,3} captures groups of one to three letters. These quantifiers are greedy, so each match still takes as many characters as it can. Only a final, shorter remainder becomes a short group, and no trailing data is lost. For more on how DataWeave works with regular expressions, see the MuleSoft documentation on pattern matching: https://docs.mulesoft.com/dataweave/2.4/dataweave-pattern-matching.
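The effect of the quantifier change can be checked with a short Python sketch. Python's re.findall stands in for DataWeave's scan plus flatten, because re.findall already returns a flat list when the pattern has no capture groups. The digit and letter strings are the already-filtered parts of the example input.

```python
import re

digits = "01020304056"         # digits of "01020304056INDAUSENGUSACHNUK"
letters = "INDAUSENGUSACHNUK"  # letters of the same input

# Fixed-size quantifiers: a shorter remainder never matches and is dropped.
strict_num = re.findall(r"[0-9]{2}", digits)
strict_char = re.findall(r"[a-zA-Z]{3}", letters)

# Ranged quantifiers: greedy matching still takes full groups first,
# then the leftover "6" and "UK" match as shorter groups.
flexible_num = re.findall(r"[0-9]{1,2}", digits)
flexible_char = re.findall(r"[a-zA-Z]{1,3}", letters)

print(strict_num)     # ['01', '02', '03', '04', '05']
print(strict_char)    # ['IND', 'AUS', 'ENG', 'USA', 'CHN']
print(flexible_num)   # ['01', '02', '03', '04', '05', '6']
print(flexible_char)  # ['IND', 'AUS', 'ENG', 'USA', 'CHN', 'UK']

# Caveat: a short group in the middle of the string cannot be detected.
print(re.findall(r"[a-zA-Z]{1,3}", "UKIND"))  # ['UKI', 'ND']
```

Because matching is greedy, the ranged patterns assume that every group except possibly the last has full length. As the last line shows, a short code in the middle of the string would shift all later boundaries.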

Real-World Example

Imagine processing international sports event codes, where the numbers are event IDs and the letter codes identify countries. The improved script handles this input, including the shorter final pair 6 and UK.

Input String: "01020304056INDAUSENGUSACHNUK"

DataWeave Output:

{
  "01": "IND",
  "02": "AUS",
  "03": "ENG",
  "04": "USA",
  "05": "CHN",
  "6": "UK"
}
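For an end-to-end check, the following Python sketch combines the filtering, the ranged patterns, and the index-based pairing. It prints the same object as the DataWeave output shown above. It stands in for the DataWeave script rather than reproducing it, and parse_codes is a hypothetical name. Instead of a per-character filter, it removes non-matching characters with re.sub, which yields the same filtered string.

```python
import json
import re
from typing import Dict, Optional


def parse_codes(payload: str) -> Dict[str, Optional[str]]:
    # Filtering step: drop everything that is not a digit / not a letter.
    digits = re.sub(r"[^0-9]", "", payload)
    letters = re.sub(r"[^a-zA-Z]", "", payload)
    # Ranged quantifiers keep a shorter trailing group instead of dropping it.
    num = re.findall(r"[0-9]{1,2}", digits)
    char = re.findall(r"[a-zA-Z]{1,3}", letters)
    # Pair by index; a missing letter group becomes None (JSON null).
    return {n: (char[i] if i < len(char) else None) for i, n in enumerate(num)}


print(json.dumps(parse_codes("01020304056INDAUSENGUSACHNUK"), indent=2))
```

Running the same function on "a12b34c56d" gives {"12": "abc", "34": "d", "56": null}. With [a-zA-Z]{1,3}, the leftover "d" is now kept as a group of its own.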

Alternative Implementation

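The example filters individual characters with matches before scanning, which keeps each step easy to follow. You can often get the same result more concisely by applying scan with the same pattern directly to the original payload, without filtering first. The two approaches agree only when each run of digits or letters is contiguous in the input, as in the example above. If letters and digits alternate, filtering first merges digits that were separated in the original string before grouping them, while a direct scan keeps them apart.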
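The difference between the two approaches shows up only when digits and letters are interleaved. This Python sketch compares filtering first with scanning the original string directly. The helper names are hypothetical, and re.findall stands in for scan plus flatten.

```python
import re


def filter_then_scan(payload: str) -> list:
    # Remove every non-digit first, then cut the remaining digits into groups.
    return re.findall(r"[0-9]{1,2}", re.sub(r"[^0-9]", "", payload))


def direct_scan(payload: str) -> list:
    # Scan the original string; a match cannot span the letters between digits.
    return re.findall(r"[0-9]{1,2}", payload)


contiguous = "01020304056INDAUSENGUSACHNUK"
interleaved = "a1b2c3"

print(filter_then_scan(contiguous) == direct_scan(contiguous))  # True
print(filter_then_scan(interleaved))  # ['12', '3']
print(direct_scan(interleaved))       # ['1', '2', '3']
```

For input shaped like the sports-code example, the direct scan is the shorter option. When the layout of the input is less predictable, filtering first gives different groupings, so choose deliberately.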

Sources

  • https://docs.mulesoft.com/dataweave/2.4/dw-core-functions-string
  • https://docs.mulesoft.com/dataweave/2.4/dataweave-pattern-matching
  • https://help.mulesoft.com/s/article/How-to-use-DataWeave-scan-function-for-string-matching

Conclusion

With just a few lines of DataWeave code, you can parse mixed alphanumeric strings and convert them into structured JSON. This flexibility is key for handling diverse data formats in your MuleSoft integration projects. Once you understand functions like scan, filter, and map, you can apply the same approach to many other data transformations.
