Code Red: Database Surrogate Needed

Code Red response

 I’ve seen the problem crop up over and over as I walk the earth and train IT professionals. A high visibility project begins to slip behind schedule, all because our developer can’t get access to the necessary network, server, or database.

I’ve thought about it a lot, and talked with many folks who have been in, or impacted by this condition. So on the day when it happened to me, I had a half-baked scheme in mind that Just. Might. Work!

With only three hours to go in a five day class, it turned out that the database we needed for the next experiment was down hard. It wasn’t coming back right away, and it was at the center of the experiment that was intended to wrap up the class.

[This article is about development using Anypoint Studio and DataWeave. It’s not the only thing we talk about here, but below the fold, it’s a lot of data and code. So you know…]

The students were ready for a break, so we called a 15 minute pause, and I got right to work simulating the database. I would have them write the access functionality along with me when they returned, so I needed only a reasonable data sample. The clock began to tick.

The classroom experiment requires a simple database connection, and an SQL query that returns account records with a specific and unique postal code.

They would then write a program to select only new entries with that postal code. With the program running then, we would submit a new record with the qualifying postal code, and our program should find it and print it to the log.

It’s a simple watermarking demonstration designed to illustrate features of the development platform along with some DataWeave concepts that turn out to be useful in many programs.

To retrieve the records, the program will use an SQL statement like this:

SELECT * FROM accounts WHERE postal = 99999

We would need a substitute for the database table. It would be a simple matter to write DataWeave that creates a number of mock database account records. These could be preserved in a local file.

We had a small JSON sample of the model customer record. So we could use that sample to synthesize a small set of records, with a mild variation in each one; that would be our necessary dataset. A sample record from the table looks like this in JSON:

{
    accountID: 5864,
    country: "United States",
    street: "7963 Congress Street",
    state: "Iowa",
    name: "Mimosa Mule",
    city: "Austin",
    postal: 15431
  }

So we need a record set that contains records, but where none contain the postal code used in the query (ie. 99999). One way to go about this would be to synthesize the original records such that the postal code is between 10000 and 79999. This is a good job for the randomInt() function.

The DataWeave expression would look something like this:

var postal = 10000 + randomInt(70000)

The randomInt() function generates a result between 0 and the value passed in the function call. So this expression will give us something in the range between 10000 and 79999. That will do nicely. We can take a similar approach to generating mock Account IDs.

To synthesize the address, we could use a function that selects randomly from an array of Customer Names, Street, City, and State names.

Here’s an example of what we might try in order to generate a random city name:

fun getCity() = ["Denver","Austin","Portland"][randomInt(3)]

We can use nearly identical functions to generate the other values. In each case, we supply some static potential values, and then use the randomInt() function to choose from among them.

Here is some code that would do the trick:

fun makeAccount() = do {
        // static boundary set in indexers
        // update if you alter the input sequence
	fun getCity() = ["Denver","Austin","Portland"][randomInt(3)]
	fun getStreet() = ["Oak","Main","Congress","Willow"][randomInt(4)]
	fun getState() = ["Maine","Iowa","Utah","Ohio"][randomInt(4)]
	fun getName() = ["Minnie","Mike","Akshay","Mo","Ted"][randomInt(5)]
	 ---
	{
            "accountID": 5000 + randomInt(999),
            "country": "United States",
            "street": "$(10 + randomInt(7989)) $(getStreet()) Street",
            "state": getState(),
            "name": "$(getName()) Mule",
            "city": getCity(),
            "postal": 9999 + randomInt(60001)
	}
}

The function uses the do{} construct to localize a set of setter functions, each of which will choose a random item from its embedded array. (A note: There is no way the code I wrote in 10 minutes looked that pretty and this version still leaves something to be desired. The inelegance here is that each setter function chooses a value from a static array. There is a better way to handle this and we’ll talk about it near the end.)

The accountID field, and the postal field use the approach we just discussed. The street field is populated by using a similar expression for the street number (get a number between 10 and 7999) which then has a street name appended using the getStreet() function.

The City, State, and Name fields are all populated by functions similar to one another.

Now, to get multiple records, we can call this function some number of times. A very simple way to do this is with the to operator and the map() function. It looks like this:

1 to 15 map makeAccount()

With that ready, I ran the procedure and had my 15 records for my mock database. I wrote the result out as JSON data into a local file. It was just in time, because the students were back and eager to get on with the experiment. We were ready to dive into the job!

Put that all together, and here’s what a portion of the output looks like:

[
  {
    accountID: 5993,
    country: "United States",
    street: "1575 Congress Street",
    state: "Iowa",
    name: "Marvin Mule",
    city: "Denver",
    postal: 11024
  }, 
  {
    accountID: 5149,
    country: "United States",
    street: "4273 Willow Street",
    state: "Iowa",
    name: "Michael Mule",
    city: "Portland",
    postal: 65170
  }, ... 

The next article in this series will consider the approach we had to take from this point. But there’s something I want to explore before we go to that.

The symmetry in the internal functions defined for this exhibit far too much repetition for my taste.

It is possible to collapse all of those functions into one. That offers the opportunity to factor out what is quite simply a hack in those functions. (Not the good kind of hack, but instead the kind someone uses when they prune a rosebush with a machete.)

Rather than propose a solution here, I thought we might open the floor for suggestions. Can you see one that cleans up this code considerably? Use the comments to suggest it here and I will write about some of the best ideas in a future article.

If you want to learn more about DataWeave and Mule application development, check out the courses at Trailhead Academy. There are self-study and instructor led courses. And the DataWeave Playground will lead you to an immediate and terse tutorial.

By agentv

Chief Commentator

Leave a Reply

Your email address will not be published. Required fields are marked *