I was at the 2025 GSUK, and someone asked me this (amongst other questions). I had a think about it and came up with….
The unhelpful answer
Well it is obvious; what did you expect?
A better answer
There are many reasons… and some are not obvious.
General performance concepts
- A piece of work is either using CPU or waiting.
CPU
- Work can use CPU
- A transaction may be delayed from starting. For example, WLM says other work is more important than yours.
- Once your transaction has started and has issued a request, such as an I/O request, your task may not be re-dispatched immediately when the request finishes, because other work has a higher priority – other work is dispatched to keep to the WLM system goals.
I remember going to a presentation about WLM when it was first available. Customers were “complaining” because batch work was going through faster when WLM was enabled. CICS transactions used to complete in half a second, but the requirement was 1 second. They now take 1 second (no one noticed) – and batch is doing more work.
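The dispatch-by-priority idea can be sketched with a toy ready queue. This is an illustration in Python, not how the z/OS dispatcher or WLM is actually implemented; the work names and priorities are made up.

```python
import heapq

ready = []  # (priority, name); a lower number means more important
heapq.heappush(ready, (1, "online-transaction"))
heapq.heappush(ready, (3, "your-batch-job"))      # its I/O has just completed
heapq.heappush(ready, (2, "production-query"))

# Work is dispatched strictly in priority order, so the batch job waits
# even though its I/O completed first.
order = [heapq.heappop(ready)[1] for _ in range(3)]
print(order)  # ['online-transaction', 'production-query', 'your-batch-job']
```

The batch job is runnable the whole time; it simply keeps losing the queue to more important work, which is exactly the "not re-dispatched immediately" effect described above.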
Your transaction may be doing more work.
For example, in pre-production (PP) you only read one record from the (small) database. The production database may be much larger, and the data may not be in memory, so it takes longer to get a record. In production you may also have to process more records – which adds to the amount of work done.
Your database may not be optimally configured. In one customer incident, a table was (mis)configured so that a sequential scan of up to 100 records was needed to find the required record. In production there were thousands of records to scan to find the required record, increasing the processing time by a factor of 10 or more. They defined an index and cured the problem.
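A toy illustration of why the missing index hurt (the table sizes are made up and this has nothing to do with any real database engine): the cost here is simply the number of records examined.

```python
def scan_lookup(records, key):
    """Sequential scan: examine records one by one until the key is found."""
    for examined, rec in enumerate(records, start=1):
        if rec["key"] == key:
            return rec, examined
    return None, len(records)

def indexed_lookup(index, key):
    """Index lookup: one probe, regardless of table size."""
    return index.get(key), 1

pp_table   = [{"key": k, "data": f"row{k}"} for k in range(100)]     # small PP table
prod_table = [{"key": k, "data": f"row{k}"} for k in range(10_000)]  # bigger production table
prod_index = {rec["key"]: rec for rec in prod_table}                 # the fix: an index

_, pp_cost   = scan_lookup(pp_table, 99)       # worst case in PP
_, prod_cost = scan_lookup(prod_table, 9_999)  # worst case in production
_, idx_cost  = indexed_lookup(prod_index, 9_999)

print(pp_cost, prod_cost, idx_cost)  # 100 10000 1
```

The scan cost grows with the table, so a query that looked fine in PP quietly becomes the bottleneck in production; the indexed cost does not grow at all.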
Waiting
There are many reasons for an application to wait. For example:
Latches
A latch is a serialisation mechanism for very short duration activities (microseconds). For example, if a thread wants to GETMAIN a block of storage, the system gets the address space latch (lock), updates the few storage pointers, and releases the latch. The more threads running in the address space, and the more storage requests they issue, the greater the chance of two threads trying to get the latch at the same time, and so tasks may have to wait. At the deep hardware level the storage may have to be accessed from different CPUs… so the data moves a metre or so, and this is a slow access.
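The latch pattern can be sketched like this. It is a Python sketch with assumed names (`latch`, `free_chain_head`), not z/OS internals; the point is that the lock is held only for the few instructions that update shared pointers, yet with enough threads those few instructions still become a queueing point.

```python
import threading

free_chain_head = 0          # stand-in for the storage pointers GETMAIN updates
latch = threading.Lock()     # stand-in for the address space latch

def getmain(n_requests):
    global free_chain_head
    for _ in range(n_requests):
        with latch:                # very short critical section
            free_chain_head += 1   # update the few storage pointers
        # ... use the storage outside the latch ...

threads = [threading.Thread(target=getmain, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every update held the latch, so no updates were lost.
print(free_chain_head)  # 80000
```

Without the latch, two threads could read and write the pointers at the same time and corrupt them; with it, correctness is preserved but every extra thread adds contention and therefore wait time.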
Application IO
An application can read/write a record from a data set, a file, or the spool. There may be serialisation on the resource to ensure only one thread can use it at a time.
Also, if there is a limited number of connections from the processor to the disk controller, higher priority work may be scheduled before your work.
Database logging delays
If your application is using DB2, IMS, or MQ, these process requests from many address spaces and threads.
As part of transactional work, data is written to the log buffers.
At commit time, the data for the thread is written out to disk. Once the data is written successfully the commit can return.
There are several situations:
- If the data has already been written – do not wait; just return “OK”
- There is no log I/O active in the subsystem. Start the I/O and wait for completion. The duration is one I/O.
- There is currently an I/O in progress. Keep writing data to the buffer. When that I/O completes, start the next I/O with the data from the buffer. On average your task waits half an I/O time while the previous I/O completes, then one full I/O time for its own data. The duration is 1.5 I/O times.
- As the system gets busier, more data is written in each I/O. This means each I/O takes longer – and the wait for the previous I/O takes longer.
- There is so much data in the log buffer that several log writes are needed before the last of your data is successfully written. The duration is multiple long I/O requests.
This means that other jobs running on the system and using the same DB2, IMS, or MQ will affect the time taken to write data to the subsystem log, and so impact your job.
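The 1.5 I/O figure above can be checked with a tiny simulation, assuming a fixed I/O time and a commit arriving at a uniformly random point during the in-progress I/O:

```python
import random

random.seed(42)

io_time = 1.0     # one log I/O, in arbitrary units
samples = 100_000

total_wait = 0.0
for _ in range(samples):
    arrival = random.uniform(0.0, io_time)  # point during the in-progress I/O
    remainder = io_time - arrival           # wait for that I/O to complete...
    total_wait += remainder + io_time       # ...then one full I/O for our own data

avg = total_wait / samples
print(round(avg, 2))  # close to 1.5 I/O times
```

On average the commit arrives halfway through the in-progress I/O (0.5 of an I/O remaining) and then pays one full I/O of its own, giving the 1.5 figure. If the I/O time itself stretches because each write carries more data, both terms grow together.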
Database record delays
If you have two threads wanting to update the same database record, there will be a data lock from the time the first task gets the record for update until the end of its commit. Another task wanting that record will have to wait for the first task to finish. Of course, on a busy system the commits take longer, as described above.
What can make it worse is when a transaction gets a record for update (so locking it) and then issues a remote request, for example over TCP/IP, to another server. The record lock is held for the duration of this request, plus the commit, etc.
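Some illustrative arithmetic (all numbers are made up) shows how a remote call stretches the lock hold time:

```python
# Time the record lock is held, step by step.
update_ms      = 1    # get the record for update (lock acquired here)
remote_call_ms = 200  # TCP/IP round trip to another server, lock still held
commit_ms      = 5    # log write at commit (longer on a busy system)

lock_held_ms = update_ms + remote_call_ms + commit_ms

# A second transaction wanting the same record can wait up to this long
# before it can even start its own update.
print(lock_held_ms)  # 206
```

The local work is a few milliseconds; the remote call dominates. Restructuring the application so the remote request happens before the record is locked would shrink the contention window dramatically.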
Network traffic
If your transaction uses remote servers, the following can take a significant time:
- Establishing a connection to the remote server.
- Establishing the TLS session. This can take 3 flows to establish a secure session.
- Transferring the data. This may involve several blocks of data sent, and several blocks received.
- Big blocks of data can take longer to process.
- The network traffic depends on all users of the network, and in production the data may be going to a remote site. In PP you may have a local, closer server.
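A rough latency model shows why a remote production server costs more than a local PP one. All the round-trip counts and delays below are assumptions for illustration, not measurements:

```python
def request_cost_ms(rtt_ms, tcp_rtts=1, tls_rtts=2, data_rtts=3):
    """Total time: connection setup + TLS handshake + data transfer,
    each expressed as a number of network round trips."""
    return (tcp_rtts + tls_rtts + data_rtts) * rtt_ms

local_pp    = request_cost_ms(rtt_ms=1)   # nearby PP server
remote_prod = request_cost_ms(rtt_ms=20)  # production server at a remote site

print(local_pp, remote_prod)  # 6 120
```

Every step is a multiple of the round-trip time, so a 20× longer network path makes the whole exchange 20× slower even though the application does exactly the same work.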
Waiting for human input
For example, prompting for an account number and name.
Yes, but the pre-production is not busy!
This is where you have to step back and look at the bigger picture.
Typically the physical CPU is partitioned into many LPARs. You may have two production LPARs and one pre-production (PP) LPAR. The box has been configured so that production gets priority over PP.
CPU
Although your work is top of the queue for execution on your LPAR, the LPAR is not given any CPU because the production LPARs have priority. When they do not need CPU, the PP LPAR gets to use the CPU and your work runs (or not, depending on other work).
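Some illustrative LPAR weight arithmetic (the weights are made up): PP's guaranteed share of the box is its weight divided by the total, and it only gets more when the production LPARs are idle.

```python
# Hypothetical LPAR weights on one physical box.
weights = {"PROD1": 450, "PROD2": 450, "PP": 100}

total = sum(weights.values())
pp_share = weights["PP"] / total

print(round(pp_share * 100, 1))  # 10.0 -> PP's guaranteed percentage of the CPU
```

So a PP LPAR that looks idle from the inside may still be starved from the outside: its guaranteed slice is small, and the rest is available only when production does not want it.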
IO
There may be no other task on your LPAR using the device, so there are no delays in the LPAR issuing the I/O request to the disk controller. However, other work may be running on other LPARs, so there is contention from the storage controller down to the storage.
Overall
So not such an obvious answer after all!
Published November 7, 2025