Parse vs Substr (was REPLACE function in REXX)
Robert Zenuk
2016-10-12 23:16:23 UTC
Well, I'm back from a long vacation (I left around the time this thread was starting) and catching up now (I go back to work next week).

While reading what I missed, I saw several things to follow up on (mostly what I thought would be personal research). The first item to cover (probably to start more discussion) is the age-old discussion of the benefits of PARSE over SUBSTR and other string functions...

While reviewing my CHGSTR function and the PARSE solution that was also suggested, I wrote a similar CHGSTR using PARSE. They seemed to perform about the same, so I decided to benchmark them. I wrote a batch test harness to run the 2 execs using the same input multiple times and recorded the elapsed time (and step joblog stats). The 2 versions ran about the same. My original version avoids the recursion issue possible in the PARSE solution, so I'm staying with my original. However, after seeing the post with the 18-year-old link, it got me thinking about a basic SUBSTR vs PARSE benchmark...

The test harness (TESTREXX) is very simple... It runs an EXEC 'n' times, traps the output, and simply reports elapsed time and CPU time...

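/* REXX - TESTREXX: run an EXEC 'count' times, report elapsed and CPU time */
parse arg count execname
x = outtrap('x.')            /* trap the output of the EXEC being timed  */
y = time('r')                /* reset the elapsed-time clock             */
b = sysvar('SYSCPU')         /* CPU seconds used so far                  */
do count
  execname                   /* invoke the EXEC as a TSO command         */
end
x = outtrap('off')           /* stop trapping output                     */
e = sysvar('SYSCPU')
cpu = e - b
say 'Elapsed:' time('e') 'CPU:' cpu right(count,10) execname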

The 2 EXECs I benchmarked were VERY simple.

PTEST1 (uses substr)

string = 'word1 word2 word3 word4'
w1 = substr(string,1,5)
w2 = substr(string,7,5)
w3 = substr(string,13,5)
w4 = substr(string,19,5)
say w1||w2||w3||w4

PTEST2 (uses parse)

string = 'word1 word2 word3 word4'
parse var string w1 w2 w3 w4 .
say w1||w2||w3||w4

Here is the JCL I ran (1, 10, 100, 1,000, 10,000 and 100,000 executions of each)

//jobcard...
//TESTREXX PROC COUNT=,EXECNAME=
//TESTREXX EXEC PGM=IKJEFT01,PARM='TESTREXX &COUNT &EXECNAME'
//SYSEXEC DD DSN=OPSROZ.EXEC,DISP=SHR
//SYSTSPRT DD SYSOUT=*
//SYSTSIN DD DUMMY
// PEND
//P11 EXEC TESTREXX,COUNT=1,EXECNAME=PTEST1
//P110 EXEC TESTREXX,COUNT=10,EXECNAME=PTEST1
//P1100 EXEC TESTREXX,COUNT=100,EXECNAME=PTEST1
//P11000 EXEC TESTREXX,COUNT=1000,EXECNAME=PTEST1
//P110000 EXEC TESTREXX,COUNT=10000,EXECNAME=PTEST1
//P1100000 EXEC TESTREXX,COUNT=100000,EXECNAME=PTEST1
//P21 EXEC TESTREXX,COUNT=1,EXECNAME=PTEST2
//P210 EXEC TESTREXX,COUNT=10,EXECNAME=PTEST2
//P2100 EXEC TESTREXX,COUNT=100,EXECNAME=PTEST2
//P21000 EXEC TESTREXX,COUNT=1000,EXECNAME=PTEST2
//P210000 EXEC TESTREXX,COUNT=10000,EXECNAME=PTEST2
//P2100000 EXEC TESTREXX,COUNT=100000,EXECNAME=PTEST2

Here are the results... On an idle z13 running z/OS 2.2

STEPNAME  PROCSTEP  PROGRAM   RC   EXCP    CPU   SRB   CLOCK   SERV
P11       TESTREXX  IKJEFT01  00     16    .01   .00     .08    487
P110      TESTREXX  IKJEFT01  00     44    .01   .00     .05    613
P1100     TESTREXX  IKJEFT01  00    314    .03   .00     .27   2146
P11000    TESTREXX  IKJEFT01  00   3014    .17   .01    2.60  14404
P110000   TESTREXX  IKJEFT01  00  30014   1.77   .15   27.02   147K
P1100000  TESTREXX  IKJEFT01  00   300K  17.70  1.47  270.80  1466K
P21       TESTREXX  IKJEFT01  00     17    .01   .00     .04    378
P210      TESTREXX  IKJEFT01  00     44    .01   .00     .06    515
P2100     TESTREXX  IKJEFT01  00    314    .02   .00     .33   1743
P21000    TESTREXX  IKJEFT01  00   3014    .17   .01    2.86  14455
P210000   TESTREXX  IKJEFT01  00  30014   1.63   .14   27.75   136K
P2100000  TESTREXX  IKJEFT01  00   300K  16.42  1.39  254.28  1373K

REXX test harness output

Elapsed: 0.003071 CPU: 0          1 PTEST1
READY
END
Elapsed: 0.028208 CPU: 0         10 PTEST1
READY
END
Elapsed: 0.246308 CPU: 0.02        100 PTEST1
READY
END
Elapsed: 2.575580 CPU: 0.17       1000 PTEST1
READY
END
Elapsed: 26.993276 CPU: 1.87      10000 PTEST1
READY
END
Elapsed: 270.745555 CPU: 18.69     100000 PTEST1
READY
END
Elapsed: 0.003414 CPU: 0          1 PTEST2
READY
END
Elapsed: 0.029568 CPU: 0         10 PTEST2
READY
END
Elapsed: 0.299840 CPU: 0.01        100 PTEST2
READY
END
Elapsed: 2.823942 CPU: 0.17       1000 PTEST2
READY
END
Elapsed: 27.718376 CPU: 1.71      10000 PTEST2
READY
END
Elapsed: 254.228957 CPU: 17.36     100000 PTEST2
READY
END

IMHO for things traditionally done in REXX, it doesn't seem to make a whole lot of difference. CPU time is identical at 1,000 loops and starts to diverge beyond that (with the nod to PARSE), elapsed time stays close up through 10,000 loops (with the nod to SUBSTR), and somewhere between 10,000 and 100,000 loops PARSE finally edges out SUBSTR on both. This is kind of like the comment style debate a few years ago... Programming style, readability and future maintenance concerns should probably win out in this case as well. If you are running on an old P390, it might matter. On anything from this decade (or century), it probably doesn't.

I probably won't change my programming style with this knowledge. I have been using PARSE so long that I instinctively think of using PARSE when I need to parse something. Having said that, I suspect when I wrote CHGSTR (20? years ago), I must have stumbled on the recursion issue and decided it was easier to avoid it using POS, INSERT and DELSTR than with PARSE.

Here is my original CHGSTR with some tightening suggested by Walter.

chgstr: procedure
  if arg() <> 3 then
    return 'chgstr: missing args, must have string, new and old'
  parse arg string, new, old
  lnew = length(new)
  lold = length(old)
  x = 1                             /* position to resume searching from */
  do forever
    if pos(old,string,x) = 0 then return string
    x = pos(old,string,x)
    /* delete OLD at position x, then insert NEW in its place */
    string = insert(new,delstr(string,x,lold),x-1,lnew)
    x = x + lnew                    /* skip past the text just inserted  */
  end

Here is the PARSE version susceptible to recursion

chgstr: procedure
  if arg() <> 3 then
    return 'CHGSTR: missing args, must have STRING, NEW and OLD'
  parse arg string, new, old
  /* if NEW contains OLD, the loop below would never terminate */
  if pos(old,new) > 0 then
    return 'CHGSTR: recursion, OLD can''t be a substring of NEW'
  do while pos(old,string) > 0
    /* split at the first occurrence of OLD, searching from the start */
    parse value string with before (old) after
    string = before||new||after
  end
  return string
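
One way a PARSE-based replace could avoid re-scanning text it has just inserted is to build the result left to right and search only the unprocessed remainder. The following is a minimal sketch under that assumption, not the code from either post; the name chgstr2 is hypothetical, and it keeps the (string, new, old) argument order used above.

```rexx
/* REXX sketch: PARSE-based replace that never re-scans inserted text. */
/* chgstr2 is a hypothetical name used only for this illustration.     */
say chgstr2('a-b-c', '--', '-')          /* displays a--b--c */
exit

chgstr2: procedure
  parse arg string, new, old
  out = ''                               /* finished part of the result    */
  do while pos(old, string) > 0
    /* BEFORE gets the text ahead of the first OLD; STRING keeps the rest */
    parse var string before (old) string
    out = out || before || new
  end
  return out || string                   /* append the unmatched tail      */
```

Because NEW is appended to out and never searched again, OLD may be a substring of NEW without the loop running forever.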


My two cents,

Rob

Gerard Schildberger
2016-10-13 18:46:15 UTC
Post by Robert Zenuk
----- snipped -----
Post by Robert Zenuk
My two cents,
Rob
What you are largely measuring is the construction of the
"string" REXX variable, the concatenation of four REXX
variables, and the displaying of them (via the SAY instruction).

(Alas, I no longer have easy access to a TSO system.)
I simplified the whole she-bang and came up with:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
/*REXX program to test speed of SUBSTR versus PARSE. */
do 3
  call subs 1000
  call subs 10000
  call subs 100000
  call subs 1000000
  call subs 10000000
  say
  call pars 1000
  call pars 10000
  call pars 100000
  call pars 1000000
  call pars 10000000
  say '................................................'
end
exit
/*-----------------------------------------------------------------*/
subs: parse arg cnt;  string= 'word1 word2 word3 word4'
  call time 'Reset'
  do cnt
    w1=substr(string,  1, 5)
    w2=substr(string,  7, 5)
    w3=substr(string, 13, 5)
    w4=substr(string, 19, 5)
  end
  say right(cnt,9) 'SUBSTRs took' format(time('E'),,2) "seconds."
  return
/*-----------------------------------------------------------------*/
pars: parse arg cnt;  string= 'word1 word2 word3 word4'
  call time 'Reset'
  do cnt
    parse var string w1 w2 w3 w4
  end
  say right(cnt,9) 'PARSEs took' format(time('E'),,2) "seconds."
  return
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@


... and the output for the (above) REXX program is:

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$
     1000 SUBSTRs took 0.00 seconds.
    10000 SUBSTRs took 0.02 seconds.
   100000 SUBSTRs took 0.09 seconds.
  1000000 SUBSTRs took 0.83 seconds.
 10000000 SUBSTRs took 9.24 seconds.

     1000 PARSEs took 0.00 seconds.
    10000 PARSEs took 0.00 seconds.
   100000 PARSEs took 0.02 seconds.
  1000000 PARSEs took 0.22 seconds.
 10000000 PARSEs took 2.15 seconds.
................................................
     1000 SUBSTRs took 0.02 seconds.
    10000 SUBSTRs took 0.00 seconds.
   100000 SUBSTRs took 0.09 seconds.
  1000000 SUBSTRs took 0.83 seconds.
 10000000 SUBSTRs took 9.17 seconds.

     1000 PARSEs took 0.00 seconds.
    10000 PARSEs took 0.00 seconds.
   100000 PARSEs took 0.03 seconds.
  1000000 PARSEs took 0.22 seconds.
 10000000 PARSEs took 2.14 seconds.
................................................
     1000 SUBSTRs took 0.00 seconds.
    10000 SUBSTRs took 0.00 seconds.
   100000 SUBSTRs took 0.09 seconds.
  1000000 SUBSTRs took 0.83 seconds.
 10000000 SUBSTRs took 9.19 seconds.

     1000 PARSEs took 0.00 seconds.
    10000 PARSEs took 0.00 seconds.
   100000 PARSEs took 0.02 seconds.
  1000000 PARSEs took 0.22 seconds.
 10000000 PARSEs took 2.15 seconds.
................................................

$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$


Note that this version uses the elapsed time, not the CPU
time, albeit on a quiet PC (using Regina 3.9.1 running
under Windows 7). Note that the SUBSTR is about 4.3
times slower (of course, this is only for a particular
REXX interpreter, your mileage may vary depending on
road conditions and heaviness of foot).
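
As a quick check of that ratio, the 10,000,000-iteration timings from the three runs listed above can be summed and divided. This is only arithmetic on the posted numbers, and the variable names are chosen for this sketch.

```rexx
/*REXX sketch: SUBSTR/PARSE elapsed-time ratio at 10,000,000 iterations. */
subs = 9.24 + 9.17 + 9.19            /* three SUBSTR runs, in seconds */
pars = 2.15 + 2.14 + 2.15            /* three PARSE  runs, in seconds */
say 'SUBSTR/PARSE ratio:' format(subs / pars, , 2)     /* displays 4.29 */
```

The first run alone gives 9.24 / 2.15, which also rounds to about 4.3.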

It should also be noted that the "extra" period on the
(original) PARSE statement also costs CPU time (not
much, but it is measurable).
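
A minimal sketch of how the cost of the trailing period might be measured, in the same style as the program above. The iteration count is an assumption, and no timings are claimed here since they depend on the interpreter and the machine. With exactly four words in the string, both templates assign the same values to w1 through w4.

```rexx
/*REXX sketch: time PARSE with and without the trailing period. */
string = 'word1 word2 word3 word4'
cnt    = 1000000                     /* assumed iteration count */

call time 'Reset'
do cnt
  parse var string w1 w2 w3 w4       /* no placeholder          */
end
say right(cnt,9) 'PARSEs without period took' format(time('E'),,2) "seconds."

call time 'Reset'
do cnt
  parse var string w1 w2 w3 w4 .     /* period discards any remaining words */
end
say right(cnt,9) 'PARSEs with period took' format(time('E'),,2) "seconds."
```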

_________________________________________ Gerard Schildberger
