International Prostate Symptom Score (IPSS) Overestimates the Treatment Efficacy
---Evaluation of IPSS by 24-hour Uroflowmetry---



AKIRA KIMURA1), SHIGEHARU KURIMOTO1), YOSHIO HOSAKA1), TADAICHI KITAMURA1), and SHOHEI NAKAMURA2)


Department of Urology, Branch Hospital, Faculty of Medicine, the
University of Tokyo1)
Department of Urology, Jichi Medical School2)


running head : Evaluation of IPSS by 24-hour Uroflowmetry


Key words Benign prostatic hyperplasia, IPSS, uroflowmetry, 24-hour uroflowmetry






Abstract


The reliability of changes in IPSS following TURP was evaluated utilizing 24-hour uroflowmetry. In 11 patients with BPH who underwent TURP, all flow curves and urination data during hospitalization were recorded. Objective scores for frequency and nocturia were obtained from the time recorded. An objective score for intermittency was calculated from the uroflow curves. An objective change in the scores for weak stream was calculated from the distribution of peak flow rates before and after TURP for each patient. An objective change in the scores for hesitancy was calculated from the distribution of hesitation times. The changes of symptom scores for frequency, nocturia, intermittency, weak stream, and hesitancy were -0.82, -0.6, -1.5, -2.64, and -1.45 respectively, while changes in the corresponding objective scores were +0.27, +0.64, -1, -2.4, and -0.73, respectively. There was a tendency to overestimate efficacy if IPSS was used to evaluate the treatment.






Introduction

To compare new and old technologies for the treatment of BPH, reliable means of assessing BPH symptoms are necessary. For this purpose, a symptom index for BPH was developed and validated by a multidisciplinary measurement committee of the American Urological Association (AUA)1). The final AUA symptom index includes 7 questions covering frequency, nocturia, intermittency, weak urinary stream, hesitancy, incomplete emptying, and urgency. Each question can be answered on a scale of 0 (symptom never present) to 5 (symptom always present) and the total score, therefore, ranges from 0 to 35 points. The AUA symptom index has been adopted by the World Health Organization International Consultation on BPH as an important part of the evaluation of men with prostatism2). The consultation labeled it the international prostate symptom score (IPSS). It has been translated into several languages including Japanese by the consultation.
IPSS is now often used to provide data for comparisons of treatment effectiveness3), because patients have no serious problems with understanding the questions. However, it has not been proved by objective means whether patients answer the questions correctly.
A system for recording 24-hour uroflowmetry was developed by Nakamura et al4). With the system, flow curves and urination data are recorded on an integrated circuit card, which each patient carries in his pocket. The card can record up to 14 days' worth of urination data. A personal computer then calculates and prints all the flow curves. Using data recorded by 24-hour uroflowmetry, the accuracy of IPSS was evaluated in patients with BPH who underwent TURP.


Patients and Methods

Between November 1994 and June 1995, 21 patients underwent TURP in the University of Tokyo Branch Hospital. Histological examination of the resected prostatic tissues revealed an adenocarcinoma in one patient. Nine patients had complete urinary retention preoperatively. These 10 patients were excluded from the evaluation. The remaining 11 patients with BPH underwent 24-hour uroflowmetry during hospitalization. The mean age of the 11 patients was 65 years (range 57-78).
For 24-hour uroflowmetry, flow curves and urination data were recorded on an integrated circuit card, which each patient carried in his pocket. Before urinating, the patient inserted the card into the system. Urination data were recorded automatically. A personal computer then calculated and printed all the flow curves including peak flow rate, voided volume, hesitation time, and date and time of urination (Fig.1).
The patients were admitted 1 to 10 days (mean 5 days) before operation. For each patient, 8 to 77 (mean 33) urinations were recorded preoperatively. The urethral catheters were left in place for 4 to 7 days (mean 6 days) after operation. Immediately after the catheter was removed, 24-hour uroflowmetry was resumed. The patients completed the IPSS 2 days after the removal of catheters. For each patient, 11 to 76 (mean 40) urinations were recorded postoperatively. Preoperative IPSS had been obtained at their last visit to our clinic before hospitalization.
From the date and time of urination recorded on the card, the objective score for nocturia was calculated. The number of urinations between 21:00 and 6:00 was taken as the objective score; when the number was more than 5, the nocturia score was 5. For example, the patient whose 24-hour uroflowmetry is shown in Fig.1 urinated 3 times between 21:00 on the 11th and 6:00 on the 12th. Therefore, his objective score for nocturia was 3.
The objective score for frequency was also calculated from the date and time of urination recorded. The number of urinations at intervals of less than 2 hours was counted, and the ratio of this number to the total number of urinations was calculated. If the ratio was 0, the objective score for frequency was 0. If the ratio was less than 20%, the score was 1. If the ratio was less than half, the score was 2. If the ratio was about half, the score was 3. If the ratio was more than half, the score was 4. If the ratio was almost 100%, the score was 5. In the case of Fig.1, 10 of 19 urinations (the first urination was excluded) were at intervals of less than 2 hours. Therefore, his objective score for frequency was 3.
From the flow curves, the number of urinations in which the curve was interrupted was counted. The ratio of the number of interrupted urinations to the total number was translated into the objective score as described above. In the case of Fig.1, 10 of 20 urinations were interrupted. Therefore, his objective score for intermittency was 3.
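The scoring rules above can be sketched in Python. This is only an illustration, not the software used in the study: the function names are hypothetical, and the numeric cut-offs chosen for "about half" (45% to 55%) and "almost 100%" (90% or more) are assumptions, since the text gives these categories only in words.

```python
from datetime import datetime, timedelta


def ratio_to_score(count, total, about_half=(0.45, 0.55), almost_all=0.90):
    """Translate a ratio of affected urinations into a 0-5 score.

    The bounds for "about half" and "almost 100%" are assumed values;
    the text does not define them numerically.
    """
    if total == 0 or count == 0:
        return 0
    ratio = count / total
    if ratio < 0.20:
        return 1
    if ratio < about_half[0]:
        return 2          # less than half
    if ratio <= about_half[1]:
        return 3          # about half
    if ratio < almost_all:
        return 4          # more than half
    return 5              # almost always


def nocturia_score(times, night_start, night_end):
    """Count urinations within one night (21:00 to 6:00), capped at 5."""
    count = sum(night_start <= t <= night_end for t in times)
    return min(count, 5)


def frequency_score(times, limit=timedelta(hours=2)):
    """Score the share of urinations following the previous one within 2 hours.

    The first urination has no preceding interval and is excluded.
    """
    ordered = sorted(times)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    short = sum(gap < limit for gap in intervals)
    return ratio_to_score(short, len(intervals))


def intermittency_score(interrupted):
    """Score the share of urinations whose flow curve was interrupted."""
    return ratio_to_score(sum(interrupted), len(interrupted))


# Counts reported for the patient of Fig.1
print(ratio_to_score(10, 19))  # frequency: 10 of 19 intervals -> 3
print(ratio_to_score(10, 20))  # intermittency: 10 of 20 urinations -> 3
```

With these assumed cut-offs, the counts reported for the patient of Fig.1 reproduce the scores given in the text. Different cut-offs for the verbal categories would shift the borderline cases.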
To calculate an objective score for weak stream, it is necessary to calculate the ratio of the number of urinations with weak stream to the total number of urinations. Thus, it becomes necessary to define weak stream. However, there is no standardized normal limit of peak flow rates, because a peak flow rate varies with voided volume and age5). Though several nomograms are available to adjust the peak flow rate for varying age and voided volume6,7), they are different from each other and there is no accepted standard. Drach et al5) found a linear relationship between peak flow rate and voided volume for volumes greater than 150 ml, while Marshall et al8) found a linear relationship for less than 150 ml.
Because no standardized normal limit of peak flow rate exists, the objective score for weak stream cannot be calculated directly from the distribution of peak flow rates. However, a threshold of peak flow rate under which each patient felt urinary stream as weak can be defined from the distribution of peak flow rates and his symptom score. Figure 2a shows the preoperative distribution of peak flow rates of a patient who answered that he almost always had weak stream. Because he was not satisfied with the peak flow rate of 16 ml/sec, it is reasonable to assume that his threshold was above 16 ml/sec. Likewise, the threshold of the patient feeling the weak stream about half the time must be at the median of his distribution of peak flow rates. That of a patient feeling weak stream less than half the time must be in the lower quarter of his distribution of peak flow rates. That of a patient having weak stream less than 1 time in 5 must be in the lower tenth of his distribution of peak flow rates. In this way, the threshold of peak flow rate for weak stream was determined for each patient.
By applying the threshold thus calculated from the preoperative distribution of peak flow rates to the postoperative one, the postoperative objective score for weak stream was estimated. Figure 2b shows the postoperative distribution of peak flow rates of the same patient as in Fig.2a. By applying his threshold of 16 ml/sec, 3 of 10 peak flow rates fall below the threshold. Then, his postoperative objective score for weak stream should be 2 (less than half the time). However, he answered postoperatively that he had weak stream less than 1 time in 5.
Similarly, the threshold of hesitation time for hesitancy was obtained for each patient from the distribution of hesitation times and his preoperative symptom score. Figure 3a shows the preoperative distribution of hesitation times of a patient who answered he had hesitation about half the time. Because it is reasonable to assume that his threshold was at the median of his distribution of hesitation times, his threshold was 20 seconds.
By applying the threshold thus calculated from the preoperative distribution of hesitation times to the postoperative one, the postoperative objective score for hesitancy was estimated. Figure 3b shows the postoperative distribution of hesitation times of the same patient as in Fig. 3a. By applying his threshold of 20 seconds, 5 hesitation times among 20 are above the threshold. Then, his postoperative objective score for hesitancy should be 2 (less than half the time). However, he answered postoperatively that he had no hesitation at all.
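The threshold method for weak stream and hesitancy can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' program. All names are hypothetical. The share of urinations assigned to a symptom score of 4 (75%) is an assumption, because the text describes only scores 1, 2, 3, and 5. Counting values exactly equal to the threshold as symptomatic is also an assumption, as are the cut-offs for "about half" and "almost 100%". The sample values are invented for illustration and are not patient data.

```python
# Assumed percentage of urinations felt as symptomatic for each
# preoperative symptom score. The text describes 1 (lower tenth),
# 2 (lower quarter), 3 (median) and 5 (all); 4 is an assumption.
PERCEIVED_PERCENT = {1: 10, 2: 25, 3: 50, 4: 75, 5: 100}


def ratio_to_score(count, total):
    """Translate a ratio into a 0-5 score (cut-offs 45-55% and 90% assumed)."""
    if total == 0 or count == 0:
        return 0
    ratio = count / total
    if ratio < 0.20:
        return 1
    if ratio < 0.45:
        return 2
    if ratio <= 0.55:
        return 3
    if ratio < 0.90:
        return 4
    return 5


def personal_threshold(preop_values, symptom_score, symptom_when="low"):
    """Estimate the value at which the patient perceives the symptom.

    symptom_when="low":  low values are symptomatic (peak flow rate).
    symptom_when="high": high values are symptomatic (hesitation time).
    """
    if not preop_values:
        raise ValueError("no preoperative recordings")
    if symptom_score == 0:
        return None  # the patient never perceived the symptom
    # Put the most symptomatic values first.
    ordered = sorted(preop_values, reverse=(symptom_when == "high"))
    percent = PERCEIVED_PERCENT[symptom_score]
    k = max(1, (percent * len(ordered) + 99) // 100)  # integer ceiling
    return ordered[k - 1]


def postop_objective_score(postop_values, threshold, symptom_when="low"):
    """Apply the preoperative threshold to the postoperative recordings."""
    if threshold is None:
        return 0
    if symptom_when == "low":
        hits = sum(v <= threshold for v in postop_values)
    else:
        hits = sum(v >= threshold for v in postop_values)
    return ratio_to_score(hits, len(postop_values))


# Invented peak flow rates (ml/sec); preoperative symptom score 5
pre_flow = [6, 8, 9, 10, 11, 12, 13, 14, 15, 16]
post_flow = [12, 14, 15, 18, 20, 21, 22, 24, 25, 27]
flow_limit = personal_threshold(pre_flow, 5, "low")
print(flow_limit, postop_objective_score(post_flow, flow_limit, "low"))
```

In this invented example the threshold is the highest preoperative peak flow rate, and 3 of 10 postoperative values fall at or below it, which yields a score of 2. Calling the same functions with `symptom_when="high"` handles hesitation times, where long values rather than low values are symptomatic.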
Concerning the questions about frequency, nocturia, and intermittency, the coincidence between symptom score and objective score was evaluated both for preoperative and for postoperative scores. Whether the changes in scores were significant was also evaluated both for symptom score and objective score.
Concerning the questions about weak stream and hesitancy, the coincidence between postoperative symptom score and postoperative objective score was evaluated. Whether the change in the symptom score was significant and whether the difference between the preoperative symptom score and the postoperative objective score was significant were also evaluated. Because the thresholds for weak stream and hesitancy were defined from the preoperative symptom scores, the preoperative objective scores for weak stream and hesitancy based on these thresholds should be the same as the preoperative symptom scores.
The sum of the 5 scores was calculated for both symptom and objective scores and for both preoperative and postoperative scores. The change of the sum of symptom scores was compared with that of objective scores in each patient.
Statistical analyses were done with Student's paired t-test.
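For readers unfamiliar with the paired t-test, the statistic can be computed as in the sketch below. The function name is hypothetical and the input values are invented for illustration; they are not the scores of the patients in this study. The p value would then be read from the t distribution with the returned degrees of freedom.

```python
import math


def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom.

    Each position in `before` and `after` belongs to the same patient.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean = sum(diffs) / n
    # Sample variance of the within-patient differences
    variance = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if variance == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean / math.sqrt(variance / n)
    return t, n - 1


# Invented scores for 4 hypothetical patients
t, df = paired_t([3, 4, 2, 5], [1, 2, 2, 3])
print(t, df)  # -3.0 3
```

A negative t indicates that the scores decreased after treatment.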



Results

Table 1 lists the changes in the symptom scores and the objective scores for frequency, intermittency, and nocturia. Table 2 lists the mean and standard deviation of the scores and the results of Student's paired t-test. The preoperative symptom score for frequency was slightly higher than the objective score (not significant), while the postoperative symptom score was lower than the corresponding objective score (p<0.01). The symptom score was decreased after TURP (p<0.05), while the objective score was slightly increased postoperatively (not significant).
The preoperative symptom score for intermittency was slightly lower than the objective score (not significant), while the postoperative symptom score was significantly lower than the corresponding objective score (p<0.05). The symptom score was decreased from 2.36 to 0.91 after TURP (p<0.05), while the objective score was decreased from 2.82 to 1.82 (p<0.05).
The preoperative symptom score for nocturia was slightly higher than the objective score (not significant), while the postoperative symptom score was lower than the corresponding objective score (not significant). The symptom score was significantly decreased after TURP (p<0.001), while the objective score was increased postoperatively (not significant).
Table 3 lists the threshold of the peak flow rate for weak stream and that of hesitation time for hesitancy. It was remarkable that the thresholds differed from patient to patient.
Table 4 lists the changes in symptom scores for weak stream and hesitancy after TURP, together with the postoperative objective scores. Table 5 gives the mean and standard deviation of the scores and the results of Student's paired t-test. The postoperative symptom score for weak stream was lower than the postoperative objective score (p<0.05); the symptom score decreased from 3.45 to 0.82 after operation (p<0.001) while the objective score decreased from 3.45 to 1.09 (p<0.001).
The postoperative symptom score for hesitancy was slightly lower than the postoperative objective score (not significant); the symptom score decreased from 2.27 to 0.82 after operation (p<0.05) while the objective score decreased from 2.27 to 1.55 (p<0.05).
The relationships between the sum of 5 symptom scores and that of 5 objective scores are shown in Fig. 4. Both preoperative and postoperative scores were plotted on the graph. Linked points are of the same case and arrows indicate the change following operation. The graph demonstrates that the objective scores were not decreased as expected from the change in symptom scores. The sum of preoperative symptom scores was slightly higher than that of objective scores (not significant), while the sum of postoperative symptom scores was significantly lower than that of objective scores (p<0.01). The sum of symptom scores was decreased from 13.27 to 6.27 after TURP (p<0.001), while that of objective scores was decreased from 12.73 to 9.55 (p<0.01).



Discussion

IPSS includes 7 questions covering frequency, nocturia, intermittency, weak urinary stream, hesitancy, incomplete emptying, and urgency. To avoid redundancy, only those questions that had little correlation with one another but had close correlation with the quality of life were selected from among the 15 initial questions1). To make it easy to answer, questions which were confusing or difficult to answer were changed so that the response frame became clear enough. Six questions, except for hesitancy, were selected under this policy. The question about hesitancy was added afterwards because "hesitancy is generally regarded as a fairly classical symptom of BPH". The question on hesitancy, however, added little independent information to the set of the other 6 questions.
Because patients have no serious problems in understanding the questions, IPSS is widely used not only in the evaluation of treatment efficacy but also for the diagnosis of BPH. However, as its use becomes more widespread, its limitations become apparent. Barry et al9) reported that within-patient variability in 159 BPH patients ranged from +14 to -10 points. Yalla et al10) reported that the symptom score was not significantly correlated with the severity of obstruction.
It has not been validated by objective means whether patients answer the questions correctly. By using data recorded by 24-hour uroflowmetry4), the accuracy of IPSS was evaluated in patients who underwent TURP.
The preoperative symptom scores for frequency and nocturia were higher than the objective scores, while the postoperative symptom scores were lower than the objective scores. As a result, the symptom scores were decreased after operation, though the objective scores were rather higher postoperatively. The changes in these scores did not reflect the actual changes in urinary condition.
Both preoperative and postoperative symptom scores for intermittency were lower than the objective scores. To calculate the objective score, we defined any urination in which the flow curve was discontinued as interrupted, even when the voided volume after interruption was very small. Some patients did not perceive these interrupted urinations as being stopped and restarted. Because the change in the symptom score was of the same degree as that in the objective score, this symptom score seemed reliable in evaluating treatment efficacy.
Postoperative symptom scores for weak stream were lower than postoperative objective scores in 3 cases. As a result, the degree of the improvement of symptom scores was greater than that estimated objectively in these 3 cases.
Postoperative symptom scores for hesitancy were lower than postoperative objective scores in 6 cases and higher in only 1 case. The symptom scores for hesitancy also tended to overestimate the degree of improvement.
It was demonstrated that 5 questions from IPSS, which could be evaluated by 24-hour uroflowmetry, overestimated the treatment efficacy. The actual improvement of urination by operation was less than expected from IPSS.
Because 24-hour uroflowmetry requires that the patient be hospitalized4), postoperative data were obtained within a few days just after the urethral catheter was removed. At this time, frequency and nocturia are generally exaggerated because of bladder irritability, though the urinary stream becomes strong just after operation. Originally, IPSS was designed to ask about symptoms averaged over the past month or so1). Therefore, IPSS may be more appropriately used for evaluating treatment efficacy 3 or 6 months after the operation. Consequently, the conclusions of this study about the usefulness of IPSS in the evaluation of treatment efficacy are limited to the early postoperative period.
However, it is noteworthy that patients answered that both frequency and nocturia were improved even though both were actually worsened. Only one patient (SK), whose peak flow rate was not improved by operation, answered that frequency was not improved either. Patients seemed to answer every question favorably when they were satisfied with a strong urinary stream.
Golomb et al11) reported a device for home uroflowmetry. By using such a device, 24-hour uroflowmetry 3 to 6 months after TURP will become available. Frequency will probably show real improvement 3 to 6 months after operation. The difference between symptom scores and objective scores for frequency and nocturia might be much smaller than that obtained in this study. Twenty-four-hour uroflowmetry 3 to 6 months after the operation will be useful to determine whether IPSS reflects the actual clinical results. The analysis of 24-hour uroflowmetry in the late stage may provide useful data in developing reliable symptom indexes for evaluation of postoperative outcomes, which patients can answer correctly. With the improvement of frequency, voided volume will also be increased 3 to 6 months after operation. Because the peak flow rate depends on voided volume5,6,8), responses to questions about frequency and weak stream may be correlated in the late stage. Whether all 7 questions are necessary in assessing BPH symptoms has not been thoroughly studied. To avoid redundancy, questions adding little independent information should be omitted. The analysis of 24-hour uroflowmetry in the late stage may provide useful data in developing the ideal symptom indexes, in which questions are independent of one another.

Acknowledgements The authors thank Dr. Kazuki Kawabe, Professor of Urology, University of Tokyo, for his comments on our paper.


References

1) Barry, M.J., Fowler, F.J., O'Leary, M.P., et al.: The American Urological Association symptom index for benign prostatic hyperplasia. J. Urol., 148: 1549-1557, 1992.
2) Cockett, A.T., Aso, Y., Denis, L., et al.: Recommendations of the International Consensus Committee. in Proceedings of the international consultation on benign prostatic hyperplasia, p.553-564, 1993.
3) Bdesha, A.S., Bunce, C.J., Snell, M.E., et al.: A sham controlled trial of transurethral microwave therapy with subsequent treatment of the control group. J. Urol., 152: 453-458, 1994.
4) Nakamura, S., Ishiyama, S., Kobayashi, Y., et al.: Automatic integrated circuit card system for recording 24-hour uroflowmetry. J. Urol., 150: 926-929, 1993.
5) Drach, G.W., Layton, T.N., Binard, W.J.: Male peak urinary flow rate: relationships to volume voided and age. J. Urol., 122: 210-214, 1979.
6) Haylen, B.T., Ashby, D., Sutherst, J.R., et al.: Maximum and average urine flow rates in normal male and female population - the Liverpool nomograms. Br. J. Urol., 64: 30-38, 1989.
7) Drach, G.W., Layton, T.N., Bottaccini, M.R.: A method of adjustment of male peak urinary flow rate for varying age and volume voided. J. Urol., 128: 960-962, 1982.
8) Marshall, V.R., Ryall, R.L., Austin, M.L., et al.: The use of urinary flow rates obtained from voided volumes less than 150 ml in the assessment of voiding ability. Br. J. Urol., 55: 28-33, 1983.
9) Barry, M.J., Gilman, C.J., O'Leary, M.P., et al.: Using repeated measurement of symptom score, uroflowmetry and prostate specific antigen in the clinical management of prostate disease. J. Urol., 153: 99-103, 1995.
10) Yalla, S.V., Sullivan, M.P., Lecamwasam, H.S., et al.: Correlation of American urological association symptom index with obstructive and nonobstructive prostatism. J. Urol., 153: 674-680, 1995.
11) Golomb, J., Lindner, A., Siegel, Y., et al.: Variability and circadian changes in home uroflowmetry in patients with benign prostatic hyperplasia compared to normal controls. J. Urol., 147: 1044-1047, 1992.




Legends to Illustrations


Fig.1. Twenty-four-hour uroflowmetry of a patient. Flow curves of 20 sequential urinations are printed. The full time scale is 90 seconds and full flow rate scale is 15 ml per second. The date and time of urination are listed above each flow curve. It is shown that he urinated 3 times (6th, 7th, and 8th records) between 21:00 on the 11th and 6:00 on the 12th. Therefore, his objective score for nocturia was 3. Ten urinations (3rd, 4th, 8th, 9th, 11th, 12th, 13th, 14th, 15th, and 20th records) among 19 urinations (the first urination was excluded) were at intervals of less than 2 hours. Therefore, his objective score for frequency was 3. Ten urinations (2nd, 3rd, 4th, 5th, 7th, 8th, 12th, 17th, 18th, and 19th records) among 20 urinations were interrupted. Therefore, his objective score for intermittency was 3.
Fig.2. (a) The preoperative distribution of peak flow rates of a patient who answered he almost always had weak stream. Because he was not satisfied with the peak flow rate of 16 ml/sec, it is reasonable to assume that his threshold was above 16 ml/sec. (b) The postoperative distribution of peak flow rates of the same patient. By applying his threshold of 16 ml/sec, 3 peak flow rates among 10 fall below the threshold. His postoperative objective score for weak stream became 2 (less than half the time).
Fig.3. (a) The preoperative distribution of hesitation times of a patient who answered he had hesitation about half the time. Because it is reasonable to assume that his threshold was at the median of his distribution of hesitation times, his threshold was 20 seconds. (b) The postoperative distribution of hesitation times of the same patient. By adopting his threshold of 20 seconds, 5 hesitation times among 20 are above the threshold. His postoperative objective score for hesitancy became 2 (less than half the time).
Fig.4. The relationship between the sum of 5 symptom scores (transverse axis) and that of 5 objective scores (longitudinal axis) in 11 patients. Arrows indicate the change of the scores in each patient. In most patients, the changes of the objective scores were smaller than those of symptom scores.

