This forum is about wrong numbers in science, politics and the media. It respects good science and good English.
There is nothing unusual about the behaviour of the placebo group in the APPROVe study. It's about the second non-obvious thing you stumble across when you start working on time-to-event analyses in clinical data. The incidence per unit time always decreases with increasing time, precisely because events have occurred. They occur relatively more frequently in those at higher risk for the event, leaving the remaining, event-free population at lower overall risk. So fewer events occur per unit time at later timepoints. In APPROVe we see exactly this in the placebo set but not in the treated set.
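A minimal sketch of this depletion effect, assuming a purely hypothetical two-level risk mixture (the hazard values and population size are invented for illustration): each subject's individual event risk is constant over time, yet the observed event rate per subject-year still falls, because the high-risk subjects are preferentially removed.

```python
import random

random.seed(1)

# Hypothetical mixture: half the subjects at low risk, half at high risk.
# Individual hazards are constant over time; only the mix changes.
N = 100_000
hazards = [0.02] * (N // 2) + [0.30] * (N // 2)

yearly_rates = []
at_risk = hazards[:]
for year in (1, 2, 3):
    survivors = []
    events = 0
    for h in at_risk:
        if random.random() < h:
            events += 1              # event occurred: subject leaves the trial
        else:
            survivors.append(h)
    yearly_rates.append(events / len(at_risk))
    at_risk = survivors

# The event rate per subject-year falls year on year, even though no
# individual's risk has changed.
print(yearly_rates)
```

Run it and the yearly rates decline steadily, purely because the surviving population is progressively enriched for the low-risk subjects.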
For the same reason, (good) contraceptive efficacy trials are run over 12-18 months in fewer subjects rather than over 1-3 months in many subjects. The events of becoming with child due to unreliability, deceit, or being everyday-hard-of-thinking dramatically outweigh the treatment failures in the early stage of the trial, so you see little difference between treatments. Those who make it to ~6 months not pregnant are already a selected population, and as a result more likely to make it to 18 months not pregnant than all randomised subjects.
So you only get a good idea of relative treatment failure rates once (a substantial proportion of) those who were going to get pregnant anyway, whether by ineptitude or intent, have already done so and turned into notches on your Kaplan-Meier output. The same principle applies to almost any time-to-event data. Vide sigmoidal progression curves in cancer trials - the first few weeks of no change are an artefact of recruiting patients not quite on the brink of death, plus the time between starting treatment and the first scan.
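Those "notches" come out of the product-limit calculation. Here is a bare-bones sketch of a Kaplan-Meier estimate in Python (purely illustrative; it groups tied times but skips the refinements a real statistics package would include):

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times  : time on study for each subject
    events : True if the subject had the event, False if censored
    Returns a list of (time, estimated survival fraction) steps.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0                    # events / censorings at time t
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= d + c
    return curve

# Four subjects: events at months 1 and 3, censorings at 2 and 4.
steps = kaplan_meier([1, 2, 3, 4], [True, False, True, False])
```

Each event multiplies the running survival fraction by (1 - events / at-risk), which is exactly why the curve steps down by varying amounts as the at-risk population shrinks.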
Humans are humans and produce results accordingly, not according to a neat radioactive-decay-like half-life. Ironically, even there the number of events per unit time falls off as time elapses, albeit due to the reduction in population size rather than any change in the "risk profile" for a decay event.
The general equation for time-to-failure curves includes the variation of the rate of events with time. This is seared into my memory, because as a research student I thought I had developed it independently, only to discover that von Laue had included it in his 1925 paper. The exponential distribution (Poisson's traffic law) is the special case in which the event rate is independent of elapsed time. Are you suggesting that a selection from the general population of various ages start to experience CV events just because you have started a trial? Are you satisfied that 18 months is a reasonable time constant for this process? In which case, why not select your group then start the trial two years later? Do you really think that all but 2% of the sample are virtually immune to such events; or indeed that removal of that 2%, corrected or not, is dramatically going to flatten your curve? All this compared with a treatment group that behaves exactly as you would expect and with a plausible time constant.
It is the slope of the curve, or to be precise the slope of -ln[1-F(t)] versus t, that is the estimator of event rate.
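To make that concrete (an illustrative sketch, with a made-up rate): for the exponential special case F(t) = 1 - exp(-λt), a plot of -ln[1-F(t)] against t is a straight line of slope λ, so any curvature in that plot signals a rate that is changing with time. A Weibull-style cumulative hazard is one standard way to write in such time dependence; I use it here purely as an illustration, not as the general equation referred to above.

```python
import math

lam = 0.05                           # illustrative constant event rate

def cum_hazard(t):
    F = 1 - math.exp(-lam * t)       # exponential CDF
    return -math.log(1 - F)          # cumulative hazard -ln[1 - F(t)]

# The slope between successive unit time steps recovers the rate exactly:
slopes = [cum_hazard(t + 1) - cum_hazard(t) for t in range(5)]

# A Weibull cumulative hazard (t/eta)**k illustrates a time-varying rate:
# k < 1 gives a falling event rate, k > 1 a rising one, and k = 1
# reduces to the exponential special case above.
def weibull_cum_hazard(t, k, eta):
    return (t / eta) ** k
```

In the constant-rate case every slope comes out at λ; departures from a straight line in real trial data are the signal being argued about here.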
I regret that I am unable to comment on your anecdotal examples, for lack of familiarity.
Starting the trial with patients an average of two years older would have no effect unless you selected the trial population for not having had any previous CV events. The population in this trial was >40 years old, with exclusion for CV events only in the year prior to trial start.
The reason for the near-universal observation of this shape in survival curves is the removal of patients from the trial upon the event occurring. In those at higher risk for an event, the event will tend to occur early on; subsequent events (if any) are not captured because the subject is off-trial. Therefore once you are some way into a trial, you are inevitably left with a population with a lower risk profile than when you started. This is independent of when the trial starts.
It's also not a population of various ages, but specifically a population at higher (and rather constant) risk of CV events compared to the general population (>40 years). I agree that the patients being older later in the trial contributes to an increase in risk, but over 3 years this is going to be marginal - and, most importantly, equal (stochastic effects considered) in the two groups.
The time constant seems reasonable to me, remember it will also be influenced by the length of time subjects spend on-trial. If you take it to an extreme you will have a sigmoid curve, with a rapid uptick towards the end. 3 years isn't long enough to get there for CV events. Again, the theoretical considerations are very much secondary to the data observed. It's also complicated by early termination as you rightly pointed out, meaning there is lower participation at later time points. Had the trial been completed you would have seen more events in both groups at later time points.
The treated group does not behave exactly as expected - the event curve is almost linear. This is the point - the behaviour of the placebo group is normal, that of the treated group is not.
Also remember this was an experiment, a randomised controlled clinical trial. Your entirely accurate observations on RR<2 and so on in epidemiological data-dredges do not apply here - CV risk was a predefined endpoint for this trial. It was specifically set up to look for this, so P<0.05 is, for all its laxity, appropriate given the experimental setting.
Mortality is a red herring; heart attacks and strokes all too often produce permanent, disabling consequences that we would rather avoid. Hence the premature termination. The first and foremost consideration in a clinical trial is the wellbeing of the subjects. Science takes a back seat to that, whether we like it or not.
It occurs to me that it is also worth debating whether the strength of evidence is sufficient to pull the drug from the market. I'd agree probably not, and as a scientist I would consider doing similar trials in a population selected for no prior CV events. There are probably other constellations you could look at - short-term treatment for example might give a more acceptable risk/benefit profile than "chemopreventive" treatment over years.
The fact is that once these data became public the drug became commercially unviable and going back to the drawing board with further trials was also a non-starter, partly for commercial reasons (the extra few years of ticking patent clock meant that even if the drug could be licensed again there was no prospect of recouping the additional development costs), and partly for ethical reasons.
Drug licensing involves both science and the real world. Once these data were out the drug was always going to fail at the public acceptance stage. It's not a unique way to go - after the TeGenero scandal, a potentially useful product never got developed. Several people were injured by dosing errors and failure to follow the protocol (and a failure to understand at the time that biologicals work differently in humans than in other species). It might do what was intended at much lower doses, but we will never know, because you will not ever find anyone prepared to take it at any dose. This is regrettable, but we have to live with reality.
The early part of an exponential (for time much less than the time constant) is linear. Bit of a coincidence, isn’t it, that the allegedly discrepant group continues exactly at the same event rate as both groups started at? I find your arguments in justification for the claim rather circular and in need of a trim from Occam’s razor.
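The linearity claim is easy to check numerically (the time constant here is chosen arbitrarily for illustration): for t well below the time constant τ, 1 - exp(-t/τ) ≈ t/τ, with a relative error of roughly t/2τ.

```python
import math

tau = 36.0                           # illustrative time constant, months

def exact(t):
    return 1 - math.exp(-t / tau)    # exponential cumulative incidence

def linear(t):
    return t / tau                   # early-time linear approximation

# The relative error grows roughly as t / (2 * tau): tiny while t << tau,
# so early-trial data cannot distinguish the two shapes.
errors = [(linear(t) - exact(t)) / exact(t) for t in (1, 3, 6)]
```

At one month the approximation is off by about a percent; even at six months the discrepancy stays under ten percent, which is the sense in which an early exponential "looks linear".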
I'm not convinced it's a coincidence; I've seen similar behaviour in similar data too many times. Curves that look like that (albeit with radically different numbers and duration of treatment) are used to claim efficacy for cancer treatments, so you can't just ignore them when they turn out to be bad news for a drug.
Again for emphasis, we are talking about data from human subjects here, not physics experiments. Whatever type of curve you might want to fit to the data, it is the data that matter not the theory behind your distribution of choice. The crucial thing that is going on here is the removal of subjects that have events from the population. Every time you do this you reduce the inherent risk of events in the remaining population (because those at inherently higher risk will on average have their event earlier in the trial). That is why you expect to see the event incidence per unit time tail off.
If we do indeed see that in one group and not in another, in a randomised, double-blind, placebo-controlled clinical trial powered to look for the effect and with the effect as a predefined endpoint, we are entitled to conclude (with all the usual caveats) that there is a treatment effect.
I'd be delighted for you to take Occam's razor to my arguments or point out where they are circular, I'm here to learn.
Since it's come up again, Vioxx was not seen off by junkists, it was voluntarily withdrawn from the market by the manufacturer. I believe it still has a valid marketing license in the USA, which means the FDA is all fine about it.
There was no statistical blunder. The behaviour of the placebo group is completely normal for a clinical trial setting. It comes down to whether you think the benefit of the product outweighs the risk. Even given these data, that trade-off might be positive for the customers, but especially in the USA, with its class-action suits and huge personal injury claims egged on by a drug-company-suing industry, the manufacturer did not think it worthwhile continuing to flog these pills.
Our bending author is spot on when it comes to most abuses of statistics, but this one really doesn't cut it. The criticisms rightly applied to post-hoc analysis of multiplicity-ridden data-dredges on retrospectively-reported exposures and unreliable outcome data do not apply to single predefined endpoints with predefined analyses on prospective empirical data. Even when it's the control group you think is odd.
You lost me in the first paragraph at the top of this page, particularly with its last sentence.
I have so often seen these great breakthroughs in the work of research students. I have always told them "You need to demonstrate that this is repeatable", which was, unfortunately, always the end of the matter. I conceived the Law of Experiments long before I started Number Watch. Only the second corollary appeared much later, but well before the Vioxx fiasco.
The attrition of the cohorts appears considerably greater than the number of incidents, which leads me to conjecture that the premature termination of the trial was motivated by the shortage of participants rather than for ethical reasons. I also note that the results were presented as percentages of the remaining cohorts rather than their actual numbers, which does not lend support to your explanation.
This discussion is a variant of the smoking/lung cancer discussion to me. The analysis is of the RR (HR, OR, whatever ratio you wish to use). The RR is inevitably a ratio of bad outcomes to the base population for each group. In my over-simplified model, it is the probability of rolling a six on one throw of a die, compared to the chance of rolling a six on a different die.
Continuously lost in the discussion is the survivability ratio, aka what is the chance in x rolls of the dice that I won't roll a six. Smokers continue to flout the huge risks of smoking, leading active, productive lives and surviving way longer than they are supposed to. The risks associated with Vioxx are microscopic compared to smoking.
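The survivability point is a one-liner (an ordinary fair die assumed): the chance of not rolling a six in x throws is (5/6)^x, and it is this compounding quantity, not the per-throw ratio, that tails off with exposure.

```python
def no_six(x):
    """Probability of rolling no six at all in x throws of a fair die."""
    return (5 / 6) ** x

# Per-throw risk is a modest 1/6, yet over ten throws the chance of
# escaping every six has already dropped to roughly one in six.
p10 = no_six(10)
```

This is why a per-event ratio and a long-run survival probability can tell very different stories about the same underlying risk.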
The smoking death toll requires great minds to extract from the data. Other great minds can come back and twist the result without too much difficulty, because it fiddles with a variable that isn't all that well defined. The unit of life is the period between birth and death. Rosebuds come to mind. There is quite possibly more science behind "carpe diem" than all the sophisticated analysis of epidemiology. Vioxx made it infinitely more likely for a person to seize more moments and enjoy them. Apologies in advance, but I will sneer heavily in the direction of anyone suggesting they can accurately measure enjoyment. A nun on her knees praying can be enjoying life every bit as much as Paris Hilton.
Imagine a collection of atoms of radioactive elements. Any and all - we have a range of half-lives, ranging from a few seconds to millennia.
We stick them in a box and watch them decay. At the beginning of the experiment the decay rate, the number of decays per unit time per number of remaining atoms, is higher than it is later in the experiment.
Your average group of patients being watched for some event is like this mixture of different nuclei, they are not a group of the same type of nuclei at constant risk of decay irrespective of time. The analogy is not perfect of course - in humans risk of bad stuff tends to rise with age, radioisotopes don't care how old they are. But in this human experiment we are looking at two groups with one systematic treatment difference, and all the other differences controlled for, as far as possible, through randomisation.
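The mixture analogy can be put in numbers (the decay constants are invented for illustration): with two species present, the aggregate decays per unit time per remaining atom fall steadily towards the rate of the longest-lived species, even though no individual atom's risk ever changes.

```python
import math

lams = (0.7, 0.07)       # illustrative decay constants: short- and long-lived
n0 = (1.0, 1.0)          # equal starting abundances

def per_atom_rate(t):
    """Aggregate decays per unit time per remaining atom at time t."""
    remaining = [n * math.exp(-lam * t) for n, lam in zip(n0, lams)]
    activity = sum(lam * r for lam, r in zip(lams, remaining))
    return activity / sum(remaining)

# As the short-lived species is depleted, the mixture's per-atom rate
# declines towards the long-lived species' constant 0.07.
rates = [per_atom_rate(t) for t in (0.0, 5.0, 10.0)]
```

The falling aggregate rate is purely a composition effect: exactly the depletion mechanism being claimed for the placebo group.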
Cohort attrition at later stages is because of the trial being stopped.
It's not ethical to stop a trial in a marketed product for lack of participation, not that there was any such lack anyway. It is ethical to stop it because the drug is harming people. The highest ethic in clinical trials is the good of the participants in that trial - the greater good of humanity takes a back seat (cf. Declaration of Helsinki). You might not like that scientifically, but that's the decision that has been reached and in many places legislated for.
Once you have this kind of result it's **** hard to find volunteers for a repeat of the experiment. Like I said, you are absolutely right with most of your criticisms of statistical abuse in the life sciences. However, as it's "life" and in this case human life, we have to live with a lower degree of certainty than the physicists.
Rutherford said that if you need statistics you need better experiments. Unfortunately, in drug research it is not ethically (or financially) possible to put any question of efficacy or safety beyond all doubt. Beyond reasonable doubt (even at p=0.05) can be hard enough, and the risk/benefit evaluation is by its nature not a simple formula.