This forum is about wrong numbers in science, politics and the media. It respects good science and good English.

Now you have lost me.

That is certainly not your fault.

I was trying to explain a concept and it was, I'm afraid, poorly done.

This free-to-view paper from the Lancet should give an idea of how the confidence interval for the meta-analysis relates to the confidence intervals for the individual contributing studies. The pooled result from the meta-analysis is still quoted with a 95% confidence interval.

The paper seems to be using a technique for combining results from the contributing studies called the Mantel-Haenszel method (it quotes some other methods as well), for which details are given on this webpage:

Meta-analysis can work fine if you are combining several similarly executed trials, which then gives you the statistical power to see effects that the smaller trials were not individually powered to detect.
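For anyone who wants to see the mechanics, here is a rough sketch of the Mantel-Haenszel pooled odds ratio mentioned above, next to the ordinary (Woolf) interval for each study on its own. The three 2x2 tables are invented purely for illustration and are not taken from any paper, the function names are my own, and the pooled interval uses the Robins-Breslow-Greenland variance estimate, which is my assumption about how such an interval would be computed rather than a statement of what any particular paper did.

```python
import math

def single_study_or(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-scale) confidence interval for one 2x2 table.
    a, b = exposed cases / exposed non-cases
    c, d = unexposed cases / unexposed non-cases
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, or_ * math.exp(-z * se), or_ * math.exp(z * se)

def mantel_haenszel_or(tables, z=1.96):
    """Mantel-Haenszel pooled odds ratio across several 2x2 tables, with a
    confidence interval from the Robins-Breslow-Greenland variance."""
    R = S = 0.0
    sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        r = a * d / n
        s = b * c / n
        p = (a + d) / n
        q = (b + c) / n
        R += r
        S += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = R / S
    var_log = (sum_pr / (2 * R * R)
               + sum_ps_qr / (2 * R * S)
               + sum_qs / (2 * S * S))
    se = math.sqrt(var_log)
    return or_mh, or_mh * math.exp(-z * se), or_mh * math.exp(z * se)

# Hypothetical studies, invented for this example only.
studies = [
    (15, 85, 10, 90),
    (30, 170, 20, 180),
    (12, 88, 9, 91),
]

for i, table in enumerate(studies, start=1):
    est, lo, hi = single_study_or(*table)
    print(f"study {i}: OR {est:.2f} (95% CI {lo:.2f} to {hi:.2f})")

pooled, pooled_lo, pooled_hi = mantel_haenszel_or(studies)
print(f"pooled : OR {pooled:.2f} (95% CI {pooled_lo:.2f} to {pooled_hi:.2f})")
```

The point to notice is that the pooled interval is still a 95% interval, but it is narrower than any of the individual ones because it draws on all the data at once. None of that helps if the individual studies are biased in the same direction.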

The problem with most of the epidemiological stuff is not the use of meta-analysis per se. It is that the statistical testing is applied to large numbers of post-hoc hypotheses, and the tests are really designed to look for effects of interventions. Since you can't ethically perform an interventional study with tobacco smoke (we know it's bad for you and the study has no prospect of benefitting the participants), you have to assign "treatment groups" on the basis of asking people about past, incidental exposure. This is not only notoriously inaccurate, it also introduces a range of biases you can't control for. One odd result off the top of my head: Dutch tea drinkers are more likely to smoke than Dutch non-tea drinkers. If the correlation is strong enough (or the study large enough) you could "demonstrate" that drinking tea causes heart disease in the Netherlands. Bigger studies also take disproportionate effort to do important things like age and sex matching of controls; importantly, failing to do this is likely to dilute real effects but potentiate stochastic effects.
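To make the tea example concrete, here is a small sketch with made-up counts (not Dutch data, or anybody's data). In this invented population, smoking raises the risk of heart disease from 5% to 20%, tea has no effect at all, but 70% of smokers drink tea against 40% of non-smokers.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table:
    a, b = tea drinkers with / without heart disease
    c, d = non-tea drinkers with / without heart disease
    """
    return (a * d) / (b * c)

# Hypothetical counts, chosen so that tea has no effect within either group.
smokers = (420, 1680, 180, 720)       # 20% disease risk, tea or no tea
non_smokers = (140, 2660, 210, 3990)  # 5% disease risk, tea or no tea

# What you see if you never ask about smoking: just add the tables together.
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

print(f"tea OR among smokers:     {odds_ratio(*smokers):.2f}")
print(f"tea OR among non-smokers: {odds_ratio(*non_smokers):.2f}")
print(f"tea OR ignoring smoking:  {odds_ratio(*crude):.2f}")
```

Within each smoking group the odds ratio for tea is exactly 1, yet the combined table shows tea drinkers at clearly higher odds of heart disease. Here the confounder is known and measured, so stratifying fixes it; the trouble with recall-based exposure data is the confounders you did not measure or measured badly.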

In the clinical world, statistically significant results (uncorrected for multiplicity) on anything other than your powered primary efficacy variable are considered interesting findings that might or might not be worthy of further investigation. At best, in efficacy, they are supportive of a claim. You couldn't usually base a claim strong enough to get a marketing license on them, for example. In the public health world, one P<0.05 among twenty risks (never benefits) tested for is considered adequate justification for draconian legislation.
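The arithmetic behind the "one in twenty" complaint is short enough to show. This sketch assumes the twenty tests are independent and that none of the risks is real, which is a simplification; the Bonferroni threshold is included only as the simplest example of a multiplicity correction.

```python
alpha = 0.05   # per-test significance level
n_tests = 20   # number of risks tested

# Chance that at least one test comes up "significant" by luck alone,
# assuming independent tests and no real effects.
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Average number of false positives across the twenty tests.
expected_false_positives = alpha * n_tests

# Bonferroni: per-test threshold that keeps the overall error rate near alpha.
bonferroni_threshold = alpha / n_tests

print(f"P(at least one false positive) = {p_at_least_one:.3f}")  # 0.642
print(f"expected false positives       = {expected_false_positives:.1f}")
print(f"Bonferroni per-test threshold  = {bonferroni_threshold:.4f}")
```

So with twenty uncorrected tests you would expect a "significant" result about two times in three even if nothing at all were going on.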