improved importance sampling via iterated moment matching

Topi Paananen, Juho Piironen, Paul-Christian Bürkner and Aki Vehtari have recently arXived a work on constructing an adapted importance (sampling) distribution. The beginning is more a review than a new contribution, covering the earlier work by Vehtari, Gelman  and Gabri (2017): estimating the Pareto rate for the importance weight distribution helps in assessing whether or not this distribution allows for a (necessary) second moment. In case it does not (seem to), the authors propose an affine transform of the importance distribution, using the earlier sample to match the first two moments of the distribution. Or of the targeted function. Adaptation that is controlled by the same Pareto rate technique, as in the above picture (from the paper). Predicting a natural objection as to the poor performances of the earlier samples, the paper suggests to use robust estimators of these moments, for instance via Pareto smoothing. It also suggests using multiple importance sampling as a way to regularise and robustify the estimates. While I buy the argument of fitting the target moments to achieve a better fit of the importance sampling, I remain unclear as to why an affine transform would change the (poor) tail behaviour of the importance sampler. Hence why it would apply in full generality. An alternative could consist in finding appropriate Box-Cox transforms, although the difficulty would certainly increase with the dimension.