
Case Study

Synthetic Consumers in Hedonic Tests

 

How to enhance insights without compromising robustness

 

Synthetic data is emerging as a valuable tool in consumer research, offering new ways to enhance insights while maintaining data integrity. It is generated through machine learning (ML) models rather than natural language processing (NLP), which avoids the biased or stereotypical outputs that can stem from an LLM's training set. Synthetic data can be used to boost participation, impute missing values, and create synthetic twins.

 

Approach

The data synthesis process involves three key stages: data preparation, synthesis, and quality reporting. Synthetic data can be used in hedonic testing to uncover core patterns in consumer responses or to enrich datasets with additional information on respondents or products.
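As a rough illustration of the three stages named above, the sketch below walks through preparation, synthesis, and quality reporting on a handful of made-up liking scores. Everything in it is an assumption for illustration only: the function names, the 9-point hedonic scale, the toy scores, and in particular the per-product Gaussian sampler, which is a deliberately simple stand-in and not the ML model used in the study.

```python
import random
import statistics


def prepare(rows):
    """Stage 1 (preparation): group liking scores by product, dropping missing values."""
    by_product = {}
    for product, score in rows:
        if score is None:
            continue
        by_product.setdefault(product, []).append(score)
    return by_product


def synthesize(by_product, n_per_product, seed=0):
    """Stage 2 (synthesis): draw synthetic scores per product.

    Assumption: a Gaussian fitted per product, rounded and clipped to a
    1-9 scale. This is a toy stand-in for a real synthesis model.
    """
    rng = random.Random(seed)
    synthetic = {}
    for product, scores in by_product.items():
        mean = statistics.mean(scores)
        sd = statistics.stdev(scores) if len(scores) > 1 else 0.0
        draws = []
        for _ in range(n_per_product):
            value = round(rng.gauss(mean, sd))
            draws.append(min(9, max(1, value)))
        synthetic[product] = draws
    return synthetic


def quality_report(real, synthetic):
    """Stage 3 (quality reporting): compare real and synthetic mean liking per product."""
    report = {}
    for product in real:
        real_mean = statistics.mean(real[product])
        synth_mean = statistics.mean(synthetic[product])
        report[product] = {
            "real_mean": real_mean,
            "synthetic_mean": synth_mean,
            "abs_diff": abs(real_mean - synth_mean),
        }
    return report


# Made-up example data: (product, liking score); None marks a missing value.
rows = [("A", 7), ("A", 8), ("A", 6), ("A", None), ("B", 4), ("B", 5), ("B", 3)]
real = prepare(rows)
synthetic = synthesize(real, n_per_product=200)
report = quality_report(real, synthetic)
```

A report like this only checks product-level agreement. It would not reveal the per-respondent inconsistencies discussed under Outcome, which need respondent-level checks.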

 

Outcome

Findings show that synthetic data can offer robust results from a holistic perspective, with greater discrimination between products. However, inconsistencies in respondent behaviour within synthetic datasets were observed, including anomalies in clustering and penalty subgroup comparisons. These issues can lead to inaccurate conclusions if not carefully managed. Low discrimination in training data was identified as a key risk, potentially resulting in critical information loss. 

Our research reinforces that synthetic data is only as reliable as the real data used to train it.

Conclusion

Synthetic data has promising applications in market research, particularly when used to complement real data. However, its use in sensory consumer research, where detailed per-respondent analysis is essential, requires careful consideration. We continue to advise clients on its appropriateness on a case-by-case basis, ensuring insights remain robust and relevant.

 

If you would like to see the presentation we delivered at Pangborn 2025, please get in touch.

 
