A Global Pharma Company Achieved 82% Accuracy in Sales Forecasting Despite Data Discrepancies


Business Case

A global pharmaceutical company was looking for a robust forecasting model capable of analyzing vast historical sales data and accurately predicting the demand for various Active Pharmaceutical Ingredients (APIs). These predictions were crucial in improving production scheduling, optimizing resource allocation, and reducing inventory costs.

Forecast precision depends heavily on the accuracy of the underlying data. However, the client’s historical sales data lacked geographic detail, which made it difficult to pinpoint why sales spiked or dipped on specific days in specific locations.

The client was looking for an expert data science team to resolve the inaccuracies in the data, build a robust forecasting model, and train it to predict future sales trends accurately.

Our Solution

Before building a forecasting model, the team had to resolve the data discrepancies to ensure accurate predictions. The data scientists at i2e first took on the task of geographic definition: they cross-referenced the historical sales data with the annual calendars of various regions to understand the reasons behind the variations in sales worldwide. The team used AWS SageMaker to carry out this geolocation cross-referencing.
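The cross-referencing step can be pictured with a small sketch. It assumes each sales record carries a region code and a date, and that regional calendars are available as simple lookup tables. The field names, the `REGIONAL_EVENTS` table, the `annotate_sales` helper, and the sample figures are all hypothetical and only illustrate the idea; this is not the team's actual SageMaker pipeline.

```python
from datetime import date

# Hypothetical regional calendars: region code -> {date: event name}.
# A real project would load these from a maintained calendar source.
REGIONAL_EVENTS = {
    "IN": {date(2023, 11, 12): "Diwali"},
    "US": {date(2023, 11, 23): "Thanksgiving"},
}


def annotate_sales(records, calendars):
    """Return copies of the records with an 'event' field.

    'event' holds the regional event that falls on the sale date,
    or None when the calendar has no entry for that region and day.
    """
    annotated = []
    for rec in records:
        event = calendars.get(rec["region"], {}).get(rec["day"])
        annotated.append({**rec, "event": event})
    return annotated


# Illustrative sales rows (made-up numbers).
sales = [
    {"region": "IN", "day": date(2023, 11, 12), "units": 40},
    {"region": "IN", "day": date(2023, 11, 13), "units": 210},
    {"region": "US", "day": date(2023, 11, 23), "units": 55},
]

annotated = annotate_sales(sales, REGIONAL_EVENTS)
for row in annotated:
    print(row["region"], row["day"].isoformat(), row["units"], row["event"])
```

Once every record is tagged this way, a dip or spike on a given day can be checked against a regional event before it is treated as a data error.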

Once the data discrepancies were cleaned up, the team ran the Augmented Dickey-Fuller (ADF) and Kwiatkowski-Phillips-Schmidt-Shin (KPSS) tests to determine the stationarity of the data, which also served as a second check on the data after cleaning. The team built several models, starting from a simple Moving Average and moving on to ARIMA, SARIMA, and SARIMAX. The team also used pmdarima (formerly Pyramid ARIMA) and FbProphet, along with LSTM, to train on the data and predict future sales trends for the various API compounds. Finally, the team used deep learning models to determine the upper and lower limits of the forecasts, attaining a prediction accuracy of 82%.
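As a rough illustration of the simplest model in that list, the sketch below produces one-step-ahead moving-average forecasts, scores them, and derives upper and lower limits. Several things here are assumptions rather than details from the project: the case study does not say how accuracy was measured, so accuracy is taken as 100% minus the mean absolute percentage error (MAPE); the limits come from the spread of past residuals instead of a deep learning model; and the series is made-up data.

```python
from statistics import mean, stdev


def moving_average_forecasts(series, window):
    """One-step-ahead forecasts: each point is predicted as the mean
    of the `window` observations immediately before it."""
    return [mean(series[i - window:i]) for i in range(window, len(series))]


def accuracy_pct(actuals, forecasts):
    """Accuracy as 100 - MAPE (an assumed definition).
    Requires every actual value to be non-zero."""
    ape = [abs(a - f) / abs(a) for a, f in zip(actuals, forecasts)]
    return 100.0 * (1.0 - mean(ape))


def forecast_limits(forecast, residuals, z=1.96):
    """Lower and upper limits as forecast +/- z * residual std dev.
    This assumes roughly normal, stable forecast errors."""
    spread = z * stdev(residuals)
    return forecast - spread, forecast + spread


# Made-up monthly demand for one compound.
monthly_units = [100, 110, 105, 120, 115, 130, 125, 140]
window = 3

forecasts = moving_average_forecasts(monthly_units, window)
actuals = monthly_units[window:]
residuals = [a - f for a, f in zip(actuals, forecasts)]

# Forecast for the next, unseen month plus its limits.
next_forecast = mean(monthly_units[-window:])
lower, upper = forecast_limits(next_forecast, residuals)

print(f"accuracy: {accuracy_pct(actuals, forecasts):.1f}%")
print(f"next month: {next_forecast:.1f} (limits {lower:.1f} to {upper:.1f})")
```

A baseline like this gives the more elaborate models (ARIMA, SARIMAX, LSTM) a number to beat under the same accuracy definition.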

Challenges Overcome

  • Addressing the inaccuracies within the data was challenging: it required studying the data closely and cross-referencing it with regional calendars worldwide.
  • Selecting and training an ML model that could reach 82% prediction accuracy required multiple iterations.


Results

  • Data discrepancies were corrected, making the data suitable for accurate predictions.
  • An ML model was trained to predict sales trends for the upcoming years.
  • Accurate sales forecasts supported effective production planning.