
Subscriptions in a new country: how to predict results without historical data?

How to forecast LTV in new markets without historical data? Discover practical approaches – from benchmarks and early behavioral signals to Machine Learning models in BigQuery ML ...

Grzegorz Kałucki, Data Analyst

18/08/2025

Introduction

Expanding into new markets is always an exciting moment for companies operating on a subscription model. New users, new growth potential, new opportunities to scale the business.

But there is one big “but”: the lack of historical data.
We don't know how customers in a new country will behave: what their retention rate will be, or whether they will renew their subscriptions or cancel after a month. And yet we want to know whether a campaign in Brazil, Cambodia, or Romania will pay off before we burn through our budget.

In this article, we show how to approach the problem of predicting LTV (Customer Lifetime Value) when local data is lacking. The example is based on a real-world implementation.

It is worth noting that the lack of historical data is a challenge not only for marketing departments, but also for analytical and management teams who have to make decisions about budget allocation and development strategies. In such situations, it is crucial to use available proxy information and modern data analysis methods that allow for reliable forecasts despite limitations.

Furthermore, effective LTV prediction in new markets requires continuous monitoring and updating of models as new data becomes available. This allows for gradual improvement in forecast accuracy and optimization of marketing and retention activities.

In the following sections, we will present specific methods and tools to help subscription companies address this challenge, as well as discuss the benefits and limitations of each approach.


Problem: new market, zero data

Let's assume that our subscription app has long been established in one main market, which generates most of its traffic, and is also present in several smaller markets, such as Poland, Germany, and Spain. We have a well-researched user profile for these countries: we know subscription renewal rates, LTV, retention, and other important long-term metrics, and we use proven analytical models.

Now we are planning to enter a new market, for example Brazil. We are running several test campaigns there to explore its potential for our business. The campaigns have been running for several days, so we start analyzing users from Brazil. But what do we see?

No historical data available.
No information about subscription renewals, because the first renewal period (e.g., one month) has not yet passed.
Uncertainty as to whether assumptions from markets we have already researched can be applied.

As a result, we do not know whether users from Brazil will return to us in the long term, which means we do not even have an approximate estimate of LTV. How can we deal with this challenge?

Solution 1: benchmarking based on similar markets

The first approach is to use benchmarks from similar markets. For example, if Brazil is a new market but we have data from Mexico, Chile, and Colombia, we can assume that users there behave similarly, and therefore that the metrics for Brazil will be close to those of these countries. This approach gives us initial estimates of LTV and retention quickly, which is particularly important when local historical data is lacking.

We create a comparison group, “Latin America”, and average the indicators across it. Drawing conclusions from a larger sample reduces the impact of individual anomalies.
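The averaging step can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the field names, the choice to weight each market by its number of users (rather than taking a plain mean), and all the numbers are hypothetical and do not come from the implementation described in this article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MarketStats:
    country: str
    users: int               # paying users observed in this market
    renewal_rate_m1: float   # share of subscribers who renewed after month 1
    ltv_12m: float           # average 12-month LTV per subscriber


def benchmark(markets: list[MarketStats]) -> dict[str, float]:
    """User-weighted average of each metric, plus the min/max LTV spread."""
    total_users = sum(m.users for m in markets)
    if total_users == 0:
        raise ValueError("benchmark group has no users")
    return {
        "renewal_rate_m1": sum(m.renewal_rate_m1 * m.users for m in markets) / total_users,
        "ltv_12m": sum(m.ltv_12m * m.users for m in markets) / total_users,
        # The spread is a reminder of how far single markets sit from the average.
        "ltv_12m_min": min(m.ltv_12m for m in markets),
        "ltv_12m_max": max(m.ltv_12m for m in markets),
    }


# Illustrative numbers only, not real market data.
benchmark_group = [
    MarketStats("Mexico", users=4000, renewal_rate_m1=0.60, ltv_12m=30.0),
    MarketStats("Chile", users=1000, renewal_rate_m1=0.70, ltv_12m=40.0),
    MarketStats("Colombia", users=5000, renewal_rate_m1=0.50, ltv_12m=20.0),
]
brazil_prior = benchmark(benchmark_group)
```

Weighting by users keeps a small market from pulling the benchmark as strongly as a large one. Reporting the minimum and maximum next to the average makes it visible how much the new market could deviate from it.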

Advantages:

A quick solution that allows for a preliminary assessment of market potential.
Sufficient for preliminary estimation of revenues and risks, which facilitates decision-making on budget allocation.

Disadvantages:

High risk of error: the behavior of the average user in Brazil may be completely different from that of a user in Colombia, even though both come from a similar geographical area.
If the cultural or economic differences between markets are significant, this can lead to wrong investment decisions, which is worth bearing in mind when interpreting the results.

Solution 2: Gradually incorporating historical metrics from the new market

As soon as the first metrics from the new market arrive, such as the percentage of users who converted or the renewal rate after the first month, we can use them to improve our long-term estimates. For example, if we know that 5% of visits from the Brazil campaign resulted in a subscription purchase, we can search our historical data for a cohort with a similar conversion rate, check what its LTV looked like after 12 months, and apply that value to the new cohort from Brazil. This allows for more informed planning of further marketing and budget decisions.
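The lookup of a "similar case" can be sketched as a nearest-neighbor search over past cohorts. The snippet below is a minimal illustration, assuming that similarity is measured only by conversion rate and that we average the 12-month LTV of the k closest cohorts. The cohort names, the numbers, and the choice of k are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cohort:
    name: str
    conversion_rate: float  # purchases divided by campaign visits
    ltv_12m: float          # observed 12-month LTV per subscriber


def estimate_ltv(new_conversion_rate: float, history: list[Cohort], k: int = 3):
    """Average the 12-month LTV of the k cohorts closest in conversion rate."""
    if not history:
        raise ValueError("no historical cohorts to compare against")
    nearest = sorted(
        history, key=lambda c: abs(c.conversion_rate - new_conversion_rate)
    )[:k]
    estimate = sum(c.ltv_12m for c in nearest) / len(nearest)
    return estimate, [c.name for c in nearest]


# Illustrative historical cohorts, not real data.
history = [
    Cohort("PL spring", conversion_rate=0.048, ltv_12m=24.0),
    Cohort("DE autumn", conversion_rate=0.055, ltv_12m=30.0),
    Cohort("ES summer", conversion_rate=0.020, ltv_12m=12.0),
    Cohort("PL holiday", conversion_rate=0.090, ltv_12m=45.0),
    Cohort("DE spring", conversion_rate=0.051, ltv_12m=27.0),
]

# The Brazil campaign converted 5% of visits.
brazil_ltv, matched = estimate_ltv(0.05, history, k=3)
```

Returning the matched cohort names alongside the estimate makes it easy to check by hand whether any of them ran in an unusual period before trusting the number.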

Advantages:

the prediction is more accurate than a pure benchmark,
estimates are based on what has already happened in the new market,
this increases the reliability of our forecasts,
and reduces the risk of wrong decisions.

Disadvantages:

it requires manual searching for “similar cases,”
it is susceptible to randomness and seasonality: for example, the compared campaign may have taken place during the holiday season, when renewal rates are unusually high,
such effects can distort the estimates, so the results require careful interpretation.

Solution 3: Early predictions based on user activity

When analyzing users at a general level (i.e., at the cohort level), we lose a lot of detail about what individual users did, and this detail can be valuable. For example, within 7 days of installing the app, a user:

opened the app 5 times,
spent 30 minutes in it,
proceeded to the payment screen,
performed a specific action.

That's something. Such activity may indicate that a given user will be willing to renew their subscription. Based on detailed knowledge of user behavior, e.g., from the first 7 days, we can estimate the probability of subscription renewal in the following months and, consequently, the LTV of that user. Why is this possible? Because we have historical information about many users from markets we know. In other words, we try to match each “new” user from Brazil with a similar user from, e.g., Poland, whose longer activity history we already know.
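To make the step from renewal probability to LTV concrete, here is a small worked sketch. It rests on simplifying assumptions of our own: the user has already paid for the first period, the renewal probability is the same every month, and we only look at a fixed horizon. The function name and the numbers are hypothetical.

```python
def expected_ltv(price: float, renewal_prob: float, horizon_months: int = 12) -> float:
    """Expected revenue from one subscriber over a fixed horizon.

    Assumes the first payment has already been made and that each following
    month is renewed with the same probability. The chance of still paying
    in month k is then renewal_prob ** k, and the expected number of
    payments is the sum of these chances.
    """
    if not 0.0 <= renewal_prob <= 1.0:
        raise ValueError("renewal_prob must be between 0 and 1")
    if horizon_months < 1:
        raise ValueError("horizon_months must be at least 1")
    expected_payments = sum(renewal_prob ** k for k in range(horizon_months))
    return price * expected_payments


# Price of 10 per month, 50% renewal probability, 3-month horizon:
# expected payments = 1 + 0.5 + 0.25 = 1.75
ltv_example = expected_ltv(price=10.0, renewal_prob=0.5, horizon_months=3)
```

In practice the renewal probability usually changes from month to month, so a real model would use a separate probability per period. The structure of the calculation stays the same.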

How can this be done automatically? This is where machine learning comes in handy: we build a model that predicts the likelihood of subscription renewal from the user's behavior during the first few days of using the app, and we train it on historical data from markets we are familiar with. The model can take into account many behavior patterns and their impact on future user decisions, which can make predictions considerably more accurate and allows for more precise marketing budget management.
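As a rough illustration of the idea, the sketch below trains a tiny logistic regression in plain Python on first-week behavior from known markets and scores new users with it. It is not the model used in the implementation described in this article. The three features (sessions, minutes in the app, whether the payment screen was reached), their scaling, the training rows, and the hyperparameters are all assumptions made up for the example.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_renewal(x, weights, bias):
    """Probability of renewal for one user's feature vector."""
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Batch gradient descent on log-loss. Returns (weights, bias)."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_renewal(x, weights, bias) - y
            for j, v in enumerate(x):
                grad_w[j] += error * v
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Features per user (hypothetical, scaled to roughly 0..1):
# [sessions in first 7 days / 10, minutes in app / 60, reached payment screen]
train_rows = [
    [0.5, 0.5, 1], [0.8, 0.9, 1], [0.6, 0.4, 1], [0.4, 0.7, 0],    # renewed
    [0.1, 0.05, 0], [0.2, 0.1, 0], [0.1, 0.2, 1], [0.3, 0.1, 0],   # churned
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

weights, bias = train_logistic(train_rows, train_labels)

# Score two new users from the new market after their first 7 days.
p_engaged = predict_renewal([0.5, 0.5, 1], weights, bias)   # 5 sessions, 30 min, saw payment
p_passive = predict_renewal([0.1, 0.05, 0], weights, bias)  # 1 session, 3 min
```

A production model would be trained on far more users and features and validated on held-out data. The point here is only that the model is fitted on known markets and then applied to users whose renewals have not happened yet.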

Advantages:

enables LTV forecasting based on very early signals (e.g., the first 7 days of activity),
allows you to act practically from the start of the campaign, without waiting for the first subscription renewals,
thanks to ML, you can automatically identify patterns and segment users, which increases the accuracy of marketing decisions,
flexible: the model can be retrained and adapts as new data comes in.

Disadvantages:

requires a sufficient quantity and quality of behavioral data (e.g., application logs),
building and maintaining an ML model adds technical complexity.


Tools and techniques that work

In this implementation, we used BigQuery ML, operating in a closed environment, which allowed for easy maintenance, updating, and scaling of the predictive model. We built accurate predictions at the user level, which we then averaged to the cohort level to obtain a clear and reliable picture at the customer group level. This allowed the client to determine at the start of the test campaigns whether it was worth increasing investments and allocating additional budget to develop a given market.
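The roll-up from user-level predictions to cohorts can be sketched as follows. This is a simplified illustration and not the BigQuery ML setup itself: the cohort key (country and campaign), the comparison against acquisition cost per user, and all numbers are assumptions introduced for the example.

```python
from collections import defaultdict
from statistics import mean


def cohort_ltv(predictions):
    """Average per-user predicted LTV within each cohort.

    predictions: iterable of (cohort_key, predicted_ltv) pairs, one per user.
    """
    by_cohort = defaultdict(list)
    for cohort, ltv in predictions:
        by_cohort[cohort].append(ltv)
    return {cohort: mean(values) for cohort, values in by_cohort.items()}


def ltv_to_cost_ratio(cohort_means, cost_per_user):
    """Predicted LTV divided by acquisition cost; above 1.0 suggests payback."""
    return {c: cohort_means[c] / cost_per_user[c] for c in cohort_means}


# Hypothetical user-level predictions for two test campaigns.
user_predictions = [
    (("BR", "campaign_a"), 12.0),
    (("BR", "campaign_a"), 18.0),
    (("BR", "campaign_b"), 6.0),
    (("BR", "campaign_b"), 4.0),
]
# Hypothetical acquisition cost per user for each campaign.
cost_per_user = {("BR", "campaign_a"): 10.0, ("BR", "campaign_b"): 8.0}

means = cohort_ltv(user_predictions)
ratios = ltv_to_cost_ratio(means, cost_per_user)
```

With these made-up numbers, the first campaign is predicted to earn back more than it costs per user and the second is not, which is the kind of signal that can inform a budget decision early in a test.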

The model took into account, among other things, data on user behavior in the application (such as frequency of use, activity paths, interactions with key functions), campaign type, and relevant acquisition information (country, traffic source). This comprehensive approach enabled fast and accurate LTV forecasts, which were automatically updated as new data became available. This made it possible to continuously analyze and optimize the costs of acquiring new customers and better manage customer value over time.

Machine learning models trained on historical data and current user information made it possible to predict retention, churn rate, and expected revenue per customer account. Such models not only estimate the LTV of new customers but also support the ongoing adjustment of marketing and retention strategies, which translates into more efficient use of budget and resources.

In short, BigQuery ML combined with these data analysis techniques is an effective way to build LTV prediction models that support business decisions and cost optimization when historical data is limited.

Summary: lack of data is not a verdict

Entering a new market? Have a new campaign, a new customer segment, and zero data?

✔️ Use data from similar markets as a benchmark.
Based on information from markets with a similar profile, you can create preliminary LTV and retention forecasts. This approach allows for quick analysis and assessment of the potential of a new market, even if local historical data is lacking.

✔️ Create separate segments for markets with unknown behavior.
Segmenting customers by market and behavior enables better analysis and prediction of LTV. This allows you to plan acquisition activities more precisely and manage the costs of acquiring new customers.

✔️ Predict LTV based on early behavioral signals.
Analyzing user activity in the first few days, such as the number of sessions or actions taken, allows you to forecast retention and churn rates. This data is key to accurately predicting LTV and planning further steps.

✔️ Train models that will become more accurate over time.
Use machine learning models to analyze and forecast user data. Retrained as local data arrives, these models become more accurate over time, allowing for better customer value management and optimization of retention and acquisition costs.

If your team is planning to expand abroad but lacks the tools to model LTV, we can help. We build predictive LTV models based on data from applications, campaigns, and data warehouses (e.g., BigQuery).