Big Data Solution Providing Insights into Customer Behavior across 30+ Dimensions
Customer
The Customer is a Texas-based telecom company participating in the federal Lifeline Support Program and providing pre-paid cell phones and service packages to low-income individuals.
Challenge
As part of the project, ScienceSoft’s analytics team was to design and implement a data management and analytics platform that would let the Customer collect data from multiple sources and gain insights into customer behavior. The Customer wanted the platform to analyze historical data and enable forecasting. Access rights were another issue to solve, as the Customer planned to give their tenants access to tenant-related analytics.
Solution
The data analytics platform gathered raw data (such as users’ impressions and click-throughs, tariff plans, device models, installed apps and more) from 10+ sources. To collect this telemetry data and move it into Apache Kafka, ScienceSoft’s big data team suggested the MQTT protocol.
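As an illustration of the MQTT-to-Kafka hand-off, here is a minimal Python sketch of how one MQTT message could be mapped to a Kafka record. It is not the project's actual bridge: the MQTT topic layout (telemetry/<device_id>/<event_type>), the Kafka topic naming (raw.<event_type>), the JSON payload and the function name are all assumptions made for the example, and the actual send to Kafka is left out.

```python
import json
from typing import NamedTuple


class KafkaRecord(NamedTuple):
    """Topic, key and value that a Kafka producer would send."""
    topic: str
    key: bytes
    value: bytes


def mqtt_to_kafka_record(mqtt_topic: str, payload: bytes) -> KafkaRecord:
    """Map one MQTT message to a Kafka record.

    Assumes a hypothetical MQTT topic layout:
    telemetry/<device_id>/<event_type>, with a JSON object as payload.
    """
    parts = mqtt_topic.split("/")
    if len(parts) != 3 or parts[0] != "telemetry":
        raise ValueError(f"unexpected MQTT topic: {mqtt_topic!r}")
    _, device_id, event_type = parts

    event = json.loads(payload)
    if not isinstance(event, dict):
        raise ValueError("payload must be a JSON object")

    # Copy the routing information into the event body so that it
    # survives once the record leaves the MQTT topic hierarchy.
    event["device_id"] = device_id
    event["event_type"] = event_type

    return KafkaRecord(
        topic=f"raw.{event_type}",       # hypothetical Kafka topic naming
        key=device_id.encode("utf-8"),   # key by device
        value=json.dumps(event, sort_keys=True).encode("utf-8"),
    )


record = mqtt_to_kafka_record("telemetry/dev-42/click", b'{"ad_id": 7}')
```

Keying records by device ID means that, with Kafka's default partitioner, events from one device land in the same partition and keep their relative order.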
The team also suggested using Amazon Spot Instances to reduce the costs of AWS computing resources. To ensure the analytical system’s scalability, they used AWS Application Load Balancers.
Apache Kafka acted as the data streaming platform. There, the raw data was organized for offloading into the landing zone running on Amazon Simple Storage Service. Amazon Redshift was chosen for data storage and warehousing. It received the telemetry data from Android mobile phones, as well as information from the Enterprise Resource Planning system and the Home Location Register (HLR).
To enable regular and ad-hoc reporting, ScienceSoft developed ROLAP cubes with 30+ dimensions and 10+ facts. For instance, the analytical system measured the advertising impressions and click-throughs of a particular user to calculate the reward points earned. Another example: based on an increased number of calls to support, the Customer could infer that the user was likely dissatisfied with the service. If no measures were taken, that could lead to customer churn.
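To make the ROLAP idea concrete, here is a minimal sketch of a star schema queried with plain SQL, which is what a relational OLAP cube comes down to. It uses SQLite from the Python standard library as a stand-in for Amazon Redshift. The table and column names, the sample rows and the points-per-event rates are all made up for illustration; the real reward formula is not described in this case study.

```python
import sqlite3

# Hypothetical reward rates, not taken from the project.
IMPRESSION_POINTS = 1
CLICK_POINTS = 5

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One dimension table (the real cubes had 30+ dimensions).
    CREATE TABLE dim_user (
        user_id      INTEGER PRIMARY KEY,
        tariff_plan  TEXT,
        device_model TEXT
    );
    -- One fact table holding additive measures per user and day.
    CREATE TABLE fact_ad_event (
        user_id        INTEGER REFERENCES dim_user(user_id),
        event_date     TEXT,
        impressions    INTEGER,
        click_throughs INTEGER
    );
""")
# Made-up sample rows.
conn.executemany(
    "INSERT INTO dim_user VALUES (?, ?, ?)",
    [(1, "basic", "model-a"), (2, "plus", "model-b")],
)
conn.executemany(
    "INSERT INTO fact_ad_event VALUES (?, ?, ?, ?)",
    [(1, "2020-01-01", 10, 2), (1, "2020-01-02", 5, 1), (2, "2020-01-01", 20, 0)],
)


def reward_points_by_user(conn):
    """Aggregate the fact table by user and derive reward points."""
    return conn.execute(
        """
        SELECT u.user_id,
               u.tariff_plan,
               SUM(f.impressions)    AS impressions,
               SUM(f.click_throughs) AS click_throughs,
               SUM(f.impressions) * ? + SUM(f.click_throughs) * ? AS reward_points
        FROM fact_ad_event AS f
        JOIN dim_user AS u ON u.user_id = f.user_id
        GROUP BY u.user_id, u.tariff_plan
        ORDER BY u.user_id
        """,
        (IMPRESSION_POINTS, CLICK_POINTS),
    ).fetchall()


rows = reward_points_by_user(conn)
# rows == [(1, "basic", 15, 3, 30), (2, "plus", 20, 0, 20)]
```

Grouping by other dimension columns, such as tariff_plan or device_model alone, gives a different slice of the same facts without changing the schema.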
Not only the Customer but also their tenants (also telecom companies with their own customers and HLRs) were granted access to the platform for valuable insights. For instance, a tenant could access the part of the analytics related to their company. To make this possible, ScienceSoft’s team introduced two approaches: shared access (organized at the data warehouse level) and dedicated access (involving a separate AWS account).
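The case study does not say how shared access was implemented in the data warehouse. One common way to do it is a tenant-scoped view over a shared table, sketched below with SQLite standing in for Amazon Redshift. The table, the view naming scheme, the function name and the sample rows are hypothetical.

```python
import sqlite3


def create_tenant_view(conn: sqlite3.Connection, tenant_id: int) -> str:
    """Create a view exposing only one tenant's rows and return its name."""
    tenant_id = int(tenant_id)  # force an integer before building SQL text
    if tenant_id < 0:
        raise ValueError("tenant_id must be non-negative")
    view_name = f"v_daily_earnings_tenant_{tenant_id}"
    conn.execute(
        f"CREATE VIEW {view_name} AS "
        f"SELECT event_date, earnings FROM fact_daily_earnings "
        f"WHERE tenant_id = {tenant_id}"
    )
    # In a warehouse with access control, such as Redshift, the next step
    # would be to grant SELECT on this view, and not on the base table,
    # to the tenant's database user or group. SQLite has no GRANT.
    return view_name


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_daily_earnings "
    "(tenant_id INTEGER, event_date TEXT, earnings REAL)"
)
# Made-up sample rows for two tenants.
conn.executemany(
    "INSERT INTO fact_daily_earnings VALUES (?, ?, ?)",
    [(1, "2020-01-01", 120.0), (1, "2020-01-02", 95.5), (2, "2020-01-01", 300.0)],
)

view = create_tenant_view(conn, 1)
tenant_rows = conn.execute(
    f"SELECT event_date, earnings FROM {view} ORDER BY event_date"
).fetchall()
# tenant_rows contains only tenant 1's rows.
```

The dedicated approach differs in kind: the tenant's data and compute live in a separate AWS account, so isolation comes from the account boundary and not from views or grants.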
Results
With ScienceSoft’s big data services, the Customer was able to:
- Measure the engagement and identify the preferences of a particular user.
- Spot trends in the users’ behavior.
- Make predictions about how users would behave.
- Invoice advertisers based on their calculated share.
- Benefit from insightful data analytics (for example, daily earnings, number of new users, customer service data and more).
The use of Amazon Spot Instances allowed the Customer to reduce the costs of AWS computing resources by 80%.
Technologies and Tools
Amazon Web Services (Amazon cloud), Apache Kafka (data streaming), the Message Queuing Telemetry Transport (MQTT) protocol, Amazon Simple Storage Service (persistent storage used for the data landing zone), Amazon Redshift (data warehouse), Apache Airflow (originally developed at Airbnb) and Python (ETL).