As more and more platforms offer real-time event streams, the infrastructure that stores and interprets these events needs to evolve as well. Amazon Kinesis can be integrated into any existing or new architecture to meet this need. You can run analytics on the fly, shard data streams for scalability, or simply stream the data into an S3 bucket for later processing.
Stream v Batch
Batch processing – this method involves collecting data over a period of time, then moving and analysing it in chunks (batches), typically on a schedule.
Stream processing – data is processed continuously, in near real-time, as it is generated at the source. This represents a significant change from batch processing, which has been the traditional method for transferring data from one place to another. Because we no longer wait for a batch to fill up, we can act on the data faster and make better decisions.
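To make the difference concrete, here is a minimal Python sketch that runs the same purchase amounts through both styles. It is purely illustrative and not tied to any AWS API: `batch_totals` and `stream_running_total` are hypothetical helper names, and summing values stands in for whatever analysis you would really run.

```python
from collections.abc import Iterable, Iterator


def batch_totals(events: Iterable[int], batch_size: int) -> list[int]:
    """Batch style: collect events into fixed-size chunks, then analyse each chunk."""
    totals: list[int] = []
    chunk: list[int] = []
    for value in events:
        chunk.append(value)
        if len(chunk) == batch_size:
            totals.append(sum(chunk))  # a result only exists once the chunk is full
            chunk = []
    if chunk:
        totals.append(sum(chunk))  # process the final, partially filled chunk
    return totals


def stream_running_total(events: Iterable[int]) -> Iterator[int]:
    """Stream style: update the result as soon as each event arrives."""
    total = 0
    for value in events:
        total += value
        yield total  # an up-to-date answer after every event


if __name__ == "__main__":
    purchases = [5, 3, 8, 1, 4]
    print(batch_totals(purchases, batch_size=2))  # [8, 9, 4]
    print(list(stream_running_total(purchases)))  # [5, 8, 16, 17, 21]
```

The batch version only produces an answer when a chunk is complete, while the stream version has a current answer after every single event.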
Amazon Kinesis Data Streams
To illustrate the concepts, let’s use Kinesis Data Streams as an example:
Input/producer – an application generates events and acts as the input/producer of the data. The data can be log files, media, clicks on a website, or transactional data.
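Data stream – a shard, or a group of shards, that ingests records. Each shard accepts writes of up to 1,000 records or 1 MB per second. By default, records remain available for 24 hours, and the retention period can be extended (currently up to 365 days).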
Consumer/processor – an application or AWS service (for example AWS Lambda, or another Kinesis service) that retrieves records from the shards, usually in near real-time. The consumer can then push the processed data into databases such as DynamoDB or Aurora.
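As a rough illustration of how producers, shards and consumers fit together, the sketch below simulates a stream locally in Python. It follows the documented approach of hashing each record's partition key with MD5 into a 128-bit value and routing the record to the shard whose hash key range contains that value. It assumes the key space is split into equal ranges, which is roughly how a newly created stream is laid out before any resharding. `shard_for`, `produce` and `consume` are hypothetical names; a real producer would call the Kinesis API (for example `put_record` in boto3) instead.

```python
import hashlib
from collections import defaultdict
from collections.abc import Iterable, Iterator

HASH_KEY_SPACE = 2 ** 128  # MD5 digests are 128-bit values


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming equal hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")
    range_size = HASH_KEY_SPACE // shard_count
    # The last shard absorbs any remainder left by the integer division.
    return min(hash_value // range_size, shard_count - 1)


def produce(records: Iterable[tuple[str, str]], shard_count: int) -> dict[int, list[tuple[str, str]]]:
    """Producer side: append each (partition_key, data) record to its shard."""
    shards: dict[int, list[tuple[str, str]]] = defaultdict(list)
    for partition_key, data in records:
        shards[shard_for(partition_key, shard_count)].append((partition_key, data))
    return shards


def consume(shards: dict[int, list[tuple[str, str]]]) -> Iterator[tuple[int, str, str]]:
    """Consumer side: read each shard's records in the order they were written."""
    for shard_id in sorted(shards):
        for partition_key, data in shards[shard_id]:
            yield shard_id, partition_key, data


if __name__ == "__main__":
    clicks = [
        ("user-1", "click:/home"),
        ("user-2", "click:/cart"),
        ("user-1", "click:/checkout"),
    ]
    for shard_id, key, data in consume(produce(clicks, shard_count=4)):
        print(shard_id, key, data)
```

Because routing depends only on the partition key, every record for `user-1` lands in the same shard and is read back in the order it was written, which is what lets a consumer process each user's events in sequence.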
Use cases for Kinesis Data Streams
- Streaming data like website clicks and transactional data
- Migrating data from databases
- Applications with specialised data pipelines
Amazon Kinesis Data Firehose
Kinesis Firehose differs from Kinesis Data Streams in that it takes the data, batches it, compresses and encrypts it, and then persists it to a destination such as Amazon S3, Amazon Redshift, or Amazon Elasticsearch Service.
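The batching step is what sets Firehose apart, so here is a small local simulation of it. This is a hypothetical class, not the Firehose API: it collects records until either a size threshold or a time interval is reached (Firehose exposes similar buffer size and buffer interval settings), then writes one gzip-compressed batch to a list that stands in for S3. Encryption is left out, and, to keep things deterministic, the time check only runs when a new record arrives rather than on a background timer.

```python
import gzip
from typing import Optional


class BufferedDelivery:
    """Hypothetical local stand-in for a Firehose delivery stream (not the AWS API)."""

    def __init__(self, buffer_size_bytes: int, buffer_interval_seconds: float) -> None:
        self.buffer_size_bytes = buffer_size_bytes
        self.buffer_interval_seconds = buffer_interval_seconds
        self._buffer: list[bytes] = []
        self._buffered_bytes = 0
        self._first_arrival: Optional[float] = None
        self.delivered: list[bytes] = []  # stands in for objects written to S3

    def put(self, record: bytes, now: float) -> None:
        """Buffer a record; flush if either the size or the time threshold is reached."""
        if self._first_arrival is None:
            self._first_arrival = now
        self._buffer.append(record)
        self._buffered_bytes += len(record)
        size_reached = self._buffered_bytes >= self.buffer_size_bytes
        interval_reached = now - self._first_arrival >= self.buffer_interval_seconds
        if size_reached or interval_reached:
            self.flush()

    def flush(self) -> None:
        """Join the buffered records, gzip them and 'persist' the batch."""
        if not self._buffer:
            return
        batch = b"\n".join(self._buffer)  # newline-delimited batch
        self.delivered.append(gzip.compress(batch))
        self._buffer = []
        self._buffered_bytes = 0
        self._first_arrival = None


if __name__ == "__main__":
    stream = BufferedDelivery(buffer_size_bytes=10, buffer_interval_seconds=60)
    for t, event in enumerate([b"temp=21", b"temp=22", b"temp=23"]):
        stream.put(event, now=float(t))
    stream.flush()  # force out anything still buffered
    for obj in stream.delivered:
        print(gzip.decompress(obj))
```

Larger buffers mean fewer, bigger objects at the destination; smaller ones mean lower latency. That trade-off is why both a size and a time threshold exist.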
Use cases for Kinesis Firehose
- IoT events
- Security monitoring, with Splunk configured as the destination
- Automatic archiving
Amazon Kinesis Data Analytics
Kinesis Data Analytics allows us to both process events and analyse them on the fly using SQL queries. The service recognises input formats such as JSON and CSV, and sends the query output to a destination (for example a Kinesis data stream, a Firehose delivery stream, or a Lambda function) for visualisation or further action.
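To show what a windowed query over a stream computes, the sketch below counts page views per page in fixed, non-overlapping 60-second windows (a tumbling window), which is roughly what a SQL `GROUP BY` over a time window would produce in Kinesis Data Analytics. It is plain Python, not the service's SQL dialect; the JSON fields `ts` (seconds) and `page` are hypothetical, and events are assumed to arrive in timestamp order.

```python
import json
from collections import Counter
from collections.abc import Iterable, Iterator


def tumbling_window_counts(
    lines: Iterable[str], window_seconds: int = 60
) -> Iterator[tuple[int, dict[str, int]]]:
    """Emit (window_start, {page: count}) each time a tumbling window closes.

    Assumes events arrive in timestamp order.
    """
    current_window = None
    counts: Counter = Counter()
    for line in lines:
        event = json.loads(line)  # each line is one JSON-encoded event
        window_start = int(event["ts"]) // window_seconds * window_seconds
        if current_window is not None and window_start != current_window:
            yield current_window, dict(counts)  # window closed: emit its result
            counts = Counter()
        current_window = window_start
        counts[event["page"]] += 1
    if current_window is not None:
        yield current_window, dict(counts)  # flush the last open window


if __name__ == "__main__":
    clicks = [
        '{"ts": 5, "page": "/home"}',
        '{"ts": 30, "page": "/home"}',
        '{"ts": 61, "page": "/cart"}',
        '{"ts": 65, "page": "/home"}',
    ]
    for window_start, page_counts in tumbling_window_counts(clicks):
        print(window_start, page_counts)
    # 0 {'/home': 2}
    # 60 {'/cart': 1, '/home': 1}
```

Each result is emitted as soon as its window closes, so downstream tools can react within about one window length rather than waiting for a whole batch.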
Use cases for Kinesis Analytics
- Processing event data from applications
- Exploratory analysis
- Analysing clickstream anomalies
How do I pay for all this?
- Data stream shards are billed at an hourly rate.
- Firehose is billed on the volume of data ingested, while Kinesis Data Analytics is billed at an hourly rate based on the number of Kinesis Processing Units (KPUs) used.
- Kinesis is not included in the AWS Free Tier, but many of the other core services are. If you have a use case for streaming data, give it a try.
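Since provisioned data stream shards are billed per hour, a quick sizing estimate helps before you try it out. The sketch below works out how many shards a given write load needs, using the per-shard write limits of 1,000 records or 1 MB per second, and converts that into shard-hours for a month. No real price is assumed: the price per shard-hour is a parameter you would fill in from the current AWS pricing page, 730 hours approximates one month (8,760 hours / 12), and other charges such as PUT payload units are ignored.

```python
import math

MAX_RECORDS_PER_SECOND = 1_000  # per-shard write limit (records)
MAX_MB_PER_SECOND = 1.0  # per-shard write limit (data volume)
HOURS_PER_MONTH = 730  # 8,760 hours in a year / 12 months


def shards_needed(records_per_second: float, mb_per_second: float) -> int:
    """Smallest shard count that satisfies both per-shard write limits."""
    by_records = math.ceil(records_per_second / MAX_RECORDS_PER_SECOND)
    by_volume = math.ceil(mb_per_second / MAX_MB_PER_SECOND)
    return max(1, by_records, by_volume)  # a stream needs at least one shard


def monthly_shard_hours(shards: int) -> int:
    """Shard-hours accumulated by a stream running all month."""
    return shards * HOURS_PER_MONTH


def monthly_shard_cost(shards: int, price_per_shard_hour: float) -> float:
    """Shard-hour charge only; other charges (e.g. PUT payload units) are ignored."""
    return monthly_shard_hours(shards) * price_per_shard_hour


if __name__ == "__main__":
    shards = shards_needed(records_per_second=2_500, mb_per_second=0.5)
    print(shards)  # 3: the record-rate limit dominates here
    print(monthly_shard_hours(shards))  # 2190
```

Whichever limit needs more shards wins, so a high volume of small records and a low volume of large records can both drive the shard count up.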
Photo by cottonbro from Pexels