Log data will be uploaded through this Kinesis Data Firehose delivery stream.
S3 -> Buckets -> Create bucket (this bucket will receive the Firehose output)
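If you prefer to script this step, a minimal boto3 sketch (bucket name and region are assumptions, match them to your account):

import boto3

# Create the destination bucket for Firehose; name and region are placeholders
s3 = boto3.client('s3', region_name='sa-east-1')
s3.create_bucket(Bucket='my-firehose-bucket',
                 CreateBucketConfiguration={'LocationConstraint': 'sa-east-1'})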
EC2 - the AWS service for launching and managing virtual machine instances.
Launch Instance -> Amazon Linux AMI -> choose Instance Type
Create new key pair -> Download the .pem file -> Launch instance
# Convert the downloaded .pem key to PuTTY's .ppk format
puttygen keyfile.pem -O private -o key.ppk
# To print the corresponding public key instead
puttygen keyfile.pem -L
Get the instance's public IPv4 DNS from the AWS console.
Connect -> SSH client -> copy the public DNS -> paste it into PuTTY's Host Name field
Connection -> SSH -> Auth -> Browse -> import key.ppk
Default login: ec2-user
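As an alternative to PuTTY, the connection can be scripted with paramiko using the original .pem key directly (the hostname below is a placeholder):

import paramiko

# Connect to the instance with the default ec2-user account and the .pem key
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect('ec2-x-x-x-x.sa-east-1.compute.amazonaws.com',
               username='ec2-user', key_filename='keyfile.pem')
_, stdout, _ = client.exec_command('uname -a')
print(stdout.read().decode())
client.close()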
On the EC2 instance:
sudo yum install -y aws-kinesis-agent
sudo yum install -y git
# Clone the repository with the dataset and scripts
git clone https://github.com/cassianobrexbit/dio-live-aws-bigdata-2.git
# Unzip the dataset archive included in the repository
unzip dataset
# Python script that processes the dataset line by line and generates log files
# Make the script executable
chmod a+x loggeneratorscript.py
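The actual generator ships with the repository; as a rough sketch of what such a script does (the dataset filename and log path below are assumptions, not the repo's real values):

#!/usr/bin/env python3
# Hypothetical sketch of a log generator like loggeneratorscript.py
import itertools
import sys

DATASET = 'dataset.csv'               # assumed name of the unzipped dataset
LOG_FILE = '/var/log/logdir/app.log'  # matches the agent's filePattern below

def main(num_lines):
    with open(DATASET) as src, open(LOG_FILE, 'a') as dst:
        # cycle through the dataset until the requested number of log lines is written
        for line in itertools.islice(itertools.cycle(src), num_lines):
            dst.write(line if line.endswith('\n') else line + '\n')

if __name__ == '__main__':
    main(int(sys.argv[1]))            # e.g. ./loggeneratorscript.py 500000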
# Create directory that will store the logs
sudo mkdir /var/log/logdir
# Go to the Kinesis agent's configuration directory
cd /etc/aws-kinesis
# Edit the agent.json file to set the agent configuration
sudo nano agent.json
agent.json file (see the full sample at the end of this section):
## Pay attention to the region - it can be found in the Kinesis Firehose details (region field)
"flows" - points the agent at the log directory
"filePattern": "/var/log/logdir/*.log"
"deliveryStream": paste the Firehose delivery stream name
Instances -> Select -> Actions -> Security -> Modify IAM role
Create new IAM role -> Create role -> select a role that allows the instance to access the Kinesis services
sudo service aws-kinesis-agent start
sudo chkconfig aws-kinesis-agent on
sudo ./loggeneratorscript.py 500000 # number of log lines to generate
tail -f /var/log/aws-kinesis-agent/aws-kinesis-agent.log
Check the delivered data in the S3 bucket.
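The same check can be done from code with boto3 (bucket name is an assumption):

import boto3

# List the objects Firehose has delivered to the bucket
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='my-firehose-bucket')
for obj in resp.get('Contents', []):
    print(obj['Key'], obj['Size'])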
Create Data Stream & Access Delivered Data
Edit agent.json so the log stream can also be consumed by other applications.
Create a data stream for the output.
In the AWS console - Kinesis -> Dashboard -> Create Data Stream -> enter the stream name
Set the number of shards
Create Data Stream
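The equivalent step from code, as a boto3 sketch (stream name, region, and shard count are assumptions):

import boto3

# Create the data stream and wait until it becomes active
kinesis = boto3.client('kinesis', region_name='sa-east-1')
kinesis.create_stream(StreamName='Data_Stream_name', ShardCount=1)
kinesis.get_waiter('stream_exists').wait(StreamName='Data_Stream_name')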
Connect the generated logs to the new stream through agent.json
Add the line: "kinesis.endpoint": "kinesis.sa-east-1.amazonaws.com"
In "flows", add "kinesisStream": "Data_Stream_name" alongside the existing "deliveryStream" (the Firehose name)
Restart kinesis-agent
sudo service aws-kinesis-agent restart
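Once the agent is publishing to the data stream, other applications can read from it; a minimal boto3 consumer sketch (stream name and region are assumptions):

import boto3

# Read records from the first shard of the data stream
kinesis = boto3.client('kinesis', region_name='sa-east-1')
shard_id = kinesis.describe_stream(StreamName='Data_Stream_name')['StreamDescription']['Shards'][0]['ShardId']
iterator = kinesis.get_shard_iterator(StreamName='Data_Stream_name',
                                      ShardId=shard_id,
                                      ShardIteratorType='TRIM_HORIZON')['ShardIterator']
resp = kinesis.get_records(ShardIterator=iterator, Limit=10)
for record in resp['Records']:
    print(record['Data'].decode())  # each record is one log line (JSON after CSVTOJSON)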
The delivered data can be explored with AWS Glue DataBrew.
Sample agent.json file - content provided by the course:
{
  "cloudwatch.emitMetrics": true,
  "kinesis.endpoint": "kinesis.<region>.amazonaws.com",
  "firehose.endpoint": "firehose.<region>.amazonaws.com",
  "flows": [
    {
      "filePattern": "/var/log/logdir/*.log",
      "kinesisStream": "DataStreamName",
      "partitionKeyOption": "RANDOM",
      "dataProcessingOptions": [
        {
          "optionName": "CSVTOJSON",
          "customFieldNames": ["country", "iso_code", "total_vaccinations", "people_fully_vaccinated", "total_vaccinations_per_hundred", "vaccines", "source_name", "source_website"]
        }
      ]
    },
    {
      "filePattern": "/var/log/logdir/*.log",
      "deliveryStream": "FirehoseName"
    }
  ]
}
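Before restarting the agent, a quick way to verify the edited file still parses as valid JSON:

import json

# Raises json.JSONDecodeError if agent.json has a syntax error
with open('/etc/aws-kinesis/agent.json') as f:
    json.load(f)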