
Terraform-AWS serverless blog

I'm learning Terraform at the moment and thought this could be a good hands-on side project. The provided Terraform code will spin up a GitHub repo, a CodeBuild project and an S3 bucket to host a static blog (blue box in the flow chart above). I figured people might not want to use CloudFront or Route 53 as they are not free-tier services, so I left them out.

To spin this up, we will need the below prerequisites:

Once all the prerequisites are set up, follow the steps below.

  1. Open cmd/PowerShell and run the following command to clone the Terraform code and buildspec file:
git clone https://github.com/tduong10101/serverless-blog-terra.git
  2. Update serverless-blog-terra/variable.tfvars with your GitHub token and the site name you would like to set up
  3. Run the following commands
cd serverless-blog-terra
terraform init
terraform apply -var-file variable.tfvars
  4. Review the resources and enter "yes" to approve Terraform to spin them up.
  5. Grab the outputs and save them somewhere; we'll use them in later steps.
  6. Navigate to the parent folder of serverless-blog-terra
cd ..
  7. Create a new folder with the same name as the git repo (it doesn't matter if the name is different, it's just easier to manage), cd to the new folder and run the hexo init command

    mkdir <new folder>
    cd .\<new folder>
    hexo init
  8. Copy the buildspec.yml file from the serverless-blog-terra folder to this new folder

  9. Update buildspec.yml with the s3:// link from step 5

  10. Init Git and set up the git remote with the commands below. Insert your git repo URL from step 5.

git init
git add *
git commit -m "init"
git remote add origin "<your-git-url-from-step-5>"
git push -u origin master
  11. Wait for CodeBuild to finish updating the S3 bucket. Log on to the AWS console to confirm.
  12. Open the website_endpoint URL from step 5 and enjoy your serverless blog.
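If you'd rather not watch the console in step 11, the build status can also be polled from a script. A rough boto3 sketch (not part of the original guide; the project name is an assumption, and `latest_build_status` is a hypothetical helper):

```python
def latest_build_status(batch_get_builds_response):
    """Pull the status string ('IN_PROGRESS', 'SUCCEEDED', ...) of the
    first build out of a batch_get_builds response dict."""
    builds = batch_get_builds_response.get("builds", [])
    return builds[0]["buildStatus"] if builds else None

def check_codebuild(project_name):
    """Return the status of the latest build for the given CodeBuild project."""
    import boto3  # imported here so latest_build_status stays dependency-free
    cb = boto3.client("codebuild")
    # first id returned is the newest build with the default sort order
    ids = cb.list_builds_for_project(projectName=project_name)["ids"]
    if not ids:
        return None  # no builds have run yet
    return latest_build_status(cb.batch_get_builds(ids=ids[:1]))
```

Calling `check_codebuild("serverless-blog-terra")` (or whatever project name Terraform created) in a loop until it returns `SUCCEEDED` replaces the manual console check.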

Visit Hexo for instructions on how to create posts, change themes, add plugins, etc.

Remove the blog:

  1. If you don't like the new blog and want to clean up the AWS/git resources, run the command below:
terraform destroy -var-file variable.tfvars
  2. Once Terraform finishes cleaning up the resources, the remaining folders can be removed from your local computer.

Hosting a simple Code Editor on S3

I have this old code editor project sitting in GitHub without much description - repo link. So I thought, why not try to host it on S3 so I can showcase it in the repo.

Also, it's good practice to brush up my knowledge of some AWS services (S3, CloudFront, Route 53). After almost an hour I got the site up, so it's not too bad. Below are the steps I took.

  1. Create an S3 bucket and upload my code to this new bucket - ceditor.tdinvoke.net.

  2. Enable ā€œStatic website hostingā€ on the bucket

  3. Create a web CloudFront distribution with the following settings (the rest are left at their defaults)

    1. Origin Domain Name: the endpoint URL from the S3 ceditor.tdinvoke.net 'Static website hosting' settings
    2. Alternate Domain Names (CNAMEs): codeplayer.tdinvoke.net
    3. Viewer Protocol Policy: Redirect HTTP to HTTPS
    4. SSL Certificate: Custom SSL Certificate - reference my existing SSL certificate
  4. Create a new A record in Route 53 and point it to the new CloudFront distribution
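The A record in step 4 is a Route 53 alias record pointing at the distribution. A hedged boto3 sketch of the same change done in code (record and domain names are placeholders; `Z2FDTNDATAQYW2` is the fixed hosted zone ID AWS uses for all CloudFront alias targets):

```python
def alias_change_batch(record_name, cloudfront_domain):
    """Build a Route 53 change batch that UPSERTs an A-record alias
    pointing at a CloudFront distribution."""
    return {
        "Changes": [{
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "AliasTarget": {
                    # fixed hosted zone ID for CloudFront alias targets
                    "HostedZoneId": "Z2FDTNDATAQYW2",
                    "DNSName": cloudfront_domain,
                    "EvaluateTargetHealth": False,
                },
            },
        }]
    }

def create_alias(hosted_zone_id, record_name, cloudfront_domain):
    """Apply the change batch to the given Route 53 hosted zone."""
    import boto3  # imported here so alias_change_batch stays dependency-free
    r53 = boto3.client("route53")
    r53.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch=alias_change_batch(record_name, cloudfront_domain),
    )
```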

Aaand here is the site: https://codeplayer.tdinvoke.net/

Next I need to go back to the repo and write up a readme.md for it.

Get AWS IAM credentials report script

A quick PowerShell script to generate an AWS IAM credentials report and save it in CSV format to a local location.

Import-Module AWSPowerShell
$reportLocation = "C:\report"
if (!(Test-Path($reportLocation))) {
    New-Item -ItemType Directory -Path $reportLocation
}
$date = Get-Date -Format dd-MM-yy-hh-mm-ss
$reportName = "aws-credentials-report-$date.csv"
$reportPath = Join-Path -Path $reportLocation -ChildPath $reportName
# request the IAM credential report to be generated
do {
    $result = Request-IAMCredentialReport
    Start-Sleep -Seconds 10
} while ($result.State.Value -notmatch "COMPLETE")
# get the IAM report
$report = Get-IAMCredentialReport -AsTextArray
# convert to a PowerShell object
$report = $report | ConvertFrom-Csv
# export to the set location
$report | Export-Csv -Path $reportPath -NoTypeInformation
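For comparison, a rough Python/boto3 version of the same generate/poll/fetch flow might look like this (a sketch, not part of the original script):

```python
import csv
import io
import time

def parse_credential_report(report_bytes):
    """Turn the raw CSV bytes of an IAM credential report into a list of
    row dicts, mirroring what ConvertFrom-Csv does in the PowerShell above."""
    return list(csv.DictReader(io.StringIO(report_bytes.decode("utf-8"))))

def fetch_credential_report():
    """Request the report, poll until it is generated, then download it."""
    import boto3  # imported here so parse_credential_report stays dependency-free
    iam = boto3.client("iam")
    # keep re-requesting until State is COMPLETE, like the do/while loop above
    while iam.generate_credential_report()["State"] != "COMPLETE":
        time.sleep(10)
    return parse_credential_report(iam.get_credential_report()["Content"])
```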

My random podcast app

I've been trying to catch up with a few podcasts and can't decide what to listen to first. So I thought, let's create an app that picks a random episode for me. Less time thinking about picking and more time listening!

So here is what I came up with.

I thought it would be straightforward, but it took me the whole weekend to get it up T__T

There are 4 lambda functions in this app.

1- update-station: triggered whenever a new item is added to stationsDB. This crawls the site's main page to get the episode playlist and inserts it back into stationsDB as list_url.

2- update-episode: triggered by the update-station function or a monthly CloudWatch event. This function loops through stationsDB and runs each item's spider function on its list_url. The output is a list of the 50 most recent episodes for each station. This list is then compared with the episodes already in episodesDB, and the differences are added to episodesDB.

3- gen-random-episode: triggered by API Gateway when an episode finishes playing at https://blog.tdinvoke.net/random-podcast/. This function first changes the current episode's status to 'completed'. Then it pulls all episode URLs from episodesDB that haven't been played (blank status), randomly picks one episode and changes its status to 'current'.

4- get-current-episode: triggered by API Gateway when the page https://blog.tdinvoke.net/random-podcast/ is loaded. This one is simple - pull the episode with 'current' status.
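The core of gen-random-episode can be sketched as plain Python, leaving the DynamoDB reads and writes out. A minimal sketch; the 'url'/'status' field names are assumptions, not the real table schema:

```python
import random

def pick_next_episode(episodes, rng=random):
    """Mark the current episode completed, then promote a random unplayed
    episode (blank status) to 'current'. Returns the chosen episode, or
    None if everything has already been played."""
    for ep in episodes:
        if ep.get("status") == "current":
            ep["status"] = "completed"
    unplayed = [ep for ep in episodes if not ep.get("status")]
    if not unplayed:
        return None
    chosen = rng.choice(unplayed)
    chosen["status"] = "current"
    return chosen
```

In the real function the `episodes` list would come from a scan of episodesDB and the status changes would be written back as item updates.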

You can find the code here

To see the app in action, please visit here

Issues encountered/thoughts:

  • Add a UI page to modify the stations DB. I'll have to work out how to put authorisation on the API call to add new stations.
  • Split the crawler functions into separate Lambda functions, which makes them clean and easy to manage.
  • Add more crawlers. At the moment, this app only crawls playerfm stations.
  • Learnt how to add JS scripts to Hexo. There isn't much information on how to do it out there, so I had to hack around for a while. Basically, I needed to create a new script folder at themes/'my-theme'/source/'td-podcast', chuck all my JS scripts in there, then modify '_partials/scripts.ejs' to reference the source folder. Learnt a bit of EJS as well.
  • Chalice doesn't have a DynamoDB stream trigger, so I gave up halfway and went back to creating the Lambda functions manually.
  • Looking into SAM and CloudFormation to do CI/CD on this.
  • Could turn this into a random YouTube/Twitch video picker. Looking into the YouTube Google API and the Twitch API.

AWS WAF automations

A friend of mine suggested that I should write something about AWS WAF security automations. This is mentioned in the Use AWS WAF to Mitigate OWASP's Top 10 Web Application Vulnerabilities whitepaper, and there are plenty of materials about this solution on the net. So I thought, instead of writing about what it is / how to set it up, let's have some fun DDoSing my own site and actually see how it works.

I'm going to try to break my site with 3 different methods.

1. http flood attack

My weapon of choice is PyFlooder.

After about 5000 requests, the Lambda function kicked in and blocked my access to the site. I can also see my IP has been blocked by the WAF HTTP flood rule.

I then removed the IP from the blocked list and moved on to the next attack.
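The HTTP flood rule is rate-based: count requests per source IP over a window and block anything over a threshold. A toy sketch of that counting logic (not the actual solution's Lambda, which works off CloudFront access logs):

```python
from collections import defaultdict

def find_flooders(request_ips, threshold):
    """Count requests per source IP and return the set of IPs that
    exceeded the threshold within the window the list represents."""
    counts = defaultdict(int)
    for ip in request_ips:
        counts[ip] += 1
    return {ip for ip, n in counts.items() if n > threshold}
```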

2. XSS

Next up is XSS: input a simple <script> tag into the URI and I got a 403 error straight away.

3. Badbot

For this method I used Scrapy. I wrote a short spider script to crawl my site, targeting the honeypot URL.

import scrapy

class mySpider(scrapy.Spider):
    name = 'td-1'
    start_urls = ['https://blog.tdinvoke.net']

    def parse(self, response):
        for url in response.css('a::attr("href")'):
            yield {'url': url.extract()}
        for next_page in response.css('a[rel="nofollow"]::attr("href")'):
            yield response.follow(next_page.extract(), self.parse)

Release the spider!!!!

and got the 403 error as expected.
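For context, the honeypot side of the pipeline (API Gateway -> Lambda -> WAF IP list) roughly amounts to: read the caller's IP from the event and insert it into the WAF IP set. A hedged sketch, assuming an API Gateway proxy event and the classic (pre-WAFv2) API:

```python
def source_ip_as_cidr(event):
    """Pull the caller's IP out of an API Gateway proxy event and turn it
    into the /32 CIDR form that classic WAF IP sets expect."""
    return event["requestContext"]["identity"]["sourceIp"] + "/32"

def block_ip(event, ip_set_id):
    """Rough shape of a badbot honeypot handler: insert the caller's IP
    into a classic WAF IP set so the associated rule starts blocking it."""
    import boto3  # imported here so source_ip_as_cidr stays dependency-free
    waf = boto3.client("waf")
    token = waf.get_change_token()["ChangeToken"]
    waf.update_ip_set(
        IPSetId=ip_set_id,
        ChangeToken=token,
        Updates=[{
            "Action": "INSERT",
            "IPSetDescriptor": {"Type": "IPV4", "Value": source_ip_as_cidr(event)},
        }],
    )
```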

Issues encountered/thoughts:

  • Setting up the bot wasn't as easy as I expected, but I learnt a lot about Scrapy.

  • I accidentally/unknowingly deleted the badbot IP list from the badbot rule. I only found out about the silly mistake by going through the whole pipeline (API Gateway -> Lambda -> WAF IP list -> WAF rule) to troubleshoot the issue.

  • PyFlooder is not compatible with Windows, so I had to spin up an Ubuntu VM to run it.

  • Learnt how to add a file to source for Hexo. Not complicated at all - just chuck the file into the /source folder. Do not use the hexo-generator-robotstxt plugin; I almost broke my site because of it.

  • Overall this was an interesting exercise - breaking is always more fun than building!

TI9 - Follow my fav team with opendota api, lambda and sns

It's Dota season of the year: The International 9, the biggest esports event on the planet. So I thought I should make a project related to this event - a notification function that notifies me about my favourite team's matches.

This function uses the OpenDota API, AWS Lambda, CloudWatch and SNS. Below is a high-level design of the function I put together:

The Lambda function is set to run every hour, triggered by CloudWatch. If the function finds that my favourite team just finished a match, it will SMS me the result. Below is the Lambda Python code and a screenshot of an SMS message.

import os
import requests
import json
from datetime import datetime, timedelta
import boto3

def lambda_handler(event, context):
    uri = 'https://api.opendota.com/api/proMatches'
    response = requests.get(uri)
    json_data = json.loads(response.content)
    team_name = os.environ['MY_TEAM']
    league_name = os.environ['LEAGUE_NAME']

    # keep only recent matches involving my team in the target league
    filtered_json_data = list()
    for j in json_data:
        if j['radiant_name'] is None or j['dire_name'] is None:
            continue
        check_rad_name = team_name in j['radiant_name']
        check_dire_name = team_name in j['dire_name']
        check_tour = league_name in j['league_name']
        current_time = datetime.now()
        duration = j['duration']
        start_time = datetime.fromtimestamp(j['start_time'])
        end_time = start_time + timedelta(seconds=duration)
        elapsed_time = current_time - end_time
        elapsed_time_cal = divmod(elapsed_time.total_seconds(), 60)
        check_time = elapsed_time_cal[0] < 60
        if (check_rad_name or check_dire_name) and check_tour and check_time:
            filtered_json_data.append(j)

    #print(*filtered_json_data, sep="\n")

    for f in filtered_json_data:
        if f['radiant_win']:
            winner = f['radiant_name']
            winner_score = f['radiant_score']
            loser = f['dire_name']
            loser_score = f['dire_score']
            winner_side = "R"
            loser_side = "D"
        else:
            winner = f['dire_name']
            winner_score = f['dire_score']
            loser = f['radiant_name']
            loser_score = f['radiant_score']
            winner_side = "D"
            loser_side = "R"

        if winner in team_name:
            message = "{0}({5}) won against {1}({6})\n{0}: {2} - {1}: {3}\nGame duration {4:.0f} minutes".format(winner, loser, winner_score, loser_score, f['duration']/60, winner_side, loser_side)
        else:
            message = "{1}({6}) lost against {0}({5})\n{0}: {2} - {1}: {3}\nGame duration {4:.0f} minutes".format(winner, loser, winner_score, loser_score, f['duration']/60, winner_side, loser_side)

        client = boto3.client('sns')
        client.publish(
            TopicArn=os.environ['SNS_ARN_TOPIC'],
            Message=message
        )

    return 0

Things could be improved:

  • set up full CI/CD
  • CloudWatch schedule time to only run when matches are happening, not 24/7
  • UI to select favorite team or hook up with Steam account favorite team

I'll come back another day to work on this. Got to go watch the game now…

Letā€™s go Liquid!

Serverless Blog

So this blog is serverless, using a combination of Hexo, S3, GitHub, CodeBuild, Route 53 and CloudFront. My original plan was to build the blog from the ground up with Lambda Chalice, DynamoDB and some JavaScript hacking. But I thought there had to be someone with the same idea somewhere. One Google search later, I found two wonderful guides from hackernoon and greengocloud. Thanks to the guides I was able to spin this up within 4-5 hours. I'm still getting used to Hexo and markdown, but it feels pretty good to have it working.

I was struggling a bit with git - the theme didn't get committed properly. Removing the Git submodule sorted the issue out.

Also, CodeBuild didn't play nice with the default role; I had to give the role full S3 access to the bucket. It's working like a charm now.

PS: This blog uses the Chan theme by denjones