https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl

An end-to-end blueprint architecture for real-time fraud detection (leveraging the graph database Amazon Neptune) that uses Amazon SageMaker and Deep Graph Library (DGL) to construct a heterogeneous graph from tabular data and train a Graph Neural Network (GNN) model to detect fraudulent transactions in the IEEE-CIS dataset.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (7.4%) to scientific vocabulary

Keywords

amplify-js appsync aws aws-cdk dgl documentdb fraud-detection gnn graph-database neptune sagemaker

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: awslabs
License: apache-2.0
Language: TypeScript
Default Branch: main
Homepage: https://awslabs.github.io/realtime-fraud-detection-with-gnn-on-dgl/
Size: 12.6 MB

Statistics

Stars: 216
Watchers: 19
Forks: 39
Open Issues: 21
Releases: 11

Archived

Topics

amplify-js appsync aws aws-cdk dgl documentdb fraud-detection gnn graph-database neptune sagemaker

Created almost 5 years ago · Last pushed almost 2 years ago

Metadata Files

Readme Contributing License Code of conduct

README.md

Real-time Fraud Detection with Graph Neural Network on DGL

This is an end-to-end blueprint architecture for real-time fraud detection. It uses the graph database Amazon Neptune, Amazon SageMaker, and Deep Graph Library (DGL) to construct a heterogeneous graph from tabular data and train a Graph Neural Network (GNN) model to detect fraudulent transactions in the IEEE-CIS Fraud Detection dataset. See the blog post for more detail.

Architecture of solution

This solution consists of the following stacks:

Fraud Detection solution stack
nested model training and deployment stack
nested real-time fraud detection stack
nested transaction dashboard stack

Model training and deployment stack

The model training and deployment pipeline is orchestrated by AWS Step Functions.

Dashboard stack

It creates a React-based web portal that displays the recent fraudulent transactions detected by this solution. This web application is also orchestrated by Amazon CloudFront, AWS Amplify, AWS AppSync, Amazon API Gateway, AWS Step Functions, and Amazon DocumentDB.

How to train model and deploy inference endpoint

After deploying this solution, go to AWS Step Functions in the AWS console, then start the state machine whose name starts with ModelTrainingPipeline.

You can pass the parameters below as input to override the default model training parameters:

```json
{
  "trainingJob": {
    "hyperparameters": {
      "n-hidden": "64",
      "n-epochs": "100",
      "lr": "1e-3"
    },
    "instanceType": "ml.c5.9xlarge",
    "timeoutInSeconds": 10800
  }
}
```
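The same input can also be built and submitted programmatically. The following is a minimal sketch, not part of the solution itself: the helper names `build_training_input` and `start_training` are hypothetical, and it assumes boto3 is installed and AWS credentials are configured when `start_training` is called. It relies only on the Step Functions `list_state_machines` and `start_execution` calls.

```python
import json


def build_training_input(n_hidden=64, n_epochs=100, lr="1e-3",
                         instance_type="ml.c5.9xlarge", timeout_seconds=10800):
    """Build the state machine input described above as a JSON string."""
    payload = {
        "trainingJob": {
            # The documented input passes hyperparameters as strings,
            # so every value is converted with str() here.
            "hyperparameters": {
                "n-hidden": str(n_hidden),
                "n-epochs": str(n_epochs),
                "lr": str(lr),
            },
            "instanceType": instance_type,
            "timeoutInSeconds": timeout_seconds,
        }
    }
    return json.dumps(payload)


def start_training(state_machine_prefix="ModelTrainingPipeline", **overrides):
    """Start the first state machine whose name begins with the prefix.

    boto3 is imported lazily so that build_training_input stays usable
    without AWS dependencies or credentials.
    """
    import boto3

    sfn = boto3.client("stepfunctions")
    paginator = sfn.get_paginator("list_state_machines")
    for page in paginator.paginate():
        for machine in page["stateMachines"]:
            if machine["name"].startswith(state_machine_prefix):
                return sfn.start_execution(
                    stateMachineArn=machine["stateMachineArn"],
                    input=build_training_input(**overrides),
                )
    raise LookupError(f"No state machine named {state_machine_prefix}*")
```

For example, `start_training(n_epochs=50)` would submit the default input with only the epoch count changed.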

How to deploy the solution

Regions

The solution uses the graph database Amazon Neptune for real-time fraud detection and Amazon DocumentDB for the dashboard. Because of the regional availability of those services, the solution can be deployed to the following regions:

US East (N. Virginia): us-east-1
US East (Ohio): us-east-2
US West (Oregon): us-west-2
Canada (Central): ca-central-1
South America (São Paulo): sa-east-1
Europe (Ireland): eu-west-1
Europe (London): eu-west-2
Europe (Paris): eu-west-3
Europe (Frankfurt): eu-central-1
Asia Pacific (Tokyo): ap-northeast-1
Asia Pacific (Seoul): ap-northeast-2
Asia Pacific (Singapore): ap-southeast-1
Asia Pacific (Sydney): ap-southeast-2
Asia Pacific (Mumbai): ap-south-1
China (Beijing): cn-north-1
China (Ningxia): cn-northwest-1

Quick deployment

Region name | Region code | Launch
--- | --- | ---
Global regions (switch to the region above that you want to deploy to) | us-east-1 (default) | Launch
AWS China (Beijing) Region | cn-north-1 | Launch
AWS China (Ningxia) Region | cn-northwest-1 | Launch

See the deployment guide for detailed steps.

Deploy from source

Prerequisites

An AWS account
Configure credentials for the AWS CLI
Install Node.js LTS, version 16.18.0 or later
Install Docker Engine
Install the dependencies of the solution by running `yarn install && npx projen`
Initialize the CDK toolkit stack in your AWS environment by running `yarn cdk-init` (only needed the first time you deploy via AWS CDK)
[Optional] Public hosted zone in Amazon Route 53
Authenticate with the ECR repository below in your AWS partition

```shell
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 763104351884.dkr.ecr.us-east-1.amazonaws.com
```

Run the command below instead if you are deploying to China regions

```shell
aws ecr get-login-password --region cn-northwest-1 | docker login --username AWS --password-stdin 727897471807.dkr.ecr.cn-northwest-1.amazonaws.com.cn
```

Deploy it in a new VPC

The deployment creates a new VPC spanning at least two AZs, along with NAT gateways. The solution is then deployed into the newly created VPC.

```shell
yarn deploy
```

Deploy it into existing VPC

If you want to deploy the solution to the default VPC, use the command below.

```shell
yarn deploy-to-default-vpc
```

Or deploy into an existing VPC by specifying the VPC ID:

```shell
npx cdk deploy -c vpcId=<your vpc id>
```

NOTE: make sure your existing VPC has both public subnets and private subnets with a NAT gateway.

Deploy it with custom Neptune instance class and replica count

By default, the solution deploys a Neptune cluster with instance class db.r5.xlarge and 1 read replica. You can override the instance class and replica count as shown below:

```shell
npx cdk deploy --parameters NeptuneInstaneType=db.r5.4xlarge -c NeptuneReplicaCount=2
```

Deploy it using SageMaker Serverless Inference (experimental)

```shell
npx cdk deploy -c ServerlessInference=true -c ServerlessInferenceConcurrency=50 -c ServerlessInferenceMemorySizeInMB=2048
```

Deploy it with custom domain of dashboard

If you want to use a custom domain to access the dashboard of the solution, use the options below when deploying. NOTE: you must already have created a public hosted zone in Route 53; see Solution prerequisites for details.

```shell
npx cdk deploy -c EnableDashboardCustomDomain=true --parameters DashboardDomain=<the custom domain> --parameters Route53HostedZoneId=<hosted zone id of your domain>
```

Deploy it to China regions

Add the additional context parameter below:

```shell
npx cdk deploy -c TargetPartition=aws-cn
```

NOTE: deploying to China regions also requires the domain parameters below, because the CloudFront distribution must be accessed via a custom domain.

```shell
--parameters DashboardDomain=<the custom domain> --parameters Route53HostedZoneId=<hosted zone id of your domain>
```

How to test

```shell
yarn test
```

Data engineering/scientist

There are Jupyter notebooks that let data engineers and data scientists experiment with feature engineering, model training, and inference endpoint deployment without deploying this solution.

FAQ

What are the benefits of using a graph neural network for fraud detection?

In fraud detection, fraudsters can work collaboratively in groups to hide their abnormal features, but they leave traces in the form of relations. Traditional machine learning models use various features of individual samples. The relations among samples, however, are usually ignored, either because no direct feature can represent them or because a feature has too many unique values to be encoded for a model. For example, an IP address or a physical address can link two accounts, but these addresses usually have too many unique values to be one-hot encoded. Many feature-based models therefore fail to exploit these potential relations.

Graph Neural Network models, on the other hand, benefit directly from the links built among samples once some categorical features of a sample are reconstructed as separate nodes in a graph structure. Through message passing and aggregation, GNN-based models can use the features of samples and also capture the relations among them. Because they capture relations, Graph Neural Networks are better able to detect collaborative fraud than traditional models.
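The idea can be illustrated with a small sketch in plain Python. It is not the solution's DGL model: the rows, the column names (`ip`, `card`, `amount`), and the helper functions are hypothetical, and a simple mean stands in for a learned aggregation. Categorical values become entity nodes, and one round of message passing lets transactions that share an entity influence each other.

```python
from collections import defaultdict

# Hypothetical toy rows; the column names are illustrative, not the IEEE-CIS schema.
transactions = [
    {"id": "t1", "amount": 10.0, "ip": "1.2.3.4", "card": "c1"},
    {"id": "t2", "amount": 30.0, "ip": "1.2.3.4", "card": "c2"},
    {"id": "t3", "amount": 50.0, "ip": "5.6.7.8", "card": "c3"},
]


def build_edges(rows, categorical_columns):
    """Turn each categorical value into an entity node linked to its transactions."""
    entity_to_transactions = defaultdict(list)
    for row in rows:
        for column in categorical_columns:
            entity_to_transactions[(column, row[column])].append(row["id"])
    return entity_to_transactions


def aggregate_once(rows, entity_to_transactions):
    """One round of mean aggregation: transaction -> entity -> transaction."""
    amount = {row["id"]: row["amount"] for row in rows}
    # Step 1: each entity node averages the feature of its transactions.
    entity_message = {
        entity: sum(amount[t] for t in members) / len(members)
        for entity, members in entity_to_transactions.items()
    }
    # Step 2: each transaction averages the messages from its entities.
    incoming = defaultdict(list)
    for entity, members in entity_to_transactions.items():
        for t in members:
            incoming[t].append(entity_message[entity])
    return {t: sum(msgs) / len(msgs) for t, msgs in incoming.items()}


edges = build_edges(transactions, ["ip", "card"])
neighborhood_amount = aggregate_once(transactions, edges)
```

Here `t1` and `t2` share an IP address, so their aggregated values move toward each other, while `t3`, which shares nothing, keeps its own value. No one-hot encoding of the addresses is needed; the relation is carried by the edges.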

Why use a graph database in this solution?

We use a graph database to store the relationships between entities. The graph database provides microsecond-level query performance for retrieving the sub-graph of entities needed for real-time fraud detection inference.
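To make the sub-graph query concrete, here is a minimal in-memory sketch of a k-hop neighborhood lookup. It is only a stand-in for a query against the graph database, not the solution's actual query: the adjacency dictionary, the node names, and the function `k_hop_subgraph` are hypothetical.

```python
from collections import deque


def k_hop_subgraph(adjacency, start, hops):
    """Return the set of nodes reachable from `start` within `hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the requested radius
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


# Hypothetical undirected toy graph: transactions linked to entity nodes.
adjacency = {
    "t_new": ["ip:1.2.3.4", "card:c9"],
    "ip:1.2.3.4": ["t_new", "t1", "t2"],
    "card:c9": ["t_new"],
    "t1": ["ip:1.2.3.4", "card:c1"],
    "t2": ["ip:1.2.3.4"],
    "card:c1": ["t1", "t7"],
    "t7": ["card:c1"],
}
subgraph = k_hop_subgraph(adjacency, "t_new", hops=2)
```

With two hops, the incoming transaction `t_new` pulls in its own entities plus the earlier transactions `t1` and `t2` that share its IP address, which is the neighborhood a GNN would aggregate over at inference time.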

How does this solution differ from Amazon Fraud Detector?

This solution is based on a Graph Neural Network and graph-structured data, while Amazon Fraud Detector uses time-series models and takes advantage of Amazon’s own data on fraudsters.

In addition, this solution serves as a reference architecture for graph analytics and real-time graph machine learning scenarios. Users can take this solution as a base and adapt it to their own environments.

How does this solution differ from Amazon Neptune ML?

This solution has a few components and features that the current Amazon Neptune ML does not, including but not limited to:

An end-to-end data processing pipeline that shows what a real-world data pipeline could look like. This helps industrial customers quickly get hands-on with a complete GNN-based system.
A real-time online inference subsystem, whereas Neptune ML supports offline batch inference mode.
A demo GUI that shows how the solution can solve real-world business problems, whereas Neptune ML primarily uses graph database queries to show results.

While this solution provides an overall architecture for end-to-end real-time inference, Amazon Neptune ML has been optimized for scalability and system-level performance, e.g. query latency. Therefore, once Amazon Neptune ML supports real-time inference, it could be integrated into this solution as the main training and inference subsystem for customers who require better scalability and low latency.

Deployment failure on creating `CloudWatch LogGroup`

The deployment might fail while creating the CloudWatch log group, with an error message like the one below:

Cannot enable logging. Policy document length breaking Cloudwatch Logs Constraints, either < 1 or > 5120 (Service: AmazonApiGatewayV2; Status Code: 400; Error Code: BadRequestException; Request ID: xxx-yyy-zzz; Proxy: null)

This happens because CloudWatch Logs resource policies are limited to 5120 characters. To remediate it, merge or remove unneeded policy statements, then update the CloudWatch Logs resource policy so that it fits within the limit.
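To find which resource policy is close to or over the limit, the policy lengths can be listed. The following is a minimal sketch under stated assumptions: `policy_report` and `fetch_policies` are hypothetical helper names, the sample data is made up, and `fetch_policies` assumes boto3 and AWS credentials are available. It uses only the CloudWatch Logs `describe_resource_policies` call.

```python
import json

POLICY_LIMIT = 5120  # character limit quoted in the error message above


def policy_report(policies, limit=POLICY_LIMIT):
    """Return (name, length, over_limit) for each policy, longest first."""
    report = [
        (p["policyName"], len(p["policyDocument"]), len(p["policyDocument"]) > limit)
        for p in policies
    ]
    return sorted(report, key=lambda item: item[1], reverse=True)


def fetch_policies():
    """Fetch the account's resource policies (needs boto3 and AWS credentials)."""
    import boto3

    return boto3.client("logs").describe_resource_policies()["resourcePolicies"]


# Hypothetical sample in the same shape as the API response entries.
sample = [
    {"policyName": "small", "policyDocument": json.dumps({"Version": "2012-10-17"})},
    {"policyName": "large", "policyDocument": "x" * 6000},
]
report = policy_report(sample)
```

Against a real account, `policy_report(fetch_policies())` would list the longest policies first, which is where merging or removing statements has the most effect.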

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Owner

Name: Amazon Web Services - Labs
Login: awslabs
Kind: organization
Location: Seattle, WA

Website: http://amazon.com/aws/
Repositories: 914
Profile: https://github.com/awslabs

AWS Labs

GitHub Events

Total

Watch event: 10
Fork event: 3

Last Year

Watch event: 10
Fork event: 3

Issues and Pull Requests

Last synced: almost 2 years ago

All Time

Total issues: 22
Total pull requests: 78
Average time to close issues: 26 days
Average time to close pull requests: 5 days
Total issue authors: 3
Total pull request authors: 6
Average comments per issue: 0.23
Average comments per pull request: 0.44
Merged pull requests: 44
Bot issues: 0
Bot pull requests: 34

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

zxkane (20)
jl-hk (1)
welkinwalker (1)
pixix (1)

Pull Request Authors

dependabot[bot] (62)
zxkane (33)
chenhaiyun (8)
pixix (1)
zhjwy9343 (1)
gutovsky (1)
yanbasic (1)

Top Labels

Issue Labels

feature-request (15) needs-triage (15) bug (7) documentation (1) guidance (1)

Pull Request Labels

dependencies (62)

Dependencies

frontend/package.json npm

@types/jest ^26.0.23 development
@types/node ^14 development
@types/react ^17.0.11 development
@types/react-dom ^17.0.8 development
@types/react-loader-spinner ^3.1.3 development
@types/react-router-dom ^5.1.7 development
@typescript-eslint/eslint-plugin ^5 development
@typescript-eslint/parser ^5 development
eslint ^8 development
eslint-import-resolver-node ^0.3.6 development
eslint-import-resolver-typescript ^2.5.0 development
eslint-plugin-import ^2.25.4 development
eslint-plugin-react-hooks next development
eventsource ^2.0.2 development
jest ^27 development
jest-junit ^13 development
json-schema ^0.4.0 development
npm-check-updates ^12 development
projen ^0.17.30 development
ts-jest ^27 development
typescript ^4.0.3 development
@material-ui/core ^4.11.4
@material-ui/icons ^4.11.2
@material-ui/lab ^5.0.0-alpha.25
@testing-library/jest-dom ^5.14.1
@testing-library/react ^11.2.7
@testing-library/user-event ^13.1.9
apexcharts ^3.27.1
aws-amplify ^4.3.12
aws-appsync ^4.0.3
aws-sdk ^2.1058.0
axios ^0.24.0
best-queue ^2.0.1
graphql-tag ^2.12.4
i18next ^20.3.1
i18next-browser-languagedetector ^6.1.1
i18next-http-backend ^1.2.6
moment ^2.29.4
react ^17.0.2
react-apexcharts ^1.3.9
react-dom ^17.0.2
react-i18next ^11.11.0
react-loader-spinner ^4.0.0
react-router-dom ^5.2.0
react-scripts ^4.0.0
sweetalert2 ^10.16.9
web-vitals ^1.1.2

frontend/yarn.lock npm

2173 dependencies

package.json npm

@aws-cdk/cloud-assembly-schema ^2 development
@aws-cdk/cx-api ^2 development
@types/bson ^4.2.0 development
@types/jest ^26.0.24 development
@types/mongodb ^3.6.20 development
@types/node ^14 development
@typescript-eslint/eslint-plugin ^5 development
@typescript-eslint/parser ^5 development
aws-cdk ^2.0.0 development
constructs ^10.0.5 development
esbuild ^0.14.49 development
eslint ^8 development
eslint-import-resolver-node ^0.3.6 development
eslint-import-resolver-typescript ^2.7.1 development
eslint-plugin-import ^2.26.0 development
jest ^27 development
jest-junit ^13 development
json-schema ^0.4.0 development
npm-check-updates ^12 development
projen ^0.58.32 development
ts-jest ^27 development
ts-node ^9 development
typescript ~4.6.0 development
@aws-cdk/aws-apigatewayv2-alpha 2.0.0-alpha.11
@aws-cdk/aws-apigatewayv2-integrations-alpha 2.0.0-alpha.11
@aws-cdk/aws-appsync-alpha 2.0.0-alpha.11
@aws-cdk/aws-glue-alpha 2.0.0-alpha.11
@aws-cdk/aws-lambda-python-alpha 2.0.0-alpha.11
@aws-cdk/aws-neptune-alpha 2.0.0-alpha.11
@aws-sdk/client-cloudformation ^3.30.0
@aws-sdk/client-glue ^3.30.0
@aws-sdk/client-lambda ^3.30.0
@aws-sdk/client-secrets-manager ^3.30.0
@aws-sdk/client-serverlessapplicationrepository ^3.30.0
@aws-sdk/client-sts ^3.30.0
@types/aws-lambda ^8.10.83
aws-cdk-lib ^2.0.0
cdk-bootstrapless-synthesizer ^2.1.1
cfn-custom-resource ^5.0.14
constructs ^10.0.5
mongodb ^3.7.0
mongodb-client-encryption ^1.2.6
object-hash ^2.2.0
sync-fetch ^0.3.0

yarn.lock npm

856 dependencies

src/lambda.d/inference/layer/requirements.txt pypi

numpy ==1.22.0
pandas ==1.0.0
sagemaker ==2.24.1

src/lambda.d/layer.d/awswrangler/requirements.txt pypi

awswrangler ==2.7.0

src/sagemaker/FD_SL_DGL/code/requirements.txt pypi

dgl ==0.6.

.github/workflows/bandit-check.yml actions

actions/checkout v2 composite
tj-actions/bandit v4.1 composite

.github/workflows/build.yml actions

actions/checkout v3 composite
actions/download-artifact v3 composite
actions/setup-node v3 composite
actions/upload-artifact v3 composite

.github/workflows/cfn-nag.yml actions

actions/checkout v2 composite
actions/setup-node v2 composite
stelligent/cfn_nag master composite

.github/workflows/codeql-analysis.yml actions

actions/checkout v2 composite
github/codeql-action/analyze v1 composite
github/codeql-action/autobuild v1 composite
github/codeql-action/init v1 composite

.github/workflows/ghpage.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
peaceiris/actions-gh-pages v3 composite

.github/workflows/lint-pr.yml actions

amannn/action-semantic-pull-request v3.2.6 composite

.github/workflows/pull-request-lint.yml actions

amannn/action-semantic-pull-request v5.0.2 composite

.github/workflows/upgrade.yml actions

actions/checkout v3 composite
actions/download-artifact v3 composite
actions/setup-node v3 composite
actions/upload-artifact v3 composite
peter-evans/create-pull-request v4 composite

src/container.d/load-graph-data/Dockerfile docker

alpine latest build
public.ecr.aws/lambda/python ${VERSION} build

src/lambda.d/repackage-model/Dockerfile docker

public.ecr.aws/lambda/python 3.8 build

src/sagemaker/FD_SL_DGL/gnn_fraud_detection_dgl/Dockerfile docker

$IMAGE_REPO/pytorch-training 1.6.0-cpu-py36-ubuntu16.04 build
