https://github.com/carv-ics-forth/ghnavs



Repository

Basic Info
  • Host: GitHub
  • Owner: CARV-ICS-FORTH
  • License: bsd-3-clause
  • Default Branch: main
  • Size: 229 KB
Statistics
  • Stars: 1
  • Watchers: 6
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago

README.md

GHNAVS gem5 simulator

GHNAVS stands for Gem5 - HBM - NoC - ARM - V1 - Simulator

This is a patch for gem5 that enhances support for modeling ARM64 CPUs and ARM CMN interconnects.

I. Features

  1. A multi-layered NoC implementation in which each traffic class (VNET) uses a separate NoC. This models ARM CMN interconnects [7] more accurately than the existing gem5 NoC model (garnet2.0 [1]).

    • The number of NoC layers can be adjusted from the command line with a single parameter (--noc_layers).
  2. Changes to the Ruby CPU Sequencer [2, 3] that avoid Sequencer port blocking caused by aliased requests. In the default Ruby CPU Sequencer, a subsequent request (read or write) to a cache line that already has an outstanding request is called an aliased request. An aliased request causes the CPU master port to block, which also stalls later requests that target a different cache line. This unrealistic behavior results in very poor performance (because few requests are outstanding at once) for workloads with sequential memory access (e.g., STREAM [4]).

    • This new feature (called sequencer_port_block_bypass) can be turned on or off with a single configuration parameter.
  3. Processor models:

    • ARM Cortex-A76 CPU and
    • R-CPU, which models an ARM-Neoverse-V1 CPU core with SVE support
  4. NoC topology: 4x4 mesh with 16 cores, 16 SLC slices, and 8 HBM2 controllers

  5. HBM2 memory model

  6. Very detailed NoC statistics

  7. MOESI_CMP_directory CC protocol

II. Instructions

A. How to build

Software Requirements

  • g++ version 7
  • scons version 3.01
  • python version 2.7

Additionally, you are advised to install pydot and the graphviz-related libraries in order to visualize the generated systems and cross-check that they are as expected. (When those libraries are installed, gem5 produces config.dot.pdf and config.system.ruby.dot.pdf files, which contain a high-level visual representation of the components of the simulated system.)

An LXC Debian Buster container can be built to provide all of these requirements.

The provided patch is based on public-gem5 source code [6], commit: 904784fb1e15f0c090fb1f1e5c5057e74b0b4ea8. In order to apply the patch use the following:

    git clone https://gem5.googlesource.com/public/gem5
    cd gem5
    git checkout 904784fb1e15f0c090fb1f1e5c5057e74b0b4ea8
    git apply *.patch

Example build command:

CXX=g++-7 scons NUMBER_BITS_PER_SET=128 PROTOCOL=MOESI_CMP_directory build/ARM/gem5.opt -j7 --force-lto

B. Important configuration parameters

The following parameters can be adjusted to more accurate values, depending on the system you want to set up.

| Param | File | Value | Description |
| --- | --- | --- | --- |
| max_outstanding_requests | src/mem/ruby/system/Sequencer.py | 128 | The number of outstanding requests for the Ruby CPU Sequencer |
| sequencer_port_block_bypass | src/mem/ruby/system/Sequencer.py | True | Use True to enable the FORTH Sequencer (False for the default Ruby Sequencer) |
| --sys-clock | command-line | 2GHz | System clock |
| --ruby-clock | command-line | 2GHz | Ruby subsystem clock |
| --cpu-clock | command-line | 2GHz | CPU clock |
| --topology | command-line | Mesh_EPI_quadrant_p1 | MeshXY with multiple NoC layers support |
| --mesh-rows | command-line | 4 | Number of mesh rows (set to 4 for a 4x4 mesh) |
| --link-width-bits | command-line | 576 | Covers 8 bytes for control and 64 bytes for data (64 bits + 512 bits) |
| --noc_layers | command-line | 3 | The number of layers depends on the number of VNETs of the CC protocol in use. Use 3 for the MOESI_CMP_directory protocol. Use 6 if you want to enable the multi-VNETs NoC feature |
| sve_vl_se | src/arch/arm/ArmISA.py | 1/2/4/8/16 | SVE length in quadwords (quadword = 128 bits). This only works for SE mode |
| sve_vl | src/arch/arm/ArmSystem.py | 1/2/4/8/16 | SVE length in quadwords (quadword = 128 bits). This only works for FS mode |
| --num-dirs | command-line | 4,8,... | On Ruby systems, the number of memory channels is directly related to the number of cache directories. Note that the mem-channels parameter is ignored when modeling Ruby systems |
| --mem-channels | command-line | 1,2,4,8,... | On classic memory systems, the number of memory channels is adjusted by mem-channels |
| --ports | command-line | 4 | For Ruby systems: CC transitions per cycle |
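The arithmetic behind two of the table values can be checked with a short sketch. The helper names below are hypothetical and are not part of gem5 or of this patch. They only restate the table: a link carries 8 control bytes plus 64 data bytes, and an SVE length is given in 128-bit quadwords.

```python
# Sketch only: helper names are hypothetical, not gem5 or patch APIs.

QUADWORD_BITS = 128  # one SVE quadword, as stated in the table


def link_width_bits(control_bytes=8, data_bytes=64):
    """Link width in bits for a link carrying control plus data bytes."""
    return (control_bytes + data_bytes) * 8


def sve_vector_bits(quadwords):
    """Convert an SVE length in quadwords (1/2/4/8/16) to bits."""
    if quadwords not in (1, 2, 4, 8, 16):
        raise ValueError("SVE length must be 1, 2, 4, 8 or 16 quadwords")
    return quadwords * QUADWORD_BITS


if __name__ == "__main__":
    print("--link-width-bits=%d" % link_width_bits())
    for qw in (1, 2, 4, 8, 16):
        print("%2d quadwords = %4d bits" % (qw, sve_vector_bits(qw)))
```

With the default arguments, the first helper reproduces the 576 used for --link-width-bits.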

C. How to run

1. Syscall emulation (SE) case (se.py)

An example command for SE mode is the following:

    ./build/ARM/gem5.opt \
        --outdir=m5out/outdir_1 \
        configs/example/se.py \
        --cpu-type=R_CPU \
        --arm-iset=aarch64 \
        --num-cpus=4 \
        --num-dirs=4 \
        --caches \
        --l2cache \
        --num-l2caches=4 \
        --l1i_size=64kB \
        --l1d_size=64kB \
        --l2_size=1MB \
        --l1i_assoc=4 \
        --l1d_assoc=4 \
        --l2_assoc=8 \
        --mem-type=DDR4_2400_8x8 \
        --mem-size=2GB \
        --sys-clock=2GHz \
        --cpu-clock=2GHz \
        --ruby-clock=2GHz \
        --ruby \
        --topology=Mesh_EPI \
        --mesh-rows=2 \
        --network=garnet2.0 \
        --link-width-bits=576 \
        --noc_layers=3 \
        --vcs-per-vnet=4 \
        -c sve_stream_copy_se.exe

Note: One can use -c "$executable1;$executable2;$executable3;$executable4" to launch the same or different applications on each core, since multi-thread support is limited in SE mode. Quote the value so that the shell does not treat the semicolons as command separators.
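The semicolon-separated -c value described in the note can also be assembled by a small launcher script. The following is a minimal sketch in standalone Python 3, separate from the Python 2.7 that gem5 itself needs at this commit. The function name is hypothetical, and the check that the list holds either one executable or one per core is an assumption of this sketch, not a rule stated in this README.

```python
import shlex


def se_workload_args(executables, num_cpus):
    """Build the --num-cpus and -c arguments for an se.py run.

    Assumption of this sketch: pass either a single executable or
    exactly one executable per simulated core.
    """
    if len(executables) not in (1, num_cpus):
        raise ValueError("expected 1 or %d executables" % num_cpus)
    return ["--num-cpus=%d" % num_cpus, "-c", ";".join(executables)]


if __name__ == "__main__":
    args = se_workload_args(["sve_stream_copy_se.exe"] * 4, num_cpus=4)
    # shlex.quote protects the semicolons if the line is pasted into a shell.
    print(" ".join(shlex.quote(a) for a in args))
```

Passing the returned list directly to subprocess (without a shell) avoids the quoting question altogether.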

2. FullSystem (FS) mode (fs.py)

For Full System mode we use fs.py simulation script, as it supports the gem5 Ruby System [2], and therefore NoC modeling.

  • Getting a checkpoint: We always get a checkpoint using a classic system setup. The number of CPU cores, SVE lengths, and memory size are important as changing those requires a new checkpoint. Additionally mounting or adding files to the image file requires a new checkpoint to be taken.

    ./build/ARM/gem5.opt \
        --outdir=m5out/checkpoint_outdir \
        configs/example/fs.py \
        --cpu-type=AtomicSimpleCPU \
        --cpu-clock=2GHz \
        --num-cpus=16 \
        --kernel=vmlinux_bsc \
        --disk=aarch64-ubuntu-armcl-sve.img \
        --machine-type=VExpress_GEM5_V1 \
        --mem-type=DDR4_2400_8x8 \
        --script=configs/boot/hack_back_ckpt.rcS \
        --mem-size=2GB

  • Restoring from a checkpoint: In order to restore from a checkpoint, the checkpoint directory (named cpt.*) should be copied (or sym-linked), inside the outdir of the new simulation (outdir_2 in the following examples).
  • If you don't want to restore from a checkpoint, remove -r 1 from the command line.
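The sym-linking step for restoring from a checkpoint can be scripted. This is a minimal sketch using only the Python standard library. The function name and the checkpoint name cpt.1000 in the demo are made up for illustration; only the cpt.* naming pattern comes from the text above.

```python
import glob
import os
import tempfile


def link_checkpoints(checkpoint_outdir, new_outdir):
    """Symlink every cpt.* entry from checkpoint_outdir into new_outdir."""
    os.makedirs(new_outdir, exist_ok=True)
    linked = []
    for src in sorted(glob.glob(os.path.join(checkpoint_outdir, "cpt.*"))):
        dst = os.path.join(new_outdir, os.path.basename(src))
        if not os.path.lexists(dst):  # leave existing links or copies alone
            os.symlink(os.path.abspath(src), dst)
        linked.append(dst)
    return linked


if __name__ == "__main__":
    # Demo in a throwaway directory; "cpt.1000" is a made-up checkpoint name.
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "checkpoint_outdir")
        os.makedirs(os.path.join(old, "cpt.1000"))
        print(link_checkpoints(old, os.path.join(root, "outdir_2")))
```

For a real run, the two arguments would be the outdir used when taking the checkpoint and the outdir of the new simulation.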

2-a) Simple system with a 2x2 Mesh NoC

8 core - Ruby NoC System - MOESI_CMP_directory

2-b) 4x4 Mesh: 16 cores - 16 SLCs - 8 HBM2 controllers (Topology: Mesh_EPI_quadrant_p1)

```
CXX=g++-7 scons NUMBER_BITS_PER_SET=128 PROTOCOL=MOESI_CMP_directory build/ARM/gem5.opt -j7 --force-lto

./build/ARM/gem5.opt \
    --listener-mode=off \
    --outdir=m5out/outdir_2 \
    configs/example/fs.py \
    -r 1 \
    --cpu-type=R_CPU \
    --restore-with-cpu=R_CPU \
    --arm-iset=aarch64 \
    --kernel=vmlinux_bsc \
    --disk=aarch64-ubuntu-armcl-sve.img \
    --machine-type=VExpress_GEM5_V1 \
    --script=configs/boot/bootscript1.rcS \
    --num-cpus=16 \
    --num-dirs=8 \
    --num-l2caches=16 \
    --caches \
    --l2cache \
    --l1i_size=64kB \
    --l1d_size=1MB \
    --l2_size=2MB \
    --l1i_assoc=4 \
    --l1d_assoc=4 \
    --l2_assoc=8 \
    --mem-type=HBM2_2000_4H_1x128 \
    --mem-size=2GB \
    --sys-clock=2GHz \
    --cpu-clock=2GHz \
    --ruby-clock=2GHz \
    --ruby \
    --topology=Mesh_EPI_quadrant_p1 \
    --mesh-rows=4 \
    --network=garnet2.0 \
    --link-width-bits=576 \
    --noc_layers=3 \
    --vcs-per-vnet=4
```

EPI Quadrant Topology

III. Advanced features

A. Setup with SLC approximation

MOESI_CMP_directory only supports L1 and L2 caches. However, a 3-level cache hierarchy can be approximated by using the L2 cache controllers to stand in for the SLC slices: the L1D size is increased to match the combined L1+L2 size of the modeled platform, and the L2 size is increased to match the size of the platform SLC. The cache latencies are adjusted accordingly.

The provided patch already adjusts the L1/L2 cache latencies of the model. You can then adjust the L1D size, the L2 size (which acts as the SLC), and the number of L2 controllers (--num-l2caches), which approximates the number of SLC slices.

B. Multi-VNETs feature (Double NoC bandwidth)

This release supports multiple VNETs per request/response class, which means that each VNET can have 2 physical links. More specifically, a gem5 system can be instantiated with a total of 6 NoC layers, instead of 3, while using MOESI_CMP_directory. Use --noc_layers=6, and make sure that use_offset_vnets = True in the file configs/network/Network.py.
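The choice between 3 and 6 layers can be captured in a small helper for launcher scripts. This is a sketch under stated assumptions: the function and dictionary names are hypothetical, and the only protocol entry is the one this README gives (3 VNETs for MOESI_CMP_directory, doubled when the multi-VNETs feature is used).

```python
# Sketch only: names are hypothetical, not gem5 or patch APIs.
# VNET counts per protocol; only MOESI_CMP_directory is stated in this README.
VNETS_PER_PROTOCOL = {"MOESI_CMP_directory": 3}


def noc_layer_flags(protocol, multi_vnet=False):
    """Return the --noc_layers flag for a protocol.

    With multi_vnet=True each VNET gets 2 physical links, so the layer
    count doubles. use_offset_vnets must still be set to True by hand in
    configs/network/Network.py; this helper does not edit that file.
    """
    vnets = VNETS_PER_PROTOCOL[protocol]
    layers = vnets * 2 if multi_vnet else vnets
    return ["--noc_layers=%d" % layers]


if __name__ == "__main__":
    print(noc_layer_flags("MOESI_CMP_directory"))
    print(noc_layer_flags("MOESI_CMP_directory", multi_vnet=True))
```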

C. Detailed NoC Latencies (and CC message types)

One can print detailed NoC queue and network latencies per VNET and per source-to-destination pair. To enable this feature, use --debug-flags=RubyNetConnections. With this debug flag, the output contains two types of messages.

The first type reports the NoC latencies. For example:

    7207675622500: system.ruby.network: vnet:2, [L1Cache_Controller 3]->[L2Cache_Controller 5], NI[3->21], Rtr[46->37], hops[3], queue_AvgMinMaxMed[8.06/8/53/8.00], net_AvgMinMaxMed[9.86/9/91/9.00], flits:29393

The message is split into the following fields:

  1. gem5 simulation current tick
  2. VNET number
  3. Source Controller -> Destination Controller (Reminder: L1 represents L1+L2 and L2 represents SLC)
  4. Source NI -> Destination NI
  5. Source Router -> Destination Router
  6. Number of hops
  7. Queueing latencies (Avg, Min, Max, Median)
  8. Network latencies (Avg, Min, Max, Median)
  9. Number of packets (flits) sent from source to destination during the last ROI
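For post-processing, the nine fields listed above can be extracted with a regular expression. The following is a minimal sketch written against the single example line shown in this README. The function name and dictionary keys are hypothetical, and real output may differ in spacing, so treat the pattern as a starting point.

```python
import re

SAMPLE_LATENCY_LINE = (
    "7207675622500: system.ruby.network: vnet:2, "
    "[L1Cache_Controller 3]->[L2Cache_Controller 5], NI[3->21], "
    "Rtr[46->37], hops[3], queue_AvgMinMaxMed[8.06/8/53/8.00], "
    "net_AvgMinMaxMed[9.86/9/91/9.00], flits:29393"
)

LATENCY_RE = re.compile(
    r"(?P<tick>\d+): system\.ruby\.network: vnet:(?P<vnet>\d+),\s*"
    r"\[(?P<src>\w+ \d+)\]->\[(?P<dst>\w+ \d+)\],\s*"
    r"NI\[(?P<src_ni>\d+)->(?P<dst_ni>\d+)\],\s*"
    r"Rtr\[(?P<src_rtr>\d+)->(?P<dst_rtr>\d+)\],\s*"
    r"hops\[(?P<hops>\d+)\],\s*"
    r"queue_AvgMinMaxMed\[(?P<queue>[\d./]+)\],\s*"
    r"net_AvgMinMaxMed\[(?P<net>[\d./]+)\],\s*"
    r"flits:(?P<flits>\d+)"
)


def _stats(text):
    """Split 'avg/min/max/med' into a dict of floats."""
    avg, low, high, med = (float(x) for x in text.split("/"))
    return {"avg": avg, "min": low, "max": high, "med": med}


def parse_latency_line(line):
    """Return the fields of a latency message, or None if it does not match."""
    m = LATENCY_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "tick": int(d["tick"]),
        "vnet": int(d["vnet"]),
        "src": d["src"],
        "dst": d["dst"],
        "ni": (int(d["src_ni"]), int(d["dst_ni"])),
        "rtr": (int(d["src_rtr"]), int(d["dst_rtr"])),
        "hops": int(d["hops"]),
        "queue": _stats(d["queue"]),
        "net": _stats(d["net"]),
        "flits": int(d["flits"]),
    }


if __name__ == "__main__":
    print(parse_latency_line(SAMPLE_LATENCY_LINE))
```

Lines that do not match (for example, the CC message type lines) return None, so the function can be applied to every line of a log file and the None results skipped.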

The second type of message describes which CC message types each Source Controller sends to each Destination Controller, per VNET and ROI. An example printout is the following:

    7207675622500: system.ruby.network: vnet:1, [L2Cache_Controller 2]->[Directory_Controller 0], REQ: [GETX : 19691][GETS : 39488][PUTX : 19723][WRITEBACK_DIRTY_DATA : 19723]
    7207675622500: system.ruby.network: vnet:2, [L1Cache_Controller 3]->[L2Cache_Controller 0], RESP: [UNBLOCK : 19713][UNBLOCK_EXCLUSIVE : 9781]

Those types of messages contain the following fields:

  1. gem5 simulation current tick
  2. VNET number
  3. Source Controller -> Destination Controller (Reminder: L1 represents L1+L2 and L2 represents SLC)
  4. REQ / RESP (Request, Response Type)
  5. A sequence of: [CC Message type : Number of messages during last ROI]
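The CC message type lines can be parsed in a similar way. This sketch is written against the two example lines shown in this README. The function name and dictionary keys are hypothetical, and the exact spacing of real output is an assumption.

```python
import re

SAMPLE_CC_LINE = (
    "7207675622500: system.ruby.network: vnet:1, "
    "[L2Cache_Controller 2]->[Directory_Controller 0], REQ: "
    "[GETX : 19691][GETS : 39488][PUTX : 19723]"
    "[WRITEBACK_DIRTY_DATA : 19723]"
)

CC_HEAD_RE = re.compile(
    r"(?P<tick>\d+): system\.ruby\.network: vnet:(?P<vnet>\d+),\s*"
    r"\[(?P<src>\w+ \d+)\]->\[(?P<dst>\w+ \d+)\],\s*"
    r"(?P<kind>REQ|RESP):\s*(?P<body>.*)"
)
CC_COUNT_RE = re.compile(r"\[(\w+)\s*:\s*(\d+)\]")


def parse_cc_line(line):
    """Return the fields of a CC message line, or None if it does not match."""
    m = CC_HEAD_RE.search(line)
    if m is None:
        return None
    # Each "[TYPE : N]" group becomes one entry in the counts dictionary.
    counts = {name: int(n) for name, n in CC_COUNT_RE.findall(m.group("body"))}
    return {
        "tick": int(m.group("tick")),
        "vnet": int(m.group("vnet")),
        "src": m.group("src"),
        "dst": m.group("dst"),
        "kind": m.group("kind"),
        "counts": counts,
    }


if __name__ == "__main__":
    print(parse_cc_line(SAMPLE_CC_LINE))
```

Summing the counts dictionaries over all source and destination pairs gives the per-VNET message mix for an ROI.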

IV. References

[1] http://old.gem5.org/Garnet2.0.html

[2] http://old.gem5.org/Ruby.html

[3] http://old.gem5.org/Coherence-Protocol-IndependentMemoryComponents.html

[4] https://www.cs.virginia.edu/stream/

[5] http://old.gem5.org/Running_gem5.html

[6] https://gem5.googlesource.com/public/gem5

[7] Arm Neoverse CMN‑650 Coherent Mesh Network TRM

V. Acknowledgements

We thankfully acknowledge support for this research from the European High Performance Computing Joint Undertaking (EuroHPC JU) under Framework Partnership Agreement No 800928 (European Processor Initiative) and Specific Grant Agreement No 101036168 (EPI-SGA2). The EuroHPC JU receives support from the European Union’s Horizon 2020 research and innovation programme and from Croatia, France, Germany, Greece, Italy, Netherlands, Portugal, Spain, Sweden, and Switzerland. National contributions from the involved state members (including the Greek General Secretariat for Research and Innovation) match the EuroHPC funding.

Owner

  • Name: Computer Architecture and VLSI Systems (CARV) Laboratory
  • Login: CARV-ICS-FORTH
  • Kind: organization
  • Location: Heraklion, Greece
