# Gym_GPU/rl_game/get_up/config/ppo_cfg.yaml
# rl_games PPO (a2c_continuous) training configuration for an Isaac Lab task.
params:
  seed: 42

  algo:
    name: a2c_continuous  # rl_games continuous-action A2C/PPO algorithm

  model:
    name: continuous_a2c_logstd  # Gaussian policy with learned log-std

  network:
    name: actor_critic
    separate: False  # actor and critic share the MLP trunk
    space:
      continuous:
        # "None" is read by rl_games as "no activation" on the heads.
        mu_activation: None
        sigma_activation: None
        mu_init:
          name: default
        sigma_init:
          name: const_initializer
          val: 1.2  # initial (log-)sigma constant
        fixed_sigma: False  # sigma is state-dependent, not a global parameter
    mlp:
      units: [512, 256, 128]
      activation: relu
      d2rl: False
      initializer:
        name: default

  config:
    name: T1_Walking
    env_name: rlgym  # Isaac Lab wrapper
    multi_gpu: False
    ppo: True
    mixed_precision: True
    normalize_input: True
    normalize_value: True
    value_bootstrap: True  # bootstrap value on episode timeouts
    num_actors: 8192  # number of robots trained in parallel
    reward_shaper:
      scale_value: 1.0
    normalize_advantage: True
    gamma: 0.98   # discount factor
    tau: 0.95     # GAE lambda
    # NOTE(review): PyYAML parses "5e-4" (no decimal point) as a string, not a
    # float; rl_games coerces it, but "5.0e-4" would be unambiguous YAML.
    learning_rate: 5e-4
    lr_schedule: adaptive        # LR adapted to keep KL near kl_threshold
    kl_threshold: 0.008
    score_to_win: 20000
    max_epochs: 500
    save_best_after: 50
    save_frequency: 100
    grad_norm: 0.5               # gradient clipping norm (with truncate_grads)
    entropy_coef: 0.008
    truncate_grads: True
    bounds_loss_coef: 0.001      # penalty for actions outside bounds
    e_clip: 0.2                  # PPO clip range
    horizon_length: 128          # steps collected per actor per epoch
    minibatch_size: 8192
    mini_epochs: 4
    critic_coef: 1
    clip_value: True