Brussart Paul-emile / hands-on-rl / Commits
Commit 25c3cb07, authored 2 years ago by Brussart Paul-emile

Adding a2c_sb3_cartpole.py using wandb to test the model

parent 8b00c644
Showing 1 changed file: a2c_sb3_cartpole.py (+38 additions, −13 deletions)
```python
import gym
from stable_baselines3 import A2C
from stable_baselines3.common.monitor import Monitor
from stable_baselines3.common.vec_env import DummyVecEnv
import wandb
from wandb.integration.sb3 import WandbCallback

config = {
    "policy_type": "MlpPolicy",
    "total_timesteps": 20000,
    "env_name": "CartPole-v1",
}

run = wandb.init(
    project="cartpole",
    config=config,
    sync_tensorboard=True,
    monitor_gym=True,
    save_code=True,
)

def make_env():
    env = gym.make(config["env_name"])
    env = Monitor(env)  # record stats such as returns
    return env

# Wrap the environment in a DummyVecEnv (SB3 expects a vectorized env).
# Note: do not wrap the result in DummyVecEnv a second time - the env
# factories passed to DummyVecEnv must return plain (non-vectorized) envs.
env = DummyVecEnv([make_env])

# Initialize the A2C model
model = A2C(config["policy_type"], env, verbose=1)

# Train the model for 20 000 steps, streaming metrics to wandb
model.learn(
    total_timesteps=config["total_timesteps"],
    callback=WandbCallback(
        gradient_save_freq=100,
        model_save_path=f"models/{run.id}",
        verbose=2,
    ),
)

# Save the trained model
model.save("a2c_sb3_cartpole")

# Test the trained model
obs = env.reset()
for i in range(1000):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    env.render()

run.finish()
```