Commit 25efb5d (parent 6de923f): add doc content

File tree: 6 files changed (+132, −0)

docs/basic_usage.md

+27
@@ -0,0 +1,27 @@
There are a few different ways you can use Mighty:

### Running Meta-Methods

This is the easiest part. Several algorithms and meta-methods are already implemented in Mighty, and you should be able to run them directly on any environment of your choosing. The most difficult part will likely be configuring each method, since some require specific keywords or are only compatible with a given base algorithm, so read up on whatever method you choose. You also need to know whether your method is a runner or a meta module, as each has its own configuration keyword. An example for using a specific runner is:

```bash
python mighty/run_mighty.py runner=es popsize=5 iterations=100 es=evosax.CMA_ES 'search_targets=["learning_rate", "_batch_size"]' rl_train_agent=true
```

This will use the evosax CMA-ES implementation with population size 5 to optimize the learning rate and batch size over 100 iterations. Meta modules, on the other hand, use a different keyword:

```bash
python mighty/run_mighty.py '+algorithm_kwargs.meta_methods=[mighty.mighty_meta.PrioritizedLevelReplay]'
```

This meta_methods list collects all meta modules in the order they should be applied. So while you can't use multiple runners, you can stack several layers of meta modules.
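For example, you could combine two modules like this (the second module path is purely illustrative, not an actual Mighty class):

```bash
python mighty/run_mighty.py '+algorithm_kwargs.meta_methods=[mighty.mighty_meta.PrioritizedLevelReplay, my_package.MyCurriculumModule]'
```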

### Implementing New Components

Of course, Mighty currently only supports a limited number of methods. This is where you come in! It should be fairly easy to add your own. We recommend following these steps:
1. What are you adding? A runner, meta module, exploration policy, buffer, update variation or model? Make sure you choose the best level at which to implement your idea.
2. Implement your method, using the abstract class and existing methods as templates (see the sketch below).
3. Plug your class into your Mighty config file. This works by replacing the default value with the import path of your custom class.
4. Run the algorithm.

Since you pass the location from which to import your new class, you do not need to work within the Mighty codebase directly and can keep your changes separate. This way you can add several new methods to Mighty without copying the code.
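To make step 2 concrete, here is a rough sketch of a custom meta module. The hook lists mirror the ones described in the package structure docs, but the metrics keys are assumptions, so check the abstract class in `mighty.mighty_meta` for the exact interface:

```python
class EpisodeRewardLogger:
    """Hypothetical meta module that logs the reward after each episode."""

    def __init__(self) -> None:
        # Hooks Mighty calls at different points in the training loop.
        self.pre_step_methods = []
        self.post_step_methods = []
        self.pre_update_methods = []
        self.post_update_methods = []
        self.pre_episode_methods = []
        self.post_episode_methods = [self.log_reward]

    def log_reward(self, metrics: dict) -> None:
        # "episode_reward" is an assumed key; inspect the metrics
        # dictionary in your Mighty version for the actual contents.
        print(f"Episode reward: {metrics.get('episode_reward')}")
```

You would then point the config at its import path, e.g. `+algorithm_kwargs.meta_methods=[my_package.EpisodeRewardLogger]`.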

### Combining Different Ideas

You can combine different approaches in Mighty by varying the runner, exploration, buffer, update class and network architecture, and by adding an arbitrary number of meta modules.
At this point, configuration can become very difficult. We recommend that you take a close look at how to use different Hydra configuration files to configure each of your methods separately, so that you can keep track of everything.
Depending on what exactly you want to do, it can make sense to keep a separate configuration file for each variation. This can be confusing, especially if you haven't worked with Hydra before, so we recommend you take the time to focus on configuration when attempting combinations of several methods.
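For example, Hydra lets you keep one configuration file per variation and select it at launch time via its standard `--config-name` flag (the config names here are hypothetical):

```bash
python mighty/run_mighty.py --config-name=dqn_plr
python mighty/run_mighty.py --config-name=dqn_plr_es
```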

docs/methods/algorithms.md

+4
@@ -0,0 +1,4 @@
- Algorithms generally
- Exploration policies
- Buffers
- Update classes

docs/methods/architectures.md

+4
@@ -0,0 +1,4 @@
- How models are implemented (feature extractor -> rest)
- Existing models
- Changing structure
- When to add new models

docs/methods/inner_loops.md

+4
@@ -0,0 +1,4 @@
- General meta module idea
- What info they get
- When they can act
- Combining modules

docs/methods/outer_loops.md

+3
@@ -0,0 +1,3 @@
- When to use runners
- What they have access to
- Examples?

docs/package_structure.md

+90
@@ -0,0 +1,90 @@
Mighty is designed to be highly modular, enabling access to the RL loop at different levels. This means it's not designed to be the absolute fastest way to run RL, but the most convenient one for applying different sorts of RL, MetaRL and AutoRL methods. As such, there are a few things you should know about the structure of Mighty.

### For Multiple Inner Runs: Mighty Runners

Mighty uses runner classes to control the outer training loop. In the simplest case, a runner will just directly call the agent's train and evaluation functions without any changes:

```python
def run(self) -> Tuple[Dict, Dict]:
    train_results = self.train(self.num_steps)
    eval_results = self.evaluate()
    return train_results, eval_results
```

This will result in a standard RL agent training run. Of course, we can at this point also run agents multiple times, make changes to their setup (hyperparameters, weights, environments) and integrate learning on this meta-level.
A still fairly simple example is our ESRunner for outer loops with Evolutionary Strategies:

```python
def run(self) -> Tuple[Dict, Dict]:
    es_state = self.es.initialize(self.rng)
    for _ in range(self.iterations):
        # Sample a new population of candidate configurations
        rng_ask, _ = jax.random.split(self.rng, 2)
        x, es_state = self.es.ask(rng_ask, es_state)
        eval_rewards = []

        for individual in x:
            # Optionally write a slice of the genome into the network weights
            if self.search_params:
                self.apply_parameters(individual[: self.total_n_params])
                individual = individual[self.total_n_params :]

            # Apply the remaining genome entries as hyperparameters
            for i, target in enumerate(self.search_targets):
                if target == "parameters":
                    continue
                new_value = np.asarray(individual[i]).item()
                if target in ["_batch_size", "n_units"]:
                    new_value = max(0, int(new_value))
                setattr(self.agent, target, new_value)

            if self.train_agent:
                self.train(self.num_steps_per_iteration)

            eval_results = self.evaluate()
            eval_rewards.append(eval_results["mean_eval_reward"])

        # Update the ES with the population's fitness
        fitness = self.fit_shaper.apply(x, jnp.array(eval_rewards))
        es_state = self.es.tell(x, fitness, es_state)

    eval_results = self.evaluate()
    return {"step": self.iterations}, eval_results
```

Here we can change all sorts of things about the agent, train in between or only evaluate, and use the ES to generate fresh inputs. Runner classes are designed with these multiple evaluations of RL tasks in mind, i.e. they will usually train multiple agents, reset their policies completely or otherwise start over at some point.

### For In-The-Loop Methods: Mighty Meta Modules

Not all Meta- or AutoRL methods operate in an outer loop, however. For the ones that configure training while it is still ongoing, we use Mighty Meta Modules.
These are classes that maintain lists of function calls to make at different points in training:

```python
def __init__(self) -> None:
    """Meta module init.

    :return:
    """
    self.pre_step_methods = []
    self.post_step_methods = []
    self.pre_update_methods = []
    self.post_update_methods = []
    self.pre_episode_methods = []
    self.post_episode_methods = []
```

This gives meta modules a lot of flexibility in deciding when to act on training. Additionally, each of these function calls is given a "metrics" dictionary. This dictionary contains most, if not all, relevant information about training progress, e.g.:
- the last transitions
- the last losses, errors and predictions
- policy, Q- and value-networks
- hyperparameters

This means meta modules can use everything from the current timestep to agent predictions.
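As an illustration, a pre-update hook could read this dictionary and react to it. This is a hypothetical sketch; the key names ("losses", "hyperparameters") are assumptions, so check the actual dictionary contents in your Mighty version:

```python
def adapt_lr(self, metrics: dict) -> None:
    """Hypothetical pre-update hook: halve the learning rate
    whenever the most recent loss more than doubles."""
    losses = metrics.get("losses", [])
    if len(losses) >= 2 and losses[-1] > 2 * losses[-2]:
        # "hyperparameters" is an assumed key for illustration.
        metrics["hyperparameters"]["learning_rate"] *= 0.5
```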

### Algorithm Components: Mighty Exploration, Buffers and Updates

The Mighty algorithms themselves also have modules which can easily be switched out: exploration policies, buffers and update classes.
Exploration policies and buffers furthermore have access to the same metrics dictionary as meta modules, meaning you can get creative about what they do with this information.
The way they are used in the RL loop is fixed, however, so these are a bit more streamlined than the completely free meta modules.
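As a sketch of what a custom exploration policy could do with that information, here is a standalone epsilon-greedy variant that anneals epsilon using the step count from the metrics dictionary. The class and key names are illustrative assumptions, not Mighty's actual API:

```python
import numpy as np

class DecayingEpsilonGreedy:
    """Hypothetical exploration module: epsilon-greedy with a linear
    decay schedule driven by the shared metrics dictionary."""

    def __init__(self, eps_start: float = 1.0, eps_min: float = 0.05,
                 decay_steps: int = 50_000):
        self.eps_start = eps_start
        self.eps_min = eps_min
        self.decay_steps = decay_steps

    def explore(self, q_values: np.ndarray, metrics: dict) -> int:
        # "step" is an assumed metrics key for the global timestep.
        frac = min(metrics.get("step", 0) / self.decay_steps, 1.0)
        epsilon = self.eps_start + frac * (self.eps_min - self.eps_start)
        if np.random.random() < epsilon:
            return int(np.random.randint(len(q_values)))  # explore
        return int(np.argmax(q_values))  # exploit
```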

### Inside the Agent: Mighty Models

The parts of the agent loop beyond exploration, buffers and updates are harder to alter in Mighty, since Mighty is primarily focused on meta-methods.
You can control the network architecture of your agent fairly easily, however.
There are two principal avenues for this:
1. You can use one of the pre-defined Mighty Models and configure it to use a different network architecture in the config. We use torch internally, which means you can allocate torch.nn layers and activations in different parts of these networks to form a custom architecture.
2. If you also want to customize what exactly the network predicts or add things like frozen weights, you probably want to implement your own Mighty Model (see the sketch below). These always contain a 'feature_extractor' as a base and can vary beyond that.
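Here is a rough sketch of what such a custom model could look like, with dueling value and advantage heads on top of the required feature extractor. The base class, constructor arguments and registration mechanism are assumptions for illustration; only the `feature_extractor` convention comes from the docs above:

```python
import torch
import torch.nn as nn

class DuelingQModel(nn.Module):
    """Hypothetical custom Mighty Model: a shared feature extractor
    followed by separate value and advantage heads."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # Every Mighty Model builds on a feature extractor base.
        self.feature_extractor = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.value_head = nn.Linear(hidden, 1)
        self.advantage_head = nn.Linear(hidden, n_actions)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        features = self.feature_extractor(obs)
        value = self.value_head(features)
        advantage = self.advantage_head(features)
        # Combine into Q-values with mean-centered advantages.
        return value + advantage - advantage.mean(dim=-1, keepdim=True)
```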
