Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: lgen operator #204

Merged
merged 5 commits into from
Jul 30, 2024
Merged

feat: lgen operator #204

merged 5 commits into from
Jul 30, 2024

Conversation

samika98
Copy link
Contributor

@samika98 samika98 commented Jul 17, 2024

Features in this pr :

  1. Generate crds with vars specific to load generation
  2. Add reconciliation controller for load generation specific resources
  3. Change operator yaml to give custom api access for generation load generation specific crds

Tests :

Test 1 : Create a load generation and apply it.

  1. Create a network (similar to basic.yaml) in the docs - network.yaml
  2. Create load generation yaml for configuring load generation
  # load-generator.yaml
apiVersion: "keramik.3box.io/v1alpha1"
kind: LoadGenerator
metadata:
  name: load-gen
  namespace: keramik-small
spec:
  scenario: "model-load-sync"
  runTime: 1
  image: "samika98/runner:final-1"
  imagePullPolicy: "IfNotPresent"
  throttleRequests: 100
  tasks: 10

Result : Success - Was able to generate load successfully against a ceramic node. Model got indexed on the ceramic peer.

@samika98 samika98 marked this pull request as ready for review July 17, 2024 22:42
@samika98 samika98 changed the title Feat/lgen operator feat: lgen operator Jul 17, 2024
Copy link
Contributor

@dav1do dav1do left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly questions, am happy to approve if needed but was leaving the review to the folks more familiar with the k8s requirements.

},
EnvVar {
name: "RUST_LOG".to_owned(),
value: Some("info,keramik_runner=trace".to_owned()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really want to default to trace here? I don't think we actually have tracing yet, but that will probably cause a lot of logs/memory consumption 💸

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just for future iterations. Cause we need tracing one less thing to do I guess when we enable it. We just need to figure out what's causing high memory consumption in tracing.
For now this wont cause a memory consumption increase

operator/src/lgen/spec.rs Outdated Show resolved Hide resolved
runner/src/main.rs Outdated Show resolved Hide resolved
}

/// Start a controller for the LoadGenerator CRD.
pub async fn run() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you see as the future of the simulation crd? will that likely go away? seems like there is a lot of duplication here, which is fine for now but we may want to think about how to share some things if we're going to keep both around (and/or add other variations)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see load generation completely separate from simulations.
Simulations will be short lived, more flexible, produce burst traffic and offer way more flexibility and coverage in terms of use cases it covers.
Load generation is for steady non bursty load on the network.
I believe both can be used together as well. Have a steady load over the network, on top of that load we can run simulations to test out specific use-cases.

operator/src/lgen/job.rs Show resolved Hide resolved
metadata: Some(ObjectMeta {
labels: Some(BTreeMap::from_iter(vec![(
"name".to_owned(),
"load-gen-job".to_owned(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use the LOAD_GENERATOR_JOB_NAME const for name in this file as well?

..Default::default()
}),
spec: Some(PodSpec {
hostname: Some("job".to_owned()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super familiar with the hostname/subdomain settings here.. do we want something more specific?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming is hard. We can probably be more specific here, load-gen-job sound better?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes uses the hostname and subdomain fields to create a fully qualified domain name (FQDN) for the pod. The FQDN will be of the form hostname.subdomain.namespace.svc.cluster.local
I feel given the subdomain, this should be fine. Any thoughts?

@samika98 samika98 requested review from smrz2001 and dav1do July 25, 2024 19:38
@@ -87,7 +87,7 @@ apiVersion: apps/v1
kind: Deployment
metadata:
name: keramik-operator
namespace: default
namespace: keramik
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this was intentionally default. IIUC, the operator doesn't itself run in the keramik namespace, though @nathanielc might have a better idea if it matters much.

@samika98 samika98 requested a review from smrz2001 July 26, 2024 20:26
Copy link
Collaborator

@smrz2001 smrz2001 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only minor comments from my side. LGTM!

job::{job_spec, JobConfig, JobImageConfig},
spec::{LoadGenerator, LoadGeneratorState},
},
simulation::controller::monitoring_ready,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be moved somewhere common? Maybe it's ok but feels odd to pull in simulation-related code in lgen.

pub mod job;
/// Spec module for creating load generator specs.
#[cfg(feature = "controller")]
pub mod spec;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want to also add these couple of lines at the end? Very minor, just for consistency with the other mods:

#[cfg(feature = "controller")]
pub use controller::run;

@@ -38,7 +38,8 @@ async fn main() -> Result<()> {
Command::Daemon => {
tokio::join!(
keramik_operator::network::run(),
keramik_operator::simulation::run()
keramik_operator::simulation::run(),
keramik_operator::lgen::controller::run(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if you also export run, you'll be able to use the same syntax as the other modes.

@samika98 samika98 merged commit bff0702 into feat/lgen-runner Jul 30, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants