Chaos Engineering release notes
The release notes describe recent changes to Harness Chaos Engineering.
- Progressive deployment: Harness deploys changes to Harness SaaS clusters on a progressive basis. This means the features described in these release notes may not be immediately available in your cluster. To identify the cluster that hosts your account, go to your Account Overview page in Harness. In the new UI, go to Account Settings, Account Details, General, Account Details, and then Platform Service Versions.
- Security advisories: Harness publishes security advisories for every release. Go to the Harness Trust Center to request access to the security advisories.
- More release notes: Go to Harness Release Notes to explore all Harness release notes, including module, delegate, Self-Managed Enterprise Edition, and FirstGen release notes.
January 2024
Version 1.30.0
New features and enhancements
-
Appropriate environment variables are added at relevant places to ensure that the self-managed platform (SMP) can be used with feature flags (FF). (CHAOS-3865)
-
The SSH chaos experiment now supports an extended termination grace period, allowing for longer execution of abort scripts. (CHAOS-3748)
-
This release adds wildcard support for all entities in ChaosGuard conditions. (CHAOS-3254)
Fixed issues
- Chaos hub icons were not visible when the hub name contained the '/' character. This has been fixed by preventing users from creating a hub whose name contains the '/' character. (CHAOS-3753)
Version 1.29.0
New features and enhancements
- Improves the error messages and logs returned to the client in the API to save chaos experiments. (CHAOS-3607)
Fixed issues
-
The Linux chaos infrastructure (LCI) installer wasn't executing the script with sudo privileges, which resulted in the "Failed to install linux-chaos-infrastructure" error. This issue is now resolved. (CHAOS-3724)
-
Deselecting the Show active infra option displayed only the inactive infrastructures, whereas it should have displayed all infrastructures. This issue is now resolved. (CHAOS-3717)
-
The LCI process would get killed due to a lack of memory (OOM) when a high amount of memory was specified during a memory stress fault. This issue is now resolved so that the likelihood of OOM kills during limited memory availability is reduced. (CHAOS-3469)
Version 1.28.1
New features and enhancements
-
Adds optimisation to utilise memory efficiently, reduce latency, and enhance server performance. (CHAOS-3581)
-
Linux infrastructure is automatically versioned with the help of the API. Previously, the versions were hardcoded for every release. (CHAOS-3580)
-
Adds a condition to the experiment such that a resilience probe can't be added more than once in a single fault within an experiment. The same resilience probe can be used in another fault within the same experiment, though. (CHAOS-3520)
-
Adds a generic audit function that is used to generate all audit trails, thereby reducing redundancy. This generic function is customized based on the type of audit (Chaos experiment, Gameday, Chaos infrastructure, and so on). (CHAOS-3484)
- With this release, the Linux chaos infrastructure binary uses static linking instead of dynamic linking. This removes any dependency on the OS built-in programs including glibc. (CHAOS-3334)
- Enhanced the performance of the API (GetExperiment) that was used to fetch details of Kubernetes and Linux experiments. An optional field is added that fetches the average resilience score. (CHAOS-3218)
-
Adds support for bulk-disable (disable enabled CRON schedules selected by user) and bulk-enable (enable disabled CRON schedules selected by user) CRON-scheduled experiments, with a limit of 20 experiments for every operation. (CHAOS-3174)
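The limit of 20 experiments per bulk-enable or bulk-disable operation (CHAOS-3174) means a larger selection has to be split across several operations. A minimal sketch of that batching, assuming experiment IDs are plain strings; the helper name batch_experiment_ids and the sample IDs are hypothetical and not part of the Harness API:

```python
from typing import Iterator, List

# Limit stated in the release note for each bulk enable/disable operation.
BULK_LIMIT = 20


def batch_experiment_ids(ids: List[str], limit: int = BULK_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of `ids`, each holding at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(ids), limit):
        yield ids[start:start + limit]


if __name__ == "__main__":
    selected = [f"exp-{n}" for n in range(45)]  # hypothetical experiment IDs
    for batch in batch_experiment_ids(selected):
        # Each batch would be submitted as one bulk-enable or bulk-disable operation.
        print(len(batch))
```

Each yielded batch stays within the documented limit, so the caller can submit the batches one after another.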
Fixed issues
-
After selecting an experiment, when a user tried to select an active infrastructure for the experiment, the page would throw an error. This has been fixed. (CHAOS-3585)
-
Editing a Linux experiment to change the infrastructure would not update the infrastructure. This has been fixed. (CHAOS-3536)
-
When multiple faults are executed in parallel, faults that transitioned into an "errored" state would not reflect in the logs, whereas faults in success state reflected in the logs with an "errored" status. This has been fixed. (CHAOS-3363)
December 2023
Version 1.27.1
New features and enhancements
-
Adds a filter to the listWorkflow API so that data can be filtered based on whether it is CRON-enabled or not. (CHAOS-3424)
-
While selecting a chaos infrastructure to create an experiment, users can list the active infrastructures by clicking the checkbox Show active only. (CHAOS-3350)
-
Metrics for the Dynatrace probe (Metrics Selector and Entity Selector) have been made compulsory. This ensures that the required properties are always passed while creating a Dynatrace probe. (CHAOS-3330)
-
An experiment can be created against inactive chaos infrastructure(s). This was done to complement the preparatory actions in environments that require agents to be scaled down (K8s) or stopped (Linux) except during the chaos execution window. (CHAOS-3241)
- This release deprecates the ACCESS_KEY invalidation after a chaos infrastructure is successfully connected. Users can use the same manifest to connect to the infrastructures. (CHAOS-3164)
- Adds UI support to search conditions for selection while creating a ChaosGuard rule. (CHAOS-2982)
- Adds support to incorporate secretRef and configMapRef with the tunables for VMware faults. (CHAOS-2750)
- Adds support for encoding metrics queries in Dynatrace probes. These metrics are constructed and executed using the metrics (or data) explorer before the API call [POST]. (CHAOS-2852)
Fixed issues
-
After an experiment timed out, the execution nodes would remain in the running state. This has been fixed. (CHAOS-3094)
- Adding a probe without the description key broke the addProbe API. The API is now fixed to accept a blank string if no value is provided in the description or the key is missing in the API request. (CHAOS-3224)
- For probe failures, the probe success iteration ratio would show up twice in the experiment logs. This has been fixed. (CHAOS-3421)
November 2023
Version 1.26.0
New features and enhancements
-
Renamed three keys in the Dynatrace probe:
- dynatrace_endpoint is now endpoint
- dynatrace_metrics_selector is now metrics_selector and is present inside metrics
- dynatrace_entity_selector is now entity_selector and is present inside metrics. (CHAOS-3177)
-
When an SSH experiment is executed inside a VM using the SSH credentials, the experiment uses parameters to allow the chaos logic scripts to receive dynamic inputs. (CHAOS-3049)
-
Field token name lengths have been reduced by modifying the Dynatrace probe schema for Kubernetes. (CHAOS-3043)
-
Linux infrastructure version is displayed on the landing page that lists all the Linux infrastructure. (CHAOS-2845)
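The Dynatrace probe key renames in CHAOS-3177 can be pictured as a small migration over a probe's properties. The sketch below assumes the properties are held in a plain dictionary; migrate_dynatrace_keys is a hypothetical helper, and the real probe schema may contain other fields that are not shown here.

```python
from typing import Any, Dict

# Old key names that the release note says were renamed.
_RENAMED = {
    "dynatrace_endpoint",
    "dynatrace_metrics_selector",
    "dynatrace_entity_selector",
}


def migrate_dynatrace_keys(old: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `old` with the three renamed keys mapped to their new names."""
    # Carry over every key that was not renamed.
    new = {key: value for key, value in old.items() if key not in _RENAMED}
    if "dynatrace_endpoint" in old:
        new["endpoint"] = old["dynatrace_endpoint"]
    # Both selectors now sit inside a nested "metrics" object.
    metrics = dict(new.get("metrics", {}))
    if "dynatrace_metrics_selector" in old:
        metrics["metrics_selector"] = old["dynatrace_metrics_selector"]
    if "dynatrace_entity_selector" in old:
        metrics["entity_selector"] = old["dynatrace_entity_selector"]
    if metrics:
        new["metrics"] = metrics
    return new
```

Keys outside the three renamed ones pass through unchanged, so the helper is safe to run on properties that were already migrated.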
Fixed issues
-
While editing probes, the name validation check resulted in the error "probe name not available". This has been fixed. (CHAOS-3216)
-
When a user created an experiment by selecting the predefined experiments, the dropdown menu showed the experiment type instead of ChaosHubs. This has been fixed. (CHAOS-3193)
-
HTTP Linux OnChaos probe usage halted the fault execution because the probe finished executing before the fault thread could begin the evaluation of probes, which resulted in a deadlock. This issue has been fixed. (CHAOS-3180)
-
Erroneous timestamps were displayed in the UI, which led to wrong values and headings being shown in the UI. This has been fixed. (CHAOS-3178)
-
Previously configured SLO probe property fields appeared empty when the user tried to edit them. This has been fixed. (CHAOS-3176)
-
The node selector attribute in ChaosEngine added two fields, namely key and value, instead of key:value. This has been fixed. (CHAOS-3173)
-
With changes in the image registry, the LIB_IMAGE environment variable was being overwritten by chaos-go-runner. This has been fixed. (CHAOS-3172)
-
Probes whose execution time exceeded 180 seconds would error out with N/A status, regardless of probeTimeout settings. This has been fixed. (CHAOS-3169)
-
When a gameday was deleted, its name would not show up in the audit event. This has been fixed. (CHAOS-3158)
-
Probe details, such as verdict, status and mode were not retrieved for the correct runID and notifyID. This has been fixed. (CHAOS-3144)
-
An experiment would keep running in the pipeline even if it transitioned to an error status. This has been fixed. (CHAOS-1985)
Version 1.25.5
New features and enhancements
-
Added a "Run now" button to the three-dot menu on the experiment dashboard. You can run cron experiments manually now. (CHAOS-3110)
-
Until an experiment is saved, the "run experiment" or "enable cron" buttons are hidden. (CHAOS-3099)
-
A cron enable/disable button is added to the dashboard table menu so that you can enable or disable the cron experiments from the dashboard itself. (CHAOS-3027)
-
A new field, "last_executed_at", is added to the chaos experiments. This new field is updated whenever an event is received during the course of an experiment run. (CHAOS-3018)
-
While creating an experiment, if a YAML file is uploaded that can't be parsed, a warning is displayed on the user interface. (CHAOS-3016)
-
You can now sort experiments based on the "recently executed" and "last modified" filters in ascending and descending order. (CHAOS-2895)
-
Dynatrace probes are now available on the Linux chaos infrastructure. (CHAOS-2879)
-
Custom arguments/flags are added to the command for VMware stress and network faults. (CHAOS-2846)
-
The pod memory hog chaos experiment provides distinction between experiments that failed (as an expected result) versus experiments that actually failed. (CHAOS-2515)
-
Cron and non-cron experiment types can be identified manually or using the tooltip by hovering over individual run boxes in resilience probes. (CHAOS-3010)
-
Added a new Cloud Foundry fault, "CF app route unbind". (CHAOS-2912)
-
If a previous CRON experiment is not running or is in a queued state, such a CRON experiment can be executed on-demand. This is done by clicking Run Experiment button on the vertical three-dot menu on the experiment page. (CHAOS-2896)
-
The pipeline manifest will be stored in the Harness repository. (CHAOS-2040)
Fixed issues
-
The sandbox API was being called when the corresponding flag was off. This has been fixed. (CHAOS-3126)
-
SLO probe properties in the fault selection and probe details in the runs view UI were not being displayed. This has been fixed. (CHAOS-3119)
-
Added support for SKIP_SSL_VERIFY in readiness probes for the execution plane components. (CHAOS-3115)
-
Mongo queries resulted in fetching results for deleted gamedays. This has been fixed by adding a field "is_removed" to the Mongo queries. (CHAOS-3091)
-
Linux chaos infrastructure did not provide JSON log output. This issue has been fixed. (CHAOS-2989)
-
The probe mode would be pre-selected as SOT by default. Now, it will be empty, and no value will be present by default. (CHAOS-2455)
-
CRIO runtime would give an unknown service runtime.v1alpha2.RuntimeService error. This issue has been fixed. (CHAOS-3019)
-
When a user who does not have view access in one of the scopes (Project/Organization/Account) tried to run an experiment, they encountered a permission error. This issue has been fixed. (CHAOS-2810)
-
When no tunables were selected for a fault, the Learn more link did not redirect to a destination. This issue has been fixed. (CHAOS-2973)
Version 1.24.5
New features and enhancements
-
This release adds default limits for the number of chaos probes that can be created when a chaos infrastructure is created by adding a chaos probe resource limit per account. (CHAOS-2880)
-
This release adds a new log viewer, which includes:
- New tab for helper pod logs.
- Support for grouping and minimising logs.
- Colors for various log levels.
- Logs can be downloaded, copied, and scrolled over.
- Position retention when logs are manually scrolled while streaming.
- Parsing arguments. (CHAOS-2809)
-
This release adds a validation check to the template name and entry point in the YAML to match at least one template name with the entry point name. The check ensures that the visual builder shows the faults correctly. (CHAOS-2933)
-
This release adds support for chaos dashboards in SMP. (CHAOS-3100)
-
This release adds support for source and destination ports, isolating the ports as well as excluding them for VMware network faults. (CHAOS-2892)
-
This release adds support for source and destination ports, isolating the ports as well as excluding them for Linux network faults. (CHAOS-2873)
-
This release allows you to run multiple SOT or EOT probes in parallel in Kubernetes. (CHAOS-2863)
-
This release supports min, max and mean values as parameters in the Dynatrace probe. (CHAOS-2853)
-
This release adds the usage of sandbox network namespace for the CRI-O runtime, thereby enhancing the network faults. (CHAOS-2825)
-
The format of logs has changed from JSON to level:"" timestamp:"" out: "" args:"". This improves the readability of logs. (CHAOS-2807)
-
This release adds the probe iteration success count to the probe description. (CHAOS-2797)
-
This release introduces a new fault: pod API block. This fault blocks the API based on path filtering. (CHAOS-2722)
-
This release supports adding labels from the Advanced Tune section in the UI. (CHAOS-2612)
-
This release adds an enhanced generic script injector framework that offers greater flexibility and control over your chaos experiments. It helps add chaos to target hosts using custom scripts that are passed using a configmap. These scripts are executed using SSH credentials securely referenced within the configmap. (CHAOS-2625)
-
This release introduces a new fault: Cloud Foundry app stop. This fault stops a Cloud Foundry app for a fixed time period and later starts it. (CHAOS-2619)
-
This release introduces a new fault: pod network rate limit. This fault determines the resilience of a Kubernetes pod under limited network bandwidth. (CHAOS-2478)
-
This release reflects changes made in the chaos infrastructure images and the experiment images in their respective manifests when an image registry setting is changed. (CHAOS-2881)
-
This release adds Linux stress and network fault custom arguments/flags that can be used with the stress-ng (stressNGFlags input) and tc (netemArgs input) commands, respectively. (CHAOS-2832)
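The new log line layout mentioned above (CHAOS-2807), level:"" timestamp:"" out: "" args:"", is a sequence of key:"value" pairs. A minimal sketch of reading such a line back into fields, assuming values are double-quoted and that any embedded quote is backslash-escaped (an assumption, since the release note does not specify escaping); parse_log_line and the sample values are hypothetical:

```python
import re
from typing import Dict

# Matches key:"value" pairs. Whitespace after the colon is tolerated because
# the release note shows `out: ""` with a space.
_FIELD = re.compile(r'(\w+):\s*"((?:[^"\\]|\\.)*)"')


def parse_log_line(line: str) -> Dict[str, str]:
    """Split one log line into a dictionary of field name to raw value."""
    return {key: value for key, value in _FIELD.findall(line)}


if __name__ == "__main__":
    sample = 'level:"info" timestamp:"2023-11-01T10:00:00Z" out: "stress started" args:"--cpu 2"'
    print(parse_log_line(sample))
```

Colons inside a quoted value, such as those in a timestamp, stay part of that value because the pattern consumes each quoted string as a whole.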
Early access features
-
This release introduces a new fault: Linux network rate limit. This fault slows down network connectivity on a Linux machine by limiting the number of network packets processed during a time period. (CHAOS-2495)
-
This release optimises the Kube API calls by allowing the Linux IFS to use Redis for caching. (CHAOS-2119)
-
The tag filter in the query that fetches Linux experiments was removed so that Linux experiments can be edited. Previously, the Linux experiments could not be edited. (CHAOS-2827)
-
Once an experiment was pushed to the chaos hub, every fault was displayed twice in the CSV file. This has been fixed. (CHAOS-2971)
Fixed issues
-
Attempting to delete a GameDay resulted in an internal server error. This has been fixed. (CHAOS-2975)
-
The cron button on the right sidebar could not be updated in real time. It has been fixed so that the button can be toggled while updating the cron schedule. (CHAOS-2904)
-
Memory consumption fluctuated when the Linux memory stress fault was in action. This has been fixed. (CHAOS-2806)
-
If an experiment was stuck in the queued state for more than 2 hours, it would remain so indefinitely. It was fixed so that the experiment run times out if it is in the queued state for more than 2 hours. (CHAOS-2843)
-
Executing parallel faults resulted in write conflicts. This has been fixed by adding helper pod names as annotations and patching these names to the chaos result, thereby preventing the write conflict. (CHAOS-2834)
-
The reports of chaos experiment runs were missing details such as experiment run ID, experiment end time, and chaos injection duration. The issue was fixed to reflect these details. (CHAOS-2830)
-
Clicking the copy button on the infrastructure page led to rendering the details of the infrastructure. This has been fixed. (CHAOS-2791)
-
The probe name in the URL field broke the probe configuration tab. This has been fixed by adding the URL search parameters to the URL. (CHAOS-2821)
-
Clicking the Chaos Studio tab navigation would reset the states of the header and sidebar and hide some buttons. It was fixed so that the states are not reset and all buttons are visible. (CHAOS-2837)
October 2023
Version 1.23.5
New features and enhancements
-
Added support for the execution of pod-delete fault against workloads which are not managed by the standard native-controllers such as deployment, statefulset and daemonset. With this change, this fault can be executed on pods managed by custom controllers. (CHAOS-2798)
-
Added support for enabling and disabling schedules for cron experiments. This can be found in the right-side nav bar. (CHAOS-2731)
-
Enhanced Network Chaos faults (loss/latency/corruption/duplication) to support excluding specific source and destination ports from the network fault; that is, traffic to the defined ports is not impacted by the chaos injection. (CHAOS-2712)
-
Enhanced service kill experiments on Google Kubernetes Engine (now uses the gcloud ssh function to carry out the kill operations instead of deploying a helper pod on the targeted node). Also added support for containerd runtime. (CHAOS-2649)
-
Added support for specifying securityContext for chaos experiment related resources via user interface under advanced configuration. As part of supporting OCP4.11+ we have also stopped appending default security context attributes runAsUser & runAsGroup into the experiment/infrastructure manifest, and instead given the users the ability to add them optionally via the UI. (CHAOS-2614)
-
Added support for the <, >, <=, and >= operators as part of the comparator in the HTTP probe via the user interface. (CHAOS-2611)
-
Added a download button in the Logs Tab allowing users to download the logs for the node in ".log" format for further debugging/reporting purposes. (CHAOS-2462)
-
Added support for conditional logging of probe evaluation results for each iteration in the Continuous and onChaos modes via a debug field added to the probe RunProperties. (CHAOS-1515)
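The relational operators added to the HTTP probe comparator (CHAOS-2611) amount to comparing an observed value against an expected one. The sketch below illustrates only that idea and is not the probe's implementation; the function name compare, its parameter names, and the use of numeric values are assumptions.

```python
import operator
from typing import Callable, Dict

# The four operators named in the release note, mapped to Python comparisons.
_COMPARATORS: Dict[str, Callable[[float, float], bool]] = {
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def compare(actual: float, criteria: str, expected: float) -> bool:
    """Return True when `actual <criteria> expected` holds."""
    if criteria not in _COMPARATORS:
        raise ValueError(f"unsupported criteria: {criteria!r}")
    return _COMPARATORS[criteria](actual, expected)


if __name__ == "__main__":
    # Hypothetical check: treat any status code below 400 as a pass.
    print(compare(200, "<", 400))
```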
Early access features
- Resilience Probes: This feature is currently behind a feature flag named CHAOS_PROBES_ENABLED.
- Adding support for TLS and Authorization for HTTP and PROM probes. (CHAOS-2743)
- Fixed an issue where SLO Probes were showing Source & Command on the probe details screen. (CHAOS-2715)
- Fixed an issue where EvaluationTimeout was showing up for all types of Resilience probes. Now, it is only available for the SLO probe. (CHAOS-2710)
- Fixed an issue where edit/delete buttons were enabled for disabled resilience probes. (CHAOS-2701)
Fixed issues
-
Fixed an issue where after editing an experiment via YAML Editor, users were unable to save the experiment. (CHAOS-2780)
-
Fixed an issue where revert-chaos was not working properly for VMware stress-based faults. (CHAOS-2777)
-
Fixed RBAC issue with create GameDay button on the landing page of GameDay. (CHAOS-2692)
-
Added a fix to display the appropriate user information upon performing chaos experiment operations when the user has been accorded permissions at the account level instead of at the project level. (CHAOS-1585)
-
Fixed an issue in VMware experiments where aborting an experiment was not updating the chaos result properly. This has been fixed by adding a wait for the result update before terminating the experiment for the abort. (CHAOS-2655)
-
Fixed an issue where ImagePullSecrets were not getting propagated to helper pods. (CHAOS-2608)
Version 1.22.1
New features and enhancements
-
Experiment Run and Experiment Report have been enhanced to show more details for better auditing: (CHAOS-2606)
- Added probe details along with description of failures, number of probes passed/failed/not-executed.
- Added tunables for corresponding chaos faults in an experiment.
- Project, Organization & Account Identifiers are now available in the report header itself.
- Updated the UPDATED_BY field to show SYSTEM when a Chaos Resource is deleted automatically with respect to a Project/Organization/Account deletion. (CHAOS-2597)
- Enhanced the Chaos infrastructure upgrade process to automatically change to UPGRADE_FAILED status if the upgrade has been in progress for more than 2 hours. This allows users to attempt an upgrade again once the upgrade has failed or timed out. (CHAOS-2575)
- Enhanced the experiment execution process to time out an experiment if it has been running for more than the threshold timeout, that is, 2 hours. (CHAOS-2573)
- Enhanced the stopOnFailure option to change the status of an experiment to COMPLETED_WITH_ERROR in case of probe failure. (CHAOS-2564)
- Added a new tunable ServiceExitType for the vmware-service-stop chaos fault, which allows users to choose whether the target service is killed gracefully or not. (CHAOS-2491)
- Added functionality to kill processes by process name in the vmware-process-kill chaos fault. (CHAOS-2100)
- Added support for Git, GitLab, and BitBucket as native Connectors using Harness Secret Manager. (CHAOS-35)
Early access features
- Resilience Probes: This feature is currently behind a feature flag named CHAOS_PROBES_ENABLED.
- Added support to re-fetch Probe statuses automatically under the Probes Tab in Chaos Studio. (CHAOS-2561)
- Evaluation Timeout is now only available for SLO probe. (CHAOS-2554)
- Added support for doing CRUD operations in Resilience probes from Chaos Studio itself. (CHAOS-2552)
- Fixed an issue where Resource Name was not usable in K8s Resilience Probe. Adding the specific field at the API level resolved this issue. (CHAOS-2653)
Fixed issues
-
Previously, refreshing Chaos Studio after saving an experiment resulted in unsaved changes. This issue has now been resolved. (CHAOS-2654)
-
Previously when the cron schedule was edited in YAML, there was no validation for the same in UI, which would sometimes lead to UI crash when shifting to the Schedule Tab in Visual Builder. This issue has now been fixed and validation has been added for both Visual and YAML editor modes. (CHAOS-2631)
Version 1.21.2
New features and enhancements
- Upgraded govc binary with the latest release which fixed 14 vulnerabilities in the chaos-go-runner docker image. (CHAOS-2577)
- Added support for empty labels with appkind specified while filtering target applications for a Chaos Experiment. (CHAOS-2256)
Early access features
- Resilience Probes: This feature is currently behind a feature flag named CHAOS_PROBES_ENABLED.
- Enhanced Chaos Studio to support older experiments with no annotation fields having Resilience probes reference. (CHAOS-2532)
- Added support for headers in HTTP probe configured via Resilience Probes mode. (CHAOS-2505)
- Deprecated "Retry" input in Probe configurations. Now only 1 (attempt) is supported. (CHAOS-2553)
Fixed issues
-
Fixed Chaoshub connection API to check for already existing ChaosHub with the same name before connecting new ChaosHub. (CHAOS-2523)
- Fixed an issue where the Save button at the header of the /gamedays route was not disabled even though the user had not selected an experiment. The button was enabled by default and threw an error on click, even if all the details requested on the landing page were filled in. (CHAOS-2417)
September 2023
Version 1.20.1
New features and enhancements
- Added support for targeting specific ports when using API Chaos Faults via a new tunable, for example, DESTINATION_PORTS. (CHAOS-2475)
- Added support for the HTTPS protocol in API Chaos Faults. (CHAOS-2145)
Early access features
- Chaos Guard: This feature is currently behind a feature flag named CHAOS_SECURITY_GOVERNANCE.
- Added support for evaluation of multiple app labels when running experiments with multiple target app labels. (CHAOS-2315)
- Linux Chaos Faults: This feature is currently behind a feature flag named CHAOS_LINUX_ENABLED.
- In Linux experiments, the Resilience Score was sometimes showing as 0, although only one probe amongst multiple had failed. This was happening because of incorrect propagation of the probe error, which led to its misinterpretation as an experiment error rather than a probe failure. This issue has been fixed now. (CHAOS-2472)
- Resilience Probes: This feature is currently behind a feature flag named CHAOS_PROBES_ENABLED.
- Enhanced the mode selection drawer to show the UI according to the mode selected by the user. Previously, it showed the image indicating SOT for all modes irrespective of the selected mode. (CHAOS-1997)
Fixed issues
-
Users were getting an error when an experiment triggered via a pipeline failed to start and no notifyID was created. This has been fixed now. (CHAOS-2490)
-
Fixed an issue where the topology settings (taint-tolerations, nodeselectors) made in the advanced configuration section during experiment construction were getting applied only to the Argo workflow pods. Now, the topology settings are propagated to Chaos Fault Pods as well. (CHAOS-2186)
Version 1.19.2
New features and enhancements
-
Added support for Authentication and HTTPs in HTTP Probes for Kubernetes chaos faults. (CHAOS-2381)
-
Added support for the destination ports for the provided destination IPs and hosts in network chaos faults. (CHAOS-2336)
-
Added support for authentication and TLS in Prometheus probes in Kubernetes chaos faults. (CHAOS-2295)
-
Chaos Studio no longer shows ChaosHubs with no experiments/faults during experiment creation. (CHAOS-2283)
-
A new option has been added to preserve or delete the chaos experiment resources with a single toggle. Experiment resources can be preserved for debugging purposes. (CHAOS-2255)
-
The Docker Service Kill chaos fault was enhanced to support containerd service as well. Users can select the type of service via a new tunable (SERVICE_NAME) they want to kill. (CHAOS-2220)
-
Added support for downloading an experiment run specific manifest. Now, users can download experiment run specific manifest from the right sidebar on the Execution graph page. (CHAOS-1832)
Early access features
- Linux Chaos Faults (This feature is currently behind a feature flag named CHAOS_LINUX_ENABLED)
- Added support for targeting multiple network interfaces in network faults. (CHAOS-2349)
- The script generated to add the Linux infrastructure had incorrect flags due to changes in terminologies. This has now been corrected to reflect updated installation flags. (CHAOS-2313)
- Resilience Probes (This feature is currently behind a feature flag named CHAOS_PROBES_ENABLED)
- Users had to select the Setup Probe button twice. It now works with a single click. The button depended on Formik validations, and incorrect Yup validations were halting handleSubmit. (CHAOS-2364)
- When using the same probes in two faults under same chaos experiment, Probe API was returning the probe two times in the second fault. This was due to probeNames being a global variable and using the same probe name multiple times was causing the name to be appended without re-initializing the variable. Scoping it down to local scope fixed this issue. (CHAOS-2452)
Fixed issues
-
The logs for the install chaos experiment step were getting lost immediately post execution. This issue was occurring in the subscriber component, after the custom pods cleanup, the component was still trying to stream Kubernetes pod logs. As a fix, we have added a check to fetch the pod details and gracefully return the error if pods are not found with a proper error message. (CHAOS-2321)
- As Account Viewer, users were not able to view Chaos Dashboards. This was happening because the getDashboards API was missing routingID, which was failing the API calls. This has been fixed now. (CHAOS-1797)
- The frontend was making unnecessary queries to the backend for the listWorkflow API whenever experiment details were changed via the UI. ChaosStep now uses memoization so that it queries only when the selected experiment changes. (CHAOS-883)
Version 1.18.7
New features and enhancements
-
Added Audit Event (Update) for Chaos Infrastructures upgrades which are triggered by SYSTEM/Cron Job Upgrader Automatically. (CHAOS-2350)
-
Added filter on Chaos Experiments Table for filtering experiments based on tags. (CHAOS-2133)
-
Users are now shown an error when they push an experiment to a ChaosHub that already contains an experiment with the same name. (CHAOS-872)
-
Vulnerability Enhancements - (CHAOS-2162)
- PromQL binary has been rebuilt with latest go1.20.7 & upgraded in chaos-go-runner docker image.
- Kubectl binary has been upgraded to v1.28.0 to reduce 2 vulnerabilities in K8s as well as chaos-go-runner docker image.
- Argo components like workflow-controller and argo-exec have been upgraded to v3.4.10 which resolves all vulnerabilities in respective components.
Early access features
- Linux Chaos Faults (This feature is currently behind a feature flag named CHAOS_LINUX_ENABLED)
- Enhanced fault execution logs to also include logs from commands like stress-ng, tc, and dd. (CHAOS-2309)
- All APIs for services with respect to Linux Chaos have been migrated from the GraphQL and gRPC APIs to REST. Users upgrading to 1.18.x need to upgrade all Linux Chaos Infrastructures.
Fixed issues
-
Fixed an issue where fault logs were truncated when the log size was large. The logs used a buffer size of 2000 bytes, so larger logs were cut off. As part of the fix, the buffer was made resizable and the flow was optimized. (CHAOS-2257)
-
The UI wasn't fully updated post the probe schema changes to support explicit units definition (s, ms). Added units for probe run properties in UI. (CHAOS-2235)
-
Users were able to create different experiments with the same name. Because experiment names carry a lot of significance, they should be unique. A name validation now runs whenever a new experiment is saved, and users are shown an error if an experiment with the same name already exists. (CHAOS-2233)
August 2023
Version 1.17.3
New features and enhancements
-
Added support for OpenShift configuration for deploying chaos infrastructure. This will provide you with a predefined security context constraint (SCC) that you can modify according to your needs. (CHAOS-1889)
-
Enhanced the Chaos experiment execution diagram to not switch to running nodes automatically. This change ensures that you stay on a node when you click it, thus giving you the opportunity to observe its details. (CHAOS-2258)
-
Enhanced the Docker service kill fault to support the containerd runtime. (CHAOS-2220)
- Added support for targeting applications by using only appkind, only applabel, and set-based labels. (CHAOS-2170, CHAOS-2128)
- Parallel chaos injection and revert operations at scale have been improved for multiple target pods on the same node. (CHAOS-1563)
- Previously, if you did not set the TARGET_CONTAINER environment variable, the fault targeted a randomly selected container. Now, if you do not set the environment variable, the fault targets all containers in the target pods. (CHAOS-1216)
- Users can now specify the drain timeout explicitly in the node drain fault. Previously, the node-drain fault used the CHAOS_DURATION value as a timeout, leading to potential confusion and risk of failure, especially when a shorter duration is used with many pods. The expectation is that CHAOS_DURATION should define the unschedulable period after draining. Providing a specific drain timeout helps users better estimate the eviction time for all pods on a node, reducing errors and false negatives. (CHAOS-2185)
- Enhanced the JobCleanUpPolicy configuration to also retain helper pods when it is set to retain in ChaosEngine. (CHAOS-2273)
Fixed issues
- Fixed how chaos is reverted if an attempt to inject the node drain fault fails or needs to be canceled. (CHAOS-2184)
Version 1.16.6
Fixed issues
- There was an issue where users were not getting audit events for the rules created under the Security Governance tab. This issue has been fixed. (CHAOS-2259)