Development teams today are less monolithic, and less straightforward to manage, than ever before. According to Hired's 2023 State of Software Engineers report, nearly 70% of open engineering positions allow for a remote format. A typical company's programming team is a motley mix of remote employees, specialists on hybrid or fee-based arrangements, and freelancers. The question arises: how can IT management build a unified performance evaluation system under these conditions?
Modern development teams are shifting decisively toward decentralization and flexibility. How should managers approach performance in these conditions? How can they identify and address problems using comprehensive data? We explored these and other questions in an exclusive interview with Yevgen Balter, a London-based expert in project management and product implementation. With extensive experience in leading methodologies and cutting-edge technologies, Yevgen brings valuable insights to the table.
Performance Management System for Development Teams
First of all, Mr. Balter noted that the general answer to building a system for tracking and improving performance is to adopt specific standards in this area, reinforced by software products that automate the management of those standards. There are plenty of best practices, and plenty of related tools that 'embed' the relevant requirements into processes and reduce the influence of the human factor. Examples include Scrum and DORA (DevOps Research and Assessment).
Yevgen explained that most of these systems are built around cycles of regularly repeated actions, partly automated and partly requiring management attention. Regardless of the performance management concept used, each cycle includes the following stages:
- Planning: forming the initial approach to the processes;
- Tracking: gathering information on current development performance;
- Modification: changing current procedures to achieve better results in the next round of the cycle.
Information about the performance of programming teams as a whole, and of individual engineers, can be obtained from the numerous development platforms used to host projects and orchestrate teamwork.
Examples: GitLab, GitHub, Dropbox, LXC, Docker.
"All sorts of statistics about preparing code and pushing it to the project are collected automatically and become available to technical directors, project managers, and other leaders of software development teams. The problem is that no universal performance management specification, and no software suite built on one, will tell you which indicators to analyze or how to turn them into corrective actions that improve performance," Mr. Balter warned.
"Every tech company's situation is unique. Methodologies like DORA do offer generalized recommendations: for example, to measure how often code is deployed to the server (deployment frequency). But this and other criteria cannot be applied mechanically, without adapting them to the specifics of your team. Practice shows that if you track the productivity of individual developers only by the amount of code they produce, they will adapt their work to the new rules, though not in the way you expect. Simple tasks will be prioritized because they can be finished quickly, while work on complex modules will slow down, even though those very modules might be the ones that take the business to new horizons."
Assessing and Improving Performance: How to select key metrics to analyze?
The expert shared some tips: "To adapt universal approaches to the situation of a particular business and a particular team of developers, you need to supplement general criteria, such as the amount of code, with a number of subtler metrics. Comparing their readings against each other makes it possible to draw correct conclusions and develop effective measures to improve performance. Alongside the amount of code a developer 'pours' onto the server, you can, where appropriate, use the MLTC (Mean Lead Time for Changes) metric, which shows the average time it takes for a developer's commits, i.e. changes to the code base, to reach production. The so-called Cycle Time completes the picture."
He added that Cycle Time measures the time between any two stages of software production. "In many cases, it is useful to measure the 'time to open.' This is simply the time between a developer's first commit on a change and the opening of the pull request (a Git term for a request to merge that change into the shared code base). In some cases, the statistics on merges (Time to Merge), i.e. combining locally prepared code with the common project on the production server, as well as Time to First Review, will be illustrative. The latter is the time from the moment code is submitted until a reviewer, typically a team lead or manager, first evaluates what the developer has written." Below, after a brief sketch of how these timing indicators can be computed, are some examples from Yevgen of how comparing key metrics helps analyze the situation and develop measures that genuinely raise performance.
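The following is a rough illustration only: it derives 'time to open', Time to First Review, Time to Merge, lead time for changes, and Cycle Time from pull-request lifecycle data. Field names and timestamps are hypothetical assumptions; in practice the raw data would be pulled from a platform such as GitLab or GitHub.

```python
# Rough sketch: deriving timing metrics from pull-request lifecycle records.
# Field names and timestamps are hypothetical; real data would come from
# GitLab, GitHub, or a metrics platform via their APIs.
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%dT%H:%M:%S"

def hours(start: str, end: str) -> float:
    """Difference between two ISO-8601 timestamps, in hours."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds() / 3600

# One record per pull request (illustrative values only).
pull_requests = [
    {"first_commit": "2024-03-01T09:00:00", "opened": "2024-03-01T15:00:00",
     "first_review": "2024-03-02T10:00:00", "merged": "2024-03-03T12:00:00",
     "deployed": "2024-03-04T08:00:00"},
    {"first_commit": "2024-03-04T08:30:00", "opened": "2024-03-05T11:00:00",
     "first_review": "2024-03-05T16:00:00", "merged": "2024-03-06T09:00:00",
     "deployed": "2024-03-06T18:00:00"},
]

metrics = {
    "Time to Open (first commit -> PR opened)":    mean(hours(pr["first_commit"], pr["opened"]) for pr in pull_requests),
    "Time to First Review (opened -> 1st review)": mean(hours(pr["opened"], pr["first_review"]) for pr in pull_requests),
    "Time to Merge (opened -> merged)":             mean(hours(pr["opened"], pr["merged"]) for pr in pull_requests),
    "Lead Time for Changes (commit -> deployed)":   mean(hours(pr["first_commit"], pr["deployed"]) for pr in pull_requests),
    "Cycle Time (first commit -> merged)":          mean(hours(pr["first_commit"], pr["merged"]) for pr in pull_requests),
}

for name, value in metrics.items():
    print(f"{name}: {value:.1f} h")
```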
Annual performance reports vs quarterly KPIs
These are useful metrics that let you revisit performance over long enough periods to see the dynamics in a general, forward-looking, strategic way. Comparing quarterly performance KPIs against annualized ones shows the big picture of the engineering team's performance. On the other hand, many managers make the mistake of assuming that these strategic metrics alone are enough for performance management.
Developers know that the annual report feeds into performance bonuses, so its use is more about control than management. The essence of performance management is to influence processes, not to police people and fix problems in someone's work after the fact. Therefore, in addition to the big indicators, we need metrics that capture what happens every day, with every change in the project, and give a cross-section of the situation from an angle that completes the picture.
Percentage of pull requests merged vs percentage of pull requests that have been code-reviewed
Comparing these two ratios, for individual developers and for entire teams, relates two crucial aspects of development. The first is writing and submitting code, which is what the programmers building the project do. The second is checking and amending that code, which is largely the work of team leads and management. Comparing the two indicators can show, for example, whether the review process is slowing development down. Many other important conclusions can be drawn in the same way.
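As a minimal sketch, the two ratios could be computed from per-pull-request summaries like the hypothetical records below (fields and figures are illustrative, not taken from a real project):

```python
# Minimal sketch: share of pull requests merged vs share that received a code review.
pull_requests = [
    {"author": "dev_a", "merged": True,  "reviewed": True},
    {"author": "dev_a", "merged": False, "reviewed": True},
    {"author": "dev_b", "merged": True,  "reviewed": False},
    {"author": "dev_b", "merged": True,  "reviewed": True},
]

total = len(pull_requests)
merge_rate = sum(pr["merged"] for pr in pull_requests) / total
review_rate = sum(pr["reviewed"] for pr in pull_requests) / total

print(f"Merged:   {merge_rate:.0%} of pull requests")
print(f"Reviewed: {review_rate:.0%} of pull requests")
# A merge rate that consistently outpaces the review rate may mean code lands
# without feedback; a low merge rate with a high review rate may mean reviews
# have become a bottleneck.
```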
Review Coverage (code review coverage) vs Review Influence (code review impact)
"Everything is clear with Review Coverage (RC), because this metric simply shows how much of the code a team lead or CTO actually has time to review and give feedback on," Mr. Balter commented. Review Influence (RI) is a more complex metric calculated with the help of expert judgment, and there are several alternative ways to obtain it. In any case, the ratio of RC to RI gives a comprehensive idea of how much code review in a project translates into refactoring, i.e. code improvements made by the programmer.
Frequency of code review (Review Cycles) vs Cycle Time
"Once again, by comparing the time it took an application or software product to pass through its next stage with the time between code reviews, we can get an idea of how effectively the development process is being managed." The question Yevgen poses is, "Is management to blame for the fact that programmers' productivity is low?"
Impact vs Project Rework Ratio
Lastly, Mr. Balter shared, "Impact is also an expert indicator, whose calculation involves several variables for an individual developer and for the team as a whole. A special procedure, impact mapping, is used to obtain this analytical coefficient. Impact shows how much a particular piece of work contributes to the fate of the project as a whole. Once derived, this index can be compared with the rework and change data for a particular project." The expert concluded, "You will see whether the changes are motivating and advancing the project, or whether they signal underlying issues."
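A minimal sketch of such a comparison, assuming Impact arrives as an expert-assigned score and the rework ratio counts lines rewritten within a chosen window; the window, names, and figures are illustrative assumptions:

```python
# Minimal sketch: expert-assigned Impact vs a rework ratio derived from commit
# statistics. "Rework" here means code rewritten within 21 days (an assumption).
developers = {
    "dev_a": {"impact": 0.8, "lines_changed": 4000, "lines_reworked_21d": 400},
    "dev_b": {"impact": 0.3, "lines_changed": 2500, "lines_reworked_21d": 900},
}

for name, d in developers.items():
    rework_ratio = d["lines_reworked_21d"] / d["lines_changed"]
    print(f"{name}: impact={d['impact']:.2f}, rework ratio={rework_ratio:.0%}")
# High impact with modest rework suggests changes that move the project forward;
# low impact with heavy rework points to churn that may hide underlying issues.
```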
As the interview also revealed, Change Failure Rate, Consistency of Story Point Delivery, DevEx, and a number of other criteria and indicators can also be useful to a company, depending on the specifics of its work.
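For completeness, Change Failure Rate is usually understood as the share of deployments that cause a failure requiring a fix or rollback; a minimal sketch with made-up records:

```python
# Minimal sketch: Change Failure Rate = failed deployments / total deployments.
deployments = [
    {"id": 101, "failed": False},
    {"id": 102, "failed": True},   # required a rollback
    {"id": 103, "failed": False},
    {"id": 104, "failed": False},
]

change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)
print(f"Change Failure Rate: {change_failure_rate:.0%}")
```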
Analyzing Metrics: What to make of the conclusions?
"Comparing metrics is what creates the opportunity for analysis," Yevgen stated. In Scrum, as in other methodologies, this analysis is performed at the planning stage, after statistics on the progress of development have been collected. The next step is to develop measures to improve performance, and here it is crucial to communicate with the team properly. Whether your office is a virtual space where all developers work remotely, or you mix in-office and freelance specialists, the system for delivering performance feedback should work equally well for everyone.
Mr. Balter also highlighted that, for effective communication on this issue, management should maintain genuinely two-way interaction with the team through planning meetings, surveys, and conversations. Communication should be a standard process, so that its outcomes are consistent and engineers understand the requirements addressed to them. Of course, these forms of communication are only possible in an environment where mistakes are not treated as crimes and where management helps people work instead of looking for someone to blame.
One Last Remark
By the end of our discussion, Mr. Balter concluded, "It is worth recalling that there are companies with expertise in implementing standards for improving software development performance. Decide at the outset whether your management has the capacity to do all of the work described above with sufficient diligence. If not, it may be a good idea to outsource these functions to external advisors on a contractual basis. The main thing is that the performance improvement activities themselves should be effective."
* This is a contributed article and this content does not necessarily represent the views of sciencetimes.com