

The ideal metric for business impact analysis and IT operations



This article was contributed by Nikolay Ganyushkin, CEO and founder of Monq Lab

When tech companies scale up, it’s vital for them to use metrics to monitor growth. But as the volume of data and the complexity of operations increase, the metrics themselves may become a source of confusion. You end up with dozens of indicators, start prioritizing them and dividing them into business and tech metrics, and may lose track of customer needs. In this article, I propose a way to calculate an ideal metric that can reliably reflect the availability of an IT service or object from a business standpoint.

Why do we need yet another metric?

The idea of an “ideal metric” came to me at the early stages of my career, when I noticed that the metrics for the IT unit of a company where I worked didn’t accurately correlate with the situation on our business side. Later I saw more examples of this problem among the IT units of our corporate customers. It made me realize that many KPI and SLA calculations were obscure for IT units, and that they often didn’t let business and tech teams find a common language. I decided to create a single synthetic metric that would be clear and understandable to all parties. In short, this metric should be:

  • Business-oriented. It should show how our IT environment functions, not from the standpoint of server performance, but from the standpoint of how important it is to our customers.
  • Comprehensible. It should be easy for both IT staff and managers to interpret unambiguously.
  • Decomposable. The metric structure should let us decompose it into components, or factors, and ideally enable us to do root cause analysis as the output.

There are two ways to derive this metric:

1) To make a separate metric for the availability of each service or object (Service Availability)

2) To build a general health map for the system as a whole, which is more complicated but can be our ultimate goal.

What is the Service Availability metric?

I defined Service Availability as a state of the IT environment where customers of a business want to and are able to use a service, and are satisfied with its quality. This should be a Yes/No or 1/0 metric, since anything in between blurs the picture.

An example: imagine that a customer wants to apply online for insurance, but the system is malfunctioning. The server takes a long time to respond, so the application form is constantly being reset or shows errors, and an application can be submitted in 30 minutes instead of 10. As a result, the customer goes elsewhere, and the conversion rate in the sales funnel drops. From the engineering point of view, the service is degraded but still available, so things are fine. From the business point of view, the state of the IT environment where you’re losing hot leads is unacceptable. In that case, Service Availability is equal to 0.

An opposite example: imagine that due to unforeseen circumstances, a data center is completely shut down, but the company’s IT systems switch to backup servers. To customers, everything looks fine, even if a bit slower than usual, so they can apply for insurance relatively quickly, and the conversion rate in the sales funnel does not decline. From the business point of view, things are fine, and Service Availability is equal to 1. From the engineering point of view, though, half of the company’s servers and communication channels are unavailable, so things are not fine.

As you see, the service is available if it’s accessible to customers, so Service Availability is defined exclusively by the business side of the company. Here, however, it’s important not to extend business metrics to IT and not to set conversion as a KPI for the IT unit. The IT unit’s job is to ensure the functioning of the IT infrastructure, not to attract or retain customers.

Specifics of implementation for the Service Availability metric

So, how do we calculate Service Availability? A service often consists of a complex of information systems and can represent a chain of smaller services. (For example, a data center is responsible for providing virtual machines, a cloud for system component services, information systems for application services, and so on.)

Here I should clarify that, in umbrella monitoring systems, we frequently deal not with ready metrics, but with specific events or alerts that have emerged in various monitoring systems. In the Service Topology, they are considered to be coming from different configuration items (services, servers, virtual machines, clusters, etc.), or CIs. Taking into account the paragraph above, the overall Service Availability is essentially the cumulative availability of a CI group. The availability of an individual CI is an assessment of its performance from the standpoint of the availability of the ultimate service to which this CI contributes. This interconnection allows us to carry out factor analysis and determine the contribution of each CI to the overall result, thus identifying a bottleneck.

When building a report on Service Availability, first of all, we need to define a list of emergency situations or their aggregate that indicate the dysfunction of the service. We should also think of additional parameters, such as:

  • Service working hours. For example, does the service need to be available only during daytime hours, or only on holidays?
  • Recovery time objective (RTO). Is there a maximum allowable time during which our object can be in an emergency state?
  • Service windows. Do we take the agreed service windows into account?

Besides, monitoring systems, too, make mistakes sometimes, so we should consider whether emergencies should be verified by engineers (if we have such a mechanism).

The method itself

Firstly, let’s calculate Service Availability for a single CI. By this stage, we have already configured all the problem filters and decided on the parameters of our calculations.

To calculate the service availability (SA) for a particular period, it is necessary to construct a function of the CI problem status versus time, fProblem(t), which can take one of four values at any moment in time:

  1. The value (0) means that presently the CI has no problems that correspond to the filter;
  2. The value (1) means that the CI has a problem which passes the filter conditions;
  3. The value (N) says that the CI is in an unattended state;
  4. The value (S) says that the CI is in an agreed maintenance window.

As a result, we get the following indicators:

  • timeNonWorking – the aggregate CI non-working time span in the considered period. The function value was “N”.
  • timeWorkingProblem – the time spent by the CI in a state that does not meet the SLA requirements in the investigated period of time. The function value was “1”.
  • timeWorkingService – agreed idle time when the CI was in a service mode during working hours. The function value was “S”.
  • timeWorkingOK – the time span during which the CI satisfied the SLA requirements. The fProblem(t) function had state “0”.

The calculation of Service Availability (SA) for a single CI for a given period is carried out according to the formula:

SA = timeWorkingOK / (timeWorkingOK+timeWorkingProblem) * 100%

Above: Fig. 1 An example of possible distributions of time intervals when calculating SA (Service Availability) for a single CI

Image Credit: Nikolay Ganyushkin
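As a rough illustration of the SA formula, here is a minimal Python sketch. It assumes that fProblem(t) has already been reduced to a list of (duration, state) segments. The function name `service_availability`, the segment format, and the sample week are illustrative assumptions, not part of any particular monitoring product.

```python
from collections import defaultdict

VALID_STATES = ("0", "1", "N", "S")

def service_availability(segments):
    """Sum time per fProblem(t) state and compute SA in percent.

    segments: iterable of (duration, state) pairs, state in VALID_STATES.
    Returns (totals_by_state, sa_percent); sa_percent is None when the
    CI had no working time that counts toward the SLA.
    """
    totals = defaultdict(float)
    for duration, state in segments:
        if state not in VALID_STATES:
            raise ValueError(f"unknown state: {state!r}")
        totals[state] += duration

    time_working_ok = totals["0"]
    time_working_problem = totals["1"]
    denominator = time_working_ok + time_working_problem
    # "N" (unattended) and "S" (service window) time is excluded from SA.
    sa = None if denominator == 0 else time_working_ok / denominator * 100
    return dict(totals), sa

# Hypothetical one-week timeline, durations in hours (168 in total).
segments = [(40, "0"), (2, "1"), (4, "S"), (6, "0"), (116, "N")]
totals, sa = service_availability(segments)
print(f"SA = {sa:.2f}%")  # 46 / (46 + 2) * 100
```

Returning None instead of 100% for a period with no countable working time is a design choice in this sketch; a report could equally show such periods as "not applicable".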


Above: Fig. 2 An example of the influence of RTO on the calculation of the function fProblem(t)

Image Credit: Nikolay Ganyushkin

To calculate the availability of a CI group, which we call the Service Availability Group (SAG) metric, we need to build the function fProblem(t) for each CI included in the group. Next, we superimpose the resulting functions on top of each other, using certain rules (see Table 1).

Above: Table 1

Image Credit: Nikolay Ganyushkin

In the end, we get the function fGroupProblem(t). We sum up the duration of the segments of this function as follows:

  • timeGroupService – time when fGroupProblem(t) = S,
  • timeGroupOK – time when fGroupProblem(t) = 0,
  • timeGroupProblem – time when fGroupProblem(t) = 1.

Thus, the metric we’ve been discussing is defined as:

SAG = timeGroupOK / (timeGroupOK+timeGroupProblem) * 100%

Above: Fig. 3 An example of possible distributions of time intervals for calculating the availability of a CI group

Image Credit: Nikolay Ganyushkin
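The superposition step can be sketched in Python as follows. The actual combination rules are those of Table 1, which is an image and is not reproduced in this text, so the rule below (`worst_state_wins`) is purely a placeholder assumption and should be replaced with the real rules. The sketch also assumes that every CI timeline has been sampled on the same grid of equal time slots.

```python
def worst_state_wins(states):
    """PLACEHOLDER rule, not Table 1: a problem on any CI marks the group
    as "1"; otherwise a working CI gives "0"; otherwise "S", then "N"."""
    for candidate in ("1", "0", "S", "N"):
        if candidate in states:
            return candidate
    raise ValueError("empty state tuple")

def group_availability(ci_timelines, rule=worst_state_wins, slot=1.0):
    """Superimpose per-CI fProblem(t) samples into fGroupProblem(t).

    ci_timelines: list of equally long lists of states, one list per CI.
    rule: function mapping a tuple of per-CI states to one group state.
    slot: duration of one sample, in any consistent time unit.
    Returns (group_timeline, sag_percent or None).
    """
    if len({len(timeline) for timeline in ci_timelines}) != 1:
        raise ValueError("need at least one CI and equally long timelines")

    group = [rule(states) for states in zip(*ci_timelines)]
    time_group_ok = group.count("0") * slot
    time_group_problem = group.count("1") * slot
    denominator = time_group_ok + time_group_problem
    sag = None if denominator == 0 else time_group_ok / denominator * 100
    return group, sag

# Two hypothetical CIs sampled in eight equal slots.
ci_a = list("0010SSNN")
ci_b = list("0100S0NN")
group, sag = group_availability([ci_a, ci_b])
print("".join(group), sag)
```

Passing the rule in as a parameter keeps the SAG arithmetic separate from the superposition table, so the placeholder can be swapped for the real rules without touching the rest.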

Business impact analysis

It is important not only to get the Service Availability metric, but also to be able to decompose it into components. This will enable us to understand which problems became critical, and which made the smallest contribution to the current situation. This set of activities is called Business Impact Analysis (BIA), and it lets us identify how each particular IT component supports each particular business service of our company. Knowing these dependencies will make our business steadier and more resilient, and help us understand which areas of the IT environment need more attention or investments.

This approach, however, has some limitations:

  1. In the method of determining Service Availability, it is impossible to define the weight of a select problem if several problems occurred simultaneously. In this case, the only parameter will be the duration of the problem.
  2. If two or more problems occur simultaneously, then for such a period we will consider the duration of each with the weight of 1/N, where N is the number of problems that occurred simultaneously.

Calculation method:

  1. We should take the function fProblem(t) that was built when calculating SA.
  2. For each segment where the final function fProblem(t) = 1, we make a list of the problems of this CI that caused the segment to be assigned the value of 1. When compiling the list, it is necessary to take into account the problems that emerged or ended outside the time span of the function.
  3. Assign to each problem a metric of influence. It is equal to the duration of the problem in the segment multiplied by the corresponding weight. If there was only one problem in the segment, the problem is assigned a weight of 1. In the case of multiple problems, the weight is equal to 1 / N, where N is the number of simultaneously occurring problems.
  4. When calculating, the following points should be taken into account: In the general case, on the same segment at different intervals, the weight of the problem could change due to the appearance of new problems. The same problem can be present at different segments of fProblem(t) = 1. For example, a problem emerged on Friday, ended on Tuesday, and on weekends the CI is not serviced according to the SLA.
  5. Eventually, you should form a list of problems that were taken into account in the calculation of the function fProblem(t). At the same time, a metric of influence on Service Availability should be calculated for each problem.
  6. It is imperative to verify the calculation. The sum of the impact metrics for all problems must be equal to timeWorkingProblem.
  7. The user usually needs to display the relative value of the influence in percentages. To do this, the impact metric should be divided by timeWorkingProblem and multiplied by 100%.
  8. If we need to group problems and show the influence of the group, it is enough to sum up the metrics of all the problems included in the group. This statement is true only if the following condition is met: each problem is included in only one group at a time.
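The weighting and verification steps above can be sketched in Python. This is a minimal illustration under stated assumptions: problems are given as (start, end) intervals on a numeric time axis, the segments where fProblem(t) = 1 are already known, and the names `impact_metrics`, "db", "net" and "disk" are hypothetical.

```python
import math

def impact_metrics(problem_segments, problems):
    """Split each fProblem(t) = 1 segment among the problems active in it.

    problem_segments: list of (start, end) where fProblem(t) = 1.
    problems: dict mapping problem name to its (start, end) interval,
              which may begin or end outside the segments.
    Returns a dict mapping problem name to its impact in time units.
    """
    impact = {name: 0.0 for name in problems}
    for seg_start, seg_end in problem_segments:
        # Cut the segment wherever a problem starts or ends inside it,
        # so the set of active problems is constant on every piece.
        cuts = {seg_start, seg_end}
        for start, end in problems.values():
            cuts.update(t for t in (start, end) if seg_start < t < seg_end)
        points = sorted(cuts)
        for left, right in zip(points, points[1:]):
            active = [name for name, (start, end) in problems.items()
                      if start <= left and end >= right]
            for name in active:
                # Weight 1/N for N simultaneously occurring problems.
                impact[name] += (right - left) / len(active)
    return impact

segments = [(0, 10), (20, 26)]
problems = {"db": (-5, 6), "net": (4, 10), "disk": (20, 26)}
impact = impact_metrics(segments, problems)

# Verification: the impacts must add up to timeWorkingProblem.
time_working_problem = sum(end - start for start, end in segments)
assert math.isclose(sum(impact.values()), time_working_problem)

percent = {name: value / time_working_problem * 100
           for name, value in impact.items()}
print(percent)
```

In this example "db" and "net" overlap between t = 4 and t = 6, so each receives half of those two time units. If a piece of a segment has no active problem, nothing is added and the verification assertion fails, which signals that the problem list is incomplete.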


We have derived and calculated a Service Availability metric that is simple, business-oriented, and decomposable. It enables us to assess the state of the IT environment of a company not from the purely technical standpoint, but from the standpoint of what service said environment actually provides to the company’s customers. However, we should keep in mind that this metric is purely retrospective and cannot be used for predictions in isolation from component health metrics and plans for infrastructural changes.

Nikolay Ganyushkin is the CEO and founder of Monq Lab



8 Ways You Can Save Yourself and Others From Being Scammed



Opinions expressed by Entrepreneur contributors are their own.

Statistics on the number of scam websites that litter the internet are disturbing. During 2020, Google registered more than 2 million phishing websites alone. That means more than 5,000 new phishing sites popped up every day — not to mention the ones that avoided Google’s detection. In 2021, the U.S. Federal Bureau of Investigation (FBI) reported nearly $7 billion in losses from cybercrime that is perpetrated through these sites.

What exactly are scam websites? A scam website is any illegitimate website used to deceive users into fraud or to carry out malicious attacks. Scammers who operate these fake websites may download viruses onto your computer or steal passwords and other personal information.

Reporting these sites as they are encountered is an important part of fighting back. In other words, if you see something, say something. Keeping quiet, even if you avoid falling prey, allows the scammers to aim at another target.

Perhaps you’ve received a suspicious link in an email, or a strange text message with a link that you haven’t clicked on. Fortunately, many organizations have launched efforts aimed at reducing the threat that scam websites pose. In general, these organizations put scam websites on the radar by collecting and sharing information about them. In some cases, they prompt an investigation into the scammers behind the sites.


It’s free to report a suspicious website you’ve encountered, and it takes just a minute. Here are eight ways you can report a suspected scam website to stop cyber criminals and protect yourself and others online.

1. The Internet Crime Complaint Center

The IC3, as it is known, is an office of the FBI that receives complaints from those who have been the victims of internet-related crime. The IC3 defines the internet crimes that it addresses to include illegal activity involving websites. Complaints filed with the IC3 are reviewed and researched by trained FBI analysts.

2. Cybersecurity and Infrastructure Security Agency

CISA, which is an agency of the U.S. Department of Homeland Security, targets a wide range of malicious cyber activity. It specifically requests reports on phishing activity utilizing fraudulent websites. Information provided to CISA is shared with the Anti-Phishing Working Group, a non-profit focused on reducing the impact of phishing-related fraud around the world.


The site, run by the International Consumer Protection and Enforcement Network, is for reporting international scams. It is supported by consumer protection agencies and related offices in more than 65 countries. A secure version of their site is used by law enforcement agencies to share info on scams.

4. Google Safe Browsing

While Google does not have a mechanism for reporting all varieties of website scams, there is a form for reporting sites that are suspected of being used to carry out phishing. Reports made via the form are managed by Google’s Safe Browsing team. Google’s Transparency Report provides information on the sites that it has determined to be “currently dangerous to visit.”


5. PhishTank

This service, operated by Cisco Talos Intelligence Group, was created to “pour sunshine on some of the dark alleys of the Internet.” PhishTank includes an ever-growing list of URLs reported as being involved in phishing scams. To date, it has received more than 7.5 million reports of potential phishing sites. It says that more than 100,000 of the sites are still online.


6. Antivirus Apps

Antivirus providers such as Norton, Kaspersky, and McAfee have forms that can be used to identify pages that users feel should be blocked. Scam sites would definitely fall under that category. With some antivirus platforms, reporting forms can only be accessed by registered users. Norton’s is open to anyone.

7. Web host

There is a chance that the company hosting the scam site will take action to shut it down. A variety of online lookup resources can help you find the hosting provider of a particular site. Once you identify it, send a message to its customer service reporting the site in question and the experience that you had.

8. Share your experience on social media

This is actually more like sounding an alarm than filing a report, but it might protect one of your connections who stumbles upon the same site or is targeted by the same type of scam. At the very least, it could draw attention to the fact that scam sites affect real people. A post on Facebook about a close call you had with a scam might better equip your network to avoid any dangerous entanglements. If it does, they’ll thank you.



LastPass hacked, OpenAI opens access to ChatGPT, and Kanye gets suspended from Twitter (again)



Aaaaand we’re back! With our Thanksgiving mini-hiatus behind us, it’s time for another edition of Week in Review — the newsletter where we quickly wrap up the most read TechCrunch stories from the past seven(ish) days. No matter how busy you are, it should give you a pretty good idea of what people were talking about in tech this week.

Want it in your inbox every Saturday morning? Sign up here.

most read

Instafest goes instaviral: You’ve probably been to a great music festival before. But have you been to one made just for you? Probably not. Instafest, a web app that went super viral this week, helps you daydream about what that festival might look like. Sign in with your Spotify credentials and it’ll generate a promo poster for a pretend festival based on your listening habits.

LastPass breached (again): “Password manager LastPass said it’s investigating a security incident after its systems were compromised for the second time this year,” writes Zack Whittaker. Investigations are still underway, which unfortunately means it’s not super clear what (and whose) data might’ve been accessed.

ChatGPT opens up: This week, OpenAI widely opened up access to ChatGPT, which lets you interact with their new language-generation AI through a simple chat-style interface. In other words, it lets you generate (sometimes scarily well-written) passages of text by chatting with a robot. Darrell used it to instantly write the Pokémon cheat sheet he’s always wanted.

AWS re:Invents: This week, Amazon Web Services hosted its annual re:Invent conference, where the company shows off what’s next for the cloud computing platform that powers a massive chunk of the internet. This year’s highlights? A low-code tool for serverless apps, a pledge to give AWS customers control over where in the world their data is stored (to help navigate increasingly complicated government policies), and a tool to run “city-sized simulations” in the cloud.

Twitter suspends Kanye (again): “Elon Musk has suspended Kanye West’s (aka Ye) Twitter account after the latter posted antisemitic tweets and violated the platform’s rules,” writes Ivan Mehta.

Spotify Wraps it up: Each year in December, Spotify ships “Wrapped” — an interactive feature that takes your Spotify listening data for the year and presents it in a super visual way. This year it’s got the straightforward stuff like how many minutes you streamed, but it’s also branching out with ideas like “listening personalities” — a Myers-Briggs-inspired system that puts each user into one of 16 camps, like “the Adventurer” or “the Replayer.”

DoorDash layoffs: I was hoping to go a week without a layoffs story cracking the list. Alas, DoorDash confirmed this week that it’s laying off 1,250 people, with CEO Tony Xu explaining that they hired too quickly during the pandemic.

Salesforce co-CEO steps down: “In one week last December, [Bret Taylor] was named board chair at Twitter and co-CEO at Salesforce,” writes Ron Miller. “One year later, he doesn’t have either job.” Taylor says he has “decided to return to [his] entrepreneurial roots.”

audio roundup

I expected things to be a little quiet in TC Podcast land last week because of the holiday, but we somehow still had great shows! Ron Miller and Rita Liao joined Darrell Etherington on The TechCrunch Podcast to talk about the departure of Salesforce’s co-CEO and China’s “great wall of porn”; Team Chain Reaction shared an interview with Nikil Viswanathan, CEO of web3 development platform Alchemy; and the ever-lovely Equity crew talked about everything from Sam Bankman-Fried’s wild interview at DealBook to why all three of the co-founders at financing startup Pipe stepped down simultaneously.


What lies behind the TC+ members-only paywall? Here’s what TC+ members were reading most this week:

Lessons for raising $10M without giving up a board seat: has raised $10 million over the last two years, all “without giving up a single board seat.” How? co-founder Henry Shapiro shares his insights.

Consultants are the new nontraditional VC: “Why are so many consultant-led venture capital funds launching now?” asks Rebecca Szkutak.

Fundraising in times of greater VC scrutiny: “Founders may be discouraged in this environment, but they need to remember that they have ‘currency,’ too,” writes DocSend co-founder and former CEO Russ Heddleston.



Building global, scalable metaverse applications



Previously we talked about the trillion-dollar infrastructure opportunity that comes with building the metaverse — and it is indeed very large. But what about the applications that will run on top of this new infrastructure?

Metaverse applications will be very different from the traditional web or mobile apps that we are used to today. For one, they will be much more immersive and interactive, blurring the lines between the virtual and physical worlds. And because of the distributed nature of the metaverse, they will also need to be able to scale globally — something that has never been done before at this level.

In this article, we will take a developer’s perspective and explore what it will take to build global, scalable metaverse applications.

As you are aware, the metaverse will work very differently from the web or mobile apps we have today. For one, it is distributed, meaning there is no central server that controls everything. This has a number of implications for developers:



  • They will need to be able to deal with data that is spread out across many different servers (or “nodes”) in a decentralized manner.
  • They will need to be able to deal with users that are also spread out across many different servers.
  • They will need to be able to deal with the fact that each user may have a different experience of the metaverse, based on their location and the devices they are using. Not everyone has the same tech setup, and this plays a pivotal role in how each user experiences the metaverse.

These challenges are not insurmountable, but they do require a different way of thinking about application development. Let’s take a closer look at each one.

Data control and manipulation

In a traditional web or mobile app, all the data is stored on a central server. This makes it easy for developers to query and manipulate that data because everything is in one place.

In a distributed metaverse, however, data is spread out across many different servers. This means that developers will need to find new ways to query and manipulate data that is not centrally located.

One way to do this is through the blockchain itself. This distributed ledger, as you know, is spread out across many different servers and allows developers to query and manipulate data in a decentralized manner.

Another way to deal with the challenge of data is through what is known as “content delivery networks” (CDNs). These are networks of servers that are designed to deliver content to users in a fast and efficient manner.

CDNs are often used to deliver web content, but they can also be used to deliver metaverse content. This is because CDNs are designed to deal with large amounts of data that need to be delivered quickly and efficiently — something that is essential for metaverse applications.

Users and devices

Another challenge that developers will need to face is the fact that users and devices are also spread out across many different servers. This means that developers will need to find ways to deliver content to users in a way that is efficient and effective.

One way to do this is through the use of “mirrors.” Mirrors are copies of the content that are stored on different servers. When a user requests content, they are redirected to the nearest mirror, which helps to improve performance and reduce latency.
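To make the idea of redirecting a user to the nearest mirror concrete, here is a minimal Python sketch. It assumes that "nearest" is approximated by the lowest measured round-trip latency and that the measurements are already available; the function name `pick_mirror` and the hostnames are hypothetical.

```python
def pick_mirror(latencies_ms):
    """Return the mirror with the lowest measured latency.

    latencies_ms: dict mapping mirror URL to latency in milliseconds,
                  or to None when the mirror could not be reached.
    """
    reachable = {url: ms for url, ms in latencies_ms.items() if ms is not None}
    if not reachable:
        raise LookupError("no reachable mirror")
    return min(reachable, key=reachable.get)

# Hypothetical measurements taken from one user's device.
measurements = {
    "https://eu.mirror.example.invalid": 38.0,
    "https://asia.mirror.example.invalid": 210.5,
    "https://us.mirror.example.invalid": None,  # unreachable
}
print(pick_mirror(measurements))
```

A production system would usually make this decision on the server side, for example through DNS or a CDN, rather than in client code; the sketch only shows the selection rule.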

When a user’s device is not able to connect to the server that is hosting the content, another way to deliver content is through “proxies.” Proxies are servers that act on behalf of the user’s device and fetch the content from the server that is hosting it.

This can be done in a number of ways, but one common way is through the use of a “reverse proxy.” In this case, the proxy server is located between the user’s device and the server that is hosting the content. The proxy fetches the content from the server and then delivers it to the user’s device.

Location and devices

As we mentioned before, each user’s experience of the metaverse will be different based on their location and the devices they are using. This is because not everyone has the same tech setup, and this plays a pivotal role in how the metaverse is experienced by each user.

For example, someone who is using a virtual reality headset will have a completely different experience than someone who is just using a desktop computer. And someone who is located in Europe will have a different experience than someone who is located in Asia.

Though it may not be obvious why geographical location would play a part in something that is meant to be boundless, think of it this way. The internet is a physical infrastructure that is spread out across the world. And although the metaverse is not bound by the same physical limitations, it still relies on this infrastructure to function.

This means that developers will need to take into account the different geographical locations of their users and devices and design their applications accordingly. They will need to be able to deliver content quickly and efficiently to users all over the world, regardless of their location.

Different geographical locations also have different laws and regulations. This is something that developers will need to be aware of when designing applications for the metaverse. They will need to make sure that their applications are compliant with all applicable laws and regulations.

Application development

Now that we’ve looked at some of the challenges that developers will need to face, let’s take a look at how they can develop metaverse applications. Since the metaverse is virtual, the type of development that is required is different from traditional application development.

The first thing that developers will need to do is to create a “space”. A space is a virtual environment that is used to host applications.

Spaces are created using a variety of different tools, but the most popular tool currently is Unity, a game engine used to create 3D environments.

Once a space has been created, developers will need to populate it with content. This content can be anything from 3D models to audio files.

The next step is to publish the space. This means that the space will be made available to other users, who will be able to access the space through a variety of different devices, including desktop computers, laptops, tablets, and smartphones.

Finally, developers will need to promote their space. This means that they will need to market their space to users.

Getting applications to scale

Since web 3.0 is decentralized, scalability is usually the biggest challenge because traditional servers are almost impossible to use. IPFS is one solution that can help with this problem.

IPFS is a distributed file system used to store and share files. IPFS is similar to BitTorrent, but it is designed to be used for file storage rather than file sharing.

IPFS is a peer-to-peer system, which means that there is no central server. This makes IPFS very scalable because there is no single point of failure.

To use IPFS, developers will need to install it on their computer and add their space to the network. Then, other users will be able to access it.

The bottom line on building global, scalable metaverse applications

To finish off, the technology to build scalable metaverse applications already exists, but a lot of creativity is still required to make it all work together in a user-friendly way. The key is to keep the following concepts in mind:

  • The metaverse is global and decentralized
  • Users will access the metaverse through a variety of devices
  • Location and device management are important
  • Application development is different from traditional development
  • Scalability is a challenge, but IPFS can help

By keeping these concepts in mind, developers will be able to create metaverse applications that are both user-friendly and scalable.

Clearly, we can’t have an article series about building the metaverse without discussing NFTs. In fact, these might be the key to making a global, decentralized metaverse work. In our next article, we will explore how NFTs can be used in the metaverse.

Daniel Saito is CEO and cofounder of StrongNode
