
What is data operations? – Dan Rose AI

When writing about AI, I often refer to data operations and how important a foundation it is for most solutions. Without proper data operations you can easily reach a point where handling the necessary data becomes so difficult and costly that the business case no longer makes sense. So to explain a little, I wanted to give you an overview of what it really means.

Data operations are the process of obtaining, cleaning, storing and distributing data in a secure and cost-effective way. It is a mix of business strategy, devops and data science, and it is the basic supply chain for many big data and AI solutions.
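To make those four stages a bit more concrete, here is a minimal, hypothetical sketch in Python of what a tiny pipeline covering them could look like. The function names and the CSV-based storage are illustrative assumptions, not a description of any particular system.

```python
# Conceptual sketch of the four data operations stages: obtain, clean, store, distribute.
# Everything here (record fields, file name) is an illustrative assumption.
import csv
from pathlib import Path

def obtain():
    """Obtain raw records, e.g. from an API, a scraper or manual entry."""
    return [{"invoice_id": "A-1", "amount": "100,00"},
            {"invoice_id": "A-2", "amount": ""}]

def clean(records):
    """Clean: drop incomplete rows and normalise number formats."""
    cleaned = []
    for r in records:
        if not r["amount"]:
            continue
        cleaned.append({**r, "amount": float(r["amount"].replace(",", "."))})
    return cleaned

def store(records, path):
    """Store the curated data in a durable, queryable place (here just a CSV file)."""
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["invoice_id", "amount"])
        writer.writeheader()
        writer.writerows(records)

def distribute(path):
    """Distribute: hand the curated data on to consumers such as model training."""
    with path.open() as f:
        return list(csv.DictReader(f))

if __name__ == "__main__":
    out = Path("invoices.csv")
    store(clean(obtain()), out)
    print(distribute(out))
```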

Data operations originally came out of the big data world but has become a more widely used term in recent years.

Data operations are the most important competitive advantage

As I have mentioned in many previous posts, I see data operations as a bigger competitive advantage than algorithm development when it comes to beating the competition. In most cases, the AI algorithms used are standard algorithms from standard frameworks that are fed data, trained and tuned briefly before being deployed. So, since the underlying algorithms are mostly the same, the real difference lies in the data. The work that goes into getting good results from high-quality data is almost nothing compared to the amount of work you need when using mediocre data. Getting data at a lower cost than the competition is also a really important factor, especially in cases that require a continuous flow of new data. In those cases, getting new data all the time can become an economic burden that weighs the business down.

Paperflow example of data operations

To make it more concrete, I want to use the company I co-founded, Paperflow, as an example. Paperflow receives invoices and other financial documents and captures data such as the invoice date, amounts and invoice lines. Since invoices can look very different and the invoices submitted vary over time, you need a lot of data and a steady stream of new data. So, to make Paperflow a good business, we needed good data operations.

To be honest, we were not that aware of the importance when we made the initial decisions, but fortunately we got it right. Our first major data operations decision was that we wanted to collect all the data in-house and build our own data collection system. This is a costly approach, with a high upfront investment in developing the system, but also a high recurring cost for the employees doing the work of entering data into it. Our competitors had chosen another strategy: instead, they had customers enter the invoice data into their system whenever the predicted data capture failed. This is a much cheaper strategy that can provide you with a lot of data. The only problem is that customers have only one thing in mind, and that is solving their own problem, without considering whether what they enter is correct in terms of what you need for training data.

So at Paperflow we found a way to get better data. But how do you bring the costs down?

Part of the solution was investing heavily in the system used to enter the data and trying to make it as fast as possible. It was very much trial and error and it took a lot of work. Without having the exact numbers, I think we have invested more in our data operations systems over time than in the AI itself.

Another part of the solution was to make sure we only collected the data we really needed. This is a common challenge in data operations, as it is very difficult to know which data you will need in the future. Our solution was to start by collecting a lot of data (maybe too much) and then slowly narrow down the amount of data collected. Going the other way can be difficult: if we had started collecting more data for each invoice, we would basically have had to start over and discard all previously validated invoices.

We also started working hard on understanding a very important metric: when were our predictions accurate enough that we could trust them and avoid validating a portion of the data? That was achieved with a variety of different tricks and technologies, one of them being probabilistic programming. Probabilistic programming has the advantage of providing a distribution of uncertainty instead of the single percentage that most machine learning algorithms will give you. Knowing how sure you are that you are right reduces the risk of making mistakes.
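As a rough illustration of that idea (not Paperflow's actual model), the sketch below treats the true accuracy of a prediction type as a Beta distribution built from past validation outcomes, and only skips manual validation when even the pessimistic end of that distribution clears a required accuracy. The counts and thresholds are made-up assumptions.

```python
# Sketch: confidence as a distribution rather than a single number.
# Model the true accuracy of a prediction type with a Beta posterior built
# from past validation outcomes, and only auto-approve when the lower end
# of that distribution clears the required accuracy.
from scipy.stats import beta

def should_auto_approve(correct, wrong,
                        required_accuracy=0.98,
                        credibility=0.95):
    # Beta(1 + correct, 1 + wrong) is the posterior over the true accuracy,
    # given a uniform prior and the validation outcomes seen so far.
    posterior = beta(1 + correct, 1 + wrong)
    # Lower bound of the one-sided credible interval: "with 95% probability,
    # our accuracy is at least this high."
    lower_bound = posterior.ppf(1 - credibility)
    return lower_bound >= required_accuracy

# 980 correct out of 1000: the point estimate is 98%, but the posterior is
# still wide enough that we cannot be confident we meet the bar -> validate.
print(should_auto_approve(correct=980, wrong=20))     # False
# 9900 correct out of 10000: a 99% point estimate with far less uncertainty,
# so skipping manual validation is defensible -> auto-approve.
print(should_auto_approve(correct=9900, wrong=100))   # True
```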

The data collection strategy of only collecting more of the data you need, by choosing the cases where you are the most uncertain, is also known as active learning. If you are working on data operations for AI, you should definitely look into it.
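Below is a minimal sketch of what uncertainty-based active learning can look like. The synthetic data, the logistic regression model and the batch size of ten are all illustrative assumptions, not details from Paperflow's setup.

```python
# Sketch of active learning by uncertainty sampling over an unlabelled pool.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Small labelled seed set and a larger unlabelled pool (synthetic features).
X_labelled = rng.normal(size=(50, 5))
y_labelled = (X_labelled[:, 0] > 0).astype(int)
X_pool = rng.normal(size=(1000, 5))

model = LogisticRegression().fit(X_labelled, y_labelled)

# Uncertainty = how close the predicted probability is to 0.5.
proba = model.predict_proba(X_pool)[:, 1]
uncertainty = 1 - np.abs(proba - 0.5) * 2

# Send only the 10 most uncertain documents to manual data entry / validation;
# everything the model is already confident about is left alone.
to_label = np.argsort(uncertainty)[-10:]
print("Indices to send for manual labelling:", to_label)
```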

Devops challenges in data operations

On the more hardcore side of storing data in an effective way, you will also see challenges. I'm not a devops expert, but I have seen in real life the problem of suddenly having a lot of data that grows faster than expected. This can be critical, as the ability to scale quickly comes under pressure. If I could give one piece of advice here, it would be to involve devops early in the architecture work. Building a scalable foundation is much more fun than constantly having to find short-term solutions.
