An accident with a duplicate order number almost caused me to be fired...

At the end of last year, we had an incident in production.

Here is what happened: two orders in the system ended up with the same order number but different contents. Every query by order number threw an error, and callbacks could not be processed normally. It happened more than once, so it had to be fixed in this system upgrade.

The colleague responsible had already changed the code several times, but the fixes never held: duplicate order numbers kept appearing. So I used this incident as an opportunity to go through the code my colleague had written.

Here is a brief look at the code at the time:

codesamp

As you can see, this code is not very good. Setting code quality aside for now, the two things that are supposed to keep order numbers unique are the random number and the milliseconds. But the random number here has only two digits, so duplicates are likely in a high-concurrency environment. The milliseconds are not a good choice either: with a multi-core CPU and multiple threads, the millisecond value is effectively constant for a stretch of time (I tested this myself). So I first ran a test that generates order numbers from 100 concurrent threads. The test code is as follows:

codesamp2

The result shocked me: 13 duplicates out of 100 concurrent requests! I immediately asked my colleague to hold the release and took over the task myself.
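To see why two random digits cannot carry the load, here is a minimal single-threaded sketch. It is not the original code: `NaiveOrderNoDemo`, `naiveOrderNo`, and the fixed seed are hypothetical, and it assumes the old scheme was roughly "millisecond timestamp plus a two-digit random number". It models the worst case, where all 100 requests land in the same millisecond, so only the random suffix can tell them apart.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

// Hypothetical stand-in for the old scheme: millisecond timestamp + two random digits.
class NaiveOrderNoDemo {

    // Builds an order number from a millisecond timestamp and a two-digit random suffix.
    static String naiveOrderNo(long millis, Random random) {
        return millis + String.format("%02d", random.nextInt(100));
    }

    // Generates `count` numbers that all share one millisecond and
    // returns how many of them repeated an earlier number.
    static int countDuplicates(int count, long millis, long seed) {
        Random random = new Random(seed);
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < count; i++) {
            if (!seen.add(naiveOrderNo(millis, random))) {
                duplicates++;
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Worst case: all 100 requests fall into the same millisecond.
        int duplicates = countDuplicates(100, 1_700_000_000_000L, 42L);
        System.out.println("duplicates out of 100: " + duplicates);
    }
}
```

With only 100 possible suffixes, 100 draws in one millisecond collide heavily (about 37 repeats on average for uniform random digits). A real concurrent run spreads requests over several milliseconds, which is why a measured figure such as 13 can be lower.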

I spent a little over six minutes discussing the business scenario with my colleagues, and we decided on the following changes:

  • Remove the merchant ID parameter (according to my colleagues it was passed in to help prevent duplicate order numbers, but it turned out to have no effect)

  • Keep only three digits for the milliseconds (this shortens the number while still ruling out duplicates when the application is switched)

  • Use a thread-safe counter for the incrementing part (three digits guarantee no duplicates for at least 800 concurrent requests; I used four digits in the code)

  • Switch date formatting to the Java 8 date classes (for thread safety and simpler code)

After thinking it through, my final code was:

codesamp3
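A minimal sketch of how the four changes listed above could fit together. This is an illustration under assumptions, not the original listing: the class name, the `yyMMddHHmmssSSS` pattern, and the wrap-around at 10,000 are hypothetical; only `generateOrderNo()` is a name used in this article.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicInteger;

class OrderNoSketch {
    // DateTimeFormatter (Java 8) is immutable and thread-safe, unlike SimpleDateFormat.
    // "SSS" keeps exactly three digits of milliseconds.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyMMddHHmmssSSS");

    // Thread-safe counter shared by all callers in this JVM.
    private static final AtomicInteger SEQ = new AtomicInteger(0);

    // Four digits: the suffix runs from 0000 to 9999 and then wraps.
    private static final int SEQ_LIMIT = 10_000;

    static String generateOrderNo() {
        String time = LocalDateTime.now().format(FORMAT);
        // getAndIncrement is atomic; floorMod keeps the suffix in range
        // even after the int counter overflows to negative values.
        int seq = Math.floorMod(SEQ.getAndIncrement(), SEQ_LIMIT);
        return time + String.format("%04d", seq);
    }

    public static void main(String[] args) {
        System.out.println(generateOrderNo());
    }
}
```

In this sketch two numbers can only collide if the counter wraps all the way around within a single millisecond, that is, more than 10,000 calls in one millisecond inside one JVM.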

Of course, writing the code is not the end of the job. Next, let's run it through a test main function:

codesamp4

Great: it passed on the first try and could go straight to production...
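For readers who want to run a similar check, one possible harness is sketched below. It is an assumption about how such a test can be written, not the test used above: `DuplicateCheck` and `countDuplicates` are hypothetical names, and the generator passed in from `main` is a simple counter standing in for the real order number method.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class DuplicateCheck {

    // Calls the generator once from each of `threads` threads, all released
    // at the same moment, and returns how many results repeated an earlier one.
    static int countDuplicates(Supplier<String> generator, int threads) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        AtomicInteger duplicates = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (int i = 0; i < threads; i++) {
                pool.execute(() -> {
                    try {
                        start.await(); // wait until every worker is ready
                        if (!seen.add(generator.get())) {
                            duplicates.incrementAndGet();
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            start.countDown(); // release all workers together
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow();
        }
        return duplicates.get();
    }

    public static void main(String[] args) {
        AtomicInteger seq = new AtomicInteger();
        // Stand-in generator: replace with the real order number method.
        Supplier<String> generator = () -> String.format("%04d", seq.getAndIncrement());
        System.out.println("duplicates: " + countDuplicates(generator, 100));
    }
}
```

The start latch matters: without it the threads begin one by one as they are created, and the test exercises far less real concurrency than the thread count suggests.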

However, when I went back over the code above, I realized that although it solves concurrent duplicates as far as possible, a hidden risk remains for our system architecture: if the application runs as multiple instances (a cluster), is it really impossible for two instances to produce the same order number?

This problem needs an effective solution, so the question became: how can order numbers generated by different instances be told apart? These are the options I considered:

  • Use a UUID (initialized once, when the first order number is generated)
  • Use Redis to keep an incrementing ID
  • Use a database table to maintain an incrementing ID
  • The network IP of the host the application runs on
  • The port number the application listens on
  • Use a third-party algorithm (Snowflake, etc.)
  • Use the process ID (a feasible option to some extent)

Thinking it over: our application runs in Docker, and the application port is the same in every container, but the network IPs will not collide. Process IDs may well repeat. I have been burned by the UUID approach before. Redis or a database would also work well, but they make the generator depend on an external service.

One more factor is important here: all applications involved in generating order numbers run on the same host (a physical Linux server). So for the current system architecture, I chose the IP approach.

Here is my code:

codefin
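The IP approach could look roughly like the sketch below. Treat it as an illustration under assumptions rather than the listing above: the class name, the `suffixOf` helper, the three-digit padding, and the `"000"` fallback are hypothetical; `generateOrderNo()` and `getLocalIpSuffix()` are the method names mentioned in the summary. It uses the last octet of the local IPv4 address, which only separates instances that share a subnet, as Docker containers on one host typically do.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.concurrent.atomic.AtomicInteger;

class ClusterOrderNoSketch {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyMMddHHmmssSSS");
    private static final AtomicInteger SEQ = new AtomicInteger(0);

    // volatile is required for double-checked locking to be safe.
    private static volatile String ipSuffix;

    // Hypothetical helper: last octet of an IPv4 address, zero-padded to three digits.
    static String suffixOf(String hostAddress) {
        String last = hostAddress.substring(hostAddress.lastIndexOf('.') + 1);
        return String.format("%03d", Integer.parseInt(last));
    }

    static String getLocalIpSuffix() {
        String local = ipSuffix;
        if (local != null) {
            return local; // fast path: already initialized, no lock needed
        }
        synchronized (ClusterOrderNoSketch.class) {
            if (ipSuffix == null) { // second check, now under the lock
                try {
                    ipSuffix = suffixOf(InetAddress.getLocalHost().getHostAddress());
                } catch (UnknownHostException | RuntimeException e) {
                    // Assumption for this sketch only: fall back to a fixed suffix
                    // (also hit for IPv6 addresses). Production code would more
                    // likely fail fast, since a shared suffix defeats the purpose.
                    ipSuffix = "000";
                }
            }
            return ipSuffix;
        }
    }

    static String generateOrderNo() {
        String time = LocalDateTime.now().format(FORMAT);
        int seq = Math.floorMod(SEQ.getAndIncrement(), 10_000);
        return time + getLocalIpSuffix() + String.format("%04d", seq);
    }

    public static void main(String[] args) {
        System.out.println(generateOrderNo());
    }
}
```

One caveat: on some Linux hosts `InetAddress.getLocalHost()` resolves to a loopback address such as 127.0.0.1, in which case every instance would get the same suffix. It is worth printing the suffix in each container before relying on it.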

Summary

Code description and some suggestions

  • There is no need to lock inside the generateOrderNo() method, because AtomicInteger relies on CAS (compare-and-swap) operations, which guarantee both visibility and atomicity (look up the details if you are interested).

  • In the getLocalIpSuffix() method, there is no need to put a synchronization lock around the not-null branch. This is double-checked locking, and taken as a whole it is a safe singleton pattern.
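To make the CAS point concrete, here is a small sketch of a retry loop built on `AtomicInteger.compareAndSet`. The class `CasCounterSketch` and its wrap-around limit are hypothetical and only illustrate the mechanism; `AtomicInteger` methods such as `getAndIncrement()` already provide the same atomicity without a hand-written loop.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasCounterSketch {
    private final AtomicInteger value = new AtomicInteger(0);
    private final int limit;

    CasCounterSketch(int limit) {
        this.limit = limit;
    }

    // Returns the current value and advances the counter, wrapping to 0 at `limit`.
    int next() {
        while (true) {
            int current = value.get();
            int updated = (current + 1) % limit;
            // compareAndSet succeeds only if no other thread changed the value
            // since we read it; otherwise loop and retry with the fresh value.
            if (value.compareAndSet(current, updated)) {
                return current;
            }
        }
    }

    public static void main(String[] args) {
        CasCounterSketch counter = new CasCounterSketch(3);
        for (int i = 0; i < 4; i++) {
            System.out.println(counter.next()); // prints 0, 1, 2, 0
        }
    }
}
```

No thread ever blocks here: a thread that loses the race simply retries, which is why no `synchronized` block is needed around the counter.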

My implementation is not the only way to solve this problem. The right solution depends on the system architecture at hand.

Testing is always necessary. My colleagues did not test their own changes during their first few attempts at this problem. Skipping tests is unprofessional!