How to read the source code of an open source project?

In recent years, I have read the source code of many projects, such as RocketMQ, Kafka, ZooKeeper, and Tomcat. Some friends have asked me how to read source code, so here I'd like to share some ideas with you.

When you join a new company and want to get familiar with its code, don't you simply read the source code? That is essentially no different from reading the source code of an open source framework. Honestly, the only differences are code quality and overall design.

What did you do back then?

In the end, you got the hang of it, didn't you? The same is true for an open source framework: you will eventually get started.

So there's nothing to be afraid of; don't be put off. Let me first share what I did when I joined a new company and took over a project.

Taking over a project at a new company means reading source code

When I took over a project at a new company, I first set up a meeting with the product manager and the main developer of the original project. The purpose of the meeting was for the product manager to introduce the project's background, the problems it solves, and the features it offers.

The developer sat in to fill in the gaps and answer my questions, since the product manager does not know the details of how data flows between the components. After this meeting, you will know what the project does and what features it provides.

Understanding the business is very important for reading the source code later!

Next, I ask for documents, architecture diagrams, flowcharts, sequence diagrams, and so on (whatever exists; if there are none, there is nothing to be done). After reading them, I have a general understanding of the whole project. Then I get the project running and start using the software, trying out its various features, because there is always a gap between what I experience as a user and what the product manager described.

After walking through the project's main workflows, I started reading the source code. At this point, just from a file's name you can usually tell which module it belongs to, which gives you confidence. How deep you go into the details then depends on your assigned tasks, and after a few requirements the details gradually become clear.

So, to take over a project, you first understand the background, then get an overview, and then refine your understanding. The same goes for reading the source code of an open source project: read it top-down.

How to read the source code of an open source project

Personally, I divide source-code reading into two situations: reading to improve myself and reading to find problems.

Read the source code to improve yourself:

I assume you know what the open source project you are reading does. For example, RocketMQ is a message queue, so you should know what a message queue is for. I also assume you have used the project. If you haven't used it at work, try it on your own first: learn the basic features and get it running.

Start with RocketMQ's official website and wiki: rocketmq

The first step is understanding the concepts, terminology, features, and architecture involved. This gives you a mental map of the roles and the data flow, so you understand the project's main components and how they interact.


Then look at the source code directory structure. You first need to know what each directory is responsible for. This is the same as reading business code.
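As a rough aid for this step, a small script can count the source files under each top-level directory of a checkout, which gives a quick picture of where the bulk of the code lives. This is a hypothetical helper, not part of any project; the default file extensions (Java and Scala, the languages RocketMQ and Kafka are written in) are an assumption you should adjust.

```python
from collections import Counter
from pathlib import Path


def summarize_modules(root, exts=(".java", ".scala")):
    """Count source files under each top-level directory of `root`.

    Hypothetical helper: `exts` is an assumption; change it for other
    languages. Files sitting directly in `root` are counted under ".".
    """
    root = Path(root)
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in exts:
            rel = path.relative_to(root)
            # The first path component is the top-level module directory.
            top = rel.parts[0] if len(rel.parts) > 1 else "."
            counts[top] += 1
    return counts
```

For example, `summarize_modules("rocketmq").most_common()` would list the largest modules first; the numbers only tell you where to look, while the README and the directory names tell you what each module does.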


Next, look for an entry point. Open source projects like this usually come with demos.


That is your entry point. From there your source-code journey begins. Yes, you still have to chew through it yourself, and chew hard; there is no other way to read source code!

But now you are not gnawing blindly like a headless fly. You are reading the source code with an overall picture of the roles and data flow in mind. This is very useful!

Some method calls will feel familiar, because you already know the general flow and can anticipate what the code ought to do.

Sometimes the source code seems to have too much code and too many branches. That's fine: make a copy first, then delete some of the exception handling and uncommon branches.

First work out the overall core flow! Once a flow is clear, start drawing: flowcharts, mind maps, whatever helps.

Once that is clear, go back to the unedited code, understand the exception handling, and complete your flowcharts, mind maps, and so on.

Take a look at this diagram I drew while analyzing Kafka; once you have worked out a flow, it looks something like this:


With that, one module is done! Well done! Then work through the remaining branches one by one; once the general flow is clear, you have more or less finished reading the source code.

While reading, you will also run into things you don't understand. Skip them at first and focus on the main flow. For example, I noticed that Kafka's index lookup uses binary search, but the source code uses a modified binary search: the index entries are split into a hot (warm) section and a cold section. Digging further, I found that the purpose is to avoid page faults.
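The idea can be sketched in a few lines. This is a simplified Python model, not Kafka's code; it is loosely based on my understanding of the lookup in Kafka's `AbstractIndex`. The index file is memory-mapped and most lookups target recent offsets, so searching the tail ("warm") section first keeps touching the same few pages, which stay in memory, instead of jumping across the whole file and faulting in cold pages. The 8-byte entry size and 8192-byte warm section are assumptions modeled on Kafka's offset index.

```python
from bisect import bisect_right

ENTRY_SIZE = 8        # assumption: bytes per index entry
WARM_BYTES = 8192     # assumption: size of the warm tail of the index
WARM_ENTRIES = WARM_BYTES // ENTRY_SIZE


def floor_slot(offsets, target, warm_entries=WARM_ENTRIES):
    """Return the slot of the largest offset <= target, or -1 if none.

    `offsets` is strictly increasing (like index entries). The search
    range is chosen first, so that the common case (target near the end)
    only touches the warm tail of the index.
    """
    n = len(offsets)
    if n == 0 or offsets[0] > target:
        return -1
    first_warm = max(0, n - 1 - warm_entries)
    if offsets[first_warm] < target:
        lo, hi = first_warm, n       # answer lies in the warm section
    else:
        lo, hi = 0, first_warm + 1   # fall back to the cold section
    # Ordinary binary search, restricted to [lo, hi).
    return bisect_right(offsets, target, lo, hi) - 1
```

The result is identical to a plain binary search over the whole array; only the set of memory pages touched changes, which is exactly the kind of detail that is easy to miss on a first read.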


Another example is the methods involved in file preheating (warm-up) in RocketMQ:


Then we should find out what mlock, madvise, and so on actually do.
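To see what preheating buys you, here is a minimal Python sketch, assuming a Unix-like system and Python 3.8+ for `mmap.madvise`. It is loosely modeled on my understanding of RocketMQ's `MappedFile.warmMappedFile`, which writes to every OS page of a new file and then calls `mlock`/`madvise` through JNA; this is not RocketMQ's code. Writing one byte per page forces the OS to allocate each page now rather than on first use, so the later write path does not page-fault. `MADV_WILLNEED` is only a hint, and `mlock` (pinning pages in RAM) is not in Python's standard library, so it is omitted here.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # usually 4096 bytes


def warm_file(path, size):
    """Create a `size`-byte file, map it, and pre-fault every page.

    Returns the number of pages touched.
    """
    if size <= 0:
        raise ValueError("size must be positive; an empty file cannot be mapped")
    with open(path, "w+b") as f:
        f.truncate(size)
        with mmap.mmap(f.fileno(), size) as m:
            touched = 0
            for pos in range(0, size, PAGE_SIZE):
                m[pos] = 0  # write one byte so the page is faulted in now
                touched += 1
            # Hint that the whole mapping will be needed soon, if supported.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
                m.madvise(mmap.MADV_WILLNEED)
            m.flush()
    return touched
```

Running something like this and comparing page-fault counts (for example with `/usr/bin/time -v` on Linux) against a version that skips the loop is a quick way to convince yourself why a message queue bothers to warm its files.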

These are the details, and the details are often what we most need to learn, so don't skip them once you have sorted out the overall flow. This kind of reading is about improving yourself and learning how an excellent open source framework is designed.

Read the source code to find problems:

This kind of reading is very goal-driven. Usually the project has thrown an error, and there is generally a log, so we start by searching the logs.

If you are familiar with the framework, that is of course best. If not, search the source code for the log message and read the surrounding code; you can usually find some clues from that context.

Sometimes, however, troubleshooting requires analyzing the entire call chain, and that takes skill. Other times, you turn to the source code because articles contradict each other: one says A and another says B.

If you can't find authoritative information, just read the source code yourself and search it by keyword.
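For the keyword search, `grep -rn` usually does the job, but a tiny script makes the idea explicit: find every source line containing a fixed fragment of the log or exception message. This is a hypothetical helper, not tied to any framework; search for the constant part of the message, since variable parts are filled in at runtime.

```python
from pathlib import Path


def find_in_source(root, needle, exts=(".java",)):
    """Yield (path, line_number, line) for each source line containing `needle`."""
    for path in sorted(Path(root).rglob("*")):
        if not (path.is_file() and path.suffix in exts):
            continue
        # errors="replace" keeps odd encodings from aborting the search.
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if needle in line:
                    yield path, lineno, line.rstrip("\n")
```

For example, with a made-up message fragment, `for hit in find_in_source("kafka", "disk is full"): print(*hit)` prints each matching file and line; from there you read the surrounding code and walk up the callers.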

In closing

What I want to tell you is: don't be put off by source code; you already read source code all the time at work. Read it top-down and don't dive into the details at first. First get a comprehensive understanding of the framework from the official website and other channels, then read the source code to work out the main flow.

Record and organize what you learn with flowcharts, sequence diagrams, mind maps, and the like. Then study the details and learn the clever tricks of an excellent open source framework.

By imitating and learning from it, you can pick up a lot of extra low-level knowledge, such as the page faults and the file preheating and page locking mentioned above, branch prediction, and so on.

Of course, you can also search online for other people's source-code analysis articles, such as some of my earlier ones, and then dig in yourself; that makes the process more comfortable and smoother.

Finally, there is no doubt that source code is a tough bone to chew. What I've shared today is only the preparation for reading it; there is a long way to go, so remember that patience matters most!