Thursday, 30 June 2016

SCM Troubleshoot Issues

  1. svn resuming aborted checkout
  2. How to complete a git clone for a big project on an unstable connection?
  3. Working copy XXX locked and cleanup failed in SVN
  4. What to do when 'svn cleanup' fails?
  5. SVN to Git migration that works
  6. How to migrate SVN repository with history to a new Git repository?
  7. SVN Remove File from Repository without deleting local copy
  8. How to remove a file from version control without deleting it?
  9. How do I remove a file from svn versioning without deleting it from every working copy?
  10. What is the correct way to restore a deleted file from SVN?
  11. Reverting single file in SVN to a particular revision
  12. How to remove untracked files in SVN
  13. What do “branch”, “tag” and “trunk” mean in Subversion repositories?
  14. Day-to-day with Subversion
  15. SVN Error - Not a working copy
  16. Pushing an existing git repository to SVN
  17. How can I add an empty directory to a Git repository?
  18. Ignore files that have already been committed to a Git repository
  19. Push a new local branch to a remote Git repository and track it too
  20. Move existing, uncommitted work to a new branch in Git
  21. How do I remove local (untracked) files from my current Git branch?
  22. Clone all remote branches with Git?
  23. Default behavior of “git push” without a branch specified
  24. How do you create a remote Git branch?
  25. How to delete a Git branch both locally and remotely?
  26. Push a tag to a remote repository using Git?
  27. How do I rename the local branch?
  28. Git fetch remote branch
  29. Checkout remote Git branch
  30. Change the URI (URL) for a remote Git repository
  31. What are the differences between 'git pull' and 'git fetch'?
  32. How do you undo the last commit?
  33. Difference between “git add -A” and “git add .”
  34. What is the Difference Between Mercurial and Git?
  35. Force Git to overwrite local files on pull
  36. pause git clone and resume later? [duplicate]
  37. How to convert a normal Git repository to a bare one?
  38. How do I convert a bare git repository into a normal one (in-place)?
  39. How to fully delete a git repository created with init?
  40. Find and restore a deleted file in a Git repository
  41. Remove a file from a Git repository without deleting it from the local filesystem
  42. How do I git rm a file without deleting it from disk? [duplicate]
  43. How to remove a directory in my GitHub repository?
  44. How can I remove a commit on GitHub?
  45. How to ignore certain files in git?
  46. Stop tracking and ignore changes to a file in Git
  47. Need to restore a deleted branch subversion


Tuesday, 21 June 2016

Building Artifacts Using Jenkins

We often deal with Maven-based projects that have multiple modules and JAR files, each managed as a separate Maven project with its own pom.xml.
This blog post explains how to configure such a project on Jenkins.
I strongly recommend going through my earlier post, “Automation deployment using Jenkins”, as it covers Jenkins installation, building a WAR file and deploying it to Tomcat.
Below is the problem statement that we need to address:
Our project is a dynamic web project.
We have configured it for deployment to Tomcat using Jenkins, but the build fails to find child.jar, which is managed by its own pom.xml and lives in its own SVN repository.
How we will resolve it:
We will create a separate job that builds this JAR and copies it to the local Maven repository (the Maven repository of the machine on which Jenkins is running).
Once the artifact has been copied successfully, this job will trigger another Jenkins job that builds and deploys the WAR file in question.
Below are the steps for creating the job that builds the JAR and triggers the job that deploys the WAR.
Steps to create the job
1. Click on “Create new job”.
2. In the “Source Code Management” section, enter the SVN URL of the JAR project.
3. In the “Build” section, enter the Maven parameters in the “Goals and options” field, e.g. clean install.
4. Click the “Add post build action” button, select the “Build other projects” option, and enter the name of the project that you want to build once the JAR has been packaged successfully.
[Screenshot: T2 Config [Jenkins] 2013-07-16 14-28-12]
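To check that the first job really put the JAR where the second job will look for it, it helps to know where clean install places an artifact. The sketch below assumes Maven's default local repository location (~/.m2/repository in the home directory of the user running Jenkins) and the standard repository layout. The group id com.example and version 1.0 are made-up values for illustration, not the coordinates of the real child.jar.

```python
from pathlib import PurePosixPath


def local_repo_jar_path(group_id, artifact_id, version,
                        repo_root="~/.m2/repository"):
    """Return the path where 'mvn install' stores a JAR in the local repository.

    Layout: <repo>/<groupId as directories>/<artifactId>/<version>/<artifactId>-<version>.jar
    repo_root is Maven's default location; it may be overridden in settings.xml.
    """
    return str(PurePosixPath(
        repo_root,
        *group_id.split("."),          # com.example -> com/example
        artifact_id,
        version,
        f"{artifact_id}-{version}.jar",
    ))


# Hypothetical coordinates, used only to show the shape of the path.
print(local_repo_jar_path("com.example", "child", "1.0"))
```

If the file is missing at that path on the Jenkins machine after the first job has run, the WAR job will fail in the same way as before.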

Automation deployment using Jenkins

Application deployment has always been one of the most tedious tasks in the project development life cycle. It involves steps like checking out the code from the code repository, building it using Maven/Ant and then deploying it to the remote servers. The process takes even more time and effort when we have multiple servers such as staging, UAT and production.
Jenkins is an open source tool that helps us automate and centralize the deployment process.
In this post I am going to explain how to deploy a WAR to Apache Tomcat, using SVN as the code repository and Maven as the build tool.
Step 1 : Configure Jenkins
Jenkins comes bundled as a WAR file, so you need to deploy it to an application server. I used Apache Tomcat version 7. Below are the steps.
1.1. Start Tomcat.
1.2. Deploy jenkins.war, downloaded from the Jenkins download location.
Note : Some servers restrict the size of the WAR file that can be uploaded, so you may get an error when deploying it from the Tomcat Manager UI. Below is how I resolved it.
a. Open the file $CATALINA_HOME\webapps\manager\WEB-INF\web.xml
b. Look for the entry below. A size of 61316110 worked just fine for me.
<multipart-config>
<!-- 50MB max -->
<max-file-size>61316110</max-file-size>
<max-request-size>61316110</max-request-size>
<file-size-threshold>0</file-size-threshold>
</multipart-config>
c. Restart Tomcat.
1.3. Go to the URL http://localhost:8080/jenkins (replace localhost with the IP of your server).
1.4. Add the “Deploy to container” plugin. This can be done using the Plugin Manager feature of Jenkins. This plugin will be used to deploy the WAR that we build using Jenkins.
1.5. Provide the Maven location to Jenkins. Jenkins interacts with Maven to build code that requires it. Follow the steps below.
a. Download Maven from the Maven download page.
b. Place it on the server and copy its path.
c. Go to the URL http://localhost:8080/jenkins/configure (replace localhost with the IP of your server).
d. Provide a name (any name will do) and the location of your Maven installation.
[Screenshot: Configure System [Jenkins] 2013-07-16 12-50-02]
Step 2 : Configure Tomcat for build deployment
We need to configure the server where we will be deploying our build (this server can be the same as, or different from, the one where Jenkins is running). This involves creating a user on Tomcat and assigning it the manager-script role.
Just add the lines below inside the <tomcat-users></tomcat-users> node of the tomcat-users.xml file and restart the server. This file is located in the conf folder of Tomcat.
<role rolename="manager"/>
  <role rolename="admin"/>
  <user username="harpreet" password="harpreet" roles="standard,manager,admin,manager-gui,manager-script" />
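Before restarting Tomcat it can be worth confirming that the user really ends up with the manager-script role, since a typo in the roles attribute only shows up later as a failed deployment. The following is a rough sketch that parses a tomcat-users.xml fragment like the one above. It assumes the file has no XML namespace on its root element (newer Tomcat versions add one, in which case the lookup would need adjusting), and the helper name users_with_role is my own.

```python
import xml.etree.ElementTree as ET

# The fragment from this post, wrapped in its <tomcat-users> root node.
TOMCAT_USERS = """<tomcat-users>
  <role rolename="manager"/>
  <role rolename="admin"/>
  <user username="harpreet" password="harpreet" roles="standard,manager,admin,manager-gui,manager-script" />
</tomcat-users>"""


def users_with_role(xml_text, role):
    """Return the usernames whose comma-separated roles attribute contains role."""
    root = ET.fromstring(xml_text)
    matches = []
    for user in root.iter("user"):
        roles = [r.strip() for r in user.get("roles", "").split(",")]
        if role in roles:
            matches.append(user.get("username"))
    return matches


print(users_with_role(TOMCAT_USERS, "manager-script"))
```

An empty list for manager-script means the Jenkins deploy plugin will not be allowed to push the WAR with that user.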
Step 3 – Create a Job on Jenkins
3.1 Click on “New Job”.
3.2 Enter a name for the job.
3.3 Select “Build a maven2/3 project”.
[Screenshot: New Job [Jenkins] 2013-07-16 12-59-18]
3.4 Look for the “Source Code Management” section, select Subversion and provide the SVN URL of the project. You will be prompted for your SVN credentials.
[Screenshot: Paxcel Config [Jenkins] 2013-07-16 13-03-37]
3.5 Look for the “Build” section and provide the Maven build parameters, e.g. clean package.
[Screenshot: Paxcel Config [Jenkins] 2013-07-16 13-06-26]
3.6 Add a post build action that will deploy the WAR file to the remote Tomcat server. Follow the steps below.
a. Click the “Add post build action” button.
b. Select the option “Deploy war/ear to a container”.
c. Enter the WAR file name and path.
Important : For every job, Jenkins creates a folder named workspace in the jobs folder, and the WAR path needs to be relative to that workspace. After a successful build the WAR is placed inside this folder, and the deploy plugin picks it up from there. If the WAR is not in this folder, or you fail to provide the correct path, the WAR will not be deployed to the server.
d. Enter the server details and the credentials of the user that we created on Tomcat in the steps above.
[Screenshot: Paxcel Config [Jenkins] 2013-07-16 13-14-20]
3.7 Save the job and hit “Build Now”. Check the logs by clicking the ball icons on the left side of the screen.
[Screenshot: TalentHuntV2 [Jenkins] 2013-07-16 13-16-46]
There will be log traces like the ones you get from Maven when you build using Eclipse or the command line.
There will also be a few Jenkins log lines signifying the deployment, like the one below:
Deploying /home/gagan/.jenkins/jobs/TalentHuntV2/workspace/target/TalentHunt-1.0.war to container Tomcat 7.x Remote
3.8 Verify that you did it correctly. If you did everything right, you can test the deployment of your project on the servers.
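The workspace-relative path from step 3.6 is the part people most often get wrong, so here is a small sketch of how the value can be worked out from the absolute path shown in the deployment log. The paths are the ones from the log line above; the helper name is my own and nothing here calls Jenkins itself.

```python
from pathlib import PurePosixPath


def war_path_relative_to_workspace(war, workspace):
    """Return the WAR path relative to the job workspace, or None if it lies outside it."""
    try:
        return str(PurePosixPath(war).relative_to(PurePosixPath(workspace)))
    except ValueError:
        # The WAR is not under the workspace, so the deploy plugin would not find it.
        return None


workspace = "/home/gagan/.jenkins/jobs/TalentHuntV2/workspace"
war = workspace + "/target/TalentHunt-1.0.war"
print(war_path_relative_to_workspace(war, workspace))
```

The printed value is what goes into the WAR file field of the post build action.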
Step 4 ( Optional ) – Enable Auto Build
You can also configure Jenkins to build your code automatically whenever anyone commits code.
4.1 Look for the “Build Triggers” section.
4.2 Select the event that should trigger the build. I chose to have Jenkins poll SVN for updates. If an update to the code is found, it will trigger the build.
For this we just need to select the option “Poll SCM” and enter an appropriate schedule. I used the pattern */5 * * * * .
[Screenshot: T2 Config [Jenkins] 2013-07-17 10-17-42]
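If the cron-style pattern is unfamiliar: the first field is the minute, and */5 means every minute divisible by 5. The sketch below only interprets the minute field with plain cron semantics and handles just the three simplest forms (*, */n and a single number); it does not cover ranges, lists or the Jenkins-specific H syntax.

```python
def minute_field_matches(field, minute):
    """Return True if a simple cron minute field ('*', '*/n' or 'n') matches minute."""
    if field == "*":
        return True
    if field.startswith("*/"):
        step = int(field[2:])
        return minute % step == 0
    return int(field) == minute


def polling_minutes(pattern):
    """List the minutes of an hour at which the first (minute) field fires."""
    minute_field = pattern.split()[0]
    return [m for m in range(60) if minute_field_matches(minute_field, m)]


print(polling_minutes("*/5 * * * *"))
```

So with this pattern Jenkins checks SVN twelve times an hour, at minutes 0, 5, 10 and so on.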
Important : For configuring projects that use artifacts from other projects, please read my next blog post, “Building Artifacts Using Jenkins”.
Jenkins can help you automate a number of other processes/tasks. WAR/EAR development is one you can start with; once you are into Jenkins, I am sure you will be tempted to explore it further.
You can post any issues that you run into with Jenkins configuration on this blog. We at lokeshkumarbandi will be happy to help you.

Thursday, 16 June 2016

Resolve svn conflicts

SVN has three different types of conflicts: Text Conflicts, Tree Conflicts and Property Conflicts.

Text Conflicts

Text conflicts happen when local changes have been made to a file and remote revisions also include changes to the file in such a way that it cannot be automatically determined how they should be merged.
Sublime SVN includes a quick panel interface that allows for reviewing the different versions of the file, and then resolving the conflict by selecting one of the existing versions or editing the file. The options presented to the user include:
  • Review Old Repository Version
  • Review My Version
  • Review New Repository Version
  • Review/Edit Merged Version
  • Resolve with My Version
  • Resolve with New Repository Version
  • Resolve with Merged Version

Tree Conflicts

Tree conflicts happen when the working copy and remote revisions both perform actions on a file or folder such that it cannot be automatically determined which action should be preferred.
This covers situations such as:
  • Local delete, incoming edit upon update
  • Local edit, incoming delete upon update
  • Local delete, incoming delete upon update
  • Local missing, incoming edit upon merge
  • Local edit, incoming delete upon merge
  • Local delete, incoming delete upon merge
  • Folder added in both trunk and branch
The only way to resolve a tree conflict via SVN is to accept the current state of the working copy. Thus, if the local version of a file or folder is not the preferred one, the Revert command should be run on it.
Resolution of a tree conflict involving a folder added to both the trunk and a branch separately is slightly different. If the folder added to the branch is preferred, the working copy should be reverted to undo the merge, the folder should be deleted from trunk and then the merge should be re-run.

Property Conflicts

Property conflicts happen when local changes have been made to one or more properties and remote revisions also include changes to those properties.
The only way to resolve a property conflict is to edit the property to the desired final value and mark it as resolved. Sublime SVN provides a quick panel interface for these operations that allows you to see the conflict, edit the properties and then mark them as resolved.
Example :
Tom decides to add a README file for their project. So he creates the README file and adds a TODO list to it. After adding this file, the repository is at revision 6.
[tom@CentOS trunk]$ cat README 
/* TODO: Add contents in README file */

[tom@CentOS trunk]$ svn status
?       README

[tom@CentOS trunk]$ svn add README 
A         README

[tom@CentOS trunk]$ svn commit -m "Added README file. Will update it's content in future."
Adding         trunk/README
Transmitting file data .
Committed revision 6. 
Jerry checks out the latest code which is at revision 6. And immediately he starts working. After a few hours, Tom updates README file and commits his changes. The modified README will look like this.
[tom@CentOS trunk]$ cat README 
* Supported operations:

1) Accept input
2) Display array elements

[tom@CentOS trunk]$ svn status
M       README

[tom@CentOS trunk]$ svn commit -m "Added supported operation in README"
Sending        trunk/README
Transmitting file data .
Committed revision 7.
Now, the repository is at revision 7 and Jerry's working copy is out of date. Jerry also updates the README file and tries to commit his changes.
Jerry's README file looks like this.
[jerry@CentOS trunk]$ cat README 
* File list

1) array.c Implementation of array operation.
2) README Instructions for user.

[jerry@CentOS trunk]$ svn status
M       README

[jerry@CentOS trunk]$ svn commit -m "Updated README"
Sending        trunk/README
svn: Commit failed (details follow):
svn: File or directory 'README' is out of date; try updating
svn: resource out of date; try updating

Step 1: View Conflicts

Subversion has detected that the README file has changed since Jerry last updated. So, Jerry has to update his working copy.
[jerry@CentOS trunk]$ svn up
Conflict discovered in 'README'.
Select: (p) postpone, (df) diff-full, (e) edit,
        (mc) mine-conflict, (tc) theirs-conflict,
        (s) show all options:
Subversion is complaining that there is a conflict with the README file, and Subversion does not know how to solve this. So Jerry chooses the df option to review the conflict.
[jerry@CentOS trunk]$ svn up
Conflict discovered in 'README'.
Select: (p) postpone, (df) diff-full, (e) edit,
        (mc) mine-conflict, (tc) theirs-conflict,
        (s) show all options: df
--- .svn/text-base/README.svn-base Sat Aug 24 18:07:13 2013
+++ .svn/tmp/README.tmp Sat Aug 24 18:13:03 2013
@@ -1 +1,11 @@
-/* TODO: Add contents in README file */
+<<<<<<< .mine
+* File list
+
+1) array.c Implementation of array operation.
+2) README Instructions for user.
+=======
+* Supported operations:
+
+1) Accept input
+2) Display array elements
+>>>>>>> .r7
Select: (p) postpone, (df) diff-full, (e) edit, (r) resolved,
        (mc) mine-conflict, (tc) theirs-conflict,
        (s) show all options:

Step 2: Postpone Conflicts

Next, Jerry chooses the postpone (p) option, so that he can resolve the conflict by hand.
Select: (p) postpone, (df) diff-full, (e) edit, (r) resolved,
        (mc) mine-conflict, (tc) theirs-conflict,
        (s) show all options: p
C    README
Updated to revision 7.
Summary of conflicts:
  Text conflicts: 1
After opening the README in text editor he realizes that Subversion has included both Tom's code and his code with conflict markers.
[jerry@CentOS trunk]$ cat README
<<<<<<< .mine
* File list

1) array.c Implementation of array operation.
2) README Instructions for user.
=======
* Supported operations:

1) Accept input
2) Display array elements
>>>>>>> .r7
Jerry wants Tom's changes as well as his, so he just removes the lines containing the conflict markers.
So, the modified README will look like this.
[jerry@CentOS trunk]$ cat README
* File list

1) array.c Implementation of array operation.
2) README Instructions for user.

* Supported operations:

1) Accept input
2) Display array elements
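What Jerry did by hand, keeping both sides and deleting only the marker lines, can be sketched in a few lines of Python. This is just an illustration of that one resolution strategy, not something Subversion provides; the function names are my own. Note that it removes the three marker lines and nothing else, so the blank line Jerry put between the two sections in the listing above would still have to be added manually.

```python
def is_conflict_marker(line):
    """True for the three marker lines Subversion writes into a conflicted file."""
    return (line.startswith("<<<<<<< ")
            or line == "======="
            or line.startswith(">>>>>>> "))


def drop_conflict_markers(text):
    """Keep both sides of every conflict by removing only the marker lines."""
    kept = [line for line in text.splitlines() if not is_conflict_marker(line)]
    return "\n".join(kept) + "\n"


conflicted = (
    "<<<<<<< .mine\n"
    "* File list\n"
    "\n"
    "1) array.c Implementation of array operation.\n"
    "2) README Instructions for user.\n"
    "=======\n"
    "* Supported operations:\n"
    "\n"
    "1) Accept input\n"
    "2) Display array elements\n"
    ">>>>>>> .r7\n"
)
print(drop_conflict_markers(conflicted), end="")
```

This only makes sense when you really want both versions, as Jerry does here. In most conflicts you have to read both sides and decide.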
Having resolved the conflict in the file, Jerry retries the commit.
[jerry@CentOS trunk]$ svn commit -m "Updated README"
svn: Commit failed (details follow):
svn: Aborting commit: '/home/jerry/project_repo/trunk/README' remains in conflict
 
[jerry@CentOS trunk]$ svn status
?       README.r6
?       README.r7
?       README.mine
C       README

Step 3: Resolve Conflicts

In the above status output, the letter C indicates that there is a conflict in the README file. Jerry resolved the conflict but didn't tell Subversion that he had resolved it. He uses the resolve command to inform Subversion about the conflict resolution.
[jerry@CentOS trunk]$ svn resolve --accept=working README
Resolved conflicted state of 'README'

[jerry@CentOS trunk]$ svn status
M       README

[jerry@CentOS trunk]$ svn commit -m "Updated README"
Sending        trunk/README
Transmitting file data .
Committed revision 8.
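On a larger working copy it is easy to miss a C in a long status listing. As a rough sketch, the output of svn status can be filtered for conflicted paths as below. It assumes the column layout seen in the transcripts above (a status letter in the first column and the path starting at the ninth character), which may differ with other svn versions or options, and it works on captured text rather than running svn itself.

```python
def conflicted_paths(status_output):
    """Return paths whose first status column is 'C' (text conflict).

    Assumes the fixed-width layout shown in this post: status letter in
    column 1, path starting at column 9.
    """
    paths = []
    for line in status_output.splitlines():
        if len(line) > 8 and line[0] == "C":
            paths.append(line[8:])
    return paths


# Rebuild the status listing from the example with explicit column widths.
sample = "\n".join(
    "{:<8}{}".format(flag, path)
    for flag, path in [("?", "README.r6"), ("?", "README.r7"),
                       ("?", "README.mine"), ("C", "README")]
)
print(conflicted_paths(sample))
```

Every path it reports still needs an svn resolve before the commit will go through.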

Wednesday, 15 June 2016

Critical Issues with Build Automation

1. The Problem: Long Builds

The more components you add to your software, the more lines of code you maintain, and the more tests and routines you run as part of your build process, the longer the build will take to run. Also, some development environments and technologies, like C/C++, tend to be correlated with longer builds.
In Agile work environments, builds are expected to be run frequently and the organization depends on build output to guide development work. Long builds can be a big problem, and one that can “creep up” on a development team.

Why it matters:

Some builds are so long they can only be run nightly. But even if your “normal” build takes just 20 or 30 minutes more than it should – there are significant costs to your organization:
  • Developer wait time – as we showed in a survey we conducted last year, most developers spend between 2 and 10 hours per week waiting for builds to complete – more often than not, this is lost time.
  • Context switch – while the build runs, developers switch away from the current task to other ones. If the build is long, developers lose context related to the problem they were working on, which reduces productivity. In other words, if builds are supremely fast, developers never have to stop thinking about the problem they are working on right now.
  • Build bottleneck – during intensive development phases such as “integration storms”, developers sync changes into the main code line and frequent builds are needed to get everything working. The less frequent the build, the fewer problems developers can solve every workday.
  • Product quality – when a developer commits changes that break something, it will only be discovered after the build runs. When the bug is fixed, again there is a lag waiting for the next build before QA can verify the fix. The less frequent the build, the fewer issues can be fixed and verified before the release – hurting product quality.

2. The Problem: Large Volume of Builds

Sometimes, individual builds run relatively quickly. But in some organizations, there could be dozens or hundreds of builds run in each dev/test cycle. This could be because numerous teams (sometimes thousands of developers) are developing different software components, each running its own build. Or, the dev team might need to deliver numerous versions of the software for different platforms, customized builds for different customers, etc. Some organizations have relatively short builds – but find themselves needing to support dozens, if not hundreds, of these builds at any given time.

Why it matters:

If you’re a build engineer responsible for running 500 builds, you’ll feel the pain even if each of them is 10 minutes long (about 83 hours of build time without parallelization). While builds may be short, cumulatively they take very long to run.
But if you’re a developer or QA engineer using such a build system, and your build takes only 10 minutes, why should you care?
  • Limited build resources – because the organization is running large numbers of builds, you’ll find you have limited access to build servers during specific time windows, or the servers are often overloaded and builds will take much longer. When you rely on running builds often to get fast feedback and fix bugs, you’ll notice that having to “wait in line” for your builds to run hurts your productivity (particularly if you’re practicing Agile), and that you can’t move development fast enough because you’re waiting for a build server.
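The 83-hour figure above is simple arithmetic, and the same calculation shows how much queueing there is for a given number of build servers. The sketch below is a back-of-the-envelope model only: it assumes every build takes the same time and that builds are handed out evenly in rounds, and the parallel_slots parameter is my own name for the number of builds that can run at once.

```python
import math


def total_build_hours(builds, minutes_each, parallel_slots=1):
    """Wall-clock hours to run all builds, with parallel_slots builds running at a time."""
    rounds = math.ceil(builds / parallel_slots)   # batches of simultaneous builds
    return rounds * minutes_each / 60


print(round(total_build_hours(500, 10), 1))        # serial, as in the text
print(round(total_build_hours(500, 10, 10), 1))    # ten builds at a time
```

Even with ten builds running side by side, that is still more than a working day of wall-clock time, which is why the queue is felt by everyone who shares the servers.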

3. The Problem: Complex Builds

Software projects use a large number of modular components: different frameworks, components developed by different teams or by 3rd-party partners, open source libraries, and so on. As your product evolves, there are multiple versions of your own code, and also multiple versions of these many components, creating a many-dimensional matrix of dependencies and supported behaviors. That’s where things get complex to build.

Why it matters:

Complex builds reduce the flexibility of the build process and make it much more difficult to manage and run the build:
  • Complex builds are brittle – interactions between many different components often lead to manual error, broken builds and worse – builds that run correctly but introduce bugs due to partial or incorrect sources.
  • Extensive manual efforts – executing a complex build and delivering its results requires a substantial manual effort. Even if the build is automated, it is typically automated in small pieces/components, and there is no orchestration of the entire process.
  • Incremental builds are difficult – often you’ll want to run a partial build and re-purpose items that haven’t changed and were previously compiled. With complex builds – due to partially specified dependencies – an incremental run could break the build and teams are forced to run the entire build in all scenarios.
  • Legacy components and fear of change – complex builds tend to have legacy components written years ago by staff members who are no longer at the organization. This impedes changes or optimizations to the build, for fear of possibly breaking legacy components that are not well understood.
  • Complex builds are long – there is a correlation between the complexity of the build and the time it takes to run, which introduces additional issues as described above.

What Can Be Done?

If your organization experiences at least some of the problems we detailed above, the rest of this page will show you what you can do to solve them:

Quick Fixes to Improve Build Speed

In this section we’ll discuss a few common solutions to improving build speed, and take you through them step by step.
Note: In our experience the most severe long-build issues occur in C/C++, C#, .NET and related technologies. We’ll focus on two build systems used in these environments: Make and Visual Studio. But the same techniques are applicable to many other build systems.

Quick Fix #1: Upgrade Your Build Server

Faster CPUs, a newer motherboard with improved hard-disk access speed, more or faster memory, can all positively affect build time. Most importantly, the number of CPUs on your build server affects the number of parallel build jobs in a single build.

Quick Fix #2: Run Builds in Parallel on One Machine

One way to make builds go faster is to run jobs in parallel on the same machine. If you have a fast build server, or can upgrade it by adding more CPUs, this can significantly reduce build time.
We’ll cover three ways of running builds in parallel:
  1. Using the “make -j” command
  2. Running Visual Studio MSBuild projects in parallel
  3. Running MSBuild items in parallel.

What you need to know:

A few caveats regarding parallel builds:
  • Parallel builds are limited to one machine – depending on the number and complexity of the builds you are running, one machine may not be enough. Clustering the build across several physical machines is more complex, and we discuss it below in the Heavyweight Solutions section.
  • Parallel execution can break your build – in some cases, because of dependencies between different parts of the build, or between build projects, parallel builds will not run correctly. The more jobs you run in parallel, the more severe the problem will be. We explain this in more detail in each of the parallelization techniques described.

How to run builds in parallel with make -j:

  1. Run the build as usual, but add the -j option to the Make command, like this: make -j x , where x is the number of jobs you want to run in parallel. For example, make -j 2 runs two jobs in parallel.
  2. We recommend starting with 2 jobs in parallel and checking whether the build succeeds. Then gradually increase the number of concurrent jobs and see at which point the build fails. You will not be able to parallelize beyond that point – so use the highest number of jobs that can be run safely. Also, avoid running more jobs than the CPU cores you have available on the build server.
    • NOTE: Why do builds fail when the number of parallel jobs increases?
      The reason the build might fail when you increase the number of jobs is that many Makefiles have implicit dependencies. If there is an explicit dependency defined in the Makefile, the -j option takes it into account and only builds a target if its dependencies have already been built (see this Stack Overflow question for more details on how Make -j handles dependencies).
      But if there are implicit dependencies, make -j behaves unpredictably. For example, if in the original Makefile you built some header files, and then built some objects which include these header files, but you did not define this dependency in Make, the build will still run properly in serial mode (because the headers are built first and only then you build the objects that depend on them). But then when you run make -j 2, there is a possibility that some of the header files will be built in parallel to the objects that include them. Then, those objects will not have the header files they need, causing the build to fail, or even worse, to appear successful, but have broken sources included in the compilation.
      This is a simple example – there are much more complex cases, especially with recursive Makefiles, in which it is very difficult to uncover that an implicit dependency actually exists.
      In our experience, implicit dependencies are very common. Because a serial build runs fine with implicit dependencies, developers typically do not bother specifying them. In these cases, the dependency doesn’t become apparent until you attempt to start running things in parallel. It takes a lot of time and expertise to untangle a Makefile with any complexity, and explicitly define all the dependencies.
      If there are implicit dependencies, the parallel build will succeed at times, and will fail at other times – depending on which jobs happen to run in parallel. The more jobs you run in parallel, the higher the probability that an implicit dependency will be violated and the build will fail, or pass and create a broken output.
  3. If the parallel build fails, or you uncover a problem in the build output, examine the logs. It is recommended to keep a log of a serial build that succeeded, to compare with the log of the unsuccessful build.
    • NOTE: Make -j writes logs in a different order than the original serial build.
      There are several logging options – writing each line as it is executed (which will result in an interleaved log with lines from different modules/components mixed up), or grouping by targets or recursive invocations. Whichever logging option you choose, the order in the log will depend on the order in which build commands happen to run in parallel.
      This makes comparing the Make -j log to the regular serial log tricky: you will have to isolate the component that caused the problem, find it in the original serial build log, and compare them to see what went wrong.
      For more details on Make -j, see the GNU Make documentation for Parallel Execution.
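To make the idea of an implicit dependency concrete, here is a toy sketch of the header example above. It compares what a Makefile declares with what each target actually reads, and reports generated files that are used but not declared. Everything in it is hypothetical: the file names, the two dictionaries, and the assumption that you already know which files each target reads (in practice that information has to come from the compiler or a build-tracing tool). It only looks at direct dependencies.

```python
def undeclared_dependencies(declared, used):
    """Find generated files that a target reads but does not list as a prerequisite.

    declared: target -> set of prerequisites written in the Makefile
    used:     target -> set of files the target's recipe actually reads
    Only files that are themselves build targets are reported, because only
    those can be missing or stale when jobs run in parallel.
    """
    missing = {}
    for target, inputs in used.items():
        listed = declared.get(target, set())
        gap = sorted(f for f in inputs if f in declared and f not in listed)
        if gap:
            missing[target] = gap
    return missing


# config.h is generated by the build, main.c includes it, but the
# Makefile rule for main.o does not mention it.
declared = {
    "config.h": {"config.h.in"},
    "main.o": {"main.c"},
    "app": {"main.o"},
}
used = {
    "main.o": {"main.c", "config.h"},
    "app": {"main.o"},
}
print(undeclared_dependencies(declared, used))
```

In a serial build config.h happens to be built before main.o, so nothing goes wrong. With make -j the two rules may run at the same time, which is exactly the failure described in the note above.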

How to run MSBuild projects in parallel in Visual Studio:

This option is different from Make -j, in the sense that it can run entire builds in parallel on different processors, but does not allow you to parallelize individual build items. So it’s only useful if you’re running several projects at the same time.
  1. In the Visual Studio IDE, on the Tools menu, click Options.
  2. Expand the Projects and Solutions folder, and then select the Build and Run property page.
  3. In the Maximum number of parallel project builds text box, define how many projects will be allowed to run in parallel. This shouldn’t be higher than the number of CPUs on your build server.
  4. Open the solution containing the projects that you want to build.
  5. From the Build menu, select Batch Build.
  6. In the Build column, check the build configurations of the projects you want to build.
    • NOTE: Dependencies between projects could result in inconsistent builds. Visual Studio does respect project-to-project references and builds the referenced project before the referring project. However, if several projects have a shared reference, that reference is built only once and “cached” for the next times it is referenced. Also, errors and exceptions on one build do not affect the running of other builds, which may depend on the failing build. See the full list of parallel project considerations on MSDN.
  7. Click the button for the build action that you want to perform (Build or Rebuild).
  8. The project system performs the multiprocessor build action and displays the build output in the Output window.
See more details on MSDN about running parallel projects in the Visual Studio IDE, or via the command line with the /maxcpucount switch.
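For the command-line route, a build script usually has to pick the value passed to /maxcpucount. The sketch below only assembles the argument list and caps the requested parallelism at the number of cores, following the advice above; it does not run anything. It assumes msbuild is on the PATH under that name, and Example.sln is a placeholder solution file.

```python
import os


def msbuild_command(solution, max_parallel=None):
    """Build an msbuild argument list with /maxcpucount capped at the core count."""
    cores = os.cpu_count() or 1          # cpu_count() can return None
    if max_parallel is None:
        jobs = cores
    else:
        jobs = max(1, min(max_parallel, cores))
    return ["msbuild", solution, f"/maxcpucount:{jobs}"]


# Example.sln is a placeholder name.
print(msbuild_command("Example.sln", max_parallel=2))
```

The resulting list can be handed to subprocess.run on a machine where MSBuild is installed.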

How to run MSBuild build items in parallel in Visual Studio:

This option is similar to Make -j, in that it builds individual targets within the build in parallel on different processors.
  1. In the Visual Studio IDE, open Project Properties.
  2. Select Configuration Properties, then C/C++ General.
  3. Select Multi-Processor Compilation. This specifies that the project should run build items multi-threaded.
  4. Select Projects and Solutions, then VC++ Project Settings.
  5. Set Maximum concurrent C++ compilations to the number of processors you have on your machine (or the number of processors you want to run the parallel build on).
    • NOTE: Visual Studio’s multithreaded compilation flag (/MP) is not compatible with some other build options, such as incremental compilation and precompiled headers. If the build has a conflicting option, it may not be executed at all on multiple threads.
  6. Now, when you run the build, it will execute in parallel on several CPUs.
For more details, see this Stack Overflow discussion and this MSDN blog post (showing a different method).

Quick Fix #3: Build Avoidance

Another approach to improving build performance is “build avoidance”, which reduce build times by rebuilding only the pieces that need to be rebuilt, and not the whole code base. Tools like ccache and ClearCase winkins have co-opted the term build avoidance, but they are actually doing object reuse. Object reuse is when you use objects that other people have compiled in order to skip compiling those yourself. Object reuse only works in narrow scenarios, and it can be a headache to manage. A more traditional build avoidance tactic is of an incremental build from the top of the build tree. It basically means finding all the sources in the code that have been modified since the last build, and just recompiling those.
Build avoidance makes a lot of sense because, typically, development focuses on a specific module or modules and not on the entire product, so there are no changes to the rest of the product and incremental builds will run relatively fast. It’s especially useful for builds ran by the developers themselves, who are usually working on one isolated component. Therefore, it’ll be wasteful to build the full project just to test the changes of one component that comprises only a small percentage of the entire code.
Two tools commonly associated with build avoidance are Rational ClearCase and the open source CCache. A classic incremental build with Make works by looking at timestamps in build outputs: if the build runs and there is a pre-existing .o file that is older than the corresponding .c file, the .o file is out of date and needs to be rebuilt. Otherwise, the .o file is left in place (“avoiding” recompilation of that part of the software). CCache takes a different route: it hashes the preprocessed source and the compiler options, and supplies a cached object file when the hash matches a previous compilation.
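The timestamp check described above can be sketched in a few lines. This is a minimal illustration of the rule, not the code of any particular tool; the function name needs_rebuild is hypothetical.

```python
import os


def needs_rebuild(source_path, object_path):
    """Return True if the object file is missing or older than its source.

    Mirrors the timestamp rule described in the text: an output that is
    older than its input is considered out of date.
    """
    if not os.path.exists(object_path):
        # Nothing to reuse yet, so the source must be compiled.
        return True
    # Rebuild only when the source was modified after the object was written.
    return os.path.getmtime(object_path) < os.path.getmtime(source_path)
```

A real build tool applies the same comparison to every prerequisite of a target (headers included), which is where undeclared dependencies start to cause trouble.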

What you need to know:

A few caveats regarding build avoidance, before we show how to do it:
  • Not recommended for production builds – In most scenarios we have seen, dev teams did not rely on build avoidance in production. As we discussed in relation to parallel builds, there are implicit dependencies and relationships between Makefiles. In some cases, these dependencies will cause an incremental build to break. If it happens occasionally, this might be acceptable for development or CI builds, but you don’t want to deal with these issues when deploying a release into production.
  • Incremental builds might be slow, wasteful and unreliable for complex builds – The larger your build and the more heavily recursive the Makefiles, the more likely it is that build avoidance will break the build, or that the build will succeed but include broken sources. If you know your build to be complex, use build avoidance with caution, and test to make sure that the build still runs correctly when changing different parts of the project.

 Running an incremental build with CCache 

CCache is an open source tool that helps with build avoidance, but only if you use GCC or a GCC-compatible compiler such as Clang.
  1. Download and unpack the latest version of CCache.
  2. Run a full build using your regular compilation command, but prefix the compiler invocation with the word ccache (for example, ccache gcc instead of gcc). CCache will “step in” instead of the regular compiler and run the build, while caching outputs for next time.
  3. The next time you run the build, make sure the results of the previous build are still in the target directory. You will now “re-build” on top of this previous build.
  4. Run your compilation command again, with the compiler still prefixed by ccache. CCache will now detect which build sources haven’t changed and supply them from the cache, while rebuilding sources that have changed.
  5. If the build was successful, test to make sure it ran correctly. Note that every time you make changes to a different part of the codebase, you will have to re-test to make sure the incremental build did not break anything.
  6. Examine your logs to see the performance improvement with CCache compared to running the full build.
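To make the caching idea in the steps above concrete, here is a toy sketch of object reuse keyed on content rather than timestamps. It is an illustration only and not CCache’s actual implementation; ObjectCache and fake_compile are hypothetical names, and the “compiler” is a stand-in function.

```python
import hashlib


class ObjectCache:
    """Toy compile cache: reuse an output when source text and flags match."""

    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._store = {}   # cache key -> previously produced "object"
        self.hits = 0
        self.misses = 0

    def build(self, source_text, flags):
        # The key covers both the flags and the source, so changing either
        # one forces a real compilation.
        payload = "\0".join(flags) + "\0" + source_text
        key = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        obj = self._compile(source_text, flags)
        self._store[key] = obj
        return obj


def fake_compile(source_text, flags):
    # Stand-in for invoking a real compiler (assumption for illustration).
    return "OBJ(" + source_text + ")"
```

Comparing the hits and misses counters after a second build gives a rough picture of how much work the cache saved, which is the same question step 6 asks of your build logs.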

Quick Fix #4: Use a RAM Disk

As you probably know, disk access is a major bottleneck in many computing operations, especially builds which require access to files stored on your build server’s hard drive. A simple solution to overcome this bottleneck is to install a piece of software called a “RAM Drive” (available as open source), and perform the entire build operation there. This moves all your build artifacts to RAM, and can significantly reduce build time.
By the way, an alternative to using a RAM disk is running builds on a Solid State Disk (SSD) – but of course this requires investing in additional hardware.

What you need to know:

A few caveats to running builds on a RAM disk:
  • Performance improvement will depend on read/write vs. CPU intensive activity. If there are a large number of files and read/write operations, a RAM disk should yield a big improvement. But in some builds, most of the running time is due to CPU-intensive operations on the same files, without a large number of reads and writes. In this case, a RAM disk will not provide major improvement, and a better option is upgrading to faster CPUs or parallelizing the build across several cores.
  • A RAM disk is volatile and has limited space – you’ll probably need to copy build sources to the RAM disk every time you want to run a build, and after the build runs, copy the output off the RAM disk to avoid losing it if the machine shuts down, and to conserve space on the disk because memory is limited.
  • The RAM disk becomes a shared resource – in many organizations there are multiple teams running builds. If you have only one build server, the RAM disk becomes a resource that must be shared/prioritized between the teams. You’ll need to have some sort of scheduling or allocation system, and someone to manage it, because one RAM disk will not be able to accommodate numerous builds running in parallel.

 How to run your build on a RAM disk: 

  1. Find a RAM disk program that supports your build server’s environment. You should be able to find one that is freeware or open source. Here is a list of RAM drive software from Wikipedia.
  2. Install and configure the RAM disk.
  3. Copy build sources to the RAM disk and run compilation as usual.
  4. Copy build output off the RAM disk.
For a real-life example of RAM disk usage with a large build, see this article on Code Project (scroll down to Step 3).
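Steps 3 and 4 above (copy in, build, copy out) are easy to script. The following is a minimal sketch under stated assumptions: the RAM disk is already mounted at ram_disk_dir, build_cmd is whatever command you normally run, and the function name build_on_ram_disk is hypothetical. It needs Python 3.8 or later for dirs_exist_ok.

```python
import shutil
import subprocess
from pathlib import Path


def build_on_ram_disk(source_dir, ram_disk_dir, output_dir, build_cmd):
    """Copy sources to the RAM disk, build there, and copy the results off."""
    work = Path(ram_disk_dir) / "build"
    if work.exists():
        # Start from a clean tree; RAM disk space is limited.
        shutil.rmtree(work)
    shutil.copytree(source_dir, work)

    # Run the usual build command inside the RAM disk copy.
    subprocess.run(build_cmd, cwd=work, check=True)

    # Persist the results before the volatile disk can lose them,
    # then free the memory for the next build.
    shutil.copytree(work, output_dir, dirs_exist_ok=True)
    shutil.rmtree(work)
    return Path(output_dir)
```

On many Linux systems a memory-backed directory such as /dev/shm could serve as ram_disk_dir, but that is an assumption about your environment; on Windows you would point it at the drive letter created by your RAM disk software.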

Quick Fix #5: Precompiled headers in C++

In C++ projects, most .cpp files include large headers that do not change from build to build. Instead of needlessly re-compiling these lines of code with every build, it is possible to pre-compile these headers to save build time.
Precompiled headers work similarly to the technique of build avoidance described earlier. Header files that are used throughout the project are compiled once, cached, and reused for the rest of the build, saving compilation time.
Note that your compiler must provide support for that feature, and not all compilers do.

 How to precompile headers in C/C++: (in Microsoft Compiler) 

  1. Choose the headers you want to precompile, and start by compiling these headers using the /Yc compiler option (see documentation of this option in MSDN).
  2. A PCH (precompiled header) file is created. Save it for inclusion in subsequent builds.
  3. In your Makefile, include the PCH files instead of the actual sources. An alternative method is to use the /Yu compiler option (see documentation of this option). Make sure you conform to the consistency rules for using PCH files – most importantly, you must use the PCH files on the same system environment you created them on.
  4. Run the build and test to see that the precompiled headers are included correctly.
See a visual example of a build process with precompiled headers, and a sample Makefile with use of PCH files, provided by MSDN.
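The two phases above (create the PCH with /Yc, then consume it with /Yu) can be sketched as the command lines a build script would issue. This is an illustration only: the file names are hypothetical, the helper functions are not part of any tool, and the commands are built but not executed. The /c, /Yc, /Yu and /Fp options are the Microsoft compiler’s own.

```python
def pch_create_command(header, stub_source, pch_file):
    """Command that compiles stub_source and writes the precompiled header.

    stub_source is assumed to be a small .cpp file that includes header.
    """
    return ["cl", "/c", "/Yc" + header, "/Fp" + pch_file, stub_source]


def pch_use_command(header, source, pch_file):
    """Command that compiles source using the existing precompiled header."""
    return ["cl", "/c", "/Yu" + header, "/Fp" + pch_file, source]


# Hypothetical project layout, for illustration only.
create = pch_create_command("stdafx.h", "stdafx.cpp", "stdafx.pch")
use = [pch_use_command("stdafx.h", src, "stdafx.pch")
       for src in ("foo.cpp", "bar.cpp")]
```

The create command has to finish before any of the use commands start, so in a Makefile the PCH file would be listed as a prerequisite of every object that consumes it.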

Heavyweight Solutions for Long or Complex Builds

The previous quick fixes we discussed are all limited – either because they provide only a small performance improvement, or because they can cause the build to break or run inconsistently.
In this section we’ll quickly review solutions that are more complex to implement, but can provide bigger benefits in terms of build speed or consistency.

Solution #1: Distributed Builds

This is the obvious solution to long builds – distribute the problem across several physical machines. This is similar to running builds in parallel on one machine with Make -j and similar techniques, but here the build tasks are distributed across a cluster of machines, which is more complex. An open source tool typically used to do this is Distcc (note, though, that it is a limited type of distribution which only distributes compile processes).

Considerations when implementing distributed builds:

  • Complex setup and investment in hardware – a distributed build requires you to procure and set up several dedicated machines, install the distributed build software on all of them, make sure they communicate with each other correctly, and manage and update this infrastructure over time.
  • Implicit dependencies can break the build – The same limitations we discussed earlier regarding Make -j apply here: if your build has dependencies that are not explicitly defined in the Makefiles, the parallel build will fail sometimes, if by chance build items are run in parallel to their dependencies. The larger the build cluster, the higher the probability of errors in the build.
  • Shared drive becomes a bottleneck – in almost all distributed build techniques, there is a shared hard drive that the machines in the cluster use to read and write build artifacts. Very often the same files – for example, header files – are shared and used across many build items, and so are accessed in parallel by several machines in the cluster. We’ve worked with several development teams who have attempted “DIY” distributed builds, and have seen that at some point (usually around 10 machines) the shared disk access becomes a bottleneck and it’s impossible to scale further.
  • Clock synchronization can break the build – Make depends heavily on timestamps to manage the build process. When running on several machines, small differences in clocks between the machines can lead to a wrong decision. For example, if a target file was built on a machine whose clock is 2 minutes behind, another machine might find this file and decide it is “old” and should be built again, resulting in major inconsistencies. This issue can be resolved by precisely synchronizing the clocks of all machines participating in the parallel build.
  • Node failure breaks the entire build – for example, if one of the machines in the cluster is restarted, or its operating system crashes, the build items belonging to that machine won’t run, and the build will likely fail.
  • Overhead of invoking jobs – the time taken to invoke jobs (e.g. with ‘rsh’) can become prohibitive as the cluster grows, and will cancel out some of the performance improvement.
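The clock synchronization problem in the list above is easy to reproduce on paper. The sketch below uses made-up times in seconds and hypothetical helper names; it only models the timestamp comparison, not a real distributed build.

```python
def recorded_mtime(true_time, clock_offset):
    """Timestamp a machine stamps on a file, given its clock error in seconds."""
    return true_time + clock_offset


def is_stale(target_mtime, source_mtime):
    """Timestamp rule: a target older than its source must be rebuilt."""
    return target_mtime < source_mtime


# The source is edited at t=1000 on a machine with an accurate clock.
source_mtime = recorded_mtime(1000, 0)

# The target is built one minute later (t=1060), after the edit.
built_in_sync = recorded_mtime(1060, 0)        # builder clock is accurate
built_with_skew = recorded_mtime(1060, -120)   # builder clock is 2 minutes behind

in_sync_decision = is_stale(built_in_sync, source_mtime)
skewed_decision = is_stale(built_with_skew, source_mtime)
```

With synchronized clocks the target is correctly seen as up to date. With the builder’s clock two minutes behind, the target is stamped 940, which is earlier than the source’s 1000, so it looks stale and gets rebuilt even though it was produced after the edit.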

Solution #2: Manually Partitioning Makefiles

Some organizations have taken the extreme step of manually breaking up a build into “components” – a small number of self-contained steps that can be run in parallel on different machines or in different stages of the development process. This is different from distributed builds in which the same build structure runs on multiple machines, resulting in consistency problems. Here, Makefiles are first re-architected to enable them to run separately and still build correctly.

Considerations when partitioning Makefiles:

  • Requires refactoring your entire build process – Makefiles need to be rewritten and re-organized, which can be difficult and error-prone. If you have a large system of legacy Makefiles, you will need to analyze exactly how they are built and which dependencies exist, and untangle them – often amounting to a massive undertaking.
  • Requires refactoring your code base – because the build will now be running in several completely separate components, source code needs to be re-architected as well to make sure there are no includes or dependencies in the code between the different components.
  • Requires specialized expertise – you will need staff with advanced knowledge of Makefile internals to re-architect the build. This is a special skill set that most developers and build engineers do not have.
  • Typically yields only a small speedup – because of the difficulty involved, Makefiles will be partitioned into a small number of components, limiting the ability to parallelize or scale up the build.
  • Requires ongoing maintenance – after you artificially break up the build – and effectively your entire software project – into components, you’ll need to make sure no future changes violate this partitioning. This introduces complexities in ongoing development and requires supervision by the build team to make sure there are no cross-component dependencies.

Solution #3: Optimizing Makefiles

In this solution, the build is kept in one piece, and the Makefiles are rewritten to make them run more efficiently. This is a relatively smaller effort than partitioning Makefiles, because the overarching structure of Make remains largely the same and refactoring of the codebase isn’t typically required. Even so, Makefile optimization is a “black art” that only a few developers have mastered.
To achieve meaningful performance gains, you will need detailed information on what is wrong in the build process and exactly where the bottlenecks lie, which requires an in-depth analysis of the Makefiles and their explicit/implicit dependencies. At the outset of the project you will not know the underlying issues or the performance improvements you could achieve. But in some cases Makefile optimization can yield a major performance improvement.

Considerations when optimizing Makefiles:

  • Requires specialized expertise and a major effort – make sure you have someone on staff who is proficient in Makefile internals and has a lot of time to devote to the project.
  • You need to know where the problem lies – this is often the most difficult part of Makefile optimization, especially in large and complex builds.
  • Requires ongoing maintenance – even after Makefiles are optimized and a performance improvement is achieved, the build process will tend to “drift” towards an un-optimized state, as new build artifacts and new Makefiles are introduced. From time to time, the Makefiles will need to be optimized again as new inefficiencies are introduced.

Solution #4: Unity Builds

A “Unity Build” is a build that includes all .cpp files into a single compilation. We have seen cases in which this method provided major improvements in build speed.
Unity builds are faster because they avoid reparsing common headers across compilations. For example, if you have two compile steps for “foo.obj” and “bar.obj” and they both include “MyMonsterHeader.h”, and those two compiles happen separately, then your build as a whole ends up reading and (critically) *parsing* MyMonsterHeader.h twice. If you slurp that all into a unity build, and you’ve done everything else correctly, then that header is only read and parsed once, no matter how many source files include it.
For more details see this blog post and screencast by OJ Reeves, and several other approaches in this Stack Overflow discussion.
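One common way to set up a unity build is to generate a single .cpp file that includes every other .cpp file, and compile only that file. The helper below is a minimal sketch of such a generator; the function name and file layout are hypothetical, and it assumes each shared header is protected by include guards or #pragma once so that it is parsed only once.

```python
from pathlib import Path


def write_unity_file(source_dir, unity_path):
    """Write a unity source file that #includes every .cpp under source_dir.

    Returns the list of source files that were included.
    """
    unity_path = Path(unity_path)
    sources = sorted(
        p for p in Path(source_dir).rglob("*.cpp")
        # Never include the unity file in itself.
        if p.resolve() != unity_path.resolve()
    )
    # Absolute, forward-slash paths keep the includes independent of where
    # the compiler is invoked from.
    lines = ['#include "{}"'.format(p.resolve().as_posix()) for p in sources]
    unity_path.write_text("\n".join(lines) + "\n")
    return sources
```

You would then hand only the generated file to the compiler. Because all sources now share one translation unit, file-local names (static functions, anonymous namespaces, macros) from different .cpp files can collide, which is one reason the code base usually needs changes.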

Considerations when implementing unity builds:

  • Requires refactoring your entire build process – Makefiles need to be rewritten and re-organized to support the unity build structure. For complex builds this can be an enormous effort.
  • Requires refactoring your code base as well – Unity builds require changes to actual source files. So if a .cpp file includes a certain header, that header will probably not be in the same place or the same structure. You will need to revise large portions of the code base to make sure nothing breaks in the transition to a unity build.
  • A cheap alternative is a RAM disk – Our explanation on how to run your build on a RAM disk addresses the same issue of the I/O bottleneck. While it provides a smaller performance improvement, it is also easier to implement by an order of magnitude.
  • Also note that Unity builds basically break your ability to do incremental builds or build avoidance, because everything is recompiled in one shot.

Huddle – Electric Cloud’s Free Build Acceleration Technology

If you’ve read this far, you’ve learned a lot about traditional solutions to the challenges of long, numerous or complex builds. Most of the traditional approaches are either limited in the performance boost they provide, or may result in broken builds, especially in more complex scenarios.
Electric Cloud’s build acceleration solutions have been in use for over a decade in organizations like Dell, Ericsson, Oracle and VMware – helping them tackle the most complex build scenarios. We’ve recently launched Huddle – a free-forever version of our ElectricAccelerator product. Huddle gives you free build acceleration WITH automatic dependency detection – to ensure build accuracy with no overhead or development changes on your end.
  • Get 20X build acceleration for free – even for the most complex builds.
  • Based on our proven ElectricAccelerator technology.
  • Guaranteed accuracy of the build – Parallelization can expose dependency problems and result in incorrect or broken builds. Our patented dependency detection technology discovers dependencies in Makefiles, even if they are not explicitly defined, and makes sure that even the most complex builds run correctly every time.
  • Seamless, automatic build parallelization – Huddle automatically parallelizes the build across clusters of physical or virtual machines, with no need to configure, monitor and maintain the build cluster.
  • No need for a dedicated data center – Huddle taps into your team’s unused CPU capacity to form a virtual build acceleration pool. You can add or remove machines from the build cluster instantly, with no change to configuration.
  • High-performance, distributed, virtual file system – Huddle stores build files in a virtual file system, overcoming the shared-disk bottleneck of most traditional approaches.
  • Visualization of build process – Huddle monitors and reports on all your build and test jobs, so you know which builds are running, how long they take, their results, and more.
  • Plugs into existing tools – Huddle works seamlessly with your existing build tools, including GNU make, NMAKE, Visual Studio, CMake, Jenkins and Bamboo.