Ambrose Software Inc.

Secure Groovy Script Execution in a Sandbox

Charles Chan — Thu, 18 Mar 2021 23:41:30 GMT

Lately, I was looking into executing untrusted Groovy Script in a sandbox. I was never a security guy, so the process was a valuable learning experience. I will use this article to summarize my findings.

Java Sandbox

At a high level, a Java sandbox relies on the following three components working together:

Bytecode verifier
ClassLoader
Security Manager and its policies

The bytecode verifier ensures that the compiled class files are valid and do not exploit the virtual machine. Class loaders are created hierarchically. If you imagine a tree of class loaders, the “leaf” class loaders can only access their classes and their parents’ classes on the same branch. Using this structure, you can execute code in a restricted environment by using a custom class loader. Finally, a security manager provides runtime permission checks based on a custom policy file.

Project Requirements

As everyone knows, there are tradeoffs between convenience and security. Finding the right balance requires discussions and collaborations with all key stakeholders. Before we dive into the article, let us assume that these are the facts we are dealing with:

Today, Groovy scripts are written for a very specific context. There are hundreds of them. They are executed in the same VM as the main program and they have access to classes from the main program. In other words, they are trusted.
Going forward, that specific context may be opened up to untrusted parties. We need a plan to secure the environment to run untrusted Groovy script in the future.

Given these facts, we have decided to focus on runtime permission checks and optionally source level checks using a Security Manager and various Groovy compiler customizations.

Steps to Secure the Groovy Script Execution environment

Step 1: Security Manager and Policy File

First and foremost, we need to install a security manager to perform runtime permission checks. You don’t want a Groovy script to invoke System.exit(0) if it is running in the same VM. A security manager works in tandem with a policy file, which declares the permissions granted to each codebase. A codebase is basically a URL that describes where the code is loaded from. Fortunately, every Groovy script run by the Groovy interpreter has either the file:/groovy/script codebase or the file:/groovy/shell codebase. So, a policy file that grants all permissions to your running program but none to Groovy scripts would look like this:

grant codeBase "file://<your jar file>" {
  permission java.security.AllPermission;
};
grant codeBase "file:/groovy/shell" {};
grant codeBase "file:/groovy/script" {};

Once we have the policy file, we can apply it by adding the following command line arguments:

-Djava.security.manager -Djava.security.policy=<policy file>

Step 2: Managing Policy Violations

If you are one of the lucky few working on a greenfield project, there is no need to worry about this. But if you are, like me, trying to secure an existing environment, there are bound to be exceptions. The existing Groovy scripts may require permissions that are not granted in the policy file.

Instead of changing the policy file to grant the missing permissions, I suggest you make use of the AccessController.doPrivileged call. This method executes a block of code in a “privileged” manner: permission checks stop at the code that calls doPrivileged, so less-privileged callers further up the stack (such as the script) are not checked. It is important that this block of code be as small as possible. The reason I advocate for this approach is that a method call is easily identifiable and can therefore be rectified in the near future.

Next, we will look at the specific measures that Groovy provides to help secure script execution.

Step 3: Groovy Binding Restrictions

Groovy Binding is a mechanism to pass variables into a Groovy script. If the Groovy script is untrusted, we must control what can be passed into the Groovy script. This is especially true because not all 3rd party library APIs are protected by permission checks.

Step 4: Apply Groovy Script Compiler Configurations

A Groovy script compiler can also be customized with different compilation customizers. For example, an ImportCustomizer can restrict what a Groovy script can import, and a SecureASTCustomizer attempts to limit which constructs a Groovy script can use. Unfortunately, both of them can be worked around quite easily, as described in this article: https://kohsuke.org/2012/04/27/groovy-secureastcustomizer-is-harmful/ (by Kohsuke Kawaguchi from Jenkins). It is a good article to read to understand the limitations of these customizers.

In the end, Mr. Kawaguchi was kind enough to publish the Groovy sandbox project on GitHub: https://github.com/jenkinsci/groovy-sandbox and the Script Security Plugin (Jenkins specific): https://github.com/jenkinsci/script-security-plugin. These two projects are definitely worth looking into for possible integration into your project.

Other Considerations

Besides the above considerations, you also need to manage how much CPU and memory a Groovy script can consume. If this is a concern, you can execute the Groovy script in a different thread from your main program and use a monitoring thread to kill the Groovy script thread if necessary.

Running your Groovy script in a separate container can provide further protection, but it is more costly to operate and cannot be easily retrofitted into an existing implementation.

Final Thoughts

Executing untrusted code is by definition a dangerous proposition. Your options to secure the execution environment may be limited by an existing implementation. No two projects are the same. By summarizing my findings here, I hope you can also benefit from them.

PS

I learned many things during this exercise. It is hard to incorporate them into the article’s flow. So, I will list them here instead:

The Java Security Manager is a system-level setting. To control permission checks on a per-thread basis, you must write your own Security Manager and use a ThreadLocal to make the permission checks conditional.
Similarly, a custom Security Manager can be used together with a custom class loader to provide more flexibility.
Java access control is based on codebase, i.e. where the code is loaded from. I found that a Groovy script running inside a JUnit test does not have the file:/groovy/shell or the file:/groovy/script codebase. Instead, it inherits the JUnit class’s codebase. I spent endless frustrating hours debugging something that should have worked.
Similarly, during development, a Spring Boot application running with the Maven Spring Boot Plugin has a different code base. You should grant permissions to both: ${user.dir}/target/classes and ${user.home}/.m2/repository
Policy can be added programmatically as well as using a policy file.

Migrate Your Flutter App to Null Safety

Charles Chan — Sat, 23 Jan 2021 17:04:00 GMT

Original article was published on Medium.

When I read about Flutter/Dart’s null safety beta, I was really excited to give it a try. This article talks about my experience migrating my Flutter Web application to fully embrace this exciting new feature.

Let’s get started.

Upgrade Dependencies

I started by building my old project “Carry On Baggage Allowance Calculator” from scratch (Old as in late 2020). Here’s a fragment of my pubspec.yaml :

environment:
  sdk: ">=2.7.0 <3.0.0"

dependencies:
  flutter:
    sdk: flutter
  flutter_localizations:
    sdk: flutter
  json_annotation: ^3.1.0
  intl: ^0.16.1
  yaml: ^2.2.1
  ...

With a fresh install, I did a % flutter pub get and was immediately greeted with an error:

Because luggage_finder_app depends on flutter_localizations any from sdk which depends on intl 0.17.0-nullsafety.2, intl 0.17.0-nullsafety.2 is required.
So, because luggage_finder_app depends on intl ^0.16.1, version solving failed.
Running "flutter pub get" in luggage_finder_app...
pub get failed (1; So, because luggage_finder_app depends on intl ^0.16.1, version solving failed.)

Although it was claimed that null safety is an opt-in feature, a fresh install still somehow forced me to upgrade my dependency. Anyway, I was glad that the error message was very useful and the change seemed easy. So, I replaced the line:

intl: ^0.16.1

with:

intl: ^0.17.0-nullsafety.2

Now my % flutter pub get was clean and the app worked fine without any changes. That’s great! For any kind of migration exercise, it is always a good idea to start with a working baseline.

But hey, you asked, we haven’t turned on null safety yet! Yes, you’re right. Don’t celebrate too early. Change the line:

environment:
  sdk: ">=2.7.0 <3.0.0"

to:

environment:
  sdk: ">=2.12.0 <3.0.0"

We were officially in the null safety mode now. But we were not done with dependency upgrade yet. Looking through ALL of our dependencies, we noted a few of them have already published their null safety versions. Let us upgrade them as well. For example, I also upgraded:

json_annotation: ^4.0.0-nullsafety.0
yaml: ^3.0.0-nullsafety.0

If any of your dependencies provides a builder, don’t forget to run it again (and after every code change that may affect the result):

% flutter packages pub run build_runner build

As soon as you changed the environment to >=2.12.0 , your IDE should immediately flag null safety issues. Let’s tackle them one by one here:

Constructor Changes

Let’s start with fixing the constructors:

Migrating a constructor to null safety

Kudos to the Dart/Flutter team. The error message I saw above was concise and easy to understand.

The first parameter Key is a parameter to the parent StatefulWidget class, let’s take a look at what it expects:

abstract class StatefulWidget extends Widget {
  /// Initializes [key] for subclasses.
  const StatefulWidget({ Key? key }) : super(key: key);

Do you notice the ? next to Key ? It means that the parameter is nullable. So, we can just denote the same in our own constructor parameter.

The next parameter, appDataUrl , must not be null because that is where the application retrieves the data it needs to function. To enforce a non-null parameter, we changed the @required annotation to the required keyword. So, the modified constructor looked like this:

class HomePage extends StatefulWidget {
  HomePage({Key? key, required this.appDataUrl})
    : assert(appDataUrl != null),
      super(key: key);

Map Access

The next error I saw was related to accessing elements inside a map.

Migrating a map access to null safety

Although it wasn’t mentioned here or there, it makes sense that a map may not contain an element for the given key and must therefore return null. Since I knew an element would always be found for the key, I simply marked the result as non-null by adding ! at the end:

airlines.forEach((airline) {
  selectedAirlineLuggageConstraints[airline] =        
    appData.airlineLuggageConstraints[airline]!;
});

True Nullable

Ideally, we don’t ever want to deal with nulls. However, there are cases where having a null object makes sense. Case in point: Element retrieval from a map as seen above. So, how do we denote that it’s okay to return null? Here’s my next error to fix:

Migrating a true nullable to null safety

Again, kudos to the Dart/Flutter team, the error message is easy to understand. Assuming you are not going to change the logic of your original program, the change is easy. Btw, it is important NOT to change any program logic during migration. This helps you identify issues caused by the migration rather than by your logic changes.

Simply add a ? to tell the compiler that a null value can be returned from the method (I also took the opportunity to mark my function variables as nullable):

DimensionConstraint? _findSmallestFitting(
  List<DimensionConstraint> dimensionConstraints) {
    double? width, height, depth, weight;
    if (dimensionConstraints.isEmpty) {
      return null;
    }

Changing a function signature almost always has a cascading effect on the call sites. In my case, it generated the following errors.

Simply append ? to the variable to denote a nullable value:

final DimensionConstraint? maxPersonalItem =      
  _findSmallestFitting(personalItemConstraints);

Callback Function

A side effect of null safety is that function signatures are also checked more tightly. For example, here’s my next error:

Migrating a callback function to null safety

It was compiling and working before, but now, it’s complaining not just about null safety but about the type entirely. To fix this, we can follow the recommendation of the error message and change the type:

final void Function()? onPressed;

@override
Widget build(BuildContext context) {
  return OutlineButton.icon(
    shape: RoundedRectangleBorder(
      borderRadius: BorderRadius.all(Radius.circular(10))),
      onPressed: this.onPressed,

Operator Overloading

Most of our changes so far are straightforward and can be understood very easily. However, don’t get carried away, especially when you are dealing with null-aware operators and operator overloading. Look at my following error as an example:

Migrating an overloaded operator to null safety

The original code above clearly indicates that the variable name can be null; otherwise, what is the point of using name?.map to call the map function? However, the Dart compiler did not flag the variable name as nullable. Let’s take a look at why.

It turns out that the Yaml 3.0.0-nullsafety.0 version has this line:

@override
dynamic operator [](key) => nodes[key]?.value;

Hmm.. So, if the nodes[key] is null, it should return a null value. But the Dart compiler let it pass as if it will always return a non-null value. Is it because we cannot express nullable result in an overloaded operator? If you know the answer, let me know.

Anyway, the correct fix should be a combination of ? to denote all nullable variables.

final YamlMap? name = item['name'];
final Map<String, String>? nameMap =
  name?.map((locale, value) => MapEntry(locale, value));

Clean Up

After you are done with the conversion, there are cases where a null check is no longer necessary. So, let’s do some clean up. For example:

Clean up unnecessary null assertions

Let’s change this to:

Dimension(this.width, this.height, this.depth, this.weight);

and my weight variable (which is nullable) to:

final double? weight;

Wow! Good riddance. This is what we want to see. Clean and concise code without worrying about runtime null safety.

Conclusion

In this article, I demonstrated how to migrate my pet project to use the null safety feature. Kudos to the Dart/Flutter team, the error messages are concise and easy to follow.

As a programmer, switching to null safety gives you extra confidence about your variables like type safety does. As you migrate, you might also uncover potential bugs that didn’t catch your eyes before.

I was very excited to see null safety landed in the Dart/Flutter land. As this article said, this is a major milestone for the language and the community.

Ghost & Gatsby — A perfect blogging platform

Charles Chan — Fri, 04 Dec 2020 22:35:18 GMT

Original article was published on Medium.

Do you want to have a blogging platform that is both beautiful and manageable? In the past, you were limited to the clunky WordPress or the snappy but not so user-friendly static site generators. Recently, however, combining Gatsby and Ghost has finally brought about the revolution we need to take personal blogging to the next level.

Gatsby

Gatsby's command line interface to generate static HTMLs (Image by author)

Gatsby is basically a static site generator. It is often compared with Jekyll and Hugo. The primary difference is the language they are written in. Gatsby is written in JavaScript using the React library. So, if you are a React developer, it's the easiest to pick up.

What these site generators have in common is that they all take source files in a certain format (e.g. Markdown) and pre-render them into HTML and CSS. The result is a highly optimized website that is built for speed and SEO (Search Engine Optimization).

What's the big deal? You may ask. Writing HTML is easy enough. However, consider this: if your website serves any images at all, you will want to optimize them so that they look their best on any device size. Now, you are entering the realm of responsive images, something not for the faint of heart. Gatsby does it all for you! 🎉

I have been using Gatsby for a little over two years and I have a love-hate relationship with it. On one hand, it is very handy and it produces excellent results. On the other hand, it is lacking a lot in terms of management capabilities. You are basically dealing with Markdown files and JavaScript files. There is also no concept of a lifecycle of a post (i.e. draft vs scheduled vs published).

Ghost

The Ghost Post Management Screen (Image by author)

Ghost is a modern publishing platform that provides a hosting service. Think of it like a personal Medium. To use their hosting solution, you do have to pay for it. However, the generous Ghost community provides the source code for free, and they recently released a game-changing feature that is reshaping personal blogging. This feature allows you to use Ghost as a headless CMS while keeping your favorite front end (i.e. static site generator).

In the remainder of this article, I will talk about this integration.

Integrating Gatsby and Ghost

Gatsby and Ghost Integration (Image by author)

In this diagram, you can see the relationships among the different systems. Each component provides a crucial functionality. My setup looks like this, where go-rest-repeat is the root of my repository.

Gatsby + Ghost folder structure (Image by author)

Ghost Server

To install the Ghost server locally, use the following commands:

$ npm install ghost-cli@latest -g

Then in an empty folder (in my case ghost), type:

$ ghost install local

This will install the current version of Ghost under the folder. Go to the URL: http://localhost:2368/ghost to continue with your setup. I suggest a minimal setup for now as we are still building the system.

While you are there, make sure you also create a Custom Integration point so that you get the Content API key and the Admin API key.

Custom integrations for Gatsby (Image by author)

We can now move onto Gatsby setup.

Gatsby

Since we are using the Ghost server as a headless CMS, the look and feel of our pre-rendered website is really detached from Ghost itself. In order to preserve as much of the Casper theme as possible, I picked the styxlab/gatsby-starter-try-ghost starter. At the root of my repo, I typed:

$ git clone https://github.com/styxlab/gatsby-starter-try-ghost.git gatsby

API Keys

The starter is set up to retrieve content from Ghost using GraphQL. To set the location of the Ghost server, modify the .ghost.json file to include the API keys you have just generated:

{
  "development": {
    "apiUrl": "http://localhost:2368",
    "contentApiKey": "86b94101df2a031b9aaacb0ab9"
  },
  "production": {
    "apiUrl": "http://localhost:2368",
    "contentApiKey": "86b94101df2a031b9aaacb0ab9"
  }
}

Members Plugins

Since we are not running on the Ghost infrastructure, we don't have the membership functionality. So, we need to remove the members plugin:

$ npm uninstall gatsby-theme-ghost-members

Then comment out the same plugin in the gatsby-config.js file.

Image Plugins

To host your own images, you need to use the gatsby-rehype-inline-images plugin. Use the following command to install it:

$ npm install gatsby-rehype-inline-images

Then, add the plugin into your gatsby-config.js file:

plugins: [
    
    {
        resolve: `gatsby-transformer-rehype`,
        options: {
            filter: node => (
                node.internal.type === `GhostPost` ||
                node.internal.type === `GhostPage`
            ),
            plugins: [
                {
                    resolve: `gatsby-rehype-ghost-links`,
                },
                {
                    resolve: `gatsby-rehype-prismjs`,
                },
                {
                    resolve: `gatsby-rehype-inline-images`,
                },
            ],
        },
    },
    
]

That is all the setup you need on Gatsby. When you run gatsby build or gatsby develop next time, it will query the Ghost server, retrieve all of the posts, download images, and pre-render the static HTMLs into the /public folder.

Netlify

Last but not least, we will deploy our shiny website onto a CDN. We are grateful to have Netlify hosting our content for free. First, let's install the Netlify command line interface:

$ npm -g install netlify-cli

When you are in your project folder, log in to Netlify with:

$ netlify login

Finally, deploy to Netlify ( -p means production):

$ netlify deploy -p

Netlify will proceed to ask you a few questions. Once you are done answering, your site will be uploaded and reachable on the internet.

Tips

Remove unused images

Ghost accumulates uploaded images even when they are not used. So, before I back up, I run the ghost-purge-images command to remove unused images. You can install this tool using:

$ npm install -g ghost-purge-images

Use the following command to purge these unused images:

$ ghost-purge-images purge --content-key=<content API Key> --admin-key=<Admin API Key>

Backing up your work

You don't want to lose your hard work in case something goes wrong with your computer. Make sure everything under the ghost/content folder is included in your backup solution. I just make sure it's included in my repository.

Conclusion

The combination of Gatsby, Ghost, and Netlify provides a free personal blogging platform. The ease of use outweighs the small up-front setup time. For bloggers who want complete control, there is a fear of being locked into the Ghost database. For other people, however, this can be a perfect setup.

A Text Summarizer in Rust

Charles Chan — Thu, 26 Nov 2020 00:27:03 GMT

Original article was published on Medium.

Motivation

I read a lot of online news sites to help me catch up with the latest stories. From time to time, I found that some news stories from different sources are simply rephrasing each other. This happens sometimes with technical articles too, but I digress.

This love of reading news stories led me to the idea of text summarization. Wouldn’t it be great if I could summarize across multiple news sources and gather all the information in one place?

I have another goal in mind. I love computer languages. Rust has been on my radar for a long time but I have never come up with a good excuse to learn to use it. I figured that a data-processing program like a text summarizer could be a good fit for learning the gory details of memory ownership.

Text Summarization in Short

There are basically two techniques when it comes to text summarization: Abstractive and Extractive.

Abstraction-based summarization takes words from the original article based on semantic meaning. It can sometimes pick words from another source if the words fit the meaning. The idea is not unlike how a human would summarize a piece of text. As you can imagine, this is not an easy problem and almost certainly requires some form of machine understanding.

Extraction-based summarization takes a different approach. Instead of trying to understand the underlying text, it uses mathematical formulas to rank each sentence in the article and outputs only the sentences that score above a certain threshold. This way, the meaning of the original text is mostly preserved without coding the machine to understand it.

In this article, we will use an extraction-based summarization technique. It’s perfect for an individual developer, like me (or you), to experiment with and to appreciate the problem.

Let’s Start Programming!

Programming is fun. Learning a new language along the way is extra fun. It’s no secret that Rust can be intimidating. That’s okay. It only means that more people are there to help because we’ve all been there.

I don’t claim to be a Rust expert. In fact, this is my first meaningful Rust program. So, if you find anything unorthodox, please let me know! With that said, let’s start programming.

Breaking Down the Problem

As with any problem in the world, it is always a good idea to break down the problem into smaller ones and conquer each one separately (and savour successes along the way). With text summarization, I can think of the following subproblems:

Turning a paragraph into sentences and words
Calculate sentence similarity
Sentence ranking
Putting it all together

Turning a paragraph into sentences and words

Now, this seems like an easy problem to solve… until you realize that maybe you want to also consider languages other than English. When it comes to non-English languages, you really have only one option: use a Unicode library. Luckily Rust has exactly that. It’s called unicode-segmentation . Let’s go ahead and add that into our Cargo.toml file:

[dependencies]
unicode-segmentation = "1.3.0"

Once the library is installed, we can use the following code fragment to get the sentences and words out of a paragraph.

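// Brings unicode_sentences() and unicode_words() into scope for &str.
use unicode_segmentation::UnicodeSegmentation;

// Split the text into sentences, then split each sentence into words.
let sentences = text.unicode_sentences().collect::<Vec<&str>>();
let mut sentences_and_words = vec![];
sentences.iter().for_each(|&sentence| {
  let words = split_into_words(sentence);
  sentences_and_words.push(words);
});

// Collect the words of one sentence; unicode_words() skips punctuation and whitespace.
fn split_into_words(sentence: &str) -> Vec<&str> {
  let mut result = vec![];
  let words = sentence.unicode_words();
  for word in words {
    result.push(word);
  }
  result
}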

In the code above, you should notice a few Rust features that stand out:

Memory Ownership

Did you notice the little & symbols? In Rust, memory is managed by a set of ownership rules. At any point in time, each value (a piece of memory) can only be owned by one variable. If you need to share the value with other parts of the program, you have to let them borrow it. The syntax for borrowing is the & sign. For more information, refer to the Rust Documentation.
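
To make the difference between borrowing and moving concrete, here is a minimal sketch using only the standard library. The function names (count_words, consume, ownership_demo) are made up for illustration and are not part of the summarizer.

```rust
/// Borrows the string slice; the caller keeps ownership.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Takes ownership of the String; the caller can no longer use it afterwards.
fn consume(text: String) -> usize {
    text.len()
}

/// Demonstrates a borrow followed by a move.
fn ownership_demo() -> (usize, usize) {
    let article = String::from("Rust makes ownership explicit");
    let words = count_words(&article); // borrow: `article` is still usable
    let bytes = consume(article); // move: `article` is gone after this line
    // count_words(&article); // would not compile: value used after move
    (words, bytes)
}
```

Calling ownership_demo() returns (4, 29): four words and 29 bytes. Uncommenting the last call to count_words makes the compiler reject the program, which is exactly the kind of mistake the ownership rules are designed to catch.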

Mutability

The mut keyword means that the sentences_and_words vector (yes, vec! is a macro to create a Vector object) is mutable. I like languages where the default for a variable is immutable. Mutability should be an opt-in, not the default.

Closure

Another feature is the use of a closure in |&sentence| {...} . Finally a language that doesn’t use => or -> for closures. I do hope that Rust would adopt either -> or => to reduce the transition effort, but hey, I am not a language designer.

String vs str

Did you also notice that we are using str (or specifically &str ) instead of the String type that is more common in other languages? In fact, Rust also has a String type. The relationship between str and String is similar to that between C++’s char* and std::string . It takes some getting used to for developers coming from Java or JavaScript, where a single String type is used. Luckily, converting between the two types is relatively pain-free (but not always automatic, unfortunately).
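
Here is a small sketch of the conversions in both directions, using only the standard library. The function names (shout, conversions) are hypothetical and only serve the illustration.

```rust
/// Accepts a borrowed string slice and returns a newly allocated, owned String.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn conversions() -> (String, usize) {
    let borrowed: &str = "hello"; // a string literal is a borrowed slice
    let owned: String = borrowed.to_string(); // &str -> String allocates a copy
    let view: &str = owned.as_str(); // String -> &str is a cheap borrow
    // Passing `&owned` where a &str is expected also works (deref coercion).
    (shout(&owned), view.len())
}
```

Going from &str to String always needs an explicit call such as to_string(), because it allocates. Going the other way is the cheap direction, which is why function parameters are usually declared as &str.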

Trait

Oh, one more thing. Did you notice that once we have installed the library, the Rust string suddenly got a couple of new methods, e.g. text.unicode_sentences() and sentence.unicode_words() ? This is a really powerful feature called traits. Considering that Rust aims for zero-cost abstractions, this is quite an achievement.
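
To show how a library can add methods to an existing type, here is a minimal sketch of an extension trait. WordCount and word_count are hypothetical names invented for this example; they are not part of unicode-segmentation.

```rust
/// A hypothetical extension trait, similar in spirit to what a library can provide.
trait WordCount {
    fn word_count(&self) -> usize;
}

/// Implementing the trait for `str` adds the method to every string slice.
impl WordCount for str {
    fn word_count(&self) -> usize {
        self.split_whitespace().count()
    }
}

fn trait_demo() -> usize {
    // The method is available because `WordCount` is in scope here.
    "traits extend existing types".word_count()
}
```

The new method only exists where the trait is in scope, which is why a library's trait has to be imported with a use statement before its methods show up on &str.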

Calculate sentence similarity

Now that we have the basic data structure, we can move on to the next problem: Calculate sentence similarity. Before we dive into the code, let’s introduce a simple concept: cosine similarity.

Cosine Similarity

Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space. It measures the cosine of the angle between two vectors projected in a multi-dimensional space. The smaller the angle, the higher the cosine similarity.

Now, what does it mean for text summarization? What is in this vector? Given that we are trying to find how similar two sentences are, it probably makes sense to store in the vector the frequency of each word as it appears in each sentence.

Sentence vector

If you are OK with that, let’s go ahead and build the sentence vector. Instead of going through every line of code, let’s focus on the following function:

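use std::collections::BTreeSet;

// Build a word-frequency vector for one sentence, indexed by the sorted word set.
fn get_sentence_vector(sentence: &[&str], all_words_lc: &BTreeSet<String>, stop_words: &[&str]) -> Vec<usize> {
  let mut vector: Vec<usize> = vec![0; all_words_lc.len()];
  for word in sentence {
    let word_lc = word.to_lowercase();
    // Stop words are skipped, so their slots stay at 0.
    if !stop_words.contains(&word_lc.as_str()) {
      // unwrap() is safe only if all_words_lc contains every word of the sentence.
      let index = all_words_lc.iter().position(|x| x.eq(&word_lc)).unwrap();
      vector[index] += 1;
    }
  }
  vector
}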

This function takes 3 arguments:

A sentence (which is a list (or slice in Rust) of words &str )
A set of words in lower case all_words_lc . This is a set of words that are collected from the 2 sentences we are going to compare.
A list of stop words (words that are not important, in English, they are like ‘a’, ‘this’, ‘is’, etc.).

Inside the function, it calculates the frequency of occurrence of a word inside the sentence. The resulting vector has the same size as the all_words_lc set. The number in each element represents the number of times the word at that index in the all_words_lc set appears in the sentence.
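
As a worked example, here is a self-contained sketch that builds the vectors for two short sentences. The get_all_words_lc helper is not shown in this article, so the version here is only my assumption of what it does (the lower-cased union of both sentences' words); get_sentence_vector is repeated so the example compiles on its own, and vector_demo is a made-up name.

```rust
use std::collections::BTreeSet;

/// Assumed helper: the lower-cased union of the words of both sentences.
fn get_all_words_lc(s1: &[&str], s2: &[&str]) -> BTreeSet<String> {
    s1.iter().chain(s2.iter()).map(|w| w.to_lowercase()).collect()
}

/// Same logic as the function above, repeated to keep the example self-contained.
fn get_sentence_vector(sentence: &[&str], all_words_lc: &BTreeSet<String>, stop_words: &[&str]) -> Vec<usize> {
    let mut vector = vec![0; all_words_lc.len()];
    for word in sentence {
        let word_lc = word.to_lowercase();
        if !stop_words.contains(&word_lc.as_str()) {
            let index = all_words_lc.iter().position(|x| x.eq(&word_lc)).unwrap();
            vector[index] += 1;
        }
    }
    vector
}

fn vector_demo() -> (Vec<usize>, Vec<usize>) {
    let s1 = ["The", "cat", "chased", "the", "mouse"];
    let s2 = ["The", "mouse", "escaped"];
    let stop_words = ["the"];
    let all_words = get_all_words_lc(&s1, &s2);
    // A BTreeSet iterates in sorted order: cat, chased, escaped, mouse, the
    (
        get_sentence_vector(&s1, &all_words, &stop_words), // [1, 1, 0, 1, 0]
        get_sentence_vector(&s2, &all_words, &stop_words), // [0, 0, 1, 1, 0]
    )
}
```

Both vectors have five slots, one per distinct word. The slot for the stop word "the" stays at zero in both, and the only slot where both sentences are non-zero is "mouse".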

Sentence similarity

Using the sentence vectors, we can finally calculate the sentence similarity:

fn sentence_similarity(s1: &[&str], s2: &[&str], stop_words: &[&str]) -> f64 {
  let all_words = get_all_words_lc(s1, s2);
  let v1 = get_sentence_vector(s1, &all_words, stop_words);
  let v2 = get_sentence_vector(s2, &all_words, stop_words);
  1.0 - cosine_distance(&v1, &v2)
}

We didn’t talk about the cosine_distance function yet. Don’t worry. It’s pure arithmetic and isn’t too hard to implement yourself. (See Appendix)
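
For readers who want something to start from, here is a minimal sketch of one way to write cosine_distance with the standard library only. It is my assumption of a reasonable implementation, not necessarily the one in the Appendix; in particular, treating an all-zero vector as maximally distant is a choice I made to avoid dividing by zero.

```rust
/// Cosine distance between two word-frequency vectors: 1 - cos(angle).
/// Assumptions: both slices have the same length, and an all-zero vector
/// (e.g. a sentence made only of stop words) is treated as maximally distant.
fn cosine_distance(v1: &[usize], v2: &[usize]) -> f64 {
    // Dot product of the two vectors.
    let dot: f64 = v1.iter().zip(v2).map(|(a, b)| (a * b) as f64).sum();
    // Euclidean lengths of each vector.
    let norm1 = v1.iter().map(|a| (a * a) as f64).sum::<f64>().sqrt();
    let norm2 = v2.iter().map(|b| (b * b) as f64).sum::<f64>().sqrt();
    if norm1 == 0.0 || norm2 == 0.0 {
        return 1.0;
    }
    1.0 - dot / (norm1 * norm2)
}
```

For example, the vectors [1, 1, 0, 1, 0] and [0, 0, 1, 1, 0] have a dot product of 1 and lengths √3 and √2, so their similarity is 1/√6 ≈ 0.41 and their distance is about 0.59.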

Sentence ranking

Next comes the most important part of the program: Sentence ranking.

Using the outputs of the sentence_similarity function, we can build a similarity matrix for each sentence pair in a paragraph. This matrix is then fed into a ranking algorithm to provide the score for each sentence, which is the basis of our summarization output.
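
The matrix construction can be sketched roughly as follows. This is not the article's build_similarity_matrix: to stay dependency-free it uses a plain Vec<Vec<f64>> instead of an ndarray Array2, it takes the pairwise similarity as a closure, and it normalizes each column to sum to 1, which is my assumption about what a PageRank-style iteration expects. The name similarity_matrix_sketch is hypothetical.

```rust
/// Builds an n x n matrix where entry [i][j] is the similarity between
/// sentences i and j (0 on the diagonal), then normalizes every column.
fn similarity_matrix_sketch<F>(n: usize, similarity: F) -> Vec<Vec<f64>>
where
    F: Fn(usize, usize) -> f64,
{
    let mut matrix = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..n {
            // A sentence is not compared with itself.
            if i != j {
                matrix[i][j] = similarity(i, j);
            }
        }
    }
    // Normalize each column so its entries sum to 1 (all-zero columns are left alone).
    for j in 0..n {
        let column_sum: f64 = (0..n).map(|i| matrix[i][j]).sum();
        if column_sum > 0.0 {
            for i in 0..n {
                matrix[i][j] /= column_sum;
            }
        }
    }
    matrix
}
```

In the real program the closure would call sentence_similarity on the word lists of sentences i and j. Normalizing the columns keeps the total rank constant from one iteration to the next, so the scores stay comparable while the algorithm converges.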

PageRank in Rust

The PageRank algorithm was popularized by Google. It is used to rank the importance of a web page based on its outgoing links and its incoming links.

We are going to use the same algorithm to rank our sentences. Instead of web pages, we have sentences. Instead of the probability of a user moving from one web page to the other, we use the similarity of one sentence to the other.

Yes, we are taking a huge leap of faith here and assuming that the importance of a sentence for summarization correlates with its similarity to the other sentences. Interestingly, this seems to work reasonably well, but feel free to customize the algorithm to suit your needs and experiment with it.

There is a ready-made PageRank implementation for Python in the networkx library. I don’t know if there is one for Rust. In any case, I wrote my own routine by following Wikipedia. It was actually quite enjoyable.

///
/// Calculate a sentence rank similar to a page rank.
/// Please refer to [PageRank](https://en.wikipedia.org/wiki/PageRank) for more details.
///
fn calculate_sentence_rank(similarity_matrix: &Array2<f64>) -> Vec<f64> {
  let num_sentence = similarity_matrix.shape()[1];
  let threshold = 0.001;
  // Initialize a vector with the same value 1/number of sentences. Uniformly distributed across
  // all sentences. NOTE: perhaps we can make some sentences more important than the rest?
  let initial_vector: Vec<f64> = vec![1.0 / num_sentence as f64; num_sentence];
  let mut result = Array1::from(initial_vector);
  let mut prev_result = result.clone();
  let damping_factor = 0.85;
  let initial_m = damping_factor * similarity_matrix + (1.0 - damping_factor) / num_sentence as f64;
  loop {
    result = initial_m.dot(&result);
    let delta = &result - &prev_result;
    let mut converged = true;
    for i in 0..delta.len() {
      // compare the magnitude of the change; a large negative delta also means "not converged"
      if delta[i].abs() > threshold {
        converged = false;
        break;
      }
    }
    if converged {
      break;
    }
    prev_result = result.clone();
  }
  result.into_raw_vec()
}

Notice that the code above also uses the ndarray library for Rust. It provides linear algebra building blocks such as a 2D array (Array2) and arithmetic through overloaded operators (e.g. the - and * in the code above are overloaded to work with these types).
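For readers who want to try the iteration without ndarray, the following is a rough standard-library sketch of the same damped power iteration. It assumes a column-normalized matrix stored as nested vectors. The function name and the max_iter safety cap are additions for this illustration and are not part of the routine above.

```rust
/// Rank items by damped power iteration over a column-normalized matrix.
/// `max_iter` is a safety cap so the loop always terminates.
fn rank_by_power_iteration(
    matrix: &[Vec<f64>],
    damping: f64,
    threshold: f64,
    max_iter: usize,
) -> Vec<f64> {
    let n = matrix.len();
    if n == 0 {
        return Vec::new();
    }
    // Start from a uniform distribution over all items.
    let mut ranks = vec![1.0 / n as f64; n];
    for _ in 0..max_iter {
        // Adding (1 - d) / n to every matrix element is the same as adding
        // (1 - d) / n * sum(ranks) to every entry of the product.
        let total: f64 = ranks.iter().sum();
        let teleport = (1.0 - damping) / n as f64 * total;
        let next: Vec<f64> = matrix
            .iter()
            .map(|row| {
                let weighted: f64 = row.iter().zip(&ranks).map(|(m, r)| m * r).sum();
                damping * weighted + teleport
            })
            .collect();
        // Converged when no entry moved by more than the threshold.
        let converged = next
            .iter()
            .zip(&ranks)
            .all(|(new, old)| (new - old).abs() <= threshold);
        ranks = next;
        if converged {
            break;
        }
    }
    ranks
}
```

With a matrix in which every sentence is equally similar to every other, the ranks stay uniform. A sentence that all others point to ends up with the highest rank.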

Putting it all together

Finally, we can put everything together. The summarize function below makes use of our code above to create a summary of the input text. The arguments are the input text, the list of stop words, and the number of resulting sentences.

pub fn summarize(text: &str, stop_words: &[&str], num_sentence: usize) -> String {
  ...
  let matrix = build_similarity_matrix(&sentences_and_words, stop_words);
  let ranks = calculate_sentence_rank(&matrix);
  let mut sorted_ranks = ranks.clone();
  sorted_ranks.sort_by(|a, b| b.partial_cmp(a).unwrap());
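  // the rank of the num_sentence-th best sentence is the cut-off (zero-based index; assumes num_sentence >= 1)
  let least_rank = sorted_ranks[num_sentence - 1];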
  let mut result: Vec<&str> = vec![];
  let mut included_count = 0;
  for i in 0..sentences.len() {
    if ranks[i] >= least_rank {
      included_count = included_count + 1;
      result.push(sentences[i]);
    }
    if included_count == num_sentence {
      break;
    }
  }
  result.join("")
}
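The selection step can also be expressed by sorting indices instead of rank values. The sketch below is an alternative formulation, not the code used above: it picks the indices of the n highest-ranked sentences and returns them in document order. The function name is hypothetical.

```rust
/// Return the indices of the `n` highest-ranked items, in their original order.
fn top_n_in_order(ranks: &[f64], n: usize) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..ranks.len()).collect();
    // Sort indices by descending rank; total_cmp gives a total order even for NaN.
    // The sort is stable, so ties keep the earlier sentence first.
    indices.sort_by(|&a, &b| ranks[b].total_cmp(&ranks[a]));
    indices.truncate(n);
    // Restore document order so the summary reads naturally.
    indices.sort_unstable();
    indices
}
```

Because it never indexes into the sorted list directly, this version needs no special handling when n is zero or larger than the number of sentences.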

Output

Let’s give our summarizer a test, given the following text:

As of Sunday, there were more than 58.2 million reported cases of COVID-19 worldwide, with more than 37.2 million of those cases listed as recovered, according to a COVID-19 tracking tool maintained by Johns Hopkins University. The global death toll stood at more than 1.3 million. In Asia, the daily tally of reported cases in Japan hit a record for the fourth day in a row, with 2,508 people confirmed infected, the Health Ministry said Sunday. A flurry of criticism has erupted, from opposition legislators and the public, slamming the government as having acted too slowly in halting its "GoTo" campaign, which encouraged travel and dining out with discounts. In Europe, French authorities ordered the culling of all minks at a farm after analysis showed a mutated version of the coronavirus was circulating among the animals. The move follows virus developments in mink farms in Denmark and other countries, including the Netherlands, Sweden and Greece. In the Americas, Chile says it will open its main border crossing and principal airport to foreign visitors on Monday after an eight-month pandemic shutdown. Arrivals will have to present evidence of a recent negative test for the novel coronavirus, as well as health insurance. They'll also have to report their whereabouts and health status for a two-week watch period. Those coming from high-risk countries will have to quarantine for 14 days. In Africa, Sudan's minister of cabinet affairs on Sunday tested positive for the coronavirus, the prime minister's office said, the latest in a string of senior officials to be infected as the country shows an increase of confirmed cases of COVID-19. Over the past month, acting ministers of finance and health, the central bank governor and two associates to Prime Minister Abdalla Hamdok have tested positive.

The output with 5 sentences is:

The global death toll stood at more than 1.3 million. A flurry of criticism has erupted, from opposition legislators and the public, slamming the government as having acted too slowly in halting its "GoTo" campaign, which encouraged travel and dining out with discounts. In Europe, French authorities ordered the culling of all minks at a farm after analysis showed a mutated version of the coronavirus was circulating among the animals. The move follows virus developments in mink farms in Denmark and other countries, including the Netherlands, Sweden and Greece. In the Americas, Chile says it will open its main border crossing and principal airport to foreign visitors on Monday after an eight-month pandemic shutdown.

Although there are obvious shortcomings in the output, the resulting text looks quite promising.

Conclusion

We talked about text summarization and walked through an extraction-based text summarizer written in Rust. This article goes deep into neither text summarization nor Rust programming. It is my hope that you will do your own research into these topics if they pique your interest.

Appendix

Wait a minute… You want to see the program in its entirety? Sure, here you go:

use unicode_segmentation::UnicodeSegmentation;
use std::collections::BTreeSet;
use ndarray::{Array1, Array2};

pub fn summarize(text: &str, stop_words: &[&str], num_sentence: usize) -> String {
  let sentences = text.unicode_sentences().collect::<Vec<&str>>();
  if num_sentence >= sentences.len() {
    return text.to_string();
  }
  let mut sentences_and_words = vec![];
  sentences.iter().for_each(|&sentence| {
    let words = split_into_words(sentence);
    sentences_and_words.push(words);
  });
  let matrix = build_similarity_matrix(&sentences_and_words, stop_words);
  let ranks = calculate_sentence_rank(&matrix);
  let mut sorted_ranks = ranks.clone();
  sorted_ranks.sort_by(|a, b| b.partial_cmp(a).unwrap());
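  // the rank of the num_sentence-th best sentence is the cut-off (zero-based index; assumes num_sentence >= 1)
  let least_rank = sorted_ranks[num_sentence - 1];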
  let mut result: Vec<&str> = vec![];
  let mut included_count = 0;
  for i in 0..sentences.len() {
    if ranks[i] >= least_rank {
      included_count = included_count + 1;
      result.push(sentences[i]);
    }
    if included_count == num_sentence {
      break;
    }
  }
  result.join("")
}

fn get_all_words_lc<'a>(sentence1: &[&'a str], sentence2: &[&'a str]) -> BTreeSet<String> {
  let mut all_words: BTreeSet<String> = BTreeSet::new();

  sentence1.iter().for_each(|w| {
    all_words.insert(w.to_lowercase());
  });

  sentence2.iter().for_each(|w| {
    all_words.insert(w.to_lowercase());
  });
  return all_words;
}

///
/// Retrieve a sentence vector based on the frequency of words that appears in the all_words_lc set.
/// all_words_lc should be a sorted set of lower cased words
/// The size of the resulting vector is the same as the all_words_lc set
/// stop_words are skipped
///
fn get_sentence_vector(sentence: &[&str], all_words_lc: &BTreeSet<String>, stop_words: &[&str]) -> Vec<usize> {
  let mut vector: Vec<usize> = vec![0; all_words_lc.len()];
  for word in sentence {
    let word_lc = word.to_lowercase();
    if !stop_words.contains(&word_lc.as_str()) {
      let index = all_words_lc.iter().position(|x| x.eq(&word_lc)).unwrap();
      vector[index] += 1;
    }
  }
  return vector;
}

///
/// Calculates the cosine distance (1 - cosine similarity) between two vectors
/// Refer to [YouTube](https://www.youtube.com/watch?v=3X0wLRwU_Ws)
///
fn cosine_distance(vec1: &Vec<usize>, vec2: &Vec<usize>) -> f64 {
  let dot_product = dot_product(vec1, vec2);
  let root_sum_square1 = root_sum_square(vec1);
  let root_sum_square2 = root_sum_square(vec2);
  let norm_product = root_sum_square1 * root_sum_square2;
  if norm_product == 0.0 {
    // a vector with no counted words (e.g. only stop words) shares nothing with any other
    return 1.0;
  }
  return 1.0 - dot_product as f64 / norm_product;
}

fn root_sum_square(vec: &Vec<usize>) -> f64 {
  let mut sum_square = 0;
  for i in 0..vec.len() {
    sum_square += vec[i] * vec[i];
  }
  (sum_square as f64).sqrt()
}

fn dot_product(vec1: &Vec<usize>, vec2: &Vec<usize>) -> usize {
  // zip stops at the end of the shorter vector, so no (unsigned) length arithmetic is needed
  vec1.iter().zip(vec2.iter()).map(|(a, b)| a * b).sum()
}

fn sentence_similarity(s1: &[&str], s2: &[&str], stop_words: &[&str]) -> f64 {
  let all_words = get_all_words_lc(s1, s2);
  let v1 = get_sentence_vector(s1, &all_words, stop_words);
  let v2 = get_sentence_vector(s2, &all_words, stop_words);
  1.0 - cosine_distance(&v1, &v2)
}

///
/// Calculate a similarity matrix for the given sentences.
/// Returns a 2-D array M_i,j such that for all 'j', sum(i, M_i,j) = 1
/// We take a leap of faith here and assume that cosine similarity is similar to the probability
/// that a sentence is important for summarization
///
fn build_similarity_matrix(sentences: &Vec<Vec<&str>>, stop_words: &[&str]) -> Array2<f64> {
  let len = sentences.len();
  let mut matrix = Array2::<f64>::zeros((len, len));
  let mut sum_column: Vec<f64> = vec![0.0; len];
  for i in 0..len {
    for j in 0..len {
      if i == j {
        continue;
      }
      matrix[[i, j]] = sentence_similarity(sentences[i].as_slice(), sentences[j].as_slice(), stop_words);
    }
  }
  // at this point we have the cosine similarity of each sentence.
  // take a leap of faith and assume that the cosine similarity is the probability that a sentence
  // is important for summarization.
  // We do this by normalizing the matrix along the column. The column values should add up to 1.
  for j in 0..len {
    let mut sum: f64 = 0.0;
    for i in 0..len {
      if i == j {
        continue;
      }
      sum += matrix[[i, j]];
    }
    sum_column[j] = sum;
  }
  for i in 0..len {
    for j in 0..len {
      if i == j {
        continue;
      }
      matrix[[i, j]] = matrix[[i, j]] / sum_column[j];
    }
  }
  matrix
}

///
/// Calculate a sentence rank similar to a page rank.
/// Please refer to [PageRank](https://en.wikipedia.org/wiki/PageRank) for more details.
///
fn calculate_sentence_rank(similarity_matrix: &Array2<f64>) -> Vec<f64> {
  let num_sentence = similarity_matrix.shape()[1];
  let threshold = 0.001;
  // Initialize a vector with the same value 1/number of sentences. Uniformly distributed across
  // all sentences. NOTE: perhaps we can make some sentences more important than the rest?
  let initial_vector: Vec<f64> = vec![1.0 / num_sentence as f64; num_sentence];
  let mut result = Array1::from(initial_vector);
  let mut prev_result = result.clone();
  let damping_factor = 0.85;
  let initial_m = damping_factor * similarity_matrix + (1.0 - damping_factor) / num_sentence as f64;
  loop {
    result = initial_m.dot(&result);
    let delta = &result - &prev_result;
    let mut converged = true;
    for i in 0..delta.len() {
      // compare the magnitude of the change; a large negative delta also means "not converged"
      if delta[i].abs() > threshold {
        converged = false;
        break;
      }
    }
    if converged {
      break;
    }
    prev_result = result.clone();
  }
  result.into_raw_vec()
}

fn split_into_words(sentence: &str) -> Vec<&str> {
  let mut result = vec![];
  let words = sentence.unicode_words();
  for word in words {
    result.push(word);
  }
  result
}

My experience with Flutter on Web

Charles Chan — Sun, 25 Oct 2020 04:00:00 GMT

Original article was published on Medium.

Have you heard of Flutter? The next Big Thing? Or the UI framework that’s built for the Fuchsia OS, the OS that is going to replace Android? I was skeptical because of all the marketing push. However, I was recently persuaded to give Flutter a second look and I am glad I did.

This article briefly describes my experience building my first Flutter application on the Web. Why web? Because it’s still the easiest to deploy and the most relevant to my professional work. Notice that Flutter on the Web is still in Beta, so the limitations described here may be resolved in the near future.

Ease of Development

One of Flutter’s primary goals is to make cross platform application development super easy. I would say, with some caveats, that they have achieved that. Here is a list of things I like.

Dart

The Dart programming language is easy to pick up. A good language needs to balance among simplicity, functionality, and consistency. Personally, I find that Dart strikes a good balance for a general purpose application. To name a few interesting features:

Top level functions. Dart allows you to create top level functions without enclosing them in an artificial class if you don’t want to.
Constantness. Dart has a concept of constants that works very much like the one in C++.
Extension. Dart allows developers to extend classes, even built in ones. A good example is this crazy easy to use I18N extension.
Null aware operators. These operators allow you to navigate an object structure without all the if statements. Read this article for details.

Flutter Framework

The Flutter Framework is mostly consistent across the board. You only need to grasp a few key concepts to become productive.

The most important concept is that a Flutter application consists of a single widget hierarchy. If you want to add some functionality to a widget, say, gesture detection, enclose it with a GestureDetector widget. Want to draw inside your container? Enclose it with a CustomPaint widget. Once you have acquired the basic knowledge of the framework, it is easy to extrapolate.

Stateful vs Stateless

Another concept is the difference between a Stateless widget and a Stateful widget. If you have done React programming, you will feel right at home. A Stateless widget has no state and never changes. On the other hand, a Stateful widget manages state by way of the setState function. Every time the setState function is invoked, the UI is scheduled to be updated in the next frame. Flutter is smart enough to only refresh the dirty widgets.

Layout

Perhaps the hardest part of all UI development is the layout. In Flutter’s case, you need to learn a few special widgets (there are more but these are the essentials):

Container. A container is like a div. It provides padding, constraints, and even transformation to its child.
Row and Column. Row aligns its children along the horizontal axis (its main axis) while Column aligns its children along the vertical axis (again, that’s its main axis). The key attributes are mainAxisAlignment and crossAxisAlignment.
SizedBox. Inside a Row widget or a Column widget, you can use a SizedBox widget to control the spacing between two widgets.
Expanded. The Expanded widget tells the layout engine that its child should take up any remaining space inside a Row or a Column.
GridView. The GridView widget shows its children in a grid.
ListView. A simple widget to display items in a list.

IDE

This has been raved about many times. Flutter’s integration with an IDE is simply amazing. Compilation is fast in general. With hot reload, changes are reflected immediately on the screen.

But what about the Web?

Here’s where my caveat lies. Although the language and the framework are largely the same across platforms, there are runtime differences on the web that should not be overlooked.

With a Flutter application, the framework controls every single pixel on the screen. All controls are drawn carefully to mimic the native ones, and behaviors are recreated as close to the native behavior as possible. In a web application, Flutter makes use of an HTML5 canvas to do just that.

Web Support for Flutter (https://flutter.dev/web)

Unfortunately, using a canvas as your application’s backdrop means that you lose a few standard web behaviors, e.g.:

Text selection across widgets does not work.
Embedding a Flutter application inside an existing web application is hard. The only supported mechanism is through an iframe.

I am afraid that these two differences are going to turn away many potential web developers.

The first one means that you cannot create a regular web application using Flutter. Web users have come to expect that text selection works across widgets on any web page. Since Flutter Web is still in Beta, it is possible this will get fixed in the near future.

The second one means that you cannot easily replace part of your existing web application with Flutter. Imagine the headaches of having these iframes around and trying to make them talk to each other.

Conclusions

I have to admit that Flutter/Dart provides one of the most enjoyable programming experiences I have had in years. The Dart language is easy to use and the Flutter framework is consistent and well thought through.

My caveat is that on the web, it does not produce a traditional web application and that can throw your users off.

I hope this article will encourage you to give Flutter a try. My advice for any new Flutter developer is:

Don’t be afraid to try. Dart borrows features from different languages. You will easily find familiarity with it.
Flutter documentation is well written, make good use of it.
From time to time, check out the Boring Flutter Development Show on YouTube. I have learnt a lot from it. As with anything, you learn more when you also practice at the same time.

Night Vision Camera for Raspberry Pi

Charles Chan — Thu, 08 Oct 2020 04:00:00 GMT

Original article was published on Medium.

Have you ever wondered what goes on in your backyard when you are sound asleep? This simple setup will reveal everything to you.

Maybe it’s just out of curiosity. Maybe you are a nature lover and don’t want to miss out on the nocturnals. Whatever it is, I am glad to tell you that it’s inexpensive to set up a night vision camera and discover everything you are dying to know.

For this project, you will need:

A working Raspberry Pi. I use a Raspberry Pi 4 but I believe anything other than Pi Zero would do. Pi Zero needs a different kind of cable. Cost: about USD 35
A night vision camera from AliExpress. I use one that has IR-CUT, which automatically switches the camera from day mode to night mode. It would take better pictures if you are keeping the camera on during the day. Another good thing about this camera is that you can adjust its focus manually. Cost: about USD 10

Raspberry Pi setup

There are many instructions online. You should follow the one that is specific to your model: https://projects.raspberrypi.org/en/projects/raspberry-pi-setting-up

NOTE: Enable WiFi for extra convenience. However, strictly speaking, it is not required for this setup.

Camera Setup

Make sure your Raspberry Pi is turned off.

Plug in your camera cable like below. Gently lift the black plastic part with your finger and your thumb. The black plastic won’t separate but the opening will allow you to insert the camera cable. The cable’s silver (unprinted) side should be facing the USB ports (see below). After the cable is inserted, gently push the black plastic back into the socket and make sure it’s secured.

Camera cable. Notice how the silver side of the cable is facing towards the USB ports.

Once the camera cable is in, you can turn on the Raspberry Pi again.

SSH into your Raspberry Pi and enable the Camera module. This can be done under the Interface Options menu within the raspi-config command. Once the camera is enabled, restart your Raspberry Pi for the change to take effect.

Testing the Camera

Before we proceed, it’s best to test the camera to make sure it can take still images. SSH into your Raspberry Pi and execute the following command:

$ raspistill -v -o test.jpg

This will take a still image using the camera. You can use scp or any other means to view the picture. The test image should be exactly like a picture you’d have taken with a digital camera.

Pause for a moment here. Think about how amazing this is. With less than USD 50 and a few commands, you have created an intelligent camera.

Time lapse/Motion Capture

Of course we won’t stop here. Now that you can take still images, it’s not hard to extend that to take time lapse images. In fact, a simple shell script would probably suffice. However, instead of writing scripts ourselves, it’s better to stand on the shoulders of giants. In this case, the giant I am referring to is Pi-Timolo.

The Pi-Timolo project is a wonderful collection of python scripts that work together to bring you time lapsed images, motion capture, and low light capabilities. Installation is straightforward. Once you have it installed, it pretty much works out of the box.

The two main scripts you will need to run are:

pi-timolo.sh
webserver.sh

The first one starts the main program and begins time lapse captures, and motion captures. The second one starts a web server (port 8080 by default) which allows you to view the captured images without using scp or ftp .

For convenience, I put these two commands into my /etc/rc.local script as below (before the exit 0 line). This way, my camera and my web server are started automatically every time my Raspberry Pi reboots.

/home/pi/pi-timolo/pi-timolo.sh start
/home/pi/pi-timolo/webserver.sh start

Pi-Timolo offers a lot more functionalities than what I have shown here, e.g. turning the captured images into a video, cloning the images into Google Drive, etc. I will leave those for you to discover.

Enclosure

After we have setup the hardware and the software, we need to prepare for the outdoor elements.

There are water-resistant cases for the Raspberry Pi and its camera module. However, I find that an old tupperware works just fine. All you need to do is cut a small hole for the camera and drill some ventilation holes on the side. Make sure you monitor your Pi’s temperature to ensure you have enough ventilation. The CPU temperature should not exceed 85 °C.
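If you would like to automate the temperature check, here is a small sketch in Rust. It assumes the CPU temperature is exposed in millidegrees Celsius at /sys/class/thermal/thermal_zone0/temp, which is the usual location on Raspberry Pi OS but may differ on your system. The function names and the way the 85 °C limit is encoded are illustrative only.

```rust
/// Upper limit for the CPU temperature mentioned above, in degrees Celsius.
const WARN_CELSIUS: f64 = 85.0;

/// Parse the contents of a sysfs thermal file (millidegrees Celsius) into degrees.
fn parse_millidegrees(raw: &str) -> Option<f64> {
    raw.trim().parse::<i64>().ok().map(|m| m as f64 / 1000.0)
}

/// Read the CPU temperature from the given path.
/// Returns None if the file is missing or does not contain a number.
fn read_cpu_temperature(path: &str) -> Option<f64> {
    std::fs::read_to_string(path)
        .ok()
        .and_then(|raw| parse_millidegrees(&raw))
}

/// True when the temperature has reached the limit and ventilation needs attention.
fn is_too_hot(celsius: f64) -> bool {
    celsius >= WARN_CELSIUS
}
```

Calling read_cpu_temperature("/sys/class/thermal/thermal_zone0/temp") from a small program run by cron, and logging a warning whenever is_too_hot returns true, would be one way to keep an eye on the enclosure.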

My Pi Camera resting inside a tupperware container. Obviously, I will put a lid on when it’s outside.

Finding the perfect spot for your camera is always tough. So, before I find a permanent location for my camera, I rest my tupperware on top of a tripod so that I can make adjustments easily. Drilling a small hole in a piece of unused wood lets you create a small platform for your camera.

Using a tripod is very versatile when you are still finding the perfect location for your Pi Camera

Conclusion

I have done Kubernetes and Docker on Pi, Spotify on Pi, and now a Night Vision Camera on Pi. I have to say that this is the most fun project I have done with my Pi so far and it’s exciting to see how easy it is to set up and all the possibilities it offers.

I cannot wait to see what it will capture tonight!

Node + Java Docker Image for Raspberry Pi

Charles Chan — Mon, 23 Mar 2020 04:00:00 GMT

A simple Docker image for a Node + Java environment.

Original article was published on Medium.

As I continue to build on my Raspberry Pi infrastructure, I need to create a Docker image that contains both Java and Node. Unfortunately, no such thing exists as an official image and I don’t like using images created by others due to security concerns. So, I will have to do it myself and you can do it too.

Dockerfile

The first step in creating a docker image is to have a Dockerfile. The content of the file tells docker what to do. Here is what it looks like:

FROM arm32v7/node:10.19.0
RUN apt update && apt install -y openjdk-8-jdk

This Dockerfile tells docker that our image should be based on arm32v7/node:10.19.0. On top of that, we use apt to install JDK 8. There is nothing fancy, except that you need to pick the base image that matches your architecture. Although the Raspberry Pi has a 64-bit CPU, the OS (Raspbian) is a 32-bit ARM OS.

Private Registry

To build and push this docker image to my private repo, I need to use a handful of commands. To reduce the amount of repeated typing and errors, I use a Makefile to help me:

REPOSITORY=node-10-jdk-8
TAG=latest
REGISTRY=server.local:5000

default:
	-docker rmi $(REPOSITORY):$(TAG)
	-docker rmi $(REGISTRY)/$(REPOSITORY):$(TAG)
	docker build . -t $(REPOSITORY):$(TAG)
	docker tag $(REPOSITORY):$(TAG) $(REGISTRY)/$(REPOSITORY):$(TAG)
	docker push $(REGISTRY)/$(REPOSITORY):$(TAG)

Notice that each line under the default: label must begin with a TAB.

Running the above Makefile would create the image and publish it to your private repo server.local:5000 .

If you are building this image on your Mac and your private repository is non-TLS, you will need to create a file ~/.docker/daemon.json with the following content:

{
  "insecure-registries": ["server.local:5000"]
}

Done!

That is it. Now, in your Jenkinsfile, you can use the image in your agent line:

agent {
  docker {
    image 'server.local:5000/node-10-jdk-8:latest'
    ...
  }
}

Summary

We have demonstrated how to create a Docker image for a Node + Java environment for Raspberry Pi. The Docker image is pushed to your private repository and can be pulled into your Jenkins build when needed.

Happy Programming. Enjoy!

A Raspberry Pi cluster with Docker, Kubernetes, and Jenkins

Charles Chan — Sat, 14 Mar 2020 04:00:00 GMT

Original article was published on Medium.

This winter, I decided to try building a Pi cluster to test a small but meaningful installation of Kubernetes and a Jenkins pipeline. There are many tutorials online that helped me along the way. However, as any developer will attest, none of the tutorials is a perfect fit for your setup. It took me a couple of days but in the end, I am quite happy with my setup. I hope this article will help you achieve your Pi cluster setup as well.

My setup consists of two Raspberry Pi 4s (4 GB version). One of them is the server node and the other is the worker node. The server node will serve as the Kubernetes master node. It will also run a private Docker Registry to publish my own images. These images will be built using Jenkins on the same server. The worker node will only run the Kubernetes worker node.

Common Setup (Server node and Worker node)

Base Installation

First of all, you need to download Raspbian Lite and write it to your SD card. Before you boot up your Pi, follow the Setting up a Raspberry Pi headless instructions to set up WiFi and SSH. NOTE: Do NOT use a generated PSK. It didn’t work for me.

With WiFi and SSH set up, you can boot up your Pi with the prepared SD card. Once it’s up, you can ssh pi@raspberrypi.local with password raspberry.

This works because avahi-daemon broadcasts the hostname using mDNS. If avahi-daemon isn’t installed, install it with apt. If your host machine is Windows, you may need to install Bonjour Print Service first.

If your Pi has multiple network interfaces (e.g. WiFi and Ethernet), it is better to force avahi-daemon to use one specific interface. This can be done by changing the allow-interfaces setting in /etc/avahi/avahi-daemon.conf.

Once you’re in, run raspi-config and set up the following:

Expand file system under Advanced Options.
Change hostname under Network Options. I call my server node server and worker node agent. Their hostnames in mDNS are server.local and agent.local respectively.
Set the GPU memory split to 16 MB under Advanced Options.

Since we are going to use the Pi for Kubernetes, let’s enable container features in the kernel by editing /boot/cmdline.txt and adding the following to the end of the line:

cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory

With Kubernetes, it is important to have a static IP for all of your nodes. You can either do it inside your Pi or reserve the IP on your DHCP server (usually your router). I use the router approach.

The basic idea is to use the command ip address to discover the MAC Address of the Pi and use it to reserve IPs on my router. In my case, the server node is 192.168.0.12 and the worker node is 192.168.0.13 .

We are almost done with the base setup. But before we proceed, let’s upgrade our installed packages.

$ sudo apt update
$ sudo apt upgrade

Now, your initial Pi setup is done. Let’s reboot it!

Save yourself some trouble in the future and enable passwordless login. On your host machine, run ssh-copy-id pi@<server node>.local and enter the password raspberry (if you haven’t changed it yet). Do the same for your worker node.

Now you can ssh into your Pi from this host without entering a password. While you’re in your Pi, change your password too.

Docker

Both nodes need to have Docker installed. Use the following command to install Docker and add user pi into the docker group.

$ curl -sSL https://get.docker.com | sh
$ sudo usermod -aG docker pi

The docker script will install a service which alters the firewall rules in your system. Specifically, it will enable ip_forward and change your default FORWARD policy from ACCEPT to DROP . Please refer to the Docker and iptables article for more information.

The FORWARD policy change will cause issues with the coredns pod in the future. So, let’s modify it. Debian Buster uses nftables (instead of iptables) by default but the Buster Lite distribution does not come with it. So we need to install it with apt . While we are at it, let’s install dnsutils as well so that we can test our changes later.

$ sudo apt install nftables dnsutils

Next, we create a new file /lib/systemd/system/nftables-forward-accept.service with the following content:

[Unit]
Description=Change FORWARD policy to ACCEPT
After=docker.service

[Service]
Type=simple
ExecStart=/usr/sbin/nft insert rule ip filter DOCKER-USER counter accept

[Install]
WantedBy=multi-user.target

This service will be invoked after docker.service and it will add a new firewall rule. Once the file is in place, we need to enable it so that it runs during bootup:

$ sudo systemctl enable nftables-forward-accept.service

Reboot and verify that the DOCKER-USER chain has the following rules by running sudo nft list table filter:

chain DOCKER-USER {
  counter packets 224 bytes 94572 accept
  counter packets 0 bytes 0 return
}

Once this is verified, you can proceed.

Docker & Kubernetes (Server Node)

The following instructions apply to the server node only. Worker node specific instructions will follow this section.

Docker Registry

We will be using Jenkins to build and publish images to our private Docker registry. Since we are on a closed network, we don’t need TLS or authentication. Create/modify the file /etc/docker/daemon.json and add the following inside:

{
  "insecure-registries" : ["server.local:5000"]
}

Now, we are ready to spin up the registry:

$ docker run -d -p 5000:5000 --restart always --name registry registry:2

Kubernetes (k3s)

K3S is a lightweight Kubernetes installation. It is especially suitable for Raspberry Pi. Install it with:

$ export K3S_KUBECONFIG_MODE="644"
$ curl -sfL https://get.k3s.io | sh -

Verify DNS

Remember that we changed some firewall rules earlier? Now is the time to test that the coredns pod can resolve external host names. Execute the following command:

$ nslookup quay.io 10.43.0.10

(NOTE: 10.43.0.10 is the cluster IP of the kube-dns service, which fronts the coredns pod. You can use kubectl get svc -n kube-system to verify it.)

If the above command completes successfully, you are good to go.

Now, we are ready to install the worker node. Before we leave the server node, obtain its installation token first. We will need it when we install the worker node.

$ sudo cat /var/lib/rancher/k3s/server/node-token

Docker & Kubernetes (Worker Node)

Kubernetes (k3s)

We will need to install k3s on the worker node as well. The two special environment variables below will change the installation mode to worker mode.

$ export K3S_KUBECONFIG_MODE="644"
$ export K3S_URL="https://192.168.0.12:6443"
$ export K3S_TOKEN="XXXX" # the token you saved before.
$ curl -sfL https://get.k3s.io | sh -

Remember that my server node’s IP is 192.168.0.12.

Docker Registry

Our docker registry is installed on the server node. However, we need to tell the agent where it is so that it can pull the image properly. Create a /etc/rancher/k3s/registries.yaml file with the following content:

mirrors:
  "server.local:5000":
    endpoint:
      - "http://192.168.0.12:5000"

What it means is that the registry server.local:5000 is resolved to http://192.168.0.12:5000. We have to use the IP address here because K3S resolves names using the nameserver defined in /etc/resolv.conf instead of mDNS.

Now, you have the basic Docker and Kubernetes setup with your two Pi nodes. You should be able to push an image to your private Docker registry and create a Kubernetes deployment using that image. The prefix of your image should be server.local:5000 (or whatever you chose in your registries.yaml).

Jenkins Installation (Server Node)

Once you have Docker and Kubernetes set up, you probably want to set up a Jenkins pipeline to build something meaningful. To install Jenkins, you first need a JDK. The following will install JDK 11 as of today:

$ sudo apt install default-jdk

Then, you can add Jenkins’s key to your package keys:

$ wget -q -O - https://pkg.jenkins.io/debian/jenkins.io.key | sudo apt-key add -

Once the key is installed, modify/create the file /etc/apt/sources.list.d/jenkins.list and add the following line:

deb https://pkg.jenkins.io/debian binary/

Finally, we can install Jenkins:

$ sudo apt update
$ sudo apt install jenkins

Once this is done, open http://server.local:8080 in your browser and follow the onscreen instructions to continue with the installation.

It is also a good idea to change the Jenkins port to something less frequently used. You can do that by editing /etc/default/jenkins and changing the HTTP_PORT variable to the new port.
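If you prefer to script the port change, something along these lines should work. It is a sketch under a few assumptions: set_jenkins_port is a hypothetical helper, 8090 is just an example port, and the file is assumed to contain a line starting with HTTP_PORT= as the Debian package did when I wrote this.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the HTTP_PORT= line of a Jenkins defaults file.
# Arguments: <file> <new port>
set_jenkins_port() {
    file="$1"
    port="$2"
    tmp="$(mktemp)" || return 1
    # Only the line that starts with HTTP_PORT= is replaced; everything else is kept.
    sed "s/^HTTP_PORT=.*/HTTP_PORT=${port}/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (8090 is a placeholder port):
#   cp /etc/default/jenkins /tmp/jenkins.default
#   set_jenkins_port /tmp/jenkins.default 8090
#   sudo cp /tmp/jenkins.default /etc/default/jenkins
```

Working on a copy first lets you review the result before overwriting the real file.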

Finally, since Jenkins will be used to run docker, we need to add the jenkins user to the docker group.

$ sudo usermod -aG docker jenkins

Restart Jenkins to activate these changes:

$ sudo service jenkins restart
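If a Jenkins job later fails with a permission error on the Docker socket, the first thing I would check is the group membership. The helper below is a minimal sketch and user_in_group is a name I chose for illustration. It relies only on the standard id command.

```shell
#!/bin/sh
# Hypothetical helper: succeed when the given user belongs to the given group.
# Arguments: <user> <group>
user_in_group() {
    # id -nG prints the group names separated by spaces; match one whole name.
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qx "$2"
}

# Example:
#   user_in_group jenkins docker && echo "jenkins can run docker"
```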

Summary

These are the steps you’d need to get a cluster and a functional development pipeline going at home. Obviously, some of these steps can be automated through Ansible. However, by showing the steps in detail, I hope you will find them useful when you set up your own cluster and development pipeline.

Projects

Charles Chan — Tue, 01 Jan 2019 15:31:00 GMT

As a consultant, continuous learning is extremely important to me. Over the years, I have exposed myself to different technologies and documented the experience. I have also published a couple of tools for the public to use.

Technical Articles

Writing helps you clarify your ideas and consolidate your knowledge. You can find a list of technical articles I wrote on this website as well as on Medium.

Web Application Development

Quacker Tools

Quacker Tools provides a JSON Formatter, UUID Generator, and SQL Formatter for developers and non-developers alike, all without compromising their data privacy. Quacker Tools is available in both English and Chinese.

Mobile Application Development

Kingsley Brush

Please note that this product has been discontinued. Thank you all for your support in giving this application a solid 4-star rating.

Kingsley Brush is not just another drawing program on iPad. It provides functionality such as customizable brushes and color pickers that are usually found in paid applications. With these tools, you can create professional-looking paintings. It also provides numerous stamps to help you spice up your paintings and have fun from time to time. Email the finished painting to your loved ones.