Android: Opening URIs using Intent - protocol case matters

Protocols need to be lower case in URIs used in Android, so why doesn't Android make sure they are?!

I had a problem recently when trying to open a web page from inside an Android application.  The problem was one of case, something that really should be a solved problem in my view.  In the end I only stumbled on the answer by accident, so I'm hoping this blog post helps others in future.

Roughly speaking, each different screen in an Android application is a different activity and you switch between activities with intents.  You can remember this by thinking that you intend to move from one activity to another.  When you want to open a website by tapping a link or a button in your app you also use an intent, passing across the URI you want to open as a Uri object.  URI stands for Uniform Resource Identifier, and URLs (Uniform Resource Locators) are the most familiar kind of URI.  In this post I'll share how the case (e.g. UPPER CASE and lower case) matters for what you pass to Uri.parse.

Code sample 1

public void onClick(View view) {
	Intent websiteIntent = new Intent(Intent.ACTION_VIEW, Uri.parse(item.getResourceValue()));
	activity.startActivity(websiteIntent);
}

In this first code block I'm taking the value of item.getResourceValue() and passing that to Uri.parse in order to open the web page.  The value came from an Application Programming Interface (API) call to a central server, and was provided by an end user.  This is an important detail - because the data is provided by the end user we can't trust it.  Firstly we can't trust the data to be right, and secondly we can't trust the value to be safe.

In this case, the code being run said this:

public void onClick(View view) {
	Intent websiteIntent = new Intent(Intent.ACTION_VIEW, Uri.parse("Https://Africanpastors.org/"));
	activity.startActivity(websiteIntent);
}

On running the above, accessing "Https://Africanpastors.org/", the app crashes and there's a stack trace provided in Android Studio.

Stack trace

Stack traces show where an application stopped running and which methods and classes were calling which at the point of the crash.  It was the stack trace that showed the capital "H" of "Https://", and there's an excerpt of the stack trace below.

Process: org.africanpastors.e_vitabu, PID: 13555
    android.content.ActivityNotFoundException: No Activity found to handle Intent { act=android.intent.action.VIEW dat=Https://Africanpastors.org/... }
        at android.app.Instrumentation.checkStartActivityResult(Instrumentation.java:2067)
        etc...

On the off-chance that the capital H was the problem I manually provided the URI in lower case - the page opened.

The solution

Annoyingly, RFC 3986 declares that protocols (schemes) are case-insensitive:

Although schemes are case-insensitive, the canonical form is lowercase and documents that specify schemes must do so with lowercase letters. An implementation should accept uppercase letters as equivalent to lowercase in scheme names (e.g., allow "HTTP" as well as "http") for the sake of robustness but should only produce lowercase scheme names for consistency.

That said, Android clearly disagrees, otherwise I wouldn't have this problem with my code.  To ensure launching the web page works every time, regardless of provided data, we're going to replace Http with http if it exists at the beginning of the URL.

Intent websiteIntent = new Intent(Intent.ACTION_VIEW, Uri.parse(item.getResourceValue().replaceFirst("^Http","http")));
activity.startActivity(websiteIntent);

By using replaceFirst we can use a Regular Expression [1] (regex) to find the value we're looking for.  The caret (^) at the beginning of the first argument is regex for "the following characters must be at the very beginning of the string".
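To make the effect of the caret concrete, here is a minimal sketch in plain Java (no Android needed).  The class name CaretDemo, the helper fixHttpPrefix and the example.org URLs are made up for illustration; only the replaceFirst call itself comes from the code above.

```java
class CaretDemo {
    // Hypothetical helper wrapping the same replaceFirst call used above.
    static String fixHttpPrefix(String url) {
        // ^ anchors the match to the very beginning of the string.
        return url.replaceFirst("^Http", "http");
    }

    public static void main(String[] args) {
        // Starts with "Http", so the prefix is replaced.
        System.out.println(fixHttpPrefix("Http://example.org/"));
        // "Http" appears, but not at the start, so nothing changes.
        System.out.println(fixHttpPrefix("ftp://example.org/Http"));
        // Different capitalisation at the start, so nothing changes either.
        System.out.println(fixHttpPrefix("HTTP://example.org/"));
    }
}
```

Only the first URL is altered; the other two come back exactly as they went in.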

A better solution

The above solution only helps if the provided URL starts with exactly Http (which also catches Https), so it doesn't cover hTTp, HTTP or any other combination.  A better solution is to find the protocol in the provided URL and simply lowercase it.  As this code is going to be used in a few places I pulled it out into its own method:

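import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Take provided URL string and ensure the protocol is all lower cased
 * @param website URL
 * @return the URL with its protocol lower cased, or the URL unchanged if it contains no ://
 */
public static String lowercasedProtocolWebsite(String website) {
    String cleanWebsite = website;
    // Get protocol (before ://):
    int indexEnd = website.indexOf("://");
    if (indexEnd != -1) {
        String protocol = website.substring(0, indexEnd);
        // Replace the protocol with its lowercase version by performing a case insensitive search.
        // Pattern.quote and Matcher.quoteReplacement stop characters such as + or $ being read as regex syntax.
        cleanWebsite = website.replaceFirst(
                "^(?i)" + Pattern.quote(protocol),
                Matcher.quoteReplacement(protocol.toLowerCase(Locale.ROOT)));
    }
    return cleanWebsite;
}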

Using the indexOf method we search the provided string for :// to find the position where :// starts, stored as indexEnd.  Java counts positions from zero, so in the example below :// starts at position 4:

0 1 2 3 4 5 6
h t t p : / /

Next we split the provided URL (stored in a variable called website) at exactly indexEnd (4 in our example) characters from the beginning of the string (0) using the substring method.  Essentially I'm saying "take the first indexEnd characters and store them in the variable called protocol".
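The indexOf and substring steps can be tried on their own in plain Java.  This is only a sketch: the class ProtocolSplitDemo and the helper protocolOf are names invented for the illustration, and the URL is the one from the crash earlier in the post.

```java
class ProtocolSplitDemo {
    // Hypothetical helper: returns everything before "://", or null if there is no "://".
    static String protocolOf(String website) {
        int indexEnd = website.indexOf("://");
        return indexEnd == -1 ? null : website.substring(0, indexEnd);
    }

    public static void main(String[] args) {
        String website = "Https://Africanpastors.org/";
        // H=0, t=1, t=2, p=3, s=4, so "://" starts at index 5.
        System.out.println(website.indexOf("://"));        // prints 5
        System.out.println(protocolOf(website));            // prints Https
        // indexOf returns -1 when "://" is absent, so the helper returns null.
        System.out.println(protocolOf("no-protocol-here")); // prints null
    }
}
```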

Finally I perform a case insensitive search using replaceFirst to replace the protocol found at the beginning of the website string with its lower-cased version via the toLowerCase method.  Arguably I don't need to perform a case insensitive search here, given I know exactly what text I'm looking for, so I might go and fix that later.
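Since the position of the protocol is already known from indexOf, one possible simplification is to drop the regular expression altogether and glue the lower-cased protocol back onto the rest of the string.  The sketch below is one way that might look, not the version used in the app; the class name SchemeLowercaser and the method name lowercaseProtocol are invented here.  Locale.ROOT is used so the result doesn't depend on the device's language settings.

```java
import java.util.Locale;

class SchemeLowercaser {
    // Hypothetical regex-free variant: lower-cases only the text before "://".
    static String lowercaseProtocol(String website) {
        int indexEnd = website.indexOf("://");
        if (indexEnd == -1) {
            // No protocol found, so return the input unchanged.
            return website;
        }
        String protocol = website.substring(0, indexEnd);
        String rest = website.substring(indexEnd); // "://" and everything after it
        return protocol.toLowerCase(Locale.ROOT) + rest;
    }

    public static void main(String[] args) {
        System.out.println(lowercaseProtocol("Https://Africanpastors.org/"));
        System.out.println(lowercaseProtocol("hTTp://example.org/Some/Path"));
        System.out.println(lowercaseProtocol("example.org"));
    }
}
```

Because no regex is involved, a protocol containing characters such as + (for example svn+ssh) needs no special handling, and everything after the :// keeps its original case.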

Why not lowercase everything?

Edit: I had some queries about why the whole string couldn't be lowercased - the answer is that some systems care!  For the domain name itself, for example blog.jonsdocs.org.uk, case is not important as DNS doesn't care.  Everything after the domain, following the /, could have a problem if you change its case.  It's easier to show this with an example.
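As a small illustration in plain Java, the sketch below lower-cases a whole URL and compares the path before and after.  The URL is hypothetical (example.org with a made-up path), as are the class and method names.  On a server with case-sensitive paths, /Docs/ReadMe.html and /docs/readme.html can be two different resources, so the lower-cased link may no longer point at the page the user meant.

```java
import java.net.URI;
import java.util.Locale;

class LowercaseEverythingDemo {
    // Lower-cases the entire URL, which is the approach being argued against.
    static String lowercaseAll(String url) {
        return url.toLowerCase(Locale.ROOT);
    }

    // Returns the path, i.e. the part after the domain.
    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        // Hypothetical URL, used for illustration only.
        String original = "Https://Example.org/Docs/ReadMe.html";
        String allLower = lowercaseAll(original);

        System.out.println(pathOf(original)); // prints /Docs/ReadMe.html
        System.out.println(pathOf(allLower)); // prints /docs/readme.html
    }
}
```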

Conclusion

I've had a few things crop up during Android development recently where the problems seem silly - these should have been solved already.  The problem I describe here, for example, could have already been handled in Android's Uri.parse method.  Instead developers are forced to handle this themselves.

By pulling the code out into its own method (I called it lowercasedProtocolWebsite but that'll probably change) I'm able to re-use the code multiple times in my app.  This is significantly better than maintaining duplicate copies of the same code.


[1] A regular expression is a way of describing a piece of text formulaically, a bit like empirical formulae in chemistry.  More information on regular expressions can be found on Wikipedia.