Android Basics and Design Patterns

Thursday, 15 September 2016

HOW SSL WORKS AND CERTIFICATE/SSL PINNING TO PREVENT MAN-IN-THE-MIDDLE (MITM) ATTACK - PART 1

SSL (Secure Sockets Layer) is the standard security technology for establishing an encrypted link between a web server and a browser. This link ensures that all data passed between the web server and browsers remain private and integral.
In layman terms sender encrypts the message with a specific key and receiver decrypts that message using an identical key.

How does it create a secure connection ?

When browser and client interacts, they establish a connection using SSL Handshake, which is agnostic of user and will automatically takes place.
Essentially three keys are being used up public, private and session keys. Since, processing through public and private keys takes a lot of computation it's hard to do it every single time, so once SSL Handshake is over they use session keys to identify the same.

Understanding the flow:

Browser connects to a web server (website) secured with SSL (https). Browser requests that the server identify itself.
Server sends a copy of its SSL X509 Certificate, including the server’s public key.
Browser checks the certificate root against a list of trusted CAs and that the certificate is unexpired, unrevoked, and that its common name is valid for the website that it is connecting to. If the browser trusts the certificate, it creates, encrypts, and sends back a symmetric session key using the server’s public key.
Server decrypts the symmetric session key using its private key and sends back an acknowledgement encrypted with the session key to start the encrypted session.
Server and Browser now encrypt all transmitted data with the session key.

Difficult huh too many unrelated terms now needs to understand how encryption really works.

Encryption algorithms:

1. Diffie Hellman key exchange: When two parties exchange their data they come up with common one way function; with the help of it they can generate a shared secret to be used along in further communication. The strength of this function is time needed to reverse it and it works based on symmetric keys.
So how the one way function should look like :-
3^x mod 17 = resultant equally likely to be distributed between 0 - 16.
3- Generator (when raised to any exponent solution distributes uniformly), X- exponent or private key and 17 - Prime modulus (Have to be very large prime number say 2048 bits so decryption becomes fairly impossible)

Bob has his own private key say 25 so he calculates his public key as 3^25 mod 17 = 14, and sends 12 to alice and eve.
Alice has her own set of private key say 13 and calculates her public key as 3^13 mod 17 = 12 and sends to bob and eve.
Now the complex part comes where bob uses alice's public key (and alice uses bob's public key) with his private number to obtain a shared secret key which is 5. For eg: 12^25 mod 17 = 5 and 14 ^ 13 mod 17 = 5.
5 is common shared secret which eve cannot contemplate until unless he has one of their private keys. They can communicate without even worried about MITM attacks given the prime modulus has to be large enough.

So how we come up with common magic number 5.
For bob calculation goes like this performing both operations together 3 ^ 13 mod 17 = 12 i.e 3 ^ 13 ^ 25 mod 17 = 5, where for alice the same calculation is 3^25 mod 17 = 14 i.e 3 ^ 25 ^ 13 mod 17 = 5.
Safe and secure but what is the problem here, a sender has to manage 1000 distinct secret keys to communicate with 1000 different parties and send multiple messages just to establish them.

Public Key Cryptography: RSA Encryption Algorithm

RSA Encryption:- Non secret encryption. (Encryption keys and Decryption keys)

The idea is based on splitting it in two parts : Encryption and Decryption key.

So they need a special kind of one way function, it's difficult to reverse unless you have the trapdoor function.

m^e mod N = c which is easy to perform, however reversing c using e and N to find out which m is used is really difficult to perform.
So we need another function c ^ d mod N = m (to get an original message).
So both operations together is same as m^(e*d) mod N = m.

We come with e and d as encryption and decryption keys, and now we need a second one way function for generating d. Euclid algorithm for prime factorisation shown us that it's a complicated problem to find out prime factors, which gives its way for one way trapdoor function.

N = (Prime) P1 * (Prime) P2 ( if one prime is 200 digits long it takes multiple years to come to a solution) and the calculation is easier to perform.

Euler's totient function counts the positive integers up to a given integer n that are relatively prime to n ϕ(n), but it has a property with prime numbers.

ϕ(P1) = P1-1. and it has multiplicative properties also. P1 : Prime number

ϕ(N) = ϕ(P1) * ϕ(P2) = (P1 -1 ) * (P2-1);

Euler theorem states that if m and n are coprime positive integers, the m ^ ϕ(N) = 1 mod N. Now we can use the property where 1 ^ K = 1, so now m ^ (K* ϕ(N) )= 1 mod N. Multiply m to both sides results in m * m ^ (K* ϕ(N) )= m * 1 mod N.

So the final equation looks like m (K* ϕ(N) + 1) = m mod N and from above m^(e*d) = m mod N.

Bob has the server with p1 = 53, p2 = 59, N = 53 * 59 = 3127, ϕ(N) = ϕ(P1) * ϕ(P2) = 3016 and exponent e = 3 (Odd coprime number with ϕ(N)) and d = 2011.
Now he will hide everything and sends N and e (3127 and 3)to Alice, and she will encrypt her message with. 89 ^ 3 mod 3127 = 1394 (which is encrypted).
Bob and eve will receive this encrypted message i.e. 1394, but only bob could decipher it with his private key d. 1394 ^ 2011 mod 3127 = 89.

So now we know the concept of encryption and decryption keys and all these information glued together will give you the concept of ssl certificates.

What is an SSL Certificate ??

SSL Certificates are small data files that digitally bind a cryptographic key (namely private and public key) to an organization’s details. When installed on a web server, it activates the padlock and the https protocol and allows secure connections from a web server to a browser.

It can be self signed but browser checks list of trusted ca's before allowing communication. When choosing the right SSL provider, consider the fact that users’ web browsers normally keep a cached list of trusted CAs on file – so if a digital certificate is signed by an entity that’s not on the “approved” list, the browser will send a warning message to the user that the website may not be trustworthy.

Certificate binds these information mainly which we call it as SPKI (subject public key info)

A domain name, server name or hostname.
An organizational identity (i.e. company name) and location.

Do I really need an ssl certificate ??

SSL Certificates protect your sensitive information such as credit card information, usernames, passwords etc. It also:

Keeps data secure between severs
Increases your Google Rankings
Builds/Enhances customer trust
Improves conversion rates

How do I check certificate of my server ??

You can use OpenSSL client for verious ssl realted task:

to check the Server Cerificate : openssl s_client -showcerts -connect WordPress.com: Create a free website or blog:443(host:sslport)
to download the Certificate : openssl s_client -showcerts -connect WordPress.com: Create a free website or blog:443 |openssl x509 -outform PEM >wordpress_certfile.pem

By default browser/mobile host check lists of cached trusted ca's if the certificate is issued with them for the associated host it will simply acknowledge the certificate as trusted and starts communicating which involves high risks of MITM attacks.

What is Pinning ??

Pinning is the process of associating a host with their expected X509 certificate or public key. Once a certificate or public key is known or seen for a host, the certificate or public key is associated or 'pinned' to the host. If there are more then one certificate they hold a pinset to identify the communication channel.

So client needs to pin certificates or SPKI once the handshakes happened to verify the session key is similar to one generated during handshakes to avoid the MITM attacks.

More on how browser client or mobile devices does it will be explained in HOW SSL WORKS AND CERTIFICATE/SSL PINNING TO PREVENT MAN-IN-THE-MIDDLE (MITM) ATTACK - PART 2

Thursday, 6 August 2015

Garbage Collection Basics

Automatic memory management process, which attempts to reclaim garbage or memory occupied by objects that are no longer referenced by the program. It provides solution to major problems of dangling references, external fragmentation and also resolves memory leaks for you. Garbage collection can consume a significant amount of processing time of your program and thus have significant influence on performance. It can actually soak up a frame while rendering views or animations are running and may leads to a jank which most of the users might have experienced in Android devices.

The basic garbage collection strategy consists of determining which objects are root, live or garbage during program execution. Once the garbage is filtered out, it will become available for garbage collection during next phase of GC execution.

Garbage Collector World:

Object: Unit of storage in heap memory segment.
Object or reference graphs: Additional directed graph data structure required by GC to map objects and it's references which will be kept as nodes and edges shows the reference.
Roots: The objects that a program can access directly are those objects which are referenced by local variables on the processor stack as well as by any static variables that refer to objects. JNI calls are also be kept as the root.
Garbage: There is no edge referencing from these nodes/objects.
Frame: A frame is used to store data and partial results, as well as to perform dynamic linking, return values for methods, and dispatch exceptions. Once a method is invoked a frame is created and kept at the top of JVM stack of the created thread.
Heap: Unlike primitive variables and references in the local variable array(in each frame) objects are always stored on heap segment so cannot be disposed off when a method ends, instead they will only be removed by garbage collector.

Basic GC Algorithms:

Reference Counting,
Tracing Algorithms - Naive mark and sweep
Tracing Algorithms - Tri-color marking

Reference Counting:

It's often called as former garbage collection strategy and it's not a very common algorithm. In this approach every object has a count of the number of objects referencing to it, the one with 0 references is identified as garbage. An object's reference count is incremented whenever a reference to is is created, and decremented when a reference is disposed or cleaned up.

Disadvantages and Alleviating techniques:

Cycles : If two or more objects refer to each other it cannot be garbage collected as the reference count will never become zero. To solve this problem first cycle needs to be detected and then we can make sure the cycle end points are kept as weak references and will be easily available of garbage collection.
Space Overhead(Reference count): We need extra space for storing reference count with each object, which can be kept adjacent to object's memory or in side table. Unsigned pointer is commonly used for this task, meaning that 32 or 64 bits of reference count storage is allocated for each object.
Speed Overhead: Whenever an object is garbage collected more than one reference count is updated.
Atomicity: In multi-threaded environment when objects are shared across threads, increment and decrement operation needs to be atomic.

Tracing Garbage Collection:
Basic strategy for it consists of two phases :
1. Tracing which objects are reachable by a chain of references from certain root objects, considering rest of them as garbage.
2. After identifying garbage it sweeps them from the heap space.

These cycles are started for claiming memory and happens very often when the system runs low on memory. It waits for garbage to be accumulate and when system runs on low memory it starts the garbage collection and halts the execution of a program. For incremental garbage collector the collection happens in discrete phases and in between program can resume it's execution. It can be categorized under below two categories.

1. Naive Mark and Sweep Algorithm:
In this approach each object in memory has a flag(typically a reserved single bit) which helps to distinguish the object is garbage or live. The flag is always cleared except the collection cycle, during the execution it does a traversal of entire tree from root objects(accessible directly from variables on vm stack) . Finally, all memory is scanned to identify free and used blocks, those whose reserved bit are still clear are not reachable by any program object, and hence their memory are freed.

It has several disadvantages as the whole memory segment is examined twice and periodically halting the program execution.

2. Tri-color Marking:
Basic differentiation in tri-color marking is, it keeps three sets of objects namely white black and gray.
Initially all objects are marked as white set of objects.
White Set: Objects that are candidates for garbage collection.
Gray Set: Contains all objects reachable from root and yet to be searched for references to white set. Since, they are directly reachable from roots they eventually end up being black set. These objects is set to be initialized if directly referenced from the root level.
Black Set: Directly/Indirectly reachable from root and those who don't hold any references to white set of objects.
It has major advantage - it performs iterations for gc "on-the-fly", without halting the process for significant periods of time. This is done by marking objects as they are allocated and during mutation, maintaining the various sets.

Moving vs Non Moving Garbage Collection:
Once the garbage collector runs the marking phase it sorts out the unreachable or garbage objects, and can flush the marked garbage while moving the non reachable/live objects to heap different memory segments to avoid fragmentation.
1. Additional efforts to reclaim the objects freed by dead objects; the entire region from where live objects were moved can be considered as empty memory region and thus saving the extra efforts to search memory segment in heap memory segment.
2. Fragmentation can be avoided and easily memory can be allocated to newly created objects, moreover the fragmentation can lead to more gc pauses as it attempts to clear memory from garbage objects.
3. If used with proper techniques the objects that refer to each other can be allocated tightly in virtual address space and thus saving the time for moving the objects in heap memory.

Understanding Android Context:
In Android Runtime or Dalvik, every application has its own process which runs in its own VM. More Significantly, GC implementations are within a VM and thus every process has its own GC even though it is sharing data across application.

Android Runtime v/s Dalvik garbage collections:
The garbage collection strategies vary greatly in Dalvik and android runtime. One major difference is that Dalvik is not a moving collector. When a long pause in the application won't impact user experience (for eg when app is in background and is not playing audio) it compacts the heap memory in Android Runtime execution.

There is separate heap space for bitmaps, making it more easier to find space for large bitmap objects, making it faster to find memory for these large objects without wading the potentially fragmented regular heap, pauses in art are regularly in the realm of 2-3 ms.

Garbage collection is improved in post lollipop devices, it still should be avoided whenever possible, particularly during noticeable situations like animations, where a missed frame can result to jank.

Friday, 19 June 2015

Strategy Design Pattern

Strategy beneath Strategy Design Pattern ??

It defines a family of behaviors, encapsulate each of them, and clients can modify their behavioral implementations dynamically without interchanging the behavior in abstract client family.

How behavior is different from clients:

A simple example where one behavior eat is similar in all clients of Animal class. But an animal can choose from behavior of an herbivorous or carnivorous or an omnivorous diets. These behaviors could be modified dynamically.

Super Class	Sub-Classes	IDietBehavior	Dietary Behavior
Animal	Bird	eat	Herbivorous
	Cow		Carnivorous
	Tiger		Omnivorous
	PolarBear

Please try yourself with above classes, after understanding the below bank example.

Client behaviors can also vary independently without worrying about internal implementations.
Why do we need this pattern ?

Reduces long list of conditional behavior.
Avoid redundant code.
Independent clients and behavior to disallow other classes to force change the current behavior.
Encapsulation to hide code from user.

AbstractBank.java

HsbcBank.java

AxisBank.java

ILoanProvider.java

BlockMortgage.java

Mortgage.java

Test.java

Drawbacks : Increased number of classes, managing the code is also a little difficult.

Sunday, 14 June 2015

Android Looper and Handler

Understanding Android Looper:

Looper keeps a thread specific event message loops. It's based on event driven programming paradigm; which waits for the events to process and when it has some event it immediately dispatches the specific event to the target(handler) to perform the desired action.

It is a default implementation of android framework and manages concurrency using message queue and declared as a final class to prohibit inheritance.

Looper's Message Queue:

Synchronized Message Queue will process Messages and Runnables to the target placed on the queue by different threads.
Only one looper per thread is allowed and throws Illegal state exception("Only one Looper may be created per thread") if someone tries to prepare more than one looper in the same thread, this is enforced in Looper.java class.
Looper Message Queue is managed by Handler, it add remove messages/runnables to message queue.

Threads by default doesn't have a message queue attached to communicate with them.

Looper key methods :

prepare(): static method which initializes message queue and current thread object and holds a thread local constructor. Incumbent to call this method prior to running event loop.

loop(): runs the event message loop.

Wait for messages in the queue. (Might block)
Inversion of control: Dispatches messages to the respective targets(Handler).
Recycle messages once its processing done.

quit(): shut down the message event loop

set message queue to quit and send null message while Quitting.
null message stops event loop to stop.

Handler : The Brain, Master-Mind

A Handler allows you to send and process Message and Runnable objects associated with a thread's MessageQueue.

When you create a new Handler, it is bound to the message queue of the thread that is creating it -- from that point on, it will deliver messages and runnables to that message queue and execute them as they come out of the message queue.

Creating Handlers :
1. Need a looper to associate to a thread message queue.
2. Callback interface to handle messages when done processing by looper.
3. In default constructor, it uses current thread looper and if it's not prepared, will throw a run-time exception.

java.lang.RuntimeException: Can't create handler inside thread that has not called Looper.prepare()

Why do we need handlers ??

to schedule messages and runnables to be executed as some point in the future.
to enqueue an action to be performed on a different thread from another thread. (Inter-thread communication).

Threads can communicate with each other using handler which can send messages and runnables to different threads looper.

Scheduling messages is accomplished with the post, postAtTime(Runnable, long), postDelayed, sendEmptyMessage, sendMessage, sendMessageAtTime, and sendMessageDelayed methods.

UI threads has its own looper prepared from ActivityMainThread.java to receive messages from itself and other threads. UI thread message queue is accessible to all threads using method Looper.getMainLooper(), AsyncTask, CursorLoader, AsyncQueryHandler, services and other app components all are using main thread looper to deliver messages to the main thread.