Thursday, October 22, 2015

Is Cloud Technology really cheaper??

Cloud has been not only the fastest growing technology, but its revenue model is also unique. The recent $67 billion merger between Dell and EMC was a marriage forced by the rise of cloud: EMC is strong in storage and Dell in computer hardware.

Though we generally assume that cloud computing is cheaper, especially for booming startups, in this blog we will try to take the other side, which is a bit tricky. There are a few points to think about.
Firstly, note that the upload cost is nil for most service providers. They want you to store as much data as possible, because once it is there, switching to another provider at a later stage becomes difficult.
Next, once you are tied to a particular service provider, they can start to tighten the screws, be it on cost or on the already agreed service levels.
The security of your data is another important thing to look at. Indian government and public sector companies were reluctant to move to the cloud because the data would be stored outside the country and the law of that land might prevail. So Microsoft has recently started an India datacenter based out of Pune. If the government sector is successfully won over, it would mean continued long-term business for Microsoft. The revenue levels for any service provider are diverse, from IaaS to PaaS to SaaS.
As the company grows, it needs to note that cloud usage is billed by the minute. This was not the case for in-house servers, where a bit of misuse by an employee was not separately chargeable to the company.
Also, the services required to move all the existing on-premise systems to the cloud are not free; they are charged at a premium because of the business impact if the migration is not done correctly.

Overall, if we look at the cost of cloud services, there are many small nuances which matter for a company. Whether the cloud is cheaper than an on-premise facility depends on the company's usage patterns and its current infrastructure.

Wednesday, February 18, 2015

Internet of Things..

Any physical object embedded with electronic devices or sensors, plus some software that lets it exchange data, comes under the umbrella of IoT. This is one of the latest trends in the IT world, and companies like Cisco and IBM have dedicated units for advancement in this domain. As per a report by Gartner, experts estimate that the IoT will consist of almost 50 billion objects by 2020. That is big, really big.

As per a list by the OECD in 2015, South Korea has the highest number of devices online per inhabitant, at roughly 40 devices per 100 people. India ranks way down at 24th position with 0.6 devices per 100 people. ABI Research estimates that more than 30 billion devices will be wirelessly connected to the Internet of Things by 2020. In an active move to accommodate new and emerging technological innovation, the UK Government allocated £40,000,000 in its 2015 budget towards research into the Internet of Things.

The ability to network embedded devices with limited CPU, memory and power resources means that IoT finds applications in nearly every field. IoT systems can also be responsible for performing actions, not just sensing things. A shopping system could track your purchase history, link it to your mobile number, and then advise you by SMS the next time you go shopping. Another excellent application is home automation and security: your water can already be warm when you get up in the morning for a shower.

As I have some assignments to complete for tomorrow, I won't be able to write more. But be assured, IoT is the next big thing and we all need to be ready for it. By the time I enter the professional world, it will be selling like hot cakes.

Tuesday, June 10, 2014

Market Research Data Analytics: Basics about Scales

This post covers the basics of the data analysis part of market research. It is not comprehensive, but it gives first-hand knowledge of how to use SPSS and do basic data analysis, which can be useful for 70% of the cases.

First we need to know more about the data types. These are different from the data types we have in computer languages. Here there are 4 types of scales: Nominal, Ordinal, Interval and Ratio.



Nominal data is used when we just have to differentiate between two or more kinds of values. It is like assigning a particular code to a specific value, such as "Male" and "Female", or defining groups like "Users", "Potential Customers" and "Not Interested". Since this data only picks a category, it is also called categorical data. With it, we can only do basic calculations like finding the percentage of one type or the central tendency (mode). Median and mean don't make sense because there is no rank.

Ordinal data is when we have to arrange the data in an order, in addition to naming it as in nominal data. For example, arranging 5 films in a particular order: the ordering shows which film is better than another (in my terms), but it doesn't tell by what margin one is better. It shows comparison but not the relative distance between two values. Here we can use the median and percentages; the mean still doesn't hold good.

An interval scale is an ordinal scale which also shows the difference between two values. The Celsius scale of temperature is an example: it shows that 45 degrees is 2 degrees more than 43 degrees, but it has no absolute zero. Ratios are not allowed, since 20 °C cannot be said to be "twice as hot" as 10 °C; converting to Kelvin (283.15 K and 293.15 K) shows the real ratio is only about 1.04, because the Celsius zero point is arbitrary. We can use the mean, median and mode as characteristics of this data.

In the case of a ratio scale, an absolute zero is defined, so we can take ratios between different values. Weight, age, height etc. are all ratio scales. These are the most mature of all the scales, and all mathematical calculations are possible on this data.

With this general understanding of scales, we can go to the next level and start using SPSS.

Tuesday, November 27, 2012

SalesForce SOAP API- Basic Setup

I will share the basic usage of the SalesForce API based on the SOAP protocol. I will discuss only the important steps and the error messages I got, because I spent around a complete day googling the errors before finally getting them right.

After you create your salesforce developer account, you need to generate a WSDL from it. You can choose either the partner WSDL or the enterprise WSDL, or any other, as the case may be. For this, do the following:
Login to salesforce --> YourName --> Setup --> Develop --> API.
Here you choose the type of WSDL you want to create.

NOTE: If you use Chrome to save the WSDL file, you might end up creating a wrong WSDL file and it will throw an error while creating the jar out of that WSDL. The error will be something like "Exception in thread "main" com.sforce.ws.wsdl.WsdlParseException: Unsupported Schema element found http://www.w3.org/2001/XMLSchema:complextype. At: 93:41". So, use Internet Explorer for saving the WSDL.

To save the WSDL, right-click the link "Generate Partner WSDL" and choose "Save target as". In that dialogue box, change the file extension to wsdl. For example, let's save it as partner.wsdl.

Then you will need to make the jar out of the WSDL you created. Before that, you will need to download the WSC JAR. You can download the file from here. Download the WSC file appropriate to the Java version you have.

Now you need to make the jar file out of the WSDL file which you saved. For this, run the following command:

java -classpath pathToJAR/wsc-22.jar com.sforce.ws.tools.wsdlc pathToWsdl/WsdlFilename pathToOutputJar/OutputJarFilename


Here, you need to remember the version of Java used and the WSC you downloaded. You might face an error something like:



C:\Users\samahesh>java -classpath C:\sfdc\WSC\wsc-22.jar com.sforce.ws.tools.wsd
lc C:\sfdc\WSC\wsdl.jsp C:\sfdc\WSC\wsdl.jar
[WSC][wsdlc.run:312]Created temp dir: C:\Users\samahesh\AppData\Local\Temp\wsdlc
-temp-3098315371680609491-dir
Exception in thread "main" com.sforce.ws.wsdl.WsdlParseException: Parse error: F
ound invalid XML. entity reference name can not contain character =' (position:
START_TAG seen ...sh = \'\';\nvar url = \'https://login.salesforce.com/?ec=302&s
tartURL=... @13:58)
        at com.sforce.ws.wsdl.WsdlParser.next(WsdlParser.java:93)
        at com.sforce.ws.wsdl.Definitions.read(Definitions.java:110)
        at com.sforce.ws.wsdl.WsdlFactory.create(WsdlFactory.java:68)
        at com.sforce.ws.tools.wsdlc.<init>(wsdlc.java:75)
        at com.sforce.ws.tools.wsdlc.run(wsdlc.java:312)
        at com.sforce.ws.tools.wsdlc.main(wsdlc.java:303)
Caused by: com.sforce.ws.ConnectionException: Found invalid XML. entity referenc
e name can not contain character =' (position: START_TAG seen ...sh = \'\';\nvar
 url = \'https://login.salesforce.com/?ec=302&startURL=... @13:58)
        at com.sforce.ws.parser.XmlInputStream.next(XmlInputStream.java:137)
        at com.sforce.ws.wsdl.WsdlParser.next(WsdlParser.java:89)
        ... 5 more
Caused by: com.sforce.ws.parser.XmlPullParserException: entity reference name ca
n not contain character =' (position: START_TAG seen ...sh = \'\';\nvar url = \'
https://login.salesforce.com/?ec=302&startURL=... @13:58)
        at com.sforce.ws.parser.MXParser.parseEntityRef(MXParser.java:2204)
        at com.sforce.ws.parser.MXParser.nextImpl(MXParser.java:1266)
        at com.sforce.ws.parser.MXParser.next(MXParser.java:1085)
        at com.sforce.ws.parser.XmlInputStream.next(XmlInputStream.java:135)
        ... 6 more



The main reason for this kind of error is a mismatch of the Java version, or saving the WSDL with Chrome. To prevent such errors, invoke the Java binary you actually want to use directly. For example, the error above occurred because the default java being picked up pointed to JDK 1.7, whereas this requires JDK 1.6. So we can modify the command to use JDK 1.6, something like:


C:\Users\samahesh>"C:\Program Files\Java\jdk1.6.0_27\bin\java.exe" -classpath C:\sfdc\WSC\wsc-22.jar com.sforce.ws.tools.wsdlc C:\sfdc\WSC\partner.wsdl C:\sfdc\WSC\wsdl.jar

After the command runs successfully, you will get a message like "...Generated JAR file...".

Now you are ready to use this jar file in your sample program.
Create a sample program in any IDE, say Eclipse. Add the jar file created in the step above to the build path of the project. Also add the WSC jar which you downloaded. Then use the API and run your programs.
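For reference, here is a minimal sketch of what such a program could look like using the partner API. The username, password and query below are placeholders, and the class names assume the jar was generated from partner.wsdl as described above:

import com.sforce.soap.partner.Connector;
import com.sforce.soap.partner.PartnerConnection;
import com.sforce.soap.partner.QueryResult;
import com.sforce.soap.partner.sobject.SObject;
import com.sforce.ws.ConnectionException;
import com.sforce.ws.ConnectorConfig;

public class PartnerApiDemo {
    public static void main(String[] args) {
        ConnectorConfig config = new ConnectorConfig();
        // Placeholder credentials: the password is your password followed by your security token
        config.setUsername("yourname@example.com");
        config.setPassword("yourPasswordYourSecurityToken");
        try {
            // Logs in using the endpoint baked into the generated stub and returns a ready connection
            PartnerConnection connection = Connector.newConnection(config);
            // A simple SOQL query just to verify that the setup works
            QueryResult result = connection.query("SELECT Id, Name FROM Account LIMIT 5");
            for (SObject record : result.getRecords()) {
                System.out.println(record.getField("Name"));
            }
        } catch (ConnectionException e) {
            e.printStackTrace();
        }
    }
}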

Reference: http://www.salesforce.com/us/developer/docs/api/index.htm







Monday, November 7, 2011

MYSQL with Cloudera-VM for Hadoop..

I will discuss how to configure Hive to use a MySQL DB instead of the default Derby DB in Cloudera's VM.

The first step is to install mysql-server, which is provided in the packages.
Go to the terminal and run:

$ sudo apt-get install mysql-server
It will start the MySQL installation.

Keep the username and password as root.

After it gets installed, you can check that it works fine. Go to the location where you have installed it (/etc/mysql) and run the command:

$ mysql -uroot -proot

or, if you don't want to show the password:

$ mysql -uroot -p

(it will ask for the password)

Then, if you want a separate user for Hadoop, you can create one:
create user 'hadoop'@'localhost' identified by 'hadoop';
grant all privileges on *.* to 'hadoop'@'localhost' with grant option;

I have done it using the root user only, so I don't need to grant the permissions explicitly (it has them by default).



Then you can 'exit' from the MySQL prompt.

After this, you need to change the Hive configuration so that it uses MySQL
(/etc/hive/conf/hive-site.xml, or wherever you have it installed).
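A minimal sketch of the properties to add inside the <configuration> tags of hive-site.xml, assuming a metastore database named 'metastore' and the root credentials chosen above:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
</property>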




Now you need to get the MySQL JDBC driver to make the connection. Go to this site to get the latest version. If it is not compatible with your MySQL version, check the previous versions.

After downloading the file, untar it. Copy the jar file inside it to the lib folder of Hive (default is /usr/lib/hive/lib/).

Now you can check that Hive connects to the MySQL DB. Go to hive (/etc/hive) and type:

$ hive

hive> show tables;

You will see that the tables which you created in your Derby DB are no longer listed, because Hive is now connected to the MySQL DB.

You can now start playing around with Hive using MySQL as the DB.





Tuesday, November 1, 2011

Making files in UNIX using scripting


Let's say that I need to create some files on a UNIX box. The names of the files and their respective contents are written in a text file. Say I have a text file "fileContents.txt" which has the name of a file to be created, followed by its contents, followed by the next file name and its contents, and so on.

The file "fileContents.txt" looks like this:



sds_md_Acc_ContactRead.pl

#!/bin/perl
require "./directory_working/bin/package.pl";
        $exec_dir = $ARGV[0];
        $REPO = $ARGV[1];
----------------------------------------------------------------------------------------

sds_md_Acc_Datatype.pl

#!/bin/perl
require "./directory_working/bin/package.pl";
        $exec_dir = $ARGV[0];
        $REPO = $ARGV[1];
-------------------------------------------------------------------------------------------

sds_md_Bin_anno.pl

#!/bin/perl
………..
………..
………..
And so on and on


Now, we need to make a script that will read each file name from this txt file, create that file, and insert the respective contents into it.

First, we need to get the names of the files which we want to create. For that we can use the following command:

cat fileContents.txt | grep '\.pl$'

Here, the backslash '\' is an escape character: we want to match a literal '.pl', and without the escape the '.' would be treated as a regex metacharacter.
"The . (period) means any character in this position; for example, ton. will find tons, tone and tonneau but not wanton, because there it has no following character."

The '$' is used so that grep only matches names that have '.pl' at the very end of the line (no characters after it).

Now, we also need the line number where each file name occurs. This is required because we know that the content lies between two file names. So, for line numbers, use:

cat fileContents.txt | grep -n '\.pl$'

Here, -n prints the line number as well. An important thing to note is that the line number and the name of the file are separated by a ":".

We can store this result in some file, say "saurabh.txt". For that:

cat fileContents.txt | grep -n '\.pl$' > saurabh.txt


You can open and check the file saurabh.txt.
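With the sample fileContents.txt shown above, saurabh.txt will contain lines of the form below (the exact line numbers depend on the blank and separator lines in your file):

1:sds_md_Acc_ContactRead.pl
9:sds_md_Acc_Datatype.pl
17:sds_md_Bin_anno.pl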

Now we will write the script. Make a file, say saurabh.sh, which will contain the following contents:

1.    #!/bin/bash
2.    while read LINE
3.    do
4.        end_lineNum=`echo $LINE | cut -d ":" -f1`
5.        end_lineNum=`echo "$end_lineNum - 3" | bc`
6.        fileName=`echo $LINE | cut -d ":" -f2`
7.        if [ -n "$OldFile" ]; then
8.            sed -n "$start_lineNum","$end_lineNum"p fileContents.txt > "$OldFile"
9.        fi
10.       start_lineNum=`echo "$end_lineNum + 5" | bc`
11.       OldFile=$fileName
12.   done < saurabh.txt
13.   end_lineNum=`wc -l fileContents.txt | awk '{print $1}'`
14.   sed -n "$start_lineNum","$end_lineNum"p fileContents.txt > "$OldFile"


The code is mostly self-explanatory. We first read the file 'saurabh.txt' and split each line into the line number and the file name by cutting on ':'. The content of a file lies between its own name and the next file name, so the line number derived from the previous file name becomes 'start_lineNum' and the one from the current file name becomes 'end_lineNum'. The 'sed' command extracts the lines between those two numbers from 'fileContents.txt' and writes them into the previous file, which is created on the fly. The check on line 7 simply skips the very first iteration, when there is no previous file yet. The last file will not be created by the loop, because the loop terminates before writing it; that is what lines 13 and 14 are for. Line 13 gets the total number of lines in 'fileContents.txt', since that is the last content line of the last file, and line 14 creates that last file and stores its contents.

Execute the script as
sh saurabh.sh


It will make the files along with the contents. :-)

An important point to note: if you use
cat saurabh.txt | while read LINE
do
…….
…….
…….
done


instead of :
while read LINE
do
……
……
……
done < saurabh.txt

you will notice that variables defined inside the loop are not accessible outside the loop. This is because piping from "cat" makes the while loop run in a subshell, so its variables are lost when the loop finishes; with input redirection the loop runs in the current shell. Read more about pipelines and subshells.

Thursday, October 27, 2011

Installing Hadoop on Windows using Cygwin...


Hi all, this time we will discuss installing Hadoop on Windows using Cygwin. The first step is to install Cygwin on the Windows system. You can download Cygwin from here (http://www.cygwin.com/). Download and install it, and make sure you select the "openssh" package when it asks which packages to include.



Now, set the environment variable (if the installer has not set it already) to point to the Cygwin installation.

Configure the ssh daemon. To do this, execute the command:
ssh-host-config

When asked if privilege separation should be used, answer no.
When asked if sshd should be installed as a service, answer yes.
When asked about the value of CYGWIN environment variable, enter ntsec.

Now, if the sshd service is not started, open "services.msc" from Run and start the CYGWIN sshd service.

Then you need to set up key-based authorization so that ssh does not ask for a password every time you connect. For this, follow these steps:
Execute the command in Cygwin:
ssh-keygen

When it asks for the file names and the passphrases, just press ENTER (without typing anything) to accept the default values.

After the command has finished executing, type the following:
cd ~/.ssh

To check the keys, you can do "ls". You will see the id_rsa.pub and id_rsa files.

If this is the first time you are using this Cygwin installation, this may already be fine; otherwise you will need to append the public key to the authorized keys.

For that execute:
cat id_rsa.pub >> authorized_keys

Now try to log in to your own localhost by executing:
ssh localhost

The first time it will ask for confirmation; answer yes, and you will see that from the next time onwards you will not have to do it again.

That's all done. Your Cygwin is ready for installing Hadoop and starting work. So now you need to download Hadoop. You can download it here (http://www.gtlib.gatech.edu/pub/apache//hadoop/common/).

You will also need Java installed on your machine, because Hadoop uses Java, and you will have to set the environment variable JAVA_HOME. I have used Java 1.6 and this tutorial is based on that version.

After you have downloaded Hadoop, you need to extract it (it comes as a gzipped tar file). To extract it, type the following command at the Cygwin prompt:
tar -xzf hadoop-0.20.203.0.tar.gz

(The file name may be different as per the version you have downloaded.) Also, I have extracted Hadoop into the Cygwin folder under the C drive, so commands and paths may change as the case may be.

Once that is done, you can see the contents of the Hadoop folder using the command "ls".

If you want to work with the Hadoop file system, go into the location where the extracted Hadoop is present and type:
$ cd ../../hadoop-0.20.203.0
$ bin/hadoop fs -ls

What does it say? It complains that the environment variable JAVA_HOME is not set. So let's set its path, and then fill in the other configuration files too.

Go to the Hadoop extracted folder -> conf -> hadoop-env.sh (open it to edit).

This is the file where the environment variables are set. In it, you will find one commented line (a line beginning with #) as
#export JAVA_HOME=
You need to un-comment it and then give the path where Java is installed on your system.

When giving the path, you have to prefix it with /cygdrive to access the Windows drive. For example, if you have Java at C:\Java, then give the path as:
export JAVA_HOME=/cygdrive/c/Java/jdk1.6.0_27


If there is a space in the path, then you need to escape it with a backslash. For example:

export JAVA_HOME=/cygdrive/c/Program\ Files/Java/jdk1.6.0_27


All the other environment variables are left at their defaults.

Now we go to the second configuration file, hdfs-site.xml, in the same location. In it, write the following in between the <configuration> tags:
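A minimal sketch of that property (the replication value matches what is described just below):

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>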



Here we set the number of replications which should be done for every file. You can set the value as you want; here we have set it to 1.

Now open the file core-site.xml and give the following property in it:
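A minimal sketch, assuming the NameNode runs on localhost with the port 4440 mentioned below:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:4440</value>
</property>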




You can give any port number instead of 4440, but it should be free.


Now we have to make changes to the mapred-site.xml file, located at the same location. In it, write the following:
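A minimal sketch, assuming the JobTracker runs on localhost; the port 4441 here is just an arbitrary free port chosen for illustration:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:4441</value>
</property>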



Here we give the port number where we want to run the JobTracker. You can give any free port.


All the configurations are done. You can look at the remaining files; note that in both the files "masters" and "slaves", it is the local machine which acts as both.


The last thing you need to do is to make the namenode available and ready. First you need to format it, because it is being used for the first time. So type the following command.

I am assuming that you are presently in the Hadoop folder. If not, cd into it, e.g.:
$ cd ../../hadoop-0.20.203.0

$ bin/hadoop namenode -format

It will print some information and then finally format the namenode. Now you need to run it and watch. So, type:
$ bin/hadoop namenode

And it will start running. Don't stop it; instead, open a new Cygwin command prompt, go to the Hadoop location and type:
$ bin/hadoop fs
It will show the options you can use with the fs command. Try using some of them, like ls etc.

Now, in another command prompt, run the secondary namenode as:
$ bin/hadoop secondarynamenode

In a third command prompt, run the jobtracker as:
$ bin/hadoop jobtracker

In a fourth, run the datanode as:
$ bin/hadoop datanode

And last, you can run the tasktracker as:
$ bin/hadoop tasktracker

While you were starting these, you must have noticed the changes in the namenode's output. Try reading through those changes and the information.

The next task is to run Hive on this system. Download Hive from here (http://mirror.olnevhost.net/pub/apache//hive/). Try using it till my next post.. :-)