Tuesday, March 26, 2013

Business App IT Lab - Session 10


Assignment 1



Create 3 vectors, x, y, z, and choose any random values for them, ensuring they are of equal length. Bind them together. Create 3-dimensional plots of the same.

Solution:

First, create a random data set of 50 items drawn from a normal distribution with mean = 30 and standard deviation = 10

> data <- rnorm(50,mean=30,sd=10)
> data

Taking sample data of length 10 from the created data set in three different vectors x,y,z
> x <- sample(data,10)
> x

> y <- sample(data,10)
> y

> z <- sample(data,10)
> z

Binding the three vectors x,y,z column-wise into a matrix T using cbind
> T <- cbind(x,y,z)
> T
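
As a quick check of the steps above, the following minimal sketch rebuilds the three samples and confirms that cbind returns a 10 x 3 matrix. The fixed seed and the name `m` are illustrative assumptions and not part of the original session.

```r
# Fixed seed so the run is reproducible (assumption; the original used none)
set.seed(42)
data <- rnorm(50, mean = 30, sd = 10)

# sample() draws without replacement by default, so each vector holds
# 10 distinct elements of data
x <- sample(data, 10)
y <- sample(data, 10)
z <- sample(data, 10)

# cbind() stacks the vectors as the columns of a numeric matrix
m <- cbind(x, y, z)
dim(m)        # 10 rows, 3 columns
colnames(m)   # "x" "y" "z"
```

Using a name such as `m` instead of `T` avoids masking `T`, which R also accepts as shorthand for TRUE.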



Plotting 3d graph (plot3d comes from the rgl package, which must be loaded first)

Command:

> library(rgl)
> plot3d(T[,1:3])





Plotting of graph with labels for axes and color

Command 
> plot3d(T[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(500))
 


Plotting of graph with labels for axes, color and type = spheres

Command
> plot3d(T[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type='s')



Plotting of graph with labels for axes, color and type = points

Command

> plot3d(T[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type='p')



Plotting of graph with labels for axes, color and type = lines

Command

> plot3d(T[,1:3], xlab="X Axis" , ylab="Y Axis" , zlab="Z Axis", col=rainbow(5000), type='l')
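
One detail worth checking in the commands above: rainbow(n) returns n colours, but the matrix has only 10 rows. As far as I understand rgl, colours are matched to points in order, so only the first few entries of a long palette would be used. The base-R sketch below (no rgl needed; the variable names are illustrative) shows how similar those first entries are, compared with a palette sized to the data.

```r
n_points <- 10

# First 10 colours of a 5000-colour rainbow: the hue barely moves away from red
long_palette <- rainbow(5000)[seq_len(n_points)]
rgb_long <- col2rgb(long_palette)
rgb_long["red", ]     # all 255
rgb_long["green", ]   # stays close to 0

# A palette sized to the data spans the whole hue circle
short_palette <- rainbow(n_points)
rgb_short <- col2rgb(short_palette)
range(rgb_short["green", ])
```

Under that assumption, passing col=rainbow(nrow(T)) to plot3d should give each point a visibly different colour.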



Assignment 2

Choose 2 random variables 
Create the following plots: 
1. X-Y 
2. X-Y|Z (introducing a variable z with 5 different categories and cbind it to x and y)
3. Color code and draw the graph 
4. Smooth and best fit line for the curve

Solution

Creating a data set for two random variables and then introducing third variable z

Command:

> x <- rnorm(5000, mean= 20 , sd=10)
> y <- rnorm(5000, mean= 10, sd=10)
> z1 <- sample(letters, 5)
> z2 <- sample(z1, 5000, replace=TRUE)
> z <- as.factor(z2)
> z
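
The two-step sampling above first picks 5 category labels and then assigns one of them to each of the 5000 observations. A small sketch, with a seed added as an assumption for reproducibility, confirms the shape of the resulting factor:

```r
set.seed(1)   # assumption: seed added for reproducibility
z1 <- sample(letters, 5)                  # 5 distinct letters as category labels
z2 <- sample(z1, 5000, replace = TRUE)    # one label per observation
z  <- factor(z2)

nlevels(z)   # number of categories
table(z)     # observations per category, roughly 1000 each
```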
 



Creating Quick Plots

Command (qplot comes from the ggplot2 package, which must be loaded first):

> library(ggplot2)
> qplot(x,y)




> qplot(x,z)



For semi-transparent plot

> qplot(x,z, alpha=I(2/10))




For coloured plot

> qplot(x,y, color=z)
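
The X-Y|Z plot asked for in the assignment can also be drawn with one panel per category. The sketch below is one possible way to do it, assuming ggplot2 is installed. It uses ggplot() with facet_wrap() rather than qplot(), which recent ggplot2 releases mark as deprecated. The data frame name `df` and the seed are illustrative.

```r
library(ggplot2)

set.seed(1)   # assumption: seed added for reproducibility
df <- data.frame(
  x = rnorm(5000, mean = 20, sd = 10),
  y = rnorm(5000, mean = 10, sd = 10),
  z = factor(sample(sample(letters, 5), 5000, replace = TRUE))
)

# Scatter plot of y against x, coloured by z, with one panel per level of z
p <- ggplot(df, aes(x = x, y = y, colour = z)) +
  geom_point(alpha = 0.2) +
  facet_wrap(~ z)

print(p)
```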




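For logarithmic coloured plot (x and y contain some negative values, for which log() returns NaN; those points are dropped with a warning)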

> qplot(log(x),log(y), color=z)



Best Fit and Smooth curve using "geom"

Command:

> qplot(x,y,geom=c("path","smooth"))



> qplot(x,y,geom=c("point","smooth"))
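
To see what the best-fit and smooth layers are estimating, the fit can also be computed directly in base R. This is only a sketch with a fixed seed (an assumption), and lowess() is used as a stand-in smoother, not necessarily the one ggplot2 picks. Because x and y are generated independently here, the fitted slope should come out close to zero.

```r
set.seed(1)   # assumption: seed added for reproducibility
x <- rnorm(5000, mean = 20, sd = 10)
y <- rnorm(5000, mean = 10, sd = 10)

# Least-squares best-fit line y = a + b * x
fit <- lm(y ~ x)
coef(fit)

# Locally weighted smooth as a base-R stand-in for the smooth layer
sm <- lowess(x, y)

plot(x, y, pch = ".", col = "grey50")
abline(fit, col = "red", lwd = 2)    # best-fit line
lines(sm, col = "blue", lwd = 2)     # smooth curve
```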
 


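Boxplot with jitter, grouped by the categorical variable z (a boxplot needs a discrete variable on the x axis):

> qplot(z,y,geom=c("boxplot","jitter"))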



Saturday, March 23, 2013

Business App IT Lab - Session 9

Assignment

Tools that can be used for a variety of data visualisation tasks

Google Docs are now widely used, but managing and examining the data in them is not a comfortable job for a manager.
Several free apps are available to help collate and analyse data stored in Google Drive.
We will look at one of them here.

FreeDive

What it does: This alpha project from the Knight Digital Media Center at UC Berkeley turns a Google Docs spreadsheet into an interactive, sortable database that can be posted on the Web.

Benefits: In addition to text searching, you can include numerical range-based sliders. Usage is free. End users can easily create their own databases from spreadsheets without writing code.


FreeDive's chief current attraction is the ability to create databases without programming; however, freeDive source code will be posted and available for use once the project is more mature. That could appeal to IT departments seeking a way to offer this type of service in-house, allowing end users to turn a Google Doc into a filterable, sortable Web database using the Google Visualization API, Google Query Language, JavaScript and jQuery -- without needing to manually generate that code.

Drawbacks: My test application ran into some intermittent problems; for example, it wouldn't display my data list when using the "show all records" button. This is an alpha project, and should be treated as such.

In addition, the current iteration limits spreadsheets to 10 columns and a single sheet. One column must have numbers, so this won't work for text-only information. The search widget is currently limited to a few specific choices of fields to search, although this might increase as the project matures. (A paid service like Caspio would offer more customization.) The nine-step wizard might get cumbersome after frequent use.











Friday, March 15, 2013

Business App IT Lab - Session 8


Assignment

We will be doing Panel Data Analysis of the "Produc" data set

We will fit three types of model:
      Pooled effect model
      Fixed effect model
      Random effect model 

Then we will determine which model is best by using the functions:
       pFtest : for choosing between fixed and pooled
       plmtest : for choosing between pooled and random
       phtest: for choosing between random and fixed

Commands:

Loading the plm package and the data: 
> library(plm)
> data(Produc , package ="plm")
> head(Produc)

 Data

Pooled Effect Model 

> pool <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("pooling"), index = c("state","year"))
> summary(pool)



 Pooled Effect Model


Fixed Effect Model:

> fixed <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("within"), index = c("state","year"))
> summary(fixed)


Fixed Effect Model

Random Effect Model:
> random <- plm(log(pcap)~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) + log(emp) + log(unemp) , data =Produc, model=("random"), index = c("state","year"))
> summary(random)
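
The three calls above differ only in the model argument, so the formula can be stored once and reused. The following is a sketch of that idea, assuming the plm package is installed; the names `fm`, `model_types` and `models` are illustrative and not part of the original session.

```r
library(plm)

data("Produc", package = "plm")

# Same specification as in the commands above, stored once
fm <- log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) +
  log(gsp) + log(emp) + log(unemp)

# Fit the pooled, fixed effects ("within") and random effects models
model_types <- c(pool = "pooling", fixed = "within", random = "random")
models <- lapply(model_types, function(m) {
  plm(fm, data = Produc, model = m, index = c("state", "year"))
})

# Compare two of the slope estimates side by side
sapply(models, function(fit) coef(fit)[c("log(hwy)", "log(gsp)")])
```

Keeping the formula in one place makes it harder for the three specifications to drift apart when a regressor is added or removed.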


Random Effect Model

Comparison


The comparison between the models is done by hypothesis testing based on the following concept:

H0: Null Hypothesis: the individual (state) and time-based effects are all zero
H1: Alternate Hypothesis: at least one of the individual or time-based effects is non-zero

Pooled vs Fixed

Null Hypothesis: Pooled Effect Model
Alternate Hypothesis : Fixed Effect Model

Command:
> pFtest(fixed,pool)
Result:

data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)
F = 56.6361, df1 = 47, df2 = 761, p-value < 2.2e-16
alternative hypothesis: significant effects
Since the p-value is negligible, we reject the Null Hypothesis in favour of the Alternate Hypothesis, i.e. the Fixed Effect Model is preferred over the Pooled Effect Model.
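
The degrees of freedom in this F test can be reproduced by hand. The statistic has the usual form F = ((RSS_pooled - RSS_fixed) / df1) / (RSS_fixed / df2), where df1 is the number of restrictions and df2 the residual degrees of freedom of the fixed effect model. The sketch below uses only base R and the dimensions of Produc (48 states observed over 17 years); the variable names are illustrative.

```r
n_states <- 48     # states in Produc
n_years  <- 17     # years 1970 to 1986
k        <- 7      # slope coefficients in the model

n_obs <- n_states * n_years        # 816 observations
df1   <- n_states - 1              # 47 restrictions: all state intercepts equal
df2   <- n_obs - n_states - k      # residual df of the fixed effect model

f_stat  <- 56.6361                 # statistic reported by pFtest
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

c(df1 = df1, df2 = df2)
p_value < 2.2e-16
```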

pFtest


Pooled vs Random

Null Hypothesis: Pooled Effect Model
Alternate Hypothesis: Random Effect Model

Command :
> plmtest(pool)

Result:

        Lagrange Multiplier Test - (Honda)
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)
normal = 57.1686, p-value < 2.2e-16
alternative hypothesis: significant effects

Since the p-value is negligible, we reject the Null Hypothesis in favour of the Alternate Hypothesis, i.e. the Random Effect Model is preferred over the Pooled Effect Model.

plmtest

Random vs Fixed

Null Hypothesis: No correlation between the individual effects and the regressors, i.e. Random Effect Model
Alternate Hypothesis: Fixed Effect Model

Command:
 > phtest(fixed,random)

Result:

        Hausman Test
data:  log(pcap) ~ log(hwy) + log(water) + log(util) + log(pc) + log(gsp) +      log(emp) + log(unemp)
chisq = 93.546, df = 7, p-value < 2.2e-16
alternative hypothesis: one model is inconsistent

Since the p-value is negligible, we reject the Null Hypothesis in favour of the Alternate Hypothesis, i.e. the Fixed Effect Model is preferred over the Random Effect Model.
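
The p-values behind the last two decisions can be recomputed from the reported statistics with base R. This is a sketch under two assumptions: a 5% significance level (the original does not state one) and a one-sided normal reference distribution for the Honda statistic.

```r
alpha <- 0.05   # assumed significance level

# Honda LM statistic, compared with a standard normal (one-sided, assumed)
p_lm <- pnorm(57.1686, lower.tail = FALSE)

# Hausman statistic, compared with a chi-squared with 7 degrees of freedom
p_hausman <- pchisq(93.546, df = 7, lower.tail = FALSE)

c(reject_pooled_vs_random = p_lm < alpha,
  reject_random_vs_fixed  = p_hausman < alpha)
```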

phtest


Conclusion: 

After making all the comparisons, we conclude that the Fixed Effect Model is best suited for the panel data analysis of the "Produc" data set.

Hence, the states differ systematically: each "state" has its own intercept, and the slope estimates rely only on the variation over time within each state.