To run Spark in local mode, first grab Spark from
http://spark.apache.org
or build it yourself from source:
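For reference, the log below came from a plain Maven package build run at the source root, something like the following (Hadoop/YARN profile flags are omitted here; pick them per Spark's building guide):

C:\app\spark>mvn -DskipTests clean package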
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 13.980 s]
[INFO] Spark Project Test Tags ............................ SUCCESS [01:04 min]
[INFO] Spark Project Sketch ............................... SUCCESS [ 20.141 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 27.022 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 21.725 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 35.521 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 26.792 s]
[INFO] Spark Project Core ................................. SUCCESS [08:13 min]
[INFO] Spark Project GraphX ............................... SUCCESS [01:25 min]
[INFO] Spark Project Streaming ............................ SUCCESS [02:52 min]
[INFO] Spark Project Catalyst ............................. SUCCESS [06:12 min]
[INFO] Spark Project SQL .................................. SUCCESS [06:28 min]
[INFO] Spark Project ML Library ........................... SUCCESS [06:23 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 21.218 s]
[INFO] Spark Project Hive ................................. SUCCESS [04:34 min]
[INFO] Spark Project Docker Integration Tests ............. SUCCESS [ 25.573 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 37.997 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 43.001 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 34.540 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 51.423 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 5.614 s]
[INFO] Spark Project External Kafka ....................... SUCCESS [01:07 min]
[INFO] Spark Project Examples ............................. SUCCESS [02:21 min]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 7.941 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 46:56 min
[INFO] Finished at: 2016-03-26T20:05:46+09:00
[INFO] Final Memory: 107M/1622M
[INFO] ------------------------------------------------------------------------
The build takes quite a while.
SQL context available as sqlContext.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/
Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_65)
Type in expressions to have them evaluated.
Type :help for more information.
scala> sc.version
res0: String = 2.0.0-SNAPSHOT
Rumor had it that Spark 1.6 was a test stage on the way to 2.0, and sure enough the fresh build reports 2.0.0-SNAPSHOT, so 2.0 must not be far off. Anyway. For R and Spark at least, it's easiest to set the paths in your environment variables. And if you fetched the source from git like I did, R may complain that there is no SparkR package when you call library(SparkR); running install-dev in C:\app\spark\R generates the R package.
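Setting the variables from a command prompt can look like this (setx writes to the user environment, so open a new shell afterwards; adjust C:\app\spark to your own layout):

C:\>setx SPARK_HOME C:\app\spark
C:\>setx PATH "%PATH%;C:\app\spark\bin"

With the paths in place, run install-dev: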
C:\app\spark\R>install-dev.bat
* installing *source* package 'SparkR' ...
** R
** inst
** preparing package for lazy loading
Creating a new generic function for 'colnames' in package 'SparkR'
Creating a new generic function for 'colnames<-' in package 'SparkR'
Creating a new generic function for 'cov' in package 'SparkR'
Creating a new generic function for 'drop' in package 'SparkR'
Creating a new generic function for 'na.omit' in package 'SparkR'
Creating a new generic function for 'filter' in package 'SparkR'
Creating a new generic function for 'intersect' in package 'SparkR'
Creating a new generic function for 'sample' in package 'SparkR'
Creating a new generic function for 'transform' in package 'SparkR'
Creating a new generic function for 'subset' in package 'SparkR'
Creating a new generic function for 'summary' in package 'SparkR'
Creating a new generic function for 'lag' in package 'SparkR'
Creating a new generic function for 'rank' in package 'SparkR'
Creating a new generic function for 'sd' in package 'SparkR'
Creating a new generic function for 'var' in package 'SparkR'
Creating a new generic function for 'predict' in package 'SparkR'
Creating a new generic function for 'rbind' in package 'SparkR'
Creating a generic function for 'alias' from package 'stats' in package 'SparkR'
Creating a generic function for 'substr' from package 'base' in package 'SparkR'
Creating a generic function for '%in%' from package 'base' in package 'SparkR'
Creating a generic function for 'mean' from package 'base' in package 'SparkR'
Creating a generic function for 'lapply' from package 'base' in package 'SparkR'
Creating a generic function for 'Filter' from package 'base' in package 'SparkR'
Creating a generic function for 'unique' from package 'base' in package 'SparkR'
Creating a generic function for 'nrow' from package 'base' in package 'SparkR'
Creating a generic function for 'ncol' from package 'base' in package 'SparkR'
Creating a generic function for 'head' from package 'utils' in package 'SparkR'
Creating a generic function for 'factorial' from package 'base' in package 'SparkR'
Creating a generic function for 'atan2' from package 'base' in package 'SparkR'
Creating a generic function for 'ifelse' from package 'base' in package 'SparkR'
** help
No man pages found in package 'SparkR'
*** installing help indices
** building package indices
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (SparkR)
C:\app\spark\R>dir lib
 Volume in drive C has no label.
 Volume Serial Number is C68F-889D

 Directory of C:\app\spark\R\lib

2016-03-26  10:18 PM    <DIR>          .
2016-03-26  10:18 PM    <DIR>          ..
2016-03-26  10:18 PM    <DIR>          SparkR
2016-03-26  10:18 PM           590,033 sparkr.zip
               1 File(s)        590,033 bytes
               3 Dir(s)  105,600,221,184 bytes free
C:\app\spark\R>
Now launch RStudio and try the following.
> Sys.setenv(SPARK_HOME = "C:/app/spark")
> .libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
> library(SparkR)
Attaching package: 'SparkR'
The following objects are masked from 'package:stats':
cov, filter, lag, na.omit, predict, sd, var
The following objects are masked from 'package:base':
colnames, colnames<-, drop, intersect, rank, rbind, sample, subset, summary, transform
> sc <- sparkR.init(master = "local")
Launching java with spark-submit command C:/app/spark/bin/spark-submit.cmd sparkr-shell C:\Users\bhjo0\AppData\Local\Temp\RtmpA7HkfH\backend_port23c82ced221c
> sqlContext <- sparkRSQL.init(sc)
> sc
Java ref type org.apache.spark.api.java.JavaSparkContext id 0
> sqlContext
Java ref type org.apache.spark.sql.SQLContext id 1
> DF <- createDataFrame(sqlContext, faithful)
> head(DF)
eruptions waiting
1 3.600 79
2 1.800 54
3 3.333 74
4 2.283 62
5 4.533 85
6 2.883 55
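While the context is up, it's worth poking at the DataFrame API a little. For instance, a filter and a grouped count over the same faithful data; the aggregation line mirrors the example in the SparkR programming guide of that era:

> head(filter(DF, DF$waiting > 70))
> head(summarize(groupBy(DF, DF$waiting), count = n(DF$waiting)))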
> localDF <- data.frame(name=c("John", "Smith", "Sarah"), age=c(19, 23, 18))
> df <- createDataFrame(sqlContext, localDF)
> printSchema(df)
root
|-- name: string (nullable = true)
|-- age: double (nullable = true)
> path <- file.path(Sys.getenv("SPARK_HOME"), "examples/src/main/resources/people.json")
> peopleDF <- jsonFile(sqlContext, path)
Warning message:
'jsonFile' is deprecated.
Use 'read.json' instead.
See help("Deprecated")
> printSchema(peopleDF)
root
|-- age: long (nullable = true)
|-- name: string (nullable = true)
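As the deprecation warning above suggests, the same file can also be loaded with read.json (at this snapshot the SQLContext is still passed explicitly; the signature changed in later releases):

> peopleDF <- read.json(sqlContext, path)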
> registerTempTable(peopleDF, "people")
> teenagers <- sql(sqlContext, "SELECT name FROM people WHERE age >= 13 AND age <= 19")
> teenagersLocalDF <- collect(teenagers)
> print(teenagersLocalDF)
name
1 Justin
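For comparison, the same query can be written against the DataFrame API instead of SQL; a sketch using SparkR's filter and select:

> teenagers2 <- select(filter(peopleDF, peopleDF$age >= 13 & peopleDF$age <= 19), "name")
> print(collect(teenagers2))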
> sparkR.stop()
If library(SparkR) errors out, the path is probably wrong or the build didn't finish. To use a Spark installed inside a Hadoop cluster rather than local mode, just pass the appropriate arguments to init (see the sketch after the install step below). And if you plan to keep using SparkR, you can simply install it as a package:
> install.packages("C:/app/spark/R/lib/sparkr.zip", repos = NULL, type = "win.binary")
> library(SparkR)
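As for the cluster case mentioned above, init takes the master URL and Spark options directly; a sketch, assuming a YARN client deployment (yarn-client was the usual master string at the time, and the memory setting is just an illustration):

> sc <- sparkR.init(master = "yarn-client", sparkEnvir = list(spark.executor.memory = "2g"))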