R - Extract names of the levels of a factor
This question already has an answer here:
- Subsetting in data.table (4 answers)
I'm trying to read a huge matrix (2.8 GB) into R. Thus far, the best approach I have found is
require(data.table)
dt <- fread("bigmatrix.csv")
of which I know nothing!
After that, I'm able to tell that the matrix has 3 columns and 50 million rows.
Each row is of the type
   object1                  object 2                           distance
1: kho.central_khoisan.gwi  kho.central_khoisan.gwi            0.0000000
2: kho.central_khoisan.gwi  kho.central_khoisan.gxana          0.2195843
3: kho.central_khoisan.gwi  kho.central_khoisan.khoekhoegowab  0.6749363
4: kho.central_khoisan.gwi  kho.central_khoisan.khwe           0.6089206
5: kho.central_khoisan.gwi  kho.central_khoisan.korana         0.7163111
6: kho.central_khoisan.gwi  kho.central_khoisan.kwadi          0.8017179
So it holds the pairwise distances between approximately 6900 objects.
Now comes the problem:
I want to extract the pairwise comparisons of 41 objects. I don't know how the guy who gave me the dataset has named these 41 objects!
So my solution is to find the levels of dt$object1, write them to a file, and scan them to find the 41 I need. How can I do it?
I tried
foo<-factor(dt$object1)
so when I call
foo
....
6895 Levels: aa.beja.beja aa.beja.beja_2 aa.berber.awjilah ... zun.zuni.zuni
but
foo$levels
gives me an error!
I'm sure there is a smarter way than what I would do in C++ (i.e. loop over each row and insert the name of object 1 into a vector of strings if it's not present yet). How do I do it?
Edit: another question arose:
I have identified the 41 objects I need. How do I extract the data.table rows relevant to me?
I can store the names of the objects in a data frame or a vector.
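For the follow-up about extracting rows, one possible sketch is to filter with %in% on both name columns. The toy table, the short object names, the column name object2, and the vector wanted are all assumptions made for illustration; in the real case wanted would hold the 41 names.

```r
library(data.table)

# Toy stand-in for the real table. The column names object1, object2 and
# distance are assumed from the printed sample; the values are made up.
dt <- data.table(
  object1  = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  object2  = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  distance = c(0, 0.2, 0.6, 0.2, 0, 0.7, 0.6, 0.7, 0)
)

# Hypothetical stand-in for the vector holding the 41 names of interest.
wanted <- c("a", "c")

# Keep only the rows where both objects belong to the wanted set.
sub <- dt[object1 %in% wanted & object2 %in% wanted]
print(sub)
```

If the names are stored in a data frame instead, pulling the relevant column out as a character vector first should let the same filter be used.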
Try: levels(as.factor(dt$object1))
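A minimal sketch of why foo$levels fails and how the names can be written to a file, using base R only. The small data frame stands in for the table returned by fread, and out_file is a hypothetical output path; both are assumptions made for illustration.

```r
# Toy stand-in for the real table; only the object1 column matters here.
dt <- data.frame(
  object1 = c("kho.central_khoisan.gwi", "kho.central_khoisan.gwi", "aa.beja.beja"),
  stringsAsFactors = FALSE
)

foo <- factor(dt$object1)

# A factor is an atomic vector, so `$` is not valid on it;
# levels() is the accessor for the level names (returned sorted).
lv <- levels(foo)

# unique() yields the same set of names without building a factor,
# in order of first appearance rather than sorted.
nm <- unique(dt$object1)

# Write one name per line so the list can be scanned by eye.
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
writeLines(lv, out_file)
```

On a 50 million row column, unique() may be the lighter option since it skips the factor conversion, though that is not measured here.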